Selection Functions in Astronomical Data Modeling

In this notebook, we give an overview of what selection functions are and explain the two types of selection functions in this package, which are meant to be layered on top of each other.

In addition to this page, interested readers are referred to Rix et al. (2021).

What are selection functions?

Simply put, the selection function of a given subset gives the probability that a source in the parent set is included in the subset, as a function of the source's properties such as sky coordinates, apparent magnitude, and color. Because it is a probability, it always takes a value between 0 and 1, and it always refers to a particular subset of interest, defined by one or more conditions.
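To make the definition concrete, here is a minimal sketch of a selection function written as an ordinary function. It is a toy: the logistic drop-off in G and the parameters `g_half` and `width` are hypothetical choices for illustration only. It depends on magnitude alone, and it is not the Gaia selection function or this package's API.

```python
import math


def toy_selection_function(g_mag: float, g_half: float = 20.5, width: float = 0.3) -> float:
    """Toy P(selected) as a function of apparent G magnitude (hypothetical).

    Logistic drop-off: close to 1 for bright sources, exactly 0.5 at g_half,
    close to 0 for faint sources. Written in a numerically stable form so
    very bright or very faint inputs do not overflow math.exp.
    """
    z = (g_mag - g_half) / width
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


for g in (16.0, 20.5, 22.0):
    p = toy_selection_function(g)
    assert 0.0 <= p <= 1.0  # a probability, so always within [0, 1]
    print(f"G = {g:4.1f}: P(selected) = {p:.3f}")
```

A real selection function would also depend on sky position and possibly color. The essential property is the same: it maps a source's properties to a number in [0, 1].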

[Figure: sf-diagram]

For example, suppose the parent set is every source that actually exists and the subset under consideration is what makes it into the Gaia source table (gaia_source) of a particular data release, say DR3. We refer to this selection function as the DR3 survey selection function. Its selection probability summarizes all of the complicated processes involved in taking and processing the data.

When should you care about selection functions?

In short, whenever you are interested in counting! You need one whenever you want to answer a question or constrain a model by comparing it with data, and that model predicts densities, rates, or other incidences of objects with certain characteristics (Rix et al. 2021). If we want to know how many objects of some kind are really there, but the data are incomplete, we need to know, and account for, how (in)complete the data are.
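As a minimal illustration of how a selection function S enters counting, the sketch below shows two common views. The first is forward modelling: a model's predicted counts are multiplied by S before being compared with the data. The second is inverse-probability weighting: observed counts are divided by S to estimate the true numbers, which is the idea behind the Horvitz-Thompson estimator in survey statistics. The bins, counts, and probabilities are made-up numbers, and neither function is part of this package.

```python
# Hypothetical magnitude bins: observed counts and assumed selection probabilities S.
observed_counts = [950, 720, 180]
selection_prob = [1.0, 0.8, 0.2]


def expected_observed(n_true, s):
    """Forward-modelling view: a model predicting n_true sources in a bin
    should be compared with S * n_true, the number we expect to observe."""
    return [n * p for n, p in zip(n_true, s)]


def completeness_corrected(counts, s):
    """Inverse-probability view: divide each observed count by S to estimate
    how many sources are really there (only meaningful where S > 0)."""
    if any(p <= 0.0 for p in s):
        raise ValueError("selection probability must be > 0 to correct counts")
    return [c / p for c, p in zip(counts, s)]


# A model predicting 900 sources in every bin, mapped onto the observed scale:
print(expected_observed([900, 900, 900], selection_prob))
# Completeness-corrected counts from the hypothetical data:
print(completeness_corrected(observed_counts, selection_prob))
```

Where S is close to zero, the inverse correction becomes very noisy, and it is undefined where S is zero. This is one reason forward modelling, which only ever multiplies by S, is often the safer choice.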

Two different types of selection functions: Survey and Subsample

To keep Gaia selection functions manageable, we divide them into two separate layers:

  • Survey Selection Function: an estimate of the probability that a source is included in a Gaia data release (i.e., in that release’s gaia_source table), as a function of sky coordinates and Gaia G magnitude.

  • Subsample Selection Function: an estimate of the probability that a source already in gaia_source is included in the subsample of interest (e.g., sources with radial velocities, RVs). We define these selection functions as functions of sky coordinates, Gaia G, and the Gaia G - G_RP color.

These two types of selection functions are meant to be multiplied together to give the selection function of the Gaia subsample one is interested in. The survey selection function of a given data release is the reusable component: it stays the same regardless of which further cuts are made to the gaia_source table.
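The layering amounts to the product rule P(in subsample) = P(in gaia_source) × P(in subsample | in gaia_source). This holds because the subsample is drawn from gaia_source. The sketch below shows that product with two hypothetical stand-in functions. Their shapes are invented for illustration: a linear drop between G = 18 and 22, and a bright-only, mildly color-dependent subsample. They are not the DR3 estimates or this package's API.

```python
def survey_sf(ra_deg: float, dec_deg: float, g_mag: float) -> float:
    """Hypothetical stand-in for P(in gaia_source | ra, dec, G)."""
    # Toy assumption: complete for G < 18, then a linear drop to 0 at G = 22.
    if g_mag < 18.0:
        return 1.0
    return max(0.0, 1.0 - (g_mag - 18.0) / 4.0)


def subsample_sf(ra_deg: float, dec_deg: float, g_mag: float, g_minus_grp: float) -> float:
    """Hypothetical stand-in for P(in subsample | in gaia_source, ra, dec, G, G - G_RP)."""
    # Toy assumption: only sources brighter than G = 15 enter the subsample,
    # with a higher probability for redder sources.
    if g_mag > 15.0:
        return 0.0
    return 0.9 if g_minus_grp > 0.5 else 0.7


def combined_sf(ra_deg: float, dec_deg: float, g_mag: float, g_minus_grp: float) -> float:
    """P(in subsample) = P(in gaia_source) * P(in subsample | in gaia_source)."""
    return survey_sf(ra_deg, dec_deg, g_mag) * subsample_sf(ra_deg, dec_deg, g_mag, g_minus_grp)


print(combined_sf(45.0, -30.0, 14.0, 0.6))  # bright, red source
print(combined_sf(45.0, -30.0, 20.0, 0.6))  # too faint for the toy subsample
```

Keeping the two factors as separate functions mirrors the design described above. The survey factor can be reused unchanged, and only the subsample factor has to be rebuilt when the cuts change.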

The figure below illustrates this layering and how it parallels the “parent” and “sample” selection functions introduced in Rix et al. (2021).

[Figure: survey-subsample]

Survey Selection Functions

For the Gaia DR3 survey selection function, please refer to Empirical survey selection function for Gaia DR3. In the current version, the survey selection function is complete (\(P=1\)) at \(G<18\). The animation below shows how the DR3 survey selection function changes with Gaia \(G\) magnitude from 18 to 22.

[Figure: Animated DR3 Survey Selection Function on HEALPix order 5]
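A survey selection function tabulated on HEALPix order 5 can be thought of as a lookup table indexed by sky pixel and magnitude bin. HEALPix order 5 means nside = 2^5 = 32, and the whole sky is divided into 12 × 32² = 12288 pixels. The sketch below assumes three things. First, the pixel index has already been computed from (ra, dec), for example with healpy's `ang2pix`. Second, the table uses 0.5 mag bins from G = 18 to 22; this binning is a hypothetical choice. Third, sources fainter than the grid are treated as not selected, which is a toy choice. The table values are made up. This is not how the package stores or queries its DR3 map.

```python
import bisect

HEALPIX_ORDER = 5
NSIDE = 2 ** HEALPIX_ORDER   # 32
NPIX = 12 * NSIDE ** 2       # 12288 pixels over the whole sky

# Hypothetical magnitude grid: 0.5 mag bins between G = 18 and G = 22.
G_EDGES = [18.0 + 0.5 * i for i in range(9)]  # 18.0, 18.5, ..., 22.0


def lookup_survey_sf(table, pixel: int, g_mag: float) -> float:
    """Look up P(in gaia_source) in a per-pixel, per-magnitude-bin table.

    table[pixel][k] holds the (hypothetical) probability for
    G in [G_EDGES[k], G_EDGES[k + 1]).
    """
    if not 0 <= pixel < NPIX:
        raise ValueError("pixel index out of range for HEALPix order 5")
    if g_mag < G_EDGES[0]:
        return 1.0  # complete (P = 1) at G < 18, as in the current DR3 version
    if g_mag >= G_EDGES[-1]:
        return 0.0  # toy assumption: sources beyond the grid are not selected
    k = bisect.bisect_right(G_EDGES, g_mag) - 1  # index of the bin containing g_mag
    return table[pixel][k]


# Made-up table: the same declining completeness in every pixel.
toy_table = [[1.0 - 0.12 * k for k in range(len(G_EDGES) - 1)] for _ in range(NPIX)]
print(NPIX, lookup_survey_sf(toy_table, 100, 17.0), lookup_survey_sf(toy_table, 100, 21.9))
```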

Subsample Selection Functions

For subsample selection functions, we consider any further cuts one might make, defined in terms of Gaia variables. Because the possible cuts are open-ended, we cannot provide a comprehensive set. In the future, we would like this package to provide functionality for researchers to construct their own Gaia subsample selection functions as needed.

In the current version, we provide a simple estimate of one particular subsample selection function, the RVS selection function for DR3. We discuss it in detail in Constructing selection functions: DR3 RVS as an example.
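As a rough illustration of what "constructing a subsample selection function" involves, the simplest empirical estimate works as follows. Among gaia_source sources in each (G, G - G_RP) bin, count the total n and the number k that are in the subsample (for example, those with an RV), and take k/n. The input format, bin widths, and function name below are hypothetical. Sky position is ignored for brevity. This plain ratio is not the method used for the DR3 RVS estimate in this package.

```python
import math
from collections import defaultdict


def binned_subsample_sf(sources, g_width: float = 0.5, c_width: float = 0.25):
    """Estimate P(in subsample | in gaia_source) as k / n in (G, G - G_RP) bins.

    `sources` is an iterable of (g_mag, g_minus_grp, in_subsample) tuples for
    sources in gaia_source (hypothetical input format). Returns a dict mapping
    each occupied bin (integer G index, integer color index) to (k, n, k / n).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [k in subsample, n total]
    for g_mag, color, in_subsample in sources:
        key = (math.floor(g_mag / g_width), math.floor(color / c_width))
        counts[key][1] += 1
        if in_subsample:
            counts[key][0] += 1
    # Every stored bin has n >= 1, so the division is safe.
    return {key: (k, n, k / n) for key, (k, n) in counts.items()}


toy_sources = [(12.1, 0.6, True), (12.2, 0.6, False), (12.3, 0.7, True), (16.0, 0.6, False)]
for key, (k, n, frac) in sorted(binned_subsample_sf(toy_sources).items()):
    print(key, k, n, round(frac, 3))
```

Bins containing only a few sources give very noisy ratios. This is one reason a more careful probabilistic treatment of the counts is preferable in practice.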