Description of All YAML Options =============================== General Rules ------------- * Any key that is specific for a plot type will begin with one of the following descriptors: * ts for Timeseries * ty for Taylor * When a key is optional it will be followed by #Opt * All plots use data over the entire analysis window from the "start_time" to the "end_time" specified in the "analysis" section. * timeseries - average over window provided by "ts_avg_window" * taylor - calculated over entire analysis window * spatial_bias - average over entire analysis window unless "percentile" provided * spatial_overlay - average over entire analysis window * spatial_overlay_exceedance - number of exceedances within the analysis window * boxplot - calculated over entire analysis window * If "set_axis" = True in "data_proc" section of each "plot_grp", the y-axis for that "plot_grp" will be set based on the values specified in the "obs" section for each "variable". If "set_axis" = False, then the automatic scaling in Matplotlib will be used. "vmin_plot" and "vmax_plot" are needed for "timeseries", "spatial_overlay", and "boxplot". "vdiff_plot" is needed for "spatial_bias" plots and "ty_scale" is needed for "taylor" plots. "nlevels" or the number of levels used in the contour plot can also optionally be provided for "spatial_overlay" plot. If "set_axis" = True and the proper limits are not provided in the "obs" section, a warning will print, and the plot will be created using the automatic scaling in Matplotlib. Analysis -------- All input related to the analysis class. **start_time:** The start time in UTC of the analysis window. (e.g., "2019-08-02-12:00:00") **end_time:** The end time in UTC of the analysis window. (e.g., "2019-08-03-12:00:00") **output_dir**: This is the directory where the plots are saved. Shell variables prefixed with the ``$`` symbol, such as ``$HOME``, will be expanded. **output_dir_save**: This is an optional argument. This is the directory where the files from the 'save' argument below are saved. If this argument is not specified, output_dir_save defaults to output_dir. Shell variables prefixed with the ``$`` symbol, such as ``$HOME``, will be expanded. **output_dir_read**: This is an optional argument. This is the directory where the files from the 'read' argument below are read from. If this argument is not specified, output_dir_read defaults to output_dir. To read files directly from the path provided in 'read', set ``output_dir_read: null``. Shell variables prefixed with the ``$`` symbol, such as ``$HOME``, will be expanded. **debug:** This is an option to print out plots and more options for trouble shooting. If you want plots to print in jupyter notebooks select this to True. Set this to False, when you are submitting MELODIES MONET as a job to an HPC machine to avoid display errors. **save:** This is an optional argument. This option allows for saving attributes of the analysis class (paired, models, obs) to a file, using the analysis.save_analysis() method. Read the information for output_dir_save for information regarding the directory files are saved to. * **method:** The file format to save to. Options are 'netcdf' and 'pkl'. * **prefix:** This option should be used with method: 'netcdf'. When saving to netcdf format, a new file is made for each group (for example each model/obs pair is a new file). The prefix option adds a prefix to the filename in the format [prefix]_[group].nc4. * **output_name:** This option should be used with method: 'pkl'. Unlike with netcdf saving, pickle saving saves all groups to a single file. This option directly sets the filename that will be used for saving. * **data:** This option only works when saving with 'netcdf'. Setting data: 'all' will save all groups to netCDF files. If a subset of the groups is desired, this can be set to an iterable in the form ['group1','group2',...]. **read:** This is an optional argument. This option allows for read attributes of the analysis class (paired, models, obs) from a previously saved file, using the analysis.read_analysis() method. Read the information for output_dir_read for information regarding the directory files are read from. * **method:** The file format to read from. Options are 'netcdf' and 'pkl'. * **filenames:** The filename(s) that should be read in. For method: 'netcdf' this must be set as a dict in the form filenames: {'group1':str or iterable of filename(s) in group1, group2: str or iterable of filename(s) in group2,...}. For method: 'pkl' this must be set as either a string with the filename or as an or iterable of filenames. Wildcards will be expanded to any matching files. **add_logo:** This is an optional argument. Set this to ``false`` to forgo adding the MELODIES MONET logo to the plots. **enable_dask_progress_bars:** This is an optional argument. Set this to ``true`` to enable Dask progress bars for a `Dask local task scheduler `__ (i.e. not ``dask.distributed``). By default, this is disabled to keep logs cleaner (e.g. using :doc:`the CLI ` to run a control file in a batch job). However, you may wish to enable this for interactive use in a Jupyter notebook or other interactive Python session, as it gives you some visual indication of the progress of multi-file data loading and some parts of the processing. **pairing_kwargs:** This is an optional argument. This dictionary allows for specifying keyword arguments for pairing methods. First level should be the observation type (e.g. "sat_grid_clm", "sat_swath_clm"). Then under the observation type label provide the specific pairing options for your application. * **apply_ak:** This is an optional argument used for pairing of satellite data. When no pairing keyword arguments are specified it will default to True. This should be set to True when application of satellite averaging kernels or apriori data to model observations is desired. * **mod_to_overpass:** This is an optional argument used for pairing of satellite data. When set to True the model data will be pre-processed to the published local overpass time for the satellite. As of now, local overpass times are hard-wired. Models ------ All input for each instance of the model class. First level should be the model label. Then under each model label provide the following: **files:** The file directory location and name(s). Hotkeys are allowed. Shell variables prefixed with the ``$`` symbol, such as ``$HOME``, will be expanded. **files_vert:** This is for CMAQ only. If you want to calculate vertical info, please provide location of ``*.metcro3d.ncf`` files here. Shell variables prefixed with the ``$`` symbol, such as ``$HOME``, will be expanded. **files_surf:** This is for CMAQ only. If you want to calculate vertical info, please provide location of ``*.metcro2d.ncf`` files here. Shell variables prefixed with the ``$`` symbol, such as ``$HOME``, will be expanded. **mod_type:** The model type. Options are: "cmaq", "wrfchem", "ufs" ("rrfs" is deprecated), "gsdchem", "cesm_fv", "cesm_se", and "raqms". If you specify another name, MELODIES MONET will try to read in the data using xarray.open_mfdataset and xarray.open_dataset(). **mod_kwargs**: This is an optional dictionary to include information to provide to the model dataset reader scripts in MONETIO (``monetio/models/*_mm.py``). For example, you can provide mechanism information (e.g., mech: 'cb6r3_ae6_aq') or for some models, in order to reduce processing time, you can only pull in the surface data (e.g., surf_only: True). **radius_of_influence:** The "radius of influence" used for pairing in MONET. Typically this is set at the horizontal resolution of your model * 1.5. Setting this to a smaller value will speed up the pairing process. **apply_ak:** Removed. Instead, specify ``pairing_kwargs`` in the analysis section. **is_global:** Optional boolean argument to specify if the model dataset is global or regional. Used in some satellite pairing methods to indicate if a longitude wrap should be applied. Defaults to False when unspecified and xesmf-based satellite pairing methods will assume the model dataset is regional. **mapping:** This is the mapping dictionary for all variables to be plotted. For each observational dataset, add a mapping dictionary where the model variable name is first (i.e., key) and the observation variable name is second (i.e., value). Because the plots in MELODIES MONET will plot multiple models with one observation, the observation variables listed in the mapping dictionary must be consistent across all models. For example, if you want to plot the results of multiple model datasets against the AirNow observations for "OZONE" and "PM2.5", you must provide the model variable names for "OZONE" and "PM2.5" in the mapping dictionary for all models. Say if you only provide the model variable names for "OZONE" for one of the models, MELODIES MONET will error. Be careful that if variable names like NO are a command in python to add 'NO' to indicate that it should be interpreted as a string. For example, :: mapping: airnow: CO: 'CO' NO2: 'NO2' 'NO': 'NO' PM25_TOT: 'PM2.5' O3: 'OZONE' **projection:** In order to use the default projection for each model as defined in the map_projection function in melodies_monet/plots/surfplots.py either remove the projection setting or set to `~` or `null`. If the model does not have a default projection defined, ``ccrs.PlateCarree()`` will be used. If you would like to override the default projection for a model, you have three options: 1) Specify one of the model preset options (e.g., to use the default RAQMS projection for another model write ``projection: 'model:raqms'``). Note: For certain models, central longitude and/or central latitude are required, so check the map_projection function in melodies_monet/plots/surfplots.py and confirm the correct attributes are applied for your given model dataset. 2) Add a proj4 string or dictionary for ``cartopy.crs.Projection``. Note: If a proj4 string or dictionary is used, it must completely define an instance of ``cartopy.crs.Projection``. For example, converting ``ccrs.PlateCarree()`` to a proj4 dict results in ``{'proj': 'eqc', 'lat_ts': 0, 'lat_0': 0, 'lon_0': 0, 'x_0': 0, 'y_0': 0, 'ellps': 'WGS84', 'to_meter': 111319.490793274, 'no_defs': None, 'type': 'crs'}``, but this is not able to completely define an instance of ``cartopy.crs.Projection`` due to the ``.boundary`` attribute not yet being implemented when defining ``cartopy.crs.Projection`` from a proj4 string or dict. A string such as ``'EPSG:4326'`` will work (e.g., ``projection: 'EPSG:4326'``). 3) Add a string with a ``cartopy.crs`` command to be evaluated when defining the projection used. This string must start with 'ccrs.'. For example, ``projection: 'ccrs.PlateCarree()'``. **plot_kwargs:** This is optional. If you do not provide this, MELODIES MONET will use a default list of colors. Add a dictionary of plotting characteristics to be read in by Matplotlib. For example, :: plot_kwargs: #Opt color: 'magenta' marker: 'o' linestyle: '--' Copy that above and update the model label for all the models you would like to include in the analysis. Observations ------------ All input for each instance of the observation class. First level should be the observation label. Then under each observation label provide the following: **use_airnow:** If the observations are AirNow set to True, else set to False. Generalizing this to include other surface observations is under development. **filename:** The file directory location and name. These observations need to be preprocessed prior to incorporating them into MELODIES MONET. Shell variables prefixed with the ``$`` symbol, such as ``$HOME``, will be expanded. See :doc:`../getting_started/downloading_obs` for more details. **obs_type:** The observation type. Options are: "pt_sfc" or point surface. Adding options for Aircraft and Satellite observations are under development. **sat_type:** The satellite observation type. Options include: "mopitt_l3", "omps_l3", "omps_nm", "modis_l2", "tropomi_l2_no2", "tempo_l2_no2" and "tempo_l2_hcho". Additional options are under development. **data_proc:** This section stores all of the data processing information. * **filter_dict:** This is a dictionary used to filter the observation data prior to pairing. The keys of the dictionary should be columns of of the paired dataset which will be used in filtering. If there are multiple keys, this will loop over all of them. The value of the dict should be another dict with keys 'value' and 'oper'. 'value' can be a single value or list of values used when filtering the data. 'oper' is the operation used when comparing the dataset values. Examples of operations are ==, !=, >, >=, etc. Additionally, when comparing to a list, "oper" can be set to "isin" or "isnotin" to filter by values in the list or not in the list, respectively. Example: {'state_name':{'oper':'isin','value':['CO']}, 'WS':{'oper':'<','value':1}} **variables:** This is all optional. For each observational variable you can include the following information to handle unit conversions, min/max values, NaNs, and add optional plotting information. The obs_min, obs_max, and nan_values are set to NaN first and then the unit conversion is applied. * **unit_scale:** The value for unit conversion. * **unit_scale_method:** The method for unit conversion. Options are: Multiply = '*' , Add = '+', subtract = '-', divide = '/'. * **obs_min:** Set all values less than this value to NaN * **obs_max:** Set all values greater than this value to NaN * **nan_value:** -1.0 # Set this value to NaN * **ylabel_plot:** String to use as ylabel in plot. Useful for adding units or instrument information. * **ty_scale:** Scaling to be used in Taylor plots. * **vmin_plot:** Minimum for y-axis during plotting. To apply to a plot, change set_axis = True in plot_group. * **vmax_plot:** Maximum for y-axis during plotting. To apply to a plot, change set_axis = True in plot_group. * **vdiff_plot:** The range (+/-) to use in bias plots. To apply to a plot, change set_axis = True in plot_group. * **nlevels_plot:** The number of levels used in colorbar for contourf plot. To apply to a plot, change set_axis = True in plot_group. * **percentile_opt:** If not specified, defaults to average. If specified, value (in %) is used to calculate the percentile (e.g., 5, 50, 95). Currently only used for "spatial_bias" plots. Will work with data as is and regulatory metrics. * **regulatory:** If false (default), use data as is. If set to true, the regulatory metric is calculated as explained under :doc:`/users_guide/supported_diagnostics`. Only works for "OZONE" and "PM2.5" variables. * **ylabel_reg_plot:** String to use as ylabel in plot for regulatory calculation. Useful for adding units or instrument information. Only used if regulatory = True. * **vmin_reg_plot:** Minimum for y-axis during plotting for regulatory calculation. To apply to a plot, change set_axis = True in plot_group. Only used if regulatory = True. * **vmax_reg_plot:** Maximum for y-axis during plotting for regulatory calculation. To apply to a plot, change set_axis = True in plot_group. Only used if regulatory = True. * **vdiff_reg_plot:** The range (+/-) to use in bias plots for regulatory calculation. To apply to a plot, change set_axis = True in plot_group. Only used if regulatory = True. For example, :: PM2.5: unit_scale: 1 unit_scale_method: '*' obs_min: 0 obs_max: 100 nan_value: -1.0 ylabel_plot: 'PM2.5 (ug/m3)' ty_scale: 2.0 vmin_plot: 0.0 vmax_plot: 22.0 vdiff_plot: 15.0 nlevels_plot: 23 regulatory: True ylabel_reg_plot: 'PM2.5_24hr (ug/m3)' vmin_reg_plot: 0.0 #Opt vmax_reg_plot: 22.0 #Opt vdiff_reg_plot: 5.0 #Opt percentile_opt: 50 Copy that above and update the observation label for all the observations you would like to include in the analysis. Note that all models are paired with all observations. At this point MELODIES MONET does not pair observations with each other. Remember all of the possibilities above are optional, so feel free to only select the options you need to create your desired plot. Plots ----- All input for each plotting group. A plotting group consists of one plotting type. The plotting types are described in :doc:`/users_guide/supported_plots`. All model / observational pairs and domains specified for the plotting group will be included. You may include as many plotting groups as you like. For each plotting group, update the label and include the following information. Note: the labels need to be unique, but otherwise are not used. **type:** The plot type. Options are: "timeseries", "taylor", "spatial_bias", "spatial_overlay", "spatial_bias_exceedance", "boxplot", "multi-boxplot","csi" Note: "spatial_bias_exceedance" plots only work when regulatory = True. **fig_kwargs:** This is optional to provide a dictionary with figure characteristics to be read in by Matplotlib. For example, :: fig_kwargs: figsize: [14,6] **default_plot_kwargs:** This is optional to provide a dictionary with plotting characteristics to be read in by Matplotlib. Note that the "plot_kwargs" in the "model" section will overwrite these. This is a good method to set the line width and marker size for the plot. For example, :: default_plot_kwargs: linewidth: 2.0 markersize: 2. **text_kwargs:** This is optional to provide a dictionary with text characteristics to be read in by Matplotlib. For example, :: text_kwargs: fontsize: 18. **domain_type:** List of domain types to be plotted. These correspond with the columns in the observation file. (e.g., airnow: epa_region, state_name, siteid, etc.). For automatic EPA or Giorgi region boxes (if they are not included with the columns in the observation file), choose ``auto-region:epa`` or ``auto-region:giorgi``. Take into account that ``auto-region:epa`` is only a rough approximation, since it assumes perfect, rectangular lonlat boxes. If you only need a rectangular, lonlat box which does not cross the antimeridian, you can use ``custom:box``, which needs to be combined with the ``domain_info`` parameter and a box of ``bounds: [minlon, maxlon, minlat, maxlat]``. See :doc:`/users_guide/region_selection` for examples. If you have ``regionmask`` installed, you can also use it for advanced region support. These regions can be arbitrary, and its use require providing ``domain_type`` parameters starting with ``custom:``. There are three ways to use ``regionmask``. ``custom:polygon`` lets the user define their own polygon in the section ``domain_info``, using the keyword ``mask_info``. ``custom:defined-region`` lets the user utilize any region predefined by `regionmask `__, defined in ``domain_info`` using the keywords ``name_regiontype`` and ``region``. The third option is using the keyword `custom:file`, which is defined in ``domain_info`` with either ``mask_path:path_shapefile_or_geojson`` or ``mask_url:url_of_shapefile_or_geojson``, ``abbrevs``, ``name`` and ``region_name``. See :doc:`/users_guide/region_selection` for examples and a more detailed explanation. **domain_name:** List of domain names to be plotted. If domain_type = all, all data will be used and the domain_name is used only in the plot title. If domain_type is not equal to all, MELODIES MONET will query all of the data where domain_type is equal to domain_name. **region_name:** list of source of regions used in title. (e.g., ['epa_region']) **region_list:** list of regions we will calculate for scorecard. (e.g., ['R1','R2','R3','R4','R5','R6','R7','R8','R9','R10'] **urban_rural_name:** list of only one string input, which is variable used to determine whether urban or rural site. (e.g., ['msa_name']) **urban_rural_differentiate_value:** string of value used to determine whether variable is rural or urban. (e.g., ''). **better_or_worse_method:** string of method used to determine which models is better compared to observations. (e.g., 'RMSE', 'IOA' ,' NMB', 'NME'). choose one only for each time scorecard code run. **model_name_list:** for multi-box plot, list of observation and model names user choose to set as x-labels; for csi plot, list of model names (only) user choose to set as labels. **threshold_list:** csi plot only. list of values used as x variables. example: [10,20,30,40,50,60,70,80,90,100] **score_name:** csi plot only. list of scores user can choose to plot. examples are "Critical Success Index' 'False Alarm Rate' 'Hit Rate'. **threshold_tick_style:** csi plot only. (optional) control for spacing of threshold (x-axis) ticks. example: use ``nonlinear`` when nonlinear xticks including all thresholds are desired. Any other selection (default = None) will choose xticks that are equally spaced between min(threshold_list):max(threshold_list) and likely won't include all thresholds. **altitude_variable:** For "vert_profile" plot only. Name of altitude variable in observational dataset (e.g., altitude) **vertprofile_bins:** For "vert_profile" plot only. List of vertical bins, on which to analyze the data. Units should be identical to the units of the altitude_variable specified above (e.g., [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000] if altitude is in meters) **color_map:** For 'scatter_density' plot only. Specify a default colormap in Matplotlib (e.g., 'RdBu_r'). To use a custom colormap, leave color_map blank and provide the following options: * **colors:** Specify a list of colors (e.g., ['royalblue', 'cyan', 'yellow', 'orange']). * **over:** Specify the color for data greater than the color bar (e.g., 'red') * **under:** Specify the color for data less than the color bar (e.g., 'blue') **fill:** For 'scatter_density' plot only. Set to True if you want to to create a KDE plot (i.e., fill the area under the density curve) and set to False to create a scatter plot. **shade_lowest:** For 'scatter_density' plot and if fill is set to True above (i.e., for KDE plots only.) Set to True if you want to shade the lowest contour. **vcenter:** For 'scatter_density' plot only. Optionally, provide the central value for TwoSlopeNorm. **color_map_custom:** For 'curtain' plot only. Set to True, to use a custom Matplotlib colormap and specify “colors” and "color_levels". To use a standard Matplotlib colormap, set to False and specify a “color_map”. **colors:** For 'curtain' plot only. Used only if "color_map_custom" is set to True. Provide a list of colors (e.g., ["#ff8cff", "#dd6ff2", "#bb52e5", "#9935d8", "#7718cb", "#0000bb", "#002ccc", "#0058dd", "#0084ee", "#00afff", "#00ebff", "#27ffd7", "#63ff9b", "#a3ff5b", "#d3ff2b", "#ffff00", "#ffcf00", "#ff9f00", "#ff6f00", "#ff3f00", "#ff0000", "#d8000f", "#b2001f", "#8c002f", "#66003f", "#343434", "#606060", "#8c8c8c", "#b8b8b8", "#e4e4e4"]). **color_levels:** For 'curtain' plot only. Used only if "color_map_custom" is set to True. Define the number of distinct colors in the color bar (e.g., 30). **color_map:** For 'curtain' plot only. Used only if "color_map_custom" is set to False (e.g., 'Spectral_r'). **vmin:** For 'curtain' plot only. Set y-axis minimum limit (e.g., 50). **vmax:** For 'curtain' plot only. Set y-axis maximum limit (e.g., 1200). **num_levels:** For 'curtain' plot only. Set the number of vertical levels for interpolation (e.g., 100). **interval:** For 'curtain' plot only. Set the y-axis tick interval (e.g., 100). **pressure_units:** For 'curtain' plot only. Units to use only for the y-axis label. No conversions occur. Current options are only 'hPa' and 'Pa'. **data:** This a list of model / observation pairs to be plotted where the observation label is first and the model label is second (e.g., ['airnow_cmaq_expt', 'airnow_rrfs_13km', 'airnow_wrfchem_v4.2']) **data_proc:** This section stores all of the data processing information. * **filter_dict:** This is a dictionary used to filter the paired data sent to the plotting routine. The keys of the dictionary should be columns of of the paired dataset which will be used in filtering. If there are multiple keys, this will loop over all of them. The value of the dict should be another dict with keys 'value' and 'oper'. 'value' can be a single value or list of values used when filtering the data. 'oper' is the operation used when comparing the dataset values. Examples of operations are ==, !=, >, >=, etc. Additionally, when comparing to a list, "oper" can be set to "isin" or "isnotin" to filter by values in the list or not in the list, respectively. This cannot be specified if 'filter_string' is specified. Example: {'state_name':{'oper':'isin','value':['CO']}, 'WS':{'oper':'<','value':1}} * **filter_string:** This is a string used to filter the paired data sent to the plotting routine. The result is the same as using filter_dict. This uses the pandas query method on the paired dataset. This cannot be specified if 'filter_dict' is specified. This option is only available for surface and aircraft observations. For satellite observations, use the 'filter_dict' option instead. Example: state_name in ['CO'] and WS < 1 * **rem_obs_by_nan_pct:** Specify as dictionary with keys 'group_var', 'pct_cutoff' and 'times'. If specified, removes all instances of 'group_var' where there are > 'pct_cutoff' % NaN values. For example, with airnow sites, setting 'group_var' to 'siteid' will remove all sites with > pct_cutoff NaN values. Setting 'times' to 'hourly' will only look at values at the beginning of each hour. Set 'times' to '' if all times should be used. This calculation occurs over the entire analysis window and prior to calculating the regulatory metrics. * **rem_obs_nan:** If True, remove all points where model or obs variable is NaN. If False, remove only points where model variable is NaN. * **set_axis:** If = True, use the axis constraints described in the observation class (e.g., ty_scale, vmin_plot, vmax_plot, vdiff_plot, nlevels_plot). If = False, use automatic scaling in matplotlib. * **ts_select_time:** This is for timeseries plots only. This is the time used for averaging and plotting. Options are 'time' for UTC or 'time_local' for local time * **ts_avg_window:** This is for timeseries plots only. This is the averaging window applied to the data. No averaging done if not provided in the yaml file (i.e., ts_avg_window is optional). Averaging is done if a pandas resample rule (e.g., 'H' is hourly, 'D' is daily) is specified. * **altitude_yax2:** This is for "timeseries" plots for aircraft evaluation only. This allows users to add a secondary right y-axis to the timeseries plot. Additional YAML options are described below: * **altitude_variable:** Name of altitude variable in observational dataset (e.g., altitude) * **altitude_ticks:** Option to set tick increments (e.g., 1000) * **ylabel2:** Option to set secondary right y-axis label (e.g., Altitude (m)) * **plot_kwargs_y2:** Optional kwargs for secondary right y-axis (e.g., color: g) * **filter_dict:** Optionally provide the following options to specify the min and max values of altitude for the secondary right y-axis * **altitude:** * **oper:** Set to "between". * **value:** Set to a list of the vim and vmax (e., [2000,8000]). * **interquartile_style:** For "vert_profile" plot only. Specify 'shading' to plot shaded curves of the 25th and 75th percentile range of each vertical bin or 'box' to plot box-plots of each vertical bin. Stats ----- All input needed to calculate the statistics. The supported statistics available in MELODIES MONET are described in :doc:`/users_guide/supported_stats`. All model / observational pairs and domains specified will be included. You may include as many statistics as you like. Note however that the calculation of the statistics is relatively slow right now. Optimizing this code is under development. The statistics require positive numbers, so if you want to calculate temperature use Kelvin. Wind direction has special calculations for AirNow if the observation name is 'WD'. **stat_list:** List of acronyms of statistics to calculate as defined in :doc:`/users_guide/supported_stats`. (e.g., ['MB', 'MdnB', 'NMB', 'NMdnB','R2', 'RMSE']). A dictionary of definitions is also included in MELODIES-MONET/melodies_monet/stats/proc_stats.py. **round_output:** This is optional. This is the integer provided to Pandas round function defining the number of decimal places to which to round each value. Defaults to 3 (i.e., rounds to 3rd decimal place). **output_table:** This is optional. The statistics will always output a table in .csv format. If True, a matplotlib table figure is also output. **output_table_kwargs:** This is optional. This is a dictionary defining all of the characteristics of the matplotlib table figure. This is completely customizable because optimal sizes will depend on the number of pairs and statistics included. For example, :: output_table_kwargs: figsize: [7, 3] fontsize: 12. xscale: 1.4 yscale: 1.4 edges: 'horizontal' **domain_type:** List of domain types to be plotted. These correspond with the columns in the observation file. (e.g., airnow: epa_region, state_name, siteid, etc.). **domain_name:** List of domain names to be plotted. If domain_type = all, all data will be used and the domain_name is used only in the plot title. If domain_type is not equal to all, MELODIES MONET will query all of the data where domain_type is equal to domain_name. **data:** This a list of model / observation pairs to be plotted where the observation label is first and the model label is second (e.g., ['airnow_cmaq_expt', 'airnow_rrfs_13km', 'airnow_wrfchem_v4.2']) **data_proc:** This section stores all of the data processing information. * **filter_dict:** This is a dictionary used to filter the paired data sent to the stats routine. The keys of the dictionary should be columns of of the paired dataset which will be used in filtering. If there are multiple keys, this will loop over all of them. The value of the dict should be another dict with keys 'value' and 'oper'. 'value' can be a single value or list of values used when filtering the data. 'oper' is the operation used when comparing the dataset values. Examples of operations are ==, !=, >, >=, etc. Additionally, when comparing to a list, "oper" can be set to "isin" or "isnotin" to filter by values in the list or not in the list, respectively. This cannot be specified if 'filter_string' is specified. Example: {'state_name':{'oper':'isin','value':['CO']}, 'WS':{'oper':'<','value':1}} * **filter_string:** This is a string used to filter the paired data sent to the statistics routine. The result is the same as using filter_dict. This uses the pandas query method on the paired dataset. This cannot be specified if 'filter_dict' is specified. This option is only available for surface and aircraft observations. For satellite observations, use the 'filter_dict' option instead. Example: state_name in ['CO'] and WS < 1 * **rem_obs_by_nan_pct:** Specify as dictionary with keys 'group_var', 'pct_cutoff' and 'times'. If specified, removes all instances of 'group_var' where there are > 'pct_cutoff' % NaN values. For example, with airnow sites, setting 'group_var' to 'siteid' will remove all sites with > pct_cutoff NaN values. Setting 'times' to 'hourly' will only look at values at the beginning of each hour. Set 'times' to '' if all times should be used. This calculation occurs over the entire analysis window and prior to calculating the regulatory metrics.