{ "cells": [ { "cell_type": "markdown", "id": "f9507824", "metadata": {}, "source": [ "# Reading Paired Data" ] }, { "cell_type": "markdown", "id": "8a4d5a34-f302-4bd7-a376-500b6266945e", "metadata": {}, "source": [ "## Limitations for Reading Paired Data\n", "\n", "Saving paired datasets and reading them back in to create plots and statistics is efficient in MELODIES MONET because pairing the data is the most time-consuming step, and users typically make several modifications to optimize their plots and statistical analysis. Currently, these saved paired files have some limitations, so please read this section closely. In MELODIES MONET, prior to pairing, the model data are read in and adjusted based on the YAML file options for each model, and then the observation data are read in and adjusted based on the YAML file options for each observation. Then, based on the mapping provided in the model portion of the YAML file, the data are paired and saved to a file. When you read this file back into MELODIES MONET, the code can read in the model and observation data again, so that you can still make plots that use more than the paired data, such as the spatial overlay plots, but the pairing step is skipped to save time. **Therefore, if you adjust anything in the model or obs parts of the YAML file, such as unit conversions, the number of mapped species, or NaN values, these changes will not be applied to your paired file. 
Thus, please make sure to use exactly the same model and obs options in your YAML file for plotting as you used for pairing the data, except for those specific to plotting as explained below.** If you plan to make plots separately for different mapped species, create a separate paired file for each mapped species, so that you have extra flexibility.\n", "\n", "**You can change the following:**\n", "\n", "* The analysis start and end time, to update the time window over which plots are created and statistics are calculated.\n", "* You can combine multiple paired files across several time periods or flights.\n", "* You can also combine multiple models paired on the same observations to evaluate across different models or sensitivity tests using the same model.\n", "* You may update anything in the plots and stats sections of the YAML input file.\n", "* There are two exceptions unique to plotting in the model and obs sections of the YAML file that you are able to adjust. In the model section, you can adjust the `plot_kwargs` section to control line/marker colors and styles, and in the obs section, you can adjust the plotting descriptions for each observation variable, such as the y-axis labels, min/max values, and other plotting-specific variables such as `ty_scale`.\n", "\n", "We are working to make the pairing more generalizable in version 2, so more flexibility will be available in later versions. We are also planning to add a feature that lets you drop species listed in the mapping table, so that you have more options when reading in this paired file to optimize your analysis plots and statistics." ] }, { "cell_type": "markdown", "id": "c7bb29f0-7d37-4b5a-9851-8d7f1392e951", "metadata": {}, "source": [ "## Import the driver" 
] }, { "cell_type": "code", "execution_count": 1, "id": "3d43faf7", "metadata": {}, "outputs": [], "source": [ "from melodies_monet import driver" ] }, { "cell_type": "markdown", "id": "24cc3a8c-dafa-4dd7-9328-04a0f8b0aada", "metadata": {}, "source": [ "## Read model and observations" ] }, { "cell_type": "code", "execution_count": 2, "id": "65671ca7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wrfchem\n", "example:wrfchem:racm_esrl\n", "**** Reading WRF-Chem model output...\n", "wrfchem\n", "example:wrfchem:racm_esrl_vcp\n", "**** Reading WRF-Chem model output...\n" ] } ], "source": [ "an = driver.analysis()\n", "an.control = \"control_wrfchem_saveandread.yaml\"\n", "an.read_control()\n", "an.control_dict\n", "\n", "an.open_models()\n", "an.open_obs()" ] }, { "cell_type": "markdown", "id": "ddc902a4-7885-4c3e-b820-7096d00dddc0", "metadata": {}, "source": [ "## Read saved data using control file\n", "\n", "The driver will read the data based on the information included in the control file by calling {func}`an.read_analysis()`.\n", "\n", "In the control file analysis section, setting method to `'netcdf'` for a given attribute of the analysis class (e.g., paired, models, obs) will read NetCDF-4 files and set the appropriate attribute with the data. Filenames must be specified as a dict, with the keys being the pair name and the values being either a string with the filename to be read, or an iterable with multiple filenames to be read. If multiple files (such as several different days) are specified they will be joined by coordinates with [xarray's merge function](https://docs.xarray.dev/en/stable/generated/xarray.merge.html).\n", "\n", "In the control file analysis section, setting method to `'pkl'` for a given attribute of the analysis class (e.g., paired, models, obs) will read .pkl files and set the appropriate attribute with the data. Filenames must be specified as either a string or an iterable. 
If multiple files (such as several different days) are specified, they will be joined by coordinates with xarray's merge function." ] }, { "cell_type": "code", "execution_count": 3, "id": "94e9281e-0845-411b-a59d-96456ccb5a6b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading: ./output/save_and_read/0905_airnow_RACM_ESRL.nc4\n", "Reading: ./output/save_and_read/0905_airnow_RACM_ESRL_VCP.nc4\n" ] } ], "source": [ "an.read_analysis()" ] }, { "cell_type": "code", "execution_count": 4, "id": "774455e6-6dc1-4e65-995c-3ac75cd0a9d7", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 2GB\n",
       "Dimensions:     (time: 2091, x: 3786)\n",
       "Coordinates:\n",
       "  * time        (time) datetime64[ns] 17kB 2019-09-01 ... 2019-09-30T00:30:00\n",
       "  * x           (x) int64 30kB 0 1 2 3 4 5 6 ... 3780 3781 3782 3783 3784 3785\n",
       "Data variables: (12/35)\n",
       "    BARPR       (time, x) float64 63MB ...\n",
       "    BC          (time, x) float64 63MB ...\n",
       "    CO          (time, x) float64 63MB ...\n",
       "    NO          (time, x) float64 63MB ...\n",
       "    NO2         (time, x) float64 63MB ...\n",
       "    NO2Y        (time, x) float64 63MB ...\n",
       "    ...          ...\n",
       "    cmsa_name   (x) float64 30kB ...\n",
       "    msa_code    (x) float64 30kB ...\n",
       "    msa_name    (x) <U52 787kB ...\n",
       "    state_name  (x) <U2 30kB ...\n",
       "    epa_region  (x) <U5 76kB ...\n",
       "    siteid      (x) <U12 182kB ...\n",
       "Attributes:\n",
       "    title:         \n",
       "    format:        NetCDF-4\n",
       "    date_created:  2026-05-12\n",
       "    dict_json:     {\\n    "type": "pt_sfc",\\n    "radius_of_influence": 10000...\n",
       "    group_name:    airnow_RACM_ESRL
" ], "text/plain": [ " Size: 2GB\n", "Dimensions: (time: 2091, x: 3786)\n", "Coordinates:\n", " * time (time) datetime64[ns] 17kB 2019-09-01 ... 2019-09-30T00:30:00\n", " * x (x) int64 30kB 0 1 2 3 4 5 6 ... 3780 3781 3782 3783 3784 3785\n", "Data variables: (12/35)\n", " BARPR (time, x) float64 63MB ...\n", " BC (time, x) float64 63MB ...\n", " CO (time, x) float64 63MB ...\n", " NO (time, x) float64 63MB ...\n", " NO2 (time, x) float64 63MB ...\n", " NO2Y (time, x) float64 63MB ...\n", " ... ...\n", " cmsa_name (x) float64 30kB ...\n", " msa_code (x) float64 30kB ...\n", " msa_name (x) \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 2GB\n",
       "Dimensions:     (time: 2091, x: 3786)\n",
       "Coordinates:\n",
       "  * time        (time) datetime64[ns] 17kB 2019-09-01 ... 2019-09-30T00:30:00\n",
       "  * x           (x) int64 30kB 0 1 2 3 4 5 6 ... 3780 3781 3782 3783 3784 3785\n",
       "Data variables: (12/35)\n",
       "    BARPR       (time, x) float64 63MB ...\n",
       "    BC          (time, x) float64 63MB ...\n",
       "    CO          (time, x) float64 63MB ...\n",
       "    NO          (time, x) float64 63MB ...\n",
       "    NO2         (time, x) float64 63MB ...\n",
       "    NO2Y        (time, x) float64 63MB ...\n",
       "    ...          ...\n",
       "    cmsa_name   (x) float64 30kB ...\n",
       "    msa_code    (x) float64 30kB ...\n",
       "    msa_name    (x) <U52 787kB ...\n",
       "    state_name  (x) <U2 30kB ...\n",
       "    epa_region  (x) <U5 76kB ...\n",
       "    siteid      (x) <U12 182kB ...\n",
       "Attributes:\n",
       "    title:         \n",
       "    format:        NetCDF-4\n",
       "    date_created:  2026-05-12\n",
       "    dict_json:     {\\n    "type": "pt_sfc",\\n    "radius_of_influence": 10000...\n",
       "    group_name:    airnow_RACM_ESRL_VCP
" ], "text/plain": [ " Size: 2GB\n", "Dimensions: (time: 2091, x: 3786)\n", "Coordinates:\n", " * time (time) datetime64[ns] 17kB 2019-09-01 ... 2019-09-30T00:30:00\n", " * x (x) int64 30kB 0 1 2 3 4 5 6 ... 3780 3781 3782 3783 3784 3785\n", "Data variables: (12/35)\n", " BARPR (time, x) float64 63MB ...\n", " BC (time, x) float64 63MB ...\n", " CO (time, x) float64 63MB ...\n", " NO (time, x) float64 63MB ...\n", " NO2 (time, x) float64 63MB ...\n", " NO2Y (time, x) float64 63MB ...\n", " ... ...\n", " cmsa_name (x) float64 30kB ...\n", " msa_code (x) float64 30kB ...\n", " msa_name (x) `, {attr}`an.models `, {attr}`an.obs `)." ] }, { "cell_type": "markdown", "id": "9b9f6c4e-ee74-4ebb-9a26-3a0dcda3faf5", "metadata": {}, "source": [ "```python\n", "# For netCDF files \n", "from melodies_monet.util.read_util import read_saved_data\n", "\n", "read_saved_data(\n", " analysis=an,\n", " filenames={'airnow_wrfchem_v4.2': ['0905_airnow_wrfchem_v4.2.nc4']},\n", " method='netcdf',\n", " attr='paired')\n", "```" ] }, { "cell_type": "markdown", "id": "90908ece-b080-4057-ad45-0b4b4d89fd8e", "metadata": {}, "source": [ "```python\n", "# For pickle files \n", "from melodies_monet.util.read_util import read_saved_data\n", "\n", "read_saved_data(analysis=an, filenames=['0905.pkl'], method='pkl', attr='paired')\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.15" } }, "nbformat": 4, "nbformat_minor": 5 }