Saving Paired Data

First, import the driver.

from melodies_monet import driver
Please install h5py to open files from the Amazon S3 servers.
Please install h5netcdf to open files from the Amazon S3 servers.

Read model and obs, then pair the data

an = driver.analysis()
an.control = "control_wrfchem_saveandread.yaml"
an.read_control()
an.control_dict

an.open_models()
an.open_obs()
an.pair_data()
example:wrfchem:racm_esrl
**** Reading WRF-Chem model output...
example:wrfchem:racm_esrl_vcp
**** Reading WRF-Chem model output...
After pairing:                         time  BARPR   BC   CO   NO  NO2  NO2Y  NOX  NOY  OZONE  \
0       2019-09-01 00:00:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0   25.0   
1       2019-09-01 00:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
2       2019-09-01 00:30:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
3       2019-09-01 01:00:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0   24.0   
4       2019-09-01 01:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
...                     ...    ...  ...  ...  ...  ...   ...  ...  ...    ...   
7916521 2019-09-29 23:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
7916522 2019-09-29 23:30:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
7916523 2019-09-30 00:00:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    8.0   
7916524 2019-09-30 00:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
7916525 2019-09-30 00:30:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   

         ...  longitude  cmsa_name  msa_code  msa_name  state_name  \
0        ...   -52.8167       -1.0      -1.0                    CC   
1        ...   -52.8167       -1.0      -1.0                    CC   
2        ...   -52.8167       -1.0      -1.0                    CC   
3        ...   -52.8167       -1.0      -1.0                    CC   
4        ...   -52.8167       -1.0      -1.0                    CC   
...      ...        ...        ...       ...       ...         ...   
7916521  ...    69.2725       -1.0      -1.0                         
7916522  ...    69.2725       -1.0      -1.0                         
7916523  ...    69.2725       -1.0      -1.0                         
7916524  ...    69.2725       -1.0      -1.0                         
7916525  ...    69.2725       -1.0      -1.0                         

         epa_region          time_local     siteid  PM2_5_DRY  o3  
0                CA 2019-08-31 20:00:00  000010102        NaN NaN  
1                CA 2019-08-31 20:15:00  000010102        NaN NaN  
2                CA 2019-08-31 20:30:00  000010102        NaN NaN  
3                CA 2019-08-31 21:00:00  000010102        NaN NaN  
4                CA 2019-08-31 21:15:00  000010102        NaN NaN  
...             ...                 ...        ...        ...  ..  
7916521        DSUZ 2019-09-30 04:15:00  UZB010001        NaN NaN  
7916522        DSUZ 2019-09-30 04:30:00  UZB010001        NaN NaN  
7916523        DSUZ 2019-09-30 05:00:00  UZB010001        NaN NaN  
7916524        DSUZ 2019-09-30 05:15:00  UZB010001        NaN NaN  
7916525        DSUZ 2019-09-30 05:30:00  UZB010001        NaN NaN  

[7916526 rows x 36 columns]
After pairing:                         time  BARPR   BC   CO   NO  NO2  NO2Y  NOX  NOY  OZONE  \
0       2019-09-01 00:00:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0   25.0   
1       2019-09-01 00:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
2       2019-09-01 00:30:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
3       2019-09-01 01:00:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0   24.0   
4       2019-09-01 01:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
...                     ...    ...  ...  ...  ...  ...   ...  ...  ...    ...   
7916521 2019-09-29 23:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
7916522 2019-09-29 23:30:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
7916523 2019-09-30 00:00:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    8.0   
7916524 2019-09-30 00:15:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   
7916525 2019-09-30 00:30:00   -1.0 -1.0 -1.0 -1.0 -1.0  -1.0 -1.0 -1.0    NaN   

         ...  longitude  cmsa_name  msa_code  msa_name  state_name  \
0        ...   -52.8167       -1.0      -1.0                    CC   
1        ...   -52.8167       -1.0      -1.0                    CC   
2        ...   -52.8167       -1.0      -1.0                    CC   
3        ...   -52.8167       -1.0      -1.0                    CC   
4        ...   -52.8167       -1.0      -1.0                    CC   
...      ...        ...        ...       ...       ...         ...   
7916521  ...    69.2725       -1.0      -1.0                         
7916522  ...    69.2725       -1.0      -1.0                         
7916523  ...    69.2725       -1.0      -1.0                         
7916524  ...    69.2725       -1.0      -1.0                         
7916525  ...    69.2725       -1.0      -1.0                         

         epa_region          time_local     siteid  PM2_5_DRY  o3  
0                CA 2019-08-31 20:00:00  000010102        NaN NaN  
1                CA 2019-08-31 20:15:00  000010102        NaN NaN  
2                CA 2019-08-31 20:30:00  000010102        NaN NaN  
3                CA 2019-08-31 21:00:00  000010102        NaN NaN  
4                CA 2019-08-31 21:15:00  000010102        NaN NaN  
...             ...                 ...        ...        ...  ..  
7916521        DSUZ 2019-09-30 04:15:00  UZB010001        NaN NaN  
7916522        DSUZ 2019-09-30 04:30:00  UZB010001        NaN NaN  
7916523        DSUZ 2019-09-30 05:00:00  UZB010001        NaN NaN  
7916524        DSUZ 2019-09-30 05:15:00  UZB010001        NaN NaN  
7916525        DSUZ 2019-09-30 05:30:00  UZB010001        NaN NaN  

[7916526 rows x 36 columns]

Save data using control file

Calling an.save_analysis() saves the data according to the settings in the control file.

In the analysis section of the control file, setting method to 'netcdf' for a given attribute of the analysis class (e.g., paired, models, obs) writes netCDF4 files to the output directory. When saving paired data, for example, a separate file is written for each model/obs pairing. The filenames take the format <prefix>_<label>.nc4; for a paired class, the label may be, for example, 'airnow_RACM_ESRL' or 'airnow_RACM_ESRL_VCP'.
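The naming rule above can be sketched in a few lines of Python. This is a minimal illustration, not the MELODIES-MONET implementation: the helper name netcdf_output_paths is hypothetical, and it assumes only the <prefix>_<label>.nc4 pattern and the output directory described here.

```python
def netcdf_output_paths(output_dir, prefix, labels):
    """Return the expected netCDF path for each paired label.

    Hypothetical helper: it mirrors the <prefix>_<label>.nc4 pattern
    described in the text and does not call MELODIES-MONET.
    """
    base = output_dir.rstrip("/")
    return [f"{base}/{prefix}_{label}.nc4" for label in labels]


# Prefix, directory, and labels taken from this example.
paths = netcdf_output_paths(
    "./output/save_and_read",
    "0905",
    ["airnow_RACM_ESRL", "airnow_RACM_ESRL_VCP"],
)
for path in paths:
    print(path)
```

With the prefix and labels used in this example, the two printed paths have the same form as the "Writing:" lines reported by an.save_analysis().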

In the analysis section of the control file, setting method to 'pkl' for a given attribute of the analysis class (e.g., paired, models, obs) writes pickle files to the output directory. Unlike the netCDF output, all pairs are saved in the same pickle file. The output filename is set with 'output_name' in the control file.

Note

Be careful when saving pickle files for later analysis or when the files will be used by multiple users. A change to the structure of xarray objects between saving and reading the file (for example, if the xarray version differs) can break reading saved pickle files with MELODIES-MONET.
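One way to make such mismatches easier to spot is to record the xarray version next to the pickle file when it is written, and to compare it before reading. The sketch below uses only the Python standard library. The helper names and the .versions.json sidecar file are assumptions made for illustration and are not part of MELODIES-MONET.

```python
import json
from importlib import metadata


def installed_versions(packages=("xarray",)):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def write_version_sidecar(pkl_path):
    """Write <pkl_path>.versions.json describing the current environment."""
    sidecar = pkl_path + ".versions.json"
    with open(sidecar, "w") as f:
        json.dump(installed_versions(), f, indent=2)
    return sidecar


def version_mismatches(pkl_path):
    """Return {package: (saved, current)} for versions that changed."""
    with open(pkl_path + ".versions.json") as f:
        saved = json.load(f)
    current = installed_versions(tuple(saved))
    return {
        name: (saved[name], current[name])
        for name in saved
        if saved[name] != current[name]
    }
```

Calling write_version_sidecar() with the pickle path right after saving, and version_mismatches() with the same path before reading, returns an empty dict when the environment is unchanged. The sidecar is a separate file, so the pickle itself stays exactly as MELODIES-MONET wrote it.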

an.save_analysis()
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL.nc4
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL_VCP.nc4

Save data without using control file

Alternatively, the same can be achieved by calling write_analysis_ncf() or write_pkl() directly. The object to save must be an attribute of the instance of the analysis class (e.g., an.paired, an.models, an.obs).

# For netCDF files
from melodies_monet.util.write_util import write_analysis_ncf
write_analysis_ncf(obj=an.paired, output_dir='./output/save_and_read',
                   fn_prefix='0905')
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL.nc4
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL_VCP.nc4
# For pickle files 
from melodies_monet.util.write_util import write_pkl
write_pkl(obj=an.paired, output_name='./output/save_and_read/0905.pkl')
Writing: ./output/save_and_read/0905.pkl