Saving Paired Data
First, let’s import the driver.
from melodies_monet import driver
Please install h5py to open files from the Amazon S3 servers.
Please install h5netcdf to open files from the Amazon S3 servers.
Read the model and obs data and pair them
an = driver.analysis()
an.control = "control_wrfchem_saveandread.yaml"
an.read_control()
an.control_dict
an.open_models()
an.open_obs()
an.pair_data()
example:wrfchem:racm_esrl
**** Reading WRF-Chem model output...
example:wrfchem:racm_esrl_vcp
**** Reading WRF-Chem model output...
After pairing: time BARPR BC CO NO NO2 NO2Y NOX NOY OZONE \
0 2019-09-01 00:00:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 25.0
1 2019-09-01 00:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
2 2019-09-01 00:30:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
3 2019-09-01 01:00:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 24.0
4 2019-09-01 01:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
... ... ... ... ... ... ... ... ... ... ...
7916521 2019-09-29 23:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
7916522 2019-09-29 23:30:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
7916523 2019-09-30 00:00:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 8.0
7916524 2019-09-30 00:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
7916525 2019-09-30 00:30:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
... longitude cmsa_name msa_code msa_name state_name \
0 ... -52.8167 -1.0 -1.0 CC
1 ... -52.8167 -1.0 -1.0 CC
2 ... -52.8167 -1.0 -1.0 CC
3 ... -52.8167 -1.0 -1.0 CC
4 ... -52.8167 -1.0 -1.0 CC
... ... ... ... ... ... ...
7916521 ... 69.2725 -1.0 -1.0
7916522 ... 69.2725 -1.0 -1.0
7916523 ... 69.2725 -1.0 -1.0
7916524 ... 69.2725 -1.0 -1.0
7916525 ... 69.2725 -1.0 -1.0
epa_region time_local siteid PM2_5_DRY o3
0 CA 2019-08-31 20:00:00 000010102 NaN NaN
1 CA 2019-08-31 20:15:00 000010102 NaN NaN
2 CA 2019-08-31 20:30:00 000010102 NaN NaN
3 CA 2019-08-31 21:00:00 000010102 NaN NaN
4 CA 2019-08-31 21:15:00 000010102 NaN NaN
... ... ... ... ... ..
7916521 DSUZ 2019-09-30 04:15:00 UZB010001 NaN NaN
7916522 DSUZ 2019-09-30 04:30:00 UZB010001 NaN NaN
7916523 DSUZ 2019-09-30 05:00:00 UZB010001 NaN NaN
7916524 DSUZ 2019-09-30 05:15:00 UZB010001 NaN NaN
7916525 DSUZ 2019-09-30 05:30:00 UZB010001 NaN NaN
[7916526 rows x 36 columns]
After pairing: time BARPR BC CO NO NO2 NO2Y NOX NOY OZONE \
0 2019-09-01 00:00:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 25.0
1 2019-09-01 00:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
2 2019-09-01 00:30:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
3 2019-09-01 01:00:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 24.0
4 2019-09-01 01:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
... ... ... ... ... ... ... ... ... ... ...
7916521 2019-09-29 23:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
7916522 2019-09-29 23:30:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
7916523 2019-09-30 00:00:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 8.0
7916524 2019-09-30 00:15:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
7916525 2019-09-30 00:30:00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 NaN
... longitude cmsa_name msa_code msa_name state_name \
0 ... -52.8167 -1.0 -1.0 CC
1 ... -52.8167 -1.0 -1.0 CC
2 ... -52.8167 -1.0 -1.0 CC
3 ... -52.8167 -1.0 -1.0 CC
4 ... -52.8167 -1.0 -1.0 CC
... ... ... ... ... ... ...
7916521 ... 69.2725 -1.0 -1.0
7916522 ... 69.2725 -1.0 -1.0
7916523 ... 69.2725 -1.0 -1.0
7916524 ... 69.2725 -1.0 -1.0
7916525 ... 69.2725 -1.0 -1.0
epa_region time_local siteid PM2_5_DRY o3
0 CA 2019-08-31 20:00:00 000010102 NaN NaN
1 CA 2019-08-31 20:15:00 000010102 NaN NaN
2 CA 2019-08-31 20:30:00 000010102 NaN NaN
3 CA 2019-08-31 21:00:00 000010102 NaN NaN
4 CA 2019-08-31 21:15:00 000010102 NaN NaN
... ... ... ... ... ..
7916521 DSUZ 2019-09-30 04:15:00 UZB010001 NaN NaN
7916522 DSUZ 2019-09-30 04:30:00 UZB010001 NaN NaN
7916523 DSUZ 2019-09-30 05:00:00 UZB010001 NaN NaN
7916524 DSUZ 2019-09-30 05:15:00 UZB010001 NaN NaN
7916525 DSUZ 2019-09-30 05:30:00 UZB010001 NaN NaN
[7916526 rows x 36 columns]
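In the pairing output above, many columns (for example BARPR and CO) hold -1.0, the same value that the control file below lists as nan_value for OZONE and PM2.5. The obs section of that file also notes that the obs_min, obs_max and nan_value masking is applied first and the unit conversion (unit_scale with unit_scale_method) afterwards. The following minimal sketch illustrates that order on a plain Python list. preprocess_obs is a hypothetical helper written for this page, not the MELODIES-MONET implementation.

```python
import math
import operator

# Hypothetical helper (not part of MELODIES-MONET). The four operators mirror the
# unit_scale_method options listed in the control file: '*', '+', '-', '/'.
_OPS = {"*": operator.mul, "+": operator.add, "-": operator.sub, "/": operator.truediv}


def preprocess_obs(values, nan_value=None, obs_min=None, obs_max=None,
                   unit_scale=1, unit_scale_method="*"):
    """Mask sentinel and out-of-range values as NaN, then apply the unit scaling."""
    op = _OPS[unit_scale_method]
    out = []
    for v in values:
        # Step 1: masking is done on the raw values, before any unit conversion.
        if nan_value is not None and v == nan_value:
            v = math.nan
        elif obs_min is not None and v < obs_min:
            v = math.nan
        elif obs_max is not None and v > obs_max:
            v = math.nan
        # Step 2: unit conversion; NaN stays NaN under all four operations.
        out.append(op(v, unit_scale))
    return out


# OZONE-like settings from the control file: nan_value -1.0, unit_scale 1, method '*'.
print(preprocess_obs([25.0, -1.0, 24.0], nan_value=-1.0))
# PM2.5-like settings with the commented-out obs_max: 100 enabled.
print(preprocess_obs([5.0, 150.0, -1.0], nan_value=-1.0, obs_max=100))
```

Because the limits are checked before scaling, obs_min and obs_max must be given in the units of the raw observation file, not in the converted units.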
Save data using the control file
Note: This is the complete file that was loaded.
# General Description:
# - Any key that is specific for a plot type will begin with `ts` for timeseries, `ty` for taylor.
# - Some keys/groups are optional.
# - For now, all plots except time series average over the analysis window.
# - Setting axis values
#   - If set_axis = True in data_proc section of each plot_grp,
#     the yaxis for the plot will be set based on the values
#     specified in the obs section for each variable.
#   - If set_axis is set to False, then defaults will be used.
#   - 'vmin_plot' and 'vmax_plot' are needed for
#     'timeseries', 'spatial_overlay', and 'boxplot'.
#   - 'vdiff_plot' is needed for 'spatial_bias' plots
#   - 'ty_scale' is needed for 'taylor' plots.
#   - 'nlevels' or the number of levels used in the contour plot can also optionally be provided for spatial_overlay plot.
#   - If set_axis = True and the proper limits are not provided in the obs section,
#     a warning will print, and the plot will be created using the default limits.
analysis:
  start_time: "2019-09-05-06:00:00" # UTC
  end_time: "2019-09-06-06:00:00" # UTC
  output_dir: ./output/save_and_read # relative to the program using this control file
  # Currently, the directory must exist or plot saving will error and fail.
  output_dir_save: ./output/save_and_read #Opt Directory to use for melodies-monet data from 'save' below.
  # If not specified, saved melodies-monet data stored in output_dir.
  output_dir_read: ./output/save_and_read #Opt Directory to use for melodies-monet data from 'read' below.
  # If not specified, reads melodies-monet data from output_dir.
  # To not assume any directory for reading (use paths specified under 'read' directly) set output_dir_read: null
  debug: True
  save:
    paired:
      method: 'netcdf' # 'netcdf' or 'pkl'
      prefix: '0905' # use only with method=netcdf
      # output_name: '0905.pkl' # use only with method=pkl
      data: 'all' # 'all' to save out all pairs or ['pair1','pair2',...] to save out specific pairs. With method='pkl' this is ignored and always saves all.
    # models:
    # obs:
  read:
    paired:
      method: 'netcdf' # 'netcdf' or 'pkl'
      filenames: {'airnow_RACM_ESRL':['0905_airnow_RACM_ESRL.nc4'],
                  'airnow_RACM_ESRL_VCP':['0905_airnow_RACM_ESRL_VCP.nc4']} # example for netcdf method. Uses dict of form {group1: str or iterable of filenames, group2:...}. Any wildcards will be expanded
      # filenames: ['0904.pkl','0905.pkl'] # example for pkl method, uses str or iterable of filenames
    # models:
    # obs:

model:
  RACM_ESRL: # model label
    files: example:wrfchem:racm_esrl
    mod_type: "wrfchem"
    mod_kwargs:
      mech: "racm_esrl_vcp"
      surf_only_nc: True # specify that we have only one vertical level; WRF-Chem specific
    radius_of_influence: 12000 # meters
    mapping: # of _model_ species name to _obs_ species name
      airnow: # specifically for the obs labeled 'airnow'
        PM2_5_DRY: "PM2.5"
        o3: "OZONE"
    projection: ~
    plot_kwargs: # optional
      color: "magenta"
      marker: "s"
      linestyle: "-"
  RACM_ESRL_VCP:
    files: example:wrfchem:racm_esrl_vcp
    mod_type: "wrfchem"
    mod_kwargs:
      mech: "racm_esrl_vcp"
      surf_only_nc: True
    radius_of_influence: 12000
    mapping:
      airnow:
        PM2_5_DRY: "PM2.5"
        o3: "OZONE"
    projection: ~
    plot_kwargs:
      color: "gold"
      marker: "o"
      linestyle: "-"

obs:
  airnow: # obs label
    use_airnow: True
    filename: example:airnow:2019-09
    obs_type: pt_sfc
    variables: # optional
      OZONE:
        unit_scale: 1
        # ^ optional; Scaling factor
        unit_scale_method: "*"
        # ^ optional; Multiply = '*' , Add = '+', subtract = '-', divide = '/'
        nan_value: -1.0
        # ^ optional; When loading data, set this value to NaN
        ylabel_plot: "Ozone (ppbv)"
        # optional; set ylabel in order to include units and/or other info
        vmin_plot: 15.0
        # ^ optional; Min for y-axis during plotting.
        # To apply to a plot, change restrict_yaxis = True.
        vmax_plot: 55.0
        # ^ optional; Max for y-axis during plotting.
        # To apply to a plot, change restrict_yaxis = True.
        vdiff_plot: 20.0
        # ^ optional; +/- range to use in bias plots.
        # To apply to a plot, change restrict_yaxis = True.
        nlevels_plot: 21
        # ^ optional; number of levels used in colorbar for contourf plot.
      PM2.5:
        unit_scale: 1
        unit_scale_method: "*"
        # obs_min: 0
        # ^ optional; set all values less than this value to NaN
        # obs_max: 100
        # ^ optional; set all values greater than this value to NaN
        nan_value: -1.0
        # Note: The obs_min, obs_max, and nan_values are set to NaN first
        # and then the unit conversion is applied.
        ylabel_plot: "PM2.5 (ug/m3)"
        ty_scale: 2.0 # optional; `ty_` indicates for Taylor diagram plot
        vmin_plot: 0.0
        vmax_plot: 22.0
        vdiff_plot: 15.0
        nlevels_plot: 23

plots:
  plot_grp1:
    type: "timeseries" # plot type
    fig_kwargs: # optional; to define figure options
      figsize: [12, 6] # figure size (width, height) in inches
    default_plot_kwargs:
      # ^ optional; Define defaults for all plots.
      # Important: Model kwargs overwrite these.
      linewidth: 2.0
      markersize: 10.
    text_kwargs: # optional
      fontsize: 24.
    domain_type: ["all", "state_name", "epa_region"]
    # ^ List of domain types: 'all' or any domain in obs file.
    # (e.g., airnow: epa_region, state_name, siteid, etc.)
    domain_name: ["CONUS", "CA", "R9"]
    # ^ List of domain names. If domain_type = all,
    # the domain name is used in the plot title.
    data: ["airnow_RACM_ESRL", "airnow_RACM_ESRL_VCP"]
    # ^ make this a list of pairs in obs_model
    # where the obs is the obs label and model is the model_label
    data_proc: # optional??
      rem_obs_nan: True
      # ^ True: Remove all points where model or obs variable is NaN.
      # False: Remove only points where model variable is NaN.
      ts_select_time: "time_local" # `ts_` indicates this is time series plot-specific
      # ^ Time used for avg and plotting
      # Options: 'time' for UTC or 'time_local'
      ts_avg_window: "H"
      # ^ Options: None for no averaging, pandas resample rule (e.g., 'H', 'D')
      set_axis: True
      # ^ If true, add `vmin_plot` and `vmax_plot` for each variable in obs.

  plot_grp2:
    type: "taylor"
    fig_kwargs:
      figsize: [8, 8]
    default_plot_kwargs:
      linewidth: 2.0
      markersize: 10.
    text_kwargs:
      fontsize: 16.
    domain_type: ["all"]
    domain_name: ["CONUS"]
    data: ["airnow_RACM_ESRL", "airnow_RACM_ESRL_VCP"]
    data_proc:
      rem_obs_nan: True
      set_axis: True

  plot_grp3:
    type: "spatial_bias"
    fig_kwargs: # optional; For all spatial plots, specify map_kwargs here too.
      states: True # such as whether to show the state boundaries
      figsize: [10, 5]
    text_kwargs:
      fontsize: 16.
    domain_type: ["all",]
    domain_name: ["CONUS"]
    data: ["airnow_RACM_ESRL", "airnow_RACM_ESRL_VCP"]
    data_proc:
      rem_obs_nan: True
      set_axis: True

  plot_grp4:
    type: "spatial_overlay"
    fig_kwargs:
      states: True
      figsize: [10, 5]
    text_kwargs:
      fontsize: 16.
    domain_type: ["all", "epa_region"]
    domain_name: ["CONUS", "R9"]
    data: ["airnow_RACM_ESRL", "airnow_RACM_ESRL_VCP"]
    data_proc:
      rem_obs_nan: True
      set_axis: True

  plot_grp5:
    type: "boxplot"
    fig_kwargs:
      figsize: [8, 6]
    text_kwargs:
      fontsize: 20.
    domain_type: ["all"]
    domain_name: ["CONUS"]
    data: ["airnow_RACM_ESRL", "airnow_RACM_ESRL_VCP"]
    data_proc:
      rem_obs_nan: True
      set_axis: False

stats:
  # Stats require positive numbers, so if you want to calculate temperature use Kelvin!
  # Wind direction has special calculations for AirNow if obs name is 'WD'
  stat_list: ["MB", "MdnB", "R2", "RMSE"]
  # ^ List stats to calculate. Dictionary of definitions included
  # in submodule `plots/proc_stats`. Only stats listed below are currently working.
  # Full calc list:
  # ['STDO', 'STDP', 'MdnNB','MdnNE','NMdnGE',
  #  'NO', 'NOP', 'NP', 'MO', 'MP', 'MdnO', 'MdnP',
  #  'RM', 'RMdn', 'MB', 'MdnB', 'NMB', 'NMdnB', 'FB',
  #  'ME','MdnE','NME', 'NMdnE', 'FE', 'R2', 'RMSE','d1',
  #  'E1', 'IOA', 'AC']
  round_output: 2 # optional; defaults to rounding to 3rd decimal place
  output_table: False
  # ^ Always outputs a .txt file.
  # Optional to also output a Matplotlib figure table (image).
  output_table_kwargs: # optional
    figsize: [7, 3]
    fontsize: 12.
    xscale: 1.4
    yscale: 1.4
    edges: "horizontal"
  domain_type: ["all"]
  domain_name: ["CONUS"]
  data: ["airnow_RACM_ESRL", "airnow_RACM_ESRL_VCP"]
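The read block of the control file above maps each group to a filename or an iterable of filenames, says that wildcards are expanded, and resolves names against output_dir_read (falling back to output_dir when the key is absent, and using the paths as given when it is null). The following is a minimal sketch of that resolution logic under those assumptions. resolve_read_files is a hypothetical helper, not a MELODIES-MONET function, and raising FileNotFoundError when a pattern matches nothing is a choice made for this sketch.

```python
import glob
import os
import tempfile

_UNSET = object()  # distinguishes "key absent" from an explicit null (None)


def resolve_read_files(analysis, filenames):
    """Return {group: [paths]} for a read.paired.filenames-style mapping (hypothetical helper)."""
    read_dir = analysis.get("output_dir_read", _UNSET)
    if read_dir is _UNSET:
        # output_dir_read not specified: fall back to output_dir.
        read_dir = analysis.get("output_dir")
    resolved = {}
    for group, names in filenames.items():
        if isinstance(names, str):  # a single filename is allowed as well as a list
            names = [names]
        paths = []
        for name in names:
            # output_dir_read: null means the paths are used exactly as given.
            pattern = name if read_dir is None else os.path.join(read_dir, name)
            matches = sorted(glob.glob(pattern))  # expand any wildcards
            if not matches:
                raise FileNotFoundError(pattern)
            paths.extend(matches)
        resolved[group] = paths
    return resolved


# Usage: create two empty stand-in files and resolve a literal name and a wildcard.
with tempfile.TemporaryDirectory() as tmp:
    for fn in ("0905_airnow_RACM_ESRL.nc4", "0905_airnow_RACM_ESRL_VCP.nc4"):
        open(os.path.join(tmp, fn), "w").close()
    analysis = {"output_dir": "./output/save_and_read", "output_dir_read": tmp}
    files = resolve_read_files(analysis, {"airnow_RACM_ESRL": "0905_airnow_RACM_ESRL.nc4",
                                          "all_0905": ["0905_airnow_*.nc4"]})
    for group, paths in files.items():
        print(group, [os.path.basename(p) for p in paths])
```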
The driver saves the data according to the save settings in the analysis section of the control file when you call an.save_analysis().
In the analysis section of the control file, setting method to 'netcdf' for a given attribute of the analysis class (e.g., paired, models, obs) writes netCDF4 files to the output directory (output_dir_save if it is set, otherwise output_dir). When saving paired data, for example, a separate file is written for each model/obs pair. The filenames take the format <prefix>_<label>.nc4, where the label of a paired object is, for example, 'airnow_RACM_ESRL' or 'airnow_RACM_ESRL_VCP'.
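As an illustration of the naming rule, the following minimal sketch lists the files that a save with method='netcdf' would be expected to produce. netcdf_save_paths is a hypothetical helper, not MELODIES-MONET code. It assumes that pair labels are built as <obs label>_<model label> (as the plot_grp comments in the control file describe) and that the data key is either 'all' or a list of pair labels.

```python
import os


def netcdf_save_paths(pair_labels, prefix, output_dir, data="all"):
    """Return the expected <prefix>_<label>.nc4 path for each selected pair (hypothetical helper)."""
    # data='all' keeps every pair; otherwise keep only the labels listed in data.
    selected = list(pair_labels) if data == "all" else [p for p in pair_labels if p in data]
    return [os.path.join(output_dir, f"{prefix}_{label}.nc4") for label in selected]


# Pair labels combine the obs label and the model label.
labels = [f"airnow_{model}" for model in ("RACM_ESRL", "RACM_ESRL_VCP")]
for path in netcdf_save_paths(labels, "0905", "./output/save_and_read"):
    print(path)
```

On a POSIX system this prints the same two paths as the "Writing:" lines shown after an.save_analysis() below.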
Setting method to 'pkl' instead writes a pickle file to the output directory. Unlike the netCDF option, all pairs are saved in the same pickle file (so the data selection is ignored), and the output filename is set with output_name in the control file.
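The following minimal sketch illustrates the single-file behavior with Python's standard pickle module. save_all_pairs_pkl and load_pairs_pkl are hypothetical helpers, not the MELODIES-MONET write_pkl implementation, and plain dicts with dummy values stand in for the real paired objects.

```python
import os
import pickle
import tempfile


def save_all_pairs_pkl(paired, output_name):
    """Pickle the whole {label: pair} mapping into one file (hypothetical helper)."""
    with open(output_name, "wb") as f:
        pickle.dump(paired, f)


def load_pairs_pkl(output_name):
    """Load the mapping back; this only works if the pickled classes are still compatible."""
    with open(output_name, "rb") as f:
        return pickle.load(f)


# Dummy stand-ins for an.paired: one entry per obs/model pair.
paired = {"airnow_RACM_ESRL": {"o3": [1.0, 2.0]},
          "airnow_RACM_ESRL_VCP": {"o3": [3.0, 4.0]}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "0905.pkl")
    save_all_pairs_pkl(paired, path)
    print(os.listdir(tmp))              # a single file holds every pair
    print(sorted(load_pairs_pkl(path)))  # both pair labels come back
```

Because the whole mapping is one object, there is no per-pair file to select, which matches the control file comment that data is ignored for method='pkl'.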
Note
Be careful when saving pickle files for later analysis or when the files will be used by multiple users. A change to the structure of xarray objects between saving and reading the file (for example, if the xarray version differs) can prevent MELODIES-MONET from reading saved pickle files.
an.save_analysis()
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL.nc4
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL_VCP.nc4
Save data without using the control file
Alternatively, you can achieve the same result by calling write_analysis_ncf() or write_pkl() directly. The object to save must be an attribute of the analysis class instance (e.g., an.paired, an.models, an.obs).
# For netCDF files
from melodies_monet.util.write_util import write_analysis_ncf
write_analysis_ncf(obj=an.paired, output_dir='./output/save_and_read',
fn_prefix='0905')
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL.nc4
Writing: ./output/save_and_read/0905_airnow_RACM_ESRL_VCP.nc4
# For pickle files
from melodies_monet.util.write_util import write_pkl
write_pkl(obj=an.paired, output_name='./output/save_and_read/0905.pkl')
Writing: ./output/save_and_read/0905.pkl