wgrib2: -ens_processing
Introduction (wgrib2 v3.0.0)
Please don't use -ens_processing with
wgrib2 v2.0.8: you will get wrong results if the number of threads
is not one when using a newer gcc compiler. In addition,
the 75th percentile is inaccurate, the precipitation trace
probabilities (option = 1) are bad, and a seg fault is possible.
(An out-of-bounds
array value is multiplied by zero, so it doesn't affect the calculation,
but accessing it could still cause a seg fault.)
So you need to use wgrib2 v3.0.0 or later.
The -ens_processing option takes
the ensemble member data and creates a pre-defined set of ensemble statistics.
If you want to find the probability that a certain field has
values exceeding a limit, you should look at trick 65 in
the
wgrib2 tricks page.
The -ens_processing option is used
to create various ensemble statistics such as the
minimum, maximum, average, spread, and various percentiles in order to describe the PDF.
The values are calculated over all the ensemble members.
The calculations differ from those used by the operational
ensemble forecasts at NCEP, so fields like the various percentiles
and spread will have different values from the NCEP operational products.
The results from the -ens_processing option should not
be considered replacements for the official products because of the different
algorithms used. (The exception will be the future climate reanalysis produced by CPC/NCEP.)
The percentile values were chosen because the future CPC/NCEP climate reanalysis (CORe)
will be using 80 ensemble members.
The calculated variables are
1) ensemble mean, em = sum(x(i))/n, i=1..n
2) ensemble spread, RMSE = sqrt(sum((x(i)-em)**2)/n) note: n is used rather than n-1
3) minimum = minimum value over all ensemble members (for each grid point)
4) maximum = maximum value over all ensemble members (for each grid point)
5) 10th percentile
6) 25th percentile
7) 50th percentile (median)
8) 75th percentile
9) 90th percentile
Note: these calculations are only done at grid points where all the ensemble members have
defined values. This will affect parameters like the cloud-top temperature
when some of the ensemble members are cloud free and have no cloud-
top temperature. The -ens_qc option calculates the ensemble mean and spread
ignoring the undefined values.
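As an illustration of the definitions above (a minimal sketch, not the wgrib2 implementation), the per-grid-point mean, spread, minimum and maximum could be computed as follows; the function name and sample values are made up.

import math

def ensemble_stats(members):
    """Per-grid-point statistics: members holds the value from each
    ensemble member at one grid point. Returns (mean, spread, min, max)
    using the definitions above (the spread divides by n, not n-1)."""
    n = len(members)
    em = sum(members) / n                                        # ensemble mean
    spread = math.sqrt(sum((x - em) ** 2 for x in members) / n)  # ensemble spread
    return em, spread, min(members), max(members)

print(ensemble_stats([3.0, 1.0, 4.0, 1.5, 2.0]))   # 5-member example at one grid point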
There are a few common ways to compute a percentile; wgrib2 uses a method
recommended by NIST:
https://www.itl.nist.gov/div898/handbook/prc/section2/prc262.htm
percentile: sort the values x(1)..x(N), p = percentile/100, rank r = p*(N+1)
   x(1)    if p <= 1/(N+1)
   x(r)    if 1/(N+1) <= p <= N/(N+1)
   x(N)    if N/(N+1) <= p <= 1
for non-integral r, x(r) is interpolated:
   x(r) = x(floor(r))*(1 - (r - floor(r))) + x(floor(r)+1)*(r - floor(r))
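As a further sketch (again, not the actual wgrib2 code), the NIST percentile rule above could be written in Python as follows; nist_percentile and the sample data are invented for the example.

import math

def nist_percentile(values, percentile):
    """NIST method: rank r = p*(N+1), with linear interpolation
    between the two nearest values of the sorted sample."""
    x = sorted(values)              # x[0]..x[N-1] correspond to x(1)..x(N)
    n = len(x)
    p = percentile / 100.0
    r = p * (n + 1)
    if r <= 1:                      # p <= 1/(N+1): smallest value
        return x[0]
    if r >= n:                      # p >= N/(N+1): largest value
        return x[-1]
    k = math.floor(r)               # integer part of the rank
    frac = r - k                    # fractional part, used for interpolation
    return x[k - 1] * (1 - frac) + x[k] * frac

members = [3.0, 1.0, 4.0, 1.5, 2.0]
print(nist_percentile(members, 50))   # median: 2.0
print(nist_percentile(members, 90))   # r = 5.4 >= N, so the largest value: 4.0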
The probabilities are determined by counting: each probability is the fraction of
ensemble members that exceed the threshold.
As previously mentioned, the above calculations may differ from those used
in deriving the operational products.
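A minimal sketch of that counting approach (assuming a strict "greater than" comparison, which the documentation does not specify):

def exceedance_probability(members, threshold):
    # fraction of ensemble members whose value exceeds the threshold
    return sum(1 for x in members if x > threshold) / len(members)

print(exceedance_probability([0.0, 0.2, 1.5, 0.0, 3.0], 0.0))   # 0.6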
The -ens_processing option was developed for the
future CORe reanalysis. For this reanalysis, additional fields
are enabled by setting the second parameter to "1". However, users are warned
that these fields are expected to change. In general, generation of additional
fields can be enabled by setting the second parameter to special values.
10) probability of precipitation > 0 *
11) probability of more than a trace of precipitation (trace=TBD) *
12) probability of 2m temperature greater than 273.15K *
13) 95th percentile for WIND at 10m above ground *
* optional; the definition of a trace of precipitation is ad hoc:
trace can mean precipitation that will get the rain gauge wet but register as zero
trace can mean less than 0.01 inches
trace can mean less than 0.1 mm
for wgrib2, trace = accumulated precip < 0.xxx mm, or a rate < 0.xxx mm/day.
The -ens_processing option is unlike most wgrib2
options in that it can use large amounts of memory. Suppose
that you have an 80-member ensemble and are processing the TMP 500 mb field.
In order to calculate the percentiles, you need to keep
all 80 TMP 500 mb fields in memory. As the size of the grid and the number
of ensemble members increase, the required memory will increase.
Code Table 4.3, Type of Generating Process
GRIB contains metadata, and one piece is the "generating process".
Wgrib2 tries its best to describe how the fields were created. This
is documented in detail because it varies with the PDT and the center that
created the grib file.
Ensemble mean, ensemble spread, ensemble minimum, ensemble maximum:
Code Table 4.3 is preserved from the input file.
Percentiles, Probabilities:
NCEP: code table 4.3 = 199, Ensemble forecast based on counting
Other centers: code table 4.3 = 4, Ensemble forecast
The NCEP files are more descriptive because an entry in a local table was
specially created for wgrib2.
Definition of an Ensemble Member
Wgrib2 v3.0.0 defines an ensemble as fields having the same Product Definition Template (PDT=0, 1, 8, 11)
except for the perturbation number. The initial and forecast times can be different as long
as the verification time is the same. This allows lagged average forecast (LAF) ensembles.
With wgrib2 v3.0.0, the type of ensemble forecast (code table 4.6) can also be different.
This means that -ens_processing can now work with a
combination of perturbed and control runs.
Limiting the PDTs to 0, 1, 8 and 11 is unnecessarily restrictive. As a result,
-ens_processing does not work on aerosols, chemical
tracers and simulated satellite data from ensemble members.
Order of Fields
The -ens_processing option requires fields
to be in a specific order because the data is processed
sequentially. If
-ens_processing finds a field that is
unlike the previous fields (except for the ensemble member id), it
writes out the summary fields and starts processing a new variable.
The ensemble member number is ignored, and is not
even necessary.
bash-4.1$ wgrib2 all.grb | head -n 60
1:0:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+1
2:83628:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+2
3:164290:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+3
4:248531:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+4
..
19:1494083:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+19
20:1574514:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+20
21:1657406:d=2018020600:VIS:surface:84 hour fcst:ENS=+1
22:1703817:d=2018020600:VIS:surface:84 hour fcst:ENS=+2
..
39:2488331:d=2018020600:VIS:surface:84 hour fcst:ENS=+19
40:2533546:d=2018020600:VIS:surface:84 hour fcst:ENS=+20
41:2579586:d=2018020600:UGRD:planetary boundary layer:84 hour fcst:ENS=+1
42:2629955:d=2018020600:UGRD:planetary boundary layer:84 hour fcst:ENS=+2
..
Order of Fields: gmerge to the rescue
The -ens_processing option requires the
fields to be in a specific order (or to be processed in a specific order).
Suppose we have forecasts from 3 ensemble members in files fcst1, fcst2
and fcst3. Now we are going to require that the fields have the
same order in these files, and that the files have no submessages.
Then we can combine these files using gmerge, and the resulting file
will be in the correct order. (Gmerge has been included with the wgrib2
source code for many years.) There is a minor restriction: gmerge
doesn't handle grib files which include non-grib data.
gmerge output fcst1 fcst2 fcst3
The requirements for the output to be in the right order:
- each fcstX must have its fields in the same order
- number of forecasts <= 200 for the wgrib2 v2.0.8 distribution
- fcstX must not use submessages
- like forecasts must have the same PDT except for the ensemble number
- the ensemble number is optional
- no non-grib data in the grib files
You can find gmerge in the wgrib2 public distribution under the aux_progs directory.
If your initial files are not in identical order, you could combine the forecasts
and sort the fields so that like fields are adjacent. However, sorting the individual
forecasts and then running gmerge would probably be faster.
Now that the gmerge output file (output1) has the data in the correct order, the
-ens_processing option can be used to
create the min/max/ave/spread of the ensemble.
gmerge output1 fcst1 fcst2 fcst3
wgrib2 output1 -ens_processing output2 x
note: x is a dummy argument; it may be used in the future
For faster processing, we can replace the intermediate disk file (output1 above) with a pipe.
This method may or may not work under Windows.
gmerge - fcst1 fcst2 fcst3 | wgrib2 - -ens_processing output x
gmerge - A B C writes the output to stdout (denoted by the -)
wgrib2 - (..) reads the input grib data from stdin (denoted by the -)
NOTE 1: you must use a recent version of gmerge. Old versions of gmerge
only accepted a small number of input files and did not allow
writing the output to a pipe with "-".
NOTE 2: a common unix convention is that "-" in place of a file name
represents either stdin or stdout. By knowing the program arguments,
one can determine whether the dash is stdin or stdout. Wgrib2 uses
this convention in most places that require a file name.
Fast Processing at the expense of Large Memory
Using gmerge and sending the output to wgrib2 is simple. However, it
can be I/O inefficient, especially for large ensembles. Consider a
small system that has the data for 80 ensemble members on one drive. For the 1st
field, the system has to read field-1 of file-1. For the second field,
the system has to read field-1 from file-2. For the 81st field, the
system has to read field-2 from file-1. Hopefully the disk cache was
big enough that field-2 of file-1 was still cached. The speed of
processing depends on the size of the disk cache. This example is for
a small system; what would happen on an HPC system? The disk cache is
much larger, but so is the block size. If one job takes a large fraction
of the disk cache, the other jobs will be much more disk inefficient.
A better approach is to adopt the technique used in
fast averaging. You may be
limited by the amount of physical memory on the system.
cat fcst.2018122600.mem* | wgrib2 - -set_grib_type c3 \
-if (xxx1) -ens_processing out 0 -fi \
-if (xxx2) -ens_processing out 0 -fi \
...
-if (xxxN) -ens_processing out 0 -fi
This approach only has two open files at any one time, and
the files are read and written sequentially. This approach is
harder to program but is much more I/O friendly for both workstations
and HPCs. The drawback is that this approach needs to keep all
the fields in memory. (#ensemble members x #fields/file x NX x NY x 4 bytes).
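For example, assuming a hypothetical 0.25-degree global grid (1440 x 721 points), 100 fields per file,
and 80 ensemble members, that works out to roughly 80 x 100 x 1440 x 721 x 4 bytes, or about 33 GB;
the grid size and field count here are only illustrative.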
Usage
-ens_processing FILE Option
FILE = output file, grib2 format
Option = 0 default
1 include probabilities (TMP2m, precip)
note: option 1 is intended for use by the future
Conventional Observation REanalysis
(CORe). The output will be determined by the needs of this
reanalysis.
2 for future use or different output.
If you would like to add more output from -ens_processing, it
needs to be enabled by an option number. Ask me (WNE) for
a number.
Example
$ wgrib2 input -ens_processing output 0
1:0:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+1
2:83628:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+2
3:164290:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+3
4:248531:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+4
5:331723:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+5
6:412226:d=2018020600:PRES:mean sea level:84 hour fcst:ENS=+6
..
$ wgrib2 output
1:0:d=2018020600:PRES:mean sea level:84 hour fcst:min all members
2:130501:d=2018020600:PRES:mean sea level:84 hour fcst:max all members
3:261002:d=2018020600:PRES:mean sea level:84 hour fcst:ens mean
4:391503:d=2018020600:PRES:mean sea level:84 hour fcst:ens spread
5:497569:d=2018020600:PRES:mean sea level:84 hour fcst:25%-75% range
6:611780:d=2018020600:PRES:mean sea level:84 hour fcst:10% all members
7:742281:d=2018020600:PRES:mean sea level:84 hour fcst:90% all members
8:872782:d=2018020600:VIS:surface:84 hour fcst:min all members
9:938123:d=2018020600:VIS:surface:84 hour fcst:max all members
...
GrADS
At the time of writing (1/2018), the files that
are produced by
-ens_processing cannot be displayed
using the g2ctl/gribmap/GrADS set of programs. However, they
can be displayed using the alt_g2ctl/alt_gmp/GrADS set of programs.
alt_g2ctl -short output >output.ctl
alt_gmp output.ctl
grads
See also:
-ens_qc