gmerge
Making Ensemble, Analysis and Forecast Means
Introduction
There are two fast techniques for making averages with wgrib2: "gmerge" and
"fast averaging".
gmerge: inputs A, B and C are grib analyses, 3 hours apart, in chronological
order and with identical field order
gmerge - A B C | wgrib2 - -ave 3hr avg.grb
avg.grb has the averages of all the fields in A
(the process is explained later)
fast averaging: inputs A, B and C are grib analyses, 3 hours apart and in chronological order
cat A B C | wgrib2 - -if "FIELD1" -ave 3hr avg.grb -endif \
-if "FIELD2" -ave 3hr avg.grb -endif \
...
-if "FIELDN" -ave 3hr avg.grb -endif
avg.grb has the averages of FIELD1, ..., FIELDN
The "fast averaging" technique was developed
first and requires all the averages to be done at one time.
The "gmerge" technique was developed for creating ensemble statistics
but also applies to making averages of forecasts and analyses.
The "gmerge" technique is easier to program and uses less memory, but
requires the fields in the grib files to be in a fixed order. The
"fast averaging" technique requires the script to generate a command line
that then gets executed. This page describes the gmerge technique
and the use of -merge_fcst to convert sub-daily
TMAX fields into daily TMAX fields.
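Because the "fast averaging" command line has one -if/-ave/-endif group per field, it is usually generated by a script. A minimal sketch of such a generator, assuming the field patterns are passed as arguments; fast_ave_args is a hypothetical helper, not part of wgrib2:

```shell
# fast_ave_args DT OUT PATTERN...
# Hypothetical helper (not part of wgrib2): print the "fast averaging"
# arguments, one per line, with one -if/-ave/-endif group per pattern.
fast_ave_args() {
  dt=$1     # spacing of the input fields, e.g. 3hr
  out=$2    # output grib file
  shift 2
  for field in "$@"; do
    printf '%s\n' -if "$field" -ave "$dt" "$out" -endif
  done
}

fast_ave_args 3hr avg.grb ':TMP:2 m above ground:' ':HGT:500 mb:'
```

In bash, the arguments can be read into an array so that patterns containing spaces stay intact: mapfile -t args < <(fast_ave_args 3hr avg.grb ':TMP:2 m above ground:') followed by cat A B C | wgrib2 - "${args[@]}".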
Gmerge is a program included with wgrib2
in the aux_progs directory.
Gmerge combines grib files by reading one message from each
file in turn (round-robin order). Assuming that the input
files have the same fixed field order, the output file will have the
order that lets wgrib2 create
ensemble, analysis and forecast means efficiently.
A grib file consists of messages. Suppose each input file has two messages.
input1: (TMP-1) (HGT-1)
input2: (TMP-2) (HGT-2)
input3: (TMP-3) (HGT-3)
Suppose that you combine input1, input2 and input3 with the "cat" command.
$ cat input1 input2 input3 >out1
out1: (TMP-1) (HGT-1) (TMP-2) (HGT-2) (TMP-3) (HGT-3)
You can use gmerge to get the "round-robin" order:
$ gmerge out2 input1 input2 input3
out2: (TMP-1) (TMP-2) (TMP-3) (HGT-1) (HGT-2) (HGT-3)
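The two orders can be reproduced with text files in which each line stands in for one grib message: paste -d '\n' takes the first line of every file, then the second line of every file, and so on. This is only an analogy, assuming every input has the same number of messages; gmerge itself reads binary grib messages, not lines, and round_robin is a hypothetical helper.

```shell
# Text analogy of gmerge: each line stands in for one grib message.
# round_robin is a hypothetical helper, not part of wgrib2.
round_robin() {
  # first line of every file, then second line of every file, ...
  paste -d '\n' "$@"
}

tmp=$(mktemp -d)
printf 'TMP-1\nHGT-1\n' > "$tmp/input1"
printf 'TMP-2\nHGT-2\n' > "$tmp/input2"
printf 'TMP-3\nHGT-3\n' > "$tmp/input3"

echo "cat order:"
cat "$tmp/input1" "$tmp/input2" "$tmp/input3"
echo "round-robin order:"
round_robin "$tmp/input1" "$tmp/input2" "$tmp/input3"
rm -rf "$tmp"
```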
Suppose you want to calculate the average TMP and HGT.
With out2, you can read the data sequentially. With out1, you
can create an index file and then read the TMP and HGT fields
by random access. Alternatively, you could adopt a solution
that keeps all the partial sums in memory, like "fast averaging".
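With the round-robin order, an average needs only one partial sum at a time. A sketch of that idea with awk, assuming each input line is "FIELD value" and all copies of a field are adjacent (as in out2); stream_mean is a hypothetical helper. Real grib fields are grids, so averaging them would need one partial sum per grid point rather than one number.

```shell
# stream_mean: hypothetical helper. Input lines are "FIELD value" in
# gmerge order (all copies of a field adjacent), so only one running
# sum is held at a time; the mean is printed when the field changes.
stream_mean() {
  awk '
    $1 != name {                       # a new field starts
      if (n > 0) print name, sum / n   # finish the previous field
      name = $1; sum = 0; n = 0
    }
    { sum += $2; n++ }
    END { if (n > 0) print name, sum / n }
  '
}

printf 'TMP 270\nTMP 272\nTMP 274\nHGT 5500\nHGT 5520\nHGT 5540\n' | stream_mean
```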
Usage
gmerge OUT FILE-1 FILE-2 ... FILE-N
OUT: -  write to stdout (a dash is the common convention for stdin/stdout)
OUT: (name) output grib file
FILE-i: grib2 file with no non-grib2 data
N is limited by the system to one less than the value given by
linux: $ ulimit -n
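A small guard can check a file list against this limit before calling gmerge. This is a sketch: check_gmerge_limit is a hypothetical helper that simply applies the "one less than ulimit -n" rule stated above.

```shell
# check_gmerge_limit FILE...: hypothetical helper, not part of wgrib2.
# Fails if there are more files than one less than "ulimit -n".
check_gmerge_limit() {
  limit=$(ulimit -n)
  [ "$limit" = unlimited ] && return 0
  max=$(( limit - 1 ))
  if [ "$#" -gt "$max" ]; then
    echo "too many input files: $# > $max" >&2
    return 1
  fi
  return 0
}
```

Usage: check_gmerge_limit core.t03z.flx.mem0?? && gmerge - core.t03z.flx.mem0?? | wgrib2 - -ens_processing flxstats.grb 1. If the check fails, the soft limit can usually be raised with ulimit -n, up to the hard limit.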
Using gmerge only makes sense when (1) the input files contain
no non-grib2 data, (2) all the input files have their
fields in the same order and (3) your processing program can process
the fields sequentially. This seems quite restrictive,
but the situation is common. A later
section shows techniques to prepare the data so that
it meets these restrictions.
Ensemble Members
Wgrib2 can be used to generate ensemble statistics for members
of the same ensemble. The fields need to be in the same order,
and the only difference in the metadata can be the ensemble
member number. CORe used gmerge and wgrib2 to generate its
statistics.
An example from CORe (80-member ensemble)
files: core.t03z.flx.mem001,..,core.t03z.flx.mem080
$ gmerge - core.t03z.flx.mem0?? | wgrib2 - -ens_processing flxstats.grb 1
1:0:d=2023111503:DLWRF:surface:anl:ENS=+1
2:195555:d=2023111503:DLWRF:surface:anl:ENS=+2
3:391194:d=2023111503:DLWRF:surface:anl:ENS=+3
4:587071:d=2023111503:DLWRF:surface:anl:ENS=+4
5:782846:d=2023111503:DLWRF:surface:anl:ENS=+5
6:978867:d=2023111503:DLWRF:surface:anl:ENS=+6
...
9197:1010069754:d=2023111503:CPRAT:surface:0-3 hour ave fcst:ENS=+77
9198:1010134971:d=2023111503:CPRAT:surface:0-3 hour ave fcst:ENS=+78
9199:1010200392:d=2023111503:CPRAT:surface:0-3 hour ave fcst:ENS=+79
9200:1010266285:d=2023111503:CPRAT:surface:0-3 hour ave fcst:ENS=+80
$
$ wgrib2 flxstats.grb | head
1:0:d=2023111503:DLWRF:surface:anl:min all members
2:262325:d=2023111503:DLWRF:surface:anl:max all members
3:524650:d=2023111503:DLWRF:surface:anl:ens mean
4:786975:d=2023111503:DLWRF:surface:anl:ens spread
...
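What these statistics mean can be sketched for a single grid point with awk. Assumptions: each input line is "FIELD value" with the members of a field adjacent (the order gmerge produces), and "spread" is taken as the population standard deviation; wgrib2's exact definition of ens spread should be checked in its -ens_processing documentation. ens_stats is a hypothetical helper.

```shell
# ens_stats: hypothetical helper. Reads "FIELD value" lines in which the
# members of each field are adjacent, and prints min, max, mean and
# spread per field. Spread = population standard deviation (assumption).
ens_stats() {
  awk '
    function flush() {
      if (n == 0) return
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0            # guard against rounding error
      printf "%s min=%g max=%g mean=%g spread=%g\n", name, lo, hi, mean, sqrt(var)
    }
    $1 != name { flush(); name = $1; n = 0; sum = 0; sumsq = 0 }
    {
      v = $2 + 0
      if (n == 0 || v < lo) lo = v
      if (n == 0 || v > hi) hi = v
      sum += v; sumsq += v * v; n++
    }
    END { flush() }
  '
}

# Made-up values for three members of two fields.
printf 'DLWRF 310\nDLWRF 314\nDLWRF 312\nULWRF 390\nULWRF 394\nULWRF 392\n' | ens_stats
```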
Analyses
Wgrib2 can be used to combine analyses into daily or monthly mean analyses.
$ gmerge - a.* | wgrib2 - -ave DT OUT.grb
Note: a.* must expand to a list of files in chronological order, and DT is the
time between successive files (e.g. 3hr for 3-hourly analyses).
To make a daily mean, you have to restrict a.* to the files for that day.
To make a monthly mean, you have to restrict a.* to the files for that month.
An example from CORe, making a daily mean:
$ gmerge - flx_19900111??_ensmean.grb | wgrib2 - -ave 3hr /tmp/ave.grb
1:0:d=1990011100:DLWRF:surface:anl:ens mean
2:168405:d=1990011103:DLWRF:surface:anl:ens mean
3:337533:d=1990011106:DLWRF:surface:anl:ens mean
4:506437:d=1990011109:DLWRF:surface:anl:ens mean
5:675601:d=1990011112:DLWRF:surface:anl:ens mean
6:844337:d=1990011115:DLWRF:surface:anl:ens mean
7:1013963:d=1990011118:DLWRF:surface:anl:ens mean
8:1182991:d=1990011121:DLWRF:surface:anl:ens mean
9:1352216:d=1990011100:ULWRF:surface:anl:ens mean
10:1514196:d=1990011103:ULWRF:surface:anl:ens mean
...
918:103736065:d=1990011115:CPRAT:surface:0-3 hour ave fcst:ens mean
919:103825356:d=1990011118:CPRAT:surface:0-3 hour ave fcst:ens mean
920:103913495:d=1990011121:CPRAT:surface:0-3 hour ave fcst:ens mean
$ wgrib2 /tmp/ave.grb
1:0:d=1990011100:DLWRF:surface:8@3 hour ave(anl),missing=0:ens mean
2:262349:d=1990011100:ULWRF:surface:8@3 hour ave(anl),missing=0:ens mean
3:524698:d=1990011100:DSWRF:surface:8@3 hour ave(anl),missing=0:ens mean
4:754279:d=1990011100:USWRF:surface:8@3 hour ave(anl),missing=0:ens mean
5:1000244:d=1990011100:UGRD:10 m above ground:8@3 hour ave(anl),missing=0:ens mean
6:1262593:d=1990011100:VGRD:10 m above ground:8@3 hour ave(anl),missing=0:ens mean
...
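The glob flx_19900111??_ensmean.grb works because the shell sorts the expanded names, and a YYYYMMDDHH time stamp sorts chronologically. An explicit list makes the expected 3-hourly files visible and lets you check for missing ones. A sketch, assuming the CORe file naming shown above; day_files is a hypothetical helper.

```shell
# day_files YYYYMMDD: hypothetical helper. Print the eight 3-hourly file
# names for one day in chronological order (assumes the naming
# flx_YYYYMMDDHH_ensmean.grb used in the example).
day_files() {
  for hh in 00 03 06 09 12 15 18 21; do
    printf 'flx_%s%s_ensmean.grb\n' "$1" "$hh"
  done
}

# Report any file of the day that is missing in the current directory.
for f in $(day_files 19900111); do
  [ -f "$f" ] || echo "missing: $f" >&2
done
```

Usage: gmerge - $(day_files 19900111) | wgrib2 - -ave 3hr /tmp/ave.grb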
Forecasts
Combining forecast files into daily, weekly, monthly or seasonal means is much more
involved than with the previous files. The first problem is that forecast hour 0
often has fewer grib messages than the following forecasts. This is common with NCEP
forecast files because accumulations and temporal averages are unavailable at t=0.
Another problem with NCEP forecast files is that accumulations, averages, maximums and
minimums do not have a simple structure. A simple structure would be one where all the
averages looked like "M-N hour ave fcst" with N=M+constant. However, in my sample GFS
forecast, the constant was either 3 or 6.
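The pattern in this sample can be written as a rule: the averaging interval for forecast hour f starts at the last multiple of 6 before f. A sketch that reproduces the four intervals below (and the 18-24 and 24-27 intervals in the later examples); the rule is inferred from this sample only, not from a GFS specification, and may not hold at longer leads. gfs_ave_interval is a hypothetical helper.

```shell
# gfs_ave_interval HOUR: hypothetical helper. Print the "M-N" interval
# the sample GFS files use for averages ending at forecast hour HOUR
# (rule inferred from the sample, not from a specification).
gfs_ave_interval() {
  f=${1#0}; f=${f#0}               # strip leading zeros: 009 -> 9 (else octal)
  start=$(( (f - 1) / 6 * 6 ))     # last multiple of 6 before f
  printf '%d-%d\n' "$start" "$f"
}

for h in 003 006 009 012; do
  gfs_ave_interval "$h"
done
```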
Example from NCEP's gfs system.
$ wgrib2 gfs.t00z.pgrb2.1p00.f003 -match "ULWRF:top of atmosphere:"
658:39419182:d=2025051900:ULWRF:top of atmosphere:0-3 hour ave fcst:
$ wgrib2 gfs.t00z.pgrb2.1p00.f006 -match "ULWRF:top of atmosphere:"
658:39698487:d=2025051900:ULWRF:top of atmosphere:0-6 hour ave fcst:
$ wgrib2 gfs.t00z.pgrb2.1p00.f009 -match "ULWRF:top of atmosphere:"
658:39438111:d=2025051900:ULWRF:top of atmosphere:6-9 hour ave fcst:
$ wgrib2 gfs.t00z.pgrb2.1p00.f012 -match "ULWRF:top of atmosphere:"
658:39643948:d=2025051900:ULWRF:top of atmosphere:6-12 hour ave fcst:
Finally, wgrib2 can average forecasts with different lead times from the same
initial conditions using the -fcst_ave option.
The accumulations/averages/maximums/minimums are treated more carefully
by the -merge_fcst option. This option takes
adjacent acc/ave/max/min intervals and combines them into one longer
acc/ave/max/min interval.
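The arithmetic of merging adjacent intervals depends on the statistic: a maximum is the larger of the maxima, a minimum the smaller of the minima, an accumulation the sum, and an average the length-weighted mean. A sketch for one grid point, assuming input lines "start end value"; merge_stat is a hypothetical helper that illustrates the arithmetic and is not wgrib2's implementation.

```shell
# merge_stat STAT: hypothetical helper. Read "start end value" lines for
# consecutive intervals at one grid point and print the merged interval.
# STAT is max, min, acc or ave. Fails if an interval does not start
# where the previous one ended.
merge_stat() {
  awk -v stat="$1" '
    NR == 1 { start = $1 }
    NR > 1 && $1 != prev_end {
      print "intervals are not adjacent" > "/dev/stderr"
      bad = 1
      exit
    }
    {
      if (stat == "max")      { if (NR == 1 || $3 > r) r = $3 }
      else if (stat == "min") { if (NR == 1 || $3 < r) r = $3 }
      else if (stat == "acc") { r += $3 }
      else if (stat == "ave") { r += $3 * ($2 - $1); len += $2 - $1 }
      prev_end = $2
    }
    END {
      if (bad || NR == 0) exit 1
      if (stat == "ave") r /= len    # length-weighted mean
      printf "%d-%d %s %g\n", start, prev_end, stat, r
    }
  '
}

printf '24 30 301.5\n30 36 296.0\n36 42 290.2\n42 48 299.8\n' | merge_stat max
```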
Step 1: averages of instantaneous forecasts
Example using GFS forecast files
list=`seq -f gfs.t00z.pgrb2.1p00.f%03.0f 24 3 45`
: makes a list of files to process, forecast hours 24, 27, ..., 45
: list=
: gfs.t00z.pgrb2.1p00.f024
: gfs.t00z.pgrb2.1p00.f027
: gfs.t00z.pgrb2.1p00.f030
: gfs.t00z.pgrb2.1p00.f033
: gfs.t00z.pgrb2.1p00.f036
: gfs.t00z.pgrb2.1p00.f039
: gfs.t00z.pgrb2.1p00.f042
: gfs.t00z.pgrb2.1p00.f045
gmerge - $list | wgrib2 - -not ' (ave|min|max|acc) ' -fcst_ave 3hr ave.grb
: gmerge - $list   read the files in $list in round-robin order and write to stdout
: wgrib2 -   read grib input from stdin
: -not ' (ave|min|max|acc) '   skip acc/ave/min/max fields, keeping only the "N hour fcst" fields
: -fcst_ave 3hr ave.grb   average the fields, which are spaced every 3 hours, and write ave.grb
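The -not and -match patterns are regular expressions applied to inventory lines like those wgrib2 prints. Their effect can be previewed with grep -E on a few inventory lines; the ULWRF and TMAX lines come from this page, while the TMP line is made up for illustration. This assumes grep -E treats the pattern the same way wgrib2 does, which should hold for this simple alternation.

```shell
# Inventory lines: ULWRF and TMAX from this page, TMP line hypothetical.
inv='658:39419182:d=2025051900:ULWRF:top of atmosphere:0-3 hour ave fcst:
1:0:d=2025051900:TMP:2 m above ground:24 hour fcst:
4681:285550939:d=2025051900:TMAX:2 m above ground:18-24 hour max fcst:'

echo "kept by -not (step 1):"
printf '%s\n' "$inv" | grep -E -v ' (ave|min|max|acc) '
echo "kept by -match (step 2):"
printf '%s\n' "$inv" | grep -E ' (ave|min|max|acc) '
```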
Step 2: Examining input acc/ave/min/max fields
Example using GFS forecast files
listflx=`seq -f gfs.t00z.pgrb2.1p00.f%03.0f 24 3 45`
gmerge - $listflx | wgrib2 - -match ' (ave|min|max|acc) ' | head
4681:285550939:d=2025051900:TMAX:2 m above ground:18-24 hour max fcst:
4682:285598150:d=2025051900:TMAX:2 m above ground:24-27 hour max fcst:
4683:285644800:d=2025051900:TMAX:2 m above ground:24-30 hour max fcst:
4684:285691467:d=2025051900:TMAX:2 m above ground:30-33 hour max fcst:
4685:285739040:d=2025051900:TMAX:2 m above ground:30-36 hour max fcst:
4686:285786354:d=2025051900:TMAX:2 m above ground:36-39 hour max fcst:
4687:285834479:d=2025051900:TMAX:2 m above ground:36-42 hour max fcst:
4688:285882184:d=2025051900:TMAX:2 m above ground:42-45 hour max fcst:
4689:285929688:d=2025051900:TMIN:2 m above ground:18-24 hour min fcst:
4690:285978003:d=2025051900:TMIN:2 m above ground:24-27 hour min fcst:
The first interval (18-24 hour) starts 6 hours before the first forecast hour, which is outside day 2, so we need to adjust $listflx
listflx=`seq -f gfs.t00z.pgrb2.1p00.f%03.0f 30 3 51`
gmerge - $listflx | wgrib2 - -match ' (ave|min|max|acc) ' | head
4681:285378854:d=2025051900:TMAX:2 m above ground:24-30 hour max fcst:
4682:285425521:d=2025051900:TMAX:2 m above ground:30-33 hour max fcst:
4683:285473094:d=2025051900:TMAX:2 m above ground:30-36 hour max fcst:
4684:285520408:d=2025051900:TMAX:2 m above ground:36-39 hour max fcst:
4685:285568533:d=2025051900:TMAX:2 m above ground:36-42 hour max fcst:
4686:285616238:d=2025051900:TMAX:2 m above ground:42-45 hour max fcst:
4687:285663742:d=2025051900:TMAX:2 m above ground:42-48 hour max fcst:
4688:285710641:d=2025051900:TMAX:2 m above ground:48-51 hour max fcst:
4689:285757042:d=2025051900:TMIN:2 m above ground:24-30 hour min fcst:
4690:285804816:d=2025051900:TMIN:2 m above ground:30-33 hour min fcst:
The intervals overlap (30-33 and 30-36 both start at hour 30), so they are not appropriate for -merge_fcst. Try again with files 6 hours apart:
listflx=`seq -f gfs.t00z.pgrb2.1p00.f%03.0f 30 6 51`
$ echo $listflx
gfs.t00z.pgrb2.1p00.f030 gfs.t00z.pgrb2.1p00.f036 gfs.t00z.pgrb2.1p00.f042 gfs.t00z.pgrb2.1p00.f048
$ gmerge - $listflx | wgrib2 - -match ' (ave|min|max|acc) ' | head
2341:142428715:d=2025051900:TMAX:2 m above ground:24-30 hour max fcst:
2342:142475382:d=2025051900:TMAX:2 m above ground:30-36 hour max fcst:
2343:142522696:d=2025051900:TMAX:2 m above ground:36-42 hour max fcst:
2344:142570401:d=2025051900:TMAX:2 m above ground:42-48 hour max fcst:
2345:142617300:d=2025051900:TMIN:2 m above ground:24-30 hour min fcst:
2346:142665074:d=2025051900:TMIN:2 m above ground:30-36 hour min fcst:
2347:142712663:d=2025051900:TMIN:2 m above ground:36-42 hour min fcst:
2348:142760444:d=2025051900:TMIN:2 m above ground:42-48 hour min fcst:
2373:144099440:d=2025051900:CPRAT:surface:24-30 hour ave fcst:
2374:144168768:d=2025051900:CPRAT:surface:30-36 hour ave fcst:
The intervals are now adjacent (each starts where the previous one ends), so they are good for -merge_fcst.
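Whether a set of intervals suits -merge_fcst can be checked before merging: each interval must start where the previous one ended. A sketch with awk that reads wgrib2 inventory lines for one variable and reports gaps or overlaps; check_adjacent is a hypothetical helper, and it assumes the time range is the sixth colon-separated field, as in the inventories above.

```shell
# check_adjacent: hypothetical helper. Read inventory lines for ONE
# variable (e.g. after grep ':TMAX:') and fail if an "M-N hour ..."
# interval does not start where the previous one ended.
check_adjacent() {
  awk -F: '
    {
      split($6, words, " ")       # "24-30 hour max fcst" -> "24-30", ...
      split(words[1], t, "-")     # "24-30" -> 24, 30
      if (NR > 1 && t[1] != prev) {
        printf "not adjacent: %s follows %s\n", words[1], last > "/dev/stderr"
        bad = 1
      }
      prev = t[2]; last = words[1]
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage: gmerge - $listflx | wgrib2 - -match ' (ave|min|max|acc) ' | grep ':TMAX:' | check_adjacent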
Step 3a: Merge the acc/ave/min/max fields
Now the TMAX intervals cover day 2 (forecast hours 24-48) and can be merged
listflx=`seq -f gfs.t00z.pgrb2.1p00.f%03.0f 30 6 51`
gmerge - $listflx | wgrib2 - -match ' (ave|min|max|acc) ' -merge_fcst 4 acc.grb
wgrib2 acc.grb | head
1:0:d=2025051900:TMAX:2 m above ground:1-2 day max fcst:
2:89798:d=2025051900:TMIN:2 m above ground:1-2 day min fcst:
3:179596:d=2025051900:CPRAT:surface:1-2 day ave fcst:
4:301974:d=2025051900:PRATE:surface:1-2 day ave fcst:
5:408062:d=2025051900:APCP:surface:1-2 day acc fcst:
6:506005:d=2025051900:ACPCP:surface:1-2 day acc fcst:
7:587658:d=2025051900:WATR:surface:1-2 day acc fcst:
8:640148:d=2025051900:CSNOW:surface:1-2 day ave fcst:
9:648496:d=2025051900:CICEP:surface:1-2 day ave fcst:
10:648699:d=2025051900:CFRZR:surface:1-2 day ave fcst:
Finally, combine ave.grb and acc.grb
cat ave.grb acc.grb >day2.grb
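Steps 1 and 3a for other days follow the same pattern. A sketch that generates both file lists for forecast day D (day 2 = hours 24-48), assuming the 3-hourly file names and 6-hour acc/ave/min/max intervals of this GFS sample continue at other leads; inst_list and flux_list are hypothetical helpers. For day 1, the instantaneous list includes f000, which may have fewer messages than later forecast hours.

```shell
# inst_list D: hypothetical helper. Files for the instantaneous fields of
# day D, forecast hours 24(D-1), 24(D-1)+3, ..., 24D-3 (Step 1 for D=2).
inst_list() {
  h=$(( 24 * ($1 - 1) )); last=$(( 24 * $1 - 3 ))
  while [ "$h" -le "$last" ]; do
    printf 'gfs.t00z.pgrb2.1p00.f%03d\n' "$h"
    h=$(( h + 3 ))
  done
}

# flux_list D: hypothetical helper. Files whose 6-hour acc/ave/min/max
# intervals tile day D, forecast hours 24(D-1)+6, ..., 24D (Step 2, D=2).
flux_list() {
  h=$(( 24 * ($1 - 1) + 6 )); last=$(( 24 * $1 ))
  while [ "$h" -le "$last" ]; do
    printf 'gfs.t00z.pgrb2.1p00.f%03d\n' "$h"
    h=$(( h + 6 ))
  done
}

# Usage for day 2, with the commands from Steps 1 and 3a:
#   gmerge - $(inst_list 2) | wgrib2 - -not ' (ave|min|max|acc) ' -fcst_ave 3hr ave.grb
#   gmerge - $(flux_list 2) | wgrib2 - -match ' (ave|min|max|acc) ' -merge_fcst 4 acc.grb
#   cat ave.grb acc.grb >day2.grb
```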
Step 3b: Average the acc/ave/min/max fields
In step 3a, the fields were merged, making the time intervals
longer, which is more appropriate for processing
into daily files. In step 3b, the 6-hour fields are averaged instead.
listflx=`seq -f gfs.t00z.pgrb2.1p00.f%03.0f 30 6 51`
gmerge - $listflx | wgrib2 - -match ' (ave|min|max|acc) ' -fcst_ave 6hr flx.grb
$ wgrib2 flx.grb | head
1:0:d=2025051900:TMAX:2 m above ground:4@6 hour ave(24-30 hour max fcst)++,missing=0:
2:89810:d=2025051900:TMIN:2 m above ground:4@6 hour ave(24-30 hour min fcst)++,missing=0:
3:179620:d=2025051900:CPRAT:surface:4@6 hour ave(24-30 hour ave fcst)++,missing=0:
4:302010:d=2025051900:PRATE:surface:4@6 hour ave(24-30 hour ave fcst)++,missing=0:
...
4@6 hour ave(24-30 hour max fcst)++,missing=0
4@6 hour ave
You average 4 fields that are separated by 6 hours.
(24-30 hour max fcst)++
The first field is the 24-30 hour max fcst, the maximum over forecast hours 24-30.
The second field has 24 and 30 incremented by 6 hours: "30-36 hour max fcst".
...
So the grib metadata precisely describes the statistical operation, although
in a manner that may be difficult to understand.
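The "++" notation can be expanded mechanically. A sketch that lists the fields behind "4@6 hour ave(24-30 hour max fcst)++", following the reading given above; expand_pp is a hypothetical helper.

```shell
# expand_pp N DT M END STAT: hypothetical helper. List the N intervals
# described by "N@DT hour ave(M-END hour STAT fcst)++": start at M-END
# and shift both ends by DT hours each time.
expand_pp() {
  n=$1; dt=$2; s=$3; e=$4; stat=$5; i=0
  while [ "$i" -lt "$n" ]; do
    printf '%d-%d hour %s fcst\n' "$s" "$e" "$stat"
    s=$(( s + dt )); e=$(( e + dt )); i=$(( i + 1 ))
  done
}

expand_pp 4 6 24 30 max
```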
Combining Steps 3a and 3b
The above approach is not ideal for TMAX and TMIN, because an average of 6-hour
maxima is not an average of daily maxima. You will want to merge
TMAX and TMIN into daily TMAX and TMIN and then average the daily TMAX
and TMIN values. So you would use step 3a to make daily files by merging,
and then step 3b to average the daily files.
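Why the order matters can be seen with made-up numbers for one grid point. Every 6-hour maximum is at most that day's maximum, so the mean of the 6-hour maxima (step 3b alone) can never exceed the mean of the daily maxima (step 3a, then 3b), and is usually lower. The values below are hypothetical, and compare_tmax is a hypothetical helper.

```shell
# compare_tmax: hypothetical helper. Each input line holds one day's four
# 6-hour TMAX values (K) at one grid point. Prints the mean of all 6-hour
# maxima and the mean of the daily maxima.
compare_tmax() {
  awk '
    {
      dmax = $1
      for (i = 1; i <= NF; i++) {
        s6 += $i; n6++
        if ($i > dmax) dmax = $i
      }
      sd += dmax; nd++
    }
    END {
      printf "mean of 6-hour maxima: %g\n", s6 / n6
      printf "mean of daily maxima: %g\n", sd / nd
    }
  '
}

# Two hypothetical days.
printf '280 285 295 290\n282 288 298 291\n' | compare_tmax
```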
So creating forecast means from NCEP forecast files is a two-part process.
First you create the means for the "N hour fcst" fields using the
-fcst_ave option. Then you examine
the files for acc/ave/min/max fields; this may require a different
set of files to process. Finally you process the acc/ave/max/min
fields using step 3a, 3b or both.
Does my data have a constant field order?
For operational NCEP models, the field order will change with
major upgrades when new fields are added to the grib file.
That can't be avoided, but what about the order within the
same version of the model? Most of NCEP's model output is created by the
Unified Post Processor (ncep post), and the field order is specified by its control file.
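Within one model version, the order can still be checked cheaply by comparing the variable and level columns (fields 4 and 5) of the inventories. A sketch; same_order is a hypothetical helper, and comparing only these two columns assumes that no variable/level pair appears twice with different time descriptions in the files.

```shell
# same_order INV1 INV2: hypothetical helper. Succeed if two wgrib2
# inventory files list the same variables and levels (fields 4 and 5)
# in the same order. Dates, forecast times and ensemble numbers are ignored.
same_order() {
  [ "$(cut -d: -f4,5 "$1")" = "$(cut -d: -f4,5 "$2")" ]
}

# Usage:
#   wgrib2 file1.grb > inv1 && wgrib2 file2.grb > inv2
#   same_order inv1 inv2 && echo "same field order"
```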
Removing non-grib2 data
$ wgrib2 IN.grb -grib OUT.grb
Selecting and removing fields
$ wgrib2 IN.grb -match "(A|B|C)" -not "(G|H)" -grib OUT.grb
keep fields matching A, B or C
then remove fields matching G or H
Sorting fields
$ wgrib2 IN.grb | sort (whatever) | wgrib2 IN.grb -i -grib OUT.grb
puts the fields in sorted order; apply the same sort to every input file so that all the files end up with the same field order
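What the sort step does to the inventory can be previewed on a few lines. A sketch with made-up inventory lines (the offsets are placeholders), sorting by variable (field 4) and then level (field 5); these keys are only one example of "(whatever)". Setting LC_ALL=C makes the order independent of the locale, so every file is sorted the same way.

```shell
# Hypothetical inventory lines (offsets are placeholders).
inv='1:0:d=2025051900:TMP:2 m above ground:anl:
2:100:d=2025051900:HGT:500 mb:anl:
3:200:d=2025051900:RH:2 m above ground:anl:'

# Sort by variable, then level, with a locale-independent collation.
printf '%s\n' "$inv" | LC_ALL=C sort -t: -k4,4 -k5,5

# With a real file, the sorted inventory is fed back to wgrib2 -i:
#   wgrib2 IN.grb | LC_ALL=C sort -t: -k4,4 -k5,5 | wgrib2 IN.grb -i -grib OUT.grb
```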
Windows Compatibility
The above examples may not work on Windows. The problem is
the mixing of text and binary I/O on stdin and stdout.
See also:
-ens_processing,
-ave,
-fcst_ave,
-merge_fcst,