Climate Prediction Center

wgrib2: all about OpenMP

Introduction

Wgrib2 needs to run fast, and the only way to make wgrib2 run fast is to use multiple cores. Wgrib2 uses OpenMP to multithread its calculations. Some parts of wgrib2 are scalar (jpeg2000, I/O) and see no speedup from using multiple cores, while other parts (-new_grid, -ens_processing, complex unpacking) see large speedups. Much effort has gone into parallelizing the time-consuming parts of the code. To use OpenMP, the compilers must support at least OpenMP v3.1. Some additional speedup can be obtained by compiling with AVX-512 or AVX2 enabled and using a version of OpenMP that supports SIMD.

To check that your wgrib2 executable is OpenMP enabled, examine the output of $ wgrib2 -config

ebis@landing2:~$ wgrib2 -config
wgrib2 v3.1.4beta1 11/2023  Wesley Ebisuzaki, Reinoud Bokhorst, John Howard, Jaakko Hyvätti, Dusan Jovic, Daniel Lee, Kristian Nilssen, Karl Pfeiffer, Pablo Romero, Manfred Schwarb, Gregor Schee, Arlindo da Silva, Niklas Sondell, Sam Trahan, George Trojan, Sergey Varlamov
..
OpenMP: control number of threads with environment variable OMP_NUM_THREADS
..
If you don't have the line "OpenMP: control ..", then your copy of wgrib2 is not OpenMP enabled.
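
The check can also be scripted. The following is a minimal sketch, assuming a POSIX shell and that the marker line contains the text "OpenMP: control" as in the output above. The function name has_openmp is made up for this illustration and is not part of wgrib2.

```shell
#!/bin/sh
# has_openmp: read `wgrib2 -config` output on stdin and succeed if the
# "OpenMP: control .." marker line is present.
# (Illustrative helper name, not part of wgrib2.)
has_openmp() {
    grep -q 'OpenMP: control'
}

# Only run the check when wgrib2 is actually on the PATH.
if command -v wgrib2 >/dev/null 2>&1; then
    if wgrib2 -config 2>&1 | has_openmp; then
        echo "wgrib2 is OpenMP enabled"
    else
        echo "wgrib2 is NOT OpenMP enabled"
    fi
fi
```

The test is kept in a function that reads stdin so it can be applied to saved -config output as well as to a live run.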

Using OpenMP

You control the number of cores that OpenMP-enabled programs will use with the environment variable OMP_NUM_THREADS.

bash, sh
   $ export OMP_NUM_THREADS=4
csh
   $ setenv OMP_NUM_THREADS 4

There is no hard-and-fast rule for the optimum number of threads to allocate to wgrib2. For example, if I am running 4 copies of wgrib2 on a 4-core CPU, I would set OMP_NUM_THREADS to 1. If I am running wgrib2 on a 128-core CPU with 30 other users, I may set OMP_NUM_THREADS to 4. Generally, the wgrib2 speedup is minimal beyond 5 cores unless you are using compute-heavy options that are well parallelized, such as -new_grid and -ens_processing.
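
One way to turn the examples above into a starting point is to divide the cores among the wgrib2 jobs you plan to run at once and apply a cap. This is only a rough sketch, not a recommendation from the wgrib2 authors. The helper pick_threads and the cap of 4 are assumptions for illustration, and nproc is the GNU coreutils command, which may not exist on every system.

```shell
#!/bin/sh
# pick_threads TOTAL_CORES NJOBS CAP
# Share TOTAL_CORES evenly among NJOBS concurrent wgrib2 processes,
# never going below 1 thread or above CAP.
# (Hypothetical helper, not part of wgrib2.)
pick_threads() {
    total=$1
    njobs=$2
    cap=$3
    n=$((total / njobs))
    if [ "$n" -lt 1 ]; then n=1; fi
    if [ "$n" -gt "$cap" ]; then n=$cap; fi
    echo "$n"
}

# nproc is GNU coreutils; fall back to 1 core if it is unavailable.
cores=$(nproc 2>/dev/null || echo 1)

# Example: 4 concurrent copies of wgrib2, at most 4 threads each.
OMP_NUM_THREADS=$(pick_threads "$cores" 4 4)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With 4 cores and 4 jobs this yields 1 thread per job, and with 128 cores and a single job it yields the cap of 4, matching the two cases described above. On a shared machine the right cap depends on what the other users are running.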

Wgrib2 uses 1 core in the serial sections of the code and up to OMP_NUM_THREADS cores in the parallel sections. Under normal conditions, you want the unused cores (up to OMP_NUM_THREADS-1) to be made available to other jobs. You do this by setting OMP_WAIT_POLICY to PASSIVE:

bash, sh
  $ export OMP_WAIT_POLICY=PASSIVE
csh
  $ setenv OMP_WAIT_POLICY PASSIVE

On an HPC node where your jobs have sole use of the (physical) CPU, you may want to set OMP_WAIT_POLICY to ACTIVE.
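
In a job script, the choice between the two policies can be made in one place. The sketch below assumes you know whether the node is shared. The helper choose_wait_policy and its "shared"/"exclusive" arguments are invented for this example. ACTIVE and PASSIVE are the standard OpenMP values for OMP_WAIT_POLICY.

```shell
#!/bin/sh
# choose_wait_policy MODE
# MODE is "exclusive" when your jobs have sole use of the CPU; any other
# value (for example "shared") is treated as a shared machine.
# Prints the matching OMP_WAIT_POLICY value.
# (Hypothetical helper, not part of wgrib2.)
choose_wait_policy() {
    case "$1" in
        exclusive) echo ACTIVE ;;
        *)         echo PASSIVE ;;
    esac
}

# Shared workstation: let idle threads give up their cores.
OMP_WAIT_POLICY=$(choose_wait_policy shared)
export OMP_WAIT_POLICY
echo "OMP_WAIT_POLICY=$OMP_WAIT_POLICY"
```

Defaulting to PASSIVE for any unrecognized mode keeps the script polite toward other users if the argument is mistyped.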

-ncpu

The environment variable $OMP_NUM_THREADS can be overridden by the option -ncpu N.
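
As an illustration, the sketch below sets OMP_NUM_THREADS to 8 in the environment and then requests 2 threads for a single run with -ncpu. The file name input.grb2 is a placeholder, and -s (short inventory) is used only to give wgrib2 something to do. The run is skipped if wgrib2 or the file is missing.

```shell
#!/bin/sh
# The environment asks for 8 threads ...
export OMP_NUM_THREADS=8

# ... but -ncpu 2 on the command line overrides it for this run.
# input.grb2 is a placeholder file name.
in=input.grb2
if command -v wgrib2 >/dev/null 2>&1 && [ -f "$in" ]; then
    wgrib2 "$in" -ncpu 2 -s
else
    echo "skipped: wgrib2 or $in not found"
fi
```

Using -ncpu this way leaves the environment unchanged, so other OpenMP programs started from the same shell still see OMP_NUM_THREADS=8.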

SIMD

OpenMP v4.0+ supports SIMD. The current wgrib2 (5/2024) prefers multi-threading over SIMD because multi-threading can be used in more cases and SIMD has a limited vector length. In a limited number of cases, the outer loop is parallelized by multi-threading and the inner loop is parallelized by SIMD. It is hard to generalize about the speed of multi-threading vs SIMD. Multi-threading could have speed advantages when more than two memory controllers are present.

To best take advantage of SIMD, wgrib2 should be compiled with AVX-512 or AVX2 enabled (x86 cpus).

See also: -ncpu, speed


NOAA/ National Weather Service
National Centers for Environmental Prediction
Climate Prediction Center
5830 University Research Court
College Park, Maryland 20740
Page last modified: June 2024