An Efficient Spectral Dynamical Core for Distributed Memory Computers

L. Rivier; R. Loft; Lorenzo M. Polvani

Applied Physics and Applied Mathematics
Earth and Environmental Sciences
Book/Journal Title: Monthly Weather Review
The practical question of whether the classical spectral transform method, widely used in atmospheric modeling, can be efficiently implemented on inexpensive commodity clusters is addressed. Typically, such clusters have limited cache and memory sizes. To demonstrate that these limitations can be overcome, the authors have built a spherical general circulation model dynamical core, called BOB (“Built on Beowulf”), which can solve either the shallow water equations or the atmospheric primitive equations in pressure coordinates. That BOB is targeted for computing at high resolution on modestly sized and priced commodity clusters is reflected in four areas of its design. First, the associated Legendre polynomials (ALPs) are computed “on the fly” using a stable and accurate recursion relation. Second, an identity is employed that eliminates the storage of the derivatives of the ALPs. Both of these algorithmic choices reduce the memory footprint and memory bandwidth requirements of the spectral transform. Third, a cache-blocked and unrolled Legendre transform achieves a high performance level that resists deterioration as resolution is increased. Finally, the parallel implementation of BOB is transposition-based, employing load-balanced, one-dimensional decompositions in both latitude and wavenumber. A number of standard tests are used to compare BOB's performance to two well-known codes: the Parallel Spectral Transform Shallow Water Model (PSTSWM) and the dynamical core of NCAR's Community Climate Model CCM3. Compared to PSTSWM, BOB shows better timing results, particularly at the higher resolutions where cache effects become important. BOB also shows better performance in its comparison with CCM3's dynamical core. With 16 processors, at a triangular spectral truncation of T85, BOB is roughly five times faster when computing the solution to the standard Held–Suarez test case, which involves 18 levels in the vertical. BOB also shows a significantly smaller memory footprint in these comparison tests.
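
The first two design choices named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give BOB's actual recursion or derivative identity, so this sketch assumes the textbook three-term recursion in degree n at fixed order m, and the standard identity that expresses (1 - mu^2) dP/dmu through neighboring ALPs. The function names (`eps`, `alp_column`, `alp_h`) and the normalization (integral of P^2 over [-1, 1] equal to 1, no Condon-Shortley phase) are assumptions made for illustration, not taken from BOB.

```python
import math


def eps(n, m):
    """Coupling coefficient eps_n^m = sqrt((n^2 - m^2) / (4 n^2 - 1)); zero when n == m."""
    if n == m:
        return 0.0
    return math.sqrt((n * n - m * m) / (4.0 * n * n - 1.0))


def alp_column(m, n_max, mu):
    """Return [P_m^m, ..., P_{n_max}^m] at mu = sin(latitude), computed on the fly.

    Assumed normalization: integral over [-1, 1] of (P_n^m)^2 dmu equals 1.
    Only the two previous degrees are live at any time; the returned list
    exists for inspection and is not needed by the recursion itself.
    """
    cos_lat = math.sqrt(1.0 - mu * mu)
    # Sectoral start value P_m^m, built up from P_0^0 = 1/sqrt(2).
    p_mm = math.sqrt(0.5)
    for k in range(1, m + 1):
        p_mm *= math.sqrt((2.0 * k + 1.0) / (2.0 * k)) * cos_lat
    column = [p_mm]
    p_prev2, p_prev1 = 0.0, p_mm
    # Three-term recursion in degree:
    #   eps_n^m P_n^m = mu P_{n-1}^m - eps_{n-1}^m P_{n-2}^m
    for n in range(m + 1, n_max + 1):
        p_n = (mu * p_prev1 - eps(n - 1, m) * p_prev2) / eps(n, m)
        column.append(p_n)
        p_prev2, p_prev1 = p_prev1, p_n
    return column


def alp_h(m, n, mu):
    """Return (1 - mu^2) dP_n^m/dmu without storing any derivative table.

    Uses the standard identity (assumed here, not quoted from the paper):
      (1 - mu^2) dP_n^m/dmu = (n + 1) eps_n^m P_{n-1}^m - n eps_{n+1}^m P_{n+1}^m
    """
    column = alp_column(m, n + 1, mu)
    p_below = column[n - 1 - m] if n > m else 0.0
    p_above = column[n + 1 - m]
    return (n + 1) * eps(n, m) * p_below - n * eps(n + 1, m) * p_above
```

In this form the memory saving is visible directly: the recursion needs only the two most recent degrees for each order m, and the derivative term is rebuilt from ALP values already in hand rather than read from a second stored table.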
Computer science
System theory
Suggested Citation:
L. Rivier, R. Loft, Lorenzo M. Polvani, An Efficient Spectral Dynamical Core for Distributed Memory Computers, Columbia University Academic Commons.
