
Building ROMIO’s Lustre driver

June 12th, 2015

When building the Lustre ADIO driver, one might run into a few problems.

  • caddr_t problems: 

    (“/usr/include/sys/quota.h:221: error: expected declaration specifiers or ‘...’ before ‘caddr_t’”)

    caddr_t is an old BSD-ism.

  • ‘FASYNC’ undeclared:

    FASYNC is another old BSD-ism.

  • ‘struct lov_user_md_v1’ has no member named ‘lmm_stripe_offset’:

    Recent versions of Lustre moved this member into an anonymous union.

These errors only show up when --enable-strict is selected. MPICH is considering updating --enable-strict to allow C99 and maybe even C11 features. Anonymous unions only became standard in C11, so allowing C11 would make the lov_user_md_v1 error go away, but the FASYNC and caddr_t references are still going to cause issues. Looks like you will have to build your Lustre-enabled ROMIO without --enable-strict.
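To see why the C standard level matters for the lov_user_md_v1 error, here is a minimal sketch using a hypothetical struct, example_user_md, whose offset field sits in an anonymous union. It is only modeled loosely on the change described above and is not Lustre's real struct lov_user_md_v1; the function names are also made up for illustration. Code that names md.stripe_offset directly relies on anonymous-union support, which is standard only from C11 on.

```c
#include <stdint.h>

/* Hypothetical layout, modeled loosely on the change described above.
 * This is NOT Lustre's real struct lov_user_md_v1. */
struct example_user_md {
    uint32_t stripe_size;
    uint16_t stripe_count;
    union {                      /* anonymous union: standard C only since C11 */
        uint16_t stripe_offset;
        uint16_t layout_gen;     /* shares storage with stripe_offset */
    };
};

/* Stores an offset and reads it back. Callers name md.stripe_offset
 * directly, as if it were an ordinary member; this is the construct a
 * strict pre-C11 compile may refuse. */
uint16_t example_stripe_offset(uint16_t offset)
{
    struct example_user_md md = {0};
    md.stripe_offset = offset;
    return md.stripe_offset;
}

/* Writes through one union member and reads through the other, showing
 * that the two names alias the same storage. */
uint16_t example_layout_gen_alias(uint16_t gen)
{
    struct example_user_md md = {0};
    md.layout_gen = gen;
    return md.stripe_offset;
}
```

Compiling this with gcc -std=c11 -c should succeed, while gcc -std=c99 -pedantic-errors -c is expected to reject the unnamed union. That mirrors why relaxing --enable-strict to C11 would help this one error but not the caddr_t and FASYNC ones.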


ROMIO and Intel-MPI

June 12th, 2014

ROMIO, in various forms, provides the MPI-IO implementation for just about every MPI implementation out there. These implementations incorporate ROMIO’s hints when they pick up our source code, but they also add tuning parameters of their own via environment variables.

The Intel MPI library uses ROMIO, but configures the file-system-specific drivers a bit differently. In MPICH, we select which file system drivers to support at compile time with the --with-file-system configure flag, and the selected drivers are compiled directly into the MPICH library. Intel MPI builds its file-system drivers as loadable modules and relies on two environment variables, I_MPI_EXTRA_FILESYSTEM and I_MPI_EXTRA_FILESYSTEM_LIST, to enable and select the drivers.


Let’s say you had a Lustre file system, like this fellow on the HDF5 mailing list.  Then you would invoke mpiexec like this:

 mpiexec -env I_MPI_EXTRA_FILESYSTEM on \
        -env I_MPI_EXTRA_FILESYSTEM_LIST lustre -n 2 ./test

I found this information in the Intel MPI library Reference Manual, which contains a ton of other tuning parameters.

(Update 12 May 2015): Intel MPI 5.0.2 and newer have GPFS support. One would enable it the same way, adding gpfs to I_MPI_EXTRA_FILESYSTEM_LIST:

mpiexec -env I_MPI_EXTRA_FILESYSTEM on \
        -env I_MPI_EXTRA_FILESYSTEM_LIST gpfs -n 2 ./test


Lustre driver story

September 30th, 2010

ROMIO has a general-purpose file system driver we call “UFS” (for Unix File System). UFS contains no file-system-specific optimizations: just data sieving and two-phase collective buffering.

The generic approach works, in that it gives correct answers, but it has two big problems when writing to Lustre:

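  • When assigning the file domains, UFS simply takes the start and end of the aggregate access region and divides that range evenly among the I/O aggregators, without regard to Lustre’s stripe boundaries, so two aggregators can end up writing to the same stripe. We wrote about Wei-keng’s SC 2008 paper in this area earlier.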
  • The collective buffering algorithm does a read-modify-write if there are any holes or gaps in the request. Past some point (specific to each file system deployment), data sieving no longer wins: two large writes, for example, would be better than a read-modify-write.
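Both problems can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not ROMIO’s code, and every name here (even_domain, aligned_domain, has_hole, struct req, stripe_size) is hypothetical. even_domain splits the accessed byte range evenly in the spirit of UFS; aligned_domain snaps interior boundaries up to stripe multiples, a simple stand-in for Lustre-aware partitioning (it is not the group-cyclic scheme); has_hole reports whether sorted requests leave a gap in a collective buffer window, the case that forces a read-modify-write.

```c
/* Hypothetical helpers, not ROMIO functions. Offsets are in bytes and
 * ranges are inclusive: [lo, hi]. */

/* UFS-style even split of [start, end] over naggr aggregators.
 * An aggregator past the end of the range gets an empty domain (hi < lo). */
void even_domain(long long start, long long end, int naggr, int i,
                 long long *lo, long long *hi)
{
    long long size = (end - start + 1 + naggr - 1) / naggr; /* round up */
    *lo = start + i * size;
    *hi = *lo + size - 1;
    if (*hi > end)
        *hi = end;
}

static long long round_up(long long x, long long m)
{
    return (x + m - 1) / m * m;
}

/* Boundary k (0..naggr): the outer edges stay at start and end + 1,
 * interior boundaries are snapped up to a multiple of stripe_size. */
static long long aligned_boundary(long long start, long long end, int naggr,
                                  long long stripe_size, int k)
{
    long long size = (end - start + 1 + naggr - 1) / naggr;
    long long b;
    if (k == 0)
        return start;
    if (k == naggr)
        return end + 1;
    b = round_up(start + k * size, stripe_size);
    return b < end + 1 ? b : end + 1;
}

/* Stripe-aligned split: each stripe's bytes inside [start, end] fall in
 * exactly one domain, so no two aggregators write the same stripe. */
void aligned_domain(long long start, long long end, int naggr,
                    long long stripe_size, int i,
                    long long *lo, long long *hi)
{
    *lo = aligned_boundary(start, end, naggr, stripe_size, i);
    *hi = aligned_boundary(start, end, naggr, stripe_size, i + 1) - 1;
}

struct req {
    long long off; /* first byte of the request */
    long long len; /* length in bytes */
};

/* Returns 1 if requests sorted by offset leave a gap in [lo, hi], so the
 * window would have to be read before being written back
 * (read-modify-write); returns 0 if the requests cover it completely. */
int has_hole(const struct req *r, int n, long long lo, long long hi)
{
    long long next = lo; /* first byte not yet covered */
    int i;
    for (i = 0; i < n && next <= hi; i++) {
        if (r[i].off > next)
            return 1;
        if (r[i].off + r[i].len > next)
            next = r[i].off + r[i].len;
    }
    return next <= hi;
}
```

With start = 0, end = 999, four aggregators and a 256-byte stripe, even_domain gives aggregator 1 the range [250, 499], which shares one stripe with aggregator 0 and another with aggregator 2, while aligned_domain gives it [256, 511], exactly one stripe. Real Lustre stripes are typically a megabyte or more; the small numbers only keep the arithmetic readable.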

We rely on the community to contribute many of the fs-specific drivers (e.g. PanFS, XFS), and through 2009 and 2010 the Lustre community did just that. Weikuan Yu did some early work while he was at ORNL. Sun’s developers contributed more improvements, including an independently-developed version of Wei-keng’s group-cyclic distribution. End-users Martin Pokorny at NRAO and Pascal Deveze at BULL contributed additional testing and patching. As a result, ROMIO ended up with an optimized Lustre driver incorporating optimizations for the two points discussed above.

Lustre users should still let us know how things are going: is collective MPI-IO working well? working poorly? The more community involvement we get, the better we can make things.