Updated documentation

The latest Darshan documentation can be found at the documentation link above.  We’ve made several improvements, including “recipes” to help you get started on systems such as Blue Gene, Cray, and Linux clusters using MPICH, OpenMPI, or Intel MPI.

Darshan 2.1.2 Release

Darshan 2.1.2 is a minor bug fix release to improve error handling in cases where Darshan is unable to write a log file.
Changelog:

  • improved error handling when writing log files.  If a write fails on any process then the log file will be deleted and a warning will be printed to stderr.
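
The policy described above can be sketched in a few lines of Python. This is only an illustration, not Darshan's actual code: the function name `finalize_log` and the list of per-process results are hypothetical stand-ins for whatever mechanism the library uses to learn that a write failed on some process.

```python
import os
import sys
import tempfile


def finalize_log(log_path, write_ok_per_process):
    """Keep the log only if every process wrote its part successfully.

    write_ok_per_process is a hypothetical stand-in for the per-process
    write results gathered from the whole job.
    """
    if all(write_ok_per_process):
        return True
    # At least one write failed: a partial log would be misleading, so remove it.
    if os.path.exists(log_path):
        os.remove(log_path)
    print(f"warning: unable to write log file {log_path}", file=sys.stderr)
    return False


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".darshan.gz")
    os.close(fd)
    # The third process reports a failed write, so the log is deleted.
    kept = finalize_log(path, [True, True, False, True])
    print(kept, os.path.exists(path))
```

The design point is that a single failure anywhere invalidates the whole file, so a user never ends up analyzing a truncated log.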

Darshan Data Repository now online

We are pleased to announce the public release of the Darshan Data Repository.  The Darshan Data Repository is a collection of anonymized I/O characterization data captured from production systems. The first data set to be made available covers three months of activity (as recorded by Darshan) from the Intrepid Blue Gene/P system at the Argonne Leadership Computing Facility.  We hope to add more data in the future.  See the Darshan Publications page for examples of analysis that can be performed with this data.

Darshan 2.1.1 Release

This release includes performance and bug fixes.  It also includes a new utility that converts Darshan log files and can optionally anonymize them or recompress them in bzip2 format.
Changelog:

  • new darshan-convert command line utility for converting existing log files, with optional anonymization and optional bzip2 compression
  • bzip2 support in command line utilities (but not in the darshan library itself)
  • updated log file format that allows for string key/value pairs to be stored in the header
  • added ability to set MPI-IO hints when writing the Darshan log
    • at configure time: --with-log-hints
    • at run time: DARSHAN_LOGHINTS environment variable
  • bug fix contributed by Sandra Schröder: use case-insensitive search for MPI symbols in Fortran wrapper script
  • performance bug fix: remove unnecessary call to MPI_File_set_size when writing the log
  • added --with-logpath-by-env configure option to allow absolute log path to be specified via environment variable
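
As a rough illustration of the run-time hint setting, the sketch below parses a hint string taken from DARSHAN_LOGHINTS. It assumes the value is a semicolon-delimited list of key=value pairs; that format is not stated in these release notes, so check the Darshan documentation for the version you are using. The helper name `parse_log_hints` is hypothetical, and the two hint names are ordinary ROMIO hints used only as sample data.

```python
import os


def parse_log_hints(raw):
    """Parse 'key=value;key=value' into a dict, skipping empty entries.

    The semicolon-delimited format is an assumption, not taken from the
    release notes.
    """
    hints = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed hint: {entry!r}")
        hints[key.strip()] = value.strip()
    return hints


if __name__ == "__main__":
    # Fall back to illustrative sample hints when the variable is unset.
    raw = os.environ.get("DARSHAN_LOGHINTS", "romio_no_indep_rw=true;cb_nodes=4")
    print(parse_log_hints(raw))
```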

Best Paper Award, MSST 2011

A paper featuring Darshan (“Understanding and Improving Computational Science Storage Access through Continuous Characterization”) was awarded Best Paper at the 27th IEEE Symposium on Massive Storage Systems and Technologies (MSST 2011).  The paper outlines a methodology for characterizing a large-scale production workload and presents a two-month study of I/O activity on the Intrepid Blue Gene/P system at Argonne National Laboratory.

Darshan 2.1.0 Release

This release primarily enhances portability and adds the option to use LD_PRELOAD for instrumentation rather than link time wrappers. This release does not add any new instrumentation or change the log file format.
Changelog:

  • additional environment variables to control log location, jobid and alignment parameters
  • additional configure tests to improve portability
  • bug fixes for darshan-parser --perf calculations
  • support for MPI 1.x
  • support for OpenMPI
  • support for PGI and Intel compilers
  • new libdarshan.so dynamic library for use with LD_PRELOAD
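
The LD_PRELOAD approach can be illustrated with a small Python launcher that adds the library to a child process's environment. This is a sketch under assumptions: the install path is hypothetical, the helper names `preload_env` and `run_instrumented` are invented for the example, and how the variable reaches application processes under an MPI launcher varies from system to system.

```python
import os
import subprocess


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD accepts a space-separated list of shared objects.
    env["LD_PRELOAD"] = f"{lib_path} {existing}".strip()
    return env


def run_instrumented(argv, lib_path):
    """Run argv with the given shared library preloaded."""
    return subprocess.run(argv, env=preload_env(lib_path), check=False)


if __name__ == "__main__":
    # Hypothetical install location; adjust to the local Darshan prefix.
    lib = "/opt/darshan/lib/libdarshan.so"
    print(preload_env(lib, {"PATH": "/usr/bin"})["LD_PRELOAD"])
```

Because the environment is copied rather than modified in place, the instrumentation applies only to the launched command and not to the calling process.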

Darshan 2.0.2 Release

Changelog:

  • added a random identifier to job logs (to avoid collisions from multiple application instances within a single scheduler job)
  • improved installation and library path management for darshan-job-summary.pl
  • improved error handling in darshan-job-summary.pl
  • additional derived statistics categories for darshan-parser output:
    • --all   : all sub-options are enabled
    • --base  : darshan log field data [default]
    • --file  : total file counts
    • --perf  : derived perf data
    • --total : aggregated darshan field data
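
The darshan-parser options listed above could be driven from a script along the following lines. This is a minimal sketch: the helper names and the log file name are hypothetical, and passing more than one option at a time is an assumption based on the description of the "all" option.

```python
import subprocess

# Option names as listed in the changelog above.
PARSER_MODES = ("all", "base", "file", "perf", "total")


def parser_command(log_path, modes=("base",)):
    """Build a darshan-parser argument list for the requested modes."""
    unknown = [m for m in modes if m not in PARSER_MODES]
    if unknown:
        raise ValueError(f"unknown darshan-parser mode(s): {unknown}")
    return ["darshan-parser", *(f"--{m}" for m in modes), log_path]


def run_parser(log_path, modes=("base",)):
    """Run darshan-parser and return its text output (requires the tool on PATH)."""
    done = subprocess.run(parser_command(log_path, modes),
                          capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    # Hypothetical log file name; only the command line is printed here.
    print(" ".join(parser_command("example.darshan.gz", modes=("perf",))))
```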

Darshan 2.0.1 Release

Changelog:

  • bug fix to variance/minimum calculations on shared files
  • switch to automatic generation of all MPI compiler scripts using darshan-gen-* tools
  • new run time environment variable: DARSHAN_INTERNAL_TIMING. If set at job execution time, it will cause Darshan to time its own internal data aggregation routines and print the results to stdout at rank 0.
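
The behaviour of DARSHAN_INTERNAL_TIMING can be pictured with the following Python sketch. It is not Darshan's code: the function name `timed_aggregation`, the callable standing in for the aggregation step, and the wording of the report are all hypothetical. Only the variable name and the "print on rank 0" rule come from the note above.

```python
import os
import time


def timed_aggregation(aggregate, rank, environ=None):
    """Run aggregate(); on rank 0, report its duration if timing is enabled.

    Returns (result, report), where report is None unless a line was printed.
    """
    environ = os.environ if environ is None else environ
    enabled = "DARSHAN_INTERNAL_TIMING" in environ
    start = time.perf_counter()
    result = aggregate()
    elapsed = time.perf_counter() - start
    report = None
    if enabled and rank == 0:
        # Hypothetical message format.
        report = f"internal timing: aggregation took {elapsed:.6f} s"
        print(report)
    return result, report


if __name__ == "__main__":
    total, _ = timed_aggregation(lambda: sum(range(1000)), rank=0)
    print(total)
```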

Darshan 2.0.0 Release

The Darshan 2.0.0 release is now available for download.  From a user’s perspective, the biggest difference is that you no longer have to run darshan-parser before darshan-job-summary.pl if you just want to see the summary report for a job.  The darshan-job-summary.pl script now operates directly on the binary .gz files.  We also introduced new characterization counters as well as some additional tables in the summary view.
Changelog:

  • new output file format that is portable across architectures (NOTE: Darshan 1.x output files are incompatible with the tools in this release unless they were generated on a ppc32 architecture (Blue Gene))
  • 8 new counters that record the rank of the fastest and slowest process that opened each shared file, along with the number of seconds and number of bytes consumed by those processes, plus the variance in both time and amount of data
  • new --with-jobid-env configure argument to support recording job identifiers from different schedulers
  • job ID is now recorded within the Darshan log in addition to in the file name
  • darshan-job-summary.pl:
    • opens output files directly without using intermediate darshan-parser output
    • table showing data usage per file system
    • table showing I/O variance in shared files
  • fixes for bugs reported by Noah Watkins:
    • avoid name collision in hashing function
    • divide by zero error in darshan-job-summary.pl
Darshan 2.0.0 is now available on the download page.
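
The shared-file counters introduced in 2.0.0 can be illustrated with a short Python sketch that derives eight values from per-rank measurements. The input layout, the dictionary keys, and the use of population variance are assumptions made for this example. They do not reproduce Darshan's counter names or its exact calculation.

```python
from statistics import pvariance


def shared_file_stats(per_rank):
    """Summarize per-rank I/O on one shared file.

    per_rank maps rank -> (seconds, bytes); this layout is hypothetical.
    """
    fastest = min(per_rank, key=lambda r: per_rank[r][0])
    slowest = max(per_rank, key=lambda r: per_rank[r][0])
    times = [t for t, _ in per_rank.values()]
    sizes = [b for _, b in per_rank.values()]
    return {
        "fastest_rank": fastest,
        "fastest_rank_time": per_rank[fastest][0],
        "fastest_rank_bytes": per_rank[fastest][1],
        "slowest_rank": slowest,
        "slowest_rank_time": per_rank[slowest][0],
        "slowest_rank_bytes": per_rank[slowest][1],
        # Population variance across ranks (an assumption for this sketch).
        "variance_rank_time": pvariance(times),
        "variance_rank_bytes": pvariance(sizes),
    }


if __name__ == "__main__":
    sample = {0: (2.0, 100), 1: (4.0, 300), 2: (3.0, 200)}
    for name, value in shared_file_stats(sample).items():
        print(name, value)
```

A large gap between the fastest and slowest rank, or a high variance, points to load imbalance on the shared file.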