The goal of the CODES project is to use highly parallel simulation to explore the design of exascale storage architectures and distributed data-intensive science facilities.
Increasingly, science endeavors rely heavily on data management, analysis, and storage as part of the discovery process. To serve large communities of scientists, complex systems and instruments are deployed across multiple institutions to manage and analyze data produced from experiments, observational platforms, and computational simulation. Evaluating designs and coordinating deployment and operation of such a virtual data facility poses a significant challenge. An ability to simulate these environments would transform the approach taken to design, procurement, tuning, and upgrade of these facilities.
Our simulations build upon the Rensselaer Optimistic Simulation System (ROSS), a discrete event simulation framework that allows simulations to be run in parallel, decreasing the simulation run time of massive simulations to hours. We are using ROSS to explore topics including large-scale storage systems, I/O workloads, HPC network fabrics, distributed science systems, and data-intensive computation environments.
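To make the term "discrete event simulation" concrete, here is a minimal sequential sketch of the idea. It is not ROSS or CODES code, and it does not attempt the parallel, optimistic execution that ROSS provides. The `Simulator` class, the `send`/`recv` handlers, and the latency value are all hypothetical names chosen for illustration.

```python
import heapq
import itertools


class Simulator:
    """Toy sequential discrete-event engine (illustrative only, not ROSS)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []                # min-heap ordered by event timestamp
        self._seq = itertools.count()   # tie-breaker for equal timestamps

    def schedule(self, delay, handler, *args):
        """Schedule handler(sim, *args) to fire `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), handler, args))

    def run(self, until=float("inf")):
        """Process events in timestamp order until the queue is empty."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(self, *args)


log = []


def send(sim, src, dst, latency):
    # A send event schedules the matching receive event in the future.
    log.append((sim.now, "send", src, dst))
    sim.schedule(latency, recv, src, dst)


def recv(sim, src, dst):
    log.append((sim.now, "recv", src, dst))


sim = Simulator()
sim.schedule(0.0, send, 0, 1, 2.5)   # node 0 sends to node 1 at t=0
sim.schedule(1.0, send, 1, 0, 2.5)   # node 1 sends to node 0 at t=1
sim.run()
```

Virtual time jumps from one event to the next rather than advancing in fixed steps, which is what makes event-driven simulation of large systems tractable.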
The CODES project is a collaboration between the Mathematics and Computer Science department at Argonne National Laboratory and Rensselaer Polytechnic Institute. We also collaborate with Lawrence Livermore National Laboratory for modeling HPC interconnect systems.
We are happy to announce the release of CODES version 1.1.0. This release comes six months after the last release (1.0.0). The major updates and features include:
- A separate 1-D dragonfly network model has been added alongside the customized version of the dragonfly network (dragonfly-dally.C). Multiple routing protocols, including progressive adaptive and adaptive routing, are supported.
- Quality of Service (QoS) support with the 1-D dragonfly and megafly network models. More details about using QoS can be found on the wiki page.
- Addition of time-stepped series instrumentation data for the layer that simulates MPI operations and protocols.
- Replaced the call to the deprecated MPI_Type_hindexed.
- Fixed a bug in the mapping context that caused issues when simulating multiple ranks per node.
- Support for out-of-tree builds with the megafly and 1-D dragonfly tests.
- Bug fix for on-node and two-level fat tree.
Features in progress:
- Support for coNCePTuaL workloads (partial support in the repo). coNCePTuaL is a domain-specific language and framework for auto-generating benchmarks that measure the performance and correctness of networks.
Thank you all for attending the 2018 CODES workshop and making it a success! This year we had 27 participants from 11 different institutions, including Rensselaer, Argonne, Intel Corporation, Lawrence Livermore, Lawrence Berkeley, Illinois Inst. of Tech., Univ. of California-Davis, Florida State Univ., Rice Univ., Florida International Univ., and Univ. of Oregon. On day 1, we had a number of amazing talks by the model developers and users, and on day 2, we had a tutorial about visualizing performance data from HPC simulations. Slides have been posted on the workshop website: http://press3.mcs.anl.gov/summerofcodes2018/workshop-proceedings/
We are happy to announce the release of CODES version 1.0.0. This release comes a year after the last release (0.6.0). The major updates and features include:
- Adding support for dragonfly-plus network model. Multiple forms of routing (progressive adaptive, minimal, non-minimal-spine and leaf) have been implemented. https://xgitlab.cels.anl.gov/codes/codes/wikis/dragonfly-plus
- Adding support for express mesh network model, which can be configured as hyperX.
- Adding support for Multi-plane/rail in fat-tree via multiple single port NICs per compute node or one multi-port NIC per node.
- Adding a generic template for building new network models. In the simplest case, changes to only 2 functions and the preamble should suffice to add a new network. The Express Mesh network model has been updated to serve as an example. For details, see src/networks/model-net/net-template.C
- Darshan workload generator has been updated to use Darshan version 3.x.
- Network models have been updated to capture simulation statistics over virtual time using ROSS/CODES instrumentation. For details, see: https://xgitlab.cels.anl.gov/codes/codes/wikis/Using-ROSS-Instrumentation-with-CODES
- Compatible with the ROSS version that enables statistics collection of simulation performance. For details see: http://carothersc.github.io/ROSS/instrumentation/instrumentation.html
- Online workload replay functionality has been added that allows SWM workloads to be simulated in situ on the network models. Integration of the coNCePTuaL domain-specific language for network communication is a work in progress.
- Multiple traffic patterns were added in the background traffic generation including stencil, all-to-all and random permutation.
- Performance tuning enabled for optimistic mode. For details, see: https://xgitlab.cels.anl.gov/codes/codes/wikis/Optimistic-Performance-Tuning-Tips
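As a rough illustration of the background traffic patterns named in the list above (stencil, all-to-all, and random permutation), the sketch below computes destination ranks for each pattern. It is not taken from the CODES source: the function names, the 2-D periodic grid used for the stencil, and the seeded shuffle are all assumptions made for this example.

```python
import random


def all_to_all(rank, n):
    """Every rank sends to every other rank."""
    return [dest for dest in range(n) if dest != rank]


def stencil_2d(rank, nx, ny):
    """Four nearest neighbors on a periodic nx-by-ny grid (assumed layout)."""
    x, y = rank % nx, rank // nx
    return [
        ((x - 1) % nx) + y * nx,   # left neighbor
        ((x + 1) % nx) + y * nx,   # right neighbor
        x + ((y - 1) % ny) * nx,   # neighbor in the previous row
        x + ((y + 1) % ny) * nx,   # neighbor in the next row
    ]


def random_permutation(n, seed=0):
    """One fixed destination per rank, drawn as a permutation of all ranks.

    In this simple sketch a rank may be mapped to itself.
    """
    rng = random.Random(seed)
    dests = list(range(n))
    rng.shuffle(dests)
    return dests


print(all_to_all(1, 4))      # destinations of rank 1 among 4 ranks
print(stencil_2d(5, 4, 4))   # neighbors of rank 5 on a 4x4 grid
print(random_permutation(8))
```

The patterns differ mainly in how many distinct destinations each rank targets, which is why they stress a network topology in different ways.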
The release is available for download here.
Thanks everyone for attending the 3rd summer of CODES workshop and making it a success! This year we had 28 participants from 10 different institutions. We had a number of amazing talks by the users on their work with CODES, ROSS and TraceR and on day 2, we had a tutorial about using CODES/TraceR. The slides have been posted at https://press3.mcs.anl.gov/summerofcodes2017/workshop-proceedings/
We are happy to announce the recent release of CODES version 0.6.0! This release comes a year after the last release, so there have been significant changes and additions. We are listing the major changes here (see docs/RELEASE_NOTES for the full list of changes):
- C++ models can now be built and integrated with CODES. The new dragonfly model has been implemented in C++.
- CODES can now replay collective operations by using CoRTex, a library for translating collectives to point-to-point operations.
- Dragonfly network model based on Cray XC topology has been added. The model can use the network configurations of Theta and Edison systems. Custom network configurations can be generated using C scripts.
- Fat tree network model with support for adaptive and static routing has been added. The model can support both full and pruned fat tree configurations.
- Test suite has been extended — tests for DUMPI trace replay have been added.
- MPI rendezvous protocol can now be replayed in addition to the eager protocol. The transition point for switching between the two protocols is configurable.
- Background network communication using uniform random workload can now be generated. The traffic generation gets automatically shut off when the main workload finishes.
- Compatible with the most recent ROSS version that has GVT/real time sampling enabled.
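The configurable transition point between the eager and rendezvous protocols mentioned in the list above can be pictured with a small sketch. This is a generic model of the two MPI protocols, not the CODES implementation: the function name, the `eager_limit_bytes` parameter, its default value, and the message labels are hypothetical.

```python
def messages_for_send(size_bytes, eager_limit_bytes=8192):
    """Return the network messages a single MPI send would generate.

    eager_limit_bytes is a hypothetical stand-in for the configurable
    transition point; it is not the name of a CODES parameter.
    """
    if size_bytes <= eager_limit_bytes:
        # Eager: the payload is sent immediately, with no handshake.
        return [("DATA", size_bytes)]
    # Rendezvous: the sender announces the message (request to send),
    # waits for the receiver to signal readiness (clear to send),
    # and only then transfers the payload.
    return [("RTS", 0), ("CTS", 0), ("DATA", size_bytes)]


print(messages_for_send(1024))      # small message, eager path
print(messages_for_send(1 << 20))   # large message, rendezvous path
```

Small messages avoid the handshake round trip, while large messages avoid buffering unexpected payloads at the receiver, so the choice of transition point affects the simulated latency.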
The release can be downloaded from http://ftp.mcs.anl.gov/pub/CODES/releases/codes-0.6.0.tar.gz
We had a number of CODES related activities at Supercomputing this year! The research work presented used the CODES/ROSS framework to evaluate HPC network and storage systems, study application interference, enable efficient collective communication, and perform visual analysis of simulations. Details can be found at CODES@SC16.
Thanks everyone for attending this year’s Summer of CODES workshop! We had great attendance and excellent talks. Slides will be posted at the workshop website, http://press3.mcs.anl.gov/summerofcodes2016/, in the coming days as I receive them.
I am happy to announce the release of CODES 0.5.2! This release, prepared during this year’s Summer of CODES workshop (post incoming), is primarily a bug-fix release that includes a few minor API additions and updates to the latest ROSS API. Full release notes can be found at doc/RELEASE_NOTES in the release tarball.
Downloads can be found at http://www.mcs.anl.gov/projects/codes/downloads/.
We are happy to announce the release of CODES 0.5.1! This release is a hotfix for latency calculation on the dragonfly, along with a few minor items missed in the past release. Downloads can be found at http://www.mcs.anl.gov/projects/codes/downloads/.
We are happy to announce the release of CODES 0.5.0! It’s been a full year since the last content release and there have been a massive set of changes! See doc/RELEASE_NOTES for a full list – this post covers a few of the major ones. Downloads can be found at http://press3.mcs.anl.gov/codes/downloads/
- codes-base and codes-net have been merged, greatly simplifying the build process. The new repository is at https://xgitlab.cels.anl.gov/codes/codes.
- we’ve added a Slim Fly network topology simulator, corresponding to the Wolfe et al. paper “Modeling a Million-node Slim Fly Network using Parallel Discrete-event Simulation” at SIGSIM-PADS’16.
- the dragonfly and torus networks have seen many improvements, including updates to the credit-based flow control mechanism, in-depth data gathering on both terminals and routers, and the ability to periodically sample the terminal/router states for gathering time-series data.
- workload processing (and the MPI simulation layer) have been greatly enhanced, improving task to LP mapping flexibility and allowing concurrently running workloads.
- a number of new APIs, including a “mapping context” construct for building more flexible implicit LP->LP mappings (now used in model-net, the local storage LP, and the resource LP) and a mechanism for RPC/callback-oriented event control flow (see the resource LP API and implementation for how this works).
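For readers unfamiliar with the credit-based flow control mentioned in the list above, the following sketch shows the general mechanism on a single link. It is a simplified illustration, not the dragonfly or torus model code: the `CreditLink` class and its method names are hypothetical.

```python
from collections import deque


class CreditLink:
    """Toy credit-based flow control on one link (illustrative only)."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # free slots in the downstream buffer
        self.downstream = deque()     # packets held in the downstream buffer
        self.waiting = deque()        # packets stalled at the sender

    def send(self, packet):
        """Send if a credit is available, otherwise stall the packet."""
        if self.credits > 0:
            self.credits -= 1
            self.downstream.append(packet)
            return True
        self.waiting.append(packet)
        return False

    def drain_one(self):
        """Downstream forwards one packet, which returns a credit upstream."""
        if not self.downstream:
            return None
        packet = self.downstream.popleft()
        self.credits += 1
        if self.waiting:
            # The returned credit lets the oldest stalled packet proceed.
            self.send(self.waiting.popleft())
        return packet


link = CreditLink(buffer_slots=2)
sent = [link.send(p) for p in ("a", "b", "c")]   # "c" stalls: no credits left
forwarded = link.drain_one()                     # frees a slot, "c" is sent
```

Because a sender only transmits when it holds a credit, the downstream buffer can never overflow, and congestion shows up as stalled packets at the sender instead of dropped packets.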
Feedback is welcome!