The Small Linux for Big Computers


The ZeptoOS project consists of several components and subprojects. The major areas of activity are summarized below.

ZeptoOS Compute Node Linux

The Linux kernel is a popular choice for many uses, including high-performance computing (HPC). However, the kernel was designed for general-purpose multitasking environments and has some performance issues under HPC workloads. The ZeptoOS team addresses these issues to make the Linux kernel more usable for HPC. One major issue we have addressed is the memory access overhead of virtual memory: at least on some architectures, virtual memory can dramatically slow down memory access. ZeptoOS Compute Node Linux introduces a special memory region called big memory, which eliminates this overhead. We continue to study and improve the Linux kernel for HPC and release our improvements in ZeptoOS Compute Node Linux.
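To give a feel for why virtual memory can become expensive, the sketch below counts how many address-translation entries are needed to cover a memory region at different page sizes. It assumes the overhead comes from address translation, which the text above does not state explicitly. The region size and page sizes are illustrative values, not ZeptoOS or big memory parameters, and `entries_needed` is a hypothetical helper.

```python
# Illustrative only: number of translation entries (pages) needed to
# cover a memory region. All sizes below are hypothetical examples.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def entries_needed(region_bytes: int, page_bytes: int) -> int:
    """Return the number of pages required to cover the region."""
    return -(-region_bytes // page_bytes)  # ceiling division


region = 1 * GIB  # hypothetical application working set

for label, page in [("4 KiB", 4 * KIB), ("1 MiB", MIB), ("256 MiB", 256 * MIB)]:
    print(f"{label:>8} pages: {entries_needed(region, page):>7} entries")
```

The fewer entries a region needs, the more likely all of them fit in the hardware translation cache at once, so fewer memory accesses pay a translation penalty.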

BGP Communication Software Stack

HPC applications require communication libraries such as MPI. We have ported the BGP communication software stack from IBM's Compute Node Kernel (CNK) environment. Because of major differences between the two operating system kernels, the port was not easy. We maintain Linux kernel drivers for BGP-specific hardware, along with user-space code (part of the BGP SPI and DCMF), to support ZeptoOS Compute Node Linux.


ZOID

Petascale architectures decompose functions across multiple kinds of nodes. Compute nodes cannot do everything on their own; they need to delegate some system calls and file or I/O operations to specialized I/O nodes and management nodes. ZOID is open source function call forwarding software that can be optimized for collective behavior and adjustable consistency semantics.

On Blue Gene, ZOID acts as a functional replacement for IBM’s CIOD when using the ZeptoOS Compute Node Linux, in some cases offering significant performance improvements thanks to its high-performance, multithreaded architecture. ZOID is also easily extensible, making it possible to forward custom function calls between compute nodes and I/O nodes. This capability can be used for, e.g., real-time data streaming.
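The general idea of function call forwarding can be sketched in a few lines. This is a minimal illustration, not ZOID's actual API or wire format: the class names `IONodeServer` and `ComputeNodeClient`, the JSON encoding, and the `stream_sample` function are all hypothetical, and the "transport" is an ordinary in-process function call standing in for the network between a compute node and an I/O node.

```python
import json


class IONodeServer:
    """Hypothetical I/O-node side: executes calls forwarded to it."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, func):
        # Extensibility point: add a custom forwarded function by name.
        self._handlers[name] = func

    def handle(self, request_bytes: bytes) -> bytes:
        request = json.loads(request_bytes)
        handler = self._handlers.get(request["name"])
        if handler is None:
            reply = {"ok": False, "error": f"unknown function {request['name']!r}"}
        else:
            reply = {"ok": True, "result": handler(*request["args"])}
        return json.dumps(reply).encode()


class ComputeNodeClient:
    """Hypothetical compute-node side: packs a call and ships it out."""

    def __init__(self, transport):
        self._transport = transport  # callable taking bytes, returning bytes

    def call(self, name, *args):
        request = json.dumps({"name": name, "args": list(args)}).encode()
        reply = json.loads(self._transport(request))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]


# A custom forwarded function, e.g. for streaming samples off a compute node.
received = []


def stream_sample(step, value):
    received.append((step, value))
    return len(received)  # number of samples the I/O node has so far


server = IONodeServer()
server.register("stream_sample", stream_sample)
client = ComputeNodeClient(server.handle)

print(client.call("stream_sample", 0, 1.5))
```

The compute node only names the function and supplies arguments; the work itself runs where the handler is registered. A real forwarder would replace `server.handle` with a network transport and would typically serve many compute nodes concurrently.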

The Selfish Benchmark Suite

Massively parallel computers use compute node operating systems that are either special-purpose lightweight kernels or mostly commodity kernels that have been downsized to reduce extraneous activity. On operating systems that support multiprocessing or interrupts, the user's application may share CPU cycles with other processes, kernel tasks, and device drivers. From the user's perspective, any CPU cycles diverted from their application reduce the maximum achievable performance and processor efficiency. We call these diversions detours. In some cases, they can dramatically affect the performance of collective operations.
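One way to see why collective operations are so sensitive to detours: a synchronized step finishes only when the slowest node finishes, so a detour on any single node delays everyone. The sketch below assumes each node is hit independently with the same small probability during a step. That independence assumption and the sample values are illustrative and are not taken from ZeptoOS measurements; `prob_step_delayed` is a hypothetical helper.

```python
def prob_step_delayed(p_detour: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes suffers a detour
    during a synchronized step, assuming independent detours."""
    return 1.0 - (1.0 - p_detour) ** nodes


# Hypothetical per-node, per-step detour probability.
p = 0.0001

for nodes in (1, 100, 10_000, 100_000):
    print(f"{nodes:>7} nodes: {prob_step_delayed(p, nodes):.4f}")
```

A detour that is rare on one node becomes close to certain somewhere in the machine as the node count grows.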

The Selfish Benchmark Suite is designed to measure detours: the fraction of time the CPU spends executing instructions that are not part of the user's application. The measurements can be recorded and played back, inserting detours into an application to explore their effect on system performance.
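A common way to detect detours is to spin in a tight loop reading a clock and to record every unusually long gap between consecutive readings. The sketch below illustrates that idea and a simple replay; it is not the suite's actual implementation. The function names and the threshold are hypothetical, and in Python the interpreter itself adds noise, so a real tool would use a compiled language and a hardware cycle counter.

```python
import time


def record_detours(duration_ns, threshold_ns):
    """Spin on the clock for about duration_ns. Every gap between two
    consecutive readings longer than threshold_ns is recorded as
    (offset_ns_from_start, gap_ns). Returns (detours, elapsed_ns)."""
    detours = []
    start = prev = time.perf_counter_ns()
    end = start + duration_ns
    while prev < end:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap > threshold_ns:
            detours.append((prev - start, gap))
        prev = now
    return detours, prev - start


def detour_fraction(detours, elapsed_ns):
    """Fraction of the measured interval spent in recorded detours."""
    return sum(gap for _, gap in detours) / elapsed_ns


def busy_wait(ns):
    """Burn CPU time for at least ns nanoseconds."""
    end = time.perf_counter_ns() + ns
    while time.perf_counter_ns() < end:
        pass


def run_with_detours(work, iterations, detours):
    """Call work() `iterations` times, injecting each recorded detour as a
    busy-wait once its offset has been reached. Returns total time in ns."""
    pending = sorted(detours)
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(iterations):
        work()
        while idx < len(pending) and pending[idx][0] <= time.perf_counter_ns() - start:
            busy_wait(pending[idx][1])
            idx += 1
    return time.perf_counter_ns() - start


# Record for roughly 20 ms, treating gaps above 100 microseconds as detours.
detours, elapsed = record_detours(20_000_000, 100_000)
print(f"{len(detours)} detours, fraction {detour_fraction(detours, elapsed):.6f}")
```

The recorded list can be passed to `run_with_detours` together with an application kernel to see how the same interruptions would stretch its running time.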


KTAU

Kernels need Tuning and Analysis Utilities just like user codes. The TAU toolkit from the University of Oregon provides tracing, profiling, and other tools for parallel HPC programs, and the KTAU extension takes those tools all the way down to the Linux kernel, linking user-level performance data with kernel performance data. Users can now peer through their application down to the operating system and see what is happening.