HPC Edge Services

HEP/HPC Edge Services

The CCE is actively developing a common Edge Service interface through which HEP experiments and users can connect to DOE HPC resources and run jobs on them. The current code stack can be found in our GitHub repo. This work was presented at CHEP2015 (presentation) and ICHEP2016 (presentation).

The Edge Service relies on two component services, Argo and Balsam. Both are written as lightweight Python apps built on top of the Django framework. Balsam runs on the resource to which jobs will be submitted, such as a supercomputer or an HTCondor cluster. Its interaction with the scheduler is abstracted as a plugin, with one plugin per scheduler, e.g. SLURM, HTCondor, Cobalt. Argo can run anywhere, but is typically placed outside the resources being used. Users submit a single job or a sequence of jobs to Argo via a message queue system. Argo then submits the jobs to the destination resource, using message queues to communicate with the local instance of Balsam. If specified, Argo pulls input data in from the user, and Balsam transfers data between the local resource and Argo’s storage.
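The scheduler plugin idea described above can be illustrated with a minimal Python sketch. This is not taken from the Balsam code base: the class names, the `build_submit_command` method, and the `get_plugin` helper are all hypothetical, and the only real-world details assumed are the usual submission commands of each scheduler (`sbatch`, `condor_submit`, `qsub`). The sketch only builds the command; it does not run it.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class SchedulerPlugin(ABC):
    """Hypothetical common interface that hides scheduler differences."""

    @abstractmethod
    def build_submit_command(self, script_path: str) -> List[str]:
        """Return the argument list that would submit the given job script."""


class SlurmPlugin(SchedulerPlugin):
    def build_submit_command(self, script_path: str) -> List[str]:
        return ["sbatch", script_path]


class HTCondorPlugin(SchedulerPlugin):
    def build_submit_command(self, script_path: str) -> List[str]:
        return ["condor_submit", script_path]


class CobaltPlugin(SchedulerPlugin):
    def build_submit_command(self, script_path: str) -> List[str]:
        return ["qsub", script_path]


# Hypothetical registry: the local service selects one plugin by name.
PLUGINS: Dict[str, Type[SchedulerPlugin]] = {
    "slurm": SlurmPlugin,
    "htcondor": HTCondorPlugin,
    "cobalt": CobaltPlugin,
}


def get_plugin(name: str) -> SchedulerPlugin:
    """Instantiate the plugin registered under the given scheduler name."""
    try:
        return PLUGINS[name.lower()]()
    except KeyError:
        raise ValueError(f"no scheduler plugin registered for {name!r}") from None


if __name__ == "__main__":
    for scheduler in ("slurm", "htcondor", "cobalt"):
        print(scheduler, get_plugin(scheduler).build_submit_command("job.sh"))
```

With this kind of design, the rest of the service calls only the common interface, so supporting another scheduler would mean adding one more plugin class rather than changing the job-handling logic.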

A diagram of the HPC Edge Service.

Security was considered in several places. First, nothing the user provides via job submission or data transfer is ever executed on the command line. This avoids command injection and prevents rogue code from being brought in. All applications must be preinstalled on the local resource and then registered in a database with Balsam. If the application specified by the user is not in this database, the job fails. Second, all message queues are secured using key/certificate pairs so that no rogue job submission is possible. Only those allowed to access the message queues can submit jobs.
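The application registration check can be sketched in Python as follows. This is a hedged illustration rather than Balsam's actual implementation: the in-memory `REGISTERED_APPS` dictionary stands in for the database, and the names `JobRequest`, `UnregisteredApplicationError`, `build_argv`, and the example application path are all hypothetical. The sketch also assumes that user-supplied arguments are passed as separate list entries and never interpolated into a shell string.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class UnregisteredApplicationError(Exception):
    """Raised when a job names an application that was never registered."""


# Hypothetical stand-in for the application database: name -> executable path.
REGISTERED_APPS: Dict[str, str] = {
    "example_generator": "/opt/apps/example_generator/bin/run",
}


@dataclass
class JobRequest:
    app_name: str
    arguments: List[str] = field(default_factory=list)


def build_argv(request: JobRequest, registry: Dict[str, str] = REGISTERED_APPS) -> List[str]:
    """Map a job request to an argument list, or fail if the app is unknown.

    The executable path always comes from the registry, never from the user.
    """
    try:
        executable = registry[request.app_name]
    except KeyError:
        raise UnregisteredApplicationError(
            f"application {request.app_name!r} is not registered"
        ) from None
    # Arguments stay separate list entries, so no shell ever parses them.
    return [executable, *request.arguments]


if __name__ == "__main__":
    print(build_argv(JobRequest("example_generator", ["--events", "1000"])))
    try:
        build_argv(JobRequest("rm -rf /"))
    except UnregisteredApplicationError as err:
        print("job failed:", err)
```

Because the user supplies only a name that is looked up in the registry, a request naming anything else fails before any process is started.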
