Petrel Data Service Pilot
The Petrel Data Service pilot provides a mechanism for Argonne researchers and ALCF users to store their data and easily share it with collaborators, without the burden of local account management. The system has been developed, and is being operated, through a collaboration between the Argonne Leadership Computing Facility (ALCF) and the Globus team. Petrel uses storage and infrastructure provided by the ALCF, together with the Globus Transfer and Sharing services, so that researchers can store large research datasets, move data in and out of the system, and make the data (or subsets of it) available to their collaborators. Note that no compute resources are associated with Petrel: data must be staged to a compute machine for further analysis. This document outlines the steps to create and manage an allocation on the Petrel Data Service.
Setting up a new project
Allocation Process and Model
Any Argonne PI can apply for an allocation on Petrel by providing some basic information about the project. Once the request is approved, a project space is created for the PI, and the PI is invited to join a project group used to manage the allocation. The PI can use the project group to assign project manager roles to other users. A project manager has access to the project space and can give other users read or read/write access to it.
In addition, the PI and project managers can choose subsets of the project space and grant other users access to just those subsets.
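One way to picture this model is as a set of path-scoped grants: the whole project space for managers, and individual folders for collaborators. The sketch below is a hypothetical in-memory model for illustration only, not how Petrel or Globus implements access control. It assumes that a grant on a folder also covers everything beneath it and that "rw" includes "r"; the group and user names are made up.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Grant:
    """Illustrative grant: a principal gets "r" or "rw" on a folder and everything below it."""
    principal: str    # hypothetical user or group name
    path: str         # absolute folder path inside the project space
    permissions: str  # "r" (read) or "rw" (read/write)


def covers(folder: str, target: str) -> bool:
    """True if `target` is `folder` itself or lies somewhere beneath it."""
    f, t = PurePosixPath(folder), PurePosixPath(target)
    return t == f or f in t.parents


def effective_permissions(grants, principals, target: str) -> str:
    """Combine all grants held by any of `principals` that cover `target`.

    Returns "rw", "r", or "" (no access).
    """
    best = ""
    for g in grants:
        if g.principal in principals and covers(g.path, target):
            if g.permissions == "rw":
                return "rw"  # read/write is the strongest level; stop early
            best = "r"
    return best


grants = [
    Grant("project-managers", "/", "rw"),  # managers: the whole project space
    Grant("alice", "/shared/run1/", "r"),  # collaborator: one subset, read-only
]
print(effective_permissions(grants, {"alice"}, "/shared/run1/data.h5"))       # r
print(effective_permissions(grants, {"alice"}, "/shared/run10/x") or "none")  # none
```

Comparing paths component by component (rather than as raw string prefixes) keeps a grant on /shared/run1/ from leaking into a sibling folder such as /shared/run10/.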
New Allocation Request
To request a new allocation, a PI should follow these steps:
- Click here to request an allocation. If you don’t have a Globus account, you will need to sign up for one.
- Choose the option to “Join the Group” and complete the form.
- The “Current Project Name” you provide will be used to name the various parts of your project allocation (for example, the project group and endpoint).
- Once you submit your request, you will receive a confirmation email that your request has been received.
- Your request will be evaluated, and you will be notified whether it has been accepted or rejected. If it is accepted, you will be added to a group and can proceed with the next steps.
New Project Setup
Once your allocation has been approved, you will receive an invitation to another group, which you will have rights to manage. Complete the following steps to set up your project:
- You will receive a project group invitation and will need to accept it.
- You will then be able to log in to Globus and check your project endpoint. The endpoint is named petrel#<projectname>. For example, if the user ‘speedpage’ names the project ‘testbed’, the default shared endpoint for the project is petrel#testbed.
- Note: if your project name contains whitespace, the whitespace is removed and the name is converted to lowercase.
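The naming rule can be sketched as a small function. This is an illustration of the rule as stated (remove whitespace, lowercase, prefix with petrel#); it assumes no other characters are altered, which the rule does not specify.

```python
def project_endpoint_name(project_name: str) -> str:
    """Derive the endpoint name petrel#<projectname> from a project name."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes all spaces, tabs, and newlines.
    compact = "".join(project_name.split())
    return "petrel#" + compact.lower()


print(project_endpoint_name("testbed"))      # petrel#testbed
print(project_endpoint_name("My Test Bed"))  # petrel#mytestbed
```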
Adding new Managers
To add new managers, invite the user to the <projectname> group by email address, using the “Members” tab on your group’s page. Once the user accepts the invitation and is added to the group, you can grant the “Manager” role by clicking the pen icon on the right side of the user’s row and selecting the “Manager” radio button. A user with the “Manager” role can invite other users to the project group, accept membership requests, and share data from the project space with other users and groups.
Sharing data from Petrel
Sharing all of your project space with other users
To share all of your project space with other users, use the following steps:
- Open https://app.globus.org/groups in your browser and find your project group.
- In the “Members” tab, click “Invite people to this group” and invite the Globus users with whom you would like to share your entire project space.
Sharing some subset of your project space with other users or groups
To share selected folders with other users or groups, use the following steps:
- Open https://app.globus.org/transfer and select the petrel#<projectname> endpoint.
- Click “permissions” in the menu in the right corner of the pane.
- Enter or browse to a path, and enter the user or group you want to share that path with.
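On a Globus shared endpoint, a permission like this is recorded as an access rule. The sketch below builds such a rule as a dictionary, assuming the field names of the Globus Transfer API access-rule document (DATA_TYPE, principal_type, principal, path, permissions). The group UUID is a placeholder, and submitting the rule (for example with the Globus Python SDK's TransferClient.add_endpoint_acl_rule) requires authentication and is not shown.

```python
import uuid

PERMISSIONS = {"r", "rw"}
PRINCIPAL_TYPES = {"identity", "group", "all_authenticated_users", "anonymous"}


def access_rule(path: str, principal_type: str, principal: str = "",
                permissions: str = "r") -> dict:
    """Build a Globus Transfer API access-rule document for a shared endpoint."""
    if permissions not in PERMISSIONS:
        raise ValueError(f"permissions must be 'r' or 'rw', got {permissions!r}")
    if principal_type not in PRINCIPAL_TYPES:
        raise ValueError(f"unknown principal_type {principal_type!r}")
    if principal_type in ("identity", "group"):
        # Identities and groups are referenced by UUID; raises ValueError otherwise.
        uuid.UUID(principal)
    else:
        principal = ""  # the other principal types do not name a specific principal
    # Rules apply to folders: normalize to a leading and a trailing slash.
    if not path.startswith("/"):
        path = "/" + path
    if not path.endswith("/"):
        path += "/"
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


GROUP_ID = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real group
print(access_rule("shared/run1", "group", GROUP_ID, "r"))
```

Validating the permission string and principal locally catches typos before anything is sent to the service.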
Transferring Data to and from your project space
Using the Globus website:
To transfer data to or from the project space, go to https://app.globus.org/transfer and select the petrel#<projectname> endpoint from the drop-down menu on one side of the window. From the drop-down menu on the other side, choose the endpoint you want to transfer data to or from, then submit the transfer by pressing one of the two large arrow buttons in the center of the screen.
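For scripted use, a transfer is described by a transfer document. The sketch below assumes the field names of the Globus Transfer API transfer document and its integer sync levels; it only builds the dictionary. The submission ID and endpoint IDs are placeholders (the API identifies endpoints by ID rather than by a display name such as petrel#<projectname>), and treating a trailing slash as "copy a folder" is a convention chosen for this sketch.

```python
from typing import Optional

# Globus Transfer sync levels: how an existing destination file is compared to
# decide whether it must be re-copied (existence, size, modification time, checksum).
SYNC_LEVELS = {"exists": 0, "size": 1, "mtime": 2, "checksum": 3}


def transfer_document(submission_id: str, source_endpoint: str,
                      destination_endpoint: str, source_path: str,
                      destination_path: str, label: str = "",
                      sync: Optional[str] = None) -> dict:
    """Build a Globus Transfer API transfer document for a single file or folder."""
    item = {
        "DATA_TYPE": "transfer_item",
        "source_path": source_path,
        "destination_path": destination_path,
        # Convention used here (an assumption): a trailing slash means "copy a folder".
        "recursive": source_path.endswith("/"),
    }
    doc = {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [item],
    }
    if label:
        doc["label"] = label
    if sync is not None:
        doc["sync_level"] = SYNC_LEVELS[sync]  # KeyError for an unknown level name
    return doc


doc = transfer_document("SUBMISSION-ID", "PETREL-ENDPOINT-ID", "OTHER-ENDPOINT-ID",
                        "/shared/run1/", "/scratch/run1/", label="run1 to scratch",
                        sync="checksum")
print(doc["sync_level"], doc["DATA"][0]["recursive"])  # 3 True
```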
Using a command line script
The mirror script lets the user transfer or synchronize data between two endpoints. It can be used in two modes:
- batch mode: runs in the background, so it can be set up as a cron job to keep two directories in sync.
- interactive mode: prompts the user for the transfer details and submits the transfer immediately.
- Download the mirror script tarball using either wget or curl:
wget https://s3-us-west-2.amazonaws.com/mirror.script/mirror.tgz
curl -Ov https://s3-us-west-2.amazonaws.com/mirror.script/mirror.tgz
- After downloading the mirror script tarball, you will need to unpack it:
tar xf mirror.tgz
- After that, change directory into the mirror script package directory:
- Make the script executable:
chmod +x mirror.py
This section discusses using the script to keep two directories on two endpoints in sync. A typical use case is data collected at one endpoint that needs to be transferred regularly to another. The script can be set up as a cron job that runs at a frequency chosen according to how much delay you can tolerate, and the directories will be kept in sync.
To prepare the script to be run as a cron/batch job, you’ll first need to run it in batch_setup mode. This prompts you for the details of the transfer job(s) you wish to submit and saves the configuration so that the script can use it when run in batch_start mode:
./mirror.py --batch_setup
Follow the instructions, and a config will be created for your transfer job(s). Example:
[ranantha@rachanalaptop:~/MyFolders/work/petrel/mirror] ./mirror.py --batch_setup
Enter Globus Credentials:
Enter Username: ranantha
Enter Password:
Enter source endpoint name: go#ep1
Enter source endpoint path: foo
Enter destination endpoint name: go#ep2
Enter destination endpoint path:
Enter transfer deadline in hours (enter ‘0’ for none): 0
Add another transfer (y/n): n
Token saved to: /Users/ranantha/.globusonline/mirror/mirror_token_file
Config saved to: /Users/ranantha/.globusonline/mirror/mirror.cfg
[ranantha@rachanalaptop:~/MyFolders/work/petrel/mirror]
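Each answered set of prompts describes one job, and a deadline of 0 means "no deadline". The sketch below is a hypothetical in-memory model of those values; it is not the format of mirror.cfg (which is not documented here), and the second job is made up to show what answering "y" to "Add another transfer" would add.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MirrorJob:
    """Hypothetical record of one job entered at the batch_setup prompts."""
    source_endpoint: str
    source_path: str
    destination_endpoint: str
    destination_path: str
    deadline_hours: int = 0  # as at the prompt: 0 means "no deadline"

    def deadline(self, submitted_at: datetime) -> Optional[datetime]:
        """Absolute deadline for a submission at `submitted_at`, or None for no deadline."""
        if self.deadline_hours < 0:
            raise ValueError("deadline_hours must be 0 or positive")
        if self.deadline_hours == 0:
            return None
        return submitted_at + timedelta(hours=self.deadline_hours)


jobs = [
    MirrorJob("go#ep1", "foo", "go#ep2", "", deadline_hours=0),      # from the example session
    MirrorJob("go#ep1", "bar", "go#ep2", "bar", deadline_hours=6),   # hypothetical second job
]
now = datetime(2024, 1, 1, 12, 0)
for job in jobs:
    print(job.source_path, job.deadline(now))
# foo None
# bar 2024-01-01 18:00:00
```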
You can then submit the transfer job(s) you’ve configured by running the script in batch_start mode:
./mirror.py --batch_start
This command can be run as a cron job (or, on Windows, as a scheduled task) at an interval of your choice to keep the directories in sync.
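For cron, the job is a single crontab line. The sketch below builds one that runs ./mirror.py --batch_start from the script's directory every N minutes; the directory path is hypothetical. It assumes the cron job runs as the same user who ran batch_setup, since the token and config are saved under that user's ~/.globusonline/mirror directory.

```python
import shlex


def crontab_entry(interval_minutes: int, script_dir: str) -> str:
    """Build a crontab line that runs the configured batch jobs every N minutes."""
    if not 1 <= interval_minutes <= 59:
        raise ValueError("interval must be 1-59 minutes for a '*/N' minute field")
    # '*/N' in the minute field fires at minutes 0, N, 2N, ... of every hour,
    # so intervals that do not divide 60 evenly are spaced unevenly across the hour.
    cmd = f"cd {shlex.quote(script_dir)} && ./mirror.py --batch_start"
    return f"*/{interval_minutes} * * * * {cmd}"


print(crontab_entry(30, "/home/alice/mirror"))
# */30 * * * * cd /home/alice/mirror && ./mirror.py --batch_start
```

The resulting line can be added with crontab -e. Quoting the directory with shlex.quote keeps paths containing spaces intact when cron hands the command to the shell.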
You can monitor the progress of your job by logging in to the Globus website and going to https://app.globus.org/activity.
The script also has an interactive mode that you can use to test your job configuration. Interactive mode prompts you for the same values as batch_setup mode but, unlike batch_setup mode, actually submits the job and does *not* save any of the config data you enter to the config files.
Transfers, transfer errors, and other data are logged to the ~/.globusonline/mirror/mirror.log file.
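To check recent activity without opening the whole log, a small helper can print its last lines. This is a generic sketch; it assumes only the log location given above and makes no assumption about the log's line format.

```python
from collections import deque
from pathlib import Path
from typing import List

LOG_FILE = Path.home() / ".globusonline" / "mirror" / "mirror.log"


def tail(path: Path, n: int = 20) -> List[str]:
    """Return the last n lines of a text file without loading it all into memory."""
    with path.open("r", errors="replace") as fh:
        # A deque with maxlen keeps only the most recent n lines while iterating.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


if __name__ == "__main__":
    if LOG_FILE.exists():
        print("\n".join(tail(LOG_FILE)))
    else:
        print(f"No log found at {LOG_FILE}")
```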