@@ -88,6 +88,37 @@ In order to access the container, you will need to add a personal access token f
...
### Pre-parse output for Python
### Parallel Transfer Using s5cmd
When a large amount of data needs to be offloaded from GPFS to LTS, Globus is not sufficiently performant, and the `s5cmd` parallel transfer tool should be used instead. Scripts for this purpose are located in `transfer-gpfs-with-s5cmd`. The shell script reads a formatted GPFS parquet dataset and finds the files located in a given directory. Those files are divided into groups, and a throttled array job is submitted in which each task transfers one group.
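The grouping step can be illustrated with a minimal Python sketch. This is not the code used by the scripts in `transfer-gpfs-with-s5cmd`; the function name `split_into_batches` and the contiguous, near-equal split are assumptions made only to show how a file list could be divided so that each array task handles one group.

```python
from typing import List, Sequence


def split_into_batches(paths: Sequence[str], n_batches: int) -> List[List[str]]:
    """Divide paths into at most n_batches contiguous groups of near-equal size.

    Hypothetical helper for illustration; the real scripts may group differently
    (for example by total bytes rather than by file count).
    """
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    if not paths:
        return []

    # Never create more batches than there are files.
    n_batches = min(n_batches, len(paths))
    base, extra = divmod(len(paths), n_batches)

    batches: List[List[str]] = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches each take one additional file.
        size = base + (1 if i < extra else 0)
        batches.append(list(paths[start:start + size]))
        start += size
    return batches


if __name__ == "__main__":
    # Example file names are made up for the demonstration.
    files = [f"/data/project/file_{i}.dat" for i in range(5)]
    for task_id, batch in enumerate(split_into_batches(files, 2)):
        print(task_id, batch)
```

In a layout like this, each array task would look up the group matching its task index and pass only those files to the transfer tool.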
The transfer script runs in the `gpfs-policy` container, so no environment setup is needed. An AWS CLI credentials file is required. Its default location is `${HOME}/.aws/credentials`, and it has the following form:
```
# Default profile #
[default]
aws_access_key_id = <lts_access_key>
aws_secret_access_key = <lts_secret_key>
```
More than one profile can be added to the same credentials file as long as each profile name in `[]` is unique. The `default` profile is used unless another is specified.
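To check that a credentials file is well formed and that a given profile exists before submitting a transfer, something like the following sketch could be used. It relies only on Python's standard `configparser` module; the helper name `load_profile` is hypothetical and is not part of the transfer scripts or of the AWS CLI.

```python
import configparser
from pathlib import Path
from typing import Dict, Union


def load_profile(credentials_path: Union[str, Path], profile: str = "default") -> Dict[str, str]:
    """Return the key pair stored under one profile of an AWS-style credentials file.

    Hypothetical helper for illustration only.
    """
    # Disable interpolation so that a '%' in a secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(credentials_path, encoding="utf-8") as handle:
        parser.read_file(handle)

    # Section names are case-sensitive, so "[default]" is an ordinary section.
    if not parser.has_section(profile):
        raise KeyError(f"profile '{profile}' not found in {credentials_path}")

    section = parser[profile]
    return {
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }


if __name__ == "__main__":
    creds = load_profile(Path.home() / ".aws" / "credentials")
    # Print only the key ID; the secret should never be echoed.
    print(creds["aws_access_key_id"])
```

Because `configparser` rejects duplicate section names by default, a file that repeats a profile name raises an error on load, which matches the uniqueness requirement described above.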