Draft: Partition parquet dataset for sync with s5cmd

Closed Matthew K Defenderfer requested to merge partition-parquet-dataset into main
@@ -61,6 +61,29 @@ The ouput file is an unsorted list of files in uncompressed ASCII. Further proc
## Processing the output file
### GitLab Container Registry Authentication
Some automated tools in this repository run their processing inside a container to simplify environment management. The container is built from the spec in `./Dockerfile` and is available at `gitlab.rc.uab.edu:4567/<group>/gpfs-policy`. The shell scripts automatically download the container if it is not found in the working directory.
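The download-if-missing behavior could look roughly like the sketch below. It is not taken from the repository's scripts: the function name `ensure_container` and the `PULL_CMD` override are assumptions made for illustration, and the registry URI passed in still needs the real `<group>` filled in.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the repository's actual script.
# Pulls the container image only when the .sif file is absent from the
# current working directory.

# Command used to pull the image. Overridable (an assumption for
# illustration) so the logic can be exercised without Singularity installed.
PULL_CMD="${PULL_CMD:-singularity pull}"

ensure_container() {
    local sif="$1"   # local image file name, e.g. gpfs.sif
    local uri="$2"   # registry URI, e.g. docker://gitlab.rc.uab.edu:4567/<group>/gpfs-policy:latest

    if [ -f "$sif" ]; then
        echo "found $sif, skipping pull"
    else
        echo "$sif not found, pulling $uri"
        # PULL_CMD is intentionally unquoted so "singularity pull" splits
        # into a command and its subcommand.
        $PULL_CMD "$sif" "$uri"
    fi
}
```

A script would call `ensure_container gpfs.sif <uri>` once before any step that runs inside the container.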
Because the repository is not public, accessing the container requires a personal access token from your GitLab account that is registered with Singularity. Do the following:
1. Go to `gitlab.rc.uab.edu` and sign in with your BlazerID and password
2. Click your user icon in the top left and select `Preferences`
3. Select `Access Tokens` in the left menu
4. Click `Add New Token`
    1. Give the token a name (e.g. `Cheaha Singularity`)
    2. Remove the expiration date; it will then default to one year from registration
    3. Add the `read_registry` scope
    4. Select `Create personal access token`
    5. In the green box, click the button to copy the token
6. On Cheaha, create a text file in a secure spot and paste the token there.
7. Run `cat <token_file> | singularity registry login -u $USER --password-stdin docker://gitlab.rc.uab.edu:4567`
    1. If it succeeds, you will see a message similar to `INFO: Token stored in $HOME/.singularity/docker-config.json`
8. Delete the text file with the access token
9. Test access by pulling the container. For example:
    - `singularity pull gpfs.sif docker://gitlab.rc.uab.edu:4567/<owner>/gpfs-policy:latest`
    - Replace `<owner>` with the owner of the repo you're pulling from. Containers built in a fork of the repository are stored in that fork's registry
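
Steps 7 through 9 can be combined into a small script. The sketch below is only one possible arrangement: the helper names `image_uri` and `login_with_token_file` are hypothetical, and it assumes the token has already been pasted into a file readable only by you.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching steps 7-9; names are illustrative only.

REGISTRY="gitlab.rc.uab.edu:4567"

# Build the docker:// URI for an owner's gpfs-policy image.
# The tag defaults to "latest".
image_uri() {
    local owner="$1"
    local tag="${2:-latest}"
    echo "docker://${REGISTRY}/${owner}/gpfs-policy:${tag}"
}

# Register the token stored in a file with Singularity (step 7), then
# delete the file only if the login succeeded (step 8).
login_with_token_file() {
    local token_file="$1"
    local user="${USER:-$(id -un)}"   # fall back to id if USER is unset

    singularity registry login -u "$user" --password-stdin \
        "docker://${REGISTRY}" < "$token_file" \
        && rm -f "$token_file"
}
```

With these defined, `login_with_token_file <token_file>` followed by `singularity pull gpfs.sif "$(image_uri <owner>)"` performs the login and the test pull. Deleting the file only after a successful login means a failed attempt can be retried without creating a new token.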
### Split and compress
### Pre-parse output for Python