From 5cc47f5f834584659d774454fcf3fe4a826153a0 Mon Sep 17 00:00:00 2001
From: Matthew K Defenderfer <mdefende@uab.edu>
Date: Mon, 16 Sep 2024 17:56:57 -0500
Subject: [PATCH] add info on tld

---
 README.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 4e6a6c1..59783c7 100644
--- a/README.md
+++ b/README.md
@@ -24,10 +24,10 @@ Note: The command is aligned to run on specific nodes by way of arguments to mma
 A list policy can be executed using `run-submit-pol-job.py` using the following command:
 
 ``` bash
-run-submit-pol-job.py [-h] [-o OUTDIR] [-f LOG_PREFIX] [--with-dirs]
-                      [-N NODES] [-c CORES] [-p PARTITION] [-t TIME]
-                      [-m MEM_PER_CPU]
-                      device
+sudo run-submit-pol-job.py [-h] [-o OUTDIR] [-f LOG_PREFIX] [--with-dirs]
+                           [-N NODES] [-c CORES] [-p PARTITION] [-t TIME]
+                           [-m MEM_PER_CPU]
+                           device
 ```
 
 - `outdir`: specifies the directory the output log should be saved to. Defaults to `/data/rc/gpfs-policy/data`
@@ -125,6 +125,10 @@ All other options control the array job resources. Default values are as follows
 
 The default resources can parse 5 million line files in approximately 3 minutes so should cover all common use cases.
 
+For all policies run on filesets in `/data/user`, `/data/project`, `/home`, or `/scratch` will automatically have their "top-level directory" (`tld`) computed and added to the parquet output. This is defined as the directory just under any of those specified filesets. For example, a file with path `/data/project/datascienceteam/example.txt` will have `tld` set to `datascienceteam`.
+
+Any files in a directory outside those specified filesets will have `tld` set to `None`.
+
 ## Running reports
 
 ### Disk usage by top level directies
-- 
GitLab
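
The `tld` rule described in the added README text can be illustrated with a short Python sketch. This is not the repository's implementation: the function name `compute_tld` and the constant `FILESETS` are hypothetical, and the matching logic is an assumption based only on the wording of the patch (the path component directly under one of the four filesets, otherwise `None`).

```python
from pathlib import PurePosixPath
from typing import Optional

# Filesets listed in the patch text. How the real code matches them is not
# shown in the patch, so the component-wise comparison below is an assumption.
FILESETS = ("/data/user", "/data/project", "/home", "/scratch")


def compute_tld(path: str) -> Optional[str]:
    """Return the path component directly under a known fileset, or None."""
    parts = PurePosixPath(path).parts
    for fileset in FILESETS:
        root = PurePosixPath(fileset).parts
        # Compare whole components so "/homework/x" does not match "/home".
        if parts[: len(root)] == root and len(parts) > len(root):
            return parts[len(root)]
    return None


if __name__ == "__main__":
    print(compute_tld("/data/project/datascienceteam/example.txt"))
    print(compute_tld("/data/rc/gpfs-policy/data/list.log"))
```

Under these assumptions, the first call yields `datascienceteam`, matching the README example, and the second yields `None` because `/data/rc` is not one of the listed filesets. The patch does not say how a file sitting directly in a fileset root is handled; this sketch would return the file name in that case.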