diff --git a/README.md b/README.md
index 4e6a6c105080e4c561573e2edd8f7465fd1465cd..59783c7c8faa99ebfef070cf98c02b3507edd0b9 100644
--- a/README.md
+++ b/README.md
@@ -24,10 +24,10 @@ Note: The command is aligned to run on specific nodes by way of arguments to mma
 A list policy can be executed using `run-submit-pol-job.py` using the following command:
 
 ``` bash
-run-submit-pol-job.py [-h] [-o OUTDIR] [-f LOG_PREFIX] [--with-dirs]
-                      [-N NODES] [-c CORES] [-p PARTITION] [-t TIME]
-                      [-m MEM_PER_CPU]
-                      device
+sudo run-submit-pol-job.py [-h] [-o OUTDIR] [-f LOG_PREFIX] [--with-dirs]
+                           [-N NODES] [-c CORES] [-p PARTITION] [-t TIME]
+                           [-m MEM_PER_CPU]
+                           device
 ```
 
 - `outdir`: specifies the directory the output log should be saved to. Defaults to `/data/rc/gpfs-policy/data`
@@ -125,6 +125,10 @@ All other options control the array job resources. Default values are as follows
 
 The default resources can parse 5 million line files in approximately 3 minutes so should cover all common use cases.
 
+Policies run on filesets in `/data/user`, `/data/project`, `/home`, or `/scratch` automatically have a "top-level directory" (`tld`) computed for each file and added to the parquet output. The `tld` is the directory immediately under one of those filesets. For example, a file with path `/data/project/datascienceteam/example.txt` will have `tld` set to `datascienceteam`.
+
+Files in a directory outside those filesets will have `tld` set to `None`.
+
 ## Running reports
 
 ### Disk usage by top level directies
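
As a rough illustration of the `tld` rule this change documents, the sketch below derives the value from a file path. It is not taken from the repository: the `compute_tld` function and the `FILESETS` constant are hypothetical names, and the real parser may handle edge cases differently, such as a path that is itself a top-level directory.

```python
from pathlib import PurePosixPath
from typing import Optional

# Fileset roots listed in the README text. The constant name is hypothetical.
FILESETS = ("/data/user", "/data/project", "/home", "/scratch")


def compute_tld(path: str) -> Optional[str]:
    """Return the directory just under a known fileset root, or None.

    Hypothetical helper that mirrors the README's description only.
    """
    parts = PurePosixPath(path).parts
    for root in FILESETS:
        root_parts = PurePosixPath(root).parts
        n = len(root_parts)
        # Match whole path components so "/scratchy" is not treated as "/scratch".
        if parts[:n] == root_parts and len(parts) > n:
            return parts[n]
    # The path is outside every listed fileset, or is the fileset root itself.
    return None


if __name__ == "__main__":
    print(compute_tld("/data/project/datascienceteam/example.txt"))
    print(compute_tld("/data/rc/gpfs-policy/data/list.log"))
```

Comparing path components rather than string prefixes keeps similarly named directories outside the filesets from matching by accident.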