Restructure data dir and policy output names
The current naming scheme and structure for output policy files is somewhat confusing for someone coming in fresh. Most of our past interaction with these data has been through the symlink directories, named list-policy_<device>_<date>, that point to a directory already containing the chunked and parquet-converted policy data. However, the raw policy logs follow a different naming scheme from those directories, which makes it difficult to orient when starting from the initial policy run step. Having essentially 3 entries in the data directory for each policy run (the initial policy log, the directory with the chunks, and the symlink pointing to the chunk dir) also increases clutter.

Instead, I propose we organize the directory so that a single subdirectory contains all relevant files for each policy run. The directory name would describe the type of policy, the device the policy was applied to, and the corresponding job ID and run datetime. The raw policy log would be named similarly and stored in the top level of the subdirectory. The split parquet dataset would be given its own subdirectory at the same level as the policy log. See below for an example.
/data/rc/gpfs-policy/data/
└── list-policy_<job_id>_<device>_%Y%m%dT%H%M%S_<policy_type>/
    ├── list-policy_<job_id>_<device>_%Y%m%dT%H%M%S_<policy_type>.list.gather-info.gz
    ├── [gz-chunks]
    └── parquet/
        ├── list-000.parquet
        ├── list-001.parquet
        └── ...
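As a rough illustration of the layout above, the run name and its directory skeleton could be built along these lines. This is only a sketch: the function names `policy_run_name` and `make_run_dir`, the argument order, and the example values (including the fixed timestamp) are hypothetical and do not exist in run-mmpol.sh today.

```shell
#!/usr/bin/env bash
# Sketch only: function names and argument order are hypothetical.

# Build the shared base name for one policy run.
# Args: job_id device policy_type [timestamp]
policy_run_name() {
    local job_id=$1 device=$2 policy_type=$3
    # Default to the current time in the proposed %Y%m%dT%H%M%S format
    local stamp=${4:-$(date +%Y%m%dT%H%M%S)}
    printf 'list-policy_%s_%s_%s_%s\n' "$job_id" "$device" "$stamp" "$policy_type"
}

# Create <data_root>/<run_name>/parquet and print the run directory.
# Args: data_root run_name
make_run_dir() {
    local data_root=$1 run_name=$2
    local run_dir="$data_root/$run_name"
    mkdir -p "$run_dir/parquet"
    printf '%s\n' "$run_dir"
}

# Example with placeholder values, written to a throwaway directory
demo_root=$(mktemp -d)
run_name=$(policy_run_name 29582179 scratch list-path-external 20240101T000000)
run_dir=$(make_run_dir "$demo_root" "$run_name")
echo "$run_dir"
```

The raw log would then be written or moved to "$run_dir/$run_name.list.gather-info.gz", so the directory and the log share one base name.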
This would necessitate multiple changes for run-mmpol.sh. An initial look suggests the following:
- Probably converting to getopt to pass options instead of relying on environment variable inheritance. While not necessary for the restructuring, it would improve clarity.
- Need to actually apply the file tag. The current output log only has the job ID as an identifier (ex. list-29582179.list.gather-info). I don't see anything resembling the tag in the file names in /data/rc/gpfs-policy/data.
  - It's verified the mv command in line 57 is not being run. See the end of /data/rc/list-gpfs-dirs/src/run-policy/out/pol-29582179-list-path-external-scratch.out where it only says outfile= and [[ '' != '' ]]. If anything was assigned to outfile, it would appear in the log.
- No idea what LIST_OUTPUT_FILE is referring to since that string doesn't appear in the list-path-external or list-path-dirplus policy definitions. -M is just a string replacement in the policy definition based on what's passed to it. Not sure that line is doing anything.
- Need to check mmapplypolicy to see exactly how to get the name of the log file. If that's not possible, can just continue to use the current bones and then perform all of the renaming and organization after the fact.
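For the getopt item in the list above, option parsing might look roughly like the following. This is a sketch under assumptions: the option names (--device, --policy, --tag, --outdir) and the `parse_args` function are hypothetical and are not the current run-mmpol.sh interface, and it relies on the enhanced getopt from util-linux (the BSD getopt does not accept long options).

```shell
#!/usr/bin/env bash
# Sketch only: option names are hypothetical, not the current run-mmpol.sh interface.
# Assumes the enhanced getopt from util-linux.

parse_args() {
    local parsed
    # getopt normalizes and quotes the arguments; it fails on unknown options
    parsed=$(getopt -o d:p:t:o: --long device:,policy:,tag:,outdir: \
                    -n run-mmpol.sh -- "$@") || return 1
    eval set -- "$parsed"

    # Explicit defaults instead of values inherited from the environment
    device='' policy='' tag='' outdir=''
    while true; do
        case $1 in
            -d|--device) device=$2; shift 2 ;;
            -p|--policy) policy=$2; shift 2 ;;
            -t|--tag)    tag=$2;    shift 2 ;;
            -o|--outdir) outdir=$2; shift 2 ;;
            --) shift; break ;;
            *)  return 1 ;;
        esac
    done

    if [[ -z $device || -z $policy ]]; then
        echo "run-mmpol.sh: --device and --policy are required" >&2
        return 1
    fi
}

# Example invocation with placeholder values
parse_args --device scratch --policy list-path-external --tag demo
echo "device=$device policy=$policy tag=$tag outdir=$outdir"
```

Making the inputs explicit this way would also surface cases like the empty outfile seen in the job log, since a missing required value fails immediately instead of silently expanding to an empty string.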