Add CLI functionality for log preprocessing
Prior versions of the package used raw shell scripts to split the raw GPFS log and convert it to a usable parquet dataset. These scripts have been converted to Python submodules so they can be used from the Python REPL as well as from the shell CLI. Additionally, each command can submit a separate batch job to perform the processing if the user wants. This gives users more flexibility in how they run the preprocessing steps.
Minor changes:
- Added CLI commands `split-log` and `convert-to-parquet` to replace `split-info-file.sh` and `run-convert-to-parquet.sh`, respectively.
- Converted contents of `split-info-file` to be interfaced through Python. The actual processing is still done via bash through the subprocess module.
- Added options to run each command using either the local compute resources or submitted through a batch job.
- Added imports to `policy` for `split`, `compress_logs`, and `convert`.
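
As a rough illustration of the local-versus-batch option described above, the sketch below builds either a direct bash invocation or a batch submission that wraps the same command. It is not the package's actual code: `build_command`, `run_step`, the script name, and the use of Slurm's `sbatch --wrap` are all assumptions made for the example.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_command(
    script_args: Sequence[str],
    batch: bool = False,
    sbatch_options: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the argv for running a bash step locally or as a batch job.

    Hypothetical helper: the real package may build its commands differently.
    """
    local_cmd = ["bash", *script_args]
    if not batch:
        return local_cmd
    # Assumes a Slurm scheduler: `sbatch --wrap` submits a quoted command string.
    return ["sbatch", *(sbatch_options or []), f"--wrap={shlex.join(local_cmd)}"]


def run_step(
    script_args: Sequence[str],
    batch: bool = False,
    sbatch_options: Optional[Sequence[str]] = None,
    dry_run: bool = False,
) -> List[str]:
    """Run (or, with dry_run=True, only build) the command for one step."""
    cmd = build_command(script_args, batch=batch, sbatch_options=sbatch_options)
    if not dry_run:
        # check=True raises CalledProcessError if the underlying command fails.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `run_step(["split.sh", "list.log"], batch=True, dry_run=True)` returns the `sbatch` argv without submitting anything, which is a convenient way to inspect what would be run. Keeping command construction separate from execution also makes the logic testable without a scheduler.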
Ancillary:
- Moved `parse_scontrol` from `compute.utils` to `utils`
- Removed `create-symlinks.sh` script since CLI functions are defined in `pyproject.toml` now
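
For context on why the symlink script is no longer needed: a console-script entry in `pyproject.toml` points a command name at a plain Python callable, and the installer generates the executable. The sketch below shows what such a callable could look like. The argument names, the `--batch` flag, and the `main` function are hypothetical and do not reflect the package's real interface.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical `split-log` command."""
    parser = argparse.ArgumentParser(prog="split-log")
    # Both arguments are illustrative assumptions, not the package's real options.
    parser.add_argument("log", help="path to the raw GPFS log")
    parser.add_argument(
        "--batch",
        action="store_true",
        help="submit the work as a batch job instead of running locally",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point a console script could reference, e.g. as `module:main`."""
    args = build_parser().parse_args(argv)
    mode = "a batch job" if args.batch else "local resources"
    print(f"would split {args.log} using {mode}")
    return 0
```

Because `main` accepts an explicit `argv`, the same function can be called from the Python REPL, for example `main(["list.log", "--batch"])`, as well as from the installed shell command.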
Edited by Matthew K Defenderfer