Add CLI functionality for log preprocessing

Prior versions of the package used shell scripts to split the raw GPFS log and convert it to a usable parquet dataset. These scripts have been converted to Python submodules so they can be used from the Python REPL as well as from the shell CLI. Each command can also submit a separate batch job to perform the processing, which gives users more flexibility in how they run the preprocessing steps.

Minor changes:

  • Added CLI commands split-log and convert-to-parquet to replace split-info-file.sh and run-convert-to-parquet.sh, respectively.
  • Converted the contents of split-info-file so they are called through Python. The actual processing is still done in bash via the subprocess module.
  • Added options to run each command either on local compute resources or through a submitted batch job.
  • Added imports to policy for split, compress_logs, and convert.
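
The local-versus-batch choice described above can be pictured with a minimal sketch. This is not the package's implementation: the function names `build_command` and `run`, the `sbatch_opts` parameter, and the use of Slurm's `sbatch --wrap` are all assumptions made for illustration.

```python
from __future__ import annotations

import shlex
import subprocess


def build_command(script: str, args: list[str], batch: bool = False,
                  sbatch_opts: dict[str, str] | None = None) -> list[str]:
    """Return the argv list that runs `script` locally or through sbatch.

    Hypothetical helper: the real package may build its commands differently.
    """
    base = ["bash", script, *args]
    if not batch:
        # Local mode: run the bash script directly on this machine.
        return base
    # Batch mode (assumes a Slurm scheduler): turn each option into a
    # long-form flag and pass the quoted command line through --wrap.
    opts = [f"--{key}={value}" for key, value in (sbatch_opts or {}).items()]
    return ["sbatch", *opts, f"--wrap={shlex.join(base)}"]


def run(script: str, args: list[str], batch: bool = False,
        sbatch_opts: dict[str, str] | None = None, dry_run: bool = False):
    """Run the command, or return it unexecuted when dry_run is True."""
    cmd = build_command(script, args, batch, sbatch_opts)
    if dry_run:
        return cmd
    # check=True raises CalledProcessError if the script or sbatch fails.
    return subprocess.run(cmd, check=True)
```

Calling `run(..., dry_run=True)` returns the command list without executing anything, which makes it easy to inspect what would be run or submitted.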

Ancillary:

  • Moved parse_scontrol from compute.utils to utils
  • Removed the create-symlinks.sh script, since the CLI functions are now defined in pyproject.toml
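
As a rough illustration of what a CLI function registered in pyproject.toml might look like, here is a small argparse-based sketch. The function names, the positional `input` argument, and the `--batch` flag are hypothetical and are not taken from the package.

```python
import argparse


def make_parser(prog: str) -> argparse.ArgumentParser:
    """Build a parser shared by the hypothetical entry points."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("input", help="path to the log or directory to process")
    parser.add_argument("--batch", action="store_true",
                        help="submit a batch job instead of running locally")
    return parser


def split_log(argv=None):
    """Hypothetical entry point; parses sys.argv when argv is None."""
    args = make_parser("split-log").parse_args(argv)
    # A real implementation would start the processing here; the sketch
    # only returns the parsed settings so the behavior is easy to inspect.
    return {"command": "split-log", "input": args.input, "batch": args.batch}
```

Because the function accepts an optional `argv`, the same code can be called from the Python REPL with an explicit argument list or from the shell through an installed console script.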
Edited by Matthew K Defenderfer
