Prior versions of the package used shell scripts to split the raw GPFS log and convert it to a usable parquet dataset. These scripts have been converted to Python submodules so they can be used from the Python REPL as well as from the shell CLI. Each command can also submit a separate batch job to perform the processing if the user wants. This gives more flexibility in how the preprocessing steps are run.
Minor changes:
- Added `split-log` and `convert-to-parquet` to replace `split-info-file.sh` and `run-convert-to-parquet.sh`, respectively.
- Changed `split-info-file` to be interfaced through Python. The actual processing is still done via bash through the `subprocess` module.
- `policy` for `split`, `compress_logs`, and `convert`
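
As a rough illustration of the "Python interface, bash processing" pattern described above, here is a minimal sketch. The function name `split_log`, its arguments, and the inline `split -l` command are hypothetical and are not taken from the package; the only real APIs used are `subprocess.run` and `pathlib`.

```python
import subprocess
from pathlib import Path


def split_log(log: Path, outdir: Path, lines_per_chunk: int = 1000) -> list[Path]:
    """Split `log` into chunks of at most `lines_per_chunk` lines using bash.

    Hypothetical helper: the name, arguments, and the inline bash command are
    illustrative assumptions, not the package's actual implementation.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    prefix = outdir / "chunk-"
    # Positional parameters ($1, $2, $3) keep paths out of the script text,
    # so spaces or shell metacharacters in file names are not interpreted.
    script = 'split -l "$1" "$2" "$3"'
    subprocess.run(
        # The argument after the script becomes $0; the rest become $1..$3.
        ["bash", "-c", script, "split_log", str(lines_per_chunk), str(log), str(prefix)],
        check=True,  # raise CalledProcessError if bash exits non-zero
    )
    # split names its outputs chunk-aa, chunk-ab, ... so sorting keeps order.
    return sorted(outdir.glob("chunk-*"))
```

Because the wrapper is an ordinary function, it can be called from the REPL (`split_log(Path("raw.log"), Path("out"))`) or from a thin CLI entry point without duplicating the bash logic.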
Ancillary:
- Moved `parse_scontrol` from `compute.utils` to `utils`.
- Removed the `create-symlinks.sh` script since CLI functions are now defined in `pyproject.toml`.
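
For readers unfamiliar with CLI functions declared in `pyproject.toml`: the `[project.scripts]` table maps a command name to a `module:function` target, and the installer generates the executable, which is why a symlink script is no longer needed. The sketch below shows what such a target function might look like. The module path in the comment, the `--batch` flag, and the function names are assumptions for illustration only, not the package's real interface.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence

# A pyproject.toml entry of this shape would expose main() as a command
# (hypothetical module path, shown only to illustrate the mechanism):
#
#   [project.scripts]
#   split-log = "mypackage.cli:main"


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser; the flags here are illustrative assumptions."""
    parser = argparse.ArgumentParser(prog="split-log")
    parser.add_argument("log", type=Path, help="path to the raw log file")
    parser.add_argument(
        "--batch",
        action="store_true",
        help="submit the work as a batch job instead of running it locally",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse arguments; argv=None makes argparse read from sys.argv."""
    args = build_parser().parse_args(argv)
    # A real command would dispatch to the processing code here.
    return args
```

Accepting an explicit `argv` lets the same function serve both the generated console script and interactive use, e.g. `main(["raw.log", "--batch"])` from the REPL.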