Automate conversion of GPFS policy outputs to parquet without Jupyter
- Matthew K Defenderfer authored
@@ -65,6 +65,24 @@ The ouput file is an unsorted list of files in uncompressed ASCII. Further proc
Processing GPFS log outputs is controlled by the `run-convert-to-parquet.sh` script, which assumes the GPFS log has been split into a number of files named `list-XXX.gz`, where `XXX` is an incrementing numeric index. The script creates an array job in which each task reads the quoted text in one file, parses it into a dataframe, and exports it as a parquet file named `list-XXX.parquet`.
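As a rough illustration of what a single array task might do, here is a minimal Python sketch. It is not the code behind `run-convert-to-parquet.sh`. The helper names (`pick_input`, `read_lines`, `parquet_path_for`, `convert`) are hypothetical, task IDs are assumed to start at 0, and because the record layout is not described here, each line is stored unparsed in a single `line` column rather than split into fields.

```python
import gzip
from pathlib import Path


def pick_input(files: list[Path], task_id: int) -> Path:
    """Select the input file for one array task (assumes task IDs start at 0)."""
    return sorted(files)[task_id]


def parquet_path_for(gz_path: Path, out_dir: Path) -> Path:
    """Map list-XXX.gz to out_dir/list-XXX.parquet."""
    name = gz_path.name
    if not name.endswith(".gz"):
        raise ValueError(f"expected a .gz file, got {name}")
    return out_dir / (name[: -len(".gz")] + ".parquet")


def read_lines(gz_path: Path) -> list[str]:
    """Read a gzip-compressed text file into a list of lines without newlines."""
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh]


def convert(gz_path: Path, out_dir: Path) -> Path:
    """Write one list-XXX.gz file out as list-XXX.parquet and return the new path."""
    # Imported here so the helpers above work without pandas installed.
    # to_parquet also needs a parquet engine such as pyarrow.
    import pandas as pd

    # Assumption: the real field layout is unknown, so keep each record as raw text.
    df = pd.DataFrame({"line": read_lines(gz_path)})
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = parquet_path_for(gz_path, out_dir)
    df.to_parquet(out_path)
    return out_path
```

Under a scheduler such as Slurm (an assumption, since the scheduler is not named here), each task could call `convert(pick_input(files, task_id), out_dir)` with `task_id` read from the scheduler's array index variable, so that every task handles exactly one `list-XXX.gz` file.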