- Jul 26, 2024
-
-
John-Paul Robinson authored
This is intended to be run on URL-encoded output lines from a GPFS list policy run. It creates pandas data structures that are then saved in Parquet format for ease of downstream processing. It can be run in parallel across many inputs by wrapping it with papermill and having an upstream step split the input file.
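A minimal sketch of the decode-and-save step described above, not the notebook's actual code. The line layout (`size=...|user=... -- <url-encoded path>`), the field names, and the helper names `parse_line` and `lines_to_parquet` are all assumptions made for illustration; real list policy output depends on the policy's SHOW clause. Writing Parquet also assumes a Parquet engine such as pyarrow is installed.

```python
from urllib.parse import unquote

# Hypothetical line layout (an assumption, not the real policy output):
#   "size=<bytes>|user=<name> -- <url-encoded path>"


def parse_line(line):
    """Split one line into a dict of attribute fields plus the decoded path."""
    attrs, _, encoded_path = line.rstrip("\n").partition(" -- ")
    # Each attribute is assumed to be a key=value pair separated by "|".
    record = dict(field.split("=", 1) for field in attrs.split("|") if "=" in field)
    # Undo the URL encoding so the path reads as it does on the filesystem.
    record["path"] = unquote(encoded_path)
    return record


def lines_to_parquet(lines, out_path):
    """Parse an iterable of lines into a DataFrame and write it as Parquet."""
    # Imported here so parse_line stays usable without pandas installed.
    import pandas as pd

    df = pd.DataFrame([parse_line(line) for line in lines if line.strip()])
    # Sizes arrive as text; convert them so downstream sums and sorts work.
    df["size"] = pd.to_numeric(df["size"], errors="coerce")
    df.to_parquet(out_path, index=False)
    return df
```

Because each call handles one input file independently, an upstream split of the policy output into chunks would let several runs proceed in parallel, each writing its own Parquet file.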
-
John-Paul Robinson authored
Makes it slightly more resilient and forms the foundation for the parquet notebook.
-
- Dec 03, 2022
-
-
John-Paul authored
This simplifies later report runs by consolidating the parsing and dataframe creation operations into a single batch.
-