- Aug 31, 2024
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored: Set the policy file to either list-path-external or list-path-dirplus, then pass it as a path to submit-pol-job.
  - Matthew K Defenderfer authored: Add an option to run any policy file, so that Ops is not restricted to only list-path-external and list-path-dirplus.
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
- Aug 30, 2024
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
- Aug 28, 2024
  - John-Paul Robinson authored: Automate conversion of GPFS policy outputs to parquet without Jupyter. See merge request !8.
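As a rough illustration of what a non-Jupyter conversion entry point can look like, here is a minimal Python sketch; the file handling, column layout, and helper name are assumptions for illustration, not this repository's actual code:

```python
# Hedged sketch of a plain-script (non-Jupyter) conversion entry point.
import argparse
from pathlib import Path

import pandas as pd


def convert_policy_output(src: Path, dst: Path) -> None:
    """Read a GPFS policy list file and write it back out as parquet."""
    # Assumes one record per line; real parsing of the policy fields is
    # sketched under the Jul 26 entries below.
    df = pd.DataFrame({"line": src.read_text().splitlines()})
    df.to_parquet(dst)


if __name__ == "__main__":
    ap = argparse.ArgumentParser(description="GPFS policy output -> parquet")
    ap.add_argument("input", type=Path)
    ap.add_argument("-o", "--output", type=Path, default=None)
    args = ap.parse_args()
    convert_policy_output(args.input, args.output or args.input.with_suffix(".parquet"))
```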
- Aug 21, 2024
  - Matthew K Defenderfer authored
- Aug 20, 2024
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
- Aug 15, 2024
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored: Convert to running from a SIF container instead of a bare-metal conda environment; add an option to specify the container.
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
  - Matthew K Defenderfer authored
- Aug 14, 2024
  - Matthew K Defenderfer authored
- Aug 02, 2024
  - John-Paul Robinson authored: Create a distinct listing that includes all object types on GPFS. Copy of the list-path-external policy with the additional option.
- Jul 30, 2024
  - John-Paul Robinson authored: These notebooks use dataframes built from parquet files to sanity-check file listings from multiple sources.
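A minimal sketch of the kind of cross-check such a notebook performs, assuming both listings were already converted to parquet and share a path column (the file and column names here are illustrative):

```python
# Hypothetical sanity check comparing two file listings, e.g., a GPFS
# policy run against a listing from another source.
import pandas as pd

listing_a = pd.read_parquet("policy-run.parquet")    # assumed file names
listing_b = pd.read_parquet("other-source.parquet")

paths_a = set(listing_a["path"])   # assumes a 'path' column in both
paths_b = set(listing_b["path"])

print(f"{len(paths_a - paths_b)} paths only in the policy run")
print(f"{len(paths_b - paths_a)} paths only in the other source")
print(f"{len(paths_a & paths_b)} paths in both")
```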
- Jul 26, 2024
  - John-Paul Robinson authored: These are used to create specific reports: open the notebook, copy it, and modify the parameters for a specific policy run data set. Their utility may be limited given the current parquet pipelines.
  - John-Paul Robinson authored
  - John-Paul Robinson authored: This is intended to be run on URL-encoded output lines from a GPFS list policy run. It creates pandas structures that are then saved in parquet format for ease of downstream processing. It can be run in parallel across many inputs by wrapping it with papermill and having an upstream step split the input file.
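A hedged sketch of that decode-and-save step, assuming each output line ends in a URL-encoded path after its attribute fields (the real field layout depends on the policy's SHOW clause, and the file names are illustrative):

```python
# Decode URL-encoded paths from a policy list file into a DataFrame,
# then persist as parquet for downstream processing.
from urllib.parse import unquote

import pandas as pd

records = []
with open("list-policy.list", encoding="utf-8") as fh:  # assumed input name
    for line in fh:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: the URL-encoded path is the last whitespace field.
        head, _, encoded_path = line.rpartition(" ")
        records.append({"attrs": head, "path": unquote(encoded_path)})

df = pd.DataFrame.from_records(records)
df.to_parquet("list-policy.parquet")
```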
  - John-Paul Robinson authored: Makes it slightly more resilient and forms the foundation for the parquet notebook.
  - John-Paul Robinson authored: Parameters are hard-coded to split on 50k entries per file, which provides reasonable parallel read efficiency for downstream tasks. Splits are put in a .d subdir of the file name.
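A minimal sketch of such a splitter, assuming one entry per line; the input name is illustrative:

```python
# Split a large listing into <=50k-line chunks under '<name>.d/',
# mirroring the layout the commit message describes.
from itertools import islice
from pathlib import Path


def split_file(src: Path, lines_per_chunk: int = 50_000) -> None:
    out_dir = src.parent / (src.name + ".d")
    out_dir.mkdir(exist_ok=True)
    with src.open(encoding="utf-8") as fh:
        part = 0
        while True:
            chunk = list(islice(fh, lines_per_chunk))
            if not chunk:
                break
            (out_dir / f"{src.name}.{part:04d}").write_text("".join(chunk), encoding="utf-8")
            part += 1


split_file(Path("list-policy.list"))  # hypothetical input name
```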
- Jul 25, 2024
  - Mike Hanby authored
- May 22, 2024
  - John-Paul Robinson authored: Provide details on running a policy script with the provided workflow framework, information on the list-policy-external policy and its generated output, and an outline of future docs for preprocessing and reports.
- Jun 30, 2023
  - John-Paul Robinson authored: Resolve "create script wrappers to run notebooks in parallel". Closes #5. See merge request !7.
  - John-Paul Robinson authored
  - John-Paul Robinson authored
- Dec 03, 2022
  - John-Paul Robinson authored: Resolve "Create report to aggregate stored data by year of last access". Closes #4. See merge request !6.
  - John-Paul authored: The report accepts a top-level-dir pattern and reports aggregation stats for files below that level.
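A sketch of the aggregation this describes, assuming a parquet listing with path, size, and last-access columns; the column names and the directory pattern are illustrative:

```python
# Aggregate file counts and total bytes by year of last access,
# restricted to paths under a given top-level-dir pattern.
import pandas as pd

df = pd.read_parquet("list-policy.parquet")            # assumed input
subset = df[df["path"].str.match(r"/data/projects/")]  # hypothetical pattern

by_year = (
    subset.assign(access_year=pd.to_datetime(subset["access"]).dt.year)
          .groupby("access_year")["size"]
          .agg(["count", "sum"])
)
print(by_year)  # files and total bytes per year of last access
```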
  - John-Paul Robinson authored: Merge branch '3-create-a-pickling-script-to-parse-raw-text-into-pickled-pandas-dataframes' into 'main'. Create notebook to pickle list policy output files. Closes #3. See merge request !5.