- Aug 02, 2024
-
-
John-Paul Robinson authored
Create distinct listing that includes all object types on gpfs. Copy of the list-path-external with the additional option.
-
- Jul 30, 2024
-
-
John-Paul Robinson authored
These notebooks use dataframes built from parquet files to sanity check file listings from multiple sources.
-
- Jul 26, 2024
-
-
John-Paul Robinson authored
These are used to create specific reports by opening the notebook, copying it, and modifying the parameters for a specific policy run data set. Their utility may be limited based on current parquet pipelines.
-
John-Paul Robinson authored
-
John-Paul Robinson authored
This is intended to be run on URL encoded output lines from a gpfs list policy run. It creates pandas structures that are then saved in parquet format for ease of downstream processing. Can be run in parallel across many inputs by wrapping with papermill and having an upstream step split the input file.
-
John-Paul Robinson authored
Makes it slightly more resilient and forms the foundation for the parquet notebook.
-
John-Paul Robinson authored
Parameters hard coded to split on 50k entries per file, which provides reasonable parallel read efficiency for downstream tasks. Put splits in a .d subdir of file name.
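The split described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the committed script: the `split_file` helper and the zero-padded chunk naming are hypothetical, and only the 50k default and the `.d` subdirectory come from the commit message.

```python
import itertools
from pathlib import Path


def split_file(path, lines_per_chunk=50_000):
    """Write fixed-size chunks of `path` into '<path>.d/' and return their paths."""
    src = Path(path)
    # The '.d' directory sits next to the input and is named after it.
    out_dir = src.with_name(src.name + ".d")
    out_dir.mkdir(exist_ok=True)
    chunks = []
    # newline="\n" keeps any embedded carriage returns inside their record.
    with src.open("r", newline="\n") as fh:
        for index in itertools.count():
            lines = list(itertools.islice(fh, lines_per_chunk))
            if not lines:
                break
            # Hypothetical naming scheme: '<input name>.<4-digit index>'.
            chunk = out_dir / f"{src.name}.{index:04d}"
            with chunk.open("w", newline="\n") as out:
                out.writelines(lines)
            chunks.append(chunk)
    return chunks
```

Each chunk can then be handed to a separate worker, which is where the parallel read benefit comes from.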
-
- Jul 25, 2024
-
-
Mike Hanby authored
-
- May 22, 2024
-
-
John-Paul Robinson authored
Provide details on running a policy script with the provided workflow framework. Information on the list-policy-external policy and generated output. Outline of future docs for preprocessing and reports.
-
- Jun 30, 2023
-
-
John-Paul Robinson authored
Resolve "create script wrappers to run notebooks in parallel" Closes #5 See merge request !7
-
John-Paul Robinson authored
-
John-Paul Robinson authored
-
- Dec 03, 2022
-
-
John-Paul Robinson authored
Resolve "Create report to aggregate stored data by year of last access" Closes #4 See merge request !6
-
John-Paul authored
The report accepts a top-level-dir pattern and reports aggregation stats for files below that level
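A minimal sketch of this kind of aggregation, assuming each record has already been parsed into a `(path, atime_epoch_seconds, size_bytes)` tuple. The record layout, the `aggregate_by_atime_year` helper, and the sample paths are hypothetical, and a plain directory prefix stands in for the pattern matching the report accepts.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_by_atime_year(records, top_level_dir):
    """Count files and bytes per year of last access below top_level_dir."""
    prefix = top_level_dir.rstrip("/") + "/"
    stats = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path, atime, size in records:
        if not path.startswith(prefix):
            continue  # outside the requested top-level directory
        year = datetime.fromtimestamp(atime, tz=timezone.utc).year
        stats[year]["files"] += 1
        stats[year]["bytes"] += size
    return dict(stats)


# Hypothetical sample records; atimes are 2021-07-01 and 2022-07-01 UTC.
records = [
    ("/data/user/alice/a.dat", 1625097600, 100),
    ("/data/user/alice/b.dat", 1656633600, 50),
    ("/data/user/alice/c.dat", 1656633600, 25),
    ("/data/user/bob/d.dat", 1656633600, 10),
]
summary = aggregate_by_atime_year(records, "/data/user/alice")
```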
-
John-Paul Robinson authored
Merge branch '3-create-a-pickling-script-to-parse-raw-text-into-pickled-pandas-dataframes' into 'main' Create notebook to pickle list policy output files Closes #3 See merge request !5
-
John-Paul authored
This simplifies later report runs by consolidating the parsing and dataframe creation operations into a single batch.
-
- Dec 02, 2022
-
-
John-Paul Robinson authored
-
John-Paul Robinson authored
Takes a directory of atime reports generated by an array job and merges them into a single per-user report.
-
John-Paul Robinson authored
The report atime generator reads a provided input and generates a per-user atime report. The wrapper sbatch allows running the script in an array job to support scaling across large data sets split into many files.
-
John-Paul Robinson authored
Figure out logic to parse list files and produce atime summaries.
-
John-Paul Robinson authored
-
John-Paul Robinson authored
Feat parsable file names Closes #2 See merge request !2
-
John-Paul Robinson authored
The defer option with a list generates the list file outputs according to the format described here: https://www.ibm.com/docs/en/spectrum-scale/4.2.0?topic=pools-file-list-format The file and output args control the location of the resulting file.
-
John-Paul Robinson authored
Pass the output arg to the sbatch via env params.
-
John-Paul Robinson authored
Remove the hard-coded external script both because it is not needed and because defer mode produces list output files directly.
-
- Aug 29, 2022
-
-
John-Paul Robinson authored
The external script will be called with the TEST parameter if the policy is executed in test or prepare mode. It will only get the LIST files to concat the output when the policy is fully executed.
-
John-Paul Robinson authored
The policy file has both a LIST generation rule and an EXTERNAL LIST consumer rule. It uses the ESCAPE term to URL encode the output generated by the LIST. The external target script simply concats LIST files sent to it. The mmapplypolicy command adds an additional variable JOBID to allow that to be passed as an OPT to the external script. This lets us distinguish the output of one job from the next.
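Because the list output is URL encoded, downstream tooling has to decode it before using the paths. A minimal sketch in Python follows, using the standard library's `urllib.parse.unquote`; the encoded sample path is hypothetical and not taken from a real policy run.

```python
from urllib.parse import unquote

# Hypothetical encoded path with a space (%20) and a newline (%0A) in the name.
encoded = "/data/project/my%20file%0Aname.txt"

# Decode only after the record has been split into fields, so that special
# characters in file names cannot break line-oriented parsing.
decoded = unquote(encoded)
```

Keeping the values encoded until after splitting means a file name containing a newline still occupies a single line in the list file.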
-
John-Paul Robinson authored
The mmapplypolicy list output file naming is not well documented. Add a job-specific subdir for global and scratch paths to isolate expected files from a job. Rename the final tagged result, move it to the provided global output dir, and clean up the temporary dir after the file completes. Include potential for no output if there are zero matches.
-
John-Paul Robinson authored
The FILEPATH variable limits included files to those under the path provided to the policy.
-
John-Paul Robinson authored
Add variable arg to mmapplypol to pass the filesystem path as FILEPATH. This supports policies that want to use this variable in their rules or other parts of the policy file. It is a strict string substitution.
-
- Aug 25, 2022
-
-
John-Paul Robinson authored
Add column and update logic to add username values from the system password database.
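One way to do such a lookup in Python is the standard `pwd` module (Unix only). The sketch below is an assumption about the approach, not the committed code: the `uid_to_username` helper, the fallback to the numeric uid, and the `rows` structure are hypothetical.

```python
import os
import pwd


def uid_to_username(uid):
    """Return the login name for uid, or the uid as a string if it is unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No entry in the password database, e.g. a deleted account.
        return str(uid)


# Hypothetical rows; a real run would take the uid from each parsed record.
rows = [{"uid": os.getuid()}]
for row in rows:
    row["username"] = uid_to_username(row["uid"])
```

Falling back to the numeric uid keeps files owned by removed accounts in the report instead of failing the run.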
-
John-Paul Robinson authored
-
John-Paul Robinson authored
Limit field splitting to one equals sign to separate just the field name and avoid splitting values with embedded equals signs. Add explicit unix line terminator to avoid line breaks on carriage returns that may be embedded in field values.
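Both parsing choices can be illustrated with a short Python sketch. The helper names and the sample text are hypothetical; the actual field layout of the list output is not shown here.

```python
def split_records(text):
    """Split on '\\n' only, so carriage returns embedded in values survive."""
    # str.splitlines() would also break on '\r', which is the case to avoid.
    return [record for record in text.split("\n") if record]


def parse_field(field):
    """Split a 'name=value' field on the first equals sign only."""
    name, value = field.split("=", 1)
    return name, value


# Hypothetical input: one value contains '=', another contains '\r'.
sample = "path=/data/a=b.txt\nnote=line1\rline2\n"
records = split_records(sample)
parsed = [parse_field(record) for record in records]
```

With `maxsplit=1`, everything after the first equals sign stays in the value, so a path such as `/data/a=b.txt` is preserved intact.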
-
John-Paul Robinson authored
-
- Apr 05, 2022
-
-
John-Paul Robinson authored
-
- Mar 14, 2022
-
-
John-Paul Robinson authored
New eighth parameter for run time of job request.
-
John-Paul Robinson authored
This allows policy files for any gpfs filesystem to be run by the run scripts by using a variable for the filesystem rather than hard coding it. Keep the default value set to scratch for compatibility with existing callers. A seventh arg can now be included to invoke a policy file on any gpfs filesystem.
-
- Feb 28, 2022
-
-
John-Paul Robinson authored
-
John-Paul Robinson authored
-
John-Paul Robinson authored
Resolve "LIST parameter is used incorrectly as a file tag" Closes #1 See merge request !1
-