
Information Lifecycle Management (ILM) via GPFS policy engine

The GPFS policy engine is well described in this white paper. A good presentation overview of the policy file is here. The relevant documentation is available from IBM.

This project focuses on scheduled execution of lifecycle policies to gather and process data about file system objects and to issue actions against those objects based on policy.

Running a policy

A policy is executed in the context of a SLURM batch job reservation using the submit-pol-job script:

submit-pol-job <outdir> <policy> <nodecount> <corespernode> <ram> <partition> <time>

Where the positional arguments are:

  • outdir - the directory for the output files; it should be visible from every node in the cluster (e.g. the /scratch directory of the user running the job)
  • policy - path to the GPFS policy to execute (e.g. in ./policy directory)
  • nodecount - number of nodes in the cluster that will run the policy
  • corespernode - number of cores on each node to reserve
  • ram - ram per core, can use "G" for gigabytes
  • partition - the partition to submit the job to
  • time - the time in minutes to reserve for the job
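
As a rough illustration, the seven positional arguments from the usage line above can be held in named variables so the call is easier to read and review. Every value below is a placeholder, and the `./` prefix assumes the script sits in the current directory.

```shell
#!/usr/bin/env bash
# Placeholder values; substitute paths and a partition that exist on your cluster.
outdir=/scratch/example/policy-out   # hypothetical output directory
policy=/path/to/policy/file          # hypothetical policy path
nodecount=4
corespernode=24
ram=4G                               # RAM per core
partition=partition_name
minutes=180                          # reservation time in minutes

# Assemble the command in the order given by the usage line.
cmd=(./submit-pol-job "$outdir" "$policy" "$nodecount" "$corespernode" "$ram" "$partition" "$minutes")

# Print the command for review instead of running it.
printf '%s ' "${cmd[@]}"
echo
```

Once the printed line looks right, replacing the `printf` and `echo` lines with `"${cmd[@]}"` would submit the job.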

Note: the resource reservation is imperfect. The job wrapper calls a script, run-mmpol.sh, which is responsible for executing the mmapplypolicy command.

The command is directed to specific nodes by way of arguments to mmapplypolicy. Technically it does not run inside the job reservation, so the job's resource constraints are not strictly enforced on it. The goal is to use the scheduler to ensure the policy run does not conflict with existing resource allocations on the cluster.

Running the policy "list-policy-external"

The list-policy-external policy provides an efficient tool to gather file stat data into a URL-encoded ASCII text file. The output file can then be processed downstream to create reports on storage patterns and use.

An example invocation would be:

submit-pol-job /path/to/output/dir \
     /absolute/path/policy/list-path-external \
     4 24 4G partition_name \
     /path/to/listed/dir \
     180

Some things to keep in mind:

  • the submit-pol-job script may need a ./ prefix if it is not on your PATH
  • use absolute paths for all directory arguments to avoid potential confusion
  • make sure the output dir has sufficient space to hold the resulting file listing (it could be hundreds of gigabytes for a large collection of files)
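
The last two points can be checked before submitting. The helpers below are a minimal sketch, not part of this project: `all_absolute` and `free_kib` are hypothetical names, and `free_kib` relies on the POSIX output format of `df -Pk`, where the fourth column is the available space.

```shell
#!/usr/bin/env bash
# Succeed only if every argument is an absolute path.
all_absolute() {
    local p
    for p in "$@"; do
        case "$p" in
            /*) ;;
            *)  echo "not an absolute path: $p" >&2
                return 1 ;;
        esac
    done
}

# Print the space available (in KiB) on the file system holding a directory.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, `all_absolute "$outdir" "$policy" && free_kib "$outdir"` would report the free space only when both paths are absolute. How much space is enough depends on the number of files being listed.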

The SLURM job output file is written to the directory from which this command was executed. It can be watched to observe progress in the generation of the file list. A listing of hundreds of millions of files may take a couple of hours to generate, and the output file may consume several hundred gigabytes.

The output file in /path/to/output/dir is named as follows:

  • a prefix of "list-${SLURM_JOBID}"
  • ".list" for the name of the policy rule type of "list"
  • a tag for the list name defined in the policy file, "list-gather" for the list-path-external policy
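
Putting the three parts together, the expected file name can be computed from the job ID. This is a sketch under one assumption that the list above does not state explicitly: that the list-name tag is appended after a dot. The function name `policy_output_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Build the expected output file name from a job ID and a list name.
# Assumption: the parts are joined with dots, e.g. list-12345.list.list-gather
policy_output_name() {
    local jobid=$1
    local listname=${2:-list-gather}   # default tag for list-path-external
    printf 'list-%s.list.%s\n' "$jobid" "$listname"
}

policy_output_name 12345   # prints list-12345.list.list-gather
```

If the actual file name differs, listing the output directory after a run will show the real pattern.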

The output file contains one line per file object stored under /path/to/listed/dir. No directories or non-file objects are included in this listing. Each entry is a space-separated set of file attributes selected by the SHOW clause in the LIST rule. Entries are encoded according to RFC 3986 URI percent-encoding. This means all spaces and special characters are encoded, so a line can be split into fields reliably on the space separator.
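
Because the fields are percent-encoded, a line can be split on spaces first and each field decoded afterwards. The following is a minimal bash sketch of that idea. The function names are hypothetical, the sample line is made up, and the real field order depends on the SHOW clause of the policy.

```shell
#!/usr/bin/env bash
# Decode one percent-encoded field (bash-specific).
# Caveat: printf '%b' also expands backslash escapes, so this assumes any
# literal backslash in the data was itself percent-encoded.
urldecode() {
    printf '%b' "${1//%/\\x}"
}

# Split one listing line on spaces and print each decoded field on its own line.
decode_fields() {
    local -a fields
    local f
    read -r -a fields <<< "$1"
    for f in "${fields[@]}"; do
        urldecode "$f"
        echo
    done
}

# Made-up sample line with two fields: a size and an encoded path.
decode_fields '1024 /data/my%20project/report%231.txt'
# prints:
# 1024
# /data/my project/report#1.txt
```

For listings with hundreds of millions of lines, a per-line shell loop like this will be slow. It is meant to show the split-then-decode order, not to serve as a bulk processing tool.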