diff --git a/README.md b/README.md
index 92b962c80a28270509698c3e3e30dfadd3a0459c..cf758985e3e65cc7757d83723328a17ab06c88c6 100644
--- a/README.md
+++ b/README.md
@@ -75,6 +75,44 @@ The output file contains one line per file object stored under the `device`.  No
 
 ### Split and compress
 
+Policy outputs generated by `list-path-external` or `list-path-dirplus` can be split into multiple smaller log files, enabling out-of-core computation over very large filesets with tools such as Dask. Split and compress the policy output with the `src/split-info-file.sh` script. Usage:
+
+```bash
+./split-info-file.sh [ -h ] [ -l | --lines ] [ -o | --outdir ]
+                     [ -n | --ntasks ] [ -p | --partition ] [ -t | --time ] [ -m | --mem ]
+                     log
+```
+
+- `lines`: the maximum number of lines to include in each split file. Defaults to 5000000.
+- `outdir`: directory to store the split files in. Defaults to `${log}.d` in the log's parent directory.
+- `log`: path to the GPFS policy log. Can be either uncompressed or `gzip`-compressed.
+
+All other options specify job resource parameters. Defaults are as follows:
+
+- `ntasks`: 4
+- `partition`: `amd-hdr100`
+- `time`: `12:00:00`
+- `mem`: `16G`
+
+Split files are written to `${outdir}/list-XXX.gz`, where `XXX` is an incrementing numeric index; each split file is compressed automatically.
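+
+As an illustration, a run that overrides the chunk size, output directory, and job walltime might look like the following. The input and output paths here are placeholders, and the three-digit zero-padded indices are assumed from the `list-XXX.gz` pattern above:
+
+```bash
+# Split a gzip-compressed policy log into chunks of 2,000,000 lines,
+# writing the pieces to a scratch directory and requesting a longer
+# walltime for the split job. All paths are illustrative.
+./split-info-file.sh --lines 2000000 \
+                     --outdir /scratch/$USER/list-policy.d \
+                     --time 24:00:00 \
+                     /data/logs/list-policy.gz
+
+# Expected output, assuming three-digit zero-padded indices:
+#   /scratch/$USER/list-policy.d/list-000.gz
+#   /scratch/$USER/list-policy.d/list-001.gz
+#   ...
+```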
+
 ### Pre-parse output for Python
 
 Processing GPFS log outputs is controlled by the `run-convert-to-parquet.sh` script and assumes the GPFS log has been split into a number of files of the form `list-XXX.gz` where `XXX` is an incrementing numeric index. This creates an array job where each task in the array reads the quoted text in one file, parses it into a dataframe, and exports it as a parquet file with the name `list-XXX.parquet`.