From 56e8efd811b1a81c854c987a9af55f875a950883 Mon Sep 17 00:00:00 2001 From: Matthew K Defenderfer <mdefende@uab.edu> Date: Fri, 10 Jan 2025 01:08:58 -0600 Subject: [PATCH 1/4] add example aggregation notebook --- gpfs-aggregation.ipynb | 233 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 233 insertions(+) create mode 100644 gpfs-aggregation.ipynb diff --git a/gpfs-aggregation.ipynb b/gpfs-aggregation.ipynb new file mode 100644 index 0000000..d7f3797 --- /dev/null +++ b/gpfs-aggregation.ipynb @@ -0,0 +1,233 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example GPFS Workflow\n", + "\n", + "This notebook aims to give an example GPFS workflow using an early version of the `rc_gpfs` package. This will start with a raw GPFS log file moving through stages for splitting and conversion to parquet using the Python API, although both steps could be done using the built-in CLI as well. Once the parquet dataset is created, it will create the basic `tld` summarized report on file count and storage use. From there, some example plots showing breakdowns of storage used by file age will be created.\n", + "\n", + "This notebook assumes being run in a job with at least one GPU, although some options can be changed to use a CPU-only backend. A GPU is recommended for analyzing larger logs, such as for `/data/project`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Suggested Compute Resources\n", + "\n", + "For analyzing a `/data/project` report, I suggest the following resources:\n", + "\n", + "- ntasks = 2\n", + "- cpus-per-task = 32\n", + "- ntasks-per-socket = 1\n", + "- gres = gpu:2\n", + "- partition = amperenodes\n", + "- mem = 256G\n", + "\n", + "When using more than 1 GPU, specifying 1 task per socket is highly recommended to guarantee matching the GPU socket socket affinity on the current A100 DGX. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Package and Input Setup" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "# Import the three main Python API submodules for analyzing a report\n", + "from rc_gpfs import process, policy, report" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# File patch setup\n", + "gpfs_log_root = Path('/data/rc/gpfs-policy/data/list-policy_data-project_list-path-external_slurm-31035593_2025-01-05T00:00:24/')\n", + "raw_log = gpfs_log_root.joinpath('raw','list-policy_data-project_list-path-external_slurm-31035593_2025-01-05T00:00:24.gz')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Log Preprocessing\n", + "\n", + "This section shows how to use the `policy` module to split the large GPFS log into smaller parts and then convert each part to a separate parquet file for analysis elsewhere. You can specify different filepaths for the outputs of each stage if you want, but it's generally easier to let the functions use the standard log directory structure seen below:\n", + "\n", + "``` text\n", + ".\n", + "└── log_root/\n", + " ├── raw/\n", + " │ └── raw_log.gz\n", + " ├── chunks/\n", + " │ ├── list-000.gz\n", + " │ ├── list-001.gz\n", + " │ └── ...\n", + " ├── parquet/\n", + " │ ├── list-000.parquet\n", + " │ ├── list-001.parquet\n", + " │ └── ...\n", + " ├── reports/\n", + " └── misc/\n", + "```\n", + "\n", + "The directory structure will be automatically created as needed by each function. It's generally easier to not specify output paths for consistency in organization." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Split" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO: Checking for pigz\n", + "INFO: pigz found at /home/mdefende/bin/pigz\n", + "INFO: 120 logs found. Beginning compression\n" + ] + } + ], + "source": [ + "# Split raw log\n", + "policy.split(log=raw_log)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Convert\n", + "\n", + "Opposed to split, it's much faster to submit the parquet conversion as an array job (built into the `cli` submodule), but it's possible to do here via the multiprocessing library as well." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "from multiprocessing import Pool\n", + "from rc_gpfs.utils import parse_scontrol\n", + "\n", + "cores,_ = parse_scontrol()\n", + "split_files = list(gpfs_log_root.joinpath('chunks').glob('*.gz'))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with Pool(cores) as pool:\n", + " pool.map(policy.convert,split_files)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Aggregate" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "pq_path = gpfs_log_root.joinpath('parquet')\n", + "delta_vals = range(0,210,30)\n", + "delta_unit = 'D'\n", + "report_name = 'agg_by_tld_atime.parquet'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When choosing what sort of compute we want to use, we can let the package infer the backend based on the in-memory dataset size as well as the available compute resources, but it's easier in this case to specify that we want to use `dask_cuda`. Backend initialization happens later in the notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# Compute backend setup\n", + "with_cuda = True\n", + "with_dask = True" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "ename": "TypeCheckError", + "evalue": "argument \"delta_vals\" (range) did not match any element in the union:\n int: is not an instance of int\n list[int]: is not a list\n NoneType: is not an instance of NoneType", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeCheckError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mprocess\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43maggregate_gpfs_dataset\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdataset_path\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpq_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdelta_vals\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdelta_vals\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdelta_unit\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdelta_unit\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mreport_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mreport_name\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwith_cuda\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mwith_cuda\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwith_dask\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mwith_dask\u001b[49m\u001b[43m)\u001b[49m\n", + "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/rc_gpfs/process/process.py:56\u001b[0m, in \u001b[0;36maggregate_gpfs_dataset\u001b[0;34m(dataset_path, run_date, delta_vals, delta_unit, time_val, report_dir, report_name, n_workers, with_cuda, with_dask, **kwargs)\u001b[0m\n\u001b[1;32m 49\u001b[0m \u001b[38;5;28;01mpass\u001b[39;00m\n\u001b[1;32m 51\u001b[0m \u001b[38;5;66;03m# ENH: Refactor this, it should not automatically create the compute backend, that should be passed to it. Creating the \u001b[39;00m\n\u001b[1;32m 52\u001b[0m \u001b[38;5;66;03m# backend is beyond the scope of the process module and this function specifically. Can wrap this with the backend \u001b[39;00m\n\u001b[1;32m 53\u001b[0m \u001b[38;5;66;03m# creation in a CLI command later for convenience, but that shouldn't be baked in\u001b[39;00m\n\u001b[1;32m 55\u001b[0m \u001b[38;5;129m@typechecked\u001b[39m\n\u001b[0;32m---> 56\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21maggregate_gpfs_dataset\u001b[39m(\n\u001b[1;32m 57\u001b[0m dataset_path: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m Path,\n\u001b[1;32m 58\u001b[0m run_date: pd\u001b[38;5;241m.\u001b[39mTimestamp \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 59\u001b[0m delta_vals: \u001b[38;5;28mint\u001b[39m \u001b[38;5;241m|\u001b[39m \u001b[38;5;28mlist\u001b[39m[\u001b[38;5;28mint\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 60\u001b[0m delta_unit: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mD\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mW\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mM\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mY\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 61\u001b[0m time_val: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124maccess\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmodify\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcreate\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124maccess\u001b[39m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 62\u001b[0m report_dir: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m Path \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 63\u001b[0m report_name: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m Path \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 64\u001b[0m n_workers: \u001b[38;5;28mint\u001b[39m \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 65\u001b[0m with_cuda: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28mbool\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 66\u001b[0m with_dask: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28mbool\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 67\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs\n\u001b[1;32m 68\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 69\u001b[0m \u001b[38;5;66;03m# Input checking\u001b[39;00m\n\u001b[1;32m 70\u001b[0m dataset_path \u001b[38;5;241m=\u001b[39m _check_dataset_path(dataset_path)\n\u001b[1;32m 71\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m dataset_path\u001b[38;5;241m.\u001b[39mis_file():\n", + "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/typeguard/_functions.py:137\u001b[0m, in \u001b[0;36mcheck_argument_types\u001b[0;34m(func_name, arguments, memo)\u001b[0m\n\u001b[1;32m 134\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exc\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 137\u001b[0m \u001b[43mcheck_type_internal\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mannotation\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmemo\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m TypeCheckError \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[1;32m 139\u001b[0m qualname \u001b[38;5;241m=\u001b[39m qualified_name(value, add_class_prefix\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n", + "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/typeguard/_checkers.py:946\u001b[0m, in \u001b[0;36mcheck_type_internal\u001b[0;34m(value, annotation, memo)\u001b[0m\n\u001b[1;32m 944\u001b[0m checker \u001b[38;5;241m=\u001b[39m lookup_func(origin_type, args, extras)\n\u001b[1;32m 945\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m checker:\n\u001b[0;32m--> 946\u001b[0m \u001b[43mchecker\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43morigin_type\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmemo\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 947\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[1;32m 949\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m isclass(origin_type):\n", + "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/typeguard/_checkers.py:454\u001b[0m, in \u001b[0;36mcheck_uniontype\u001b[0;34m(value, origin_type, args, memo)\u001b[0m\n\u001b[1;32m 451\u001b[0m \u001b[38;5;28;01mfinally\u001b[39;00m:\n\u001b[1;32m 452\u001b[0m \u001b[38;5;28;01mdel\u001b[39;00m errors \u001b[38;5;66;03m# avoid creating ref cycle\u001b[39;00m\n\u001b[0;32m--> 454\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m TypeCheckError(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdid not match any element in the union:\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;132;01m{\u001b[39;00mformatted_errors\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", + "\u001b[0;31mTypeCheckError\u001b[0m: argument \"delta_vals\" (range) did not match any element in the union:\n int: is not an instance of int\n list[int]: is not a list\n NoneType: is not an instance of NoneType" + ] + } + ], + "source": [ + "process.aggregate_gpfs_dataset(dataset_path=pq_path, delta_vals=delta_vals, delta_unit=delta_unit, report_name=report_name, with_cuda=with_cuda, with_dask=with_dask)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} -- GitLab From 2238dc415846b9ef1f66963c70e40a2b0c7f37da Mon Sep 17 00:00:00 2001 From: Matthew K Defenderfer <mdefende@uab.edu> Date: Fri, 10 Jan 2025 01:48:11 -0600 Subject: [PATCH 2/4] add missing plotly dependency --- deps.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/deps.yml b/deps.yml index 6024fa3..d7f0098 100644 --- a/deps.yml +++ b/deps.yml @@ -182,6 +182,7 @@ dependencies: - pillow=11.0.0=py311h49e9ac3_0 - pip=24.3.1=pyh8b19718_2 - platformdirs=4.3.6=pyhd8ed1ab_1 + - plotly=5.24.1=pyhd8ed1ab_1 - prompt-toolkit=3.0.48=pyha770c72_1 - psutil=6.1.0=py311h9ecbd09_0 - pthread-stubs=0.4=hb9d3cd8_1002 @@ -215,6 +216,7 @@ dependencies: - spdlog=1.14.1=hed91bc2_1 - stack_data=0.6.3=pyhd8ed1ab_1 - tblib=3.0.0=pyhd8ed1ab_1 + - tenacity=9.0.0=pyhd8ed1ab_1 - tk=8.6.13=noxft_h4845f30_101 - toolz=1.0.0=pyhd8ed1ab_1 - tornado=6.4.2=py311h9ecbd09_0 -- GitLab From 32d6b172877a4573294a514c16aed4fff7cbb8bd Mon Sep 17 00:00:00 2001 From: Matthew K Defenderfer <mdefende@uab.edu> Date: Fri, 10 Jan 2025 02:22:04 -0600 Subject: [PATCH 3/4] bugfix: fix case where exponent could exceed 4 and trigger an index error when choosing the proper storage unit --- src/rc_gpfs/report/plotting.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/rc_gpfs/report/plotting.py b/src/rc_gpfs/report/plotting.py index 73c4c12..968bcf1 100644 --- a/src/rc_gpfs/report/plotting.py +++ b/src/rc_gpfs/report/plotting.py @@ -18,7 +18,7 @@ def choose_appropriate_storage_unit(size,starting_unit='B'): except (ValueError): raise(f"{starting_unit} is not a valid storage unit. Choose from 'B','kB','MB','GB', or 'TB'") - while ((size/1024) >= 1) & (exp <= 4): + while ((size/1024) >= 1) & (exp < 4): size = size/1024 exp += 1 return exp,units[exp] -- GitLab From 8786103f08e2b47c68d28f99657f090778ec8e87 Mon Sep 17 00:00:00 2001 From: Matthew K Defenderfer <mdefende@uab.edu> Date: Fri, 10 Jan 2025 02:31:06 -0600 Subject: [PATCH 4/4] finish off basic storage analysis, still missing commentary --- gpfs-aggregation.ipynb | 1063 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 1031 insertions(+), 32 deletions(-) diff --git a/gpfs-aggregation.ipynb b/gpfs-aggregation.ipynb index d7f3797..7b6392a 100644 --- a/gpfs-aggregation.ipynb +++ b/gpfs-aggregation.ipynb @@ -38,19 +38,20 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "# Import the three main Python API submodules for analyzing a report\n", - "from rc_gpfs import process, policy, report" + "from rc_gpfs import process, policy\n", + "from rc_gpfs.report import plotting" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -96,19 +97,9 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO: Checking for pigz\n", - "INFO: pigz found at /home/mdefende/bin/pigz\n", - "INFO: 120 logs found. Beginning compression\n" - ] - } - ], + "outputs": [], "source": [ "# Split raw log\n", "policy.split(log=raw_log)" @@ -155,13 +146,13 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "pq_path = gpfs_log_root.joinpath('parquet')\n", - "delta_vals = range(0,210,30)\n", - "delta_unit = 'D'\n", + "delta_vals = list(range(0,21,6))\n", + "delta_unit = 'M'\n", "report_name = 'agg_by_tld_atime.parquet'" ] }, @@ -174,7 +165,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -185,28 +176,1036 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 28, "metadata": {}, "outputs": [ { - "ename": "TypeCheckError", - "evalue": "argument \"delta_vals\" (range) did not match any element in the union:\n int: is not an instance of int\n list[int]: is not a list\n NoneType: is not an instance of NoneType", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mTypeCheckError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[16], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mprocess\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43maggregate_gpfs_dataset\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdataset_path\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpq_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdelta_vals\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdelta_vals\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdelta_unit\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdelta_unit\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mreport_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mreport_name\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwith_cuda\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mwith_cuda\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwith_dask\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mwith_dask\u001b[49m\u001b[43m)\u001b[49m\n", - "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/rc_gpfs/process/process.py:56\u001b[0m, in \u001b[0;36maggregate_gpfs_dataset\u001b[0;34m(dataset_path, run_date, delta_vals, delta_unit, time_val, report_dir, report_name, n_workers, with_cuda, with_dask, **kwargs)\u001b[0m\n\u001b[1;32m 49\u001b[0m \u001b[38;5;28;01mpass\u001b[39;00m\n\u001b[1;32m 51\u001b[0m \u001b[38;5;66;03m# ENH: Refactor this, it should not automatically create the compute backend, that should be passed to it. Creating the \u001b[39;00m\n\u001b[1;32m 52\u001b[0m \u001b[38;5;66;03m# backend is beyond the scope of the process module and this function specifically. Can wrap this with the backend \u001b[39;00m\n\u001b[1;32m 53\u001b[0m \u001b[38;5;66;03m# creation in a CLI command later for convenience, but that shouldn't be baked in\u001b[39;00m\n\u001b[1;32m 55\u001b[0m \u001b[38;5;129m@typechecked\u001b[39m\n\u001b[0;32m---> 56\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21maggregate_gpfs_dataset\u001b[39m(\n\u001b[1;32m 57\u001b[0m dataset_path: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m Path,\n\u001b[1;32m 58\u001b[0m run_date: pd\u001b[38;5;241m.\u001b[39mTimestamp \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 59\u001b[0m delta_vals: \u001b[38;5;28mint\u001b[39m \u001b[38;5;241m|\u001b[39m \u001b[38;5;28mlist\u001b[39m[\u001b[38;5;28mint\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 60\u001b[0m delta_unit: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mD\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mW\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mM\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mY\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 61\u001b[0m time_val: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124maccess\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmodify\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcreate\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124maccess\u001b[39m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 62\u001b[0m report_dir: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m Path \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 63\u001b[0m report_name: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m Path \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 64\u001b[0m n_workers: \u001b[38;5;28mint\u001b[39m \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 65\u001b[0m with_cuda: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28mbool\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 66\u001b[0m with_dask: Literal[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m|\u001b[39m \u001b[38;5;28mbool\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124minfer\u001b[39m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 67\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs\n\u001b[1;32m 68\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 69\u001b[0m \u001b[38;5;66;03m# Input checking\u001b[39;00m\n\u001b[1;32m 70\u001b[0m dataset_path \u001b[38;5;241m=\u001b[39m _check_dataset_path(dataset_path)\n\u001b[1;32m 71\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m dataset_path\u001b[38;5;241m.\u001b[39mis_file():\n", - "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/typeguard/_functions.py:137\u001b[0m, in \u001b[0;36mcheck_argument_types\u001b[0;34m(func_name, arguments, memo)\u001b[0m\n\u001b[1;32m 134\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exc\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 137\u001b[0m \u001b[43mcheck_type_internal\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mannotation\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmemo\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m TypeCheckError \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[1;32m 139\u001b[0m qualname \u001b[38;5;241m=\u001b[39m qualified_name(value, add_class_prefix\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n", - "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/typeguard/_checkers.py:946\u001b[0m, in \u001b[0;36mcheck_type_internal\u001b[0;34m(value, annotation, memo)\u001b[0m\n\u001b[1;32m 944\u001b[0m checker \u001b[38;5;241m=\u001b[39m lookup_func(origin_type, args, extras)\n\u001b[1;32m 945\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m checker:\n\u001b[0;32m--> 946\u001b[0m \u001b[43mchecker\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43morigin_type\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmemo\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 947\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[1;32m 949\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m isclass(origin_type):\n", - "File \u001b[0;32m~/.conda/envs/gpfs/lib/python3.11/site-packages/typeguard/_checkers.py:454\u001b[0m, in \u001b[0;36mcheck_uniontype\u001b[0;34m(value, origin_type, args, memo)\u001b[0m\n\u001b[1;32m 451\u001b[0m \u001b[38;5;28;01mfinally\u001b[39;00m:\n\u001b[1;32m 452\u001b[0m \u001b[38;5;28;01mdel\u001b[39;00m errors \u001b[38;5;66;03m# avoid creating ref cycle\u001b[39;00m\n\u001b[0;32m--> 454\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m TypeCheckError(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdid not match any element in the union:\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;132;01m{\u001b[39;00mformatted_errors\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", - "\u001b[0;31mTypeCheckError\u001b[0m: argument \"delta_vals\" (range) did not match any element in the union:\n int: is not an instance of int\n list[int]: is not a list\n NoneType: is not an instance of NoneType" + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO: 120 parquet files found in dataset\n", + "cores: 32 \n", + "mem: 256.0 \n", + "ngpus: 1 \n", + "vram: 80.0\n", + "Successfully imported 'dask_cuda' into the main namespace as 'dask_cuda'.\n", + "Successfully imported 'dask.dataframe' into the main namespace as 'dd'.\n", + "Executed hook: dask.config.set with args: ({'dataframe.backend': 'cudf'},) and kwargs: {}\n", + "INFO: Only dask.dataframe has been loaded automatically. All other dask modules must be loaded manually in order to use them.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:bokeh.server.util:Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO: Closing Dask client and cluster\n" ] } ], "source": [ "process.aggregate_gpfs_dataset(dataset_path=pq_path, delta_vals=delta_vals, delta_unit=delta_unit, report_name=report_name, with_cuda=with_cuda, with_dask=with_dask)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plotting" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.read_parquet(gpfs_log_root.joinpath('reports',report_name))\n", + "df.columns = ['tld','dt_grp','bytes','file_count']\n", + "agg = df.groupby('dt_grp',observed=True)[['bytes','file_count']].sum().reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "exp,unit = plotting.choose_appropriate_storage_unit(agg['bytes'])\n", + "agg[unit] = agg['bytes']/(1024**exp)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "agg[['file_count_cum',f'{unit}_cum']] = agg[['file_count',unit]].cumsum()\n", + "agg[[unit,f'{unit}_cum']] = agg[[unit,f'{unit}_cum']].round(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "marker": { + "color": "#636EFA" + }, + "name": "Raw", + "text": [ + "5.86", + "737.41", + "433.43", + "274.48", + "1,907.07" + ], + "textposition": "outside", + "type": "bar", + "x": [ + "<0M", + "0M-6M", + "6M-12M", + "12M-18M", + ">18M" + ], + "y": [ + 5.856, + 737.407, + 433.425, + 274.477, + 1907.067 + ] + }, + { + "marker": { + "color": "#EF553B" + }, + "mode": "lines+markers+text", + "name": "Cumulative", + "text": [ + null, + "743.26", + "1,176.69", + "1,451.16", + "3,358.23" + ], + "textposition": "top center", + "type": "scatter", + "x": [ + "<0M", + "0M-6M", + "6M-12M", + "12M-18M", + ">18M" + ], + "y": [ + 5.856, + 743.263, + 1176.688, + 1451.165, + 3358.232 + ] + } + ], + "layout": { + "hovermode": false, + "margin": { + "b": 20, + "l": 40, + "r": 40, + "t": 100 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "white", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "white", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "#C8D4E3", + "linecolor": "#C8D4E3", + "minorgridcolor": "#C8D4E3", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "#C8D4E3", + "linecolor": "#C8D4E3", + "minorgridcolor": "#C8D4E3", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + "histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "white", + "showlakes": true, + "showland": true, + "subunitcolor": "#C8D4E3" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "white", + "polar": { + "angularaxis": { + "gridcolor": "#EBF0F8", + "linecolor": "#EBF0F8", + "ticks": "" + }, + "bgcolor": "white", + "radialaxis": { + "gridcolor": "#EBF0F8", + "linecolor": "#EBF0F8", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "white", + "gridcolor": "#DFE8F3", + "gridwidth": 2, + "linecolor": "#EBF0F8", + "showbackground": true, + "ticks": "", + "zerolinecolor": "#EBF0F8" + }, + "yaxis": { + "backgroundcolor": "white", + "gridcolor": "#DFE8F3", + "gridwidth": 2, + "linecolor": "#EBF0F8", + "showbackground": true, + "ticks": "", + "zerolinecolor": "#EBF0F8" + }, + "zaxis": { + "backgroundcolor": "white", + "gridcolor": "#DFE8F3", + "gridwidth": 2, + "linecolor": "#EBF0F8", + "showbackground": true, + "ticks": "", + "zerolinecolor": "#EBF0F8" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "#DFE8F3", + "linecolor": "#A2B1C6", + "ticks": "" + }, + "baxis": { + "gridcolor": "#DFE8F3", + "linecolor": "#A2B1C6", + "ticks": "" + }, + "bgcolor": "white", + "caxis": { + "gridcolor": "#DFE8F3", + "linecolor": "#A2B1C6", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "#EBF0F8", + "linecolor": "#EBF0F8", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "#EBF0F8", + "zerolinewidth": 2 + }, + "yaxis": { + "automargin": true, + "gridcolor": "#EBF0F8", + "linecolor": "#EBF0F8", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "#EBF0F8", + "zerolinewidth": 2 + } + } + }, + "title": { + "font": { + "size": 24 + }, + "x": 0.5, + "xanchor": "center" + }, + "xaxis": { + "title": {} + }, + "yaxis": { + "title": {} + } + } + }, + "text/html": [ + "<div> <div id=\"12792956-d50c-4269-bc9c-d71798e414de\" class=\"plotly-graph-div\" style=\"height:525px; width:100%;\"></div> <script type=\"text/javascript\"> require([\"plotly\"], function(Plotly) { window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById(\"12792956-d50c-4269-bc9c-d71798e414de\")) { Plotly.newPlot( \"12792956-d50c-4269-bc9c-d71798e414de\", [{\"marker\":{\"color\":\"#636EFA\"},\"name\":\"Raw\",\"text\":[\"5.86\",\"737.41\",\"433.43\",\"274.48\",\"1,907.07\"],\"textposition\":\"outside\",\"x\":[\"\\u003c0M\",\"0M-6M\",\"6M-12M\",\"12M-18M\",\"\\u003e18M\"],\"y\":[5.856,737.407,433.425,274.477,1907.067],\"type\":\"bar\"},{\"marker\":{\"color\":\"#EF553B\"},\"mode\":\"lines+markers+text\",\"name\":\"Cumulative\",\"text\":[null,\"743.26\",\"1,176.69\",\"1,451.16\",\"3,358.23\"],\"textposition\":\"top center\",\"x\":[\"\\u003c0M\",\"0M-6M\",\"6M-12M\",\"12M-18M\",\"\\u003e18M\"],\"y\":[5.856,743.263,1176.688,1451.165,3358.232],\"type\":\"scatter\"}], {\"template\":{\"data\":{\"barpolar\":[{\"marker\":{\"line\":{\"color\":\"white\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"barpolar\"}],\"bar\":[{\"error_x\":{\"color\":\"#2a3f5f\"},\"error_y\":{\"color\":\"#2a3f5f\"},\"marker\":{\"line\":{\"color\":\"white\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"bar\"}],\"carpet\":[{\"aaxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"#C8D4E3\",\"linecolor\":\"#C8D4E3\",\"minorgridcolor\":\"#C8D4E3\",\"startlinecolor\":\"#2a3f5f\"},\"baxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"#C8D4E3\",\"linecolor\":\"#C8D4E3\",\"minorgridcolor\":\"#C8D4E3\",\"startlinecolor\":\"#2a3f5f\"},\"type\":\"carpet\"}],\"choropleth\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"choropleth\"}],\"contourcarpet\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"contourcarpet\"}],\"contour\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"contour\"}],\"heatmapgl\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"heatmapgl\"}],\"heatmap\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"heatmap\"}],\"histogram2dcontour\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"histogram2dcontour\"}],\"histogram2d\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"histogram2d\"}],\"histogram\":[{\"marker\":{\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"histogram\"}],\"mesh3d\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"mesh3d\"}],\"parcoords\":[{\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"parcoords\"}],\"pie\":[{\"automargin\":true,\"type\":\"pie\"}],\"scatter3d\":[{\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatter3d\"}],\"scattercarpet\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattercarpet\"}],\"scattergeo\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattergeo\"}],\"scattergl\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattergl\"}],\"scattermapbox\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattermapbox\"}],\"scatterpolargl\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterpolargl\"}],\"scatterpolar\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterpolar\"}],\"scatter\":[{\"fillpattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2},\"type\":\"scatter\"}],\"scatterternary\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterternary\"}],\"surface\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"surface\"}],\"table\":[{\"cells\":{\"fill\":{\"color\":\"#EBF0F8\"},\"line\":{\"color\":\"white\"}},\"header\":{\"fill\":{\"color\":\"#C8D4E3\"},\"line\":{\"color\":\"white\"}},\"type\":\"table\"}]},\"layout\":{\"annotationdefaults\":{\"arrowcolor\":\"#2a3f5f\",\"arrowhead\":0,\"arrowwidth\":1},\"autotypenumbers\":\"strict\",\"coloraxis\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"colorscale\":{\"diverging\":[[0,\"#8e0152\"],[0.1,\"#c51b7d\"],[0.2,\"#de77ae\"],[0.3,\"#f1b6da\"],[0.4,\"#fde0ef\"],[0.5,\"#f7f7f7\"],[0.6,\"#e6f5d0\"],[0.7,\"#b8e186\"],[0.8,\"#7fbc41\"],[0.9,\"#4d9221\"],[1,\"#276419\"]],\"sequential\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"sequentialminus\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]},\"colorway\":[\"#636efa\",\"#EF553B\",\"#00cc96\",\"#ab63fa\",\"#FFA15A\",\"#19d3f3\",\"#FF6692\",\"#B6E880\",\"#FF97FF\",\"#FECB52\"],\"font\":{\"color\":\"#2a3f5f\"},\"geo\":{\"bgcolor\":\"white\",\"lakecolor\":\"white\",\"landcolor\":\"white\",\"showlakes\":true,\"showland\":true,\"subunitcolor\":\"#C8D4E3\"},\"hoverlabel\":{\"align\":\"left\"},\"hovermode\":\"closest\",\"mapbox\":{\"style\":\"light\"},\"paper_bgcolor\":\"white\",\"plot_bgcolor\":\"white\",\"polar\":{\"angularaxis\":{\"gridcolor\":\"#EBF0F8\",\"linecolor\":\"#EBF0F8\",\"ticks\":\"\"},\"bgcolor\":\"white\",\"radialaxis\":{\"gridcolor\":\"#EBF0F8\",\"linecolor\":\"#EBF0F8\",\"ticks\":\"\"}},\"scene\":{\"xaxis\":{\"backgroundcolor\":\"white\",\"gridcolor\":\"#DFE8F3\",\"gridwidth\":2,\"linecolor\":\"#EBF0F8\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"#EBF0F8\"},\"yaxis\":{\"backgroundcolor\":\"white\",\"gridcolor\":\"#DFE8F3\",\"gridwidth\":2,\"linecolor\":\"#EBF0F8\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"#EBF0F8\"},\"zaxis\":{\"backgroundcolor\":\"white\",\"gridcolor\":\"#DFE8F3\",\"gridwidth\":2,\"linecolor\":\"#EBF0F8\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"#EBF0F8\"}},\"shapedefaults\":{\"line\":{\"color\":\"#2a3f5f\"}},\"ternary\":{\"aaxis\":{\"gridcolor\":\"#DFE8F3\",\"linecolor\":\"#A2B1C6\",\"ticks\":\"\"},\"baxis\":{\"gridcolor\":\"#DFE8F3\",\"linecolor\":\"#A2B1C6\",\"ticks\":\"\"},\"bgcolor\":\"white\",\"caxis\":{\"gridcolor\":\"#DFE8F3\",\"linecolor\":\"#A2B1C6\",\"ticks\":\"\"}},\"title\":{\"x\":0.05},\"xaxis\":{\"automargin\":true,\"gridcolor\":\"#EBF0F8\",\"linecolor\":\"#EBF0F8\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"#EBF0F8\",\"zerolinewidth\":2},\"yaxis\":{\"automargin\":true,\"gridcolor\":\"#EBF0F8\",\"linecolor\":\"#EBF0F8\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"#EBF0F8\",\"zerolinewidth\":2}}},\"title\":{\"font\":{\"size\":24},\"x\":0.5,\"xanchor\":\"center\"},\"margin\":{\"t\":100,\"b\":20,\"l\":40,\"r\":40},\"xaxis\":{\"title\":{}},\"yaxis\":{\"title\":{}},\"hovermode\":false}, {\"responsive\": true} ).then(function(){\n", + " \n", + "var gd = document.getElementById('12792956-d50c-4269-bc9c-d71798e414de');\n", + "var x = new MutationObserver(function (mutations, observer) {{\n", + " var display = window.getComputedStyle(gd).display;\n", + " if (!display || display === 'none') {{\n", + " console.log([gd, 'removed!']);\n", + " Plotly.purge(gd);\n", + " observer.disconnect();\n", + " }}\n", + "}});\n", + "\n", + "// Listen for the removal of the full notebook cells\n", + "var notebookContainer = gd.closest('#notebook-container');\n", + "if (notebookContainer) {{\n", + " x.observe(notebookContainer, {childList: true});\n", + "}}\n", + "\n", + "// Listen for the clearing of the current output cell\n", + "var outputEl = gd.closest('.output');\n", + "if (outputEl) {{\n", + " x.observe(outputEl, {childList: true});\n", + "}}\n", + "\n", + " }) }; }); </script> </div>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plotting.pareto_chart(agg,x='dt_grp',y=unit)" + ] } ], "metadata": { -- GitLab