Ryan Randles Jones authored aaf31d04
Release Notes version - 1.0 (12/08/2020)
Cluster Analysis
This initial version creates the dataset, runs k-means clustering, and produces graphs to analyze how our users are utilizing the cluster.
Features included:
- User input to choose the date range of data to analyze
- User input to choose min and max values for ReqMemCPU, AllocCPUS, and Elapsed
- User input to choose how data is normalized: 0-1, log, or no normalization
- User input to choose min and max x and y axes for 2D histogram graphs
- Data on job counts for each density spot in the 2D histograms
- Summary statistics for each cluster, given as the count of jobs and the count of users per cluster
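As a rough illustration of the min/max filtering and normalization options listed above, here is a minimal sketch in plain Python. It is not the notebook's actual code: the helper names `within_bounds` and `normalize`, the sample values, and the choice of base-10 for the log option are all assumptions made for this example.

```python
import math


def within_bounds(rows, bounds):
    """Keep rows whose fields all fall inside the (min, max) pairs in bounds."""
    return [
        row for row in rows
        if all(lo <= row[field] <= hi for field, (lo, hi) in bounds.items())
    ]


def normalize(values, method="none"):
    """Return a normalized copy of values using '0-1', 'log', or 'none'."""
    if method == "none":
        return list(values)
    if method == "0-1":
        lo, hi = min(values), max(values)
        if hi == lo:
            # All values identical: map everything to 0 to avoid dividing by zero.
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]
    if method == "log":
        # Assumption: base-10 log; every value must be strictly positive.
        return [math.log10(v) for v in values]
    raise ValueError(f"unknown normalization method: {method!r}")


# Hypothetical sample jobs; units are whatever the dataset uses.
jobs = [
    {"ReqMemCPU": 4000, "AllocCPUS": 4, "Elapsed": 3600},
    {"ReqMemCPU": 64000, "AllocCPUS": 24, "Elapsed": 86400},
    {"ReqMemCPU": 1000, "AllocCPUS": 1, "Elapsed": 60},
]
bounds = {"ReqMemCPU": (0, 10000), "AllocCPUS": (1, 8), "Elapsed": (0, 7200)}

kept = within_bounds(jobs, bounds)
elapsed_scaled = normalize([job["Elapsed"] for job in kept], "0-1")
```

Filtering before normalizing matters for the 0-1 option, since the minimum and maximum are taken from whatever rows remain after the bounds are applied.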
Release Notes version - 1.1 Bug Fix (12/15/2020)
The dataset for completed jobs originally contained every job together with each of its job steps. This skewed the clustering graphs, because there were more data points than individual jobs run. The data is now pulled into the dataset using only job allocations (done with -X in the slurm2sql.slurm2sql command), so each row of the dataset is a different job.
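To show why job steps inflated the row count, the following sketch filters a small, made-up set of accounting rows down to allocations only. It assumes the usual Slurm convention that a step's JobID carries a dot suffix (for example `1001.batch` or `1001.0`). The actual fix was applied at import time with -X rather than by filtering afterwards, and the function name `allocations_only` is hypothetical.

```python
# Hypothetical accounting rows: two jobs, each with one or more job steps.
rows = [
    {"JobID": "1001", "AllocCPUS": 4},
    {"JobID": "1001.batch", "AllocCPUS": 4},
    {"JobID": "1001.0", "AllocCPUS": 4},
    {"JobID": "1002_1", "AllocCPUS": 1},        # array task, still an allocation
    {"JobID": "1002_1.batch", "AllocCPUS": 1},
]


def allocations_only(rows):
    """Drop job steps, which are identified here by a '.' in the JobID."""
    return [row for row in rows if "." not in row["JobID"]]


jobs = allocations_only(rows)
job_ids = [row["JobID"] for row in jobs]
```

In this sample, five rows describe only two jobs, which is the kind of overcounting that distorted the clusters.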
Release Notes version - 2.0 (12/22/2020)
Added summary stats for each cluster. This includes the count of jobs run and the count of users running those jobs for each of the four clusters.
- Summary statistics in the form of a table showing the job and user count for each cluster
- Data on stats for each density spot in the 2D histograms will come in another notebook, which will provide a deeper analysis of each 2D histogram for each cluster. That notebook should be released by the end of January 2021.
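The per-cluster summary described above can be sketched as a simple grouping step. This is an illustrative example only: it assumes each job record already carries a cluster label from the k-means step, and the field names `User` and `cluster`, the sample data, and the function `cluster_summary` are hypothetical.

```python
from collections import defaultdict


def cluster_summary(jobs):
    """Return {cluster: {"jobs": job count, "users": distinct user count}}."""
    job_counts = defaultdict(int)
    users = defaultdict(set)
    for job in jobs:
        label = job["cluster"]
        job_counts[label] += 1
        users[label].add(job["User"])
    return {
        label: {"jobs": job_counts[label], "users": len(users[label])}
        for label in sorted(job_counts)
    }


# Hypothetical labeled jobs; cluster labels 0-3 stand in for the four clusters.
labeled_jobs = [
    {"User": "alice", "cluster": 0},
    {"User": "alice", "cluster": 0},
    {"User": "bob", "cluster": 0},
    {"User": "bob", "cluster": 1},
    {"User": "carol", "cluster": 2},
    {"User": "alice", "cluster": 3},
]

summary = cluster_summary(labeled_jobs)

# Print a small table: one line per cluster.
for label, counts in summary.items():
    print(f"cluster {label}: {counts['jobs']} jobs, {counts['users']} users")
```

Counting distinct users with a set keeps a heavy user who submits many jobs from being counted more than once in a cluster.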