diff --git a/README.md b/README.md
index 437aca97319cdec4560949d9db8f1c38608ebecf..b53b7dd15708e32e659af1f843663e4ab7e3ee47 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,39 @@
 # SLURM Database Characterization
 
+## Data
+
+I got a database dump as my input data. With some light research, it looks like it *is* technically possible to read this directly into Python as a pandas DataFrame, but I don't want to do that because it seems pretty hairy.
+
+So...
+
+First things first, I decided to convert it to CSV.
+
+Based on https://blog.twineworks.com/converting-a-mysql-dump-to-csv-files-b5e92d7cc5dd
+
+I used:
+
+```
+cat slurmdb_backup_20190417_031528 | dos2unix | gawk -f split.awk
+```
+
+where `split.awk` is from https://gist.github.com/slawo-ch/894349427655d22398f825dc535a40f0#file-split-awk
+and `slurmdb_backup_20190417_031528` is the unzipped database dump.
+
+The conversion produces files that look something like this:
+
+```
+acct_coord_table.sql       slurm_cluster_assoc_table.csv              slurm_cluster_resv_table.csv              slurm_cluster_wckey_usage_month_table.sql
+acct_table.csv             slurm_cluster_assoc_table.sql              slurm_cluster_resv_table.sql              split.awk
+acct_table.sql             slurm_cluster_assoc_usage_day_table.csv    slurm_cluster_step_table.csv              table.csv
+clus_res_table.sql         slurm_cluster_assoc_usage_day_table.sql    slurm_cluster_step_table.sql              table_defs_table.csv
+cluster_table.csv          slurm_cluster_assoc_usage_hour_table.csv   slurm_cluster_suspend_table.sql           table_defs_table.sql
+cluster_table.sql          slurm_cluster_assoc_usage_hour_table.sql   slurm_cluster_usage_day_table.csv         table.sql
+convert_version_table.csv  slurm_cluster_assoc_usage_month_table.csv  slurm_cluster_usage_day_table.sql         tres_table.csv
+convert_version_table.sql  slurm_cluster_assoc_usage_month_table.sql  slurm_cluster_usage_hour_table.csv        tres_table.sql
+federation_table.sql       slurm_cluster_event_table.csv              slurm_cluster_usage_hour_table.sql        txn_table.csv
+header.sql                 slurm_cluster_event_table.sql              slurm_cluster_usage_month_table.csv       txn_table.sql
+qos_table.csv              slurm_cluster_job_table.csv                slurm_cluster_usage_month_table.sql       user_table.csv
+qos_table.sql              slurm_cluster_job_table.sql                slurm_cluster_wckey_table.sql             user_table.sql
+README.md                  slurm_cluster_last_ran_table.csv           slurm_cluster_wckey_usage_day_table.sql
+res_table.sql              slurm_cluster_last_ran_table.sql           slurm_cluster_wckey_usage_hour_table.sql
+```
\ No newline at end of file
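
As a quick sanity check on the conversion, something like the sketch below could count the records in each generated `.csv` file before anything is loaded into pandas. It is only a sketch: the function name `count_csv_rows` is made up for illustration, it assumes the files parse with the default dialect of Python's standard `csv` module, and it makes no assumption about whether a file has a header row (every record is counted).

```python
import csv
from pathlib import Path


def count_csv_rows(directory="."):
    """Return {filename: number of CSV records} for every .csv file in directory.

    Hypothetical helper: assumes the default csv dialect fits the converter's
    output, which the README does not confirm.
    """
    counts = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" lets the csv module handle line breaks inside quoted fields.
        with path.open(newline="", encoding="utf-8", errors="replace") as handle:
            counts[path.name] = sum(1 for _ in csv.reader(handle))
    return counts
```

Calling `count_csv_rows(".")` from the directory holding the converted files returns a dictionary keyed by file name, so a table that came out empty or suspiciously short is easy to spot. The `.sql` files are ignored by the `*.csv` glob.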