Commit e820c308 authored by Eesaan Atluri

Add documentation for xdmod

parent 919440d7
## XDMOD
Set up a basic OHPC cluster with a master node and a compute node by following the instructions below.
```
git clone --recursive git@gitlab.rc.uab.edu:atlurie/terraform-openstack.git xdmod-test
```
```
cd xdmod-test
```
```
git checkout feat-open-xdmod-terraform
```
```
cd CRI_XCBC
```
```
git remote add atlurie git@github.com:eesaanatluri/CRI_XCBC.git
```
```
git checkout feat-open-xdmod
```
```
terraform apply
```
After running a basic acceptance test (`srun hostname`) and confirming that your cluster is set up properly, import the Slurm accounting data from your production cluster into your dev cluster.
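As a rough pre-flight sketch, the snippet below checks that the command-line tools used in the rest of this guide can be found on the `PATH` of the head node. The helper name `missing_cmds` is hypothetical and not part of OHPC, Slurm, or XDMOD. It only confirms that the tools are installed and does not replace the `srun hostname` test.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Tools used later in this guide; no output means all of them were found.
missing_cmds srun sacct mysql slurmdbd xdmod-setup xdmod-ingestor
```

Tools that live in an `sbin` directory may only be visible to root, so a name printed here can still be available under `sudo`.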
### Steps to import slurm accounting data
- First, copy the Slurm accounting database dump to the master node.
```
rsync slurmdb_backup_20191014_032723.gz centos@164.111.161.164:~
```
- SSH into the head node.
```
ssh centos@164.111.161.164
```
- Decompress the database dump.
```
gunzip -c slurmdb_backup_20191014_032723.gz > slurmdb_backup_20191014_032723.sql
```
- Rename the table prefix in the dump file (here `slurm_cluster` becomes `ohpc`) so that the table names match those in the dev setup. The original file is kept with a `.bak` suffix.
```
perl -pi.bak -e 's/slurm_cluster/ohpc/g;' slurmdb_backup_20191014_032723.sql
```
- Stop the slurmdbd service before restoring the dump file into the new database.
```
sudo systemctl stop slurmdbd
```
- Delete the existing Slurm accounting database.
```
mysql -u root -e "drop database slurmdb;"
```
- Create a new database with the name defined in the Slurm config.
```
mysql -u root -e "create database slurmdb;"
```
- Restore the database from the modified backup file.
```
mysql slurmdb -u root < slurmdb_backup_20191014_032723.sql
```
- Start slurmdbd manually in the foreground with verbose logging.
```
sudo slurmdbd -Dvvvvv
```
- The log output should look similar to the following:
```
[2019-09-26T17:12:49.390] slurmdbd version 18.08.8 started
[2019-09-26T17:12:49.390] debug2: running rollup at Thu Sep 26 17:12:49 2019
[2019-09-26T17:12:49.391] debug2: Attempting to connect to localhost:1234
[2019-09-26T17:12:49.391] debug4: 0(as_mysql_usage.c:118) query
select hourly_rollup, daily_rollup, monthly_rollup from "ohpc_last_ran_table"
[2019-09-26T17:12:51.394] debug2: slurm_connect poll timeout: Connection timed out
[2019-09-26T17:12:51.394] debug2: Error connecting slurm stream socket at 172.20.0.24:6817: Connection timed out
[2019-09-26T17:12:51.394] debug4: got 0 commits
[2019-09-26T17:12:54.000] debug2: Opened connection 9 from 10.1.1.10
[2019-09-26T17:12:54.002] debug: REQUEST_PERSIST_INIT: CLUSTER:ohpc VERSION:8448 UID:202 IP:10.1.1.10 CONN:9
[2019-09-26T17:12:54.002] debug2: acct_storage_p_get_connection: request new connection 1
[2019-09-26T17:12:54.002] debug2: Attempting to connect to localhost:1234
[2019-09-26T17:13:00.878] debug2: DBD_CLUSTER_TRES: called for ohpc(1=1,2=1,3=0,4=1,5=1,6=0,7=0,8=0)
[2019-09-26T17:13:00.878] debug: ohpc has changed tres from 1=2814,2=26264390,3=0,4=120,5=2814,6=0,7=0,8=0 to 1=1,2=1,3=0,4=1,5=1,6=0,7=0,8=0
[2019-09-26T17:13:01.030] debug3: DBD_CLUSTER_TRES: cluster not registered
[2019-09-26T17:13:01.030] debug4: got 0 commits
[2019-09-26T17:13:26.522] Warning: Note very large processing time from hourly_rollup for ohpc: usec=37130621 began=17:12:49.391
[2019-09-26T17:13:34.709] Warning: Note very large processing time from daily_rollup for ohpc: usec=8187177 began=17:13:26.522
[2019-09-26T17:13:34.709] debug2: No need to roll cluster ohpc this month 1567296000 <= 1567296000
[2019-09-26T17:13:34.755] debug2: Got 1 of 1 rolled up
[2019-09-26T17:13:34.755] debug2: Everything rolled up
[2019-09-26T17:13:34.755] debug4: got 0 commits
```
- To check that the import succeeded, run the following command. If it did, you should see a list of job records.
```
sudo sacct --accounts <user account> -S <start date>
```
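The decompress, rename, and restore steps above can be collected into one script. The following is a minimal sketch, not a tested tool: the names `run`, `import_slurmdb`, and `DRY_RUN` are hypothetical, and the commands are the ones listed above. With `DRY_RUN=1` (the default in this sketch) each command is only printed, so nothing on the host is changed.

```shell
# Print commands instead of running them unless DRY_RUN is set to 0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

import_slurmdb() {
    # $1 = gzipped dump, $2 = table prefix in the dump, $3 = prefix in dev
    gz="$1"
    sql="${gz%.gz}.sql"
    run sh -c "gunzip -c '$gz' > '$sql'"
    run perl -pi.bak -e "s/$2/$3/g;" "$sql"
    run sudo systemctl stop slurmdbd
    run mysql -u root -e "drop database slurmdb;"
    run mysql -u root -e "create database slurmdb;"
    run sh -c "mysql slurmdb -u root < '$sql'"
}

import_slurmdb slurmdb_backup_20191014_032723.gz slurm_cluster ohpc
```

Starting slurmdbd and checking the result with `sacct` are left as manual steps, since the foreground log is worth reading before continuing.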
### Set the hierarchies in XDMOD
**Note:** Before you work with XDMOD data, drop and recreate the XDMOD databases through the XDMOD setup script: run `sudo xdmod-setup` and choose option 2 for database settings.
Copy the hierarchy CSV and the group-to-hierarchy mapping CSV from your system to the head node.
```
rsync ~/Downloads/depttoschoolmapping_1.csv centos@164.111.161.173:~
rsync ~/Downloads/cheahausers.csv centos@164.111.161.173:~
```
Import the hierarchy CSV into XDMOD.
```
sudo xdmod-import-csv -t hierarchy -i depttoschoolmapping_1.csv
```
After importing the hierarchy, you must provide a mapping from your user groups to the hierarchy items.
```
sudo xdmod-import-csv -t group-to-hierarchy -i cheahausers.csv
```
After importing this data, you must ingest it for the date range of any job data you have already shredded.
```
sudo xdmod-ingestor --start-date 2015-01-01 --end-date 2019-12-31
```
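Before running the imports above, it can help to sanity-check the CSV files. The sketch below assumes the hierarchy CSV has three columns (name, description, parent) and the group-to-hierarchy CSV has two (group, hierarchy item). That layout is an assumption here and should be confirmed against the Open XDMoD documentation for your version. The helper `check_csv_columns` is hypothetical, and it splits on every comma, so it will misreport rows that contain commas inside quoted fields.

```shell
# Hypothetical helper: report rows whose comma-separated field count
# differs from the expected one. Exits non-zero if any row is off.
# Assumes no commas inside quoted fields; blank lines are reported too.
check_csv_columns() {
    # $1 = expected number of columns, $2 = CSV file
    awk -F',' -v n="$1" '
        NF != n { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END     { exit bad }
    ' "$2"
}

# Example usage with the file names from this guide:
#   check_csv_columns 3 depttoschoolmapping_1.csv
#   check_csv_columns 2 cheahausers.csv
```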
### Shred and ingest slurm data into XDMOD
```
sudo xdmod-slurm-helper -r ohpc --start-time 2015-01-01T11:59:59 --end-time 2019-10-15T11:59:59
```
```
sudo xdmod-ingestor --start-date 2015-01-01 --end-date 2019-10-15
```
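The shredder and the ingestor should cover the same period, so a small guard on the date arguments can catch typos before a long run. This is a minimal sketch with a hypothetical helper name, `valid_date_range`. It checks only the `YYYY-MM-DD` shape and the ordering, not whether the date exists in the calendar.

```shell
# Hypothetical helper: succeed only if both arguments look like YYYY-MM-DD
# and the first is not later than the second. Calendar validity is not
# checked, so a value such as 2019-02-31 would still pass.
valid_date_range() {
    for d in "$1" "$2"; do
        case "$d" in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) return 1 ;;
        esac
    done
    # ISO dates sort lexicographically in chronological order.
    first="$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)"
    [ "$first" = "$1" ]
}

if valid_date_range 2015-01-01 2019-10-15; then
    echo "range ok"
fi
```

The same start and end values can then be passed to `xdmod-slurm-helper` (with a time of day appended) and to `xdmod-ingestor`.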
### References
Open XDMoD Documentation
[https://open.xdmod.org/8.1/index.html](https://open.xdmod.org/8.1/index.html)
The user manual
[https://xdmod.ccr.buffalo.edu/user_manual/](https://xdmod.ccr.buffalo.edu/user_manual/)
Campus Champions Talk on XDMOD
[https://www.youtube.com/watch?v=fO9BXd1z1xc&feature=youtu.be](https://www.youtube.com/watch?v=fO9BXd1z1xc&feature=youtu.be)
-------------------------