Actually, looking at the current configuration of the nodes, BrightCM provisioned one NVMe drive with / and the other OS partitions; the remaining NVMe and the 2 x 500G drives are unprovisioned:
```
[root@c0237 ~]# lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda             8:0    0 447.1G  0 disk
sdb             8:16   0 447.1G  0 disk
├─sdb1          8:17   0   200M  0 part
└─sdb2          8:18   0     2G  0 part
nvme0n1       259:1    0   2.9T  0 disk
├─nvme0n1p1   259:2    0    40G  0 part /
├─nvme0n1p2   259:3    0     6G  0 part /var
├─nvme0n1p3   259:4    0     6G  0 part /tmp
├─nvme0n1p4   259:5    0    12G  0 part [SWAP]
└─nvme0n1p5   259:6    0   2.9T  0 part /local
nvme1n1       259:0    0   2.9T  0 disk
```
Note that these nodes do not have hardware RAID. We can do software RAID.
Proposal:

- 2 x 500G drives: create a software RAID 0 (stripe) that would give the OS ~1TB
- 2 x 3TB NVMe drives: create a software RAID 0 (stripe) mounted as /local to provide ~6TB of local scratch (see the mdadm sketch after this list)
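A minimal sketch of how the two arrays might be created with mdadm, assuming the nodes have been re-provisioned so the NVMe drives no longer hold the OS (in practice BrightCM's disk setup for the node image would define the final layout; device names are taken from the lsblk output above):

```
# Stripe the two ~500G SSDs for the OS (~1TB usable).
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb

# Stripe the two 2.9T NVMe drives and mount the result as /local (~6TB scratch).
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mkfs.xfs /dev/md1
mkdir -p /local
mount /dev/md1 /local

# Persist the array definitions so they assemble at boot.
mdadm --detail --scan >> /etc/mdadm.conf
```

md0 would still need to be partitioned and formatted for the OS mounts; that part is best left to the node installer rather than done by hand.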
I'd like to move in the direction of /local as the base for locally attached disks. Then we can have things like /local/scratch and /local/home when that makes sense.
Ideally we will move towards /local/scratch/$USER and have Slurm clean that directory up at the end of each job.
That will require some scripting.
**From Mike**
@jpr Some thoughts on what will be needed to change the local scratch endpoint from /local to /local/scratch:
- Notify the users of the upcoming change
- Create a Slurm prolog script that creates /local/scratch/$USER/$JOBID on the assigned compute nodes if it doesn't exist (owned $USER:$GROUP with 700 permissions)
- Create a Slurm epilog script that clears out /local/scratch/$USER/$JOBID (a sketch of both scripts follows this list)
- Update the compute node images with /local/scratch and /local/home
- Update $LOCAL_SCRATCH to point to /local/scratch/$USER (it currently points to /scratch/local)
- Update (or delete?) the /scratch/local symlink to point to /local/scratch (it is currently symlinked to /local)
- Test to ensure serial, MPI, array, and other job types work properly
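A minimal sketch of what the prolog/epilog pair might look like, assuming they are wired up via Prolog= and Epilog= in slurm.conf and relying on the SLURM_JOB_USER and SLURM_JOB_ID variables that slurmd exports to those scripts (paths and error handling are illustrative only):

```bash
#!/bin/bash
# prolog.sh -- runs as root on each allocated node before the job starts.
# Creates /local/scratch/$USER/$JOBID owned by the job's user with 700 permissions.
SCRATCH_BASE=/local/scratch
USER_DIR="${SCRATCH_BASE}/${SLURM_JOB_USER}"
JOB_DIR="${USER_DIR}/${SLURM_JOB_ID}"

mkdir -p "${JOB_DIR}"
chown "${SLURM_JOB_USER}:$(id -gn "${SLURM_JOB_USER}")" "${USER_DIR}" "${JOB_DIR}"
chmod 700 "${USER_DIR}" "${JOB_DIR}"
```

```bash
#!/bin/bash
# epilog.sh -- runs as root on each allocated node after the job ends.
# Removes only the per-job directory created by the prolog.
if [[ -n "${SLURM_JOB_USER}" && -n "${SLURM_JOB_ID}" ]]; then
    rm -rf "/local/scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}"
fi
```

Removing only the per-job subdirectory, rather than the whole /local/scratch/$USER tree, avoids clobbering other jobs the same user may have running on that node.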
I've separated the conversation about the NVMe drives on the A100 nodes out of the original issue (Epics would be nice!) into this issue. In my opinion, we shouldn't let this issue block the A100 release, but we still need to work on it. I'll bring it up at the next planning meeting.
From my perspective, the main remaining task is to align all node configurations in terms of the user experience with local drives. The A100s are configured differently, but researchers shouldn't have to care where /local points. They should be able to change the partition, and nothing else, without their sbatch script breaking.
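As an illustration of that goal, a job script along these lines (the partition name, application, and file names are placeholders) should run unchanged on any partition, provided $LOCAL_SCRATCH resolves to the job's local scratch directory everywhere:

```bash
#!/bin/bash
#SBATCH --partition=amperenodes   # the only line a user should need to change
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Stage in, run, and stage out using the cluster-provided scratch variable;
# the script never references /local (or any node-specific path) directly.
cd "${LOCAL_SCRATCH}"
cp "${SLURM_SUBMIT_DIR}/input.dat" .
./my_app input.dat > results.out
cp results.out "${SLURM_SUBMIT_DIR}/"
```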