|
|
## Current Communication
|
|
|
|
|
|
|
|
|
To our Research Computing Community,
|
|
|
|
|
|
UAB Research Computing has good news to share! We have installed 40 A100 GPUs across 20 nodes, 2 GPUs per node, for immediate use by our research community. To get started quickly, use the partitions `amperenodes` or `amperenodes-medium`. For more information about changes, known limitations, how to make the most of the A100 GPUs, and hardware details, please read on.
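As a minimal sketch of getting started, here is a Slurm batch script requesting one A100 on the `amperenodes` partition. The job name, resource amounts, and time limit below are illustrative placeholders; please check our documentation for the actual limits of each partition.

```bash
#!/bin/bash
#SBATCH --job-name=a100-test      # illustrative job name
#SBATCH --partition=amperenodes   # or amperenodes-medium; see docs for time limits
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8         # example values; tune to your workload
#SBATCH --mem=64G
#SBATCH --gres=gpu:1              # request one of the two A100s on a node
#SBATCH --time=02:00:00

module load CUDA/12.2.0
nvidia-smi                        # confirm the GPU is visible to the job
```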
|
|
|
|
|
|
**Changes to CUDA software**
|
|
|
|
|
|
|
|
|
To use the latest version of CUDA, please run `module load CUDA/12.2.0`. To use the latest version of cuDNN, please run `module load cuDNN/12.1.0`. For more information, please [see our documentation](https://docs.rc.uab.edu/cheaha/slurm/gpu/#cuda-modules).
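As a quick check that the modules load cleanly, the following can be run in an interactive session. If these exact module names change in the future, `module avail` lists the installed versions.

```bash
# Load the CUDA toolkit and cuDNN modules named above
module load CUDA/12.2.0
module load cuDNN/12.1.0

# List installed CUDA versions, in case names change later
module avail CUDA

# Confirm the compiler is on PATH and report its version
which nvcc
nvcc --version
```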
|
|
|
|
|
|
**Hardware Specification**
|
|
|
|
|
|
|
|
|
Each A100 node has two A100 GPUs, each with 80 GB of memory. The nodes also have 128 cores split across two CPU dies and 512 GB of main system memory. 6 TB of NVMe storage is available in a striped (RAID 0) configuration for I/O performance. For more information, please [see our hardware page](https://docs.rc.uab.edu/cheaha/hardware/#summary) and [GPU page](https://docs.rc.uab.edu/cheaha/slurm/gpu/#available-devices). Please also read about how to [ensure IO performance](https://docs.rc.uab.edu/cheaha/slurm/gpu/#ensuring-io-performance-with-a100-gpus).
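To illustrate the IO guidance linked above, here is a sketch of staging data onto the node-local NVMe drives before training. The local path, dataset name, and training script are assumptions for illustration only; the linked IO performance page has the exact path and recommendations for our systems.

```bash
#!/bin/bash
#SBATCH --partition=amperenodes
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00

# Assumed node-local NVMe location; see the IO performance page
# linked above for the exact path on our systems.
LOCAL_DIR="/local/$SLURM_JOB_ID"
mkdir -p "$LOCAL_DIR"

# Stage input data from network storage to the fast local NVMe
cp -r "$HOME/datasets/my_dataset" "$LOCAL_DIR/"

# Hypothetical training script reading from and writing to local storage
python train.py --data "$LOCAL_DIR/my_dataset" --out "$LOCAL_DIR/results"

# Copy results back to network storage before the job ends
cp -r "$LOCAL_DIR/results" "$HOME/results/"
```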
|
|
|
|
|
|
**Known Limitations**
|
|
|
|
|
|
For TensorFlow users: we are investigating how to make the TensorRT library available as a module. Until then, you may see warnings in TensorFlow that TensorRT was not found. The absence of TensorRT may or may not impact performance, but the warning does not prevent model training or affect the quality of trained models.
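If you want to confirm that TensorFlow still sees the GPUs despite the TensorRT warning, a one-line check from within your TensorFlow environment (for example, inside an `amperenodes` job) is:

```bash
# Prints the visible GPU devices; an empty list means no GPU was found.
# The TensorRT warning may still appear on import and can be ignored.
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```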
|
|
|
|
|
|
**Further Reading**
|
|
|
|
|
|
- [A100 and `amperenodes` FAQ](https://docs.rc.uab.edu/cheaha/slurm/gpu/#frequently-asked-questions-faq-about-a100-gpus)
|
|
|
- [Cheaha GPU Documentation Page](https://docs.rc.uab.edu/cheaha/slurm/gpu/#gpus)
|
|
|
- [Ensuring IO Performance with A100 GPUs](https://docs.rc.uab.edu/cheaha/slurm/gpu/#ensuring-io-performance-with-a100-gpus)
|
|
|
- [CUDA Module Changes](https://docs.rc.uab.edu/cheaha/slurm/gpu/#cuda-modules)
|
|
|
|
|
|
**Questions and Concerns**
|
|
|
|
|
|
If you have any questions or concerns, please reply to this email to create a support ticket, or email <support@listserv.uab.edu>.
|
|
|
|
|
|
Thank you!
|
|
|
|
|
|
The UAB Research Computing Team
|
|
|
|
|
|
## After this point...
|
|
|
|
|
|
**All of the following changes have been made to the docs and elsewhere**
|
|
|
|
|
|
|
|
|
|
|
|
**Questions and Answers**:
|