FY 2026 Project Planning
Triaging and prioritizing RCS projects for FY26.
Please add comments to this issue rather than editing directly.
MUST >> WANT >> NICE
- TOP priority
- DOCS: Update facilities document https://github.com/uabrc/uabrc.github.io/issues/780 (and more, talk to William)
- CHEAHA: Migrate Head Node from GPFS4 to GPFS5 cheaha#62 (Slurm shutdown required https://github.com/uabrc/devops-docs/issues/70)
- CHEAHA: Update Cheaha OS cheaha#55
- CHEAHA: Cheaha software builds via CICD (next tier priority) #679
- STORAGE: GPFS5/Ceph Core shared quotas #632
- CHEAHA: intel-dcb nodes made available on GPFS5 for internal priority (windfall?) use (facilitates calculating shared quotas via policy runs) gpfs-policy#61
- STORAGE: Implement scratch policy on GPFS5 cheaha#34
- STORAGE: Cutover apps/modules from GPFS4 to GPFS5 gpfs5-migration#208
- STORAGE: Auto-hydrate all files <= 4 MB into performance tier (resolves conda/git hydration issues)
- GLOBUS: Migrate Shared Collections from GPFS4 to GPFS5 Mapped Collection #695
- REDCAP: Redcap deploy from SOM to OpenStack
- REDCAP: Install Redcap OpenStack extension
- STORAGE: Fix LTS-Globus bug #608
- PUBLICATION: Good 2025 conference paper
- PEARC 2026 paper (Feb 9 due date, unlikely to hit this date) and (May 4 due date for short paper, no changes needed)
- Supercomputing 2026 paper (April 1 due date, unlikely to be accepted due to topic without tech details)
- MUST haves (requirements)
- XNAT: Resolve ongoing data management issues (Update XNAT)
- GITLAB: Update GitLab OS gitlab#12
- SYSTEM: Disaggregate LDAP service from Master Node
- CHEAHA: Disaggregate Slurm service from Master Node
- STORAGE: Ceph Block/LTS/FS (not GPFS) shared quotas and accounting
- CHEAHA: Update Slurm
- CHEAHA: Update Lmod (EasyBuild 5 will require Lmod >=8)
- OOD
- CHEAHA: Update OOD (can get to 3.x)
- CHEAHA: --ntasks-per-socket on OOD apps for correct multi-GPU usage (test OOD 3.x update first)
- CHEAHA: Node hardware info in slurm.conf (features) rc-data-science/metrics/rc-hardware#2 #230
- CHEAHA: Deprecate Anaconda3 module in favor of conda-forge (license compliance)
- CHEAHA: Add Makevars.site file to R modules on Cheaha to manage "Illegal Instruction" errors #708
- COMMUNICATION: Email sending automation (no more Outlook) #674
- SECURITY: SSO web service alignment everywhere for MFA and security compliance
- SECURITY: Get information from XIAS for security
- (for person X, what is sponsor BlazerID? is X expired? what is X site(s)? is site expired? is URI in site?)
- SECURITY: Security (POAM) milestones (docs, procedures, etc)
- SECURITY: HIPAA attestation
- SECURITY: NIST 800-171 attestation other than NIH (compliant for NIH grants, DUAs go through OSP, limited to RCS only)
- SECURITY: Automated account state management (this is very broad) #669
- SECURITY: Enable and require 2FA for all systems (includes SSH)
- RCS: Resource entitlement and eligibility tracking (who can have/has which shared allocations) #637
- BRANDING: Branding all relevant sites (docs, OOD, cloud.rc, etc) #624 #556
- ADMIN: Internal staff onboarding procedure(s)
- WANT to haves (higher priority)
- LTS: Unify individual allocation labels to match blazerid@uab.edu format.
- CHEAHA: SAS Enterprise Guide OOD app cluster-software#139
- CHEAHA: Install most recent SAS
- CHEAHA: Install most recent STATA
- CHEAHA: Prevent/auto-revert Cheaha TLD permission changes
- CHEAHA: Refine experience for TMP and LOCAL dirs on Cheaha nodes #678
- CHEAHA: Cheaha Slurm QoS, priority calculation, etc, review rc-slurm#61 cheaha#44
- CHEAHA: Refine old modules - ties into rebuild CICD #554 (closed) #566 #480 #460 #432 (closed)
- CHEAHA: Limit the number of queued OOD jobs per user #673 comment 1
- DOCS: Migrate docs to GitLab Pages gitlab#7
- NETWORK: Network upgrade beyond 10G link to campus, beyond 40G to scidmz
- METRICS: Cheaha job-specific/process-specific observability #661
- METRICS: Data/Storage observability #661
- RCS: Services status page #627
- STORAGE: Globus Azure Blob Connector #685
- STORAGE: Separate LTS from Core Ceph
- STORAGE: Shared, read-only research data allocation (e.g. alphafold) #680 #654
- STORAGE: Standardize /data/project/ #625 #633 #162 (closed)
- SECURITY: Cheaha observability for security #661
- SECURITY: RCS Account Statemachine and RabbitMQ refactoring rabbitmq_agents#163
- SECURITY: RCS Account Database (user reg db) enhancements #393 #400 #82 #410 #141 #642
- SECURITY: Restrict access to info about users, groups, slurm jobs, etc. cheaha#57
- TRAINING: Researcher training initiatives
- UX: RCS main page
- UX: Expose user reg app content (writing/theme) to facilitation via GitLab account-app#17 #556
- UX: Expose OOD content (writing/theme) to facilitation via GitLab rc-ood-message#1 #556
- UX: Expose OpenStack Horizon content (writing/theme) to facilitation via GitLab #624 #556
- NICE to haves (lower priority)
- PRODUCTIVITY: Pastebin and etherpad as RC apps
- self-hosted pastebin: https://github.com/awesome-selfhosted/awesome-selfhosted#pastebins
- self-hosted
- CHEAHA: Update Singularity #546 cluster-software#115
- CHEAHA: Resolve poor researcher UX around pip and conda behaving unexpectedly #485
- CHEAHA: Internal routing SSH key dedicated storage location
- DOCS: Docs AI chatbot courtesy of UAB Copilot
- DOCS: Docs observability and analytics
- METRICS: Cheaha observability for buy-in model #626 #471
- METRICS: Cloud.rc observability " #661
- METRICS: Network observability " #661
- METRICS: Buffered node exporter data locally for robustness
- FUNDING: Cheaha compute buy-in model and implementation #626 #637
- FUNDING: Cloud.rc buy-in model and implementation #672 #637
- FUNDING: Storage buy-in model and implementation (1 TiB is about 10% larger than 1 TB, this matters for dollar costs!) #637
- STORAGE: Offsite data center backup solution and buy-in model (MUST if funded)
- STORAGE: Globus Azure Blob connector for backup and archive solution and buy-in model (MUST if funded)
- STORAGE: Project-parallel scratch directories #653
- STORAGE: Improve LTS Globus UX to handle multiple keys (individual, lab shared, core shared, ...) #684
- SECURITY: CMMC compliant enclave
- PRODUCTIVITY: pastebinit command and self-hosted pastebin equivalent
- INFOrmational
- SECURITY: IDM Federation (what is our portion?)
Completed
- STORAGE: Globus configured for GPFS5 (no issue) #694 (closed)
- STORAGE: New accounts created on GPFS5
- GLOBUS: Understand and implement shared administration group and duties in Globus.
- GITLAB: Migrate data to GPFS5 gitlab#15
- RCS: RCS now a fully realized platform.
Edited by Fortune Iriaye