As an AlphaFold user on Cheaha, I want the AlphaFold genetic databases to be available in a shared location so I don't have to download 2.5 TB myself.
Problem
The AlphaFold genetic databases are about 2.5 TB uncompressed. Every AlphaFold researcher must pull this data into their personal space. This takes time and consumes substantial storage (currently estimated at about 20-30 TB across Cheaha).
Solution
Create a shared storage location for common research data, such as the AlphaFold genetic databases. This will reduce redundant storage consumption and increase researcher velocity, since researchers no longer need to download the data before starting work.
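As a rough illustration of the storage argument, the figures above can be turned into a back-of-the-envelope estimate. This is a sketch only: it assumes the entire 20-30 TB estimate consists of full, uncompressed 2.5 TB copies, which the estimate itself does not state. The function names are hypothetical.

```python
# Figures taken from the Problem section; everything else is an assumption.
DB_SIZE_TB = 2.5                    # one uncompressed copy of the databases
ESTIMATED_TOTAL_TB = (20.0, 30.0)   # estimated current usage across Cheaha


def implied_copies(total_tb: float, db_size_tb: float = DB_SIZE_TB) -> float:
    """Number of full copies implied by a total, assuming only full copies."""
    return total_tb / db_size_tb


def savings_tb(total_tb: float, db_size_tb: float = DB_SIZE_TB) -> float:
    """Storage reclaimed if every personal copy is replaced by one shared copy."""
    return total_tb - db_size_tb


for total in ESTIMATED_TOTAL_TB:
    print(
        f"{total:.0f} TB in use: ~{implied_copies(total):.0f} copies, "
        f"~{savings_tb(total):.1f} TB reclaimable"
    )
```

Under that assumption, the estimate corresponds to roughly 8 to 12 personal copies, and a single shared copy would free roughly 17.5 to 27.5 TB.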
Proposal
In approximate order of execution.
- Create /data/project/shared-data/ shared storage allocation.
- Create /data/project/shared-data/alphafold/ directory and install genetic databases.
- Assign chmod o=r recursively.
- Create process for researcher maintenance support requests and internal periodic maintenance.
- Create process for researcher "add this new data to shared-data" support requests.
- Publish how-to guide on RF docs covering all aspects of availability, usage, support, maintenance.
- Announce availability.
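The permission step above could be sketched as follows. One caveat worth noting: a literal o=r on directories would leave them unreadable in practice, because entering or listing a directory also requires the execute (search) bit. The sketch therefore assumes files get o=r and directories get o=rx, which is close to what `chmod -R o=rX` does in GNU chmod. The function names are hypothetical, and owner and group bits are deliberately left alone.

```python
import os
import stat
from pathlib import Path


def share_read_only(root: Path) -> None:
    """Give 'other' users read-only access to everything under root.

    Files get o=r; directories get o=rx so they can be entered and listed.
    Owner and group permission bits are not changed.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        _set_other_bits(current, stat.S_IROTH | stat.S_IXOTH)
        for name in filenames:
            _set_other_bits(current / name, stat.S_IROTH)


def _set_other_bits(path: Path, bits: int) -> None:
    """Replace the three 'other' permission bits of path with bits."""
    if path.is_symlink():
        return  # skip symlinks; chmod would act on the link target
    mode = stat.S_IMODE(path.stat().st_mode)
    path.chmod((mode & ~stat.S_IRWXO) | bits)
```

It would be called once by whichever account owns the data, for example `share_read_only(Path("/data/project/shared-data"))`, and again after each database update.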
Open Questions
- Should we have this be under the purview of the build user? A new shared-data user? I believe the latter is preferable.
- Who is responsible for maintenance? Support? The data itself? Ensuring access controls? This all sounds like the job of a data steward.
- What other access controls should be considered?
- What about shared data that we are allowed to rehost, but that requires individual acceptance of terms (no examples yet)?
- Anything else?
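Whoever ends up responsible for access controls, part of the periodic maintenance could be an automated permission audit. The following is a minimal sketch of one possible check, not an agreed process: it assumes the policy is "readable by others, never writable by others", and the function name and problem labels are hypothetical.

```python
import os
import stat
from pathlib import Path


def audit_shared_tree(root: Path) -> list[tuple[Path, str]]:
    """Return (path, problem) pairs for entries that violate the assumed policy.

    Assumed policy: everything is readable by others, nothing is writable by
    others, and directories are searchable (o+x) so they can be traversed.
    """
    problems: list[tuple[Path, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        entries = [(current, True)] + [(current / n, False) for n in filenames]
        for path, is_dir in entries:
            if path.is_symlink():
                continue  # symlinks carry no meaningful permissions of their own
            mode = path.stat().st_mode
            if mode & stat.S_IWOTH:
                problems.append((path, "writable by others"))
            if not mode & stat.S_IROTH:
                problems.append((path, "not readable by others"))
            if is_dir and not mode & stat.S_IXOTH:
                problems.append((path, "directory not searchable by others"))
    return problems
```

An empty result means the tree matches the assumed policy; any returned entries could feed the maintenance support process described in the proposal.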