Reducing data duplication on `/data/`
Some frequently-used, large research datasets have permissive licenses. We can avoid data duplication by mirroring these data on /data/
, publicly accessible, read-only.
A specific example is the AlphaFold genetic databases. If we had these in a central location, such as /data/public/alphafold
, then only 3 TB would be used, instead of 3 TB times the number of users using AlphaFold.