ENCODE HiC Installation and Setup
Referencing ticket RITM0714040 for Manuel Rosa-Garrido. Manuel needs to run a modified version of the ENCODE HiC pipeline on Cheaha. Earlier debugging of the standard HiC pipeline on Cheaha was unfruitful, so a workstation was used instead; the workstation is not sufficient for the current analysis, so this version of the pipeline needed to be set up on Cheaha. This snippet describes the setup steps for running ENCODE's HiC pipeline on Cheaha.
name: encode
channels:
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- alsa-lib=1.2.3.2=h166bdaf_0
- bzip2=1.0.8=h5eee18b_6
- ca-certificates=2024.7.2=h06a4308_0
- cairo=1.16.0=hb05425b_5
- expat=2.6.2=h6a678d5_0
- fontconfig=2.14.1=h4c34cd2_2
- freetype=2.12.1=h4a9f257_0
- giflib=5.2.1=h5eee18b_3
- glib=2.78.4=h6a678d5_0
- glib-tools=2.78.4=h6a678d5_0
- graphite2=1.3.14=h295c915_1
- harfbuzz=2.8.1=h6f93f22_0
- icu=58.2=he6710b0_3
- jpeg=9e=h5eee18b_1
- lcms2=2.12=h3be6417_0
- ld_impl_linux-64=2.38=h1181459_1
- lerc=3.0=h295c915_0
- libdeflate=1.17=h5eee18b_1
- libffi=3.4.4=h6a678d5_1
- libgcc-ng=11.2.0=h1234567_1
- libglib=2.78.4=hdc74915_0
- libgomp=11.2.0=h1234567_1
- libiconv=1.16=h5eee18b_3
- libpng=1.6.39=h5eee18b_0
- libstdcxx-ng=11.2.0=h1234567_1
- libtiff=4.5.1=h6a678d5_0
- libuuid=1.41.5=h5eee18b_0
- libwebp-base=1.3.2=h5eee18b_0
- libxcb=1.15=h7f8727e_0
- libxml2=2.10.4=hcbfbd50_0
- lz4-c=1.9.4=h6a678d5_1
- ncurses=6.4=h6a678d5_0
- openjdk=11.0.9.1=h5cc2fde_1
- openssl=3.0.14=h5eee18b_0
- pcre2=10.42=hebb0a14_1
- pip=24.0=py311h06a4308_0
- pixman=0.40.0=h7f8727e_1
- python=3.11.9=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=69.5.1=py311h06a4308_0
- sqlite=3.45.3=h5eee18b_0
- tk=8.6.14=h39e8969_0
- wheel=0.43.0=py311h06a4308_0
- xorg-fixesproto=5.0=h7f98852_1002
- xorg-inputproto=2.3.2=h7f98852_1002
- xorg-kbproto=1.0.7=h7f98852_1002
- xorg-libx11=1.7.2=h7f98852_0
- xorg-libxext=1.3.4=h7f98852_1
- xorg-libxfixes=5.0.3=h7f98852_1004
- xorg-libxi=1.7.10=h7f98852_0
- xorg-libxrender=0.9.10=h7f98852_1003
- xorg-libxtst=1.2.3=h7f98852_1002
- xorg-recordproto=1.14.2=h7f98852_1002
- xorg-renderproto=0.11.1=h7f98852_1002
- xorg-xextproto=7.3.0=h7f98852_1002
- xorg-xproto=7.0.31=h27cfd23_1007
- xz=5.4.6=h5eee18b_1
- zlib=1.2.13=h5eee18b_1
- zstd=1.5.5=hc292b87_2
- pip:
- argcomplete==3.4.0
- autouri==0.4.4
- awscli==1.33.26
- boto3==1.34.144
- botocore==1.34.144
- bullet==2.2.0
- cachetools==5.4.0
- caper==2.3.2
- certifi==2024.7.4
- cffi==1.16.0
- charset-normalizer==3.3.2
- colorama==0.4.6
- coloredlogs==15.0.1
- contourpy==1.2.1
- cryptography==42.0.8
- cycler==0.12.1
- dateparser==1.2.0
- docker==7.1.0
- docutils==0.16
- filelock==3.15.4
- fonttools==4.53.1
- google-api-core==2.19.1
- google-auth==2.32.0
- google-cloud-core==2.4.1
- google-cloud-storage==2.17.0
- google-crc32c==1.5.0
- google-resumable-media==2.7.1
- googleapis-common-protos==1.63.2
- humanfriendly==10.0
- idna==3.7
- importlib-metadata==8.0.0
- jmespath==1.0.1
- joblib==1.4.2
- kiwisolver==1.4.5
- lark==1.1.9
- matplotlib==3.9.1
- miniwdl==1.12.0
- ntplib==0.4.0
- numpy==2.0.0
- packaging==24.1
- pandas==2.2.2
- pillow==10.4.0
- proto-plus==1.24.0
- protobuf==5.27.2
- psutil==5.9.8
- pyasn1==0.6.0
- pyasn1-modules==0.4.0
- pycparser==2.22
- pygtail==0.14.0
- pyhocon==0.3.61
- pyopenssl==24.1.0
- pyparsing==3.1.2
- python-dateutil==2.9.0.post0
- python-json-logger==2.0.7
- pytz==2024.1
- pyyaml==6.0.1
- regex==2024.5.15
- requests==2.32.3
- rsa==4.7.2
- s3transfer==0.10.2
- scikit-learn==1.5.1
- scipy==1.14.0
- six==1.16.0
- threadpoolctl==3.5.0
- tzdata==2024.1
- tzlocal==5.2
- urllib3==2.2.2
- xdg==6.0.0
- zipp==3.19.2
Clone HiC Repo
This step may not be strictly necessary, but running the bundled tests is a good way to get a feel for the pipeline's inputs and outputs.
# if using a git ssh key
git clone git@github.com:ENCODE-DCC/hic-pipeline.git
# otherwise, clone over https
git clone https://github.com/ENCODE-DCC/hic-pipeline.git
Environment Setup
The HiC pipeline is controlled by the Caper job manager. Caper is available on PyPI and only requires an additional installation of the Java Development Kit to function. Java can be installed via conda for ease of use, or the Java module can be loaded on Cheaha. The following commands set up an encode environment from the env.yml included in this snippet.
module load Anaconda3
conda env create -n encode -f env.yml
After the environment is created, you only need to activate it to run caper, both now and in the future:
conda activate encode
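To confirm the setup before moving on, a quick sanity check like the following can help. This is a sketch; it assumes the encode environment is active and only reports whether caper and java are on your PATH:

```shell
#!/usr/bin/env bash
# Sanity-check the encode environment: caper comes from pip, java from conda
# (or the Cheaha Java module). Both must be on PATH for the pipeline to run.
if command -v caper >/dev/null 2>&1; then
  caper --version
else
  echo "caper not found: is the encode environment active?"
fi
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "java not found: install via conda or load the Java module"
fi
```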
Caper Configuration
Some initial configuration is needed for Caper to run correctly. First, activate the conda environment using the command above. Then run the following:
caper init slurm
This will create Caper's configuration files. Edit the $HOME/.caper/default.conf file to contain the following:
backend=slurm
slurm-partition=amd-hdr100,long
slurm-leader-job-resource-param=-t 150:00:00 --mem 8G
local-loc-dir=/scratch/<username>/caper_cache
cromwell=/home/<username>/.caper/cromwell_jar/cromwell-82.jar
womtool=/home/<username>/.caper/womtool_jar/womtool-82.jar
Replace <username> with your actual username before running Caper. Caper does not appear to expand shell environment variables in these paths, so $USER_SCRATCH and $USER will not work.
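Since the placeholder must be filled in by hand, one way to avoid typos is to substitute it with sed after running caper init. This is a sketch that assumes the default config path shown above:

```shell
#!/usr/bin/env bash
# Fill in the <username> placeholder in the Caper config. Caper will not
# expand $USER itself, so substitute the literal value at edit time.
CONF="$HOME/.caper/default.conf"
if [ -f "$CONF" ]; then
  sed -i "s|<username>|$USER|g" "$CONF"
  grep "$USER" "$CONF"   # show the substituted lines for a visual check
else
  echo "run 'caper init slurm' first to create $CONF"
fi
```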
This default configuration submits all jobs to the amd-hdr100 or long partitions. All of the nodes in those partitions have enough resources to run each job in the test pipeline, but the partition choices may need to change based on the analysis.
Running Caper Tests
Change to the directory where you would like the outputs to be saved. This section assumes you have cloned the pipeline repository, are in its top-level directory, and are running the general HiC test.
caper hpc submit hic.wdl -i tests/functional/json/test_hic.json --singularity --leader-job-name test_encode_hic
This will submit a leader job that manages the other jobs in the pipeline. You can monitor the status of the child jobs using squeue -u $USER.
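For ongoing monitoring, something like the following can be run on a login node. This is a sketch; it assumes squeue and caper are on your PATH, and uses caper's hpc subcommand to show leader jobs:

```shell
#!/usr/bin/env bash
# Monitor the pipeline: squeue shows every child job submitted by the leader,
# while caper lists the leader jobs it manages.
if command -v squeue >/dev/null 2>&1; then
  squeue -u "$USER" --format="%.18i %.40j %.8T %.10M %R"
fi
if command -v caper >/dev/null 2>&1; then
  caper hpc list
fi
```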