QuaC - First major review
QuaC pipeline has been modified to make it user-friendly and share interface similar to other pipelines at CGDS.
Note that one major deviation compared to our other pipeline is that QuaC uses recent snakemake version (6.0.5), and they are made available via conda environment. That is, we don't use snakemake
module from cheaha.
The following may need review at some level. Note: This list is not exhaustive.
-
Snakemake pipeline -
Documentation. I kept it simple; let me know if it needs more info.
Merge request reports
Activity
- Resolved by Brandon M Wilk
Test datasets are currently not included. I plan to add them later (#4 (closed)). Please let me know if you think they are important in this MR.
- Resolved by Brandon M Wilk
It may be a good idea to look at the output directory structure. We need to decide on where to store these results of this pipeline in long-term. That is, should these results be stored at project level in
worthey_labs
projects space, or somewhere outside of that setup.
- Resolved by Brandon M Wilk
Snakemake scripts in this repo are formatted using snakefmt with line length restriction as 110. There are few instances this linter makes weird decisions, but overall I like their formatting.
If you like their formatting, it may be a good idea to look into including it to utility-images. Since snakefmt is a a standalone tool and is easy to install via pip, it shouldn't be too hard to achieve this.
- Resolved by Brandon M Wilk
Singularity containerization is not used atm in QuaC. It may be a good idea to add them now. Initially I chose to not use singularity to keep it simple, and planned to use a simple shell script as the primary interface to snakemake pipeline (conceptually similar to what was implemented in variant annotation pipeline for Ditto).
But as QuaC became more complex, shell script interface wasn't sufficient. So I moved on to add the python "runner script" to interface to the pipeline, and therefore it shouldn't be difficult to incorporate containerization.
One beneficial side effect is that we could check out how recent snakemake version deals with conda + singularity issue - https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline/-/issues/37
- Resolved by Brandon M Wilk
I noticed that there's no mention of the use of
src/create_dummy_ped.py
anywhere. I know from discussion it's a stop-gap in place for generating pedigree files until we can start using Phenotips for that information. It looks like it needs a little clean up, formatting and maybe a simple CLI setup (nothing crazy) since it may be that we'll use it for at least a little while. Maybe just mention that it's there for help somewhere in the README?
- Resolved by Brandon M Wilk
- Resolved by Brandon M Wilk
@manag this is looking good and I'm done with the initial review! I did not try running everything yet knowing that some major things are going to change with a couple more MRs in the near future. If you think this is needed let me know.
mentioned in issue #6 (closed)
mentioned in commit 00cd5438