We fairly often receive WGS samples with high mean GC content (as reported by qualimap), but it is not clear what causes the high GC. We also do not know what its consequences are for downstream analysis.
Note: This is not a QuaC issue; it concerns sample QC.
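Flagging these samples could be automated. The sketch below is hypothetical. It assumes that qualimap bamqc's `genome_results.txt` contains a line such as `GC percentage = 41.27%`, and the 45% cutoff is an arbitrary placeholder, since this issue does not define what counts as "high" GC. The function names (`parse_gc_percentage`, `flag_high_gc`) are also illustrative.

```python
from __future__ import annotations

import re

# Hypothetical cutoff: the issue does not define what counts as "high" GC.
HIGH_GC_THRESHOLD = 45.0

# Assumes genome_results.txt has a line like "     GC percentage = 41.27%".
GC_LINE = re.compile(r"GC percentage\s*=\s*([\d.]+)%")


def parse_gc_percentage(text: str) -> float | None:
    """Return the GC percentage found in genome_results.txt text, or None if absent."""
    match = GC_LINE.search(text)
    return float(match.group(1)) if match else None


def flag_high_gc(results: dict[str, str], threshold: float = HIGH_GC_THRESHOLD) -> list[str]:
    """Return sample IDs (sorted) whose reported GC percentage exceeds `threshold`."""
    flagged = []
    for sample, text in sorted(results.items()):
        gc = parse_gc_percentage(text)
        if gc is not None and gc > threshold:
            flagged.append(sample)
    return flagged


if __name__ == "__main__":
    # Made-up report snippets for two hypothetical samples.
    demo = {
        "sample_A": ">>>>>>> ACGT Content\n     GC percentage = 41.10%\n",
        "sample_B": ">>>>>>> ACGT Content\n     GC percentage = 48.75%\n",
    }
    print(flag_high_gc(demo))  # ['sample_B']
```

In practice, `results` would be filled by reading each sample's `genome_results.txt`. Samples missing the GC line are skipped rather than flagged.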
I looked at chr1 coverage for Musc*** samples with high GC. Their coverage varied along the length of the chromosome instead of staying near the expected scaled value of ~1.0. This indexcov figure shows only two high-GC samples (LW001647 and LW001654, both part of the Pad** samples), but the same pattern is common in other high-GC samples as well. LW001643, which has normal GC, is shown for reference; its coverage stays around ~1.0.
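The deviation from ~1.0 can be summarized numerically so samples can be compared without inspecting plots. This sketch makes several assumptions. It assumes per-bin scaled coverage is available as a tab-separated table with a `#chrom` header followed by `start`, `end` and one column per sample, similar to indexcov's per-bin BED output. It also uses a ±0.15 "normal" band that is an arbitrary placeholder, and the function names are hypothetical.

```python
from __future__ import annotations

import statistics


def parse_scaled_coverage(lines, sample: str, chrom: str = "chr1") -> list[float]:
    """Collect one sample's per-bin scaled coverage for one chromosome.

    Assumes a header line starting with '#' (e.g. '#chrom start end s1 s2 ...')
    precedes the data rows, with one column per sample.
    """
    col = None
    values = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#"):
            col = fields.index(sample)  # raises ValueError if sample is absent
            continue
        if col is None:
            raise ValueError("header line not found before data rows")
        if fields[0] == chrom:
            values.append(float(fields[col]))
    return values


def coverage_variability(scaled: list[float], tolerance: float = 0.15) -> dict[str, float]:
    """Summarize how far per-bin scaled coverage strays from the expected 1.0."""
    if not scaled:
        raise ValueError("no bins supplied")
    deviations = [abs(v - 1.0) for v in scaled]
    outside = sum(1 for d in deviations if d > tolerance)
    return {
        # Median absolute deviation from the expected scaled coverage of 1.0.
        "median_abs_dev_from_1": statistics.median(deviations),
        # Fraction of bins outside the hypothetical 1.0 +/- tolerance band.
        "frac_bins_outside": outside / len(scaled),
    }


if __name__ == "__main__":
    # Made-up rows for two hypothetical samples; not real indexcov output.
    rows = [
        "#chrom\tstart\tend\tsample_A\tsample_B",
        "chr1\t0\t16384\t1.01\t0.62",
        "chr1\t16384\t32768\t0.98\t1.40",
        "chr1\t32768\t49152\t1.03\t0.95",
        "chr1\t49152\t65536\t0.99\t1.30",
        "chr2\t0\t16384\t1.00\t2.00",
    ]
    for name in ("sample_A", "sample_B"):
        print(name, coverage_variability(parse_scaled_coverage(rows, name)))
```

The median is used instead of the standard deviation so that a few extreme bins (e.g. real CNVs or assembly gaps) do not dominate the summary.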
This coverage variability is also visible in the coverage-across-reference plots. The plots below were obtained from qualimap. Note how the coverage (red line) is shaky for the high-GC samples.
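One coarse, single-number proxy for "shaky" coverage is the coefficient of variation (standard deviation divided by mean). The sketch below is hypothetical. It assumes qualimap's `genome_results.txt` reports lines like `mean coverageData = 30.12X` and `std coverageData = 15.40X`. This summary loses all positional information, so it complements the plots rather than replacing them.

```python
from __future__ import annotations

import re

# Assumed line formats in qualimap bamqc's genome_results.txt, e.g.
#   "     mean coverageData = 30.12X"
#   "     std coverageData = 15.40X"
MEAN_RE = re.compile(r"mean coverageData\s*=\s*([\d.,]+)X")
STD_RE = re.compile(r"std coverageData\s*=\s*([\d.,]+)X")


def _extract(pattern: re.Pattern, text: str) -> float:
    """Return the first captured number for `pattern`, stripping thousands separators."""
    match = pattern.search(text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern.pattern}")
    return float(match.group(1).replace(",", ""))


def coverage_cv(text: str) -> float:
    """Coefficient of variation (std / mean) of coverage from genome_results.txt text."""
    mean = _extract(MEAN_RE, text)
    std = _extract(STD_RE, text)
    if mean == 0:
        raise ValueError("mean coverage is zero")
    return std / mean


if __name__ == "__main__":
    # Made-up report snippet; real files contain many more sections.
    report = (
        ">>>>>>> Coverage\n"
        "     mean coverageData = 30.00X\n"
        "     std coverageData = 15.00X\n"
    )
    print(coverage_cv(report))  # 0.5
```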
I wasn't very successful in finding literature on this topic. The indexcov paper highlights a sample with high coverage variability and notes that "samples like this one will have many spurious CNV calls"; however, it doesn't discuss the cause of the variability.
At the moment I think high GC content might not have a significant effect on small variant calling (though I'm not fully convinced!), but I expect it to cause issues with other types of variant calls. We need to revisit this topic at some point.
I was curious whether Musc** samples tended to have high %GC. I did some analysis, but the results don't support this notion (at least on a quick look). UAB samples do tend to have high %GC, but Pad*** samples don't.
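A formal check of whether one group of samples (e.g. one sequencing site versus another) has higher %GC than another could use a simple two-sided permutation test on the difference in means. This is a minimal, standard-library-only sketch. It is not the analysis in the Cheaha directory, and the group values in the usage example are made up. With small groups, confounders such as library prep or sequencing batch should be considered before reading much into the p-value.

```python
from __future__ import annotations

import random
import statistics


def permutation_test_mean_diff(
    group_a: list[float],
    group_b: list[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Two-sided permutation test on the difference in mean %GC between two groups.

    Returns (observed mean(a) - mean(b), p-value).
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sample")
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up %GC values for two hypothetical sites.
    site_x = [48.0, 49.0, 50.0, 51.0, 52.0]
    site_y = [40.0, 41.0, 42.0, 43.0, 44.0]
    print(permutation_test_mean_diff(site_x, site_y, n_permutations=2000))
```

A permutation test was chosen because it makes no normality assumption about per-sample %GC, which matters when each group has only a handful of samples.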
The code is on Cheaha: /data/project/worthey_lab/projects/experimental_pipelines/mana/small_tasks/qc_highGC