Attempt 1. Using output bams/vcfs from the small variant caller pipeline test datasets:
verifybamid exits with an error because it couldn't find any markers in that dataset.
Attempt 2. Using test bam file from verifybamid repo:
Our ref genome uses chr-prefixed contig names (chr20), but the bam file uses unprefixed names (20). The bam is also aligned to b37. verifybamid fails with the error "No reads found in any of the regions, exit!".
verifybamid exits with the error "Insufficient Available markers". Note that verifybamid's own testing uses just chr20 as the ref genome to get around this issue.
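The contig naming mismatch described above (chr20 vs 20) can be caught before running verifybamid. The following is a minimal sketch, assuming the contig names have already been extracted from the reference index and the bam header into plain lists; the function names are hypothetical and not part of verifybamid.

```python
def strip_chr(name: str) -> str:
    """Drop a leading 'chr' prefix, if present."""
    return name[3:] if name.startswith("chr") else name


def naming_mismatch(ref_contigs, bam_contigs) -> bool:
    """Return True when the two contig sets share no names as written,
    but do share names once the 'chr' prefix is ignored."""
    ref = set(ref_contigs)
    bam = set(bam_contigs)
    if ref & bam:
        # At least one contig matches exactly, so naming is compatible.
        return False
    ref_stripped = {strip_chr(c) for c in ref}
    bam_stripped = {strip_chr(c) for c in bam}
    return bool(ref_stripped & bam_stripped)


if __name__ == "__main__":
    # Hypothetical inputs mirroring the situation in Attempt 2.
    print(naming_mismatch(["chr20"], ["20"]))
```

This only flags the prefix difference. It does not detect that the bam is aligned to a different build (b37), which needs a separate check.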
Attempt 4. Use a sub-sample of 40x NA12878 bam
Sub-sample the 40x NA12878 bam to a small-ish fraction.
Failed. Tested verifybamid with bams subsampled at various levels (0.01%, 0.1%, 0.5%). It ran successfully with the 0.5% bam but failed with the others due to "Insufficient Available markers". The 0.5% bam works, but at 660 MB it is too large for a test dataset. PS - The full GRCh38 reference genome was used as the reference here.
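For a rough sense of why the smaller fractions run out of markers, the nominal depth left after subsampling can be estimated. This is a back-of-the-envelope sketch, assuming reads are dropped uniformly so that depth scales linearly with the kept fraction. It gives roughly 0.004x, 0.04x and 0.2x for the three levels tested. The function name is hypothetical.

```python
def effective_coverage(full_coverage: float, percent_kept: float) -> float:
    """Nominal depth after keeping `percent_kept` percent of the reads,
    assuming uniform subsampling (an assumption, not a measured value)."""
    return full_coverage * percent_kept / 100.0


if __name__ == "__main__":
    # Subsampling levels from Attempt 4, applied to a 40x bam.
    for pct in (0.01, 0.1, 0.5):
        print(f"{pct}% of 40x -> {effective_coverage(40, pct):.3f}x")
```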
The Somalier vcf doesn't have a sample column, which leads to an error when it is used with bcftools stats. So I switched to a test dataset provided by bcftools instead.
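Whether a vcf carries sample columns can be checked from its #CHROM header line: the first eight columns are fixed, and when genotype data is present a FORMAT column follows, then one column per sample. A minimal sketch, assuming an uncompressed vcf read as text lines; the function name is hypothetical.

```python
def sample_names(vcf_lines):
    """Return the sample names from the #CHROM header line.
    An empty list means the vcf is sites-only (no sample column)."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            cols = line.rstrip("\n").split("\t")
            # Columns 0-7 are fixed, column 8 is FORMAT, samples follow.
            return cols[9:]
    raise ValueError("no #CHROM header line found")


if __name__ == "__main__":
    # Hypothetical sites-only header, similar in shape to the case above.
    header = ["##fileformat=VCFv4.2\n",
              "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
    print(sample_names(header))
```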
Capture-region bed files are needed to support exome samples. Having unrelated bam and vcf files that each cover different genomic regions could spell trouble. So I switched to new test bams and vcfs, which were derived from NA12878 by subsampling in the same genomic region. A test capture-regions bed file was also created.
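One way to confirm that the test vcf and the capture-regions bed agree is to check that variant positions fall inside the bed intervals. A minimal sketch, assuming a plain three-column bed (0-based, half-open intervals) and 1-based vcf positions; the function names and the example coordinates are hypothetical.

```python
def parse_bed(lines):
    """Parse bed lines into (chrom, start, end) tuples, skipping
    blank lines, comments, and track/browser lines."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_regions(chrom, pos, regions):
    """True if a 1-based position lies in any 0-based half-open interval."""
    return any(c == chrom and s < pos <= e for c, s, e in regions)


if __name__ == "__main__":
    # Hypothetical capture region and variant position.
    regions = parse_bed(["chr20\t100\t200\n"])
    print(in_regions("chr20", 150, regions))
```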