欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

【生信分析笔记】GATK GenotypeConcordance:Sequence dictionaries are not the same size (25, 195)

程序员文章站 2024-03-02 08:06:22
...

分析时需要将数据与GIAB金标准集进行比较,GenotypeConcordance有报错:

my script:
time /gatk/4.1.7.0/gatk GenotypeConcordance \
--CALL_VCF myvcf.gz \
--TRUTH_VCF /GIAB/ftp-trace.ncbi.nih.gov/giab/ftp/release/AshkenazimTrio/HG002_NA24385_son/latest/GRCh38/HG002_GRCh38_GIAB_highconf_CG-Illfb-IllsentieonHC-Ion-10XsentieonHC-SOLIDgatkHC_CHROM1-22_v.3.3.2_highconf_triophased.vcf.gz \
--CALL_SAMPLE HG002 \
--TRUTH_SAMPLE HG002 \
--IGNORE_FILTER_STATUS true \
--OUTPUT HG002.GenotypeCocordance \
&& echo " HG002 GenotypeCocordance done" &

ERROR:
Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (25, 195) 

 

Sequence dictionaries are not the same size (25, 195) 

emmmm ...看见25这个数字就想到了chr1-22 X,Y,M 是25个chorm信息,检查了一下myvcf.gz 和标准级里header信息:

myvcf.gz header  里包括chr14_GL000194v1_random这类线粒体及其他片段,共195个

而标准级里header只有chr1-22 X,Y,M

$less myvcf.gz|grep "ID=chr" |wc -l
195

所以只需要把myvcf里的ID=chr信息替换掉标准集里的ID=chr信息。

将造好的header.txt 用bcftools reheader 工具更换即可。

 

参考:

https://gatk.broadinstitute.org/hc/en-us/articles/360036882311-GenotypeConcordance-Picard-