2vcf reference v3.0 released

the v3.0 release of the 2vcf reference is, as always, our best attempt yet at balancing coverage against accuracy and efficiency. this release takes a targeted approach to refining the list of included reference sites. rather than starting from the publicly available Illumina manifests for the genotyping arrays used by the consumer genotyping companies, we took real call sets from users and compiled a list of the sites that actually appear in them. we make no claim that the reference is complete, since some sites are deliberately left out, but we do claim that the reference VCF is as small as we can make it while still giving good coverage of the 23andme and Ancestry.com marker sets.
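as a rough illustration of the observed-sites approach, the python sketch below builds a site list as the union of rsids seen across users' raw genotype files. it is a hypothetical sketch, not the 2vcf build code: the function names observed_rsids and compile_sites are made up, and it assumes each export is tab-separated with the marker ID in the first column and '#' comment lines, roughly the shape of 23andme and Ancestry.com downloads.

```python
import io
from typing import Iterable, Set, TextIO


def observed_rsids(raw_file: TextIO) -> Set[str]:
    """Collect the dbSNP rsids listed in one raw genotype export.

    Assumes tab-separated rows with the marker ID in the first column and
    comment lines starting with '#'.
    """
    seen: Set[str] = set()
    for line in raw_file:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        marker_id = line.split("\t", 1)[0].strip()
        # keep only dbSNP-style IDs ("rs" + digits); this drops a bare
        # "rsid" column-header row and vendor-internal IDs such as "i500"
        if marker_id.startswith("rs") and marker_id[2:].isdigit():
            seen.add(marker_id)
    return seen


def compile_sites(call_sets: Iterable[TextIO]) -> Set[str]:
    """Union of the rsids observed across many users' call sets."""
    sites: Set[str] = set()
    for raw_file in call_sets:
        sites |= observed_rsids(raw_file)
    return sites


# made-up rows for illustration only; IDs, positions and genotypes are not real data
user_a = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs111\t1\t1000\tAA\n"
    "i500\t1\t5000\tCC\n"
)
user_b = io.StringIO(
    "#hypothetical comment line\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs111\t1\t1000\tA\tA\n"
    "rs222\t2\t2000\tG\tT\n"
)
print(sorted(compile_sites([user_a, user_b])))  # ['rs111', 'rs222']
```

this sketch counts any listed rsid as observed; a real pipeline would also have to decide whether a row carrying a no-call still counts.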

download v3 from https://openb.io/2vcf/2vcf-v2.0.vcf.gz

wget https://openb.io/2vcf/2vcf-v2.0.vcf.gz

the version 3.0 reference, like the v2.0 reference, is based on the dbSNP build 151 VCF. the reference contains 1,006,190 loci across 25 contigs.

contig   marker count
1        80,376
2        80,791
3        65,915
4        58,058
5        58,936
6        66,479
7        53,733
8        51,765
9        45,063
10       52,686
11       49,940
12       49,279
13       37,676
14       32,182
15       29,788
16       31,777
17       28,228
18       29,389
19       19,853
20       24,959
21       14,052
22       14,723
X        27,239
Y        2,862
MT       441
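per-contig counts like the ones above can be reproduced from any VCF by tallying the CHROM column of its data lines. the python sketch below is a hypothetical helper, not part of 2vcf; count_markers_per_contig and count_vcf_gz are made-up names. it relies on python's gzip module, which can also read bgzip-compressed files because bgzip output is a series of standard gzip members.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_markers_per_contig(lines: Iterable[str]) -> Counter:
    """Count VCF data records per contig (the CHROM column)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # "##" meta-information lines and the "#CHROM" header
        counts[line.split("\t", 1)[0]] += 1
    return counts


def count_vcf_gz(path: str) -> Counter:
    """Count records per contig in a gzip-compressed VCF file."""
    # "rt" opens the gzip stream in text mode, decoding it line by line
    with gzip.open(path, "rt") as vcf:
        return count_markers_per_contig(vcf)


# made-up records for illustration only
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\trs111\tA\tG\t.\t.\t.\n",
    "1\t2000\trs222\tC\tT\t.\t.\t.\n",
    "MT\t300\trs333\tG\tA\t.\t.\t.\n",
]
print(dict(count_markers_per_contig(demo)))  # {'1': 2, 'MT': 1}
```

to check a downloaded file, call count_vcf_gz("2vcf-v2.0.vcf.gz"); if the file is the v3.0 reference, the per-contig totals should match the table above and sum to 1,006,190.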

there is a small class of markers for which 23andme includes calls but whose chromosome assignment disagrees with dbSNP. since we were unable to get help from 23andme or to make sense of the discrepancy ourselves, we excluded those sites from the reference. each excluded marker is listed below with both chromosome assignments.

RSID        23andme chromosome   dbSNP b151 chromosome
rs10106770  8                    2
rs1140961   1                    6
rs1140965   1                    3
rs11857958  15                   5
rs11861001  16                   4
rs11942835  4                    3
rs12043679  1                    13
rs12496398  3                    4
rs12804886  11                   8
rs12914236  15                   1
rs1347505   Y                    X
rs1347507   Y                    X
rs1435909   Y                    X
rs17863175  7                    15
rs2125843   8                    3
rs2129709   Y                    X
rs2215794   Y                    1
rs2220162   Y                    X
rs2229051   1                    6
rs2229625   2                    X
rs2352696   Y                    X
rs2433989   Y                    X
rs2437511   Y                    X
rs2452115   Y                    X
rs2452335   Y                    X
rs2496951   Y                    X
rs2522620   Y                    X
rs2522676   Y                    X
rs2524623   Y                    X
rs2524749   Y                    X
rs2524797   Y                    X
rs2524862   Y                    X
rs2525234   Y                    X
rs2557841   Y                    X
rs2558153   Y                    X
rs2562967   Y                    X
rs2563090   Y                    X
rs2563145   Y                    X
rs2563212   Y                    X
rs2563488   Y                    X
rs2563845   Y                    X
rs2563850   Y                    X
rs2574085   Y                    X
rs2574595   Y                    X
rs2578863   Y                    X
rs2580641   Y                    X
rs2750380   Y                    X
rs2750610   Y                    X
rs2750816   Y                    X
rs2751061   Y                    X
rs2751444   Y                    X
rs2751615   Y                    X
rs2751964   Y                    X
rs2754895   Y                    X
rs2754899   Y                    X
rs2754935   Y                    X
rs2760594   Y                    X
rs2766317   Y                    X
rs2771511   Y                    X
rs2771662   Y                    X
rs2771666   Y                    X
rs2774569   Y                    X
rs2882725   Y                    X
rs3021087   MT                   1
rs35680999  MT                   1
rs3749270   3                    6
rs3853041   Y                    X
rs401949    Y                    X
rs4714901   6                    3
rs61774271  5                    1
rs8896      MT                   1
rs9785828   Y                    8
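the exclusion rule for these anomalous markers amounts to a set difference: find every rsid whose 23andme chromosome differs from its dbSNP chromosome and drop it from the site list. the python sketch below is a hypothetical illustration, not the code used to build the reference. chromosome_mismatches and filter_reference are made-up names, and it assumes both sources have already been reduced to rsid-to-chromosome maps that spell chromosome names the same way (e.g. "MT" rather than a numeric code).

```python
from typing import Dict, Set


def chromosome_mismatches(vendor_chrom: Dict[str, str],
                          dbsnp_chrom: Dict[str, str]) -> Set[str]:
    """rsids whose vendor chromosome disagrees with dbSNP's.

    Markers absent from dbsnp_chrom are ignored here.
    """
    return {
        rsid
        for rsid, chrom in vendor_chrom.items()
        if rsid in dbsnp_chrom and dbsnp_chrom[rsid] != chrom
    }


def filter_reference(sites: Set[str],
                     vendor_chrom: Dict[str, str],
                     dbsnp_chrom: Dict[str, str]) -> Set[str]:
    """Drop every site with a vendor/dbSNP chromosome disagreement."""
    return sites - chromosome_mismatches(vendor_chrom, dbsnp_chrom)


# three rows taken from the table above, plus one hypothetical agreeing marker
vendor = {"rs10106770": "8", "rs1347505": "Y", "rs8896": "MT", "rs_hypothetical": "1"}
dbsnp = {"rs10106770": "2", "rs1347505": "X", "rs8896": "1", "rs_hypothetical": "1"}

print(sorted(chromosome_mismatches(vendor, dbsnp)))  # ['rs10106770', 'rs1347505', 'rs8896']
print(filter_reference(set(vendor), vendor, dbsnp))  # {'rs_hypothetical'}
```

keeping the mismatch detection separate from the filtering step makes it easy to print the anomalous markers for review before they are removed.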

if you have any insight into these anomalous markers, please file an issue at the 2vcf github site. we appreciate your help in improving the utility, and we welcome any other suggestions or issues you have with the tool.