Stefan White, David Fredman*, Susanna Potter*, Evan Eichler#, Johan den Dunnen, Anthony Brookes*
Human and Clinical Genetics, Leiden University Medical Center, Leiden;
*Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm,
Sweden; #Department of Genetics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio
44106, USA.
There is uncertainty about the true nature of predicted SNPs in segmental duplications
(duplicons) and whether these markers genuinely exist at increased density as indicated in public databases. We explored these issues
by genotyping 157 predicted SNPs in duplicons and control regions in normal diploid genomes and fully homozygous complete hydatidiform moles. Our data
identified many true SNPs in duplicon regions and few paralogous sequence variants. Twenty-eight percent of the polymorphic duplicon sequences we
tested involved multisite variation, a new type of polymorphism representing the sum of the signals from many individual duplicon copies that vary in
sequence content due to duplication, deletion or gene conversion. Multisite variations can masquerade as normal SNPs when genotyped. Given that
duplicons comprise at least 5% of the genome and many are yet to be annotated in the genome draft, effective strategies to identify multisite
variation must be established and deployed.