ABSTRACT
Several recent studies have highlighted the uneven recombination structure of human DNA. Knowledge of the haplotype blocks that this phenomenon produces can be used to dramatically increase the statistical power of genetic mapping. Several criteria have already been proposed for identifying these blocks, all of which require haplotypes as input. We propose a comprehensive statistical model of haplotype block variation and show how its parameters can be learned from haplotypes, from unphased genotype data, or from both. Using real SNP data, we demonstrate that our approach resolves genotypes into their constituent haplotypes more accurately than previously known methods.
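To make the idea of learning from unphased genotypes concrete, the sketch below shows the classical expectation-maximization (EM) approach to haplotype frequency estimation within a single block, under a Hardy-Weinberg assumption. It is not the block-variation model proposed here. It only illustrates the simpler single-block problem that such a model generalizes. The genotype coding (0, 1 or 2 copies of allele 1 per SNP) and the function names `compatible_pairs`, `em_haplotype_frequencies` and `resolve` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    Assumed coding: each entry counts copies of allele 1 at a SNP (0, 1 or 2).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = []
    # Fix the phase of the first heterozygous site so each pair is listed once.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        phase = (0,) + bits if het else ()
        h1, h2, k = [], [], 0
        for g in genotype:
            if g == 1:
                h1.append(phase[k])
                h2.append(1 - phase[k])
                k += 1
            else:  # homozygous site: both haplotypes carry the same allele
                h1.append(g // 2)
                h2.append(g // 2)
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-8):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pairs_per_g = [compatible_pairs(g) for g in genotypes]
    haps = {h for pairs in pairs_per_g for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pairs_per_g:
            # E-step: Hardy-Weinberg probability of each phasing (2pq if p != q).
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by number of chromosomes.
        new = {h: counts[h] / n_chrom for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


def resolve(genotype, freq):
    """Return the most probable haplotype pair for a genotype."""
    return max(compatible_pairs(genotype),
               key=lambda p: freq.get(p[0], 0.0) * freq.get(p[1], 0.0))


if __name__ == "__main__":
    genotypes = [(0, 0, 0), (2, 2, 2), (1, 1, 1), (0, 0, 0), (2, 2, 2)]
    freq = em_haplotype_frequencies(genotypes)
    print(resolve((1, 1, 1), freq))  # prints ((0, 0, 0), (1, 1, 1))
```

In this toy example the homozygous samples make haplotypes 000 and 111 common, so EM assigns the triple heterozygote to that pair rather than to any of the three other phasings. Enumerating all 2^(k-1) phasings of k heterozygous sites is exponential, which is one reason block structure, which keeps k small within a block, matters for phasing.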
Index Terms
- Model-based inference of haplotype block variation