Wikipedia

Coalescent theory

Coalescent theory is a model of how alleles sampled from a population may have originated from a common ancestor. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure, meaning that each variant is equally likely to have been passed from one generation to the next. The model looks backward in time, merging alleles into a single ancestral copy according to a random process in coalescence events. Under this model, the expected time between successive coalescence events increases almost exponentially back in time (with wide variance). Variance in the model comes from both the random passing of alleles from one generation to the next, and the random occurrence of mutations in these alleles.

The mathematical theory of the coalescent was developed independently by several groups in the early 1980s as a natural extension of classical population genetics theory and models,[1][2][3][4] but can be primarily attributed to John Kingman.[5] Advances in coalescent theory include the incorporation of recombination, selection, overlapping generations and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis.

The model can be used to produce many theoretical genealogies; observed data can then be compared with these simulations to test assumptions about the demographic history of a population. Coalescent theory can also be used to make inferences about population genetic parameters, such as migration, population size and recombination.
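As a minimal sketch of this simulate-and-compare idea, the following Python snippet draws many times to the most recent common ancestor (TMRCA) for a sample of n lineages under the standard neutral coalescent and reports how often the simulated value is at least as large as an observed one. It assumes a constant population with 2Ne gene copies and time measured in generations; the function names, the parameter values and the "observed" figure are all hypothetical, and real analyses would normally rely on dedicated simulation software.

```python
import random


def simulate_tmrca(n, ne, rng):
    """One draw of the time (in generations) to the MRCA of n sampled lineages.

    With k lineages there are k*(k-1)/2 pairs, each coalescing at rate
    1/(2*ne) per generation, so the wait until the next merger is
    approximately exponential with rate k*(k-1)/(4*ne).
    """
    return sum(rng.expovariate(k * (k - 1) / (4.0 * ne)) for k in range(n, 1, -1))


def tail_probability(observed, simulated):
    """Fraction of simulated values at least as large as the observed one."""
    return sum(1 for x in simulated if x >= observed) / len(simulated)


if __name__ == "__main__":
    rng = random.Random(1)
    n, ne = 10, 1000  # hypothetical sample size and assumed effective size
    sims = [simulate_tmrca(n, ne, rng) for _ in range(10000)]
    # Theory gives an expected TMRCA of 4*ne*(1 - 1/n) generations.
    print("mean simulated TMRCA:", sum(sims) / len(sims))
    # A hypothetical "observed" TMRCA estimate of 9000 generations:
    print("tail probability:", tail_probability(9000.0, sims))
```

A small tail probability would suggest that the assumed demographic model (here, a constant Ne of 1000) fits the observation poorly. Because the rate k(k − 1)/(4Ne) falls as lineages merge, the last few coalescence events dominate the total time.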

Theory

Time to coalescence

Consider a single gene locus sampled from two haploid individuals in a population. The ancestry of this sample is traced backwards in time to the point where these two lineages coalesce in their most recent common ancestor (MRCA). Coalescent theory seeks to estimate the expectation of this time period and its variance.

The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parental DNA sequence. In a population with a constant effective population size with 2Ne copies of each locus, there are 2Ne "potential parents" in the previous generation. Under a random mating model, the probability that two alleles originate from the same parental copy is thus 1/(2Ne) and, correspondingly, the probability that they do not coalesce is 1 − 1/(2Ne).

Looking back over successive generations, the time to coalescence is geometrically distributed; that is, the probability of coalescence exactly t generations back is the probability of noncoalescence in the t − 1 preceding generations multiplied by the probability of coalescence at the generation of interest:

$$P_c(t) = \left(1 - \frac{1}{2N_e}\right)^{t-1}\left(\frac{1}{2N_e}\right)$$

For sufficiently large values of Ne, this distribution is well approximated by the continuously defined exponential distribution

$$P_c(t) = \frac{1}{2N_e}\, e^{-\frac{t-1}{2N_e}}$$

This is mathematically convenient, as an exponential distribution with rate 1/(2Ne) has both its expected value and its standard deviation equal to 2Ne. Therefore, although the expected time to coalescence is 2Ne, actual coalescence times have a wide range of variation. Note that coalescent time is the number of preceding generations in which the coalescence took place, not calendar time, though the latter can be estimated by multiplying 2Ne by the average time between generations. The above calculations apply equally to a diploid population of effective size Ne (in other words, for a non-recombining segment of DNA, each chromosome can be treated as equivalent to an independent haploid individual; in the absence of inbreeding, sister chromosomes in a single individual are no more closely related than two chromosomes randomly sampled from the population). Some effectively haploid DNA elements, such as mitochondrial DNA, however, are passed on by only one sex, and therefore have one quarter the effective size of the equivalent diploid population (Ne/2).
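The relationship between the geometric distribution and its exponential approximation can be checked numerically. The sketch below assumes two lineages and a constant Ne; the function names and the value Ne = 500 are illustrative only. It evaluates both formulas from this section and draws geometric waiting times by inverse-transform sampling, so that the sample mean and standard deviation can be compared with 2Ne.

```python
import math
import random


def geometric_pmf(t, ne):
    """P(two lineages coalesce exactly t generations back)."""
    p = 1.0 / (2 * ne)
    return (1.0 - p) ** (t - 1) * p


def exponential_pdf(t, ne):
    """Continuous approximation (1/(2Ne)) * exp(-(t - 1)/(2Ne))."""
    return math.exp(-(t - 1) / (2.0 * ne)) / (2.0 * ne)


def sample_coalescence_time(ne, rng):
    """Draw a geometric waiting time (>= 1) by inverse-transform sampling."""
    p = 1.0 / (2 * ne)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


if __name__ == "__main__":
    ne = 500  # hypothetical effective population size
    rng = random.Random(42)
    draws = [sample_coalescence_time(ne, rng) for _ in range(50000)]
    mean = sum(draws) / len(draws)
    sd = math.sqrt(sum((x - mean) ** 2 for x in draws) / (len(draws) - 1))
    print("2Ne:", 2 * ne, "sample mean:", mean, "sample sd:", sd)
    print("pmf vs pdf at t=300:", geometric_pmf(300, ne), exponential_pdf(300, ne))
```

Both the sample mean and the sample standard deviation should land close to 2Ne, which illustrates why individual coalescence times vary so widely around their expectation.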

Neutral variation

Coalescent theory can also be used to model the amount of variation in DNA sequences expected from genetic drift and mutation. This value is termed the mean heterozygosity, represented as $\bar{H}$. Mean heterozygosity is calculated as the probability of a mutation occurring at a given generation divided by the probability of any "event" at that generation (either a mutation or a coalescence). The probability that the event is a mutation is the probability of a mutation in either of the two lineages: $2\mu$. Thus the mean heterozygosity is equal to

$$\bar{H} = \frac{2\mu}{2\mu + \frac{1}{2N_e}} = \frac{4N_e\mu}{1 + 4N_e\mu} = \frac{\theta}{1 + \theta}$$

For $4N_e\mu \gg 1$, the vast majority of allele pairs have at least one difference in nucleotide sequence.
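The "first event" argument behind the mean heterozygosity can be illustrated with a short simulation. The sketch below is only an illustration under simplifying assumptions: in each generation either a mutation (probability 2μ) or a coalescence (probability 1/(2Ne)) may occur, the two are treated as mutually exclusive, and their sum is assumed not to exceed 1. Function names and parameter values are hypothetical.

```python
import random


def mean_heterozygosity(ne, mu):
    """Return theta / (1 + theta) with theta = 4 * ne * mu."""
    theta = 4.0 * ne * mu
    return theta / (1.0 + theta)


def simulate_heterozygosity(ne, mu, replicates, rng):
    """Fraction of allele pairs whose first event, looking back, is a mutation.

    Assumes 2*mu + 1/(2*ne) <= 1 and at most one event per generation.
    """
    p_mut = 2.0 * mu           # mutation in either of the two lineages
    p_coal = 1.0 / (2.0 * ne)  # the two lineages share a parental copy
    differing = 0
    for _ in range(replicates):
        while True:  # step back one generation at a time
            u = rng.random()
            if u < p_mut:
                differing += 1  # mutation first: the pair differs
                break
            if u < p_mut + p_coal:
                break           # coalescence first: the pair is identical
    return differing / replicates


if __name__ == "__main__":
    ne, mu = 100, 0.005  # hypothetical values giving theta = 2
    rng = random.Random(3)
    print("formula:  ", mean_heterozygosity(ne, mu))
    print("simulated:", simulate_heterozygosity(ne, mu, 10000, rng))
```

With these values θ = 2, so the formula gives 2/3, and the simulated fraction should fall close to it.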

Extensions

There are numerous extensions to the coalescent model, such as the Λ-coalescent, which allows for the possibility of multifurcations.[6]

Graphical representation

Coalescents can be visualised using dendrograms which show the relationship of branches of the population to each other. The point where two branches meet indicates a coalescent event.
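To make the link between coalescence events and a dendrogram concrete, here is a small sketch that simulates a genealogy for n samples and writes it in Newick notation, a text format that most tree-drawing tools can read. It assumes the standard neutral coalescent with a constant Ne; the sample labels s1, s2, ... and the function name are hypothetical.

```python
import random


def simulate_newick(n, ne, rng):
    """Simulate a coalescent tree for n samples; return (newick, tmrca)."""
    # Each active lineage is (subtree text, height of its node in generations).
    lineages = [(f"s{i + 1}", 0.0) for i in range(n)]
    now = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Wait until the next coalescence among the k current lineages.
        now += rng.expovariate(k * (k - 1) / (4.0 * ne))
        # Pick two distinct lineages uniformly at random and merge them.
        i, j = sorted(rng.sample(range(k), 2))
        (left, h_left), (right, h_right) = lineages[i], lineages[j]
        merged = f"({left}:{now - h_left:.1f},{right}:{now - h_right:.1f})"
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append((merged, now))
    return lineages[0][0] + ";", now


if __name__ == "__main__":
    tree, tmrca = simulate_newick(5, 1000, random.Random(7))
    print(tree)
    print("TMRCA (generations):", round(tmrca, 1))
```

Each pair of parentheses in the output corresponds to one coalescent event, that is, one point in the dendrogram where two branches meet; the numbers after the colons are branch lengths in generations.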

Applications

Disease gene mapping

The utility of coalescent theory in the mapping of disease is slowly gaining more appreciation; although the application of the theory is still in its infancy, there are a number of researchers who are actively developing algorithms for the analysis of human genetic data that utilise coalescent theory.[7][8][9]

A considerable number of human diseases can be attributed to genetics, from simple Mendelian diseases like sickle-cell anemia and cystic fibrosis, to more complicated maladies like cancers and mental illnesses. The latter are polygenic diseases, controlled by multiple genes that may occur on different chromosomes, but diseases that are precipitated by a single abnormality are relatively simple to pinpoint and trace – although not so simple that this has been achieved for all diseases. It is immensely useful in understanding these diseases and their processes to know where they are located on chromosomes, and how they have been inherited through generations of a family, as can be accomplished through coalescent analysis.[1]

Genetic diseases are passed from one generation to another just like other genes. While any gene may be shuffled from one chromosome to another during homologous recombination, it is unlikely that one gene alone will be shifted. Thus, other genes that are close enough to the disease gene to be linked to it can be used to trace it.[1]

Polygenic diseases have a genetic basis even though they do not follow Mendelian inheritance models; they may occur at relatively high frequency in populations and have severe health effects. Such diseases may also show incomplete penetrance, which complicates their study. These traits may arise from many small mutations, which together have a severe and deleterious effect on the health of the individual.[2]

Linkage mapping methods, including coalescent theory, can be put to work on these diseases, since they use family pedigrees to determine which markers accompany a disease and how it is inherited. At the very least, this approach helps narrow down the portion, or portions, of the genome in which the deleterious mutations may occur. Complications in these approaches include epistatic effects, the polygenic nature of the mutations, and environmental factors. That said, genes whose effects are additive carry a fixed risk of developing the disease, and when they exist in a disease genotype, they can be used to predict risk and map the gene.[2] Both the regular coalescent and the shattered coalescent (which allows that multiple mutations may have occurred in the founding event, and that the disease may occasionally be triggered by environmental factors) have been put to work in understanding disease genes.[1]

Studies have been carried out correlating disease occurrence in fraternal and identical twins, and the results of these studies can be used to inform coalescent modeling. Since identical twins share all of their genome, whereas fraternal twins share on average only half of theirs, the difference in correlation between identical and fraternal twins can be used to work out whether a disease is heritable and, if so, how strongly.[2]
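One classical way to turn this twin comparison into a number is Falconer's formula. The formula is not named in the text above, so its use here is an assumption about the kind of calculation intended. It estimates heritability as twice the difference between the identical-twin and fraternal-twin correlations. The correlations in the sketch below are hypothetical.

```python
def falconer_heritability(r_mz, r_dz):
    """Rough heritability estimate h^2 = 2 * (r_mz - r_dz).

    r_mz: trait correlation between identical (monozygotic) twins.
    r_dz: trait correlation between fraternal (dizygotic) twins.
    """
    return 2.0 * (r_mz - r_dz)


def shared_environment(r_mz, r_dz):
    """Companion estimate of the shared-environment share, c^2 = 2*r_dz - r_mz."""
    return 2.0 * r_dz - r_mz


if __name__ == "__main__":
    r_mz, r_dz = 0.8, 0.5  # hypothetical twin correlations
    print("h^2 estimate:", falconer_heritability(r_mz, r_dz))
    print("c^2 estimate:", shared_environment(r_mz, r_dz))
```

An estimate near zero suggests little heritable contribution, while a value near one suggests that most of the variation in disease liability is genetic. The estimate is crude and relies on identical and fraternal twins sharing their environments to the same degree.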

The genomic distribution of heterozygosity

The human single-nucleotide polymorphism (SNP) map has revealed large regional variations in heterozygosity, more so than can be explained on the basis of (Poisson-distributed) random chance.[10] In part, these variations could be explained on the basis of assessment methods, the availability of genomic sequences, and possibly the standard coalescent population genetic model. Population genetic processes could contribute substantially to this variation: some loci presumably would have comparatively recent common ancestors, others might have much older genealogies, and so the regional accumulation of SNPs over time could be quite different. The local density of SNPs along chromosomes appears to cluster in accordance with a variance-to-mean power law and to obey the Tweedie compound Poisson distribution.[11] In this model the regional variations in the SNP map would be explained by the accumulation of multiple small genomic segments through recombination, where the mean number of SNPs per segment would be gamma distributed in proportion to a gamma-distributed time to the most recent common ancestor for each segment.[12]

History

Coalescent theory is a natural extension of the more classical population genetics concept of neutral evolution and is an approximation to the Fisher–Wright (or Wright–Fisher) model for large populations. It was discovered independently by several researchers in the 1980s.[13][14][15][16]

Software

A large body of software exists both for simulating data sets under the coalescent process and for inferring parameters such as population size and migration rates from genetic data.

  • BEAST and BEAST 2 – Bayesian inference package via MCMC with a wide range of coalescent models including the use of temporally sampled sequences.[17]
  • BPP – software package for inferring phylogeny and divergence times among populations under a multispecies coalescent process.
  • CoaSim – software for simulating genetic data under the coalescent model.
  • DIYABC – a user-friendly approach to ABC for inference on population history using molecular markers.[18]
  • DendroPy – a Python library for phylogenetic computing, with classes and methods for simulating pure (unconstrained) coalescent trees as well as constrained coalescent trees under the multispecies coalescent model (i.e., "gene trees in species trees").
  • GeneRecon – software for the fine-scale linkage disequilibrium mapping of disease genes using coalescent theory based on a Bayesian MCMC framework.
  • genetree – software for estimation of population genetics parameters using coalescent theory and simulation (the R package "popgen"). See also Oxford Mathematical Genetics and Bioinformatics Group.
  • GENOME – rapid coalescent-based whole-genome simulation[19]
  • IBDSim – a computer package for the simulation of genotypic data under general isolation by distance models.[20]
  • IMa – implements the same Isolation with Migration model as IM, but does so using a newer method that provides estimates of the joint posterior probability density of the model parameters. IMa also allows log-likelihood ratio tests of nested demographic models. It is based on a method described in Hey and Nielsen (2007, PNAS 104:2785–2790). IMa is faster and better than IM (by virtue of providing access to the joint posterior density function), and it can be used for most (but not all) of the situations and options that IM can be used for.
  • Lamarc – software for estimation of rates of population growth, migration, and recombination.
  • Migraine – a program which implements coalescent algorithms for a maximum likelihood analysis (using Importance Sampling algorithms) of genetic data with a focus on spatially structured populations.[21]
  • Migrate – maximum likelihood and Bayesian inference of migration rates under the n-coalescent. The inference is implemented using MCMC.
  • MaCS – Markovian Coalescent Simulator – simulates genealogies spatially across chromosomes as a Markovian process. Similar to the SMC algorithm of McVean and Cardin, and supports all demographic scenarios found in Hudson's ms.
  • ms & msHOT – Richard Hudson's original program for generating samples under neutral models[22] and an extension which allows recombination hotspots.[23]
  • msms – an extended version of ms that includes selective sweeps.[24]
  • msprime – a fast and scalable ms-compatible simulator, allowing demographic simulations, producing compact output files for thousands or millions of genomes.
  • Recodon and NetRecodon – software to simulate coding sequences with inter/intracodon recombination, migration, growth rate and longitudinal sampling.[25][26]
  • CoalEvol and SGWE – software to simulate nucleotide, coding and amino acid sequences under the coalescent with demographics, recombination, population structure with migration and longitudinal sampling.[27]
  • SARG – Structure Ancestral Recombination Graph by Magnus Nordborg.
  • simcoal2 – software to simulate genetic data under the coalescent model with complex demography and recombination
  • TreesimJ – forward simulation software allowing sampling of genealogies and data sets under diverse selective and demographic models.

References

  1. Morris, A., Whittaker, J., & Balding, D. (2002). Fine-Scale Mapping of Disease Loci via Shattered Coalescent Modeling of Genealogies. The American Journal of Human Genetics, 70(3), 686–707. doi:10.1086/339271
  2. Rannala, B. (2001). Finding genes influencing susceptibility to complex diseases in the post-genome era. American Journal of Pharmacogenomics, 1(3), 203–221.

Sources

Articles

  • Arenas, M. and Posada, D. (2014) Simulation of Genome-Wide Evolution under Heterogeneous Substitution Models and Complex Multispecies Coalescent Histories. Molecular Biology and Evolution 31(5): 1295–1301
  • Arenas, M. and Posada, D. (2007) Recodon: Coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinformatics 8: 458
  • Arenas, M. and Posada, D. (2010) Coalescent simulation of intracodon recombination. Genetics 184(2): 429–437
  • Browning, S.R. (2006) Multilocus association mapping using variable-length Markov chains. American Journal of Human Genetics 78:903–913
  • Cornuet J.-M., Pudlo P., Veyssier J., Dehne-Garcia A., Gautier M., Leblois R., Marin J.-M., Estoup A. (2014) DIYABC v2.0: a software to make Approximate Bayesian Computation inferences about population history using Single Nucleotide Polymorphism, DNA sequence and microsatellite data. Bioinformatics 30: 1187–1189
  • Degnan, JH and LA Salter. 2005. Gene tree distributions under the coalescent process. Evolution 59(1): 24–37.
  • Donnelly, P., Tavaré, S. (1995) Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29:401–421
  • Drummond A, Suchard MA, Xie D, Rambaut A (2012). "Bayesian phylogenetics with BEAUti and the BEAST 1.7". Molecular Biology and Evolution. 29 (8): 1969–1973. doi:10.1093/molbev/mss075. PMC 3408070. PMID 22367748.
  • Ewing, G. and Hermisson J. (2010), MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus, Bioinformatics 26:15
  • Hellenthal, G., Stephens M. (2006) msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots
  • Hudson, Richard R. (1983a). "Testing the Constant-Rate Neutral Allele Model with Protein Sequence Data". Evolution. 37 (1): 203–17. doi:10.2307/2408186. ISSN 1558-5646. JSTOR 2408186. PMID 28568026.
  • Hudson RR (1983b) Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23:183–201.
  • Hudson RR (1991) Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7: 1–44
  • Hudson RR (2002) Generating samples under a Wright–Fisher neutral model.
  • Kendal WS (2003) An exponential dispersion model for the distribution of human single nucleotide polymorphisms. Mol Biol Evol 20: 579–590
  • Hein, J., Schierup, M., Wiuf C. (2004) Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory Oxford University Press ISBN 978-0-19-852996-5
  • Kaplan, N.L., Darden, T., Hudson, R.R. (1988) The coalescent process in models with selection. Genetics 120:819–829
  • Kingman, J. F. C. (1982). "On the Genealogy of Large Populations". Journal of Applied Probability. 19: 27–43. CiteSeerX 10.1.1.552.1429. doi:10.2307/3213548. ISSN 0021-9002. JSTOR 3213548. S2CID 125055288.
  • Kingman, J.F.C. (2000) Origins of the coalescent 1974–1982. Genetics 156:1461–1463
  • Leblois R., Estoup A. and Rousset F. (2009) IBDSim: a computer program to simulate genotypic data under isolation by distance
  • Liang L., Zöllner S., Abecasis G.R. (2007) GENOME: a rapid coalescent-based whole genome simulator. Bioinformatics
  • Mailund, T., Schierup, M.H., Pedersen, C.N.S., Mechlenborg, P. J. M., Madsen, J.N., Schauser, L. (2005) CoaSim: A Flexible Environment for Simulating Genetic Data under Coalescent Models BMC Bioinformatics 6:252
  • Möhle, M., Sagitov, S. (2001) A classification of coalescent processes for haploid exchangeable population models The Annals of Probability 29:1547–1562
  • Morris, A. P., Whittaker, J. C., Balding, D. J. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies American Journal of Human Genetics 70:686–707
  • Neuhauser, C., Krone, S.M. (1997) The genealogy of samples in models with selection Genetics 145 519–534
  • Pitman, J. (1999) Coalescents with multiple collisions The Annals of Probability 27:1870–1902
  • Harding, Rosalind, M. 1998. New phylogenies: an introductory look at the coalescent. pp. 15–22, in Harvey, P. H., Brown, A. J. L., Smith, J. M., Nee, S. New uses for new phylogenies. Oxford University Press (ISBN 0198549849)
  • Rosenberg, N.A., Nordborg, M. (2002) Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms. Nature Reviews Genetics 3:380–390
  • Sagitov, S. (1999) The general coalescent with asynchronous mergers of ancestral lines Journal of Applied Probability 36:1116–1125
  • Schweinsberg, J. (2000) Coalescents with simultaneous multiple collisions Electronic Journal of Probability 5:1–50
  • Slatkin, M. (2001) Simulating genealogies of selected alleles in populations of variable size Genetic Research 145:519–534
  • Tajima, F. (1983) Evolutionary Relationship of DNA Sequences in finite populations. Genetics 105:437–460
  • Tavare S, Balding DJ, Griffiths RC & Donnelly P. 1997. Inferring coalescent times from DNA sequence data. Genetics 145: 505–518.
  • The international SNP map working group. 2001. A map of human genome variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
  • Zöllner S. and Pritchard J.K. (2005) Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci
  • Rousset F. and Leblois R. (2007) Likelihood and Approximate Likelihood Analyses of Genetic Structure in a Linear Habitat: Performance and Robustness to Model Mis-Specification

Books

  • Hein, J; Schierup, M. H., and Wiuf, C. Gene Genealogies, Variation and Evolution – A Primer in Coalescent Theory. Oxford University Press, 2005. ISBN 0-19-852996-1.
  • Nordborg, M. (2001) Introduction to Coalescent Theory. Chapter 7 in Balding, D., Bishop, M., Cannings, C., editors, Handbook of Statistical Genetics. Wiley ISBN 978-0-471-86094-5
  • Wakeley J. (2006) An Introduction to Coalescent Theory Roberts & Co ISBN 0-9747077-5-9
  • Rice SH. (2004). Evolutionary Theory: Mathematical and Conceptual Foundations. Sinauer Associates: Sunderland, MA. See esp. ch. 3 for detailed derivations.
  • Berestycki N. "Recent progress in coalescent theory" 2009 ENSAIOS Matematicos vol.16
  • Bertoin J. "Random Fragmentation and Coagulation Processes". Cambridge Studies in Advanced Mathematics, 102. Cambridge University Press, Cambridge, 2006. ISBN 978-0-521-86728-3
  • Pitman J. "Combinatorial stochastic processes" Springer (2003)

External links

  • — overview, with probability equations for genetic drift, and simulation graphs

coalescent, theory, model, alleles, sampled, from, population, have, originated, from, common, ancestor, simplest, case, coalescent, theory, assumes, recombination, natural, selection, gene, flow, population, structure, meaning, that, each, variant, equally, l. Coalescent theory is a model of how alleles sampled from a population may have originated from a common ancestor In the simplest case coalescent theory assumes no recombination no natural selection and no gene flow or population structure meaning that each variant is equally likely to have been passed from one generation to the next The model looks backward in time merging alleles into a single ancestral copy according to a random process in coalescence events Under this model the expected time between successive coalescence events increases almost exponentially back in time with wide variance Variance in the model comes from both the random passing of alleles from one generation to the next and the random occurrence of mutations in these alleles The mathematical theory of the coalescent was developed independently by several groups in the early 1980s as a natural extension of classical population genetics theory and models 1 2 3 4 but can be primarily attributed to John Kingman 5 Advances in coalescent theory include recombination selection overlapping generations and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis The model can be used to produce many theoretical genealogies and then compare observed data to these simulations to test assumptions about the demographic history of a population Coalescent theory can be used to make inferences about population genetic parameters such as migration population size and recombination Contents 1 Theory 1 1 Time to coalescence 1 2 Neutral variation 1 3 Extensions 2 Graphical representation 3 Applications 3 1 Disease gene mapping 3 2 The genomic distribution of heterozygosity 4 History 5 Software 6 References 7 
Sources 7 1 Articles 7 2 Books 8 External linksTheory EditTime to coalescence Edit Consider a single gene locus sampled from two haploid individuals in a population The ancestry of this sample is traced backwards in time to the point where these two lineages coalesce in their most recent common ancestor MRCA Coalescent theory seeks to estimate the expectation of this time period and its variance The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parental DNA sequence In a population with a constant effective population size with 2Ne copies of each locus there are 2Ne potential parents in the previous generation Under a random mating model the probability that two alleles originate from the same parental copy is thus 1 2Ne and correspondingly the probability that they do not coalesce is 1 1 2Ne At each successive preceding generation the probability of coalescence is geometrically distributed that is it is the probability of noncoalescence at the t 1 preceding generations multiplied by the probability of coalescence at the generation of interest P c t 1 1 2 N e t 1 1 2 N e displaystyle P c t left 1 frac 1 2N e right t 1 left frac 1 2N e right nbsp For sufficiently large values of Ne this distribution is well approximated by the continuously defined exponential distribution P c t 1 2 N e e t 1 2 N e displaystyle P c t frac 1 2N e e frac t 1 2N e nbsp This is mathematically convenient as the standard exponential distribution has both the expected value and the standard deviation equal to 2Ne Therefore although the expected time to coalescence is 2Ne actual coalescence times have a wide range of variation Note that coalescent time is the number of preceding generations where the coalescence took place and not calendar time though an estimation of the latter can be made multiplying 2Ne with the average time between generations The above calculations apply equally to a diploid population of effective 
size Ne in other words for a non recombining segment of DNA each chromosome can be treated as equivalent to an independent haploid individual in the absence of inbreeding sister chromosomes in a single individual are no more closely related than two chromosomes randomly sampled from the population Some effectively haploid DNA elements such as mitochondrial DNA however are only passed on by one sex and therefore have one quarter the effective size of the equivalent diploid population Ne 2 Neutral variation Edit Coalescent theory can also be used to model the amount of variation in DNA sequences expected from genetic drift and mutation This value is termed the mean heterozygosity represented as H displaystyle bar H nbsp Mean heterozygosity is calculated as the probability of a mutation occurring at a given generation divided by the probability of any event at that generation either a mutation or a coalescence The probability that the event is a mutation is the probability of a mutation in either of the two lineages 2 m displaystyle 2 mu nbsp Thus the mean heterozygosity is equal to H 2 m 2 m 1 2 N e 4 N e m 1 4 N e m 8 1 8 displaystyle begin aligned bar H amp frac 2 mu 2 mu frac 1 2N e 6pt amp frac 4N e mu 1 4N e mu 6pt amp frac theta 1 theta end aligned nbsp For 4 N e m 1 displaystyle 4N e mu gg 1 nbsp the vast majority of allele pairs have at least one difference in nucleotide sequence Extensions Edit There are numerous extensions to the coalescent model such as the L coalescent which allows for the possibility of multifurcations 6 Graphical representation EditCoalescents can be visualised using dendrograms which show the relationship of branches of the population to each other The point where two branches meet indicates a coalescent event Applications EditDisease gene mapping Edit The utility of coalescent theory in the mapping of disease is slowly gaining more appreciation although the application of the theory is still in its infancy there are a number of 
researchers who are actively developing algorithms for the analysis of human genetic data that utilise coalescent theory 7 8 9 A considerable number of human diseases can be attributed to genetics from simple Mendelian diseases like sickle cell anemia and cystic fibrosis to more complicated maladies like cancers and mental illnesses The latter are polygenic diseases controlled by multiple genes that may occur on different chromosomes but diseases that are precipitated by a single abnormality are relatively simple to pinpoint and trace although not so simple that this has been achieved for all diseases It is immensely useful in understanding these diseases and their processes to know where they are located on chromosomes and how they have been inherited through generations of a family as can be accomplished through coalescent analysis 1 Genetic diseases are passed from one generation to another just like other genes While any gene may be shuffled from one chromosome to another during homologous recombination it is unlikely that one gene alone will be shifted Thus other genes that are close enough to the disease gene to be linked to it can be used to trace it 1 Polygenic diseases have a genetic basis even though they don t follow Mendelian inheritance models and these may have relatively high occurrence in populations and have severe health effects Such diseases may have incomplete penetrance and tend to be polygenic complicating their study These traits may arise due to many small mutations which together have a severe and deleterious effect on the health of the individual 2 Linkage mapping methods including Coalescent theory can be put to work on these diseases since they use family pedigrees to figure out which markers accompany a disease and how it is inherited At the very least this method helps narrow down the portion or portions of the genome on which the deleterious mutations may occur Complications in these approaches include epistatic effects the polygenic 
nature of the mutations and environmental factors That said genes whose effects are additive carry a fixed risk of developing the disease and when they exist in a disease genotype they can be used to predict risk and map the gene 2 Both regular the coalescent and the shattered coalescent which allows that multiple mutations may have occurred in the founding event and that the disease may occasionally be triggered by environmental factors have been put to work in understanding disease genes 1 Studies have been carried out correlating disease occurrence in fraternal and identical twins and the results of these studies can be used to inform coalescent modeling Since identical twins share all of their genome but fraternal twins only share half their genome the difference in correlation between the identical and fraternal twins can be used to work out if a disease is heritable and if so how strongly 2 The genomic distribution of heterozygosity Edit The human single nucleotide polymorphism SNP map has revealed large regional variations in heterozygosity more so than can be explained on the basis of Poisson distributed random chance 10 In part these variations could be explained on the basis of assessment methods the availability of genomic sequences and possibly the standard coalescent population genetic model Population genetic influences could have a major influence on this variation some loci presumably would have comparatively recent common ancestors others might have much older genealogies and so the regional accumulation of SNPs over time could be quite different The local density of SNPs along chromosomes appears to cluster in accordance with a variance to mean power law and to obey the Tweedie compound Poisson distribution 11 In this model the regional variations in the SNP map would be explained by the accumulation of multiple small genomic segments through recombination where the mean number of SNPs per segment would be gamma distributed in proportion to a 
gamma distributed time to the most recent common ancestor for each segment 12 History EditCoalescent theory is a natural extension of the more classical population genetics concept of neutral evolution and is an approximation to the Fisher Wright or Wright Fisher model for large populations It was discovered independently by several researchers in the 1980s 13 14 15 16 Software EditA large body of software exists for both simulating data sets under the coalescent process as well as inferring parameters such as population size and migration rates from genetic data BEAST and BEAST 2 Bayesian inference package via MCMC with a wide range of coalescent models including the use of temporally sampled sequences 17 BPP software package for inferring phylogeny and divergence times among populations under a multispecies coalescent process CoaSim software for simulating genetic data under the coalescent model DIYABC a user friendly approach to ABC for inference on population history using molecular markers 18 DendroPy a Python library for phylogenetic computing with classes and methods for simulating pure unconstrained coalescent trees as well as constrained coalescent trees under the multispecies coalescent model i e gene trees in species trees GeneRecon software for the fine scale mapping of linkage disequilibrium mapping of disease genes using coalescent theory based on a Bayesian MCMC framework genetree software for estimation of population genetics parameters using coalescent theory and simulation the R package popgen See also Oxford Mathematical Genetics and Bioinformatics Group GENOME rapid coalescent based whole genome simulation 19 IBDSim a computer package for the simulation of genotypic data under general isolation by distance models 20 IMa IMa implements the same Isolation with Migration model but does so using a new method that provides estimates of the joint posterior probability density of the model parameters IMa also allows log likelihood ratio tests of nested 
demographic models. IMa is based on a method described in Hey and Nielsen (2007, PNAS 104: 2785–2790). IMa is faster and better than IM (i.e., by virtue of providing access to the joint posterior density function), and it can be used for most (but not all) of the situations and options that IM can be used for.
Lamarc: software for estimation of rates of population growth, migration, and recombination.
Migraine: a program which implements coalescent algorithms for a maximum likelihood analysis (using Importance Sampling algorithms) of genetic data, with a focus on spatially structured populations.[21]
Migrate: maximum likelihood and Bayesian inference of migration rates under the n-coalescent. The inference is implemented using MCMC.
MaCS: Markovian Coalescent Simulator; simulates genealogies spatially across chromosomes as a Markovian process. Similar to the SMC algorithm of McVean and Cardin, and supports all demographic scenarios found in Hudson's ms.
ms & msHOT: Richard Hudson's original program for generating samples under neutral models[22] and an extension which allows recombination hotspots.[23]
msms: an extended version of ms that includes selective sweeps.[24]
msprime: a fast and scalable ms-compatible simulator, allowing demographic simulations, producing compact output files for thousands or millions of genomes.
Recodon and NetRecodon: software to simulate coding sequences with inter/intracodon recombination, migration, growth rate and longitudinal sampling.[25][26]
CoalEvol and SGWE: software to simulate nucleotide, coding and amino acid sequences under the coalescent with demographics, recombination, population structure with migration and longitudinal sampling.[27]
SARG: structure Ancestral Recombination Graph, by Magnus Nordborg.
simcoal2: software to simulate genetic data under the coalescent model with complex demography and recombination.
TreesimJ: forward simulation software allowing sampling of genealogies and data sets under diverse selective and demographic models.

References

Morris, A., Whittaker, J., & Balding, D. (2002). Fine-Scale Mapping of Disease Loci via Shattered Coalescent Modeling of Genealogies. The American Journal of Human Genetics 70(3): 686–707. doi:10.1086/339271
Rannala, B. (2001). Finding genes influencing susceptibility to complex diseases in the post-genome era. American Journal of Pharmacogenomics 1(3): 203–221.

Sources

Articles

Arenas, M. and Posada, D. (2014). Simulation of Genome-Wide Evolution under Heterogeneous Substitution Models and Complex Multispecies Coalescent Histories. Molecular Biology and Evolution 31(5): 1295–1301.
Arenas, M. and Posada, D. (2007). Recodon: Coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinformatics 8: 458.
Arenas, M. and Posada, D. (2010). Coalescent simulation of intracodon recombination. Genetics 184(2): 429–437.
Browning, S. R. (2006). Multilocus association mapping using variable length Markov chains. American Journal of Human Genetics 78: 903–913.
Cornuet, J.-M., Pudlo, P., Veyssier, J., Dehne-Garcia, A., Gautier, M., Leblois, R., Marin, J.-M., Estoup, A. (2014). DIYABC v2.0: a software to make Approximate Bayesian Computation inferences about population history using Single Nucleotide Polymorphism, DNA sequence and microsatellite data. Bioinformatics 30: 1187–1189.
Degnan, J. H. and Salter, L. A. (2005). Gene tree distributions under the coalescent process. Evolution 59(1): 24–37. (pdf from coaltree.net)
Donnelly, P., Tavaré, S. (1995). Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29: 401–421.
Drummond, A., Suchard, M. A., Xie, D., Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29(8): 1969–1973. doi:10.1093/molbev/mss075. PMC 3408070. PMID 22367748.
Ewing, G. and Hermisson, J. (2010). MSMS: a coalescent simulation program including recombination, demographic structure, and selection at a single locus. Bioinformatics 26:15.
Hellenthal, G., Stephens, M. (2006). msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics AOP.
Hudson, Richard R. (1983a). Testing the Constant-Rate Neutral Allele Model with Protein Sequence Data. Evolution 37(1): 203–17. doi:10.2307/2408186. ISSN 1558-5646. JSTOR 2408186. PMID 28568026.
Hudson, R. R. (1983b). Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23: 183–201.
Hudson, R. R. (1991). Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7: 1–44.
Hudson, R. R. (2002). Generating samples under a Wright–Fisher neutral model. Bioinformatics 18: 337–338.
Kendal, W. S. (2003). An exponential dispersion model for the distribution of human single nucleotide polymorphisms. Mol Biol Evol 20: 579–590.
Hein, J., Schierup, M., Wiuf, C. (2004). Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press. ISBN 978-0-19-852996-5.
Kaplan, N. L., Darden, T., Hudson, R. R. (1988). The coalescent process in models with selection. Genetics 120: 819–829.
Kingman, J. F. C. (1982). On the Genealogy of Large Populations. Journal of Applied Probability 19: 27–43. CiteSeerX 10.1.1.552.1429. doi:10.2307/3213548. ISSN 0021-9002. JSTOR 3213548. S2CID 125055288.
Kingman, J. F. C. (2000). Origins of the coalescent 1974–1982. Genetics 156: 1461–1463.
Leblois, R., Estoup, A. and Rousset, F. (2009). IBDSim: a computer program to simulate genotypic data under isolation by distance. Molecular Ecology Resources 9: 107–109.
Liang, L., Zöllner, S., Abecasis, G. R. (2007). GENOME: a rapid coalescent-based whole genome simulator. Bioinformatics 23: 1565–1567.
Mailund, T., Schierup, M. H., Pedersen, C. N. S., Mechlenborg, P. J. M., Madsen, J. N., Schauser, L. (2005). CoaSim: A Flexible Environment for Simulating Genetic Data under Coalescent Models. BMC Bioinformatics 6: 252.
Möhle, M., Sagitov, S. (2001). A classification of coalescent processes for haploid exchangeable population models. The Annals of Probability 29: 1547–1562.
Morris, A. P., Whittaker, J. C., Balding, D. J. (2002). Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. American Journal of Human Genetics 70: 686–707.
Neuhauser, C., Krone, S. M. (1997). The genealogy of samples in models with selection. Genetics 145: 519–534.
Pitman, J. (1999). Coalescents with multiple collisions. The Annals of Probability 27: 1870–1902.
Harding, Rosalind M. (1998). New phylogenies: an introductory look at the coalescent. pp. 15–22 in Harvey, P. H., Brown, A. J. L., Smith, J. M., Nee, S. New uses for new phylogenies. Oxford University Press. ISBN 0198549849.
Rosenberg, N. A., Nordborg, M. (2002). Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms. Nature Reviews Genetics 3: 380–390.
Sagitov, S. (1999). The general coalescent with asynchronous mergers of ancestral lines. Journal of Applied Probability 36: 1116–1125.
Schweinsberg, J. (2000). Coalescents with simultaneous multiple collisions. Electronic Journal of Probability 5: 1–50.
Slatkin, M. (2001). Simulating genealogies of selected alleles in populations of variable size. Genetic Research 145: 519–534.
Tajima, F. (1983). Evolutionary Relationship of DNA Sequences in finite populations. Genetics 105: 437–460.
Tavaré, S., Balding, D. J., Griffiths, R. C. & Donnelly, P. (1997). Inferring coalescent times from DNA sequence data. Genetics 145: 505–518.
The international SNP map working group (2001). A map of human genome variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
Zöllner, S. and Pritchard, J. K. (2005). Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci. Genetics 169: 1071–1092.
Rousset, F. and Leblois, R. (2007). Likelihood and Approximate Likelihood Analyses of Genetic Structure in a Linear Habitat: Performance and Robustness to Model Mis-Specification. Molecular Biology and Evolution 24: 2730–2745.

Books

Hein, J., Schierup, M. H., and Wiuf, C. Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press, 2005. ISBN 0-19-852996-1.
Nordborg, M. (2001). Introduction to Coalescent Theory. Chapter 7 in Balding, D., Bishop, M., Cannings, C., editors, Handbook of Statistical Genetics. Wiley. ISBN 978-0-471-86094-5.
Wakeley, J. (2006). An Introduction to Coalescent Theory. Roberts & Co. ISBN 0-9747077-5-9. Accompanying website with sample chapters.
Rice, S. H. (2004). Evolutionary Theory: Mathematical and Conceptual Foundations. Sinauer Associates: Sunderland, MA. See esp. ch. 3 for detailed derivations.
Berestycki, N. Recent progress in coalescent theory. 2009. ENSAIOS Matematicos vol. 16.
Bertoin, J. Random Fragmentation and Coagulation Processes. 2006. Cambridge Studies in Advanced Mathematics, 102. Cambridge University Press, Cambridge, 2006. ISBN 978-0-521-86728-3.
Pitman, J. Combinatorial stochastic processes. Springer (2003).

External links

EvoMath 3: Genetic Drift and Coalescence, Briefly. Overview with probability equations for genetic drift, and simulation graphs.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Coalescent_theory&oldid=1153902067"
