
1000 Genomes Project

The 1000 Genomes Project (abbreviated as 1KGP), launched in January 2008, was an international research effort to establish what was then by far the most detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the following three years, using newly developed technologies that were faster and less expensive. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature.[1] In 2012, the sequencing of 1,092 genomes was announced in a Nature publication.[2] In 2015, two papers in Nature reported the results and completion of the project, along with opportunities for future research.[3][4]

Many rare variations, restricted to closely related groups, were identified, and eight structural-variation classes were analyzed.[5]

The project united multidisciplinary research teams from institutes in countries around the world, including China, Italy, Japan, Kenya, Nigeria, Peru, the United Kingdom, and the United States. Each contributed to the enormous sequence dataset and to a refined human genome map, which is freely accessible through public databases to the scientific community and the general public alike.[2]

By providing an overview of all human genetic variation, the consortium aimed to generate a valuable tool for all fields of biological science, especially the disciplines of genetics, medicine, pharmacology, biochemistry, and bioinformatics.[6]

Changes in the number and order of genes (A-D) create genetic diversity within and between populations.

Background

Since the completion of the Human Genome Project, advances in human population genetics and comparative genomics have made it possible to gain increasing insight into the nature of genetic diversity.[7] However, we are just beginning to understand how processes like the random sampling of gametes, structural variations (insertions/deletions (indels), copy number variations (CNV), retroelements), single-nucleotide polymorphisms (SNPs), and natural selection have shaped the level and pattern of variation within and between species.[8][9][10][11]

Human genetic variation

The random sampling of gametes during sexual reproduction leads to genetic drift — a random fluctuation in the population frequency of a trait — in subsequent generations and would result in the loss of all variation in the absence of external influence. It is postulated that the rate of genetic drift is inversely proportional to population size, and that it may be accelerated in specific situations such as bottlenecks, where the population size is reduced for a certain period of time, and by the founder effect (individuals in a population tracing back to a small number of founding individuals).[8]
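The dependence of drift on population size can be illustrated with a small simulation. The sketch below assumes the textbook Wright-Fisher model (binomial resampling of gene copies each generation), which the text above does not name; the function names, population sizes and seed are hypothetical choices made for illustration only.

```python
import random


def wright_fisher(pop_size, p0=0.5, generations=200, seed=0):
    """Simulate one allele's frequency under pure drift (no selection, no mutation).

    pop_size is the number of diploid individuals, so 2 * pop_size gene
    copies are resampled every generation. Returns the frequency trajectory.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Random sampling of gametes: each new copy carries the allele with probability p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
        if p in (0.0, 1.0):
            # The allele is lost or fixed; no variation remains at this site.
            break
    return trajectory


def expected_heterozygosity(pop_size, h0, generations):
    """Expected heterozygosity under drift: it shrinks by (1 - 1/(2N)) per generation."""
    return h0 * (1 - 1 / (2 * pop_size)) ** generations


if __name__ == "__main__":
    for n in (10, 100, 1000):
        final = wright_fisher(n, generations=100)[-1]
        remaining = expected_heterozygosity(n, 0.5, 100)
        print(f"N={n}: final frequency {final:.3f}, expected heterozygosity {remaining:.4f}")
```

In this model the per-generation loss of heterozygosity is 1/(2N), so a bottleneck (a temporarily small N) speeds up the loss of variation, as the paragraph above describes.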

Anzai et al. demonstrated that indels account for 90.4% of all observed variations in the sequence of the major histocompatibility locus (MHC) between humans and chimpanzees. After taking multiple indels into consideration, the high degree of genomic similarity between the two species (98.6% nucleotide sequence identity) drops to only 86.7%. For example, a large deletion of 95 kilobases (kb) between the loci of the human MICA and MICB genes results in a single hybrid chimpanzee MIC gene, linking this region to a species-specific handling of several retroviral infections and the resultant susceptibility to various autoimmune diseases. The authors conclude that indels, rather than the more subtle SNPs, were the driving mechanism in primate speciation.[9]

Besides SNPs, structural variants such as copy-number variants (CNVs) contribute to the genetic diversity of human populations. Using microarrays, almost 1,500 copy number variable regions, covering around 12% of the genome and containing hundreds of genes, disease loci, functional elements and segmental duplications, have been identified in the HapMap sample collection. Although the specific function of CNVs remains elusive, the fact that CNVs span more nucleotide content per genome than SNPs emphasizes the importance of CNVs in genetic diversity and evolution.[10]

Investigating human genomic variations holds great potential for identifying genes that might underlie differences in disease resistance (e.g. MHC region) or drug metabolism.[12]

Natural selection

Evolution of a trait by natural selection can be divided into three classes. Directional or positive selection refers to a situation where a certain allele has greater fitness than other alleles, consequently increasing its population frequency (e.g. antibiotic resistance of bacteria). In contrast, stabilizing or negative selection (also known as purifying selection) lowers the frequency of alleles, or even removes them from a population, because of disadvantages associated with them relative to other alleles. Finally, a number of forms of balancing selection exist; these increase genetic variation within a species, either through overdominance (heterozygous individuals are fitter than homozygous individuals, e.g. G6PD, a gene that is involved in both hemolytic anaemia and malaria resistance) or through selection that varies spatially within a species that inhabits different niches, thus favouring different alleles.[13] Some genomic differences may not affect fitness. Neutral variation, previously thought to be “junk” DNA, is unaffected by natural selection, resulting in higher genetic variation at such sites when compared to sites where variation does influence fitness.[14]
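The three classes can be made concrete with the standard one-locus, two-allele selection recursion from population genetics. This is a minimal sketch under assumptions not taken from the text above: an infinite, randomly mating population and purely hypothetical fitness values chosen to show each regime.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of selection (random mating)."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def iterate(p0, fitnesses, generations):
    """Apply the recursion repeatedly and return the final frequency of A."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, *fitnesses)
    return p


# Hypothetical fitness triples (w_AA, w_Aa, w_aa), for illustration only.
# Directional (positive) selection: A is advantageous and rises from rarity.
directional = iterate(0.01, (1.00, 0.95, 0.90), 500)

# Negative (purifying) selection: A is deleterious and is driven out.
purifying = iterate(0.50, (0.90, 0.95, 1.00), 500)

# Balancing selection by overdominance: the heterozygote is fittest.
# With w_AA = 1 - s and w_aa = 1 - t, the stable equilibrium is t / (s + t).
s, t = 0.1, 0.3
balancing = iterate(0.01, (1 - s, 1.0, 1 - t), 2000)

if __name__ == "__main__":
    print(f"directional: {directional:.4f}")
    print(f"purifying:   {purifying:.2e}")
    print(f"balancing:   {balancing:.4f} (analytic equilibrium {t / (s + t):.2f})")
```

Only the overdominant case keeps both alleles in the population indefinitely, which is why balancing selection increases variation while the other two classes reduce it.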

It is not fully clear how natural selection has shaped population differences; however, genetic candidate regions under selection have been identified recently.[11] Patterns of DNA polymorphisms can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism.[13][14] Barreiro et al. found evidence that negative selection has reduced population differentiation at the amino acid–altering level (particularly in disease-related genes), whereas positive selection has ensured regional adaptation of human populations by increasing population differentiation in gene regions (mainly nonsynonymous and 5'-untranslated region variants).[11]

It is thought that most complex and Mendelian diseases (except diseases with late onset, assuming that older individuals no longer contribute to the fitness of their offspring) affect survival and/or reproduction; genetic factors underlying those diseases should therefore be influenced by natural selection. However, diseases that have late onset today could have been childhood diseases in the past, as genes delaying disease progression could have undergone selection. Gaucher disease (mutations in the GBA gene), Crohn's disease (mutation of NOD2) and familial hypertrophic cardiomyopathy (mutations in MYH7, TNNT2, TPM1 and MYBPC3) are all examples of negative selection. These disease mutations are primarily recessive and segregate, as expected, at a low frequency, supporting the hypothesized negative selection. There is evidence that the genetic basis of type 1 diabetes may have undergone positive selection.[15] Few cases have been reported in which disease-causing mutations appear at the high frequencies maintained by balancing selection. The most prominent example is mutations of the G6PD locus, which in the homozygous state result in G6PD enzyme deficiency and consequently hemolytic anaemia, but in the heterozygous state are partially protective against malaria. Other possible explanations for segregation of disease alleles at moderate or high frequencies include genetic drift, recent shifts towards positive selection due to environmental changes such as diet, and genetic hitch-hiking.[12]

Genome-wide comparative analyses of different human populations, as well as of different species (e.g. human versus chimpanzee), are helping us to understand the relationship between diseases and selection and provide evidence that mutations in constrained genes are disproportionately associated with heritable disease phenotypes. Genes implicated in complex disorders tend to be under less negative selection than Mendelian disease genes or non-disease genes.[12]

Project description

Goals

There are two kinds of genetic variants related to disease. The first are rare genetic variants that have a severe effect, predominantly on simple traits (e.g. cystic fibrosis, Huntington disease). The second, more common, genetic variants have a mild effect and are thought to be implicated in complex traits (e.g. cognition, diabetes, heart disease). Between these two types of genetic variants lies a significant gap in knowledge, which the 1000 Genomes Project is designed to address.[6]

The primary goal of this project is to create a complete and detailed catalogue of human genetic variations, which in turn can be used for association studies relating genetic variation to disease. To that end, the consortium aims to discover >95% of the variants (e.g. SNPs, CNVs, indels) with minor allele frequencies as low as 1% across the genome and 0.1–0.5% in gene regions, as well as to estimate the population frequencies, haplotype backgrounds and linkage disequilibrium patterns of variant alleles.[16]
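The frequency thresholds above relate to sample size through a simple sampling argument. As a rough sketch, assuming independent draws of chromosomes from one well-mixed population (an idealisation that the text does not state), the chance that a variant of frequency f is present at least once among N diploid individuals is 1 - (1 - f)^(2N). Being present in the sample is only an upper bound on being detected, since sequencing depth and error are ignored here; the function name is hypothetical.

```python
def prob_sampled(freq, n_individuals):
    """Probability that an allele of the given frequency appears at least once
    among the 2 * n_individuals chromosomes of a diploid sample."""
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** chromosomes


if __name__ == "__main__":
    for freq in (0.01, 0.005, 0.001):
        for n in (100, 1000):
            print(f"frequency {freq:.3f}, {n} individuals: {prob_sampled(freq, n):.4f}")
```

Under these assumptions a 1% variant is almost certain to be carried by someone in a sample of 1,000 people, whereas a 0.1% variant would be missing from a noticeable fraction of such samples.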

Secondary goals will include the support of better SNP and probe selection for genotyping platforms in future studies and the improvement of the human reference sequence. Furthermore, the completed database will be a useful tool for studying regions under selection, variation in multiple populations and understanding the underlying processes of mutation and recombination.[16]

Outline

The human genome consists of approximately 3 billion DNA base pairs and is estimated to carry around 20,000 protein-coding genes. In designing the study, the consortium needed to address several critical issues regarding the project metrics, such as technology challenges, data quality standards and sequence coverage.[16]

Over the course of the next three years, scientists at the Sanger Institute, BGI Shenzhen and the National Human Genome Research Institute’s Large-Scale Sequencing Network are planning to sequence a minimum of 1,000 human genomes. Due to the large amount of sequence data that need to be generated and analyzed, it is possible that other participants may be recruited over time.[6]

Almost 10 billion bases will be sequenced per day over the two-year production phase. This equates to more than two human genomes every 24 hours, a groundbreaking capacity. Challenging the leading experts of bioinformatics and statistical genetics, the sequence dataset will comprise 6 trillion DNA bases, 60-fold more sequence data than what has been published in DNA databases over the past 25 years.[6]
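The throughput figures above can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text (a genome of about 3 billion base pairs, 10 billion bases per day, a two-year phase, 6 trillion bases in total); the 365-day year and the variable names are assumptions made for illustration, and the final depth figure is an inference from those numbers rather than something the text states.

```python
GENOME_BP = 3_000_000_000          # approximate size of one human genome
BASES_PER_DAY = 10_000_000_000     # "almost 10 billion", treated as an upper bound
PRODUCTION_DAYS = 2 * 365          # two-year production phase, assuming 365-day years
PLANNED_TOTAL = 6_000_000_000_000  # stated size of the final dataset

# Genome-sized amounts of sequence produced per day at full rate.
genomes_per_day = BASES_PER_DAY / GENOME_BP

# Upper bound on output if the full rate were sustained for two years.
max_total_bases = BASES_PER_DAY * PRODUCTION_DAYS

# "60-fold more" than existing databases implies this much prior public sequence.
implied_prior_bases = PLANNED_TOTAL // 60

# Genome-equivalents in the planned dataset, and the average depth if they
# were spread evenly over 1,000 genomes.
genome_equivalents = PLANNED_TOTAL // GENOME_BP
average_depth = genome_equivalents / 1000

if __name__ == "__main__":
    print(f"genomes per day:        {genomes_per_day:.2f}")
    print(f"max bases in two years: {max_total_bases:.2e}")
    print(f"implied prior sequence: {implied_prior_bases:.2e}")
    print(f"average depth:          {average_depth:.1f}x")
```

The stated total of 6 trillion bases sits below the 7.3 trillion upper bound, which is consistent with a daily rate of "almost" 10 billion bases.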

To determine the final design of the full project, three pilot studies were designed and will be carried out within the first year of the project. The first pilot intends to sequence the genomes of 180 people from 3 major geographic groups at low coverage (2x). For the second pilot study, the genomes of two nuclear families (both parents and an adult child) are going to be sequenced with deep coverage (20x per genome). The third pilot study involves sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x).[6][16]
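The practical difference between 2x and 20x coverage can be sketched with a Poisson model of read depth. This is a common idealisation (reads landing uniformly at random along the genome), not something the text above specifies, and real sequencing data are less uniform; the function name and the depth threshold of 4 are hypothetical.

```python
import math


def prob_depth_at_least(mean_coverage, k):
    """P(depth >= k) at a site, assuming depth ~ Poisson(mean_coverage)."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    for cov in (2, 20):
        covered = prob_depth_at_least(cov, 1)
        well_covered = prob_depth_at_least(cov, 4)
        print(f"{cov}x: P(depth >= 1) = {covered:.6f}, P(depth >= 4) = {well_covered:.6f}")
```

Under this model a noticeable share of sites in any one 2x genome receives no reads at all, which is why a low-coverage design depends on combining evidence across many individuals, while at 20x nearly every site is covered several times over.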

It has been estimated that the project would likely cost more than $500 million if standard DNA sequencing technologies were used. Therefore, several new technologies (e.g. Solexa, 454, SOLiD) will be applied, lowering the expected costs to between $30 million and $50 million. The major support will be provided by the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; and the NHGRI, part of the National Institutes of Health (NIH).[6]

In keeping with the Fort Lauderdale principles, all genome sequence data (including variant calls) are freely available as the project progresses and can be downloaded via FTP from the 1000 Genomes Project webpage.

Human genome samples

 
Locations of population samples of 1000 Genomes Project.[17] Each circle represents the number of sequences in the final release.

Based on the overall goals for the project, the samples will be chosen to provide power in populations where association studies for common diseases are being carried out. Furthermore, the samples do not need to have medical or phenotype information since the proposed catalogue will be a basic resource on human variation.[16]

For the pilot studies, human genome samples from the HapMap collection will be sequenced. It will be useful to focus on samples that have additional data available (such as ENCODE sequence, genome-wide genotypes, fosmid-end sequence, structural variation assays, and gene expression) so that the results can be compared with those from other projects.[16]

Complying with extensive ethical procedures, the 1000 Genomes Project will then use samples from volunteer donors. The following populations will be included in the study: Yoruba in Ibadan (YRI), Nigeria; Japanese in Tokyo (JPT); Chinese in Beijing (CHB); Utah residents with ancestry from northern and western Europe (CEU); Luhya in Webuye, Kenya (LWK); Maasai in Kinyawa, Kenya (MKK); Toscani in Italy (TSI); Peruvians in Lima, Peru (PEL); Gujarati Indians in Houston (GIH); Chinese in metropolitan Denver (CHD); people of Mexican ancestry in Los Angeles (MXL); and people of African ancestry in the southwestern United States (ASW).[6]

ID     Population
ASW *  African Ancestry in SW USA
ACB *  African Caribbean in Barbados
BEB    Bengali in Bangladesh
GBR    British from England and Scotland
CDX    Chinese Dai in Xishuangbanna, China
CLM    Colombian in Medellín, Colombia
ESN    Esan in Nigeria
FIN    Finnish in Finland
GWD    Gambian in Western Division, Mandinka
GIH *  Gujarati Indians in Houston, Texas, United States
CHB    Han Chinese in Beijing, China
CHS    Han Chinese South, China
IBS    Iberian populations in Spain
ITU *  Indian Telugu in the UK
JPT    Japanese in Tokyo, Japan
KHV    Kinh in Ho Chi Minh City, Vietnam
LWK    Luhya in Webuye, Kenya
MSL    Mende in Sierra Leone
MXL *  Mexican Ancestry in Los Angeles, CA, United States
PEL    Peruvian in Lima, Peru
PUR    Puerto Rican in Puerto Rico
PJL    Punjabi in Lahore, Pakistan
STU *  Sri Lankan Tamil in the UK
TSI    Toscani in Italia
YRI    Yoruba in Ibadan, Nigeria
CEU *  Utah residents with Northern and Western European ancestry from the CEPH collection

* Population that was collected in diaspora

Community meeting

Data generated by the 1000 Genomes Project are widely used by the genetics community, making the first 1000 Genomes Project paper one of the most cited papers in biology.[18] To support this user community, the project held a community analysis meeting in July 2012 that included talks highlighting key project discoveries, their impact on population genetics and human disease studies, and summaries of other large-scale sequencing studies.[19]

Project findings

Pilot phase

The pilot phase consisted of three projects:

  • low-coverage whole-genome sequencing of 179 individuals from 4 populations
  • high-coverage sequencing of 2 trios (mother-father-child)
  • exon-targeted sequencing of 697 individuals from 7 populations

It was found that, on average, each person carries around 250–300 loss-of-function variants in annotated genes and 50–100 variants previously implicated in inherited disorders. Based on the two trios, it is estimated that the rate of de novo germline mutation is approximately 10⁻⁸ per base per generation.[1]
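To give the per-base mutation rate a more intuitive scale, it can be multiplied by genome size. The sketch below combines the rate quoted above with the genome size of roughly 3 billion base pairs given earlier in the article; counting both parental copies (a diploid genome) is an assumption added here, and the result is an order-of-magnitude estimate rather than a figure reported by the project.

```python
MUTATION_RATE = 1e-8            # de novo mutations per base per generation (from the text)
HAPLOID_GENOME_BP = 3_000_000_000  # approximate haploid genome size (from the text)
PLOIDY = 2                      # assumption: one maternal and one paternal copy

# Expected number of new mutations carried by one child.
expected_de_novo = MUTATION_RATE * PLOIDY * HAPLOID_GENOME_BP

if __name__ == "__main__":
    print(f"expected de novo mutations per child: about {expected_de_novo:.0f}")
```

With so few true events per child relative to billions of sequenced bases, high-coverage data from both parents are needed to separate real de novo mutations from sequencing errors, which fits the choice of deeply sequenced trios for this estimate.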

References

  1. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, et al. (October 2010). "A map of human genome variation from population-scale sequencing". Nature. 467 (7319): 1061–73. Bibcode:2010Natur.467.1061T. doi:10.1038/nature09534. PMC 3042601. PMID 20981092.
  2. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. (November 2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. Bibcode:2012Natur.491...56T. doi:10.1038/nature11632. PMC 3498066. PMID 23128226.
  3. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. (October 2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526...68T. doi:10.1038/nature15393. PMC 4750478. PMID 26432245.
  4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. (October 2015). "An integrated map of structural variation in 2,504 human genomes". Nature. 526 (7571): 75–81. Bibcode:2015Natur.526...75.. doi:10.1038/nature15394. PMC 4617611. PMID 26432246.
  5. "Variety of life". Nature News & Comment. 2015-09-30. Retrieved 2015-10-15.
  6. G Spencer, International Consortium Announces the 1000 Genomes Project, EMBARGOED (2008) http://www.nih.gov/news/health/jan2008/nhgri-22.htm
  7. Nielsen R (October 2010). "Genomics: In search of rare human variants". Nature. 467 (7319): 1050–1. Bibcode:2010Natur.467.1050N. doi:10.1038/4671050a. PMID 20981085.
  8. JC Long, Human Genetic Variation: The mechanisms and results of microevolution, American Anthropological Association (2004)
  9. Anzai T, Shiina T, Kimura N, Yanagiya K, Kohara S, Shigenari A, et al. (June 2003). "Comparative sequencing of human and chimpanzee MHC class I regions unveils insertions/deletions as the major path to genomic divergence". Proceedings of the National Academy of Sciences of the United States of America. 100 (13): 7708–13. Bibcode:2003PNAS..100.7708A. doi:10.1073/pnas.1230533100. PMC 164652. PMID 12799463.
  10. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. (November 2006). "Global variation in copy number in the human genome". Nature. 444 (7118): 444–54. Bibcode:2006Natur.444..444R. doi:10.1038/nature05329. PMC 2669898. PMID 17122850.
  11. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (March 2008). "Natural selection has driven population differentiation in modern humans". Nature Genetics. 40 (3): 340–5. doi:10.1038/ng.78. PMID 18246066. S2CID 205357396.
  12. Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG (November 2007). "Recent and ongoing selection in the human genome". Nature Reviews. Genetics. 8 (11): 857–68. doi:10.1038/nrg2187. PMC 2933187. PMID 17943193.
  13. EE Harris et al., The molecular signature of selection underlying human adaptations, Yearbook of Physical Anthropology 49: 89-130 (2006)
  14. Bamshad M, Wooding SP (February 2003). "Signatures of natural selection in the human genome". Nature Reviews. Genetics. 4 (2): 99–111. doi:10.1038/nrg999. PMID 12560807. S2CID 13722452.
  15. Corona E, Dudley JT, Butte AJ (August 2010). Hawks J (ed.). "Extreme evolutionary disparities seen in positive selection across seven complex diseases". PLOS ONE. 5 (8): e12236. Bibcode:2010PLoSO...512236C. doi:10.1371/journal.pone.0012236. PMC 2923198. PMID 20808933.
  16. Meeting Report: A Workshop to Plan a Deep Catalog of Human Genetic Variation, (2007) http://www.1000genomes.org/sites/1000genomes.org/files/docs/1000Genomes-MeetingReport.pdf
  17. Oleksyk TK, Brukhin V, O'Brien SJ (2015). "The Genome Russia project: closing the largest remaining omission on the world Genome map". GigaScience. 4: 53. doi:10.1186/s13742-015-0095-0. PMC 4644275. PMID 26568821.
  18. C. King (2012) The Hottest Research of 2011. Science Watch http://archive.sciencewatch.com/newsletter/2012/201203/hottest_research_2012/
  19. 1000 Genomes Project Community Analysis Meeting http://1000gconference.sph.umich.edu/

External links

  • 1000 Genomes - A Deep Catalog of Human Genetic Variation - official web page
  • International HapMap Project - official web page
  • Human Genome Project Information

1000, genomes, project, abbreviated, 1kgp, launched, january, 2008, international, research, effort, establish, most, detailed, catalogue, human, genetic, variation, scientists, planned, sequence, genomes, least, thousand, anonymous, participants, from, number. The 1000 Genomes Project abbreviated as 1KGP launched in January 2008 was an international research effort to establish by far the most detailed catalogue of human genetic variation Scientists planned to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the following three years using newly developed technologies which were faster and less expensive In 2010 the project finished its pilot phase which was described in detail in a publication in the journal Nature 1 In 2012 the sequencing of 1092 genomes was announced in a Nature publication 2 In 2015 two papers in Nature reported results and the completion of the project and opportunities for future research 3 4 Many rare variations restricted to closely related groups were identified and eight structural variation classes were analyzed 5 The project unites multidisciplinary research teams from institutes around the world including China Italy Japan Kenya Nigeria Peru the United Kingdom and the United States Each will contribute to the enormous sequence dataset and to a refined human genome map which will be freely accessible through public databases to the scientific community and the general public alike 2 By providing an overview of all human genetic variation the consortium will generate a valuable tool for all fields of biological science especially in the disciplines of genetics medicine pharmacology biochemistry and bioinformatics 6 Changes in the number and order of genes A D create genetic diversity within and between populations Contents 1 Background 1 1 Human genetic variation 1 2 Natural selection 2 Project description 2 1 Goals 2 2 Outline 2 3 Human genome samples 2 4 Community 
meeting 3 Project findings 3 1 Pilot phase 4 See also 5 References 6 External linksBackground EditSince the completion of the Human Genome Project advances in human population genetics and comparative genomics have made it possible to gain increasing insight into the nature of genetic diversity 7 However we are just beginning to understand how processes like the random sampling of gametes structural variations insertions deletions indels copy number variations CNV retroelements single nucleotide polymorphisms SNPs and natural selection have shaped the level and pattern of variation within species and also between species 8 9 10 11 Human genetic variation Edit The random sampling of gametes during sexual reproduction leads to genetic drift a random fluctuation in the population frequency of a trait in subsequent generations and would result in the loss of all variation in the absence of external influence It is postulated that the rate of genetic drift is inversely proportional to population size and that it may be accelerated in specific situations such as bottlenecks where the population size is reduced for a certain period of time and by the founder effect individuals in a population tracing back to a small number of founding individuals 8 Anzai et al demonstrated that indels account for 90 4 of all observed variations in the sequence of the major histocompatibility locus MHC between humans and chimpanzees After taking multiple indels into consideration the high degree of genomic similarity between the two species 98 6 nucleotide sequence identity drops to only 86 7 For example a large deletion of 95 kilobases kb between the loci of the human MICA and MICB genes results in a single hybrid chimpanzee MIC gene linking this region to a species specific handling of several retroviral infections and the resultant susceptibility to various autoimmune diseases The authors conclude that instead of more subtle SNPs indels were the driving mechanism in primate speciation 9 
Besides mutations SNPs and other structural variants such as copy number variants CNVs are contributing to the genetic diversity in human populations Using microarrays almost 1 500 copy number variable regions covering around 12 of the genome and containing hundreds of genes disease loci functional elements and segmental duplications have been identified in the HapMap sample collection Although the specific function of CNVs remains elusive the fact that CNVs span more nucleotide content per genome than SNPs emphasizes the importance of CNVs in genetic diversity and evolution 10 Investigating human genomic variations holds great potential for identifying genes that might underlie differences in disease resistance e g MHC region or drug metabolism 12 Natural selection Edit Natural selection evolution of a trait can be divided into three classes Directional or positive selection refers to a situation where a certain allele has a greater fitness than other alleles consequently increasing its population frequency e g antibiotic resistance of bacteria In contrast stabilizing or negative selection also known as purifying selection lowers the frequency or even removes alleles from a population due to disadvantages associated with it with respect to other alleles Finally a number of forms of balancing selection exist those increase genetic variation within a species by being overdominant heterozygous individuals are fitter than homozygous individuals e g G6PD a gene that is involved in both Hemolytic anaemia and malaria resistance or can vary spatially within a species that inhabits different niches thus favouring different alleles 13 Some genomic differences may not affect fitness Neutral variation previously thought to be junk DNA is unaffected by natural selection resulting in higher genetic variation at such sites when compared to sites where variation does influence fitness 14 It is not fully clear how natural selection has shaped population differences however genetic 
candidate regions under selection have been identified recently 11 Patterns of DNA polymorphisms can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism 13 14 Barreiro et al found evidence that negative selection has reduced population differentiation at the amino acid altering level particularly in disease related genes whereas positive selection has ensured regional adaptation of human populations by increasing population differentiation in gene regions mainly nonsynonymous and 5 untranslated region variants 11 It is thought that most complex and Mendelian diseases except diseases with late onset assuming that older individuals no longer contribute to the fitness of their offspring will have an effect on survival and or reproduction thus genetic factors underlying those diseases should be influenced by natural selection Although diseases that have late onset today could have been childhood diseases in the past as genes delaying disease progression could have undergone selection Gaucher disease mutations in the GBA gene Crohn s disease mutation of NOD2 and familial hypertrophic cardiomyopathy mutations in MYH7 TNNT2 TPM1 and MYBPC3 are all examples of negative selection These disease mutations are primarily recessive and segregate as expected at a low frequency supporting the hypothesized negative selection There is evidence that the genetic basis of Type 1 Diabetes may have undergone positive selection 15 Few cases have been reported where disease causing mutations appear at the high frequencies supported by balanced selection The most prominent example is mutations of the G6PD locus where if homozygous G6PD enzyme deficiency and consequently Hemolytic anaemia results but in the heterozygous state are partially protective against malaria Other possible explanations for segregation of disease alleles at moderate or high frequencies include genetic drift and recent 
alterations towards positive selection due to environmental changes such as diet or genetic hitch hiking 12 Genome wide comparative analyses of different human populations as well as between species e g human versus chimpanzee are helping us to understand the relationship between diseases and selection and provide evidence of mutations in constrained genes being disproportionally associated with heritable disease phenotypes Genes implicated in complex disorders tend to be under less negative selection than Mendelian disease genes or non disease genes 12 Project description EditThis section needs to be updated Please help update this article to reflect recent events or newly available information April 2021 Goals Edit There are two kinds of genetic variants related to disease The first are rare genetic variants that have a severe effect predominantly on simple traits e g Cystic fibrosis Huntington disease The second more common genetic variants have a mild effect and are thought to be implicated in complex traits e g Cognition Diabetes Heart Disease Between these two types of genetic variants lies a significant gap of knowledge which the 1000 Genomes Project is designed to address 6 The primary goal of this project is to create a complete and detailed catalogue of human genetic variations which in turn can be used for association studies relating genetic variation to disease By doing so the consortium aims to discover gt 95 of the variants e g SNPs CNVs indels with minor allele frequencies as low as 1 across the genome and 0 1 0 5 in gene regions as well as to estimate the population frequencies haplotype backgrounds and linkage disequilibrium patterns of variant alleles 16 Secondary goals will include the support of better SNP and probe selection for genotyping platforms in future studies and the improvement of the human reference sequence Furthermore the completed database will be a useful tool for studying regions under selection variation in multiple populations 
and understanding the underlying processes of mutation and recombination 16 Outline Edit The human genome consists of approximately 3 billion DNA base pairs and is estimated to carry around 20 000 protein coding genes In designing the study the consortium needed to address several critical issues regarding the project metrics such as technology challenges data quality standards and sequence coverage 16 Over the course of the next three years clarification needed scientists at the Sanger Institute BGI Shenzhen and the National Human Genome Research Institute s Large Scale Sequencing Network are planning to sequence a minimum of 1 000 human genomes Due to the large amount of sequence data that need to be generated and analyzed it is possible that other participants may be recruited over time 6 Almost 10 billion bases will be sequenced per day over a period of the two year production phase This equates to more than two human genomes every 24 hours a groundbreaking capacity Challenging the leading experts of bioinformatics and statistical genetics the sequence dataset will comprise 6 trillion DNA bases 60 fold more sequence data than what has been published in DNA databases over the past 25 years 6 To determine the final design of the full project three pilot studies were designed and will be carried out within the first year of the project The first pilot intends to genotype 180 people of 3 major geographic groups at low coverage 2x For the second pilot study the genomes of two nuclear families both parents and an adult child are going to be sequenced with deep coverage 20x per genome The third pilot study involves sequencing the coding regions exons of 1 000 genes in 1 000 people with deep coverage 20x 6 16 It has been estimated that the project would likely cost more than 500 million if standard DNA sequencing technologies were used Therefore several new technologies e g Solexa 454 SOLiD will be applied lowering the expected costs to between 30 million and 50 
million. The major support will be provided by the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; and the NHGRI, part of the National Institutes of Health (NIH).[6]

In keeping with the Fort Lauderdale principles, all genome sequence data (including variant calls) is freely available as the project progresses and can be downloaded via FTP from the 1000 Genomes Project webpage.

Human genome samples

Locations of population samples of the 1000 Genomes Project.[17] Each circle represents the number of sequences in the final release.

Based on the overall goals for the project, the samples will be chosen to provide power in populations where association studies for common diseases are being carried out. Furthermore, the samples do not need to have medical or phenotype information, since the proposed catalogue will be a basic resource on human variation.[16]

For the pilot studies, human genome samples from the HapMap collection will be sequenced. It will be useful to focus on samples that have additional data available (such as ENCODE sequence, genome-wide genotypes, fosmid-end sequence, structural variation assays, and gene expression) to be able to compare the results with those from other projects.[16]

Complying with extensive ethical procedures, the 1000 Genomes Project will then use samples from volunteer donors. The following populations will be included in the study: Yoruba in Ibadan (YRI), Nigeria; Japanese in Tokyo (JPT); Chinese in Beijing (CHB); Utah residents with ancestry from northern and western Europe (CEU); Luhya in Webuye, Kenya (LWK); Maasai in Kinyawa, Kenya (MKK); Toscani in Italy (TSI); Peruvians in Lima, Peru (PEL); Gujarati Indians in Houston (GIH); Chinese in metropolitan Denver (CHD); people of Mexican ancestry in Los Angeles (MXL); and people of African ancestry in the southwestern United States (ASW).[6]

ID   Population
ASW  African Ancestry in SW USA
ACB  African Caribbean in Barbados
BEB  Bengali in Bangladesh
GBR  British from England and Scotland
CDX  Chinese Dai in Xishuangbanna, China
CLM  Colombian in Medellin, Colombia
ESN  Esan in Nigeria
FIN  Finnish in Finland
GWD  Gambian in Western Division, Mandinka
GIH  Gujarati Indians in Houston, Texas, United States
CHB  Han Chinese in Beijing, China
CHS  Han Chinese South, China
IBS  Iberian populations in Spain
ITU  Indian Telugu in the UK
JPT  Japanese in Tokyo, Japan
KHV  Kinh in Ho Chi Minh City, Vietnam
LWK  Luhya in Webuye, Kenya
MSL  Mende in Sierra Leone
MXL  Mexican Ancestry in Los Angeles, CA, United States
PEL  Peruvian in Lima, Peru
PUR  Puerto Rican in Puerto Rico
PJL  Punjabi in Lahore, Pakistan
STU  Sri Lankan Tamil in the UK
TSI  Toscani in Italia
YRI  Yoruba in Ibadan, Nigeria
CEU  Utah residents with Northern and Western European ancestry from the CEPH collection

Population that was collected in diaspora

Community meeting

Data generated by the 1000 Genomes Project is widely used by the genetics community, making the first 1000 Genomes Project paper one of the most cited papers in biology.[18] To support this user community, the project held a community analysis meeting in July 2012 that included talks highlighting key project discoveries, their impact on population genetics and human disease studies, and summaries of other large-scale sequencing studies.[19]

Project findings

Pilot phase

The pilot phase consisted of three projects: low-coverage whole-genome sequencing of 179 individuals from 4 populations; high-coverage sequencing of 2 trios (mother-father-child); and exon-targeted sequencing of 697 individuals from 7 populations.

It was found that, on average, each person carries around 250–300 loss-of-function variants in annotated genes and 50–100 variants previously implicated in inherited disorders. Based on the two trios, it is estimated that the rate of de novo germline mutation is
approximately 10⁻⁸ per base per generation.[1]

See also

Human Genome Project
HapMap Project
Personal genomics
Population groups in biomedicine
1000 Plant Genomes Project
List of biological databases

References

1. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, et al. (October 2010). "A map of human genome variation from population-scale sequencing". Nature. 467 (7319): 1061–73. Bibcode:2010Natur.467.1061T. doi:10.1038/nature09534. PMC 3042601. PMID 20981092.
2. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. (November 2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. Bibcode:2012Natur.491...56T. doi:10.1038/nature11632. PMC 3498066. PMID 23128226.
3. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. (October 2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526...68T. doi:10.1038/nature15393. PMC 4750478. PMID 26432245.
4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. (October 2015). "An integrated map of structural variation in 2,504 human genomes". Nature. 526 (7571): 75–81. Bibcode:2015Natur.526...75. doi:10.1038/nature15394. PMC 4617611. PMID 26432246.
5. "Variety of life". Nature News & Comment. 2015-09-30. Retrieved 2015-10-15.
6. G Spencer, International Consortium Announces the 1000 Genomes Project, EMBARGOED (2008). http://www.nih.gov/news/health/jan2008/nhgri-22.htm
7. Nielsen R (October 2010). "Genomics: In search of rare human variants". Nature. 467 (7319): 1050–1. Bibcode:2010Natur.467.1050N. doi:10.1038/4671050a. PMID 20981085.
8. JC Long, Human Genetic Variation: The mechanisms and results of microevolution, American Anthropological Association (2004).
9. Anzai T, Shiina T, Kimura N, Yanagiya K, Kohara S, Shigenari A, et al. (June 2003). "Comparative sequencing of human and chimpanzee MHC class I regions unveils insertions/deletions as the major path to genomic divergence". Proceedings of the National Academy of Sciences of the United States of America. 100 (13): 7708–13. Bibcode:2003PNAS..100.7708A. doi:10.1073/pnas.1230533100. PMC 164652. PMID 12799463.
10. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. (November 2006). "Global variation in copy number in the human genome". Nature. 444 (7118): 444–54. Bibcode:2006Natur.444..444R. doi:10.1038/nature05329. PMC 2669898. PMID 17122850.
11. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (March 2008). "Natural selection has driven population differentiation in modern humans". Nature Genetics. 40 (3): 340–5. doi:10.1038/ng.78. PMID 18246066. S2CID 205357396.
12. Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG (November 2007). "Recent and ongoing selection in the human genome". Nature Reviews Genetics. 8 (11): 857–68. doi:10.1038/nrg2187. PMC 2933187. PMID 17943193.
13. EE Harris et al., The molecular signature of selection underlying human adaptations, Yearbook of Physical Anthropology 49: 89–130 (2006).
14. Bamshad M, Wooding SP (February 2003). "Signatures of natural selection in the human genome". Nature Reviews Genetics. 4 (2): 99–111. doi:10.1038/nrg999. PMID 12560807. S2CID 13722452.
15. Corona E, Dudley JT, Butte AJ (August 2010). Hawks J (ed.). "Extreme evolutionary disparities seen in positive selection across seven complex diseases". PLOS ONE. 5 (8): e12236. Bibcode:2010PLoSO...512236C. doi:10.1371/journal.pone.0012236. PMC 2923198. PMID 20808933.
16. Meeting Report: A Workshop to Plan a Deep Catalog of Human Genetic Variation (2007). http://www.1000genomes.org/sites/1000genomes.org/files/docs/1000Genomes-MeetingReport.pdf
17. Oleksyk TK, Brukhin V, O'Brien SJ (2015). "The Genome Russia project: closing the largest remaining omission on the world Genome map". GigaScience. 4: 53. doi:10.1186/s13742-015-0095-0. PMC 4644275. PMID 26568821.
18. C King (2012). The Hottest Research of 2011. Science Watch. http://archive.sciencewatch.com/newsletter/2012/201203/hottest_research/
19. (2012) 1000 Genomes Project Community Analysis Meeting. http://1000gconference.sph.umich.edu/

External links

1000 Genomes: A Deep Catalog of Human Genetic Variation, official web page
International HapMap Project, official web page
Human Genome Project Information
