Inferring horizontal gene transfer

Horizontal or lateral gene transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate investigations of the evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages.[1]

Inferring horizontal gene transfer through computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.

Overview

 
Conceptual overview of HGT inference methods. (1) Parametric methods infer HGT by computing a statistic, here GC content, for a sliding window and comparing it to the typical range over the entire genome, here indicated between the two red horizontal lines. Regions with atypical values are inferred as having been horizontally transferred. (2) Phylogenetic approaches rely on the differences between gene and species tree evolution that result from HGT. Explicit phylogenetic methods reconstruct gene trees and infer the HGT events likely to have resulted in that particular gene tree. Implicit phylogenetic methods bypass gene tree reconstruction, e.g., by looking at discrepancies between pairwise distances between genes and their corresponding species.

Horizontal gene transfer was first observed in 1928, in Frederick Griffith's experiment: showing that virulence was able to pass from virulent to non-virulent strains of Streptococcus pneumoniae, Griffith demonstrated that genetic information can be horizontally transferred between bacteria via a mechanism known as transformation.[2] Similar observations in the 1940s[3] and 1950s[4] showed evidence that conjugation and transduction are additional mechanisms of horizontal gene transfer.[5]

To infer HGT events, which may not necessarily result in phenotypic changes, most contemporary methods are based on analyses of genomic sequence data. These methods can be broadly separated into two groups: parametric and phylogenetic methods. Parametric methods search for sections of a genome that significantly differ from the genomic average, such as GC content or codon usage.[6] Phylogenetic methods examine evolutionary histories of genes involved and identify conflicting phylogenies. Phylogenetic methods can be further divided into those that reconstruct and compare phylogenetic trees explicitly, and those that use surrogate measures in place of the phylogenetic trees.[7]

The main feature of parametric methods is that they rely only on the genome under study to infer HGT events that may have occurred on its lineage. This was a considerable advantage in the early days of the sequencing era, when few closely related genomes were available for comparative methods. However, because they rely on the uniformity of the host's signature to infer HGT events, not accounting for the host's intra-genomic variability will result in overpredictions, flagging native segments as possible HGT events.[8] Similarly, the transferred segments need to exhibit the donor's signature and to be significantly different from the recipient's.[6] Furthermore, genomic segments of foreign origin are subject to the same mutational processes as the rest of the host genome, and so the difference between the two tends to vanish over time, a process referred to as amelioration.[9] This limits the ability of parametric methods to detect ancient HGTs.

Phylogenetic methods benefit from the recent availability of many sequenced genomes. Indeed, as for all comparative methods, phylogenetic methods can integrate information from multiple genomes, and in particular integrate them using a model of evolution. This lends them the ability to better characterize the HGT events they infer—notably by designating the donor species and time of the transfer. However, models have limits and need to be used cautiously. For instance, the conflicting phylogenies can be the result of events not accounted for by the model, such as unrecognized paralogy due to duplication followed by gene losses. Also, many approaches rely on a reference species tree that is supposed to be known, when in many instances it can be difficult to obtain a reliable tree. Finally, the computational costs of reconstructing many gene/species trees can be prohibitively expensive. Phylogenetic methods tend to be applied to genes or protein sequences as basic evolutionary units, which limits their ability to detect HGT in regions outside or across gene boundaries.

Because of their complementary approaches—and often non-overlapping sets of HGT candidates—combining predictions from parametric and phylogenetic methods can yield a more comprehensive set of HGT candidate genes. Indeed, combining different parametric methods has been reported to significantly improve the quality of predictions.[10][11] Moreover, in the absence of a comprehensive set of true horizontally transferred genes, discrepancies between different methods[12][13] might be resolved through combining parametric and phylogenetic methods. However, combining inferences from multiple methods also entails a risk of an increased false-positive rate.[14]

Parametric methods

Parametric methods to infer HGT use characteristics of the genome sequence specific to particular species or clades, also called genomic signatures. If a fragment of the genome strongly deviates from the genomic signature, this is a sign of a potential horizontal transfer. For example, because bacterial GC content falls within a wide range, GC content of a genome segment is a simple genomic signature. Commonly used genomic signatures include nucleotide composition,[15] oligonucleotide frequencies,[16] or structural features of the genome.[17]

To detect HGT using parametric methods, the host's genomic signature needs to be clearly recognizable. However, the host's genome is not always uniform with respect to the genome signature: for example, GC content of the third codon position is lower close to the replication terminus[18] and GC content tends to be higher in highly expressed genes.[19] Not accounting for such intra-genomic variability in the host can result in over-predictions, flagging native segments as HGT candidates.[8] Larger sliding windows can account for this variability at the cost of a reduced ability to detect smaller HGT regions.[12]

Just as importantly, horizontally transferred segments need to exhibit the donor's genomic signature. This might not be the case for ancient transfers where transferred sequences are subjected to the same mutational processes as the rest of the host genome, potentially causing their distinct signatures to "ameliorate"[9] and become undetectable through parametric methods. For example, Bdellovibrio bacteriovorus, a predatory δ-Proteobacterium, has homogeneous GC content, and it might be concluded that its genome is resistant to HGT.[20] However, subsequent analysis using phylogenetic methods identified a number of ancient HGT events in the genome of B. bacteriovorus.[21] Similarly, if the inserted segment was previously ameliorated to the host's genome, as is the case for prophage insertions,[22] parametric methods might miss predicting these HGT events. Also, the donor's composition must significantly differ from the recipient's to be identified as abnormal, a condition that might be missed in the case of short- to medium-distance HGT, which are the most prevalent. Furthermore, it has been reported that recently acquired genes tend to be AT-richer than the recipient's average,[15] which indicates that differences in GC-content signature may result from unknown post-acquisition mutational processes rather than from the donor's genome.

Nucleotide composition

 
Average GC content of coding regions compared to the genome size for selected bacteria. There is considerable variation in average GC content across species, which makes it relevant as a genomic signature.

Bacterial GC content falls within a wide range, with Ca. Zinderia insecticola having a GC content of 13.5%[23] and Anaeromyxobacter dehalogenans having a GC content of 75%.[24] Even within a closely related group of α-Proteobacteria, values range from approximately 30% to 65%.[25] These differences can be exploited when detecting HGT events as a significantly different GC content for a genome segment can be an indication of foreign origin.[15]
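The sliding-window idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a published method: the function names, the window and step sizes, and the z-score cutoff are all assumptions chosen for the example, and the "typical range" is approximated by the mean and standard deviation over all windows.

```python
from statistics import mean, stdev

def gc_content(seq):
    """Fraction of G or C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def atypical_gc_windows(genome, window=5000, step=500, z_cutoff=2.0):
    """Return (start, gc, z) for windows whose GC content deviates from the
    mean over all windows by more than z_cutoff standard deviations.
    The default parameters are illustrative assumptions only."""
    starts = range(0, len(genome) - window + 1, step)
    values = [gc_content(genome[s:s + window]) for s in starts]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(s, gc, (gc - mu) / sigma)
            for s, gc in zip(starts, values)
            if abs(gc - mu) / sigma > z_cutoff]
```

A flagged window is only a candidate: as noted above, native regions can also show atypical GC content, so the cutoff trades false positives against missed transfers.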

Oligonucleotide spectrum

The oligonucleotide spectrum (or k-mer frequencies) measures the frequency of all possible nucleotide sequences of a particular length in the genome. It tends to vary less within genomes than between genomes and therefore can also be used as a genomic signature.[26] A deviation from this signature suggests that a genomic segment might have arrived through horizontal transfer.

The oligonucleotide spectrum owes much of its discriminatory power to the number of possible oligonucleotides: if n is the size of the vocabulary and w is oligonucleotide size, the number of possible distinct oligonucleotides is $n^w$; for example, there are $4^5 = 1024$ possible pentanucleotides. Some methods can capture the signal recorded in motifs of variable size,[27] thus capturing both rare and discriminative motifs along with frequent, but more common ones.

Codon usage bias, a measure related to codon frequencies, was one of the first detection methods used in methodical assessments of HGT.[16] This approach requires a host genome which contains a bias towards certain synonymous codons (different codons which code for the same amino acid) which is clearly distinct from the bias found within the donor genome. The simplest oligonucleotide used as a genomic signature is the dinucleotide. For example, the third nucleotide in a codon and the first nucleotide in the following codon represent the dinucleotide least restricted by amino acid preference and codon usage.[28]

It is important to optimise the size of the sliding window in which to count the oligonucleotide frequency: a larger sliding window will better buffer variability in the host genome at the cost of being worse at detecting smaller HGT regions.[29] A good compromise has been reported using tetranucleotide frequencies in a sliding window of 5 kb with a step of 0.5 kb.[30]
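As a rough illustration of a sliding-window oligonucleotide scan, the sketch below compares the tetranucleotide spectrum of each window with the genome-wide spectrum. The defaults mirror the 5 kb window and 0.5 kb step mentioned above; the function names and the choice of Manhattan distance are assumptions made for this example rather than the measure used in the cited work. Input is assumed to be an upper-case string over A, C, G and T.

```python
from collections import Counter
from itertools import product

def kmer_spectrum(seq, k=4):
    """Relative frequency of every possible k-mer over ACGT (overlapping counts).
    k-mers containing other characters are ignored."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

def spectrum_distances(genome, k=4, window=5000, step=500):
    """Manhattan distance between each window's k-mer spectrum and the
    genome-wide spectrum; larger values mean a more atypical window."""
    reference = kmer_spectrum(genome, k)
    result = []
    for start in range(0, len(genome) - window + 1, step):
        local = kmer_spectrum(genome[start:start + window], k)
        distance = sum(abs(a - b) for a, b in zip(local, reference))
        result.append((start, distance))
    return result
```

Windows with the largest distances would be the candidates for closer inspection; where to place a cutoff is left open here.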

A convenient method of modelling oligonucleotide genomic signatures is to use Markov chains. The transition probability matrix can be derived for endogenous vs. acquired genes,[31] from which Bayesian posterior probabilities for particular stretches of DNA can be obtained.[32]
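A minimal sketch of this idea, assuming first-order Markov chains and a fixed prior probability that a segment is acquired, might look as follows. The function names, the pseudocount and the prior value are illustrative assumptions; the cited methods may use higher-order chains and different estimation procedures.

```python
import math

ALPHABET = "ACGT"

def train_markov(sequences, pseudocount=1.0):
    """Estimate a first-order transition matrix P(next | current),
    smoothed with pseudocounts so that no probability is zero."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET}

def log_likelihood(seq, model):
    """Log-probability of the transitions in seq under a transition matrix."""
    return sum(math.log(model[a][b]) for a, b in zip(seq, seq[1:])
               if a in model and b in model[a])

def posterior_acquired(seq, native, acquired, prior_acquired=0.1):
    """Posterior probability (Bayes' rule) that seq comes from the
    'acquired' model rather than the 'native' one."""
    log_a = log_likelihood(seq, acquired) + math.log(prior_acquired)
    log_n = log_likelihood(seq, native) + math.log(1.0 - prior_acquired)
    m = max(log_a, log_n)  # subtract the maximum for numerical stability
    return math.exp(log_a - m) / (math.exp(log_a - m) + math.exp(log_n - m))
```

In practice the two training sets (genes believed to be endogenous and genes believed to be acquired) have to come from somewhere, which is itself part of the inference problem.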

Structural features

Just as the nucleotide composition of a DNA molecule can be represented by a sequence of letters, its structural features can be encoded in a numerical sequence. The structural features include interaction energies between neighbouring base pairs,[33] the angle of twist that makes two bases of a pair non-coplanar,[34] or DNA deformability induced by the proteins shaping the chromatin.[35]

The autocorrelation analysis of some of these numerical sequences shows characteristic periodicities in complete genomes.[36] In fact, after detecting archaea-like regions in the thermophilic bacterium Thermotoga maritima,[37] periodicity spectra of these regions were compared to the periodicity spectra of the homologous regions in the archaeon Pyrococcus horikoshii.[17] The revealed similarities in the periodicity were strong supporting evidence for a case of massive HGT between the bacteria and the archaea kingdoms.[17]

Genomic context

The existence of genomic islands, short (typically 10–200 kb long) regions of a genome which have been acquired horizontally, lends support to the ability to identify non-native genes by their location in a genome.[38] For example, a gene of ambiguous origin which forms part of a non-native operon could be considered to be non-native. Alternatively, flanking repeat sequences or the presence of nearby integrases or transposases can indicate a non-native region.[39] A machine-learning approach combining oligonucleotide frequency scans with context information was reported to be effective at identifying genomic islands.[40] In another study, the context was used as a secondary indicator, after removal of genes which are strongly thought to be native or non-native through the use of other parametric methods.[10]

Phylogenetic methods

The use of phylogenetic analysis in the detection of HGT was advanced by the availability of many newly sequenced genomes. Phylogenetic methods detect inconsistencies in gene and species evolutionary history in two ways: explicitly, by reconstructing the gene tree and reconciling it with the reference species tree, or implicitly, by examining aspects that correlate with the evolutionary history of the genes in question, e.g., patterns of presence/absence across species, or unexpectedly short or distant pairwise evolutionary distances.

Explicit phylogenetic methods

The aim of explicit phylogenetic methods is to compare gene trees with their associated species trees. While weakly supported differences between gene and species trees can be due to inference uncertainty, statistically significant differences can be suggestive of HGT events. For example, if two genes from different species share the most recent ancestral connecting node in the gene tree, but the respective species are spaced apart in the species tree, an HGT event can be invoked. Such an approach can produce more detailed results than parametric approaches because the involved species, time and direction of transfer can potentially be identified.

As discussed in more detail below, phylogenetic methods range from simple methods merely identifying discordance between gene and species trees to mechanistic models inferring probable sequences of HGT events. An intermediate strategy entails deconstructing the gene tree into smaller parts until each matches the species tree (genome spectral approaches).

Explicit phylogenetic methods rely upon the accuracy of the input rooted gene and species trees, yet these can be challenging to build.[41] Even when there is no doubt in the input trees, the conflicting phylogenies can be the result of evolutionary processes other than HGT, such as duplications and losses, causing these methods to erroneously infer HGT events when paralogy is the correct explanation. Similarly, in the presence of incomplete lineage sorting, explicit phylogeny methods can erroneously infer HGT events.[42] That is why some explicit model-based methods test multiple evolutionary scenarios involving different kinds of events, and compare their fit to the data given parsimonious or probabilistic criteria.

Tests of topologies

To detect sets of genes that fit poorly to the reference tree, one can use statistical tests of topology, such as the Kishino–Hasegawa (KH),[43] Shimodaira–Hasegawa (SH),[44] and Approximately Unbiased (AU)[45] tests. These tests assess the likelihood of the gene sequence alignment when the reference topology is given as the null hypothesis.

The rejection of the reference topology is an indication that the evolutionary history for that gene family is inconsistent with the reference tree. When these inconsistencies cannot be explained using a small number of non-horizontal events, such as gene loss and duplication, an HGT event is inferred.

One such analysis checked for HGT in groups of homologs of the γ-Proteobacterial lineage.[46] Six reference trees were reconstructed using either the highly conserved small subunit ribosomal RNA sequences, a consensus of the available gene trees or concatenated alignments of orthologs. The failure to reject the six evaluated topologies, and the rejection of seven alternative topologies, was interpreted as evidence for a small number of HGT events in the selected groups.

Tests of topology identify differences in tree topology taking into account the uncertainty in tree inference but they make no attempt at inferring how the differences came about. To infer the specifics of particular events, genome spectral or subtree pruning and regraft methods are required.

Genome spectral approaches

In order to identify the location of HGT events, genome spectral approaches decompose a gene tree into substructures (such as bipartitions or quartets) and identify those that are consistent or inconsistent with the species tree.

Bipartitions. Removing one edge from a reference tree produces two unconnected sub-trees, each a disjoint set of nodes: a bipartition. If a bipartition is present in both the gene and the species trees, it is compatible; otherwise, it is conflicting. These conflicts can indicate an HGT event or may be the result of uncertainty in gene tree inference. To reduce uncertainty, bipartition analyses typically focus on strongly supported bipartitions such as those associated with branches with bootstrap values or posterior probabilities above certain thresholds. Any gene family found to have one or several conflicting, but strongly supported, bipartitions is considered as an HGT candidate.[47][48][49]
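The bipartition comparison can be sketched with trees written as nested Python tuples whose leaves are strings. This representation and the function names are assumptions made for the example; the sketch also ignores branch support and assumes that both trees carry the same leaf set, whereas real analyses would first filter for strongly supported branches.

```python
def leaves(tree):
    """Set of leaf names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def bipartitions(tree):
    """Non-trivial bipartitions induced by the internal edges of a tree.
    Each bipartition is an unordered pair of leaf sets."""
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        # Skip the root and edges leading to a single leaf (trivial splits).
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits

def conflicting_bipartitions(gene_tree, species_tree):
    """Gene-tree bipartitions that are absent from the species tree."""
    return bipartitions(gene_tree) - bipartitions(species_tree)
```

For instance, with the species tree `((("A", "B"), "C"), ("D", "E"))` and the gene tree `((("A", "D"), "C"), ("B", "E"))`, both gene-tree bipartitions conflict with the species tree, which would make this hypothetical gene family an HGT candidate.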

Quartet decomposition. Quartets are trees consisting of four leaves. In bifurcating (fully resolved) trees, each internal branch induces a quartet whose leaves are either subtrees of the original tree or actual leaves of the original tree. If the topology of a quartet extracted from the reference species tree is embedded in the gene tree, the quartet is compatible with the gene tree. Conversely, incompatible strongly supported quartets indicate potential HGT events.[50] Quartet mapping methods are much more computationally efficient and naturally handle heterogeneous representation of taxa among gene families, making them a good basis for developing large-scale scans for HGT, looking for highways of gene sharing in databases of hundreds of complete genomes.[51][52]

Subtree pruning and regrafting

A mechanistic way of modelling an HGT event on the reference tree is to first cut an internal branch—i.e., prune the tree—and then regraft it onto another edge, an operation referred to as subtree pruning and regrafting (SPR).[53] If the gene tree was topologically consistent with the original reference tree, the editing results in an inconsistency. Similarly, when the original gene tree is inconsistent with the reference tree, it is possible to obtain a consistent topology by a series of one or more prune and regraft operations applied to the reference tree. By interpreting the edit path of pruning and regrafting, HGT candidate nodes can be flagged and the host and donor genomes inferred.[49][48][54] To avoid reporting false positive HGT events due to uncertain gene tree topologies, the optimal "path" of SPR operations can be chosen among multiple possible combinations by considering the branch support in the gene tree. Weakly supported gene tree edges can be ignored a priori[55] or the support can be used to compute an optimality criterion.[49][56][57][58]
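A single SPR operation can be illustrated on a rooted binary tree written as nested tuples. This is only a sketch of one edit, not a search for the optimal edit path; the representation and the function names are assumptions made for the example, and the subtrees passed in must be written exactly as they appear in the tree.

```python
def prune(tree, target):
    """Remove the subtree `target` and suppress the unary node left behind."""
    if tree == target:
        return None
    if isinstance(tree, str):
        return tree
    kept = [prune(child, target) for child in tree]
    kept = [child for child in kept if child is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)

def regraft(tree, subtree, onto):
    """Attach `subtree` as the sister of the node `onto`."""
    if tree == onto:
        return (tree, subtree)
    if isinstance(tree, str):
        return tree
    return tuple(regraft(child, subtree, onto) for child in tree)

def spr(tree, subtree, onto):
    """One subtree-prune-and-regraft move: cut `subtree`, re-attach next to `onto`."""
    return regraft(prune(tree, subtree), subtree, onto)
```

Applied to the reference tree `((("A", "B"), "C"), ("D", "E"))`, the call `spr(tree, "B", "D")` yields `(("A", "C"), (("D", "B"), "E"))`, the topology expected for a gene that the lineage of B acquired from the lineage of D.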

Because conversion of one tree to another by a minimum number of SPR operations is NP-Hard,[59] solving the problem becomes considerably more difficult as more nodes are considered. The computational challenge lies in finding the optimal edit path, i.e., the one that requires the fewest steps,[60][61] and different strategies are used in solving the problem. For example, the HorizStory algorithm reduces the problem by first eliminating the consistent nodes;[62] recursive pruning and regrafting reconciles the reference tree with the gene tree and optimal edits are interpreted as HGT events. The SPR methods included in the supertree reconstruction package SPRSupertrees substantially decrease the time of the search for the optimal set of SPR operations by considering multiple localised sub-problems in large trees through a clustering approach.[63] The T-REX web server includes a number of HGT detection methods[56] (mostly SPR-based) and allows users to calculate the bootstrap support of the inferred transfers.[49]

Model-based reconciliation methods

Reconciliation of gene and species trees entails mapping evolutionary events onto gene trees in a way that makes them concordant with the species tree. Different reconciliation models exist, differing in the types of event they consider to explain the incongruences between gene and species tree topologies. Early methods exclusively modelled horizontal transfers (T).[53][57][56] More recent ones also account for duplication (D), loss (L), incomplete lineage sorting (ILS) or homologous recombination (HR) events. The difficulty is that by allowing for multiple types of events, the number of possible reconciliations increases rapidly. For instance, a conflicting gene tree topology might be explained in terms of a single HGT event or multiple duplication and loss events. Both alternatives can be considered plausible reconciliations depending on the frequency of these respective events along the species tree.

Reconciliation methods can rely on a parsimonious or a probabilistic framework to infer the most likely scenario(s), where the relative cost/probability of D, T, L events can be fixed a priori or estimated from the data.[64] The space of DTL reconciliations and their parsimony costs (which can be extremely vast for large multi-copy gene family trees) can be efficiently explored through dynamic programming algorithms.[64][65][66] In some programs, the gene tree topology can be refined where it was uncertain to fit a better evolutionary scenario as well as the initial sequence alignment.[65][67][68] More refined models account for the biased frequency of HGT between closely related lineages,[69] reflecting the loss of efficiency of HR with phylogenetic distance,[70] for ILS,[71] or for the fact that the actual donors of most HGT events belong to extinct or unsampled lineages.[72] Further extensions of DTL models are being developed towards an integrated description of the genome evolution processes. In particular, some of them consider horizontal transfer at multiple scales, modelling independent evolution of gene fragments[73] or recognising co-evolution of several genes (e.g., due to co-transfer) within and across genomes.[74][75][76]
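How fixed event costs decide between competing explanations can be shown with a toy calculation. The sketch below only scores candidate scenarios that are supplied by hand; it does not enumerate reconciliations or perform the dynamic programming used by actual DTL software. The cost values and scenario names are hypothetical.

```python
def reconciliation_cost(events, costs):
    """Parsimony cost of a scenario given as {event type: count}."""
    return sum(costs[kind] * count for kind, count in events.items())

def most_parsimonious(scenarios, costs):
    """Name of the candidate scenario with the lowest total cost."""
    return min(scenarios, key=lambda name: reconciliation_cost(scenarios[name], costs))

# Two hand-written explanations for the same conflicting gene tree (hypothetical).
scenarios = {
    "single transfer": {"T": 1},
    "duplication and losses": {"D": 1, "L": 3},
}

# Hypothetical costs: with a cheap transfer, the HGT explanation wins ...
cheap_transfer = {"D": 2, "T": 3, "L": 1}
# ... whereas a costly transfer favours duplication followed by losses.
costly_transfer = {"D": 2, "T": 6, "L": 1}
```

With `cheap_transfer` the single transfer costs 3 against 5 for duplication and losses; with `costly_transfer` the preference flips, which is why the choice or estimation of these costs matters.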

Implicit phylogenetic methods

In contrast to explicit phylogenetic methods, which compare the agreement between gene and species trees, implicit phylogenetic methods compare evolutionary distances or sequence similarity. Here, an unexpectedly short or long distance from a given reference compared to the average can be suggestive of an HGT event. Because tree construction is not required, implicit approaches tend to be simpler and faster than explicit methods.

However, implicit methods can be limited by disparities between the underlying correct phylogeny and the evolutionary distances considered. For instance, the most similar sequence as obtained by the highest-scoring BLAST hit is not always the evolutionarily closest one.[77]

Top sequence match in a distant species

A simple way of identifying HGT events is by looking for high-scoring sequence matches in distantly related species. For example, an analysis of the top BLAST hits of protein sequences in the bacterium Thermotoga maritima revealed that most hits were in archaea rather than closely related bacteria, suggesting extensive HGT between the two;[37] these predictions were later supported by an analysis of the structural features of the DNA molecule.[17]

However, this method is limited to detecting relatively recent HGT events. Indeed, if the HGT occurred in the common ancestor of two or more species included in the database, the closest hit will reside within that clade and therefore the HGT will not be detected by the method. Thus, the minimum number of foreign top BLAST hits required to conclude that a gene was transferred depends strongly on the taxonomic coverage of sequence databases. Therefore, experimental settings may need to be defined in an ad hoc way.[78]
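The top-hit criterion can be sketched as a filter over a table of search results. The sketch does not run BLAST; it assumes the hits have already been parsed into hypothetical `(query, subject_taxon, score)` tuples with self-hits removed, and that the user supplies the set of taxa regarded as the ingroup.

```python
def foreign_top_hits(hits, ingroup):
    """Return the queries whose best-scoring hit lies outside the ingroup.

    hits    -- iterable of (query, subject_taxon, score) tuples (assumed format)
    ingroup -- set of taxon labels considered closely related to the query genome
    """
    best = {}
    for query, taxon, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (taxon, score)
    return sorted(q for q, (taxon, _) in best.items() if taxon not in ingroup)
```

As the text notes, the result depends heavily on which taxa are present in the database, so the ingroup definition is part of the experimental setting rather than a fixed choice.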

Discrepancy between gene and species distances

The molecular clock hypothesis posits that homologous genes evolve at an approximately constant rate across different species.[79] If one only considers homologous genes related through speciation events (referred to as "orthologous" genes), their underlying tree should by definition correspond to the species tree. Therefore, assuming a molecular clock, the evolutionary distance between orthologous genes should be approximately proportional to the evolutionary distances between their respective species. If a putative group of orthologs contains xenologs (pairs of genes related through an HGT), the proportionality of evolutionary distances may only hold among the orthologs, not the xenologs.[80]

Simple approaches compare the distribution of similarity scores of particular sequences and their orthologous counterparts in other species; HGT are inferred from outliers.[81][82] The more sophisticated DLIGHT ('Distance Likelihood-based Inference of Genes Horizontally Transferred') method considers simultaneously the effect of HGT on all sequences within groups of putative orthologs:[7] if a likelihood-ratio test of the HGT hypothesis versus a hypothesis of no HGT is significant, a putative HGT event is inferred. In addition, the method allows inference of potential donor and recipient species and provides an estimation of the time since the HGT event.
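The proportionality argument suggests a very simple outlier screen, sketched below. It is not the DLIGHT likelihood-ratio test: it merely computes the ratio of gene distance to species distance for each species pair and flags ratios far from the median. The function name, the dictionary format and the fold threshold are assumptions made for the example.

```python
from statistics import median

def distance_outliers(gene_dist, species_dist, fold=3.0):
    """Species pairs whose gene/species distance ratio departs from the
    median ratio by more than `fold` in either direction.

    Both arguments map (species_a, species_b) pairs to distances."""
    ratios = {pair: gene_dist[pair] / species_dist[pair]
              for pair in gene_dist
              if species_dist.get(pair, 0) > 0}
    if not ratios:
        return []
    typical = median(ratios.values())
    return sorted(pair for pair, ratio in ratios.items()
                  if ratio < typical / fold or ratio > typical * fold)
```

An unexpectedly small ratio for one pair is the pattern expected when one of the two genes is a recent xenolog, although rate variation among lineages can produce similar outliers.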

Phylogenetic profiles

A group of orthologous or homologous genes can be analysed in terms of the presence or absence of group members in the reference genomes; such patterns are called phylogenetic profiles.[83] To find HGT events, phylogenetic profiles are scanned for an unusual distribution of genes. Absence of a homolog in some members of a group of closely related species is an indication that the examined gene might have arrived via an HGT event. For example, the genomes of the three facultatively symbiotic Frankia sp. strains are of strikingly different sizes: 5.43 Mbp, 7.50 Mbp and 9.04 Mbp, depending on their range of hosts.[84] Marked portions of strain-specific genes were found to have no significant hit in the reference database, and were possibly acquired by HGT from other bacteria. Similarly, the three phenotypically diverse Escherichia coli strains (uropathogenic, enterohemorrhagic and benign) share about 40% of the total combined gene pool, with the other 60% being strain-specific genes and consequently HGT candidates.[85] Further evidence for these genes resulting from HGT was their strikingly different codon usage patterns from the core genes and a lack of gene order conservation (order conservation is typical of vertically evolved genes).[85] The presence/absence of homologs (or their effective count) can thus be used by programs to reconstruct the most likely evolutionary scenario along the species tree. Just as with reconciliation methods, this can be achieved through parsimonious[86] or probabilistic estimation of the number of gain and loss events.[87][88] Models can be made more complex by adding processes, like the truncation of genes,[89] but also by modelling the heterogeneity of rates of gain and loss across lineages[90] and/or gene families.[88][91]
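A parsimonious gain/loss reconstruction from a presence/absence profile can be sketched with a small Sankoff-style recursion over a species tree written as nested tuples. The tree representation, the function name and the gain and loss costs are assumptions made for the example; published methods differ in their cost schemes and in how they treat the root.

```python
def min_gain_loss_cost(tree, present, gain_cost=2.0, loss_cost=1.0):
    """Minimum total cost of gains and losses explaining a profile.

    tree    -- nested tuples with species names (strings) as leaves
    present -- set of species in which the gene family is found
    Returns (cost if the gene is absent at the root,
             cost if the gene is present at the root)."""
    infinity = float("inf")

    def visit(node):
        if isinstance(node, str):
            # A leaf is compatible with only one state: its observed one.
            return (infinity, 0.0) if node in present else (0.0, infinity)
        cost_absent, cost_present = 0.0, 0.0
        for child in node:
            child_absent, child_present = visit(child)
            # Parent absent: child stays absent, or gains the gene.
            cost_absent += min(child_absent, child_present + gain_cost)
            # Parent present: child stays present, or loses the gene.
            cost_present += min(child_present, child_absent + loss_cost)
        return cost_absent, cost_present

    return visit(tree)
```

For the tree `((("A", "B"), "C"), ("D", "E"))` with the gene found only in A and D, the default costs give `(4.0, 3.0)`: three losses from an ancestral gene are cheaper than two independent gains. Setting `gain_cost=1.0` gives `(2.0, 3.0)`, so two gains (for example by transfer) become the preferred explanation.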

Clusters of polymorphic sites

Genes are commonly regarded as the basic units transferred through an HGT event. However, it is also possible for HGT to occur within genes. For example, it has been shown that horizontal transfer between closely related species results in more exchange of ORF fragments,[92][93] a type of transfer called gene conversion, mediated by homologous recombination. The analysis of a group of four Escherichia coli and two Shigella flexneri strains revealed that the sequence stretches common to all six strains contain polymorphic sites, consequences of homologous recombination.[94] Clusters of excess of polymorphic sites can thus be used to detect tracks of DNA recombined with a distant relative.[95] This method of detection is, however, restricted to the sites in common to all analysed sequences, limiting the analysis to a group of closely related organisms.
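A crude version of such a scan is sketched below: it locates polymorphic columns in a multiple alignment, restricted to gap-free columns as the text requires, and flags windows in which they are much denser than the alignment-wide average. The function names, window sizes and fold threshold are assumptions for illustration; the cited methods rely on proper statistical tests rather than a fixed fold cutoff.

```python
def polymorphic_columns(alignment):
    """Indices of gap-free columns in which the sequences are not all identical.
    `alignment` is a list of equal-length strings."""
    return [i for i, column in enumerate(zip(*alignment))
            if "-" not in column and len(set(column)) > 1]

def dense_windows(alignment, window=100, step=50, fold=3.0):
    """Windows holding more than `fold` times the number of polymorphic
    sites expected from the alignment-wide density."""
    length = len(alignment[0])
    sites = set(polymorphic_columns(alignment))
    expected = len(sites) * window / length
    flagged = []
    for start in range(0, length - window + 1, step):
        observed = sum(1 for i in range(start, start + window) if i in sites)
        if expected > 0 and observed > fold * expected:
            flagged.append((start, observed))
    return flagged
```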

Evaluation

The existence of the numerous and varied methods to infer HGT raises the question of how to validate individual inferences and of how to compare the different methods.

A main problem is that, as with other types of phylogenetic inferences, the actual evolutionary history cannot be established with certainty. As a result, it is difficult to obtain a representative test set of HGT events. Furthermore, HGT inference methods vary considerably in the information they consider and often identify inconsistent groups of HGT candidates:[6][96] it is not clear to what extent taking the intersection, the union, or some other combination of the individual methods affects the false positive and false negative rates.[14]

Parametric and phylogenetic methods draw on different sources of information; it is therefore difficult to make general statements about their relative performance. Conceptual arguments can however be invoked. While parametric methods are limited to the analysis of single or pairs of genomes, phylogenetic methods provide a natural framework to take advantage of the information contained in multiple genomes. In many cases, segments of genomes inferred as HGT based on their anomalous composition can also be recognised as such on the basis of phylogenetic analyses or through their mere absence in genomes of related organisms. In addition, phylogenetic methods rely on explicit models of sequence evolution, which provide a well-understood framework for parameter inference, hypothesis testing, and model selection. This is reflected in the literature, which tends to favour phylogenetic methods as the standard of proof for HGT.[97][98][99][100] The use of phylogenetic methods thus appears to be the preferred standard, especially given that the increase in computational power coupled with algorithmic improvements has made them more tractable,[63][72] and that the ever denser sampling of genomes lends more power to these tests.

Considering phylogenetic methods, several approaches to validating individual HGT inferences and benchmarking methods have been adopted, typically relying on various forms of simulation. Because the truth is known in simulation, the number of false positives and the number of false negatives are straightforward to compute. However, simulating data does not trivially resolve the problem because the true extent of HGT in nature remains largely unknown, and specifying rates of HGT in the simulated model is always hazardous. Nonetheless, studies involving the comparison of several phylogenetic methods in a simulation framework could provide quantitative assessment of their respective performances, and thus help the biologist in objectively choosing appropriate tools.[58]
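When the simulated truth is available, the counts mentioned above reduce to set operations. The sketch below assumes that predictions and truth are given as collections of gene identifiers; the function name and the returned dictionary layout are choices made for this example.

```python
def benchmark(predicted, truth, all_genes):
    """Confusion counts plus precision and recall for a set of HGT predictions."""
    predicted, truth, all_genes = set(predicted), set(truth), set(all_genes)
    tp = len(predicted & truth)              # transferred and predicted
    fp = len(predicted - truth)              # predicted but not transferred
    fn = len(truth - predicted)              # transferred but missed
    tn = len(all_genes - predicted - truth)  # correctly left alone
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```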

Standard tools to simulate sequence evolution along trees such as INDELible[101] or PhyloSim[102] can be adapted to simulate HGT. HGT events cause the relevant gene trees to conflict with the species tree. Such HGT events can be simulated through subtree pruning and regrafting rearrangements of the species tree.[55] However, it is important to simulate data that are realistic enough to be representative of the challenge provided by real datasets, and simulations under complex models are thus preferable. A model was developed to simulate gene trees with heterogeneous substitution processes in addition to the occurrence of transfer, and accounting for the fact that transfer can come from now extinct donor lineages.[103] Alternatively, the genome evolution simulator ALF[104] directly generates gene families subject to HGT, by accounting for a whole range of evolutionary forces at the base level, but in the context of a complete genome. Given simulated sequences which have HGT, analysis of those sequences using the methods of interest and comparison of their results with the known truth permits study of their performance. Similarly, testing the methods on sequences known not to have HGT enables the study of false positive rates.

Simulation of HGT events can also be performed by manipulating the biological sequences themselves. Artificial chimeric genomes can be obtained by inserting known foreign genes into random positions of a host genome.[12][105][106][107] The donor sequences are inserted into the host unchanged or can be further evolved by simulation,[7] e.g., using the tools described above.
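
A minimal Python sketch of this kind of artificial chimeric genome is given below. It assumes that the host genome and the donor genes are plain nucleotide strings and that insertion positions are drawn uniformly at random; the function name `make_chimera` and the toy sequences are hypothetical. The returned coordinates record where each foreign gene ends up, so that predictions can later be compared with the known truth.

```python
import random


def make_chimera(host, donor_genes, seed=0):
    """Insert donor genes at random positions of a host sequence.

    Returns the chimeric sequence and the (start, end) coordinates of
    each inserted gene in the chimera, in order of position.
    """
    rng = random.Random(seed)
    # One insertion point per donor gene, expressed in host coordinates.
    points = sorted(rng.randrange(len(host) + 1) for _ in donor_genes)
    pieces, coords, prev, out_len = [], [], 0, 0
    for point, gene in zip(points, donor_genes):
        pieces.append(host[prev:point])      # native stretch before the insert
        out_len += point - prev
        coords.append((out_len, out_len + len(gene)))
        pieces.append(gene)                  # foreign gene, inserted unchanged
        out_len += len(gene)
        prev = point
    pieces.append(host[prev:])               # remaining native sequence
    return "".join(pieces), coords


# Hypothetical AT-rich host and GC-rich donor genes.
host = "AT" * 50
donor_genes = ["GGGGCCCC", "CCGGCCGG"]
chimera, coords = make_chimera(host, donor_genes, seed=1)
```

Choosing all insertion points in host coordinates first and assembling the chimera in a single pass avoids having to shift previously recorded coordinates after each insertion.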

One important caveat to simulation as a way to assess different methods is that simulation is based on strong simplifying assumptions which may favour particular methods.[108]

See also

References

  This article was adapted from the following source under a CC BY 4.0 license (2015) (reviewer reports): Matt Ravenhall; Nives Škunca; Florent Lassalle; Christophe Dessimoz (May 2015). "Inferring horizontal gene transfer". PLOS Computational Biology. 11 (5): e1004095. doi:10.1371/JOURNAL.PCBI.1004095. ISSN 1553-734X. PMC 4462595. PMID 26020646. Wikidata Q21045419.

  1. ^ Hiramatsu K, Cui L, Kuroda M, Ito T (October 2001). "The emergence and evolution of methicillin-resistant Staphylococcus aureus". Trends in Microbiology. 9 (10): 486–93. doi:10.1016/s0966-842x(01)02175-8. PMID 11597450.
  2. ^ Griffith F (January 1928). "The Significance of Pneumococcal Types". The Journal of Hygiene. 27 (2): 113–59. doi:10.1017/s0022172400031879. PMC 2167760. PMID 20474956.
  3. ^ Tatum EL, Lederberg J (June 1947). "Gene Recombination in the Bacterium Escherichia coli". Journal of Bacteriology. 53 (6): 673–84. doi:10.1128/JB.53.6.673-684.1947. PMC 518375. PMID 16561324.
  4. ^ Zinder ND, Lederberg J (November 1952). "Genetic exchange in Salmonella". Journal of Bacteriology. 64 (5): 679–99. doi:10.1128/JB.64.5.679-699.1952. PMC 169409. PMID 12999698.
  5. ^ Jones D, Sneath PH (March 1970). "Genetic transfer and bacterial taxonomy". Bacteriological Reviews. 34 (1): 40–81. doi:10.1128/MMBR.34.1.40-81.1970. PMC 378348. PMID 4909647.
  6. ^ a b c Lawrence JG, Ochman H (January 2002). "Reconciling the many faces of lateral gene transfer". Trends in Microbiology. 10 (1): 1–4. doi:10.1016/s0966-842x(01)02282-x. PMID 11755071.
  7. ^ a b c Dessimoz C, Margadant D, Gonnet GH (2008). "DLIGHT – Lateral Gene Transfer Detection Using Pairwise Evolutionary Distances in a Statistical Framework". Research in Computational Molecular Biology. Lecture Notes in Computer Science. Vol. 4955. pp. 315–330. doi:10.1007/978-3-540-78839-3_27. ISBN 978-3-540-78838-6. S2CID 12776750.
  8. ^ a b Guindon S, Perrière G (September 2001). "Intragenomic base content variation is a potential source of biases when searching for horizontally transferred genes". Molecular Biology and Evolution. 18 (9): 1838–40. doi:10.1093/oxfordjournals.molbev.a003972. PMID 11504864.
  9. ^ a b Lawrence JG, Ochman H (April 1997). "Amelioration of bacterial genomes: rates of change and exchange". Journal of Molecular Evolution. 44 (4): 383–97. Bibcode:1997JMolE..44..383L. CiteSeerX 10.1.1.590.7214. doi:10.1007/pl00006158. PMID 9089078. S2CID 7928957.
  10. ^ a b Azad RK, Lawrence JG (May 2011). "Towards more robust methods of alien gene detection". Nucleic Acids Research. 39 (9): e56. doi:10.1093/nar/gkr059. PMC 3089488. PMID 21297116.
  11. ^ Xiong D, Xiao F, Liu L, Hu K, Tan Y, He S, Gao X (2012). "Towards a better detection of horizontally transferred genes by combining unusual properties effectively". PLOS ONE. 7 (8): e43126. Bibcode:2012PLoSO...743126X. doi:10.1371/journal.pone.0043126. PMC 3419211. PMID 22905214.
  12. ^ a b c Becq J, Churlaud C, Deschavanne P (April 2010). "A benchmark of parametric methods for horizontal transfers detection". PLOS ONE. 5 (4): e9989. Bibcode:2010PLoSO...5.9989B. doi:10.1371/journal.pone.0009989. PMC 2848678. PMID 20376325.
  13. ^ Poptsova M (2009). "Testing Phylogenetic Methods to Identify Horizontal Gene Transfer". Horizontal Gene Transfer. Methods in Molecular Biology. Vol. 532. pp. 227–40. doi:10.1007/978-1-60327-853-9_13. ISBN 978-1-60327-852-2. PMID 19271188.
  14. ^ a b Poptsova MS, Gogarten JP (March 2007). "The power of phylogenetic approaches to detect horizontally transferred genes". BMC Evolutionary Biology. 7 (1): 45. Bibcode:2007BMCEE...7...45P. doi:10.1186/1471-2148-7-45. PMC 1847511. PMID 17376230.
  15. ^ a b c Daubin V, Lerat E, Perrière G (2003). "The source of laterally transferred genes in bacterial genomes". Genome Biology. 4 (9): R57. doi:10.1186/gb-2003-4-9-r57. PMC 193657. PMID 12952536.
  16. ^ a b Lawrence JG, Ochman H (August 1998). "Molecular archaeology of the Escherichia coli genome". Proceedings of the National Academy of Sciences of the United States of America. 95 (16): 9413–7. Bibcode:1998PNAS...95.9413L. doi:10.1073/pnas.95.16.9413. PMC 21352. PMID 9689094.
  17. ^ a b c d Worning P, Jensen LJ, Nelson KE, Brunak S, Ussery DW (February 2000). "Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima". Nucleic Acids Research. 28 (3): 706–9. doi:10.1093/nar/28.3.706. PMC 102551. PMID 10637321.
  18. ^ Deschavanne P, Filipski J (April 1995). "Correlation of GC content with replication timing and repair mechanisms in weakly expressed E.coli genes". Nucleic Acids Research. 23 (8): 1350–3. doi:10.1093/nar/23.8.1350. PMC 306860. PMID 7753625.
  19. ^ Wuitschick JD, Karrer KM (1999). "Analysis of genomic G + C content, codon usage, initiator codon context and translation termination sites in Tetrahymena thermophila". The Journal of Eukaryotic Microbiology. 46 (3): 239–47. doi:10.1111/j.1550-7408.1999.tb05120.x. PMID 10377985. S2CID 28836138.
  20. ^ Rendulic S, Jagtap P, Rosinus A, Eppinger M, Baar C, Lanz C, et al. (January 2004). "A predator unmasked: life cycle of Bdellovibrio bacteriovorus from a genomic perspective". Science. 303 (5658): 689–92. Bibcode:2004Sci...303..689R. doi:10.1126/science.1093027. PMID 14752164. S2CID 38154836.
  21. ^ Gophna U, Charlebois RL, Doolittle WF (February 2006). "Ancient lateral gene transfer in the evolution of Bdellovibrio bacteriovorus". Trends in Microbiology. 14 (2): 64–9. doi:10.1016/j.tim.2005.12.008. PMID 16413191.
  22. ^ Vernikos GS, Thomson NR, Parkhill J (2007). "Genetic flux over time in the Salmonella lineage". Genome Biology. 8 (6): R100. doi:10.1186/gb-2007-8-6-r100. PMC 2394748. PMID 17547764.
  23. ^ McCutcheon JP, Moran NA (2010). "Functional convergence in reduced genomes of bacterial symbionts spanning 200 My of evolution". Genome Biology and Evolution. 2: 708–18. doi:10.1093/gbe/evq055. PMC 2953269. PMID 20829280.
  24. ^ Liu Z, Venkatesh SS, Maley CC (October 2008). "Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples". BMC Genomics. 9: 509. doi:10.1186/1471-2164-9-509. PMC 2628393. PMID 18973670.
  25. ^ Bentley SD, Parkhill J (2004). "Comparative genomic structure of prokaryotes". Annual Review of Genetics. 38: 771–92. doi:10.1146/annurev.genet.38.072902.094318. PMID 15568993. S2CID 5524251.
  26. ^ Karlin S, Burge C (July 1995). "Dinucleotide relative abundance extremes: a genomic signature". Trends in Genetics. 11 (7): 283–90. doi:10.1016/S0168-9525(00)89076-9. PMID 7482779.
  27. ^ Vernikos GS, Parkhill J (September 2006). "Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands". Bioinformatics. 22 (18): 2196–203. doi:10.1093/bioinformatics/btl369. PMID 16837528.
  28. ^ Hooper SD, Berg OG (March 2002). "Detection of genes with atypical nucleotide sequence in microbial genomes". Journal of Molecular Evolution. 54 (3): 365–75. Bibcode:2002JMolE..54..365H. doi:10.1007/s00239-001-0051-8. PMID 11847562. S2CID 6872232.
  29. ^ Deschavanne PJ, Giron A, Vilain J, Fagot G, Fertil B (October 1999). "Genomic signature: characterization and classification of species assessed by chaos game representation of sequences". Molecular Biology and Evolution. 16 (10): 1391–9. doi:10.1093/oxfordjournals.molbev.a026048. PMID 10563018.
  30. ^ Dufraigne C, Fertil B, Lespinats S, Giron A, Deschavanne P (January 2005). "Detection and characterization of horizontal transfers in prokaryotes using genomic signature". Nucleic Acids Research. 33 (1): e6. doi:10.1093/nar/gni004. PMC 546175. PMID 15653627.
  31. ^ Cortez D, Forterre P, Gribaldo S (2009). "A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes". Genome Biology. 10 (6): R65. doi:10.1186/gb-2009-10-6-r65. PMC 2718499. PMID 19531232.
  32. ^ Nakamura Y, Itoh T, Matsuda H, Gojobori T (July 2004). "Biased biological functions of horizontally transferred genes in prokaryotic genomes". Nature Genetics. 36 (7): 760–6. doi:10.1038/ng1381. PMID 15208628.
  33. ^ Ornstein RL, Rein R (October 1978). "An optimized potential function for the calculation of nucleic acid interaction energies I. base stacking". Biopolymers. 17 (10): 2341–60. doi:10.1002/bip.1978.360171005. PMID 24624489. S2CID 13063636.
  34. ^ el Hassan MA, Calladine CR (May 1996). "Propeller-twisting of base-pairs and the conformational mobility of dinucleotide steps in DNA". Journal of Molecular Biology. 259 (1): 95–103. doi:10.1006/jmbi.1996.0304. PMID 8648652.
  35. ^ Olson WK, Gorin AA, Lu XJ, Hock LM, Zhurkin VB (September 1998). "DNA sequence-dependent deformability deduced from protein-DNA crystal complexes". Proceedings of the National Academy of Sciences of the United States of America. 95 (19): 11163–8. Bibcode:1998PNAS...9511163O. doi:10.1073/pnas.95.19.11163. PMC 21613. PMID 9736707.
  36. ^ Herzel H, Weiss O, Trifonov EN (March 1999). "10-11 bp periodicities in complete genomes reflect protein structure and DNA folding". Bioinformatics. 15 (3): 187–93. doi:10.1093/bioinformatics/15.3.187. PMID 10222405.
  37. ^ a b Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, et al. (May 1999). "Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima". Nature. 399 (6734): 323–9. Bibcode:1999Natur.399..323N. doi:10.1038/20601. PMID 10360571. S2CID 4420157.
  38. ^ Langille MG, Hsiao WW, Brinkman FS (May 2010). "Detecting genomic islands using bioinformatics approaches". Nature Reviews. Microbiology. 8 (5): 373–82. doi:10.1038/nrmicro2350. PMID 20395967. S2CID 2373228.
  39. ^ Hacker J, Blum-Oehler G, Mühldorfer I, Tschäpe H (March 1997). "Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution". Molecular Microbiology. 23 (6): 1089–97. doi:10.1046/j.1365-2958.1997.3101672.x. PMID 9106201. S2CID 27524815.
  40. ^ Vernikos GS, Parkhill J (February 2008). "Resolving the structural features of genomic islands: a machine learning approach". Genome Research. 18 (2): 331–42. doi:10.1101/gr.7004508. PMC 2203631. PMID 18071028.
  41. ^ Altenhoff AM, Dessimoz C (2012). "Inferring Orthology and Paralogy" (PDF). Evolutionary Genomics. Methods in Molecular Biology. Vol. 855. Totowa, NJ: Humana Press. pp. 259–79. doi:10.1007/978-1-61779-582-4_9. ISBN 978-1-61779-581-7. PMID 22407712.
  42. ^ Than C, Ruths D, Innan H, Nakhleh L (May 2007). "Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions". Journal of Computational Biology. 14 (4): 517–35. CiteSeerX 10.1.1.121.7834. doi:10.1089/cmb.2007.A010. PMID 17572027.
  43. ^ Goldman N, Anderson JP, Rodrigo AG (December 2000). "Likelihood-based tests of topologies in phylogenetics". Systematic Biology. 49 (4): 652–70. doi:10.1080/106351500750049752. PMID 12116432.
  44. ^ Shimodaira H, Hasegawa M (1999). "Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference". Molecular Biology and Evolution. 16 (8): 1114–1116. doi:10.1093/oxfordjournals.molbev.a026201.
  45. ^ Shimodaira H (June 2002). "An approximately unbiased test of phylogenetic tree selection". Systematic Biology. 51 (3): 492–508. doi:10.1080/10635150290069913. PMID 12079646. S2CID 11586099.
  46. ^ Lerat E, Daubin V, Moran NA (October 2003). "From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria". PLOS Biology. 1 (1): E19. doi:10.1371/journal.pbio.0000019. PMC 193605. PMID 12975657.
  47. ^ Zhaxybayeva O, Hamel L, Raymond J, Gogarten JP (2004). "Visualization of the phylogenetic content of five genomes using dekapentagonal maps". Genome Biology. 5 (3): R20. doi:10.1186/gb-2004-5-3-r20. PMC 395770. PMID 15003123.
  48. ^ a b Beiko RG, Harlow TJ, Ragan MA (October 2005). "Highways of gene sharing in prokaryotes". Proceedings of the National Academy of Sciences of the United States of America. 102 (40): 14332–7. Bibcode:2005PNAS..10214332B. doi:10.1073/pnas.0504068102. PMC 1242295. PMID 16176988.
  49. ^ a b c d Boc A, Philippe H, Makarenkov V (March 2010). "Inferring and validating horizontal gene transfer events using bipartition dissimilarity". Systematic Biology. 59 (2). Oxford University Press: 195–211. doi:10.1093/sysbio/syp103. PMID 20525630.
  50. ^ Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT (September 2006). "Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events". Genome Research. 16 (9): 1099–108. doi:10.1101/gr.5322306. PMC 1557764. PMID 16899658.
  51. ^ Bansal MS, Banay G, Gogarten JP, Shamir R (September 2011). "Detecting highways of horizontal gene transfer". Journal of Computational Biology. 18 (9): 1087–114. CiteSeerX 10.1.1.418.3658. doi:10.1089/cmb.2011.0066. PMID 21899418.
  52. ^ Bansal MS, Banay G, Harlow TJ, Gogarten JP, Shamir R (March 2013). "Systematic inference of highways of horizontal gene transfer in prokaryotes". Bioinformatics. 29 (5): 571–9. doi:10.1093/bioinformatics/btt021. PMID 23335015.
  53. ^ a b Hallett MT, Lagergren J. RECOMB 2001. Montreal: ACM; 2001. Efficient Algorithms for Lateral Gene Transfer Problems; pp. 149–156.
  54. ^ Baroni M, Grünewald S, Moulton V, Semple C (August 2005). "Bounding the number of hybridisation events for a consistent evolutionary history". Journal of Mathematical Biology. 51 (2): 171–82. doi:10.1007/s00285-005-0315-9. hdl:10092/12222. PMID 15868201. S2CID 3180904.
  55. ^ a b Beiko RG, Hamilton N (February 2006). "Phylogenetic identification of lateral genetic transfer events". BMC Evolutionary Biology. 6 (1): 15. Bibcode:2006BMCEE...6...15B. doi:10.1186/1471-2148-6-15. PMC 1431587. PMID 16472400.
  56. ^ a b c Boc A, Diallo AB, Makarenkov V (July 2012). "T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks". Nucleic Acids Research. 40 (W1). Oxford University Press: W573-9. doi:10.1093/nar/gks485. PMC 3394261. PMID 22675075.
  57. ^ a b Nakhleh L, Ruths DA, Wang L: RIATA-HGT: A Fast and Accurate Heuristic for Reconstructing Horizontal Gene Transfer. COCOON, August 16–29, 2005; Kunming 2005.
  58. ^ a b Abby SS, Tannier E, Gouy M, Daubin V (June 2010). "Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests". BMC Bioinformatics. 11: 324. doi:10.1186/1471-2105-11-324. PMC 2905365. PMID 20550700.
  59. ^ Hickey G, Dehne F, Rau-Chaplin A, Blouin C (February 2008). "SPR distance computation for unrooted trees". Evolutionary Bioinformatics Online. 4: 17–27. doi:10.4137/ebo.s419. PMC 2614206. PMID 19204804.
  60. ^ Hein J, Jiang T, Wang L, Zhang K (1996). "On the complexity of comparing evolutionary trees". Discrete Applied Mathematics. 71 (1–3): 153–169. doi:10.1016/S0166-218X(96)00062-5.
  61. ^ Allen BL, Steel M (2001). "Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees". Annals of Combinatorics. 5: 1–15. CiteSeerX 10.1.1.24.8389. doi:10.1007/s00026-001-8006-8. S2CID 2934442.
  62. ^ MacLeod D, Charlebois RL, Doolittle F, Bapteste E (April 2005). "Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement". BMC Evolutionary Biology. 5: 27. doi:10.1186/1471-2148-5-27. PMC 1087482. PMID 15819979.
  63. ^ a b Whidden C, Zeh N, Beiko RG (July 2014). "Supertrees Based on the Subtree Prune-and-Regraft Distance". Systematic Biology. 63 (4): 566–81. doi:10.1093/sysbio/syu023. PMC 4055872. PMID 24695589.
  64. ^ a b Doyon JP, Hamel S, Chauve C (2012). "An efficient method for exploring the space of gene tree/species tree reconciliations in a probabilistic framework" (PDF). IEEE/ACM Transactions on Computational Biology and Bioinformatics. 9 (1): 26–39. doi:10.1109/TCBB.2011.64. PMID 21464510. S2CID 2493991.
  65. ^ a b David LA, Alm EJ (January 2011). "Rapid evolutionary innovation during an Archaean genetic expansion" (PDF). Nature. 469 (7328): 93–6. Bibcode:2011Natur.469...93D. doi:10.1038/nature09649. hdl:1721.1/61263. PMID 21170026. S2CID 4420725.
  66. ^ Szöllősi GJ, Boussau B, Abby SS, Tannier E, Daubin V (October 2012). "Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations". Proceedings of the National Academy of Sciences of the United States of America. 109 (43): 17513–8. Bibcode:2012PNAS..10917513S. doi:10.1073/pnas.1202997109. PMC 3491530. PMID 23043116.
  67. ^ Nguyen TH, Ranwez V, Pointet S, Chifolleau AM, Doyon JP, Berry V (April 2013). "Reconciliation and local gene tree rearrangement can be of mutual profit". Algorithms for Molecular Biology. 8 (1): 12. doi:10.1186/1748-7188-8-12. PMC 3871789. PMID 23566548.
  68. ^ Szöllősi GJ, Tannier E, Lartillot N, Daubin V (May 2013). "Lateral gene transfer from the dead". Systematic Biology. 62 (3): 386–97. arXiv:1211.4606. doi:10.1093/sysbio/syt003. PMC 3622898. PMID 23355531.
  69. ^ Bansal MS, Alm EJ, Kellis M (June 2012). "Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss". Bioinformatics. 28 (12): i283-91. doi:10.1093/bioinformatics/bts225. PMC 3371857. PMID 22689773.
  70. ^ Majewski J, Zawadzki P, Pickerill P, Cohan FM, Dowson CG (February 2000). "Barriers to genetic exchange between bacterial species: Streptococcus pneumoniae transformation". Journal of Bacteriology. 182 (4): 1016–23. doi:10.1128/jb.182.4.1016-1023.2000. PMC 94378. PMID 10648528.
  71. ^ Sjöstrand J, Tofigh A, Daubin V, Arvestad L, Sennblad B, Lagergren J (May 2014). "A Bayesian method for analyzing lateral gene transfer". Systematic Biology. 63 (3): 409–20. doi:10.1093/sysbio/syu007. PMID 24562812.
  72. ^ a b Szöllősi GJ, Rosikiewicz W, Boussau B, Tannier E, Daubin V (November 2013). "Efficient exploration of the space of reconciled gene trees". Systematic Biology. 62 (6): 901–12. arXiv:1306.2167. Bibcode:2013arXiv1306.2167S. doi:10.1093/sysbio/syt054. PMC 3797637. PMID 23925510.
  73. ^ Haggerty LS, Jachiet PA, Hanage WP, Fitzpatrick DA, Lopez P, O'Connell MJ, et al. (March 2014). "A pluralistic account of homology: adapting the models to the data". Molecular Biology and Evolution. 31 (3): 501–16. doi:10.1093/molbev/mst228. PMC 3935183. PMID 24273322.
  74. ^ Szöllősi GJ, Tannier E, Daubin V, Boussau B (January 2015). "The inference of gene trees with species trees". Systematic Biology. 64 (1): e42-62. doi:10.1093/sysbio/syu048. PMC 4265139. PMID 25070970.
  75. ^ Lassalle F, Planel R, Penel S, Chapulliot D, Barbe V, Dubost A, et al. (December 2017). "Ancestral Genome Estimation Reveals the History of Ecological Diversification in Agrobacterium". Genome Biology and Evolution. 9 (12): 3413–3431. doi:10.1093/gbe/evx255. PMC 5739047. PMID 29220487.
  76. ^ Duchemin W, Anselmetti Y, Patterson M, Ponty Y, Bérard S, Chauve C, et al. (May 2017). "DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies". Genome Biology and Evolution. 9 (5): 1312–1319. doi:10.1093/gbe/evx069. PMC 5441342. PMID 28402423.
  77. ^ Koski LB, Golding GB (June 2001). "The closest BLAST hit is often not the nearest neighbor". Journal of Molecular Evolution. 52 (6): 540–2. Bibcode:2001JMolE..52..540K. doi:10.1007/s002390010184. PMID 11443357. S2CID 24848333.
  78. ^ Wisniewski-Dyé F, Borziak K, Khalsa-Moyers G, Alexandre G, Sukharnikov LO, Wuichet K, et al. (December 2011). Richardson PM (ed.). "Azospirillum genomes reveal transition of bacteria from aquatic to terrestrial environments". PLOS Genetics. 7 (12): e1002430. doi:10.1371/journal.pgen.1002430. PMC 3245306. PMID 22216014.
  79. ^ Zuckerkandl, E. and Pauling, L.B. 1965. Evolutionary divergence and convergence in proteins. In Bryson, V.and Vogel, H.J. (editors). Evolving Genes and Proteins. Academic Press, New York. pp. 97–166.
  80. ^ Novichkov PS, Omelchenko MV, Gelfand MS, Mironov AA, Wolf YI, Koonin EV (October 2004). "Genome-wide molecular clock and horizontal gene transfer in bacterial evolution". Journal of Bacteriology. 186 (19): 6575–85. doi:10.1128/JB.186.19.6575-6585.2004. PMC 516599. PMID 15375139.
  81. ^ Lawrence JG, Hartl DL (July 1992). "Inference of horizontal genetic transfer from molecular data: an approach using the bootstrap". Genetics. 131 (3): 753–60. doi:10.1093/genetics/131.3.753. PMC 1205046. PMID 1628816.
  82. ^ Clarke GD, Beiko RG, Ragan MA, Charlebois RL (April 2002). "Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores". Journal of Bacteriology. 184 (8): 2072–80. doi:10.1128/jb.184.8.2072-2080.2002. PMC 134965. PMID 11914337.
  83. ^ Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (April 1999). "Assigning protein functions by comparative genome analysis: protein phylogenetic profiles". Proceedings of the National Academy of Sciences of the United States of America. 96 (8): 4285–8. Bibcode:1999PNAS...96.4285P. doi:10.1073/pnas.96.8.4285. PMC 16324. PMID 10200254.
  84. ^ Normand P, Lapierre P, Tisa LS, Gogarten JP, Alloisio N, Bagnarol E, et al. (January 2007). "Genome characteristics of facultatively symbiotic Frankia sp. strains reflect host range and host plant biogeography". Genome Research. 17 (1): 7–15. doi:10.1101/gr.5798407. PMC 1716269. PMID 17151343.
  85. ^ a b Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, et al. (December 2002). "Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli". Proceedings of the National Academy of Sciences of the United States of America. 99 (26): 17020–4. Bibcode:2002PNAS...9917020W. doi:10.1073/pnas.252529799. PMC 139262. PMID 12471157.
  86. ^ Csűrös MS (2008). "Ancestral Reconstruction by Asymmetric Wagner Parsimony over Continuous Characters and Squared Parsimony over Distributions". Comparative Genomics. Lecture Notes in Computer Science. Vol. 5267. pp. 72–86. doi:10.1007/978-3-540-87989-3_6. ISBN 978-3-540-87988-6. S2CID 10717969.
  87. ^ Pagel M (October 1999). "Inferring the historical patterns of biological evolution". Nature. 401 (6756): 877–84. Bibcode:1999Natur.401..877P. doi:10.1038/44766. hdl:2027.42/148253. PMID 10553904. S2CID 205034365.
  88. ^ a b Csűrös M, Miklós I (September 2009). "Streamlining and large ancestral genomes in Archaea inferred with a phylogenetic birth-and-death model". Molecular Biology and Evolution. 26 (9): 2087–95. doi:10.1093/molbev/msp123. PMC 2726834. PMID 19570746.
  89. ^ Hao W, Golding GB (September 2010). "Inferring bacterial genome flux while considering truncated genes". Genetics. 186 (1): 411–26. doi:10.1534/genetics.110.118448. PMC 2940306. PMID 20551435.
  90. ^ Hao W, Golding GB (May 2006). "The fate of laterally transferred genes: life in the fast lane to adaptation or death". Genome Research. 16 (5): 636–43. doi:10.1101/gr.4746406. PMC 1457040. PMID 16651664.
  91. ^ Hao W, Golding GB (May 2008). "Uncovering rate variation of lateral gene transfer during bacterial genome evolution". BMC Genomics. 9: 235. doi:10.1186/1471-2164-9-235. PMC 2426709. PMID 18492275.
  92. ^ Ochman H, Lawrence JG, Groisman EA (May 2000). "Lateral gene transfer and the nature of bacterial innovation". Nature. 405 (6784): 299–304. Bibcode:2000Natur.405..299O. doi:10.1038/35012500. PMID 10830951. S2CID 85739173.
  93. ^ Papke RT, Koenig JE, Rodríguez-Valera F, Doolittle WF (December 2004). "Frequent recombination in a saltern population of Halorubrum". Science. 306 (5703): 1928–9. Bibcode:2004Sci...306.1928P. doi:10.1126/science.1103289. PMID 15591201. S2CID 21595153.
  94. ^ Mau B, Glasner JD, Darling AE, Perna NT (2006). "Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli". Genome Biology. 7 (5): R44. doi:10.1186/gb-2006-7-5-r44. PMC 1779527. PMID 16737554.
  95. ^ Didelot X, Falush D (March 2007). "Inference of bacterial microevolution using multilocus sequence data". Genetics. 175 (3): 1251–66. doi:10.1534/genetics.106.063305. PMC 1840087. PMID 17151252.
  96. ^ Ragan MA (July 2001). "On surrogate methods for detecting lateral gene transfer". FEMS Microbiology Letters. 201 (2): 187–91. doi:10.1111/j.1574-6968.2001.tb10755.x. PMID 11470360.
  97. ^ Ragan MA, Harlow TJ, Beiko RG (January 2006). "Do different surrogate methods detect lateral genetic transfer events of different relative ages?". Trends in Microbiology. 14 (1): 4–8. doi:10.1016/j.tim.2005.11.004. PMID 16356716.
  98. ^ Kechris KJ, Lin JC, Bickel PJ, Glazer AN (June 2006). "Quantitative exploration of the occurrence of lateral gene transfer by using nitrogen fixation genes as a case study". Proceedings of the National Academy of Sciences of the United States of America. 103 (25): 9584–9. Bibcode:2006PNAS..103.9584K. doi:10.1073/pnas.0603534103. PMC 1480450. PMID 16769896.
  99. ^ Moran NA, Jarvik T (April 2010). "Lateral transfer of genes from fungi underlies carotenoid production in aphids". Science. 328 (5978): 624–7. Bibcode:2010Sci...328..624M. doi:10.1126/science.1187113. PMID 20431015. S2CID 14785276.
  100. ^ Danchin EG, Rosso MN, Vieira P, de Almeida-Engler J, Coutinho PM, Henrissat B, Abad P (October 2010). "Multiple lateral gene transfers and duplications have promoted plant parasitism ability in nematodes". Proceedings of the National Academy of Sciences of the United States of America. 107 (41): 17651–6. Bibcode:2010PNAS..10717651D. doi:10.1073/pnas.1008486107. PMC 2955110. PMID 20876108.
  101. ^ Fletcher W, Yang Z (August 2009). "INDELible: a flexible simulator of biological sequence evolution". Molecular Biology and Evolution. 26 (8): 1879–88. doi:10.1093/molbev/msp098. PMC 2712615. PMID 19423664.
  102. ^ Sipos B, Massingham T, Jordan GE, Goldman N (April 2011). "PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment". BMC Bioinformatics. 12: 104. doi:10.1186/1471-2105-12-104. PMC 3102636. PMID 21504561.
  103. ^ Galtier N (August 2007). "A model of horizontal gene transfer and the bacterial phylogeny problem". Systematic Biology. 56 (4): 633–42. doi:10.1080/10635150701546231. PMID 17661231.
  104. ^ Dalquen DA, Anisimova M, Gonnet GH, Dessimoz C (April 2012). "ALF--a simulation framework for genome evolution". Molecular Biology and Evolution. 29 (4): 1115–23. doi:10.1093/molbev/msr268. PMC 3341827. PMID 22160766.
  105. ^ Cortez DQ, Lazcano A, Becerra A (2005). "Comparative analysis of methodologies for the detection of horizontally transferred genes: a reassessment of first-order Markov models". In Silico Biology. 5 (5–6): 581–92. PMID 16610135.
  106. ^ Tsirigos A, Rigoutsos I (2005). "A new computational method for the detection of horizontal gene transfer events". Nucleic Acids Research. 33 (3): 922–33. doi:10.1093/nar/gki187. PMC 549390. PMID 15716310.
  107. ^ Azad RK, Lawrence JG (November 2005). "Use of artificial genomes in assessing methods for atypical gene detection". PLOS Computational Biology. 1 (6): e56. Bibcode:2005PLSCB...1...56A. doi:10.1371/journal.pcbi.0010056. PMC 1282332. PMID 16292353.
  108. ^ Iantorno S, Gori K, Goldman N, Gil M, Dessimoz C (2014). "Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment". Multiple Sequence Alignment Methods. Methods in Molecular Biology. Vol. 1079. pp. 59–73. arXiv:1211.2160. doi:10.1007/978-1-62703-646-7_4. ISBN 978-1-62703-645-0. PMID 24170395. S2CID 2363657.

inferring, horizontal, gene, transfer, horizontal, lateral, gene, transfer, transmission, portions, genomic, between, organisms, through, process, decoupled, from, vertical, inheritance, presence, events, different, fragments, genome, result, different, evolut. Horizontal or lateral gene transfer HGT or LGT is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance In the presence of HGT events different fragments of the genome are the result of different evolutionary histories This can therefore complicate investigations of the evolutionary relatedness of lineages and species Also as HGT can bring into genomes radically different genotypes from distant lineages or even new genes bearing new functions it is a major source of phenotypic innovation and a mechanism of niche adaptation For example of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants leading to the emergence of pathogenic lineages 1 Inferring horizontal gene transfer through computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes Sequence composition based parametric methods search for deviations from the genomic average whereas evolutionary history based phylogenetic approaches identify genes whose evolutionary history significantly differs from that of the host species The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes for which the true history is known On real data different methods tend to infer different HGT events and as a result it can be difficult to ascertain all but simple and clear cut HGT events Contents 1 Overview 2 Parametric methods 2 1 Nucleotide composition 2 2 Oligonucleotide spectrum 2 3 Structural features 2 4 Genomic context 3 Phylogenetic methods 3 1 Explicit phylogenetic methods 3 1 1 Tests of topologies 3 1 2 Genome spectral approaches 3 
September 1998 DNA sequence dependent deformability deduced from protein DNA crystal complexes Proceedings of the National Academy of Sciences of the United States of America 95 19 11163 8 Bibcode 1998PNAS 9511163O doi 10 1073 pnas 95 19 11163 PMC 21613 PMID 9736707 Herzel H Weiss O Trifonov EN March 1999 10 11 bp periodicities in complete genomes reflect protein structure and DNA folding Bioinformatics 15 3 187 93 doi 10 1093 bioinformatics 15 3 187 PMID 10222405 a b Nelson KE Clayton RA Gill SR Gwinn ML Dodson RJ Haft DH et al May 1999 Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima Nature 399 6734 323 9 Bibcode 1999Natur 399 323N doi 10 1038 20601 PMID 10360571 S2CID 4420157 Langille MG Hsiao WW Brinkman FS May 2010 Detecting genomic islands using bioinformatics approaches Nature Reviews Microbiology 8 5 373 82 doi 10 1038 nrmicro2350 PMID 20395967 S2CID 2373228 Hacker J Blum Oehler G Muhldorfer I Tschape H March 1997 Pathogenicity islands of virulent bacteria structure function and impact on microbial evolution Molecular Microbiology 23 6 1089 97 doi 10 1046 j 1365 2958 1997 3101672 x PMID 9106201 S2CID 27524815 Vernikos GS Parkhill J February 2008 Resolving the structural features of genomic islands a machine learning approach Genome Research 18 2 331 42 doi 10 1101 gr 7004508 PMC 2203631 PMID 18071028 Altenhoff AM Dessimoz C 2012 Inferring Orthology and Paralogy PDF Evolutionary Genomics Methods in Molecular Biology Vol 855 Totowa NJ Humana Press pp 259 79 doi 10 1007 978 1 61779 582 4 9 ISBN 978 1 61779 581 7 PMID 22407712 Than C Ruths D Innan H Nakhleh L May 2007 Confounding factors in HGT detection statistical error coalescent effects and multiple solutions Journal of Computational Biology 14 4 517 35 CiteSeerX 10 1 1 121 7834 doi 10 1089 cmb 2007 A010 PMID 17572027 Goldman N Anderson JP Rodrigo AG December 2000 Likelihood based tests of topologies in phylogenetics Systematic Biology 49 4 652 70 
doi 10 1080 106351500750049752 PMID 12116432 Shimodaira H Hasegawa M 1999 Multiple Comparisons of Log Likelihoods with Applications to Phylogenetic Inference Molecular Biology and Evolution 16 8 1114 1116 doi 10 1093 oxfordjournals molbev a026201 Shimodaira H June 2002 An approximately unbiased test of phylogenetic tree selection Systematic Biology 51 3 492 508 doi 10 1080 10635150290069913 PMID 12079646 S2CID 11586099 Lerat E Daubin V Moran NA October 2003 From gene trees to organismal phylogeny in prokaryotes the case of the gamma Proteobacteria PLOS Biology 1 1 E19 doi 10 1371 journal pbio 0000019 PMC 193605 PMID 12975657 Zhaxybayeva O Hamel L Raymond J Gogarten JP 2004 Visualization of the phylogenetic content of five genomes using dekapentagonal maps Genome Biology 5 3 R20 doi 10 1186 gb 2004 5 3 r20 PMC 395770 PMID 15003123 a b Beiko RG Harlow TJ Ragan MA October 2005 Highways of gene sharing in prokaryotes Proceedings of the National Academy of Sciences of the United States of America 102 40 14332 7 Bibcode 2005PNAS 10214332B doi 10 1073 pnas 0504068102 PMC 1242295 PMID 16176988 a b c d Boc A Philippe H Makarenkov V March 2010 Inferring and validating horizontal gene transfer events using bipartition dissimilarity Systematic Biology 59 2 Oxford University Press 195 211 doi 10 1093 sysbio syp103 PMID 20525630 Zhaxybayeva O Gogarten JP Charlebois RL Doolittle WF Papke RT September 2006 Phylogenetic analyses of cyanobacterial genomes quantification of horizontal gene transfer events Genome Research 16 9 1099 108 doi 10 1101 gr 5322306 PMC 1557764 PMID 16899658 Bansal MS Banay G Gogarten JP Shamir R September 2011 Detecting highways of horizontal gene transfer Journal of Computational Biology 18 9 1087 114 CiteSeerX 10 1 1 418 3658 doi 10 1089 cmb 2011 0066 PMID 21899418 Bansal MS Banay G Harlow TJ Gogarten JP Shamir R March 2013 Systematic inference of highways of horizontal gene transfer in prokaryotes Bioinformatics 29 5 571 9 doi 10 1093 bioinformatics 
btt021 PMID 23335015 a b Hallett MT Lagergren J RECOMB 2001 Montreal ACM 2001 Efficient Algorithms for Lateral Gene Transfer Problems pp 149 156 Baroni M Grunewald S Moulton V Semple C August 2005 Bounding the number of hybridisation events for a consistent evolutionary history Journal of Mathematical Biology 51 2 171 82 doi 10 1007 s00285 005 0315 9 hdl 10092 12222 PMID 15868201 S2CID 3180904 a b Beiko RG Hamilton N February 2006 Phylogenetic identification of lateral genetic transfer events BMC Evolutionary Biology 6 1 15 Bibcode 2006BMCEE 6 15B doi 10 1186 1471 2148 6 15 PMC 1431587 PMID 16472400 a b c Boc A Diallo AB Makarenkov V July 2012 T REX a web server for inferring validating and visualizing phylogenetic trees and networks Nucleic Acids Research 40 W1 Oxford University Press W573 9 doi 10 1093 nar gks485 PMC 3394261 PMID 22675075 a b Nakhleh L Ruths DA Wang L RIATA HGT A Fast and Accurate Heuristic for Reconstructing Horizontal Gene Transfer COCOON August 16 29 2005 Kunming 2005 a b Abby SS Tannier E Gouy M Daubin V June 2010 Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests BMC Bioinformatics 11 324 doi 10 1186 1471 2105 11 324 PMC 2905365 PMID 20550700 Hickey G Dehne F Rau Chaplin A Blouin C February 2008 SPR distance computation for unrooted trees Evolutionary Bioinformatics Online 4 17 27 doi 10 4137 ebo s419 PMC 2614206 PMID 19204804 Hein J Jiang T Wang L Zhang K 1996 On the complexity of comparing evolutionary trees Discrete Applied Mathematics 71 1 3 153 169 doi 10 1016 S0166 218X 96 00062 5 Allen BL Steel M 2001 Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees Annals of Combinatorics 5 1 15 CiteSeerX 10 1 1 24 8389 doi 10 1007 s00026 001 8006 8 S2CID 2934442 MacLeod D Charlebois RL Doolittle F Bapteste E April 2005 Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement BMC Evolutionary Biology 5 27 
doi 10 1186 1471 2148 5 27 PMC 1087482 PMID 15819979 a b Whidden C Zeh N Beiko RG July 2014 Supertrees Based on the Subtree Prune and Regraft Distance Systematic Biology 63 4 566 81 doi 10 1093 sysbio syu023 PMC 4055872 PMID 24695589 a b Doyon JP Hamel S Chauve C 2012 An efficient method for exploring the space of gene tree species tree reconciliations in a probabilistic framework PDF IEEE ACM Transactions on Computational Biology and Bioinformatics 9 1 26 39 doi 10 1109 TCBB 2011 64 PMID 21464510 S2CID 2493991 a b David LA Alm EJ January 2011 Rapid evolutionary innovation during an Archaean genetic expansion PDF Nature 469 7328 93 6 Bibcode 2011Natur 469 93D doi 10 1038 nature09649 hdl 1721 1 61263 PMID 21170026 S2CID 4420725 Szollosi GJ Boussau B Abby SS Tannier E Daubin V October 2012 Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations Proceedings of the National Academy of Sciences of the United States of America 109 43 17513 8 Bibcode 2012PNAS 10917513S doi 10 1073 pnas 1202997109 PMC 3491530 PMID 23043116 Nguyen TH Ranwez V Pointet S Chifolleau AM Doyon JP Berry V April 2013 Reconciliation and local gene tree rearrangement can be of mutual profit Algorithms for Molecular Biology 8 1 12 doi 10 1186 1748 7188 8 12 PMC 3871789 PMID 23566548 Szollosi GJ Tannier E Lartillot N Daubin V May 2013 Lateral gene transfer from the dead Systematic Biology 62 3 386 97 arXiv 1211 4606 doi 10 1093 sysbio syt003 PMC 3622898 PMID 23355531 Bansal MS Alm EJ Kellis M June 2012 Efficient algorithms for the reconciliation problem with gene duplication horizontal transfer and loss Bioinformatics 28 12 i283 91 doi 10 1093 bioinformatics bts225 PMC 3371857 PMID 22689773 Majewski J Zawadzki P Pickerill P Cohan FM Dowson CG February 2000 Barriers to genetic exchange between bacterial species Streptococcus pneumoniae transformation Journal of Bacteriology 182 4 1016 23 doi 10 1128 jb 182 4 1016 1023 2000 PMC 94378 PMID 10648528 
Sjostrand J Tofigh A Daubin V Arvestad L Sennblad B Lagergren J May 2014 A Bayesian method for analyzing lateral gene transfer Systematic Biology 63 3 409 20 doi 10 1093 sysbio syu007 PMID 24562812 a b Szollosi GJ Rosikiewicz W Boussau B Tannier E Daubin V November 2013 Efficient exploration of the space of reconciled gene trees Systematic Biology 62 6 901 12 arXiv 1306 2167 Bibcode 2013arXiv1306 2167S doi 10 1093 sysbio syt054 PMC 3797637 PMID 23925510 Haggerty LS Jachiet PA Hanage WP Fitzpatrick DA Lopez P O Connell MJ et al March 2014 A pluralistic account of homology adapting the models to the data Molecular Biology and Evolution 31 3 501 16 doi 10 1093 molbev mst228 PMC 3935183 PMID 24273322 Szollosi GJ Tannier E Daubin V Boussau B January 2015 The inference of gene trees with species trees Systematic Biology 64 1 e42 62 doi 10 1093 sysbio syu048 PMC 4265139 PMID 25070970 Lassalle F Planel R Penel S Chapulliot D Barbe V Dubost A et al December 2017 Ancestral Genome Estimation Reveals the History of Ecological Diversification in Agrobacterium Genome Biology and Evolution 9 12 3413 3431 doi 10 1093 gbe evx255 PMC 5739047 PMID 29220487 Duchemin W Anselmetti Y Patterson M Ponty Y Berard S Chauve C et al May 2017 DeCoSTAR Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies Genome Biology and Evolution 9 5 1312 1319 doi 10 1093 gbe evx069 PMC 5441342 PMID 28402423 Koski LB Golding GB June 2001 The closest BLAST hit is often not the nearest neighbor Journal of Molecular Evolution 52 6 540 2 Bibcode 2001JMolE 52 540K doi 10 1007 s002390010184 PMID 11443357 S2CID 24848333 Wisniewski Dye F Borziak K Khalsa Moyers G Alexandre G Sukharnikov LO Wuichet K et al December 2011 Richardson PM ed Azospirillum genomes reveal transition of bacteria from aquatic to terrestrial environments PLOS Genetics 7 12 e1002430 doi 10 1371 journal pgen 1002430 PMC 3245306 PMID 22216014 Zuckerkandl E and Pauling L B 1965 Evolutionary divergence and 
convergence in proteins In Bryson V and Vogel H J editors Evolving Genes and Proteins Academic Press New York pp 97 166 Novichkov PS Omelchenko MV Gelfand MS Mironov AA Wolf YI Koonin EV October 2004 Genome wide molecular clock and horizontal gene transfer in bacterial evolution Journal of Bacteriology 186 19 6575 85 doi 10 1128 JB 186 19 6575 6585 2004 PMC 516599 PMID 15375139 Lawrence JG Hartl DL July 1992 Inference of horizontal genetic transfer from molecular data an approach using the bootstrap Genetics 131 3 753 60 doi 10 1093 genetics 131 3 753 PMC 1205046 PMID 1628816 Clarke GD Beiko RG Ragan MA Charlebois RL April 2002 Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores Journal of Bacteriology 184 8 2072 80 doi 10 1128 jb 184 8 2072 2080 2002 PMC 134965 PMID 11914337 Pellegrini M Marcotte EM Thompson MJ Eisenberg D Yeates TO April 1999 Assigning protein functions by comparative genome analysis protein phylogenetic profiles Proceedings of the National Academy of Sciences of the United States of America 96 8 4285 8 Bibcode 1999PNAS 96 4285P doi 10 1073 pnas 96 8 4285 PMC 16324 PMID 10200254 Normand P Lapierre P Tisa LS Gogarten JP Alloisio N Bagnarol E et al January 2007 Genome characteristics of facultatively symbiotic Frankia sp strains reflect host range and host plant biogeography Genome Research 17 1 7 15 doi 10 1101 gr 5798407 PMC 1716269 PMID 17151343 a b Welch RA Burland V Plunkett G Redford P Roesch P Rasko D et al December 2002 Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli Proceedings of the National Academy of Sciences of the United States of America 99 26 17020 4 Bibcode 2002PNAS 9917020W doi 10 1073 pnas 252529799 PMC 139262 PMID 12471157 Csuros MS 2008 Ancestral Reconstruction by Asymmetric Wagner Parsimony over Continuous Characters and Squared Parsimony over Distributions Comparative 
Genomics Lecture Notes in Computer Science Vol 5267 pp 72 86 doi 10 1007 978 3 540 87989 3 6 ISBN 978 3 540 87988 6 S2CID 10717969 Pagel M October 1999 Inferring the historical patterns of biological evolution Nature 401 6756 877 84 Bibcode 1999Natur 401 877P doi 10 1038 44766 hdl 2027 42 148253 PMID 10553904 S2CID 205034365 a b Csuros M Miklos I September 2009 Streamlining and large ancestral genomes in Archaea inferred with a phylogenetic birth and death model Molecular Biology and Evolution 26 9 2087 95 doi 10 1093 molbev msp123 PMC 2726834 PMID 19570746 Hao W Golding GB September 2010 Inferring bacterial genome flux while considering truncated genes Genetics 186 1 411 26 doi 10 1534 genetics 110 118448 PMC 2940306 PMID 20551435 Hao W Golding GB May 2006 The fate of laterally transferred genes life in the fast lane to adaptation or death Genome Research 16 5 636 43 doi 10 1101 gr 4746406 PMC 1457040 PMID 16651664 Hao W Golding GB May 2008 Uncovering rate variation of lateral gene transfer during bacterial genome evolution BMC Genomics 9 235 doi 10 1186 1471 2164 9 235 PMC 2426709 PMID 18492275 Ochman H Lawrence JG Groisman EA May 2000 Lateral gene transfer and the nature of bacterial innovation Nature 405 6784 299 304 Bibcode 2000Natur 405 299O doi 10 1038 35012500 PMID 10830951 S2CID 85739173 Papke RT Koenig JE Rodriguez Valera F Doolittle WF December 2004 Frequent recombination in a saltern population of Halorubrum Science 306 5703 1928 9 Bibcode 2004Sci 306 1928P doi 10 1126 science 1103289 PMID 15591201 S2CID 21595153 Mau B Glasner JD Darling AE Perna NT 2006 Genome wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli Genome Biology 7 5 R44 doi 10 1186 gb 2006 7 5 r44 PMC 1779527 PMID 16737554 Didelot X Falush D March 2007 Inference of bacterial microevolution using multilocus sequence data Genetics 175 3 1251 66 doi 10 1534 genetics 106 063305 PMC 1840087 PMID 17151252 Ragan MA July 2001 On surrogate methods 
for detecting lateral gene transfer FEMS Microbiology Letters 201 2 187 91 doi 10 1111 j 1574 6968 2001 tb10755 x PMID 11470360 Ragan MA Harlow TJ Beiko RG January 2006 Do different surrogate methods detect lateral genetic transfer events of different relative ages Trends in Microbiology 14 1 4 8 doi 10 1016 j tim 2005 11 004 PMID 16356716 Kechris KJ Lin JC Bickel PJ Glazer AN June 2006 Quantitative exploration of the occurrence of lateral gene transfer by using nitrogen fixation genes as a case study Proceedings of the National Academy of Sciences of the United States of America 103 25 9584 9 Bibcode 2006PNAS 103 9584K doi 10 1073 pnas 0603534103 PMC 1480450 PMID 16769896 Moran NA Jarvik T April 2010 Lateral transfer of genes from fungi underlies carotenoid production in aphids Science 328 5978 624 7 Bibcode 2010Sci 328 624M doi 10 1126 science 1187113 PMID 20431015 S2CID 14785276 Danchin EG Rosso MN Vieira P de Almeida Engler J Coutinho PM Henrissat B Abad P October 2010 Multiple lateral gene transfers and duplications have promoted plant parasitism ability in nematodes Proceedings of the National Academy of Sciences of the United States of America 107 41 17651 6 Bibcode 2010PNAS 10717651D doi 10 1073 pnas 1008486107 PMC 2955110 PMID 20876108 Fletcher W Yang Z August 2009 INDELible a flexible simulator of biological sequence evolution Molecular Biology and Evolution 26 8 1879 88 doi 10 1093 molbev msp098 PMC 2712615 PMID 19423664 Sipos B Massingham T Jordan GE Goldman N April 2011 PhyloSim Monte Carlo simulation of sequence evolution in the R statistical computing environment BMC Bioinformatics 12 104 doi 10 1186 1471 2105 12 104 PMC 3102636 PMID 21504561 Galtier N August 2007 A model of horizontal gene transfer and the bacterial phylogeny problem Systematic Biology 56 4 633 42 doi 10 1080 10635150701546231 PMID 17661231 Dalquen DA Anisimova M Gonnet GH Dessimoz C April 2012 ALF a simulation framework for genome evolution Molecular Biology and Evolution 29 4 1115 
23 doi 10 1093 molbev msr268 PMC 3341827 PMID 22160766 Cortez DQ Lazcano A Becerra A 2005 Comparative analysis of methodologies for the detection of horizontally transferred genes a reassessment of first order Markov models In Silico Biology 5 5 6 581 92 PMID 16610135 Tsirigos A Rigoutsos I 2005 A new computational method for the detection of horizontal gene transfer events Nucleic Acids Research 33 3 922 33 doi 10 1093 nar gki187 PMC 549390 PMID 15716310 Azad RK Lawrence JG November 2005 Use of artificial genomes in assessing methods for atypical gene detection PLOS Computational Biology 1 6 e56 Bibcode 2005PLSCB 1 56A doi 10 1371 journal pcbi 0010056 PMC 1282332 PMID 16292353 Iantorno S Gori K Goldman N Gil M Dessimoz C 2014 Who Watches the Watchmen An Appraisal of Benchmarks for Multiple Sequence Alignment Multiple Sequence Alignment Methods Methods in Molecular Biology Vol 1079 pp 59 73 arXiv 1211 2160 doi 10 1007 978 1 62703 646 7 4 ISBN 978 1 62703 645 0 PMID 24170395 S2CID 2363657
Retrieved from "https://en.wikipedia.org/w/index.php?title=Inferring_horizontal_gene_transfer&oldid=1211544304"
