Nicholas Eriksson (UC-Berkeley): Reconstruction of Phylogenetic Trees with Algebraic Invariants
We present a new method of deducing phylogenetic trees using algebraic invariants and numerical linear algebra. Unlike distance based methods, our method utilizes all information in an alignment. Simulations studies show that our algorithm is promising. It can currently handle at least 20 taxa trees and work is under way to improve it.
The algebraic invariants of a Markov model on a tree T are the algebraic relations between the joint probabilities of the observations at the leaves of T. For any Markov model, some relations are given by rank conditions on matrices built by flattening the probability distribution along a split in the tree. Although there are exponentially many coordinates to the distribution and exponentially many splits, we provide an algorithm which is polynomial time in the number of taxa for fixed sequence length.
In order to reconstruct phylogenies using algebraic invariants, we start with a multiple alignment of the $n$ sequences. From this, we build the observed distribution. Then the matrices corresponding to splits can be tested to see if they satisfy the invariants. The singular value decomposition (SVD) is used to measure how far a matrix is from having the desired rank. Although these matrices are of exponential size, their SVD can be computed effectively since they are typically very sparse.
Our algorithm looks at all n-choose-2 possible cherries of the tree and picks the one that gives the best score using the SVD. The two taxa with the best score are collapsed and the algorithm continues recursively until a tree is built.
Daniel Ford (Stanford): Probability Measures on Phylogenetic Trees
The Yule, Uniform and Comb are common probability distributions on phylogenetic trees/cladograms. They all have the property that choosing a random tree with k leaves and deleting a random leaf gives a sample from the distribution on trees with k-1 leaves. In practice evolutionary trees are longer than Yule tree but flatter than Uniform trees.
Fortunately there is a parametrised family of probabilities which smoothly interpolates between these three families. This is easy to construct and perform statistical tests with.
Akemi Kashiwada (Harvey Mudd College): The Shapley Value of Phylogenetic Trees
(joint work with Claus-Jochen Haake and Francis E. Su)
Phylogenetic trees represent theoretical evolutionary relationships among various species. Mathematically they can be described as weighted binary trees and the leaves represent the taxa being compared and the weight of edge is some notion of distance between its endpoints. If we think of phylogenetic trees as cooperative games where the worth of a coalition is the weight of the subtree spanned by its members, then we can use the Shapley value, an important game theoretic concept, to solve this game. We prove that the Shapley value of tree games is characterized by five axioms, and discuss what these mean in a biological context. We also determine the linear transformation M that gives the Shapley value in terms of the edge weights of the tree, and show how the Shapley value of a n-leaf tree game can be reconstructed from the Shapley value of its ( n–1 )-leaf subtrees. We determine the null spaces of M, which depend on the topology of the tree.
(joint work with Francis E. Su, Ruriko Yoshida)
Algorithms for reconstructing a phylogenetic tree from sequence data typically rely on first finding a distance matrix: a set of values indicating the dissimilarity for each pair of elements. However, one may choose to consider instead the dissimilarity of groups of elements. We may define an m-dissimilarity map as a function D that takes subsets of m-elements to a non-negative real number. This may be thought of as a measure of how dissimilar a set of m elements are or, in terms of the tree, as the sum of the edge weights of the subtree spanned by those leaves.
Pachter and Speyer (2004) gave theoretical conditions under which a tree may be constructed from its dissimilarity map; in particular a tree may be reconstructed from its m-dissimilarity map if n > 2m-2 . Their method proceeds by first finding the splits, recovering the topology from the splits using Buneman indices, and then using the distances and topology to determine the edge weights. In contrast, our method constructs the topology of the tree and determines the edge weights simultaneously.
We achieve this result through an analogue of the four point condition for m-dissimilarity maps. We say that a pair of leaves (i,j) is a sub-cherry in the subtree T iff i and j are in T and there is only one vertex of degree 3 on the unique path from i to j in T. We call this intermediate vertex the sub-cherry node. Our four point condition for m-dissimilarity maps locates sub-cherries in subtrees with m + 2 leaves and determines the distance from the sub-cherry node to the rest of the subtree. By locating sub-cherries and determining sub-cherry node distances, we may reconstruct both the tree and the edge weights provided that n > 2m-2.
In the case that m = 3, we have exploited certain symmetries to provide a fast algorithm for reconstructing phylogenetic trees from 3-dissimilarity maps. This algorithm has a time complexity of O(n^2). We have also written and tested a C++ implementation of this algorithm.
Phylogenetic tree reconstruction from m-dissimilarity maps may provide a more accurate approach to determining the maximum likelihood tree by utilizing the more accurate m-subtree weights as opposed to pairwise distances. As demonstrated by the case m = 3, these algorithms are computationally competitive with distance based methods and may serve as a viable alternative to neighbor joining or quartet reconstructions.
Nick D. Pattengale (New Mexico): Algorithmic Decomposition of Most Parsimonious Sets
(joint work with Bernard M.E. Moret)
Parsimony search typically finds several most parsimonious trees. Our goal is take a set of most parsimonious trees as input and algorithmically identify strongly supported evolutionary hypotheses across the set. We do not require that algorithms produce a single tree, thereby overcoming some of the limitations of consensus methods. Additionally, in contrast to most consensus methods, our method is sensitive as to how edge weights differ across the input set. Our approach has thus far yielded an interesting tree decomposition that in turn gives rise to a functional mapping between trees. We expect that the functional mapping will prove useful as a bounding technique during parsimon search itself. We also consider generalizations of the technique in which parsimony scores across the input set are not necessarily homogeneous, but are within a constant bound from most parsimonious.
Raazesh Sainudiin (Cornell): Enclosing the Maximum Likelihood of the Simplest DNA Model Evolving on Fixed Topologies: Towards a Rigorous Framework for Phylogenetic Inference
An interval extension of Felsenstein's recursive formulation for the likelihood function (Felsenstein, 1981) of the simplest Markov model of DNA evolution (Jukes and Cantor, 1969) on unrooted phylogenetic trees with a fixed topology is used to obtain rigorous enclosure(s) of the global maximum likelihood. Validated global maximizer(s) inside any compact set of the parameter space which is the set of all branch lengths of the tree are thus obtained. The algorithm is an adaptation of a widely applied global optimization method using interval analysis (Hansen, 1980) for the phylogenetic context. The method is applied to enclose the maximizer(s) and the global maximum for the simplest DNA model evolving on trees with 2, 3, and 4 leaves.
Sagi Snir (UC-Berkeley): Convex Recolorings of Strings and Trees: Definitions, Hardness Results, and Algorithms
(joint work with Shlomo Moran)
A coloring of a tree is convex if the vertices that pertain to any color induce a connected subtree; a partial coloring (which assigns colors to some of the vertices) is convex if it can be completed to a convex (total) coloring. Convex colorings of trees arise in areas such as phylogenetics, linguistics, etc. e.g., a perfect phylogenetic tree is one in which the states of each character induce a convex coloring of the tree.
When a coloring of a tree is not convex, it is desirable to know "how far" it is from a convex one, and what are the convex colorings which are closest to it. In this paper we study a natural definition of this distance–the recoloring distance, which is the minimal number of color changes at the vertices needed to make the coloring convex. We show that finding this distance is NP-hard even for a colored string (a path), and for some other interesting variants of the problem. In the positive side, we present algorithms for computing the recoloring distance under some natural generalizations of this concept: The first generalization is the uniform weighted model, where each vertex has a weight which is the cost of changing its color. The other is the non uniform model, in which the cost of coloring a vertex v by a color d is an arbitrary nonnative number cost(v, d). Our first algorithms find optimal convex recolorings of strings and bounded degree trees under the non uniform model in time which, for any fixed number of colors, is linear in the input size. Next we improve these algorithms for the uniform model to run in time which is linear in the input size for a fixed number of bad colors, which are colors which violate convexity in some natural sense. Finally, we generalize the above result to hold for trees of unbounded degree.
Mariel Vazquez (UC-Berkeley): Genome Comparison Allowing Complex Rearrangements
(joint work with Dan Levy, Rainer Sachs)
Radiation cytogenetics deals with rearrangements of the genome caused by ionizing-radiation. We have developed a mathematical framework, related to the theory of cubic multigraphs, for characterizing such rearrangements. We here apply our work to a study of comparative genomics in evolution where one considers each genome as a short sequence of synteny blocks and analyzes the rearrangements taking one sequence of blocks into the other. In traditional studies of evolution of genomes the rearrangements (apart from fusions and fissions) are usually reduced to a sequence of reactions involving just two DNA breaks. We here explore the possibility that large-scale insertions and transpositions and other rearrangements involving 3 or more breaks occur.
Ruriko Yoshida (Duke): Neighbor Joining with Subtree Weights
(joint work with Dan Levy and Lior Pachter)
The Neighbor-Joining algorithm is a recursive procedure for reconstructing phylogenetic trees that is based on a transformation of pairwise distances between leaves for identifying cherries in the tree (two nodes are a cherry if there is exactly one intermediate vertex on the path between them). We show that estimates of the weights of m-leaf subtrees are more accurate than pairwise distances, and derive a generalization of the cherry picking criterion which uses such weights. This leads to an improved neighbor-joining algorithm whose total running time is still polynomial in the number of taxa. In my presentation, I will remind an outline of The Neighbor-Joining algorithm with pairwise distances, introduced by Saitou-Nei and Studier-Keppler in 1987 and then, I will describe the Neighbor-Joining algorithm with the weights of m-leaf subtrees and a generalization of the cherry picking criterion.