Akemi Kashiwada

Harvey Mudd College Mathematics 2005

Thesis Proposal: Game Theoretic Indices for Phylogenetic Trees
Thesis Advisor: Prof. Francis E. Su
Second Reader: Prof. Stephen Adolph
E-Mail: akashiwada@hmc.edu
Senior Thesis: Constructing Phylogenetic Trees from Subsplits
Poster: Poster presented at Harvey Mudd College 2005 Presentation Days
Presentation Slides: Presentation Slides

Constructing Phylogenetic Trees from Subsplits

Phylogenetic trees represent theoretical evolutionary relationships among various species. Mathematically they can be described as weighted binary trees and the leaves represent the taxa being compared. Reconstruction of such trees from biological data is currently one of the big questions in mathematical biology. If biologists are able to separate the taxa into two distinct groups, then we call such a partition a split. In 1971, Peter Buneman published a paper showing that each split can be thought of as an edge of some tree and proved that a labeled tree T is uniquely defined by compatible splits. Since the leaves of phylogenetic trees are taxa, splits reveal which taxa are more closely related; those in the same subset share a more recent common ancestor than with the taxa in the other subset. If biologists can only get partial splits, called subsplits from their data, can a tree still be constructed?

This question was considered in a paper published in 2002 by Charles Semple and Mike Steel. They were able to provide criteria necessary to reconstruct a unique tree from partial split information. One of the conditions required that the subsplits be consistent with a tree. However, determining which trees satisfied this condition was not apparent in their paper. So my senior thesis question is how do we know when a set of subsplits consistent with a tree and what does that tree (or set of trees) look like.

I was able to develop a reconstruction algorithm that would allow us to determine when a set of compatible (n-1)-leaf subsplits could be realized by an n-leaf tree. The idea of the algorithm is to add the missing leaf to each subsplit while maintaining the compatibility of the splits. Thus using Buneman's result we can reconstruct the tree that is consistent with our given set of subsplits. My main theorem proves that my algorithm will halt after transforming all of the subsplits into splits if and only if the subsplits are consistent with an n-leaf tree.