# Mike Hansen

Harvey Mudd College Mathematics 2007

| Thesis Proposal: | Algebra and Phylogenetic Trees |
|---|---|
| Thesis Advisor: | Prof. Francis Su |
| Second Reader: | Prof. Michael Orrison |
| 10 Minute Presentation: | Algebraic Invariants of Phylogenetic Trees |
| 20 Minute Presentation: | Algebraic Invariants of Phylogenetic Trees |
| MAA Poster: | Algebraic Invariants of Phylogenetic Trees |
| Final Report: | Algebraic Invariants of Phylogenetic Trees |

## Algebra and Phylogenetic Trees

### Introduction

One of the few interesting topics at the intersection of algebra and biology is that of phylogenetic trees. The central problem associated with phylogenetic trees is to reconstruct a tree from a given set of biological data. Certain algebraic techniques, namely spectral analysis and phylogenetic invariants, allow one to reconstruct a phylogenetic tree from DNA sequence data. I'd like to study in detail the algebra associated with these phylogenetic trees.

### Proposed Research

In my research, I aim to continue the extension of spectral analysis started by Székely, Steel, and Erdős so that the technique can be used with more general models of DNA substitution. I began this research in the summer of 2004, and I'd like to develop it more fully and understand more deeply the statistics and mathematics involved.

### Prior Research

The first techniques of spectral analysis were developed by Michael Hendy in his paper "The Relationship Between Simple Evolutionary Tree Models and Observable Sequence Data". In this work, he used Cavender's model of changes in 2-character sequence data. He devised the closest tree method, which selects the best phylogenetic tree based on the relationship between the bipartitions observed in the data and the corresponding tree.
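To make the 2-character case concrete, here is a minimal sketch of the Hadamard conjugation that underlies this kind of spectral analysis. It assumes the symmetric 2-state model and the standard form s = H⁻¹ exp(Hq), where q holds the edge lengths indexed by bipartition and s holds the expected bipartition frequencies. The function names and the bitmask indexing of subsets are illustrative choices, not taken from the papers, and the closest tree selection step itself is not shown.

```python
import math

def hadamard_transform(v):
    """Multiply v by the Sylvester Hadamard matrix H (len(v) must be a power of 2)."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def edge_to_sequence_spectrum(q):
    """Compute s = H^{-1} exp(H q); q[0] must be minus the sum of the other entries."""
    r = hadamard_transform(q)
    # H is symmetric with H * H = n * I, so H^{-1} = H / n.
    return [x / len(q) for x in hadamard_transform([math.exp(x) for x in r])]

def sequence_to_edge_spectrum(s):
    """Invert the map above: q = H^{-1} ln(H s); requires every entry of H s to be positive."""
    rho = hadamard_transform(s)
    return [x / len(s) for x in hadamard_transform([math.log(x) for x in rho])]

# Hypothetical example: a 3-taxon star tree. Index k is a bitmask over taxa {1, 2}:
# 1 = {1}, 2 = {2}, 3 = {1, 2} (the pendant edge of taxon 3).
edges = [0.1, 0.2, 0.3]
q = [-sum(edges)] + edges
s = edge_to_sequence_spectrum(q)
q_back = sequence_to_edge_spectrum(s)
```

In this sketch `s` sums to 1, so it can be read as a probability distribution over bipartition patterns, and `q_back` recovers the original edge lengths up to floating-point error. With real data, the observed pattern frequencies would take the place of `s`, and the recovered spectrum would generally not correspond exactly to any single tree.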

Until the work of Székely, Steel, and Erdős, spectral analysis could only be performed on 2-character sequence data, as opposed to the 4-character sequence data normally contained within DNA. In addition, the assumptions required by the DNA substitution models were quite strong. Their work allows 4-character sequence data to be used, along with a more general model of DNA substitution.

### References

- Hendy, M. D., The relationship between simple evolutionary tree models and observable sequence data, Systematic Zoology, vol. 38, no. 4, pp. 310-321, 1989
- Hendy, M. D., Penny, D., and Steel, M. A., A discrete Fourier analysis for evolutionary trees, Proc. Natl. Acad. Sci. U.S.A., vol. 91, no. 8, pp. 3339-3343, 1994
- Székely, L. A., Steel, M. A., and Erdős, P. L., Fourier calculus on evolutionary trees, Adv. in Appl. Math., vol. 14, no. 2, pp. 200-210, 1993