- Michael Orrison
- Second Reader(s)
- Matthew Davis
Given the result $v_0$ of a survey and a nested collection of summary statistics that could be used to describe that result, it is natural to ask which of these summary statistics best describes $v_0$. In 1998, Diaconis and Sturmfels presented an approach for determining the conditional significance of a higher-order statistic by sampling a space conditioned on the value of a lower-order statistic. Their approach involves the computation of a Markov basis, followed by the use of a Markov process with stationary hypergeometric distribution to generate a sample.
This technique for data analysis has become an accepted tool of algebraic statistics, particularly for the study of fully ranked data. In this thesis, we explore extending this technique to the study of partially ranked data, focusing on data from surveys in which participants are asked to identify their top $k$ choices of $n$ items. Before we move on to our own data analysis, though, we present a thorough discussion of the Diaconis–Sturmfels algorithm and its use in data analysis. In this discussion, we attempt to collect all of the background on Markov bases, Markov processes, Gröbner bases, implicitization theory, and elimination theory that is necessary for a full understanding of this approach to data analysis.
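As a rough illustration of the kind of Markov process mentioned above, the following Python sketch runs a Metropolis walk on two-way contingency tables with fixed row and column sums. This is the standard textbook setting for the Diaconis–Sturmfels approach, not the partially ranked setting studied in this thesis. It assumes the well-known fact that, for two-way tables, the basic moves (adding 1 to two diagonal cells of a 2×2 minor and subtracting 1 from the other two) form a Markov basis. The function names and the example table are hypothetical and chosen only for illustration.

```python
import math
import random


def log_hypergeometric_weight(table):
    """Log of the unnormalized hypergeometric weight 1 / prod(x_ij!)."""
    return -sum(math.lgamma(x + 1) for row in table for x in row)


def metropolis_step(table, rng):
    """Propose one basic move and accept or reject it (Metropolis rule)."""
    rows, cols = len(table), len(table[0])
    i1, i2 = rng.sample(range(rows), 2)
    j1, j2 = rng.sample(range(cols), 2)
    # Basic move: +1 at (i1, j1) and (i2, j2), -1 at (i1, j2) and (i2, j1).
    # Row and column sums are unchanged. Reject if an entry would go negative.
    if table[i1][j2] == 0 or table[i2][j1] == 0:
        return table
    proposal = [row[:] for row in table]
    proposal[i1][j1] += 1
    proposal[i2][j2] += 1
    proposal[i1][j2] -= 1
    proposal[i2][j1] -= 1
    log_ratio = (log_hypergeometric_weight(proposal)
                 - log_hypergeometric_weight(table))
    # The proposal is symmetric, so the acceptance probability is
    # min(1, weight(proposal) / weight(table)).
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return table


def sample_tables(start, steps, seed=0):
    """Run the walk from `start` and return the table visited at each step."""
    rng = random.Random(seed)
    table = [row[:] for row in start]
    visited = []
    for _ in range(steps):
        table = metropolis_step(table, rng)
        visited.append(table)
    return visited


if __name__ == "__main__":
    observed = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]  # made-up example data
    for t in sample_tables(observed, steps=5, seed=1):
        print(t)
```

In practice one would discard an initial burn-in, evaluate a higher-order statistic (for example a chi-square statistic) on each visited table, and compare the resulting values with the value at the observed table to estimate a conditional significance level.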