Royal Statistical Society
Research Section




Extended Ordinary Meeting on

STATISTICAL MODELLING AND ANALYSIS OF GENETIC DATA

Wednesday 22 May, 2002, 2pm at the RSS

Abstracts


KW Broman and TP Speed
A model selection approach for the identification of quantitative trait loci in experimental crosses

We consider the problem of identifying the genetic loci (called quantitative trait loci, QTLs) contributing to variation in a quantitative trait, using data from an experimental cross. A large number of different statistical approaches to this problem have been described; most make use of multiple tests of hypotheses, and many consider models allowing only a single QTL. We feel the problem is best viewed as one of model selection. In this paper, we discuss the use of model selection ideas to identify QTLs in experimental crosses. We focus on a backcross experiment, with strictly additive QTLs, and concentrate on identifying QTLs, treating the estimation of their effects and precise locations as of secondary importance. We present the results of a simulation study to compare the performance of a number of the more prominent methods.

Electronic version of the paper


P Fearnhead and P Donnelly
Approximate likelihood methods for estimating local recombination rates

There is currently great interest in understanding the way in which recombination rates vary, over short scales, across the human genome. Aside from inherent interest, an understanding of this local variation is essential for the sensible design and analysis of many studies aimed at elucidating the genetic basis of common diseases or of human population histories. Standard pedigree-based approaches do not have the fine-scale resolution needed to address this issue. In contrast, samples of DNA sequences from unrelated chromosomes in the population do carry relevant information, but inference from such data is extremely challenging. While there has been much recent interest in the development of full-likelihood inference methods for estimating local recombination rates from such data, they are not currently practicable for datasets of the size being generated by modern experimental techniques. In this paper we introduce and study two approximate likelihood methods. The first, a marginal likelihood, ignores some of the data. Careful choice of what to ignore results in substantial computational savings with virtually no loss of relevant information. For larger sequences, we introduce a "composite" likelihood, which approximates the model of interest by ignoring certain long-range dependencies. An informal asymptotic analysis and a simulation study suggest that inference based on the composite likelihood is practicable and performs well. We combine both methods to reanalyse data from the lipoprotein lipase gene, and the results seriously question conclusions from some earlier studies of this data.

Electronic version of the paper


B Larget, DL Simon and JB Kadane
Bayesian phylogenetic inference from animal mitochondrial genome arrangements

The determination of evolutionary relationships is a fundamental problem in evolutionary biology. Genome arrangement data is potentially more informative than DNA sequence data for inferring evolutionary relationships among distantly related taxa. We describe a Bayesian framework for phylogenetic inference from mitochondrial genome arrangement data using Markov chain Monte Carlo methods. We apply the method to assess evolutionary relationships among eight animal phyla.

Electronic version of the paper


G Nicholson, AV Smith, F Jonsson, O Gustafsson, K Stefansson and P Donnelly
Assessing population differentiation and isolation from single nucleotide polymorphism data

We introduce a new hierarchical model for SNP allele frequencies in a structured population, which is naturally fitted via MCMC. There is one parameter for each population, closely analogous to a population-specific version of Wright's F_ST, which can be interpreted as measuring how isolated the relevant population has been. Our model includes the effects of SNP ascertainment and is motivated by population genetics considerations, explicitly in the transient setting after divergence of populations, rather than as the equilibrium of a stochastic model, as is traditionally the case. For the sizes of data set we consider, the method provides good parameter estimates, and considerably outperforms estimation methods analogous to those currently used in practice. We apply the method to one new and one existing human data set, each with rather different characteristics: the first consists of three rather close European populations, the second of four populations taken from across the globe. A novelty of our framework is that the fit of the underlying model can be assessed easily, and these results are encouraging for both data sets analysed. Our analysis suggests that Iceland is more differentiated than the other two European populations (France and Utah), a finding consistent with the historical record, but not obvious from comparisons of simple summary statistics.
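
To give a rough feel for what a population-specific, F_ST-like quantity measures, the following minimal sketch computes a simple moment summary from a table of sample allele frequencies. It is not the hierarchical MCMC model of the abstract: it ignores SNP ascertainment and sampling error, and it uses the across-population mean as a crude stand-in for the ancestral allele frequency. The function name, the standardisation by pbar(1 - pbar), and the toy frequencies are all assumptions made for illustration.

```python
from statistics import mean


def population_divergence(freqs):
    """Crude F_ST-like summary, one value per population (illustrative only).

    freqs[i][j] is the sample allele frequency at SNP i in population j.
    For each SNP, the squared deviation of a population's frequency from
    the across-population mean is scaled by pbar * (1 - pbar), and these
    scaled deviations are averaged over SNPs.
    """
    n_pops = len(freqs[0])
    totals = [0.0] * n_pops
    used = 0
    for row in freqs:
        pbar = mean(row)            # assumed proxy for the ancestral frequency
        het = pbar * (1.0 - pbar)
        if het == 0.0:              # SNP fixed in every population: skip it
            continue
        used += 1
        for j, p in enumerate(row):
            totals[j] += (p - pbar) ** 2 / het
    if used == 0:
        raise ValueError("no polymorphic SNPs in the input")
    return [t / used for t in totals]


# Hypothetical toy data: three SNPs, three populations. The third population
# is constructed to sit further from the others at every SNP.
freqs = [
    [0.50, 0.50, 0.80],
    [0.30, 0.30, 0.60],
    [0.70, 0.70, 0.40],
]
c = population_divergence(freqs)
```

In this toy input the third population receives the largest value, which is the sense in which such a parameter can be read as a measure of isolation. A model-based fit of the kind described in the abstract would additionally account for sample sizes and for how the SNPs were ascertained.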

Electronic version of the paper


G Parmigiani, ES Garrett, R Anbazhagan and E Gabrielson
A statistical framework for expression-based molecular classification in cancer

Genome-wide measurement of gene expression is a promising approach to the identification of subclasses of cancer that are currently not differentiable, but potentially biologically heterogeneous. This type of molecular classification gives hope for highly individualized and more effective prognosis and treatment of cancer. Statistically, molecular classification is a complex hypothesis-generating activity, involving data exploration, modeling, and expert elicitation. In this paper we propose a modeling framework that can be used to inform and organize the development of exploratory tools for classification. Our framework uses latent categories to provide both a statistical definition of differential expression and a precise, experiment-independent definition of a molecular profile. It also generates natural similarity measures for traditional clustering, and gives probabilistic statements about the assignment of tumor samples to molecular profiles.

Electronic version of the paper
Associated website

