Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data
Date
2007
Authors
Bao, Le
Gu, Hong
Dunn, Katherine A.
Bielawski, Joseph P.
Abstract
Background: Models of codon evolution have proven useful for investigating the strength and
direction of natural selection. In some cases, a priori biological knowledge has been used
successfully to model heterogeneous evolutionary dynamics among codon sites. These are called
fixed-effect models, and they require that all codon sites are assigned to one of several partitions
which are permitted to have independent parameters for selection pressure, evolutionary rate,
transition-to-transversion ratio or codon frequencies. For single-gene analysis, partitions might be
defined according to protein tertiary structure, and for multiple-gene analysis partitions might be
defined according to a gene's functional category. Given a set of related fixed-effect
models, the task of selecting the model that best fits the data is not trivial.
Results: In this
study, we implement a set of fixed-effect codon models which allow for different levels of
heterogeneity among partitions in the substitution process. We describe strategies for selecting
among these models by a backward elimination procedure, the Akaike information criterion (AIC) or a
corrected Akaike information criterion (AICc). We evaluate the performance of these model selection
methods via a simulation study, and make several recommendations for real data analysis. Our
simulation study indicates that the backward elimination procedure can provide a reliable method for
model selection in this setting. We also demonstrate the utility of these models by application to a
single-gene dataset partitioned according to tertiary structure (abalone sperm lysin), and a
multi-gene dataset partitioned according to the functional category of the gene (flagellar-related
proteins of Listeria).
Conclusion: Fixed-effect models have advantages and disadvantages.
Fixed-effect models are desirable when data partitions are known to exhibit significant
heterogeneity or when a statistical test of such heterogeneity is desired. They have the
disadvantage of requiring a priori knowledge for partitioning sites. We recommend: (i) selecting
models by backward elimination rather than AIC or AICc, (ii) using a stringent cut-off, e.g.,
p = 0.0001, and (iii) conducting a sensitivity analysis of the results. With thoughtful application,
fixed-effect codon models should provide a useful tool for large-scale multi-gene analyses.
Citation
Bao, Le, Hong Gu, Katherine A. Dunn, and Joseph P. Bielawski. 2007. "Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data." BMC Evolutionary Biology 7: 5-S5.