Unsupervised Clustering of Time Series from Microbial Marker-Gene Data

Hall, Michael

dc.contributor.author	Hall, Michael
dc.date.accessioned	2016-07-29T13:12:12Z
dc.date.available	2016-07-29T13:12:12Z
dc.date.issued	2016-07-29T13:12:12Z
dc.identifier.uri	http://hdl.handle.net/10222/72005
dc.description.abstract	Microorganisms interact with each other and the world around us, impacting every environment that they inhabit. DNA sequencing technology allows us to monitor entire communities of microorganisms. Using taxonomic marker genes, the abundance of thousands of microbial species can be tracked across time. Marker-gene data sets are often very large, requiring data reduction techniques for effective analysis. The typical approach involves clustering the DNA sequences by sequence identity, grouping similar sequences into operational taxonomic units. The emergence of marker-gene data sets with a temporal component offers opportunities to cluster genes based on temporal correlation rather than sequence identity; such an approach may be more effective in revealing ecologically meaningful associations. In this work, we describe an algorithm and software package for clustering marker-gene data based on time-series profiles. We present an efficient, interactive, and cross-platform solution that takes the user from raw sequence data to informative visualizations of the inferred clusters. We validate our method on simulated data and apply it to several longitudinal marker-gene data sets, including faecal communities from the human gut and communities from a freshwater lake sampled over eleven years. Within the gut, the segregation of the time series around a food poisoning event was immediately clear. In the freshwater lake, an annual summer bloom and seasonal dynamics were isolated and highlighted by our method. We show that high sequence similarity between marker genes does not guarantee similar temporal dynamics. As a result, clustering based on sequence identity alone would hide many important patterns in these data sets. Our algorithm and visualization platform bring these patterns back to the surface.
Finally, we demonstrate that multiple time series can be clustered simultaneously, providing a unique way to visualize marker-gene data sets with both longitudinal and cross-sectional components.	en_US
dc.language.iso	en	en_US
dc.subject	time series	en_US
dc.subject	clustering	en_US
dc.subject	microbiome	en_US
dc.subject	microbial ecology	en_US
dc.subject	bioinformatics	en_US
dc.title	Unsupervised Clustering of Time Series from Microbial Marker-Gene Data	en_US
dc.date.defence	2016-07-22
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.degree	Master of Science	en_US
dc.contributor.external-examiner	n/a	en_US
dc.contributor.graduate-coordinator	Robert Beiko	en_US
dc.contributor.thesis-reader	Hong Gu	en_US
dc.contributor.thesis-reader	Andrew Roger	en_US
dc.contributor.thesis-supervisor	Robert Beiko	en_US
dc.contributor.ethics-approval	Not Applicable	en_US
dc.contributor.manuscripts	Not Applicable	en_US
dc.contributor.copyright-release	Not Applicable	en_US

Files in this item

Name: Hall-Michael-MSc-CBBI-July-2016.pdf
Size: 4.798Mb
Format: PDF
Description: Main thesis document

This item appears in the following Collection(s)

Faculty of Graduate Studies Online Theses