dc.contributor.author | Cai, Yun | |
dc.date.accessioned | 2022-08-25T17:33:31Z | |
dc.date.available | 2022-08-25T17:33:31Z | |
dc.date.issued | 2022-08-25 | |
dc.identifier.uri | http://hdl.handle.net/10222/81891 | |
dc.description.abstract | Learning the structure of microbial communities is critical to understanding the
different community structures and functions of microbes in distinct individuals.
We view microbial communities as consisting of many subcommunities which are
formed by certain groups of microbes functionally dependent on each other. This
work studies the structure of microbial community data using Non-negative
Matrix Factorization (NMF).
The supervised NMF method for detecting the differences between microbial
communities was developed in my MSc. thesis. However, the interpretation of the
resulting factorizations was not considered, and the study of the performance of
the method was very limited. In Chapter 2 of this thesis, we review the supervised
NMF from my MSc. thesis, then perform extensive simulation studies and real
data analyses to better understand the interpretation and the performance of the
method under a wide range of scenarios.
One difficulty in using NMF is that there is no accurate method for selecting
the rank. The rank corresponds to the number of subcommunities, and is thus
fundamentally important in interpreting the microbiome data. To develop a
suitable method for inferring the rank for NMF, we further developed a
deconvolution method to remove the convergence error in NMF results.
Chapter 3 develops a new method for the deconvolution problem. Deconvolution
is the problem of estimating the distribution of a quantity from a sample
with additive measurement error. Deconvolution has a wide range of applications,
so this work is of very general interest. Our new deconvolution method
is based on maximizing log likelihood with a smoothness penalty (PMLE-decon).
We develop both the method and the associated asymptotic theory for PMLE
deconvolution, and provide an R package for general deconvolution distribution
estimation. Through simulations and real data examples, we show that our new
method performs much better than existing methods, particularly for small
sample sizes or low signal-to-noise ratios. Our method can be applied either
with a known or parametrically estimated error distribution, or with an
empirical error distribution estimated from a pure error sample.
Finally, we develop a novel rank selection method based on hypothesis testing,
using a deconvolved bootstrap distribution to assess the significance level
accurately despite the large amount of optimization error. Through simulations,
we demonstrate that our method is not only accurate at estimating the true rank
for NMF but also computationally efficient compared with other methods,
especially when the features are hard to distinguish. With the newly developed,
more accurate rank selection method for NMF, we re-analyze the microbiome data we
worked on earlier and improve our understanding of microbial subcommunities. | en_US |
dc.language.iso | en | en_US |
dc.subject | NMF | en_US |
dc.subject | deconvolution | en_US |
dc.subject | NMF rank selection | en_US |
dc.title | MEASUREMENT ERROR DECONVOLUTION METHODS AND RANK SELECTION FOR NON-NEGATIVE MATRIX FACTORIZATION WITH APPLICATIONS IN MICROBIOME DATA | en_US |
dc.date.defence | 2022-08-19 | |
dc.contributor.department | Department of Mathematics & Statistics - Statistics Division | en_US |
dc.contributor.degree | Doctor of Philosophy | en_US |
dc.contributor.external-examiner | Grace Yi | en_US |
dc.contributor.graduate-coordinator | Joanna Mills Flemming | en_US |
dc.contributor.thesis-reader | Edward Susko | en_US |
dc.contributor.thesis-reader | Andrew Irwin | en_US |
dc.contributor.thesis-supervisor | Hong Gu | en_US |
dc.contributor.thesis-supervisor | Tobias Kenney | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.copyright-release | Not Applicable | en_US |