Show simple item record

dc.contributor.author: Husain, Syed Mohammad Baqir
dc.date.accessioned: 2024-08-30T14:01:00Z
dc.date.available: 2024-08-30T14:01:00Z
dc.date.issued: 2024-08-30
dc.identifier.uri: http://hdl.handle.net/10222/84532
dc.description.abstract: This research introduces the Conceptual Document Clustering Explanation Model (CDCEM), a novel model for explaining unsupervised textual clustering. CDCEM explains both the discovered clusters and the document assignments. Furthermore, it ensures faithfulness, meaning it accurately reflects the decision-making process, by using the core elements of the black-box textual clustering pipeline, such as document embeddings and the centroids from k-means. This faithfulness and comprehensiveness boost user trust and understanding and help with debugging clustering. Using Wikipedia, CDCEM first performs wikification, which extracts real-world concepts from the text. It then evaluates these concepts' significance for cluster assignment to produce concept-based explanations. CDCEM determines the importance of each concept within a cluster by measuring the cosine similarity between the concept's embedding (representing its contextual meaning) and the cluster centroids (representing the cluster's theme), both of which it derives from a black-box model (using ELMo for embeddings and k-means for clustering). These per-cluster concept importance scores facilitate generating concept-based explanations at two levels: cluster-level explanations, which describe the concepts that best represent each cluster, and document-level explanations, which clarify why the black-box model assigns a document to a particular cluster. We quantitatively evaluate the faithfulness of CDCEM on the AG News, DBpedia, and Reuters-21578 datasets, comparing it with explainable classification methods (Decision Tree, Logistic Regression, and Naive Bayes) by treating clusters as classes and computing the agreement between the black-box model's predictions and the explanations. Additionally, we conducted a user study on the AG News dataset to compare CDCEM with the best baseline in terms of comprehensiveness, accuracy, usefulness, user satisfaction, and usability of the explanation visualization tool. CDCEM showed higher faithfulness than the baseline model in the quantitative evaluations, indicating accurate explanations of unsupervised clustering decisions. The qualitative evaluations revealed that users preferred CDCEM's cluster-level and document-level explanations for their accuracy, clarity, logic, and comprehensibility. [en_US]
dc.language.iso: en [en_US]
dc.subject: Explanation Model [en_US]
dc.subject: Document Clustering [en_US]
dc.subject: Faithfulness [en_US]
dc.title: Faithful Concept-based Explanations For Partition-based Document Clustering [en_US]
dc.date.defence: 2024-08-26
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.thesis-reader: Hassan Sajjad [en_US]
dc.contributor.thesis-reader: Gabriel Spadon De Souza [en_US]
dc.contributor.thesis-reader: Masud Rahman [en_US]
dc.contributor.thesis-supervisor: Enayat Rajabi [en_US]
dc.contributor.thesis-supervisor: Evangelos E. Milios [en_US]
dc.contributor.ethics-approval: Received [en_US]
dc.contributor.manuscripts: No [en_US]
dc.contributor.copyright-release: No [en_US]
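The abstract above describes the central scoring step of CDCEM as the cosine similarity between a wikified concept's embedding and the k-means cluster centroids obtained from the black-box model. The following Python sketch illustrates that idea under explicit assumptions: it is not the thesis code, the embeddings are random placeholders standing in for ELMo vectors, wikification is assumed to have already produced the concept list, and all variable names (doc_embeddings, concept_embeddings, importance) are illustrative inventions.

# Minimal sketch of concept-importance scoring as described in the abstract.
# Assumptions: random vectors stand in for ELMo embeddings; wikification has
# already extracted the concepts; names and parameters are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Placeholder document embeddings (in the thesis these come from ELMo).
doc_embeddings = rng.normal(size=(100, 50))

# Black-box clustering: k-means over the document embeddings.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(doc_embeddings)
centroids = kmeans.cluster_centers_              # cluster "themes"

# Placeholder embeddings for wikified real-world concepts.
concepts = ["economy", "football", "election", "software"]
concept_embeddings = rng.normal(size=(len(concepts), 50))

# Concept importance per cluster: cosine similarity between each concept
# embedding and each cluster centroid (rows = concepts, columns = clusters).
importance = cosine_similarity(concept_embeddings, centroids)

# Cluster-level explanation: the concepts that best represent each cluster.
for k in range(centroids.shape[0]):
    top = np.argsort(importance[:, k])[::-1][:2]
    print(f"cluster {k}: " + ", ".join(concepts[i] for i in top))

# Document-level explanation (sketch): for a document assigned to cluster c,
# report how important its own concepts are for that cluster.
doc_concepts = [0, 2]                            # indices of concepts found in the doc
c = kmeans.labels_[0]                            # the document's assigned cluster
for i in doc_concepts:
    print(f"doc 0 -> cluster {c}: {concepts[i]} importance {importance[i, c]:.3f}")

In this sketch, the cluster-level loop corresponds to the abstract's "concepts that best represent the clusters," while the final loop corresponds to the document-level explanation of why a document lands in a particular cluster.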