Show simple item record

dc.contributor.author: Husain, Syed Mohammad Baqir
dc.date.accessioned: 2024-08-30T14:01:00Z
dc.date.available: 2024-08-30T14:01:00Z
dc.date.issued: 2024-08-30
dc.identifier.uri: http://hdl.handle.net/10222/84532
dc.description.abstract: This research introduces the Conceptual Document Clustering Explanation Model (CDCEM), a novel model for explaining unsupervised textual clustering. CDCEM explains both the discovered clusters and the document assignments. Furthermore, it ensures faithfulness, meaning it accurately reflects the decision-making process, by using the core elements of the black-box textual clustering pipeline, such as document embeddings and the centroids from k-means. This faithfulness and comprehensiveness boost user trust and understanding and help with debugging clustering. Using Wikipedia, CDCEM first performs wikification, which extracts real-world concepts from the text. It then evaluates these concepts' significance for cluster assignment to produce concept-based explanations. CDCEM determines the importance of each concept within a cluster by measuring the cosine similarity between the concept's embedding (representing its contextual meaning) and the cluster centroids (representing the cluster's theme), both of which it derives from a black-box model (using ELMo for embeddings and k-means for clustering). These per-cluster concept importance scores facilitate generating concept-based explanations at two levels: cluster-level explanations, which describe the concepts that best represent each cluster, and document-level explanations, which clarify why the black-box model assigns a document to a particular cluster. We quantitatively evaluate the faithfulness of CDCEM on the AG News, DBpedia, and Reuters-21578 datasets, comparing it with explainable classification methods (Decision Tree, Logistic Regression, and Naive Bayes) by treating clusters as classes and computing the agreement between the black-box model's predictions and the explanations. Additionally, we conducted a user study on the AG News dataset to compare CDCEM with the best baseline in terms of comprehensiveness, accuracy, usefulness, user satisfaction, and usability of the explanation visualization tool. CDCEM showed higher faithfulness than the baseline model in the quantitative evaluations, indicating accurate explanations of unsupervised clustering decisions. The qualitative evaluations revealed that users preferred CDCEM's cluster-level and document-level explanations for their accuracy, clarity, logic, and comprehensibility. [en_US]
dc.language.iso: en [en_US]
dc.subject: Explanation Model [en_US]
dc.subject: Document Clustering [en_US]
dc.subject: Faithfulness [en_US]
dc.title: Faithful Concept-based Explanations For Partition-based Document Clustering [en_US]
dc.date.defence: 2024-08-26
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.thesis-reader: Hassan Sajjad [en_US]
dc.contributor.thesis-reader: Gabriel Spadon De Souza [en_US]
dc.contributor.thesis-reader: Masud Rahman [en_US]
dc.contributor.thesis-supervisor: Enayat Rajabi [en_US]
dc.contributor.thesis-supervisor: Evangelos E. Milios [en_US]
dc.contributor.ethics-approval: Received [en_US]
dc.contributor.manuscripts: No [en_US]
dc.contributor.copyright-release: No [en_US]
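The abstract above describes the central scoring step of CDCEM as the cosine similarity between a wikified concept's embedding and the k-means cluster centroids obtained from the black-box model. The following Python sketch illustrates that idea under explicit assumptions: it is not the thesis code, the embeddings are random placeholders standing in for ELMo vectors, wikification is assumed to have already produced the concept list, and all variable names (doc_embeddings, concept_embeddings, importance) are illustrative inventions.

# Minimal sketch of concept-importance scoring as described in the abstract.
# Assumptions: random vectors stand in for ELMo embeddings; wikification has
# already extracted the concepts; names and parameters are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Placeholder document embeddings (in the thesis these come from ELMo).
doc_embeddings = rng.normal(size=(100, 50))

# Black-box clustering: k-means over the document embeddings.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(doc_embeddings)
centroids = kmeans.cluster_centers_              # cluster "themes"

# Placeholder embeddings for wikified real-world concepts.
concepts = ["economy", "football", "election", "software"]
concept_embeddings = rng.normal(size=(len(concepts), 50))

# Concept importance per cluster: cosine similarity between each concept
# embedding and each cluster centroid (rows = concepts, columns = clusters).
importance = cosine_similarity(concept_embeddings, centroids)

# Cluster-level explanation: the concepts that best represent each cluster.
for k in range(centroids.shape[0]):
    top = np.argsort(importance[:, k])[::-1][:2]
    print(f"cluster {k}: " + ", ".join(concepts[i] for i in top))

# Document-level explanation (sketch): for a document assigned to cluster c,
# report how important its own concepts are for that cluster.
doc_concepts = [0, 2]                            # indices of concepts found in the doc
c = kmeans.labels_[0]                            # the document's assigned cluster
for i in doc_concepts:
    print(f"doc 0 -> cluster {c}: {concepts[i]} importance {importance[i, c]:.3f}")

In this sketch, the cluster-level loop corresponds to the abstract's "concepts that best represent the clusters," while the final loop corresponds to the document-level explanation of why a document lands in a particular cluster.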