ACCELERATING TEXT RELATEDNESS COMPUTATIONS ON GENERAL PURPOSE GRAPHICS PROCESSING UNITS

Angevine, Duffy

dc.contributor.author	Angevine, Duffy
dc.contributor.copyright-release	Not Applicable	en_US
dc.contributor.degree	Master of Computer Science	en_US
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.ethics-approval	Not Applicable	en_US
dc.contributor.external-examiner	n/a	en_US
dc.contributor.graduate-coordinator	Dr. N. Zeh	en_US
dc.contributor.manuscripts	Not Applicable	en_US
dc.contributor.thesis-reader	Dr. N. Zeh	en_US
dc.contributor.thesis-reader	Dr. A. Moh'd	en_US
dc.contributor.thesis-supervisor	Dr. A. Rau-Chaplin	en_US
dc.date.accessioned	2015-12-14T15:01:56Z
dc.date.available	2015-12-14T15:01:56Z
dc.date.defence	2015-12-02
dc.date.issued	2015
dc.description.abstract	This thesis investigates a novel approach for accelerating document similarity calculations using the Google Trigram Method (GTM). GTM can be performed as a 1:1 comparison between a pair of documents, as a 1:N comparison between one document and several others, or as an N:N comparison, in which every document within a set is compared against every other. Existing research in this domain has focused on accelerating the GTM on standard processors. In contrast, this thesis focuses on accelerating an N:N document relatedness calculation using a General Purpose Graphics Processing Unit (GPGPU). Fundamental to our approach is the pre-computation of several static elements. These static elements are the GTM inputs: the documents to be compared and the Google N-Grams. The Google N-Grams are processed to produce a word relatedness matrix, and the documents are tokenized. Both are then saved to disk so that they can be reloaded whenever document relatedness is calculated. Mapping the GTM to a GPGPU requires analysis to establish an effective system for transferring documents to the GPGPU, to select the data structures used in the GTM calculations, and to determine how to implement the GTM effectively on the GPGPU's unique architecture. Having designed a set of GPGPU methods, we systematically evaluate their performance. In this thesis, the GPGPU methods are compared against a multi-core Central Processing Unit (CPU) method that acts as a baseline. In total, two different CPU methods and four different GPGPU methods are evaluated. The CPU hardware platform is a workstation with a pair of 8-core Intel Xeon processors, retailing for approximately $10,000. The GPGPU platform is an Nvidia GeForce GTX 660, worth approximately $200 at the time of purchase.
We observe across a wide range of data sets that the GPGPU achieved between 40% and 80% of the performance observed on the multi-core workstation, at one fiftieth of the cost.	en_US
dc.identifier.uri	http://hdl.handle.net/10222/64679
dc.language.iso	en_US	en_US
dc.subject	GPGPU	en_US
dc.subject	Text Relatedness	en_US
dc.subject	GTM	en_US
dc.subject	GPU	en_US
dc.title	ACCELERATING TEXT RELATEDNESS COMPUTATIONS ON GENERAL PURPOSE GRAPHICS PROCESSING UNITS	en_US
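The 1:N and N:N comparison modes named in the abstract can be illustrated with a minimal Python sketch. It assumes that documents have already been tokenized into integer word IDs and that a precomputed word relatedness matrix `W` is held in memory. The names `W`, `directed_score`, `pair_score`, `one_to_n` and `n_to_n` are hypothetical, and the best-match averaging used to score a pair is a simplified stand-in, not the GTM document relatedness formula used in the thesis.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each document is a list of integer word IDs, and
# W[a][b] is a precomputed relatedness value in [0, 1] for word IDs a, b.
# In the thesis the matrix is derived from the Google N-Grams; here it is a toy.
Matrix = Sequence[Sequence[float]]
Doc = Sequence[int]


def directed_score(p: Doc, r: Doc, w: Matrix) -> float:
    """Mean over words in p of their best relatedness to any word in r.

    Simplified stand-in scoring; this is NOT the GTM document formula.
    """
    if not p or not r:
        return 0.0
    return sum(max(w[a][b] for b in r) for a in p) / len(p)


def pair_score(p: Doc, r: Doc, w: Matrix) -> float:
    """Symmetric score: average of both directions."""
    return 0.5 * (directed_score(p, r, w) + directed_score(r, p, w))


def one_to_n(query: Doc, docs: Sequence[Doc], w: Matrix) -> List[float]:
    """1:N mode: score one document against each document in docs."""
    return [pair_score(query, d, w) for d in docs]


def n_to_n(docs: Sequence[Doc], w: Matrix) -> Dict[Tuple[int, int], float]:
    """N:N mode: score each unordered pair (i, j) with i < j exactly once.

    The N * (N - 1) / 2 pair computations only read shared, unchanging inputs
    and do not depend on one another, so they can run in parallel.
    """
    return {
        (i, j): pair_score(docs[i], docs[j], w)
        for i, j in combinations(range(len(docs)), 2)
    }


# Toy example: a 3-word vocabulary and three tokenized documents.
W = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
docs = [[0, 1], [1, 2], [2]]
scores = n_to_n(docs, W)
print(len(scores))  # 3 pairs for N = 3
print(scores)
```

Because each (i, j) pair reads only two token lists and the shared, read-only matrix, the pairs can in principle be distributed across independent GPGPU threads, which is what makes the N:N mode a natural target for the data-parallel hardware evaluated in this thesis.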

Files

Original bundle

Name: Angevine-Duffy-MCS-CSCI-December-2015.pdf
Size: 2.2 MB
Format: Adobe Portable Document Format
Description:

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission
Description:

Collections

Faculty of Graduate Studies Online Theses