Parallel generation of ROLAP data cubes.

Chen, Ying.

dc.contributor.author	Chen, Ying.	en_US
dc.date.accessioned	2014-10-21T12:35:22Z
dc.date.available	2005
dc.date.issued	2005	en_US
dc.identifier.other	AAINR08421	en_US
dc.identifier.uri	http://hdl.handle.net/10222/54753
dc.description	More and more organizations, such as businesses, health care providers, and scientific enterprises, rely on Online Analytical Processing (OLAP) to analyze massive data sets at a variety of summary levels and in a multidimensional way. In OLAP systems, one of the most computationally intensive tasks is executing the Cube query, which was proposed by Gray et al. in 1997 as an extension of the Structured Query Language (SQL). A cube query generates a set of group-bys (views) over all combinations of a set of attributes (dimensions) of a table. The result of the query is a collection of multidimensional data called a Data Cube. Pre-computing data cubes can dramatically reduce the response time of other queries. Many sequential algorithms have recently been proposed to generate data cubes efficiently; however, as data sets grow, there is a need for more scalable algorithms. Currently, for large data sets, cube queries may require hours or even days to run on standard sequential machines. Parallel computing can provide two key ingredients for dealing with large data sets: (1) increased computational power through multiple processors and (2) increased I/O bandwidth through multiple parallel disks.	en_US
dc.description	The work presented in this thesis combines (1) the design of efficient parallel cube generation algorithms for the three basic types of data cubes (full cubes, partial cubes, and iceberg cubes) with (2) careful system work on parallelism and external memory issues, and (3) extensive experiments and evaluation. The proposed algorithms are both external-memory and parallel. They are designed for shared-nothing clusters and use explicitly represented cost models that aid in performance tuning and portability. Our experiments show that the relative speedup of the algorithms is close to optimal (linear) for a wide range of input parameters, and that scalability is almost linear on large data sets. The proposed algorithms have been carefully implemented in our cgmOLAP prototype, which is, to our knowledge, the first fully functional parallel OLAP system able to build data cubes at a rate of more than half a terabyte per hour.	en_US
dc.description	Thesis (Ph.D.)--Dalhousie University (Canada), 2005.	en_US
dc.language	eng	en_US
dc.publisher	Dalhousie University	en_US
dc.subject	Computer Science.	en_US
dc.title	Parallel generation of ROLAP data cubes.	en_US
dc.type	text	en_US
dc.contributor.degree	Ph.D.	en_US
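
As an illustration of the cube query described in the abstract, the following is a minimal sequential sketch in Python that computes every group-by over all combinations of a set of dimensions. It is not the thesis's parallel, external-memory algorithm and does not reflect the cgmOLAP implementation; the function name `full_cube`, the sample rows, the column names, and the choice of SUM as the aggregate are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dimensions, measure):
    """Naive full cube: one SUM group-by (view) per subset of `dimensions`.

    Returns a dict mapping each tuple of dimension names to a dict that maps
    each tuple of dimension values to the summed measure.
    Hypothetical helper for illustration only.
    """
    cube = {}
    # Enumerate all 2^d subsets of the d dimensions, from () up to all d.
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            view = defaultdict(int)
            # One scan of the table per view (real algorithms share work
            # between views; this sketch does not).
            for row in rows:
                key = tuple(row[d] for d in dims)
                view[key] += row[measure]
            cube[dims] = dict(view)
    return cube


# Hypothetical fact table with three dimensions and one measure.
rows = [
    {"store": "A", "item": "pen", "year": 2004, "sales": 10},
    {"store": "A", "item": "ink", "year": 2004, "sales": 5},
    {"store": "B", "item": "pen", "year": 2005, "sales": 7},
]

cube = full_cube(rows, ("store", "item", "year"), "sales")
```

With three dimensions the sketch produces 2^3 = 8 views, from the grand total `cube[()]` to the finest view `cube[("store", "item", "year")]`. Because it rescans the input once per view, its cost grows exponentially with the number of dimensions, which is one reason more scalable algorithms are needed for large data sets.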

Files in this item

Name: NR08421.PDF
Size: 6.942Mb
Format: PDF

This item appears in the following Collection(s)

Faculty of Graduate Studies Online Theses
