dc.contributor.author | Chen, Ying. | en_US |
dc.date.accessioned | 2014-10-21T12:35:22Z | |
dc.date.available | 2005 | |
dc.date.issued | 2005 | en_US |
dc.identifier.other | AAINR08421 | en_US |
dc.identifier.uri | http://hdl.handle.net/10222/54753 | |
dc.description | More and more organizations, such as businesses, health care providers and scientific enterprises, rely on Online Analytical Processing (OLAP) to analyze massive data sets at a variety of summary levels and in a multidimensional way. In OLAP systems, one of the most computationally intensive tasks is executing the Cube query, which was proposed by Gray et al. in 1997 as an extension of the Structured Query Language (SQL). A cube query generates a set of group-bys (views) over all combinations of a set of attributes (dimensions) of a table. The result of the query is a collection of multidimensional data called a Data Cube. Pre-computing data cubes can dramatically reduce the response time of other queries. Many sequential algorithms have recently been proposed to generate data cubes efficiently; however, as data sets grow, there is a need for more scalable algorithms. Currently, for large data sets, cube queries may require hours or even days to run on standard sequential machines. Parallel computing can provide two key ingredients for dealing with large data sizes: (1) increased computational power through multiple processors and (2) increased I/O bandwidth through multiple parallel disks. | en_US |
dc.description | The work presented in this thesis combines (1) the design of efficient parallel cube generation algorithms for the three basic types of data cubes (full cubes, partial cubes and iceberg cubes) with (2) careful system work associated with parallelism and external memory issues, and (3) extensive experiments and evaluation. The proposed algorithms are both external-memory and parallel. They are designed for shared-nothing clusters and use explicitly represented cost models that aid in performance tuning and portability. Our experiments show that the relative speedup of the algorithms is close to optimal (linear) speedup for a wide range of input parameters, and that scalability is almost linear on large data sets. The proposed algorithms have been carefully implemented in our cgmOLAP prototype, which is, to our knowledge, the first fully functional parallel OLAP system able to build data cubes at a rate of more than half a terabyte per hour. | en_US |
dc.description | Thesis (Ph.D.)--Dalhousie University (Canada), 2005. | en_US |
dc.language | eng | en_US |
dc.publisher | Dalhousie University | en_US |
dc.subject | Computer Science. | en_US |
dc.title | Parallel generation of ROLAP data cubes. | en_US |
dc.type | text | en_US |
dc.contributor.degree | Ph.D. | en_US |
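
The abstract describes a cube query as a set of group-bys over all combinations of a table's dimensions. The sketch below illustrates that definition on a tiny in-memory table. It is a naive, sequential illustration only, not the thesis's parallel external-memory algorithms and not part of cgmOLAP; the function name `cube`, the sample rows, the column names and the choice of SUM as the aggregate are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dimensions, measure):
    """Naively compute every group-by (view) over all subsets of `dimensions`.

    Returns a dict mapping each tuple of grouping dimensions to a dict of
    {group key: summed measure}. With d dimensions there are 2**d views.
    """
    views = {}
    for k in range(len(dimensions), -1, -1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)  # group key for this view
                totals[key] += row[measure]        # SUM aggregate (assumed)
            views[dims] = dict(totals)
    return views


# Hypothetical sample data, not taken from the thesis.
rows = [
    {"store": "A", "item": "pen", "year": 2004, "sales": 3},
    {"store": "A", "item": "ink", "year": 2004, "sales": 5},
    {"store": "B", "item": "pen", "year": 2005, "sales": 2},
]

views = cube(rows, ("store", "item", "year"), "sales")
print(len(views))         # 8 views for 3 dimensions
print(views[("store",)])  # {('A',): 8, ('B',): 2}
print(views[()])          # {(): 10}, the grand total
```

Because every view rescans the input, the work grows with the number of views (2^d for d dimensions) times the table size, which gives a sense of why cube generation is described above as computationally intensive.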