DATA STRUCTURES FOR GEOMETRIC RETRIEVAL IN TREE-LIKE TOPOLOGIES
Abstract
This thesis studies the generalizations of orthogonal range searching to the case in which one of the dimensions is replaced with a tree topology. We design data structures for efficient support of path queries, which generalize the corresponding range queries. We work in the word-RAM model with word size w = Ω(lgn). Let T be an ordinal tree on n nodes, each of which is assigned a d-dimensional weight vector from [n] d = {1,2,...,n} d , where d is a constant integer.
We solve: (i) ancestor dominance reporting, in O(lg d−1 n+k) time and
O(nlg d−2 n) space (where k is the size of the output, and d ≥ 2). Also achieved is a trade-off of O(lg d−1 n/(lglgn) d−2 +k) time and
O(nlg d−2+ε n) space (d ≥ 3), for an arbitrary constant ε > 0; (ii) path successor, in O(lg d−1+ε n) time and O(nlg d−1 n) space (d ≥ 1); (iii) path counting, in O((lgn/lglgn) d ) time and O(n(lgn/lglgn) d−1 ) space (d ≥ 1); (iv) path reporting, in O(lg d−1 n/(lglgn) d−2+k) time and
O(nlg d−1+ε n) space (d ≥ 2). All these bounds match or nearly match the best bounds in corresponding range queries.
Next, we study trees in which each node is also assigned a category. For unweighted trees, we solve categorical path counting, in O(t lglgw σ) time and O(n + (n/t) 2 ) space, for fixed 1 ≤ t ≤ n, implying linear space for
O( √nlglgw σ) time. By a reduction from Boolean matrix multiplication, we show that the time is optimal within polylogarithmic factors, with current knowledge and only combinatorial methods. For weighted trees, we solve categorical path range counting in O(t lglgn) time and O(nlglgn+(n/t) 4 ) space, or O(t lg ε n) time and O(n+(n/t) 4 ) space, which implies linear space for O(n 3/4 lg ε n) time. The solution is extended to the trees weighted with vectors from [n] d , d ≥ 2, with time- and space-bounds which are within polylogarithmic factors of the best bounds for point-sets. We also solve approximate variations of these categorical queries.
Finally, we experimentally study the data structures for path median, path counting and path reporting queries. We propose practical realizations of the best theoretical results, and provide both succinct and pointer-based implementations. Our experiments show the viability of the former in practical scenarios.