Document Type

Dissertation

Degree

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor's Name

Naphtali Rishe

First Advisor's Title

Committee chair

Second Advisor's Name

Vagelis Hristidis

Third Advisor's Name

Raju Rangaswami

Fourth Advisor's Name

Malek Adjouadi

Keywords

Spatial databases, searches, MapReduce, scalability, indexing.

Date of Defense

11-8-2011

Abstract

Modern geographical databases store a rich set of aspatial attributes in addition to geographic data. Retrieving spatial records constrained on spatial and aspatial attributes provides users the ability to perform more interesting spatial analyses via composite spatial searches; e.g., in a real estate database, "Find the nearest homes for sale to my current location that have backyard and whose prices are between $50,000 and $80,000". Efficient processing of such composite searches requires combined indexing strategies of multiple types of data. Existing spatial query engines commonly apply a two-filter approach (spatial filter followed by non-spatial filter, or viceversa), which can incur large performance overheads. On the other hand, the amount of geolocation data in databases is rapidly increasing due in part to advances in geolocation technologies (e.g., GPS- enabled mobile devices) that allow to associate location data to nearly every object or event. Hence, practical spatial databases may face data ingestion challenges of large data volumes. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre- processing task) can be scaled in MapReduce – a well-adopted parallel programming model, developed by Google, for data intensive problems. Close to linear scalability was observed in index construction tasks over large spatial datasets. Subsequently, we develop novel techniques for simultaneously indexing spatial with textual and numeric data to process k-nearest neighbor searches with aspatial Boolean selection constraints. In particular, numeric ranges are compactly encoded and explicitly indexed. Experimental evaluations with real spatial databases showed query response times within acceptable ranges for interactive search systems.

Share

COinS