Document Type

Dissertation

Degree

Doctor of Philosophy (PhD)

Major/Program

Computer Science

First Advisor's Name

Naphtali Rishe

First Advisor's Committee Title

Committee Chair

Second Advisor's Name

Dong Chen

Second Advisor's Committee Title

Committee Member

Third Advisor's Name

Masoud Sadjadi

Third Advisor's Committee Title

Committee Member

Fourth Advisor's Name

Armando Barreto

Fourth Advisor's Committee Title

Committee Member

Fifth Advisor's Name

Malek Adjouadi

Fifth Advisor's Committee Title

Committee Member

Keywords

geographic data, machine learning, spatial non-stationarity

Date of Defense

11-13-2020

Abstract

Geographic data are information associated with a location on the surface of the Earth. They comprise spatial attributes (latitude, longitude, and altitude) and non-spatial attributes (facts related to a location). Traditionally, Physical Geography datasets were considered more valuable and thus attracted most research interest. With advances in remote sensing technologies and the widespread use of GPS-enabled cellphones and IoT (Internet of Things) devices, however, recent years have witnessed explosive growth in the amount of available Human Geography data. Methods and tools capable of analyzing and modeling these datasets remain very limited, because Human Geography data are inherently difficult to model due to their characteristics (non-stationarity, uneven distribution, etc.).

In the past few years, many algorithms have been invented to address these challenges, especially non-stationarity, such as Geographically Weighted Regression (GWR), Multiscale GWR, and Geographical Random Forest. They have been shown to be much more efficient than general machine learning algorithms that are not specifically designed to deal with non-stationarity. However, such algorithms are far from perfect and leave considerable room for improvement.
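To make the notion of non-stationarity concrete, the following is a minimal sketch in the spirit of Geographically Weighted Regression, not code from the dissertation. It assumes a Gaussian distance-decay kernel, a fixed bandwidth, a single predictor, and a synthetic, noise-free dataset whose true slope grows from west to east; the function names and all data values are hypothetical.

```python
import math

def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def gwr_fit_at(site, coords, xs, ys, bandwidth):
    """Fit a local regression at `site`, down-weighting distant observations.

    The Gaussian kernel and the fixed bandwidth are assumptions of this sketch.
    """
    ws = [math.exp(-0.5 * (math.dist(site, c) / bandwidth) ** 2) for c in coords]
    return weighted_linear_fit(xs, ys, ws)

# Hypothetical 10 x 10 grid of observation sites (easting i, northing j).
grid = [(i, j) for i in range(10) for j in range(10)]
coords = [(float(i), float(j)) for i, j in grid]

# Arbitrary predictor values, and a response whose slope depends on location:
# the true slope is 1.0 at the western edge and 5.5 at the eastern edge.
xs = [float((7 * j) % 10) for _, j in grid]
ys = [(1.0 + 0.5 * i) * x for (i, _), x in zip(grid, xs)]

# A single global model averages the relationship over the whole region.
_, global_slope = weighted_linear_fit(xs, ys, [1.0] * len(xs))

# Local models recover different relationships at different sites.
_, west_slope = gwr_fit_at((0.0, 4.5), coords, xs, ys, bandwidth=1.0)
_, east_slope = gwr_fit_at((9.0, 4.5), coords, xs, ys, bandwidth=1.0)

print("global slope:", round(global_slope, 2))
print("west slope:  ", round(west_slope, 2))
print("east slope:  ", round(east_slope, 2))
```

In this toy setting the global fit reports one intermediate slope, while the local fits near the western and eastern edges differ markedly from it and from each other. That location dependence of the relationship is what non-stationarity refers to.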

This dissertation proposes multiple algorithms for modeling non-stationary geographic data. The main contributions are: (1) a novel method to evaluate non-stationarity and its impact on regression models; (2) the Geographic R-Partition tree for modeling non-stationary data; (3) the IDW-RF algorithm, which uses the advantages of Random Forests to deal with extremely unevenly distributed geographic datasets; and (4) the LVRF algorithm, which models geographic data using a latent-variable-based method. Experiments show that these algorithms are very efficient and outperform other state-of-the-art algorithms in certain scenarios.

Identifier

FIDC009222

Previously Published In

Liangdong Deng, Malek Adjouadi, and Naphtali Rishe. Inverse Distance Weighted Random Forests: Modeling Unevenly Distributed Non-Stationary Geographic Data. ICACSIS 2020 (in press).

Liangdong Deng, Malek Adjouadi, and Naphtali Rishe. Geographic Boosting Tree: Modeling Non-Stationary Spatial Data. ICMLA 2020 (in press).

Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).