Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Naphtali Rishe
First Advisor's Committee Title
Committee Chair
Second Advisor's Name
Dong Chen
Second Advisor's Committee Title
Committee Member
Third Advisor's Name
Masoud Sadjadi
Third Advisor's Committee Title
Committee Member
Fourth Advisor's Name
Armando Barreto
Fourth Advisor's Committee Title
Committee Member
Fifth Advisor's Name
Malek Adjouadi
Fifth Advisor's Committee Title
Committee Member
Keywords
geographic data, machine learning, spatial non-stationarity
Date of Defense
11-13-2020
Abstract
Geographic data are information associated with a location on the surface of the Earth. They comprise spatial attributes (latitude, longitude, and altitude) and non-spatial attributes (facts related to a location). Traditionally, Physical Geography datasets were considered more valuable and thus attracted most research interest. However, with advances in remote sensing technologies and the widespread use of GPS-enabled cellphones and IoT (Internet of Things) devices, recent years have witnessed explosive growth in the amount of available Human Geography data. Yet methods and tools capable of analyzing and modeling these datasets remain very limited, because Human Geography data are inherently difficult to model due to their characteristics (non-stationarity, uneven distribution, etc.).
In the past few years, many algorithms have been developed to address these challenges, especially non-stationarity, such as Geographically Weighted Regression (GWR), Multiscale GWR, and Geographical Random Forest. They have been shown to perform much better than general machine learning algorithms that are not specifically designed to handle non-stationarity. However, these algorithms are far from perfect and leave considerable room for improvement.
This dissertation proposes several algorithms for modeling non-stationary geographic data. The main contributions are: (1) a novel method for evaluating non-stationarity and its impact on regression models; (2) the Geographic R-Partition tree for modeling non-stationary data; (3) the IDW-RF (Inverse Distance Weighted Random Forests) algorithm, which leverages the strengths of Random Forests to handle extremely unevenly distributed geographic datasets; and (4) the LVRF algorithm, which models geographic data using a latent-variable-based method. Experiments show that these algorithms are very efficient and outperform other state-of-the-art algorithms in certain scenarios.
Identifier
FIDC009222
Previously Published In
Liangdong Deng, Malek Adjouadi, and Naphtali Rishe. Inverse Distance Weighted Random Forests: Modeling Unevenly Distributed Non-Stationary Geographic Data. ICACSIS 2020 (in press).
Liangdong Deng, Malek Adjouadi, and Naphtali Rishe. Geographic Boosting Tree: Modeling Non-Stationary Spatial Data. ICMLA 2020 (in press).
Recommended Citation
Deng, Liangdong, "Geographic Data Mining and Knowledge Discovery" (2020). FIU Electronic Theses and Dissertations. 4565.
https://digitalcommons.fiu.edu/etd/4565
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).