Doctor of Philosophy (PhD)
geographic data, machine learning, spatial non-stationarity
Geographic data are information associated with a location on the surface of the Earth. They comprise spatial attributes (latitude, longitude, and altitude) and non-spatial attributes (facts related to a location). Traditionally, Physical Geography datasets were considered more valuable and therefore attracted most of the research interest. With advances in remote sensing technologies and the widespread use of GPS-enabled cellphones and IoT (Internet of Things) devices, however, recent years have witnessed explosive growth in the amount of available Human Geography data. Methods and tools capable of analyzing and modeling these datasets remain very limited, because Human Geography data are inherently difficult to model owing to their characteristics (non-stationarity, uneven distribution, etc.).
Many algorithms have been developed in the past few years to address these challenges, especially non-stationarity, including Geographically Weighted Regression (GWR), Multiscale GWR, and Geographical Random Forest. They have been shown to be much more efficient than general machine learning algorithms that are not specifically designed to deal with non-stationarity. However, such algorithms are far from perfect and leave considerable room for improvement.
This dissertation proposes multiple algorithms for modeling non-stationary geographic data. The main contributions are: (1) a novel method to evaluate non-stationarity and its impact on regression models; (2) the Geographic R-Partition tree for modeling non-stationary data; (3) the IDW-RF algorithm, which uses the advantages of Random Forests to deal with extremely unevenly distributed geographic datasets; and (4) the LVRF algorithm, which models geographic data using a latent-variable-based method. Experiments show that these algorithms are very efficient and outperform other state-of-the-art algorithms in certain scenarios.
Previously Published In
Liangdong Deng, Malek Adjouadi, and Naphtali Rishe. Inverse Distance Weighted Random Forests: Modeling Unevenly Distributed Non-Stationary Geographic Data. ICACSIS 2020 (in press).
Liangdong Deng, Malek Adjouadi, and Naphtali Rishe. Geographic Boosting Tree: Modeling Non-Stationary Spatial Data. ICMLA 2020 (in press).
Deng, Liangdong, "Geographic Data Mining and Knowledge Discovery" (2020). FIU Electronic Theses and Dissertations. 4565.
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).