Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
S. Sitharama Iyengar
First Advisor's Committee Title
Co-Committee Chair
Second Advisor's Name
Niki Pissinou
Second Advisor's Committee Title
Co-Committee Chair
Third Advisor's Name
Deng Pan
Fourth Advisor's Name
Shaolei Ren
Fifth Advisor's Name
Shu-Ching Chen
Keywords
Sensor Networks, Mobile Sensor Networks, Data-cleaning, Machine Learning, Data Mining, Routing, Power-aware routing, Netcoding, Data Aggregation, Quality of Data, Quality of Service, Feature Extraction, Randomforest, Bagging, Classifiers, Renewable Energy
Date of Defense
10-16-2013
Abstract
Ensemble Stream Modeling and Data-cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool.
First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons. One, the histogram of fire activity is highly skewed. Two, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from sensor streams.
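As a rough illustration of the two ideas in this paragraph, the sketch below measures how skewed a target histogram is and derives simple temporal features from a raw sensor stream instead of averaged values. It is not the dissertation's implementation: the function names `skewness` and `lagged_differences`, the use of lagged differences as the temporal feature, and the sample values are all assumptions made for illustration.

```python
def skewness(values):
    """Population skewness (third standardized moment) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def lagged_differences(readings, lag=1):
    """Change between each reading and the one `lag` steps earlier.

    Hypothetical temporal feature: it keeps per-sample dynamics rather
    than smoothing them away with an average.
    """
    return [readings[i] - readings[i - lag] for i in range(lag, len(readings))]


# Made-up values: most events burn nothing, one burns a lot.
burnt_area = [0, 0, 0, 10]
print(skewness(burnt_area) > 0)  # a long right tail gives positive skew

# Made-up temperature stream.
temperature = [20, 21, 25, 24]
print(lagged_differences(temperature))
```

A strongly positive skewness value flags the kind of lopsided target histogram described above, and the lagged differences show one possible way to expose temporal behavior to a classifier.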
Second is the process of feature induction, which cross-validates attributes against single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed during each node's split to select the best attribute. The ensemble stream model approach proved to improve when complicated features were used with a simpler tree classifier.
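For readers unfamiliar with the score named in this paragraph, the F-measure is conventionally defined as the weighted harmonic mean of precision and recall. The sketch below computes it, along with a false alarm rate, from confusion-matrix counts. The function names, the `beta` parameter, and the example counts are assumptions for illustration and are not taken from the dissertation.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def false_alarm_rate(fp, tn):
    """Fraction of non-fire events that were wrongly flagged as fire."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Made-up counts for a fire / no-fire classifier.
print(f_measure(tp=8, fp=2, fn=2))
print(false_alarm_rate(fp=2, tn=8))
```

With `beta=1.0` precision and recall are weighted equally; a larger `beta` would favor recall, which may matter when a missed fire event is costlier than a false alarm.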
The ensemble framework for data-cleaning, together with the enhancements to quantify quality of fitness of sensors (30% spatial, 10% temporal, and 90% mobility reduction), led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
Identifier
FI13120409
Recommended Citation
Iyer, Vasanth, "Ensemble Stream Model for Data-Cleaning in Sensor Networks" (2013). FIU Electronic Theses and Dissertations. 973.
https://digitalcommons.fiu.edu/etd/973
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).