Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Nagarajan Prabakar
First Advisor's Committee Title
Committee Chair
Second Advisor's Name
Leonardo Bobadilla
Second Advisor's Committee Title
Committee Member
Third Advisor's Name
Hadi Amini
Third Advisor's Committee Title
Committee Member
Fourth Advisor's Name
Ananda Mondal
Fourth Advisor's Committee Title
Committee Member
Fifth Advisor's Name
Himanshu Upadhyay
Fifth Advisor's Committee Title
Committee Member
Sixth Advisor's Name
Ajeet Kaushik
Sixth Advisor's Committee Title
Committee Member
Keywords
Anomaly Detection, Sequential Data, Deep Learning, Long Short Term Memory, Optimization
Date of Defense
6-20-2022
Abstract
Anomaly detection has been researched in various domains, with applications in intrusion detection, fraud detection, system health management, and bioinformatics. Conventional anomaly detection methods analyze each data instance (univariate or multivariate) independently and ignore the sequential characteristics of the data. Some anomalies become apparent only when individual data instances are grouped into sequences, so the conventional approach of analyzing independent data instances cannot detect them. Current approaches have three limitations: (1) Deep learning-based algorithms are widely used for anomaly detection, but they incur significant computational overhead during training because the batch size and learning rate are held constant across epochs. (2) The threshold that decides whether an event is normal or malicious is often static. This can drastically increase the false alarm rate if the threshold is set low, or decrease the true alarm rate if it is set too high. (3) Real-life data is messy, and its features cannot be learned by training a single algorithm. Therefore, several one-class algorithms need to be trained and their outputs combined into an ensemble; prediction accuracy can be increased by giving a proper weight to each algorithm's output. By extending state-of-the-art techniques in learning-based algorithms, this dissertation provides the following solutions: (i) To address (1), we propose a hybrid, dynamic batch size and learning rate tuning algorithm that reduces the overall training time of the neural network. (ii) To address (2), we present an adaptive thresholding algorithm that reduces high false alarm rates. (iii) To address (3), we propose a multilevel hybrid ensemble anomaly detection framework that increases the anomaly detection rate on high-dimensional datasets.
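The trade-off between a static and an adaptive threshold described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the rolling mean-plus-k-standard-deviations rule, the function names `static_flags` and `adaptive_flags`, and the parameters `window`, `k`, and `warmup` are all assumptions chosen for illustration, and the scores are made-up values standing in for per-sequence anomaly scores.

```python
from collections import deque
from statistics import mean, pstdev


def static_flags(scores, threshold):
    """Flag every score above one fixed threshold."""
    return [s > threshold for s in scores]


def adaptive_flags(scores, window=5, k=3.0, warmup=3):
    """Flag scores above mean + k * std of recent non-anomalous scores.

    Hypothetical rule for illustration only; the threshold is
    recomputed from a sliding window instead of being fixed.
    """
    recent = deque(maxlen=window)  # recent scores judged normal
    flags = []
    for s in scores:
        if len(recent) >= warmup:
            threshold = mean(recent) + k * pstdev(recent)
            is_anomaly = s > threshold
        else:
            is_anomaly = False  # not enough history to judge yet
        flags.append(is_anomaly)
        if not is_anomaly:
            recent.append(s)  # only normal scores update the baseline
    return flags


# Made-up anomaly scores: one clear spike among normal values.
scores = [0.10, 0.12, 0.11, 0.13, 0.90, 0.12, 0.11]

too_low = static_flags(scores, threshold=0.05)   # flags everything
too_high = static_flags(scores, threshold=0.95)  # misses the spike
adaptive = adaptive_flags(scores)                # flags only the spike
```

In this toy run the low static threshold raises an alarm on every event and the high one misses the spike, while the rolling threshold tracks the recent baseline and flags only the fifth score. Excluding flagged scores from the window is one possible design choice; it keeps a burst of anomalies from inflating the baseline.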
Identifier
FIDC010748
ORCID
0000-0002-5740-4597
Previously Published In
J. Soni, N. Prabakar and H. Upadhyay, "Behavioral Analysis of System Call Sequences Using LSTM Seq-Seq, Cosine Similarity and Jaccard Similarity for Real-Time Anomaly Detection," 2019 International Conference on Computational Science and Computational Intelligence (CSCI), 2019, pp. 214-219, doi: 10.1109/CSCI49370.2019.00043.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Soni, Jayesh, "Anomaly Detection in Sequential Data: A Deep Learning-Based Approach" (2022). FIU Electronic Theses and Dissertations. 5052.
https://digitalcommons.fiu.edu/etd/5052
Included in
Artificial Intelligence and Robotics Commons, Data Science Commons, Information Security Commons, Theory and Algorithms Commons
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).