Document Type

Dissertation

Degree

Doctor of Philosophy (PhD)

Major/Program

Computer Science

First Advisor's Name

Nagarajan Prabakar

First Advisor's Committee Title

Committee Chair

Second Advisor's Name

Leonardo Bobadilla

Second Advisor's Committee Title

Committee Member

Third Advisor's Name

Hadi Amini

Third Advisor's Committee Title

Committee Member

Fourth Advisor's Name

Ananda Mondal

Fourth Advisor's Committee Title

Committee Member

Fifth Advisor's Name

Himanshu Upadhyay

Fifth Advisor's Committee Title

Committee Member

Sixth Advisor's Name

Ajeet Kaushik

Sixth Advisor's Committee Title

Committee Member

Keywords

Anomaly Detection, Sequential Data, Deep Learning, Long Short Term Memory, Optimization

Date of Defense

6-20-2022

Abstract

Anomaly detection has been researched in various domains, with applications in intrusion detection, fraud detection, system health management, and bioinformatics. Conventional anomaly detection methods analyze each data instance independently (univariate or multivariate) and ignore the sequential characteristics of the data. Some anomalies can be detected only by grouping individual data instances into sequences, so the conventional approach of analyzing independent data instances cannot detect them. Currently: (1) deep learning-based algorithms are widely used for anomaly detection; however, significant computational overhead is incurred during training because the batch size and learning rate are held constant across epochs; (2) the threshold that decides whether an event is normal or malicious is often static, which can drastically increase the false alarm rate if the threshold is set low or decrease the true alarm rate if it is set very high; (3) real-life data is messy, and it is impossible to learn the data features by training just one algorithm. Therefore, several one-class-based algorithms need to be trained, and the final output is the ensemble of the outputs from all the algorithms. The prediction accuracy can be increased by giving a proper weight to each algorithm's output. By extending the state-of-the-art techniques in learning-based algorithms, this dissertation provides the following solutions: (i) to address (1), we propose a hybrid, dynamic batch size and learning rate tuning algorithm that reduces the overall training time of the neural network; (ii) as a solution for (2), we present an adaptive thresholding algorithm that reduces high false alarm rates; (iii) to overcome (3), we propose a multilevel hybrid ensemble anomaly detection framework that increases the anomaly detection rate on high-dimensional datasets.
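
The abstract does not spell out the dissertation's algorithms, so the following is only a minimal illustrative sketch of two of the ideas it names: weighting each detector's output in an ensemble, and replacing a static threshold with one that adapts to recent scores. The function names (`weighted_ensemble`, `adaptive_flags`), the rolling mean-plus-k-standard-deviations rule, the window size, the weights, and the sample scores are all assumptions made for illustration and are not taken from the dissertation.

```python
from statistics import mean, stdev

def weighted_ensemble(scores_per_model, weights):
    """Combine per-model anomaly scores into one score per event
    using a weighted average (weights are hypothetical)."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, event_scores)) / total
        for event_scores in zip(*scores_per_model)
    ]

def adaptive_flags(scores, window=5, k=3.0):
    """Flag event i when its score exceeds mean + k * stdev of the
    previous `window` scores (an assumed adaptive rule)."""
    flags = []
    for i, score in enumerate(scores):
        history = scores[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)  # not enough history to estimate a threshold
            continue
        threshold = mean(history) + k * stdev(history)
        flags.append(score > threshold)
    return flags

# Made-up anomaly scores from two hypothetical one-class detectors.
model_a = [0.10, 0.12, 0.11, 0.13, 0.12, 0.90, 0.12]
model_b = [0.20, 0.18, 0.22, 0.19, 0.21, 0.80, 0.20]

combined = weighted_ensemble([model_a, model_b], weights=[0.7, 0.3])
flags = adaptive_flags(combined)
print(flags)
```

On this toy data only the sixth event is flagged, whereas a static cutoff set low (for example 0.14) would also raise alarms on several ordinary events, which is the false-alarm problem the abstract describes.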

Identifier

FIDC010748

ORCID

0000-0002-5740-4597

Previously Published In

J. Soni, N. Prabakar and H. Upadhyay, "Behavioral Analysis of System Call Sequences Using LSTM Seq-Seq, Cosine Similarity and Jaccard Similarity for Real-Time Anomaly Detection," 2019 International Conference on Computational Science and Computational Intelligence (CSCI), 2019, pp. 214-219, doi: 10.1109/CSCI49370.2019.00043.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Deans Office Internal form - Soni.pdf (112 kB)
Deans Approval of Dissertation Content

Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).