Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Electrical and Computer Engineering
First Advisor's Name
Kang Yen
First Advisor's Committee Title
Committee chair
Second Advisor's Name
Alexander Pons
Second Advisor's Committee Title
Committee member
Third Advisor's Name
Jean Andrian
Third Advisor's Committee Title
Committee member
Fourth Advisor's Name
Nezih Pala
Fourth Advisor's Committee Title
Committee member
Fifth Advisor's Name
Deng Pan
Fifth Advisor's Committee Title
Committee member
Keywords
gradient descent, gradient deviation, model optimization, artificial neural network
Date of Defense
2-23-2021
Abstract
In deep learning, optimization algorithms are employed to accelerate convergence to accurate models by calibrating the current gradient and the associated learning rate. A major shortcoming of the existing methods is how the calibration terms are computed: they rely only on previously computed gradients. Because a gradient is time-sensitive, computed at a specific point along the optimization path, older gradients can introduce significant deviation into the calibration terms. Although most algorithms mitigate this by taking an exponential moving average of the previous gradients, we found this method to be less effective in practice, as stale gradients still exert an undesirable accumulated influence on the new gradient. Another shortcoming is that these algorithms cannot incorporate the cost variance when computing the new gradient; applying the same cost-reduction strategy under all circumstances is therefore inherently inaccurate. In addition, we identified that some advanced algorithms employ measurements that are confiscatory, producing erratic new gradients in practice. With respect to evaluation, we determined that a high error rate more often results from a weak ability to translate reductions in the cost into reductions in the error rate, a circumstance that prior research on improving the accuracy of new gradients has not addressed.
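For context, the exponential moving average referred to above is the momentum-style accumulator used by optimizers such as Adam (for its first-moment estimate). The sketch below is ours, not part of the dissertation; the function name and hyperparameter values are illustrative assumptions:

    import numpy as np

    def ema_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
        # The step direction is an exponential moving average (EMA) of
        # all gradients seen so far, so a stale gradient keeps a
        # geometrically decaying influence on every later step.
        velocity = beta * velocity + (1.0 - beta) * grad
        return w - lr * velocity, velocity

Because velocity blends every past gradient with weight beta^k, a gradient computed far from the current point can still deflect the step, which is the accumulated impact the abstract describes.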
In this dissertation, we propose an algorithm that employs the angle between consecutive gradients as a new metric to resolve all of the aforementioned problems. The new algorithm and nine existing algorithms are implemented in a neural network and a logistic regression classifier for evaluation. The results show that the new method improves cost/error-rate reduction by 9.40%/11.11% on the MNIST dataset and by 41.63%/29.58% on the NSL-KDD dataset. In addition, its ability to translate cost reduction into error-rate reduction outperforms the other optimizers by 33.06%. A main contribution of our work is verifying the feasibility and effectiveness of the angle between consecutive gradients as a reliable metric for generating accurate new gradients. Angle-based measurements could be incorporated into existing algorithms to enhance their cost-reduction and translating abilities.
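As a rough, hypothetical illustration of the angle metric only (the precise AG-SGD update rule is defined in the dissertation and the IEEE Access paper listed below; the function names, the cosine-based damping, and all parameters here are our assumptions, not the authors' method):

    import numpy as np

    def angle_between(g_prev, g_curr, eps=1e-12):
        # Angle in radians between consecutive gradient vectors.
        cos = np.dot(g_prev, g_curr) / (
            np.linalg.norm(g_prev) * np.linalg.norm(g_curr) + eps)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    def angle_scaled_step(w, g_prev, g_curr, lr=0.01):
        # Hypothetical angle-based update (illustration, not AG-SGD):
        # damp the step when consecutive gradients disagree (large
        # angle); keep it near full size when they agree (small angle).
        theta = angle_between(g_prev, g_curr)
        scale = np.cos(theta / 2.0)  # 1.0 when aligned, 0.0 when opposed
        return w - lr * scale * g_curr

In this sketch, angle_scaled_step takes a full step when the two gradients point the same way and suppresses the step as they approach opposition, conveying how an angle between consecutive gradients can serve as a calibration signal.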
Identifier
FIDC009555
ORCID
0000-0003-4208-5738
Previously Published In
Song, C., Pons, A., Yen, K. (2021). AG-SGD: Angle-based Stochastic Gradient Descent. IEEE Access.
Song, C., Pons, A., Yen, K. (2020). Optimizing Stochastic Gradient Descent Using the Angle Between Gradients. IEEE International Conference on Big Data.
Recommended Citation
Song, Chongya, "An Angle-based Stochastic Gradient Descent Method for Machine Learning: Principle and Application" (2021). FIU Electronic Theses and Dissertations. 4699.
https://digitalcommons.fiu.edu/etd/4699
Included in
Artificial Intelligence and Robotics Commons, Data Science Commons, Theory and Algorithms Commons
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).