Document Type



Doctor of Philosophy (PhD)


Electrical and Computer Engineering

First Advisor's Name

Kang Yen

First Advisor's Committee Title

Committee chair

Second Advisor's Name

Alexander Pons

Second Advisor's Committee Title

Committee member

Third Advisor's Name

Jean Andrian

Third Advisor's Committee Title

Committee member

Fourth Advisor's Name

Nezih Pala

Fourth Advisor's Committee Title

Committee member

Fifth Advisor's Name

Deng Pan

Fifth Advisor's Committee Title

Committee member


Keywords

gradient descent, gradient deviation, model optimization, artificial neural network

Date of Defense



In deep learning, optimization algorithms are employed to speed convergence to accurate models by calibrating the current gradient and the associated learning rate. A major shortcoming of existing methods is the manner in which the calibration terms are computed: they rely only on the previous gradients. Because a gradient is a time-sensitive quantity computed at a specific moment, older gradients can introduce significant deviation into the calibration terms. Although most algorithms alleviate this problem by taking an exponential moving average of the previous gradients, we found this method to be of limited effectiveness in practice, as the older gradients still exert an undesirable accumulated influence on the new gradient. Another shortcoming is that existing algorithms cannot incorporate the cost variance when computing the new gradient; employing the same cost-reduction strategy under all circumstances is therefore inherently inaccurate. In addition, we identified that some advanced algorithms employ overly aggressive measurements, resulting in erratic new gradients in practice. With respect to evaluation, we determined that a high error rate is more likely to result from a weak ability to translate the reduction in cost into a reduction in error rate, a circumstance that has not been addressed in research aimed at improving the accuracy of new gradients.
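The accumulated influence of older gradients described above can be sketched as follows. This is an illustrative example only (the function name and the abrupt direction change are assumptions, not from the dissertation): an exponential moving average keeps a stale gradient direction dominant even after the true gradient has turned.

```python
import numpy as np

def ema_update(prev_avg, grad, beta=0.9):
    """Blend the new gradient into the running average (momentum-style EMA)."""
    return beta * prev_avg + (1.0 - beta) * grad

avg = np.zeros(2)
# Five gradients along the x-axis, then one orthogonal gradient along y:
for g in [np.array([1.0, 0.0])] * 5 + [np.array([0.0, 1.0])]:
    avg = ema_update(avg, g)

# The averaged direction still points mostly along the old x-axis,
# even though the latest gradient is orthogonal to it.
print(avg)
```

Here the x-component of the average remains several times larger than the y-component after the turn, which is the kind of accumulated deviation the abstract refers to.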

In this dissertation, we propose an algorithm that employs the angle between consecutive gradients as a new metric to resolve the aforementioned problems. The new algorithm and nine existing algorithms are implemented in a neural network and a logistic regression classifier for evaluation. The results show that the new method improves cost/error-rate reduction by 9.40%/11.11% on the MNIST dataset and by 41.63%/29.58% on the NSL-KDD dataset, and its ability to translate cost reduction into error-rate reduction outperforms the other optimizers by 33.06%. One of the main contributions of our work is verifying the feasibility and effectiveness of using the angle between consecutive gradients as a reliable metric for generating accurate new gradients. Angle-based measurements could also be incorporated into existing algorithms to enhance their cost-reduction and translation abilities.
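A minimal sketch of the angle metric itself: the code below computes the angle between two consecutive gradient vectors and shrinks the step when they disagree sharply. The function names and the cosine-based scaling rule are illustrative assumptions for exposition, not the authors' AG-SGD update rule.

```python
import numpy as np

def angle_between(g_prev, g_curr, eps=1e-12):
    """Angle in radians between consecutive gradient vectors."""
    cos = np.dot(g_prev, g_curr) / (
        np.linalg.norm(g_prev) * np.linalg.norm(g_curr) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angle_scaled_step(g_prev, g_curr, lr=0.1):
    """Illustrative rule: damp the update as the angle grows."""
    theta = angle_between(g_prev, g_curr)
    scale = np.cos(theta / 2.0)  # 1.0 when aligned, near 0 when opposed
    return -lr * scale * g_curr

g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])  # orthogonal: theta = pi/2
step = angle_scaled_step(g1, g2)
```

For orthogonal consecutive gradients the step is damped by cos(pi/4) relative to plain SGD; aligned gradients pass through undamped. This is the general idea of treating the angle as a reliability signal for the new gradient.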





Previously Published In

Song, C., Pons, A., & Yen, K. (2021). AG-SGD: Angle-based Stochastic Gradient Descent. IEEE Access.

Song, C., Pons, A., & Yen, K. (2020). Optimizing Stochastic Gradient Descent Using the Angle Between Gradients. IEEE International Conference on Big Data.



Rights Statement


In Copyright. URI:
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).