A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters
Date of Publication
2021
Security Theme
Extreme Events
Keywords
Extreme Events, Sentiment analysis, K-means clustering, Latent Dirichlet allocation, Machine learning, Twitter, Natural disasters
Description
The success of sentiment analysis lies in identifying the most frequent and relevant opinions that users express about a particular topic. In this paper, we develop a framework for analyzing Twitter users' sentiments about natural disasters. The framework combines data pre-processing techniques with a hybrid of machine learning, statistical modeling, and lexicon-based approaches. For sentiment classification we choose TF-IDF with K-means over affinity-based and hierarchical clustering. We use Latent Dirichlet Allocation and a pipeline of Doc2Vec and K-means to capture themes, and then perform multi-level polarity index classification and a time-series analysis of the polarity indices. In our study, we draw insights from 243,746 tweets about the 2018 natural disasters in Kerala, India. The key findings of the study are the classification of sentiments based on similarity and polarity indices and the identification of themes among the topics discussed on Twitter. We also observe different sets of emotions and influencers, among other findings. The Kerala floods case study shows how the government and other organizations could track positive and negative sentiment over time and by location, better understand the topics trending among the public, and collaborate with key Twitter users and influencers to spread information and identify gaps in the design and execution of schemes. The novelty of this research lies in the streamlined and efficient combination of algorithms and techniques embedded in the framework, which can be integrated into a platform with a GUI for further automation.
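The abstract names TF-IDF with K-means as the chosen classification step. As a rough, dependency-free sketch of that combination (not the authors' implementation), the code below turns tweets into TF-IDF vectors and groups them with K-means. The tokenizer, the smoothed IDF formula (borrowed from scikit-learn's default), the farthest-first initialization, the sample tweets, and all function names are assumptions for illustration; a real pipeline would more likely use scikit-learn's TfidfVectorizer and KMeans.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplified pre-processing (assumption): lowercase, keep alphabetic runs only.
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs):
    # L2-normalized TF-IDF vectors with the smoothed IDF
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1 (scikit-learn's default form).
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    # Deterministic farthest-first seeding (a simplification of k-means++).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_tweets(tweets, k):
    # Hypothetical helper: TF-IDF vectorization followed by K-means.
    return kmeans(tfidf_vectors(tweets), k)


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Rescue boats reach flooded villages",
    "Rescue teams and boats save families",
    "Donate to the relief fund for flood victims",
    "Relief fund donations pour in for victims",
]
print(cluster_tweets(sample_tweets, k=2))  # [0, 0, 1, 1]
```

Farthest-first seeding keeps this toy example deterministic. K-means remains sensitive to initialization in general, which is why libraries typically use k-means++ seeding and support multiple restarts.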
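The abstract also describes lexicon-based, multi-level polarity classification followed by a time-series analysis, without specifying the lexicon or the level thresholds. The sketch below assumes a tiny hand-made lexicon, a polarity index defined as the mean score of matched words in [-1, 1], and five illustrative levels, then counts the levels per day. Every name, score, threshold, and sample tweet here is hypothetical; in practice a published sentiment lexicon (TextBlob or VADER, for example) would replace LEXICON.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical toy lexicon for illustration only; the scores are made up.
LEXICON = {
    "safe": 1.0, "rescued": 0.9, "thank": 0.8, "help": 0.4,
    "dead": -1.0, "worst": -0.9, "stranded": -0.7, "missing": -0.6,
}


def polarity(text):
    # Polarity index in [-1, 1]: mean score of lexicon words found in the text,
    # or 0.0 when no lexicon word matches.
    words = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def polarity_level(score):
    # Assumed five-level scheme; thresholds are illustrative, not the paper's.
    if score <= -0.5:
        return "strongly negative"
    if score < 0:
        return "weakly negative"
    if score == 0:
        return "neutral"
    if score < 0.5:
        return "weakly positive"
    return "strongly positive"


def daily_polarity_series(tweets):
    # tweets: iterable of (date, text) pairs -> {date: Counter(level -> count)},
    # ordered by date so the result can be plotted or analyzed as a time series.
    series = defaultdict(Counter)
    for day, text in tweets:
        series[day][polarity_level(polarity(text))] += 1
    return dict(sorted(series.items()))


# Made-up sample tweets with illustrative dates.
sample = [
    (date(2018, 8, 16), "Families stranded, many missing"),
    (date(2018, 8, 16), "Worst flood in a century"),
    (date(2018, 8, 18), "Thank you, everyone rescued and safe"),
    (date(2018, 8, 18), "Volunteers help at the camp"),
    (date(2018, 8, 18), "Water level update at noon"),
]
for day, levels in daily_polarity_series(sample).items():
    print(day, dict(levels))
```

Grouping by location instead of date would only require keying the series on a location field, assuming the tweets carry one.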