Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Tao Li
First Advisor's Committee Title
Committee Chair
Second Advisor's Name
Shu-Ching Chen
Second Advisor's Committee Title
Committee Member
Third Advisor's Name
Debra VanderMeer
Third Advisor's Committee Title
Committee Member
Fourth Advisor's Name
Jinpeng Wei
Fourth Advisor's Committee Title
Committee Member
Fifth Advisor's Name
Bogdan Carbunar
Fifth Advisor's Committee Title
Committee Member
Keywords
Text Mining, Data Mining on Social Media, Information Retrieval
Date of Defense
10-31-2014
Abstract
In the last decade, a large number of social media services have emerged and become widely used in people's daily lives as important tools for sharing and acquiring information. Given the substantial amount of user-contributed text data on social media, it is necessary to develop text analysis methods and tools for this emerging data in order to better utilize it and deliver meaningful information to users.
Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news, and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs.
In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a dual active supervision method based on non-negative matrix tri-factorization (tri-NMF) to minimize human labeling effort for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based framework can be applied to different types of summarization problems, while the learning-to-rank based framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
Identifier
FI14110776
Recommended Citation
Shen, Chao, "Text Analytics of Social Media: Sentiment Analysis, Event Detection and Summarization" (2014). FIU Electronic Theses and Dissertations. 1739.
https://digitalcommons.fiu.edu/etd/1739
latex package
dissertation.tar.gz (2541 kB)
new version of dissertation.zip
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).