Authors

Chao Shen

Document Type

Dissertation

Degree

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor's Name

Tao Li

First Advisor's Committee Title

Committee Chair

Second Advisor's Name

Shu-Ching Chen

Second Advisor's Committee Title

Committee Member

Third Advisor's Name

Debra VanderMeer

Third Advisor's Committee Title

Committee Member

Fourth Advisor's Name

Jinpeng Wei

Fourth Advisor's Committee Title

Committee Member

Fifth Advisor's Name

Bogdan Carbunar

Fifth Advisor's Committee Title

Committee Member

Keywords

Text Mining, Data Mining on Social Media, Information Retrieval

Date of Defense

10-31-2014

Abstract

Over the last decade, many social media services have emerged and become widely used in daily life as important tools for sharing and acquiring information. Given the substantial amount of user-contributed text on social media, methods and tools for analyzing this emerging data are needed so that it can be used to deliver meaningful information to users.

Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news, and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs.

In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a dual active supervision method based on non-negative matrix tri-factorization (tri-NMF) to minimize human labeling effort for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based framework can be applied to different types of summarization problems, while the learning-to-rank based framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.

Identifier

FI14110776

dissertation.zip (20090 kB)
latex package

dissertation.tar.gz (2541 kB)
new version of dissertation.zip

Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).