Document Type

Dissertation

Degree

Doctor of Philosophy (PhD)

Major/Program

Computer Science

First Advisor's Name

Mark Finlayson

First Advisor's Committee Title

Committee Chair

Second Advisor's Name

Naphtali Rishe

Second Advisor's Committee Title

Committee co-chair

Third Advisor's Name

Shu-Ching Chen

Third Advisor's Committee Title

Committee member

Fourth Advisor's Name

Ning Xie

Fourth Advisor's Committee Title

Committee member

Fifth Advisor's Name

Armando Barreto

Fifth Advisor's Committee Title

Committee member

Keywords

Natural Language Processing, Document Understanding, Document Structure, Semantic Search, Semantics, Section Structure

Date of Defense

7-2-2020

Abstract

Modeling natural human behavior in understanding written language is crucial for developing true artificial intelligence. For people, words convey certain semantic concepts, while documents represent an abstract concept: they are collections of text organized in some logical structure, that is, sentences, paragraphs, sections, and so on. Like words, these document structures are used to convey a logical flow of semantic concepts. Machines, however, view words only as spans of characters and documents as mere collections of free text, missing the underlying meanings of words and the logical structure of those documents.

Automatic semantic concept detection is the process by which the underlying meanings of words are identified and retrieved. My thesis aims to bridge the semantic gap between automatic concept detection and logical document structure understanding. Academic search is the process of using specialized search engines or bibliographic databases to find academic articles, often involving highly specific academic concepts. It is more specialized than general web or database search and is a critical first step in any research project. Academic search has become increasingly challenging in the past few decades as the academic literature has grown exponentially, with a proliferation of new venues and subfields that may contain relevant material and yet are unknown to even well-read researchers and scholars. In this doctoral dissertation, I demonstrate a framework I developed for detecting semantic concepts by modeling and integrating the logical document structure, specifically the document section structure. Given a set of documents, this framework aims to identify the unique section structure of those documents and then to use this structure to detect the implicit semantic concepts behind the documents' words, sentences, and sections.

Identifier

FIDC009155

ORCID

https://orcid.org/0000-0002-8486-6615

Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).