Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Mark Finlayson
First Advisor's Committee Title
Committee Chair
Second Advisor's Name
Naphtali Rishe
Second Advisor's Committee Title
Committee co-chair
Third Advisor's Name
Shu-Ching Chen
Third Advisor's Committee Title
Committee member
Fourth Advisor's Name
Ning Xie
Fourth Advisor's Committee Title
Committee member
Fifth Advisor's Name
Armando Barreto
Fifth Advisor's Committee Title
Committee member
Keywords
Natural Language Processing, Document Understanding, Document Structure, Semantic Search, Semantics, Section Structure
Date of Defense
7-2-2020
Abstract
Modeling natural human behavior in understanding written language is crucial for developing true artificial intelligence. For people, words convey semantic concepts. Documents represent a more abstract concept: they are collections of text organized in a logical structure, that is, into sentences, paragraphs, sections, and so on. Like words, these document structures convey a logical flow of semantic concepts. Machines, however, view words only as spans of characters and documents as mere collections of free text, missing the underlying meanings of the words and the logical structure of the documents.
Automatic semantic concept detection is the process by which the underlying meanings of words are identified and retrieved. My thesis aims to bridge the semantic gap between automatic concept detection and the understanding of logical document structure. Academic search is the process of using specialized search engines or bibliographic databases to find academic articles, often involving highly specific academic concepts. It is more specialized than general web or database search and is a critical first step in any research project. Academic search has become increasingly challenging over the past few decades as the academic literature has grown exponentially, with a proliferation of new venues and subfields that may contain relevant material yet remain unknown even to well-read researchers and scholars. In this doctoral dissertation, I demonstrate a framework I developed for detecting semantic concepts by modeling and integrating the logical document structure, specifically the document section structure. Given a set of documents, this framework aims first to identify the unique section structure of those documents and then to use this structure to detect the implicit semantic concepts behind the documents' words, sentences, and sections.
Identifier
FIDC009155
ORCID
https://orcid.org/0000-0002-8486-6615
Recommended Citation
Banisakher, Deya, "Automatic Learning of Document Section Structure for Ontology-based Semantic Search" (2020). FIU Electronic Theses and Dissertations. 4478.
https://digitalcommons.fiu.edu/etd/4478
Included in
Artificial Intelligence and Robotics Commons, Computational Linguistics Commons, Other Computer Sciences Commons
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).