Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Naphtali Rishe
First Advisor's Committee Title
Committee Chair
Second Advisor's Name
Maxim Chekmasov
Third Advisor's Name
Raimund Ege
Fourth Advisor's Name
Subbarao Wunnava
Fifth Advisor's Name
Nagarajan Prabakar
Date of Defense
10-23-2000
Abstract
Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help database queries can be distributed over both local and Web data sources within MSemODB framework.
Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development we thoroughly investigate issues associated with Web site selection, analysis and processing.
Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well.
This study confirms feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling of complex cases.
Identifier
FI14051129
Recommended Citation
Beryoza, Dmitriy, "Dynamic data retrieval on the world wide web" (2000). FIU Electronic Theses and Dissertations. 1654.
https://digitalcommons.fiu.edu/etd/1654
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Comments
If you are the rightful copyright holder of this dissertation or thesis and wish to have it removed from the Open Access Collection, please submit a request to dcc@fiu.edu and include clear identification of the work, preferably with URL.