Document Type



Doctor of Philosophy (PhD)


Computer Science

First Advisor's Name

Tao Li

First Advisor's Committee Title

Major Professor

Second Advisor's Name

Sundaraja Sitharama Iyengar

Second Advisor's Committee Title

committee member

Third Advisor's Name

Jainendra K Navlakha

Third Advisor's Committee Title

committee member

Fourth Advisor's Name

Ning Xie

Fourth Advisor's Committee Title

committee member

Fifth Advisor's Name

Debra VanderMeer

Fifth Advisor's Committee Title

committee member


Patent Mining, Patent Analysis, Patent Retrieval, Patent Comparison, Query Generation

Date of Defense



Patent documents are important intellectual resources for protecting the interests of individuals, organizations, and companies. They also have great research value and benefit the industry, business, law, and policy-making communities. Patent mining aims to assist patent analysts in investigating, processing, and analyzing patent documents, and it has attracted increasing interest in academia and industry. However, despite recent advances in patent mining, several critical issues in current patent mining systems have not been well explored in previous studies.

These issues include: 1) the query retrieval problem, which concerns helping patent analysts find all patent documents relevant to a given patent application; 2) the comparative summarization problem, which concerns helping patent analysts quickly review any given pair of patent documents; and 3) the key patent discovery problem, which concerns helping patent analysts quickly grasp the linkage between different technologies so that they can better understand the technical trends in a collection of patent documents.

This dissertation follows the stream of research that addresses the aforementioned issues in existing patent analysis and mining systems. In this work, we delve into three interrelated aspects of patent mining techniques: (1) PatSearch, a framework for automatically generating a search query from a given patent application and retrieving relevant patents for the user; (2) PatCom, a framework for investigating the relationship, in terms of commonality and difference, between pairs of patent documents; and (3) PatDom, a framework for integrating multiple types of patent information to identify important patents in a large volume of patent documents.

In summary, the increasing size and textual complexity of patent repositories lead to a series of challenges that are not well addressed by the current generation of systems. This dissertation proposes solutions to these challenges and provides insights on how to address them using a simple yet effective integrated patent mining framework.



dissertation.rar (18535 kB) (18516 kB)



Rights Statement


In Copyright. URI:
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).