Date of this Version
6-3-2014
Document Type
Article
Abstract
Background
Ontologies encode relationships within a domain in robust data structures that can be used to annotate data objects, including scientific papers, in ways that ease tasks such as search and meta-analysis. However, the annotation process requires significant time and effort when performed by humans. Text mining algorithms can facilitate this process, but they render an analysis mainly based upon keyword, synonym and semantic matching. They do not leverage information embedded in an ontology's structure.
Methods
We present a probabilistic framework that facilitates the automatic annotation of literature by indirectly modeling the restrictions among the different classes in the ontology. Our research focuses on annotating human functional neuroimaging literature within the Cognitive Paradigm Ontology (CogPO). We use an approach that combines the stochastic simplicity of naïve Bayes with the formal transparency of decision trees. Our data structure is easily modifiable to reflect changing domain knowledge.
Results
We compare our results across naïve Bayes, Bayesian Decision Trees, and Constrained Decision Tree classifiers that keep a human expert in the loop, in terms of the quality measure of the F1-mirco score.
Conclusions
Unlike traditional text mining algorithms, our framework can model the knowledge encoded by the dependencies in an ontology, albeit indirectly. We successfully exploit the fact that CogPO has explicitly stated restrictions, and implicit dependencies in the form of patterns in the expert curated annotations.
Creative Commons License
This work is licensed under a Creative Commons Attribution 2.0 License.
Recommended Citation
Chakrabarti et al.: Statistical algorithms for ontology-based annotation of scientific literature. Journal of Biomedical Semantics 2014 5(Suppl 1):S2.
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Comments
Originally published in Journal of Biomedical Semantics.