Actionable Knowledge Extraction Framework for COVID-19

In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19), which contains over 51,000 scholarly articles, including over 40,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. Medical professionals, including physicians, frequently seek answers to specific questions in order to improve guidelines and decisions. This large body of medical literature is an important source of new insights that can help medical communities provide relevant knowledge and fight the infectious disease. There are ongoing attempts to develop intelligent systems that automatically extract relevant knowledge from large numbers of unstructured documents. In this paper, we propose an efficient question answering framework that automatically analyzes thousands of articles to generate long text answers (sections/paragraphs) in response to questions posed by medical communities. In the early stages of developing the framework, we explored natural language processing techniques such as query expansion, data preprocessing, and vector space models. We show initial results for an example query about the incubation period.
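As a rough illustration of how query expansion and a vector space model can be combined to retrieve candidate paragraphs, the following minimal Python sketch ranks passages by TF-IDF cosine similarity. It is not the framework described in the paper: the synonym table `EXPANSIONS`, the helper names, the TF-IDF weighting, and the toy paragraphs are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table for query expansion (an assumption, not from the paper).
EXPANSIONS = {
    "incubation": ["latency", "onset"],
    "covid-19": ["sars-cov-2", "coronavirus"],
}

# Invented toy paragraphs standing in for article sections (not CORD-19 text).
PARAGRAPHS = [
    "The median incubation period was estimated from symptom onset data.",
    "Masks reduce transmission in crowded indoor settings.",
    "Viral genome sequencing identified several lineages.",
]


def tokenize(text):
    """Preprocess: lowercase and keep alphanumeric tokens, allowing inner hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def expand_query(tokens):
    """Append the hypothetical synonyms of each query token."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(EXPANSIONS.get(token, []))
    return expanded


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the collection."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are dropped."""
    return {t: count * idf[t] for t, count in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, paragraphs):
    """Return (score, paragraph_index) pairs, best match first."""
    docs_tokens = [tokenize(p) for p in paragraphs]
    idf = build_idf(docs_tokens)
    query_vec = tfidf_vector(expand_query(tokenize(query)), idf)
    scored = [(cosine(query_vec, tfidf_vector(tokens, idf)), i)
              for i, tokens in enumerate(docs_tokens)]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, index in rank("incubation period", PARAGRAPHS):
        print(f"{score:.3f}  {PARAGRAPHS[index]}")
```

In this sketch the expansion step lets the query term "incubation" also match paragraphs that mention "onset", which is the kind of recall gain query expansion is meant to provide; a real system would draw expansions from a medical vocabulary rather than a hand-written table.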


Additional authors listed on article available for download