COVID 19 literature NLP models – Viral outbreak topic tuning

Koshoffer, Amy; Wu, Danny; Latessa, Jenny; Kannayyagar, Suraj; Luken, Sally; McCabe, Erin; Edgerton, Ezra; Washington, Dorcas; Lee, James; Powers, Margaret; Hagedorn, Philip

Dataset

Open Access, Deposited

Date Uploaded: 10/30/2020
Date Modified: 11/05/2020

The data sets were derived from coronavirus-related scientific literature in the CORD-19 dataset, as released by the Allen Institute for Artificial Intelligence as of July 14, 2020, and indexed with the Elasticsearch engine hosted by the Digital Scholarship Center (DSC). By indexing the full text and the metadata of the article corpus, the research team generated one full-corpus model and seven models corresponding to key viral outbreaks of the past several decades: three coronaviruses (SARS-CoV, MERS-CoV, and SARS-CoV-2) and four non-coronaviruses (HIV, Zika, H1N1, and Ebola). Each targeted subset consists of articles containing two or more occurrences of virus-specific keywords drawn from conventions established by the World Health Organization.
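The subset rule described above (two or more keyword occurrences per article) can be illustrated with a minimal Python sketch. This is not the research team's pipeline, which used Elasticsearch. The keyword lists, the `KEYWORDS` mapping, and the function names below are hypothetical placeholders, since the actual WHO-derived terms are not listed in this record.

```python
import re

# Hypothetical keyword lists for illustration only; the real WHO-derived
# terms used to build the deposited models are not given in this record.
KEYWORDS = {
    "SARS-CoV-2": ["sars-cov-2", "covid-19", "2019-ncov"],
    "MERS-CoV": ["mers-cov", "middle east respiratory syndrome"],
    "Ebola": ["ebola"],
}


def count_occurrences(text, keywords):
    """Count case-insensitive keyword hits that are not embedded in a longer token."""
    lowered = text.lower()
    total = 0
    for kw in keywords:
        # Reject matches that are preceded or followed by a letter or digit.
        pattern = r"(?<![a-z0-9])" + re.escape(kw) + r"(?![a-z0-9])"
        total += len(re.findall(pattern, lowered))
    return total


def assign_subsets(text, keyword_map=KEYWORDS, min_hits=2):
    """Return the outbreak subsets whose keywords occur at least min_hits times."""
    return [
        virus
        for virus, kws in keyword_map.items()
        if count_occurrences(text, kws) >= min_hits
    ]


if __name__ == "__main__":
    doc = "Ebola outbreaks recur; Ebola virus disease is severe. COVID-19 is mentioned once."
    print(assign_subsets(doc))
```

In this sketch an article can fall into more than one subset, and a single passing mention of a virus does not qualify it. Whether the original models counted occurrences across full text, metadata, or both is not stated here, so the sketch simply counts over whatever text it is given.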

Creator

License