Displaying datasets 1 - 25 of 114 in total

Subject Area

Social Sciences (114)
Life Sciences (0)
Physical Sciences (0)
Technology and Engineering (0)
Uncategorized (0)
Arts and Humanities (0)

Funder

U.S. National Science Foundation (NSF) (25)
U.S. National Institutes of Health (NIH) (24)
Other (23)
U.S. Department of Energy (DOE) (0)
U.S. Department of Agriculture (USDA) (0)
Illinois Department of Natural Resources (IDNR) (0)
U.S. National Aeronautics and Space Administration (NASA) (0)
U.S. Geological Survey (USGS) (0)
Illinois Department of Transportation (IDOT) (0)
U.S. Army (0)

Publication Year

2022 (25)
2020 (23)
2018 (22)
2021 (16)
2019 (15)
2016 (8)
2017 (5)
2023 (0)

License

CC BY (60)
CC0 (54)
custom (0)
published: 2022-01-10
 
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries. <b>Additional Resources:</b> - Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the <a href="https://docs.google.com/forms/d/e/1FAIpQLSf-J937V6I4sMSxQt7gR3SIbUASR26KXxqSurrkBvlF-CIQnQ/viewform?usp=pp_url"><b>Archer Access Request Form</b></a> so we can determine if you are eligible for access or not. - Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the <a href="https://forms.gle/6eA2yJUGFMtj5swY7"><b>Archer User Feedback Form</b></a>. - The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this <a href="https://groups.webservices.illinois.edu/subscribe/123172"><b>form</b></a> to subscribe to it. <b>Citation Guidelines:</b> 1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2022. Global News Index and Extracted Features Repository [codebook], v1.1.0. Champaign, IL: University of Illinois. Dec. 16. doi:10.13012/B2IDB-5649852_V3 2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2022. Global News Index and Extracted Features Repository [database], v1.1.0. Champaign, IL: University of Illinois. Dec. 16. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V3
keywords: Cline Center; Cline Center for Advanced Social Research; political; social; political science; Global News Index; Archer; news; mass communication; journalism;
published: 2022-07-25
 
Related to the raw entity mentions, this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the NCBI's taxonomy identifier system.
keywords: synthetic biology; NERC data; species mentions, ambiguous entities
published: 2022-07-25
 
A set of species entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; species mentions
published: 2022-07-25
 
This dataset represents the results of manual cleaning and annotation of the entity mentions contained in the raw dataset (https://doi.org/10.13012/B2IDB-4950847_V1). Each mention has been consolidated and linked to an identifier for a matching concept from the NCBI's taxonomy database.
keywords: synthetic biology; NERC data; species mentions; cleaned data; NCBI TaxonID
published: 2022-07-25
 
This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4950847_V1) and collects entity mentions that were manually determined to be noisy, non-species entities.
keywords: synthetic biology; NERC data; species mentions, noisy entities
published: 2022-07-25
 
This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4950847_V1) for species entities and represents those that were determined to be species (i.e., were not noisy entities) but for which no corresponding concept could be found in the NCBI taxonomy database.
keywords: synthetic biology; NERC data; species mentions, not found entities
published: 2022-07-25
 
Related to the raw entity mentions (https://doi.org/10.13012/B2IDB-4163883_V1), this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the ChEBI ontology.
keywords: synthetic biology; NERC data; chemical mentions; ambiguous entities
published: 2022-07-25
 
A set of chemical entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; chemical mentions
published: 2022-07-25
 
A set of cell-line entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; cell-line mentions
published: 2022-07-25
 
This dataset represents the results of manual cleaning and annotation of the entity mentions contained in the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1). Each mention has been consolidated and linked to an identifier for a matching concept from the NCBI's taxonomy database.
keywords: synthetic biology; NERC data; chemical mentions; cleaned data; ChEBI ontology
published: 2022-07-25
 
This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1) and collects entity mentions that were manually determined to be noisy, non-chemical entities.
keywords: synthetic biology; NERC data; chemical mentions, noisy entities
published: 2022-07-25
 
This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4163883_V1) for checmical entities and represents those that were determined to be chemicals (i.e., were not noisy entities) but for which no corresponding concept could be found in the ChEBI ontology.
keywords: synthetic biology; NERC data; chemical mentions, not found entities
published: 2022-07-25
 
A set of gene and gene-related entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; gene mentions
published: 2021-05-10
 
This dataset contains data used in publication "Institutional Data Repository Development, a Moving Target" submitted to Code4Lib Journal. It is a tabular data file describing attributes of data files in datasets published in Illinois Data Bank 2016-04-01 to 2021-04-01.
keywords: institutional repository
published: 2022-07-11
 
This dataset was developed as part of an online survey study that explores student characteristics that may predict what one finds helpful in replies to requests for help posted to an online college course discussion forum. 223 college students enrolled in an introductory statistics course were surveyed on their sense of belonging to their course community, as well as how helpful they found 20 examples of replies to requests for help posted to a statistics course discussion forum.
keywords: help-giving; discussion forums; sense of belonging; college student
published: 2022-06-20
 
This is a sentence-level parallel corpus in support of research on OCR quality. The source data comes from: (1) Project Gutenberg for human-proofread "clean" sentences; and, (2) HathiTrust Digital Library for the paired sentences with OCR errors. In total, this corpus contains 167,079 sentence pairs from 189 sampled books in four domains (i.e., agriculture, fiction, social science, world war history) published from 1793 to 1984. There are 36,337 sentences that have two OCR views paired with each clean version. In addition to sentence texts, this corpus also provides the location (i.e., sentence and chapter index) of each sentence in its belonging Gutenberg volume.
keywords: sentence-level parallel corpus; optical character recognition; OCR errors; Project Gutenberg; HathiTrust Digital Library; digital libraries; digital humanities;
published: 2022-07-01
 
The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: The reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (<a href="https://doi.org/10.1093/ije/dyv184">Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184</a>), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. 2016, not the other types of publications, and it adds additional information noted below. This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not all references cited in an SRR are included in the evidence synthesis or PSRs. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. They are unused PSRs. If visualized with the inclusion network, they will appear as isolated nodes. We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. <a href="https://github.com/infoqualitylab/Scopus_author_info_collection">https://github.com/infoqualitylab/Scopus_author_info_collection</a> ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports. We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016).
keywords: systematic reviews; evidence synthesis; network analysis; public health; salt controversy;
published: 2022-04-21
 
This dataset was created based on the publicly available microdata from PNS-2019, a national health survey conducted by the Instituto Brasileiro de Geografia e Estatistica (IBGE, Brazilian Institute of Geography and Statistics). IBGE is a federal agency responsible for the official collection of statistical information in Brazil – essentially, the Brazilian census bureau. Data on selected variables focusing on biopsychosocial domains related to pain prevalence, limitations and treatment are available. The Fundação Instituto Oswaldo Cruz has detailed information about the PNS, including questionnaires, survey design, and datasets (www.pns.fiocruz.br). The microdata can be found on the IBGE website (https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html?caminho=PNS/2019/Microdados/Dados).
keywords: back pain; health status disparities; biopsychosocial; Brazil
published: 2022-02-20
 
This dataset contains the files used to perform the work savings and recall evaluation in the study titled "Data from Testing a filtering strategy for systematic reviews: Evaluating work savings and recall."
keywords: systematic reviews; machine learning; work savings; recall; search results filtering
published: 2022-02-11
 
The data contains a list of articles given low score by the RCT Tagger and an error analysis of them, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews". Change made in this V3 is that the data is divided into two parts: - Error Analysis of 44 Low Scoring Articles with MEDLINE RCT Publication Type. - Error Analysis of 244 Low Scoring Articles without MEDLINE RCT Publication Type.
keywords: Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews
published: 2022-02-09
 
The data file contains a list of articles with PMIDs information, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
keywords: Cochrane reviews; Randomized controlled trials; RCT; Automation; Systematic reviews
published: 2022-02-09
 
The data file contains a list of articles and their RCT Tagger prediction scores, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
keywords: Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews
published: 2022-01-20
 
This dataset provides a 50-state (and DC) survey of state-level enterprise zone laws, including summaries and analyses of zone eligibility criteria, eligible investments, incentives to invest in human capital and affordable housing, and taxpayer eligibility.
keywords: Enterprise Zones; tax incentives; state law
published: 2022-01-20
 
This dataset provides a 50-state (and DC) survey of state-level tax credits modeled after the federal New Markets Tax Credit program, including summaries of the tax credit amount and credit periods, key definitions, eligibility criteria, application process, and degree of conformity to federal law.
keywords: New Markets Tax Credits; NMTC; tax incentives; state law