
Datasets

published: 2022-07-25

Jett, Jacob (2022): SBKS - Species Noisy Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7146216_V1

This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4950847_V1) and collects entity mentions that were manually determined to be noisy, non-species entities.

keywords: synthetic biology; NERC data; species mentions; noisy entities

published: 2022-07-25

Jett, Jacob (2022): SBKS - Species Not Found Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5491578_V1

This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4950847_V1) for species entities and represents those that were determined to be species (i.e., were not noisy entities) but for which no corresponding concept could be found in the NCBI taxonomy database.

keywords: synthetic biology; NERC data; species mentions; not found entities

published: 2022-07-25

Jett, Jacob (2022): SBKS - Chemical Ambiguous Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2910468_V1

Related to the raw entity mentions (https://doi.org/10.13012/B2IDB-4163883_V1), this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the ChEBI ontology.

keywords: synthetic biology; NERC data; chemical mentions; ambiguous entities

published: 2022-07-25

Jett, Jacob (2022): SBKS - Chemical Raw Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4163883_V1

A set of chemical entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.

keywords: synthetic biology; NERC data; chemical mentions

published: 2022-07-25

Jett, Jacob (2022): SBKS - Chemical - Cleaned & Grounded Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3396059_V1

This dataset represents the results of manual cleaning and annotation of the entity mentions contained in the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1). Each mention has been consolidated and linked to an identifier for a matching concept from the ChEBI ontology.

keywords: synthetic biology; NERC data; chemical mentions; cleaned data; ChEBI ontology

published: 2022-07-25

Jett, Jacob (2022): SBKS - Chemical Noisy Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7228767_V1

This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1) and collects entity mentions that were manually determined to be noisy, non-chemical entities.

keywords: synthetic biology; NERC data; chemical mentions; noisy entities

published: 2022-07-25

Jett, Jacob (2022): SBKS - Chemical Not Found Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4570128_V1

This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4163883_V1) for chemical entities and represents those that were determined to be chemicals (i.e., were not noisy entities) but for which no corresponding concept could be found in the ChEBI ontology.

keywords: synthetic biology; NERC data; chemical mentions; not found entities

published: 2022-07-25

Jett, Jacob (2022): SBKS - Genes Raw Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3887275_V1

A set of gene and gene-related entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.

keywords: synthetic biology; NERC data; gene mentions

published: 2021-05-10

Fallaw, Colleen (2021): Data for Institutional Data Repository Development, a Moving Target. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7291801_V1

This dataset contains data used in the publication "Institutional Data Repository Development, a Moving Target", submitted to Code4Lib Journal. It is a tabular data file describing attributes of data files in datasets published in the Illinois Data Bank from 2016-04-01 to 2021-04-01.

keywords: institutional repository

published: 2022-07-25

Jett, Jacob (2022): SBKS - Celllines Raw Entity Mentions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8851803_V1

A set of cell-line entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.

keywords: synthetic biology; NERC data; cell-line mentions

published: 2022-07-11

Jeng, Amos; Bosch, Nigel; Perry, Michelle (2022): Data for: Sense of Belonging Predicts Perceived Helpfulness in Online Peer Help-Giving Interactions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2872989_V1

This dataset was developed as part of an online survey study that explores student characteristics that may predict what one finds helpful in replies to requests for help posted to an online college course discussion forum. 223 college students enrolled in an introductory statistics course were surveyed on their sense of belonging to their course community, as well as how helpful they found 20 examples of replies to requests for help posted to a statistics course discussion forum.

keywords: help-giving; discussion forums; sense of belonging; college student

published: 2022-06-20

Jiang, Ming; Dubnicek, Ryan; Worthey, Glen; Underwood, Ted; Downie, J. Stephen (2022): A Prototype Gutenberg-HathiTrust Sentence-level Parallel Corpus. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1685085_V1

This is a sentence-level parallel corpus in support of research on OCR quality. The source data comes from: (1) Project Gutenberg for human-proofread "clean" sentences; and (2) HathiTrust Digital Library for the paired sentences with OCR errors. In total, this corpus contains 167,079 sentence pairs from 189 sampled books in four domains (i.e., agriculture, fiction, social science, world war history) published from 1793 to 1984. There are 36,337 sentences that have two OCR views paired with each clean version. In addition to sentence texts, this corpus also provides the location (i.e., sentence and chapter index) of each sentence in the Gutenberg volume it belongs to.

keywords: sentence-level parallel corpus; optical character recognition; OCR errors; Project Gutenberg; HathiTrust Digital Library; digital libraries; digital humanities

published: 2022-02-20

Proescholdt, Randi; Hsiao, Tzu-Kun; Schneider, Jodi; Cohen, Aaron; McDonagh, Marian; Smalheiser, Neil (2022): Data from Testing a filtering strategy for systematic reviews: Evaluating work savings and recall. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9257002_V1

This dataset contains the files used to perform the work savings and recall evaluation in the study titled "Testing a filtering strategy for systematic reviews: Evaluating work savings and recall."

keywords: systematic reviews; machine learning; work savings; recall; search results filtering

published: 2022-02-11

Hoang, Khanh Linh; Schneider, Jodi; Kansara, Yogeshwar (2022): Error Analysis. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3407079_V3

The data contains a list of articles given low scores by the RCT Tagger and an error analysis of them, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews". The change made in V3 is that the data is divided into two parts:
- Error Analysis of 44 Low Scoring Articles with MEDLINE RCT Publication Type.
- Error Analysis of 244 Low Scoring Articles without MEDLINE RCT Publication Type.

keywords: Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews

published: 2022-02-09

Kansara, Yogeshwar; Hoang, Khanh Linh (2022): RCT Tagger Results. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6773581_V3

The data file contains a list of articles and their RCT Tagger prediction scores, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".

keywords: Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews

published: 2022-02-09

Kansara, Yogeshwar; Hoang, Khanh Linh (2022): Articles With PubMed Identifiers. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4623305_V3

The data file contains a list of articles with PMID information, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".

keywords: Cochrane reviews; Randomized controlled trials; RCT; Automation; Systematic reviews

published: 2022-02-04

Addepalli, Amulya; Ann Subin, Karen; Schneider, Jodi (2022): Dataset for Testing the Keystone Framework by Analyzing Positive Citations to Wakefield's 1998 Paper. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2532850_V1

keywords: retracted papers; knowledge maintenance; keystone citations; Wakefield; misinformation in science; Information Quality Lab

published: 2022-01-20

Layser, Michelle (2022): Multi-State Survey of State Enterprise Zone Laws (Last Updated Jan. 20, 2022). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8986969_V1

This dataset provides a 50-state (and DC) survey of state-level enterprise zone laws, including summaries and analyses of zone eligibility criteria, eligible investments, incentives to invest in human capital and affordable housing, and taxpayer eligibility.

keywords: Enterprise Zones; tax incentives; state law

published: 2022-01-20

Layser, Michelle (2022): Multi-State Survey of State New Markets Tax Credit Laws (Last Updated Jan. 19, 2022). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6263002_V1

This dataset provides a 50-state (and DC) survey of state-level tax credits modeled after the federal New Markets Tax Credit program, including summaries of the tax credit amount and credit periods, key definitions, eligibility criteria, application process, and degree of conformity to federal law.

keywords: New Markets Tax Credits; NMTC; tax incentives; state law

published: 2022-01-14

Layser, Michelle (2022): Multi-State Survey of State Opportunity Zones Laws (Last Updated Jan. 14, 2022). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4303513_V1

This dataset provides a 50-state (and DC) survey of state-level Opportunity Zones laws, including summaries of states' Opportunity Zone tax preferences, supplemental tax preferences, and approach to Opportunity Zones conformity. Data was last updated on January 14, 2022.

keywords: Opportunity Zones; tax incentives; state law

published: 2021-11-05

Keralis, Spencer D. C.; Yakin, Syamil (2021): Becoming A Trans Inclusive Library - Library Employee Survey. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0888551_V1

This data set contains survey results from a 2021 survey of University of Illinois University Library employees conducted as part of the Becoming A Trans Inclusive Library Project to evaluate the awareness of University of Illinois faculty, staff, and student employees regarding transgender identities, and to assess the professional development needs of library employees to better serve trans and gender non-conforming patrons. The survey instrument is available in the IDEALS repository: http://hdl.handle.net/2142/110080.

keywords: transgender awareness; academic library; gender identity awareness; professional development opportunities

published: 2021-11-05

Keralis, Spencer D. C.; Yakin, Syamil (2021): Becoming A Trans Inclusive Library - Patron Survey. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5994799_V1

This data set contains survey results from a 2021 survey of University of Illinois University Library patrons who identify as transgender or gender non-conforming conducted as part of the Becoming a Trans Inclusive Library Project to assess the experiences of transgender patrons seeking information and services in the University Library. Survey instruments are available in the IDEALS repository: http://hdl.handle.net/2142/110081.

keywords: transgender awareness; academic library; gender identity awareness; patron experience

published: 2021-08-05

Lotspeich-Yadao, Michael (2021): State of Illinois - Common Spatial Geodatabase for the Social Sciences. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4857915_V1

This geodatabase serves two purposes: 1) to provide State of Illinois agencies with a fast resource for the preparation of maps and figures that require the use of shape or line files from federal agencies, the State of Illinois, or the City of Chicago, and 2) as a start for social scientists interested in exploring how geographic information systems (whether this is data visualization or geographically weighted regression) can bring new meaning to the interpretation of their data. All layer files included are relevant to the State of Illinois. Sources for this geodatabase include the U.S. Census Bureau, U.S. Geological Survey, City of Chicago, Chicago Public Schools, Chicago Transit Authority, Regional Transportation Authority, and Bureau of Transportation Statistics.

keywords: State of Illinois; City of Chicago; Chicago Public Schools; GIS; Statistical tabulation areas; hydrography

published: 2021-07-30

Proescholdt, Randi (2021): RISRS Retraction Review - Field Variation Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2070560_V1

This data comes from a scoping review associated with the project called Reducing the Inadvertent Spread of Retracted Science. The data comprises a summary of the fields that have been explored by existing research on retraction, a list of studies comparing retraction in different fields, and a list of studies focused on retraction of COVID-19 articles.

keywords: retraction; fields; disciplines; research integrity

published: 2021-07-22

Hsiao, Tzu-Kun; Schneider, Jodi (2021): Dataset for "Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8255619_V2

This dataset includes five files. Descriptions of the files are given as follows:

FILENAME: PubMed_retracted_publication_full_v3.tsv
- Bibliographic data of retracted papers indexed in PubMed (retrieved on August 20, 2020, searched with the query "retracted publication" [PT] ).
- Except for the information in the "cited_by" column, all the data is from PubMed.
- PMIDs in the "cited_by" column that meet either of the two conditions below have been excluded from analyses:
[1] PMIDs of the citing papers are from retraction notices (i.e., those in the “retraction_notice_PMID.csv” file).
[2] Citing paper and the cited retracted paper have the same PMID.

ROW EXPLANATIONS
- Each row is a retracted paper. There are 7,813 retracted papers.

COLUMN HEADER EXPLANATIONS
1) PMID - PubMed ID
2) Title - Paper title
3) Authors - Author names
4) Citation - Bibliographic information of the paper
5) First Author - First author's name
6) Journal/Book - Publication name
7) Publication Year
8) Create Date - The date the record was added to the PubMed database
9) PMCID - PubMed Central ID (if applicable, otherwise blank)
10) NIHMS ID - NIH Manuscript Submission ID (if applicable, otherwise blank)
11) DOI - Digital object identifier (if applicable, otherwise blank)
12) retracted_in - Information of retraction notice (given by PubMed)
13) retracted_yr - Retraction year identified from "retracted_in" (if applicable, otherwise blank)
14) cited_by - PMIDs of the citing papers. (if applicable, otherwise blank) Data collected from iCite.
15) retraction_notice_pmid - PMID of the retraction notice (if applicable, otherwise blank)

FILENAME: PubMed_retracted_publication_CitCntxt_withYR_v3.tsv
- This file contains citation contexts (i.e., citing sentences) where the retracted papers were cited. The citation contexts were identified from the XML version of PubMed Central open access (PMCOA) articles.
- This is part of the data from: Hsiao, T.-K., & Torvik, V. I. (manuscript in preparation). Citation contexts identified from PubMed Central open access articles: A resource for text mining and citation analysis.
- Citation contexts that meet either of the two conditions below have been excluded from analyses:
[1] PMIDs of the citing papers are from retraction notices (i.e., those in the “retraction_notice_PMID.csv” file).
[2] Citing paper and the cited retracted paper have the same PMID.

ROW EXPLANATIONS
- Each row is a citation context associated with one retracted paper that's cited.
- In the manuscript, we count each citation context once, even if it cites multiple retracted papers.

COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) year - Publication year of the citing paper
4) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = tables and table/figure captions)
5) IMRaD - IMRaD section of the citation context (I = Introduction, M = Methods, R = Results, D = Discussions/Conclusion, NoIMRaD = not identified)
6) sentence_id - The ID of the citation context in a given location. For location information, please see column 4. The first sentence in the location gets the ID 1, and subsequent sentences are numbered consecutively.
7) total_sentences - Total number of sentences in a given location
8) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
9) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
10) citation - The citation context
11) progression - Position of a citation context by centile within the citing paper.
12) retracted_yr - Retraction year of the retracted paper
13) post_retraction - 0 = not post-retraction citation; 1 = post-retraction citation. A post-retraction citation is a citation made after the calendar year of retraction.

FILENAME: 724_knowingly_post_retraction_cit.csv (updated)
- The 724 post-retraction citation contexts that we determined knowingly cited the 7,813 retracted papers in "PubMed_retracted_publication_full_v3.tsv".
- Two citation contexts from retraction notices have been excluded from analyses.

ROW EXPLANATIONS
- Each row is a citation context.

COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) pub_type - Publication type collected from the metadata in the PMCOA XML files.
4) pub_type2 - Specific article types. Please see the manuscript for explanations.
5) year - Publication year of the citing paper
6) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, table_or_figure_caption = tables and table/figure captions)
7) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
8) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
9) citation - The citation context
10) retracted_yr - Retraction year of the retracted paper
11) cit_purpose - Purpose of citing the retracted paper. This is from human annotations. Please see the manuscript for further information about annotation.
12) longer_context - An extended version of the citation context. (if applicable, otherwise blank) Manually pulled from the full-texts in the process of annotation.

FILENAME: Annotation manual.pdf
- The manual for annotating the citation purposes in column 11) of the 724_knowingly_post_retraction_cit.tsv.

FILENAME: retraction_notice_PMID.csv (new file added for this version)
- A list of 8,346 PMIDs of retraction notices indexed in PubMed (retrieved on August 20, 2020, searched with the query "retraction of publication" [PT] ).
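
The post_retraction rule and the two exclusion conditions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names and the example PMIDs are hypothetical, the "cited_by" values are assumed to have already been split into a list of PMID strings (the description does not say how that column is delimited), and returning None for a blank retraction year is a choice made here rather than something the dataset documents.

```python
from typing import Iterable, List, Optional, Set


def post_retraction_flag(citing_year: int, retracted_year: Optional[int]) -> Optional[int]:
    """Return 1 if the citation was made after the calendar year of retraction, else 0.

    Assumption: a blank retracted_yr is passed in as None and yields None.
    """
    if retracted_year is None:
        return None
    return 1 if citing_year > retracted_year else 0


def keep_citing_pmids(
    retracted_pmid: str,
    cited_by: Iterable[str],
    retraction_notice_pmids: Set[str],
) -> List[str]:
    """Apply the two exclusion conditions to a list of citing PMIDs.

    [1] Drop citing PMIDs that belong to retraction notices.
    [2] Drop a citing PMID that equals the cited retracted paper's own PMID.
    """
    return [
        pmid
        for pmid in cited_by
        if pmid not in retraction_notice_pmids and pmid != retracted_pmid
    ]


if __name__ == "__main__":
    # Hypothetical values, not taken from the dataset.
    print(post_retraction_flag(2015, 2014))  # cited the year after retraction
    print(post_retraction_flag(2014, 2014))  # cited in the retraction year itself
    print(keep_citing_pmids("100", ["100", "200", "300"], {"300"}))
```

Under this reading, a citation published in the same calendar year as the retraction is not counted as post-retraction, which matches the "after the calendar year of retraction" wording.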

keywords: citation context; in-text citation; citation to retracted papers; retraction
