Illinois Data Bank
Deposit Dataset
Find Data
Policies
Guides
Contact Us
Log in with NetID
University Library, University of Illinois at Urbana-Champaign
Displaying datasets 1 - 25 of 550 in total
Subject Area
Life Sciences (292)
Social Sciences (123)
Physical Sciences (78)
Technology and Engineering (49)
Uncategorized (7)
Arts and Humanities (1)
Funder
U.S. National Science Foundation (NSF) (164)
Other (159)
U.S. Department of Energy (DOE) (56)
U.S. National Institutes of Health (NIH) (53)
U.S. Department of Agriculture (USDA) (30)
Illinois Department of Natural Resources (IDNR) (12)
U.S. National Aeronautics and Space Administration (NASA) (5)
U.S. Geological Survey (USGS) (5)
Illinois Department of Transportation (IDOT) (3)
U.S. Army (2)
Publication Year
2022 (111)
2021 (108)
2020 (96)
2019 (72)
2018 (59)
2023 (39)
2017 (35)
2016 (30)
License
CC0 (314)
CC BY (220)
custom (16)
published: 2023-06-01
Trapp, Robert (2023): tornado-PGW. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4479773_V1
published: 2023-06-01
Pan, Chao; Peng, Jianhao; Chien, Eli; Milenkovic, Olgica (2023): Embedded dataset in Poincare Balls. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6901251_V1
This dataset contains four real-world sub-datasets with data embedded into Poincare ball models: Olsson's single-cell RNA expression data, CIFAR10, Fashion-MNIST, and mini-ImageNet. Each sub-dataset has two corresponding files: the data file and the pre-computed reference points for each class in the sub-dataset. Please refer to our paper (https://arxiv.org/pdf/2109.03781.pdf) and code (https://github.com/thupchnsky/PoincareLinearClassification) for more details.
keywords:
Hyperbolic space; Machine learning; Poincare ball models; Perceptron algorithm; Support vector machine
published: 2023-06-01
Storms, Suzanna (2023): RT-LAMP as diagnostic tool for Influenza-A Virus detection in swine. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2079467_V1
Results of RT-LAMP reactions for influenza A virus diagnostic development.
keywords:
swine influenza; LAMP; gBlock
published: 2023-05-30
Clem, C. Scott; Hart, Lily V.; McElrath, Thomas C. (2023): Primary Occurrence Data for "Clem, Hart, & McElrath. 2023. A century of Illinois hover flies (Diptera: Syrphidae): Museum and citizen science data reveal recent range expansions, contractions, and species of potential conservation significance". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1613645_V1
Primary occurrence data for Clem, Hart, & McElrath. 2023. A century of Illinois hover flies (Diptera: Syrphidae): Museum and citizen science data reveal recent range expansions, contractions, and species of potential conservation significance. Included are a license.txt file, the cleaned occurrences from each of the six merged datasets, and a cleaned, merged dataset containing all occurrence records in one spreadsheet, formatted according to Darwin Core standards, with a few extra fields such as GBIF identifiers that were included in some of the original downloads.
keywords:
csv; occurrences; syrphidae; hover flies; flies; biodiversity; darwin core; darwin-core; GBIF; citizen science; iNaturalist
published: 2023-03-08
Majeed, Fahd; Khanna, Madhu (2023): Code and Data for "Carbon Mitigation Payments Can Reduce the Riskiness of Bioenergy Crop Production". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6296964_V1
A stochastic domination analysis model was developed to examine the effect that emerging carbon markets can have on the spatially varying returns and risk profiles of bioenergy crops relative to conventional crops. The code is written in MATLAB, and includes the calculated output. See the README file for instructions to run the code.
keywords:
bioenergy crops; economic modeling; stochastic domination analysis model
published: 2019-10-23
Ouldali, Hadjer; Sarthak, Kumar; Ensslen, Tobias; Piguet, Fabien; Manivet, Philippe; Pelta, Juan; Behrends, Jan C.; Aksimentiev, Aleksei; Oukhaled, Abdelghani (2019): Experiment and simulation raw data for "Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4905767_V1
Raw MD simulation trajectory, input and configuration files, SEM current data, and experimental raw data accompanying the publication, "Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore". README.md contains a description of all associated files.
keywords:
molecular dynamics; protein sequencing; aerolysin; nanopore sequencing
published: 2023-04-06
Yao, Lehan; Lyu, Zhiheng; Li, Jiahui; Chen, Qian (2023): Data for Unsupervised Sinogram Inpainting for Nanoparticle Electron Tomography (UsiNet) for missing wedge correction. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7963044_V1
Example data for UsiNet (https://github.com/chenlabUIUC/UsiNet). The data contain computer-simulated and experimental tilting series (or sinograms) of gold nanoparticles. Two training data examples are provided:
1. simulated_data.zip
2. experimental_data.zip
Each zip folder includes an image_data.zip and a training_data.zip. The former is for viewing; only the latter is needed for model training. For more details, please refer to our GitHub repository.
keywords:
electron tomography; deep learning
published: 2023-04-12
Towns, John; Hart, David (2023): XSEDE: Allocations Awards and Usage for the NSF Cyberfrastructure Portfolio, 2004-2022. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3731847_V1
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating to the start of the TeraGrid program in 2004 through the XSEDE operational period, which ended August 31, 2022. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen year span along with the evolution of researchers, fields of science, and institutional representation. Because the XSEDE program has ended, the allocation_award_history file includes all allocations activity initiated via XSEDE processes through August 31, 2022. The Resource Providers and successor program to XSEDE agreed to honor all project allocations made during XSEDE. Thus, allocation awards that extend beyond the end of XSEDE may not reflect all activity that may ultimately be part of the project award. Similarly, allocation usage data only reflects usage reported through August 31, 2022, and may not reflect all activity that may ultimately be conducted by projects that were active beyond XSEDE.
keywords:
allocations; cyberinfrastructure; XSEDE
published: 2023-05-02
Larsen, Ryan; Stanke, Kayla L. ; Rund, Laurie; Leyshon, Brian; Louie, Allison; Steelman, Andrew (2023): Dataset for "Automated identification of piglet brain issue from MRI images using Region-Based Convolutional Neural Networks". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5784165_V1
This dataset includes structural MRI head scans of 32 piglets, at 28 days of age, scanned at the University of Illinois. The dataset also includes manually drawn brain masks of each of the piglets. The dataset also includes brain masks that were generated automatically using Region-Based Convolutional Neural Networks (Mask R-CNN), trained on the manually drawn brain masks.
keywords:
Brain extraction; Machine learning; MRI; Piglet; neural networks
published: 2023-01-05
Tonks, Adam (2023): Data for the paper "Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3628170_V1
This is the data used in the paper "Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data". A preprint is available (https://doi.org/10.48550/arXiv.2212.11367). Code from the GitHub repository (https://github.com/adtonks/mosquito_GNN) can be used with the data here to reproduce the paper's results. v1.0.0 of the code is also archived (https://doi.org/10.5281/zenodo.7897830).
keywords:
west nile virus; machine learning; gnn; mosquito; trap; graph neural network; illinois; geospatial
published: 2023-05-08
Stickley, Samuel; Fraterrigo, Jennifer (2023): Microclimate Species Distribution Models for Plethodontid Salamanders in Great Smoky Mountains National Park. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1549958_V1
This dataset includes microclimate species distribution models at a ~3 m² spatial resolution and free-air temperature species distribution models at ~0.85 km² spatial resolution for three plethodontid salamander species (Desmognathus wrighti, Desmognathus ocoee, and Plethodon jordani) across Great Smoky Mountains National Park. We also include heatmaps representing the differences between microclimate and free-air species distribution models and polygon layers representing the fragmented habitat for each species' predicted range. All datasets include predictions for 2010, 2030, and 2050.
keywords:
Ecological niche modeling, microclimate, species distribution model, spatial resolution, range loss, suitable habitat, plethodontid salamanders, montane ecosystems
published: 2023-05-08
Bieber, John (2023): Dataset for Food availability influences angling vulnerability in muskellunge. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8452275_V1
Dataset for Food availability influences angling vulnerability in muskellunge
published: 2022-08-08
Shen, Chengze; Liu, Baqiao; Williams, Kelly; Warnow, Tandy (2022): Datasets for SALMA: Scalable ALignment using MAFFT-add. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1
This upload contains all datasets used in Experiments 2 and 3 of the SALMA paper (pending submission): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "SALMA: Scalable ALignment using MAFFT-Add".

The zip file has the following structure (presented as an example):

salma_paper_datasets/
|_README.md
|_10aa/
|_crw/
|_homfam/
   |_aat/
   |  |_...
   |_...
|_het/
   |_5000M2-het/
   |  |_...
   |_5000M3-het/
   ...
|_rec_res/

Generally, the structure can be viewed as: [category]/[dataset]/[replicate]/[alignment files]

# Categories:
1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.
2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned versions from Shen et al. 2022 (MAGUS+eHMM).
3. homfam: There are the 10 largest Homfam datasets, each with one replicate.
4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.
5. rec_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.

# Alignment files
There are at most 6 `.fasta` files in each sub-directory:
1. `all.unaln.fasta`: All unaligned sequences.
2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have them will be included.
3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).
4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have them will be included.
5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).
6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have them will be included.

>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.
>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.
>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.

# Additional file(s)
1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et al. 2010) to search for protein sequences.
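The backbone/query distinction described above (lengths within 25% of the median versus outside it) can be sketched in a few lines. This is only an illustration, not the SALMA authors' code: the function name `split_by_length`, the inclusive boundaries, and the toy sequences are assumptions.

```python
from statistics import median


def split_by_length(seqs: dict[str, str], tolerance: float = 0.25):
    """Split unaligned sequences into backbone (full-length) and query sets.

    A sequence counts as backbone when its length lies within `tolerance`
    of the median length. Treating the boundary as inclusive is an
    assumption; the dataset description does not say.
    """
    med = median(len(s) for s in seqs.values())
    lo, hi = med * (1 - tolerance), med * (1 + tolerance)
    backbone = {name: s for name, s in seqs.items() if lo <= len(s) <= hi}
    queries = {name: s for name, s in seqs.items() if name not in backbone}
    return backbone, queries


# Toy input (invented): three near-median sequences and one short fragment.
example = {
    "a": "ACGT" * 25,  # length 100
    "b": "ACGT" * 24,  # length 96
    "c": "ACGT" * 26,  # length 104
    "d": "ACGT" * 5,   # length 20, well below the median
}
backbone, queries = split_by_length(example)
```

With these toy inputs the median length is 98, so `a`, `b`, and `c` fall in the backbone set and `d` becomes a query sequence.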
keywords:
SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity
published: 2023-05-02
Lee, Jou; Schneider, Jodi (2023): Crossref data for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9099305_V1
Tab-separated value (TSV) file. 14745 data rows. Each data row represents publication metadata as retrieved from Crossref (http://crossref.org) 2023-04-05 when searching for retracted publications. Each row has the following columns:
Index - Our index, starting with 0.
DOI - Digital Object Identifier (DOI) for the publication.
Year - Publication year associated with the DOI.
URL - Web location associated with the DOI.
Title - Title associated with the DOI. May be blank.
Author - Author(s) associated with the DOI.
Journal - Publication venue (journal, conference, ...) associated with the DOI.
RetractionYear - Retraction Year associated with the DOI. May be blank.
Category - One or more categories associated with the DOI. May be blank.
Our search was via the Crossref REST API and searched for: Update_type=( 'retraction', 'Retraction', 'retracion', 'retration', 'partial_retraction', 'withdrawal','removal')
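As a rough illustration of working with a file in this layout, the sketch below tallies rows by RetractionYear. It assumes the TSV begins with a header row carrying the column names listed above, which the description does not confirm; the function name and the two sample rows (including their DOIs) are invented placeholders.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
COLUMNS = ["Index", "DOI", "Year", "URL", "Title", "Author",
           "Journal", "RetractionYear", "Category"]


def retractions_per_year(tsv_text: str) -> Counter:
    """Count rows per RetractionYear; blank values are grouped as 'unknown'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        year = (row.get("RetractionYear") or "").strip()
        counts[year or "unknown"] += 1
    return counts


# Invented placeholder rows, not taken from the real file.
sample = "\n".join([
    "\t".join(COLUMNS),
    "\t".join(["0", "10.0000/example.1", "2018", "http://example.org/1",
               "A title", "Doe, J.", "Some Journal", "2020", "LifeScience"]),
    "\t".join(["1", "10.0000/example.2", "2019", "http://example.org/2",
               "", "Roe, R.", "Other Journal", "", ""]),
]) + "\n"

counts = retractions_per_year(sample)
```

`quoting=csv.QUOTE_NONE` keeps quotation marks inside titles from being interpreted as field delimiters, which is a cautious default for bibliographic text.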
keywords:
retraction; metadata; Crossref; RISRS
published: 2023-05-02
Lee, Jou; Schneider, Jodi (2023): Keywords for manual field assignment for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8847584_V1
We used these keywords to identify categories for journals and conferences not in Scopus, for our manuscript "Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science". Each of these 4 text files contains keywords in the form: 'keyword1', 'keyword2', 'keyword3', ..... The file title indicates the name of the category:
HealthScience.txt
LifeScience.txt
PhysicalScience.txt
SocialScience.txt
Each file was generated from a combination of software and manual review in an iterative process in which we:
- First, included keywords found using Yet Another Keyword Extractor <https://pypi.org/project/yake/> on the Scopus source list as of January 2023 <https://www.elsevier.com/?a=91122>.
- Second, assigned journals and conferences to one or more categories when they matched a keyword from that category list.
- Third, reviewed uncategorized items to manually curate additional keywords in English and close cognates (e.g., Kardiologie). Titles in other languages or using terminology with multiple potential meanings were left uncategorized.
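The keyword-to-category assignment described above might look roughly like the following. This is a sketch under stated assumptions, not the authors' procedure: the matching rule (case-insensitive substring match on the venue title), the helper names `parse_keywords` and `categorize`, and the sample keywords are all hypothetical.

```python
import re


def parse_keywords(text: str) -> list[str]:
    """Parse a string of the form 'keyword1', 'keyword2', ... into a list.

    Keywords that themselves contain an apostrophe would need extra
    handling that this sketch does not attempt.
    """
    return [k.strip().lower() for k in re.findall(r"'([^']*)'", text) if k.strip()]


def categorize(title: str, keyword_lists: dict[str, list[str]]) -> list[str]:
    """Return every category whose keyword list matches the title.

    Matching here is a plain case-insensitive substring test (an assumption).
    An empty result means the title stays uncategorized.
    """
    lowered = title.lower()
    return [cat for cat, kws in keyword_lists.items()
            if any(k in lowered for k in kws)]


# Invented keyword lists standing in for the contents of the real files.
keyword_lists = {
    "HealthScience": parse_keywords("'cardiology', 'nursing'"),
    "PhysicalScience": parse_keywords("'physics', 'chemistry'"),
}

result = categorize("Journal of Cardiology and Medical Physics", keyword_lists)
unmatched = categorize("Revista de Historia", keyword_lists)
```

A title can land in more than one category, which mirrors the "one or more categories" wording in the description.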
keywords:
scientometrics; field; keywords; science of science; meta-science; RISRS
published: 2023-04-06
Warnow, Tandy; Park, Minhyuk (2023): INDELible simulated datesets with sequence length heterogeneity. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0900513_V1
This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.
keywords:
sequence length heterogeneity;indelible;computational biology;multiple sequence alignment
published: 2023-04-19
Ferrer, Astrid (2023): Assembly of wood-inhabiting archaeal, bacterial and fungal communities along a salinity gradient: common taxa are broadly distributed but locally abundant in preferred habitats. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3000894_V1
Supplemental data sets for the manuscript entitled "Assembly of wood-inhabiting archaeal, bacterial and fungal communities along a salinity gradient: common taxa are broadly distributed but locally abundant in preferred habitats".
keywords:
wood decomposition; aquatic fungi; aquatic bacteria; aquatic archaea; microbial succession; microbial life-history
published: 2023-04-05
Hartman, Jordan H. ; Tiemann, Jeremy S. ; Sherwood, Joshua L.; Willink, Philip W.; Ash, Kurt T. ; Davis, Mark A. ; Larson, Eric (2023): Data for "Eastern banded killifish (Fundulus diaphanus diaphanus) in Lake Michigan and connected watersheds: the invasion of a non-native subspecies". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9766947_V1
Data associated with the manuscript "Eastern banded killifish (Fundulus diaphanus diaphanus) in Lake Michigan and connected watersheds: the invasion of a non-native subspecies" by Jordan H. Hartman, Jeremy S. Tiemann, Joshua L. Sherwood, Philip W. Willink, Kurt T. Ash, Mark A. Davis, and Eric R. Larson. For this project, we sampled 109 locations in Lake Michigan and connected waters and found 821 total banded killifish. Using mitochondrial DNA analysis, we found 31 eastern and 25 western haplotypes which split our banded killifish into 422 eastern banded killifish and 398 western banded killifish. This dataset provides the sampling locations, banded killifish haplotypes, frequency of those haplotypes per location, accession numbers in GenBank, and the associated mitochondrial DNA sequences.
keywords:
intraspecific invasion; Lake Michigan; mtDNA; native transplant
published: 2023-04-12
Han, Edmund; Nahid, Shahriar Muhammad; Rakib, Tawfiqur; Nolan, Gillian; F. Ferrari, Paolo; Hossain, M. Abir ; Schleife, André ; Nam, SungWoo; Ertekin, Elif; van der Zande, Arend; Huang, Pinshane (2023): Data for Bend-induced ferroelectric domain walls in α-In2Se3. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1187822_V1
STEM images of kinks in α-In2Se3, DFT calculations of bending in α-In2Se3, and PFM on as-exfoliated and controllably bent α-In2Se3.
published: 2023-04-02
Lee, Yuanyao; Khanna, Madhu; Chen, Luoye (2023): Code and Data for "Quantifying Uncertainties in Greenhouse Gas Savings and Mitigation Costs with Cellulosic Biofuels". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4326514_V1
The use of cellulosic biofuels from non-feedstocks is modeled using BEPAM (Biofuel and Environmental Policy Analysis Model) to quantify the uncertainties about induced land use change effects, net greenhouse gas saving potential, and economic costs. The code is written in GAMS, the General Algebraic Modeling System. NOTE: Column 3 is titled "BAU" in "merged_BAU.gdx", "merged_RFS.gdx", and "merged_CEM.gdx", but contains "RFS" data in "merged_RFS.gdx" and "CEM" data in "merged_CEM.gdx".
keywords:
cellulosic biomass; BEPAM; economic modeling
published: 2023-03-27
Littlefield, Alexander; Xie, Dajie; Richards, Corey; Ocier, Christian; Gao, Haibo; Messinger, Jonah; Ju, Lawrence; Gao, Jingxing; Edwards, Lonna; Braun, Paul; Goddard, Lynford (2023): Data for Enabling High Precision Gradient Index Control in Subsurface Multiphoton Lithography. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3190140_V1
This dataset contains the full data used in the paper titled "Enabling High Precision Gradient Index Control in Subsurface Multiphoton Lithography," available at https://doi.org/10.1021/acsphotonics.2c01950 . The data used for Table 1 can be found in the dataset for the related Figure 8. Some supplemental figures' data can be found in the main figures data: Figure S2's data is contained in Figure 6. Figure S4 and Table S1 data is derived from Figure 6. Figure S9 is derived from Figure 7. Figure S10 is contained in Figure 7. Figure S12 is derived from Figure 6 and the Python code prism-fringe-analysis. Figures without a data file named after them do not have any data affiliated with them and are purely graphical representations.
published: 2023-03-30
Njuguna, Joyce; Clark, Lindsay; Lipka, Alexander; Anzoua, Kossonou; Bagmet, Larisa; Chebukin, Pavel; Dwiyanti, Maria; Dzyubenko, Elena; Dzyubenko, Nicolay; Ghimire, Bimal; Jin, Xiaoli; Johnson, Douglas; Kjeldsen, Jens; Nagano, Hironori; Oliverira, Ivone; Peng, Junhua; Petersen, Karen; Sabitov, Andrey; Seong, Eun; Yamada, Toshihiko; Yoo, Ji; Yu, Chang; Zhao, Hu; Munoz, Patricio; Long , Stephen; Sacks, Erik (2023): Impact of genotype-calling methodologies on genome-wide association and genomic prediction in polyploids. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4829913_V1
This dataset contains all data used in the paper "Impact of genotype-calling methodologies on genome-wide association and genomic prediction in polyploids". The dataset includes genotypic and phenotypic data from two autotetraploid species, Miscanthus sacchariflorus and Vaccinium corymbosum, that were used for genome-wide association studies and genomic prediction, along with the scripts used in the analysis.
keywords:
Polyploid; allelic dosage; Bayesian genotype-calling; Genome-wide association; Genomic prediction
published: 2023-03-28
Hsiao, Tzu-Kun; Torvik, Vetle (2023): OpCitance: Citation contexts identified from the PubMed Central open access articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4353270_V2
Sentences and citation contexts identified from the PubMed Central open access articles
----------------------------------------------------------------------
The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao TK., & Torvik V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles.

Files:
• A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A.
• B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B.
• C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C.
• D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D.
• E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E.
• F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F.
• G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G.
• H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H.
• I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I.
• J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J.
• K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K.
• L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L.
• M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M.
• N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N.
• O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O.
• P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1).
• P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2).
• Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q.
• R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R.
• S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S.
• T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T.
• UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V.
• W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W.
• XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z.

Each row in the file is a sentence/citation context and contains the following columns:
• pmcid: PMCID of the article.
• pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
• location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
• IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
• sentence_id: The ID of the citation context/sentence in the article component.
• total_sentences: The number of sentences in the article component.
• intxt_id: The ID of the citation.
• intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
• intxt_pmid_source: The sources where the intxt_pmid can be identified. Xml represents that the PMID is only identified from the XML file; xml,pmc represents that the PMID is not only from the XML file, but also in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
• intxt_mark: The citation marker associated with the inline citation.
• best_id: The best source link ID (e.g., PMID) of the citation.
• best_source: The sources that confirm the best ID.
• best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
• citation: A citation context. If no citation is found in a sentence, the value is the sentence.
• progression: Text progression of the citation context/sentence.

Supplementary Files:
• PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column. Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
  • pmcid: PMCID of the citing article.
  • pos: The citation's position in the reference list.
  • fromPMID: PMID of the citing article.
  • toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
  • SRC: The sources that confirm the toPMID.
  • MatchDB: The origin bibliographic database of the toPMID.
  • Probability: The match probability of the toPMID.
  • toPMID2: PMID of the citation (as tagged in the XML file).
  • SRC2: The sources that confirm the toPMID2.
  • intxt_id: The ID of the citation.
  • journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
  • same_ref_string: Whether the citation string appears in the reference list more than once.
  • DIFF: The comparison result between the toPMID column and the toPMID2 column.
  • bestID: The best source link ID (e.g., PMID) of the citation.
  • bestSRC: The sources that confirm the best ID.
  • Match: Matching result produced by Patci.
  [1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885
• intxt_cit_license_fromPMC.tsv – This file contains the CC licensing information for each article. The licensing information is from PMC's file lists [2], retrieved on June 19, 2020, and March 9, 2023. It should be noted that the license information for 189,855 PMCIDs is NO-CC CODE in the file lists, and 521 PMCIDs are absent in the file lists. The absence of CC licensing information does not indicate that the article lacks a CC license. For example, PMCID: 6156294 (NO-CC CODE) and PMCID: 6118074 (absent in the PMC's file lists) are under CC-BY licenses according to their PDF versions of articles. The intxt_cit_license_fromPMC.tsv file has two columns:
  • pmcid: PMCID of the article.
  • license: The article’s CC license information provided in PMC’s file lists. The value is nan when an article is not present in the PMC’s file lists.
  [2] https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/
• Supplementary_File_1.zip – This file contains the code for generating the dataset.
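Because the *_journal_IntxtCit.tsv files are large, streaming them row by row is likely more practical than loading them whole. The sketch below counts rows per IMRaD section. It assumes each file starts with a header row naming the columns described above, which the description does not state; the function name and the three sample rows (which use only a subset of the columns) are invented.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_imrad(lines: Iterable[str]) -> Counter:
    """Stream tab-delimited rows and count them per IMRaD value.

    `lines` can be an open file handle, so the file is never held in
    memory in full. Assumes a header row that includes an IMRaD column.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["IMRaD"] for row in reader)


# Invented sample rows with a reduced column set, for illustration only.
sample_lines = [
    "pmcid\tIMRaD\tcitation\n",
    "PMC0000001\tI\tA sentence with a citation marker [1].\n",
    "PMC0000001\tI\tAnother introductory sentence.\n",
    "PMC0000001\tNoIMRaD\tA sentence from an unlabeled section.\n",
]

imrad_counts = count_by_imrad(sample_lines)
```

For a real file, something like `with open(path, encoding="utf-8", newline="") as fh: count_by_imrad(fh)` would apply the same logic, where `path` and the encoding are again assumptions.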
keywords:
citation context; in-text citation; inline citation; bibliometrics; science of science
published: 2023-03-24
Zhang, Jun (2023): Potential Impacts on Ozone and Climate from a Proposed Fleet of Supersonic Aircraft. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0038951_V1
These datasets provide the basis of our analysis in the paper "Potential Impacts on Ozone and Climate from a Proposed Fleet of Supersonic Aircraft". All datasets here can be categorized as either emission data or model output data (WACCM). All the model simulations (background and perturbation) were run to steady state, and only the datasets used in the analysis are archived here.
keywords:
NetCDF; Supersonic aircraft; Stratospheric ozone; Climate
published: 2023-03-16
Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George (2023): Data For Well-Connected Communities In Real Networks. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0908742_V1
Curated networks and clustering output from the manuscript: Well-Connected Communities in Real-World Networks https://arxiv.org/abs/2303.02813
keywords:
Community detection; clustering; open citations; scientometrics; bibliometrics