Illinois Data Bank Dataset Search Results
published:
2023-06-29
Pandit, Akshay; Karakoc, Deniz Berfin; Konar, Megan
(2023)
This database provides estimates of agricultural and food commodity flows [in both tons and $US] between the US and China for the year 2017. Pairwise information is provided between US states and Chinese provinces, and US counties and Chinese provinces for 7 Standardized Classification of Transported Goods (SCTG) commodity categories. Additionally, crosswalks are provided to match Harmonized System (HS) codes and China's Multi-Regional Input Output (MRIO) commodity sectors to their corresponding SCTG commodity codes. The included SCTG commodities are:
- SCTG 01: live animals and fish
- SCTG 02: cereal grains
- SCTG 03: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 04: animal feed, eggs, honey, and other products of animal origin
- SCTG 05: meat, poultry, fish, seafood, and their preparations
- SCTG 06: milled grain products and preparations, and bakery products
- SCTG 07: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Pandit et al. (2022) in Environmental Research Letters.
keywords:
Food flows; High-resolution; County-scale; Bilateral; United States; China
published:
2024-11-07
Fernandez-Materan, Francelys; Olivos-Caicedo, Kelly; Daniel, Steven; Walden, Kimberly; Fields, Christopher; Hernandez, Alvaro; Alves, Joao; Ridlon, Jason
(2024)
This dataset is part of a genome announcement. The main folder, PROKKA_results, contains nine Prokka v1.14.6 annotation outputs, one for each of nine Clostridium scindens genome sequences. Each output provides 12 files, including predicted protein sequences (.faa), nucleotide sequences of the predicted coding regions (.ffn), nucleotide sequences of the genome (.fna and .fsa), the annotated genome in GenBank format (.gbk), a log of the steps performed during annotation (.log), error messages or warnings (.err), annotations in Sequin format (.sqn), and summaries of the annotations in tabular (.tbl), tab-separated values (.tsv), and plain-text (.txt) formats.
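As a minimal sketch of working with these outputs, the function below reads a FASTA file such as a Prokka .faa file into a dictionary keyed by record ID (the first word of each header line). It assumes only standard FASTA layout; the function name and the path in the usage note are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA text into {record_id: sequence}.

    The record ID is taken as the first whitespace-separated word of the
    header line; multi-line sequences are concatenated.
    """
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)  # flush the last record
    return records
```

Usage, with a hypothetical path: `with open("PROKKA_results/genome1/genome1.faa") as f: proteins = read_fasta(f)`, after which `len(proteins)` gives the number of predicted proteins.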
keywords:
Clostridium scindens; genome annotation; PROKKA;
published:
2024-11-07
Zheng, Heng; Fu, Yuanxi; Vandel, Ellie; Schneider, Jodi
(2024)
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as citations for Willoughby et al., 2014:
Patrick H. Willoughby, Matthew J. Jansma, and Thomas R. Hoye (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042
We added the DOIs of the citing publications into a Zotero collection. Then we exported all 286 DOIs in two formats: a .csv file (data export) and an .rtf file (bibliography).
<b>Willoughby2014_286citing_publications.csv</b> is a Zotero data export of the citing publications.
<b>Willoughby2014_286citing_publications.rtf</b> is a bibliography of the citing publications, using a variation of the American Psychological Association style (7th edition) with full names instead of initials.
To create <b>Willoughby2014_citation_contexts.csv</b>, HZ manually extracted the paragraphs that contain a citation marker of Willoughby et al., 2014. We refer to these paragraphs as the citation contexts of Willoughby et al., 2014. Manual extraction started with the 286 citing publications but excluded 2 publications that are not in English (DOIs 10.13220/j.cnki.jipr.2015.06.004 and 10.19540/j.cnki.cjcmm.20200604.201).
The silver standard aimed to triage the citing publications of Willoughby et al., 2014 that are at risk of propagating unreliability due to a code glitch in a computational chemistry protocol introduced in Willoughby et al., 2014. The silver standard was created stepwise:
First, one chemistry expert (YF) manually annotated the corpus of 284 English-language citing publications, using their full text and citation contexts. She categorized each publication as either at risk or not at risk of propagating unreliability, with a rationale justifying each category.
Then we selected a representative sample of citation contexts to be double annotated. To do this, MJS turned the full dataset of citation contexts (Willoughby2014_citation_contexts.csv) into word embeddings, clustered them by similarity using BERTopic's HDBSCAN, and selected representative citation contexts based on the cluster centroids.
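The centroid-based selection step can be sketched as follows: given embeddings and cluster labels, pick for each cluster the item closest to the cluster mean. This is an illustrative reimplementation, not BERTopic's API; it assumes Euclidean distance and skips label -1, which HDBSCAN uses for noise points. The function name is hypothetical.

```python
import numpy as np


def representatives_by_centroid(embeddings, labels):
    """Return {cluster_label: index of the item nearest that cluster's centroid}.

    Assumption: Euclidean distance; noise points (label -1) are ignored.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    reps = {}
    for label in np.unique(labels):
        if label == -1:
            continue  # HDBSCAN noise, not a cluster
        idx = np.flatnonzero(labels == label)
        centroid = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        reps[int(label)] = int(idx[np.argmin(dists)])
    return reps
```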
Next, the second chemistry expert (EV) annotated the 77 publications associated with these citation contexts, considering the full text as well as the citation contexts.
<b>double_annotated_subset_77_before_reconciliation.csv</b> provides EV and YF's annotation before reconciliation.
To create the silver standard, YF, EV, and JS discussed and reconciled most differences. YF and EV had principled reasons for disagreeing on 9 publications; to handle these, YF updated the annotations to create the silver standard used for evaluation in the remainder of our JCDL 2024 paper (<b>silver_standard.csv</b>).
<b>Inter_Annotator_Agreement.xlsx</b> identifies publications on which the two annotators made opposite decisions and calculates the inter-annotator agreement both before and after reconciliation.
<b>double_annotated_subset_77_before_reconciliation.csv</b> provides EV and YF's annotation after reconciliation, including applying the reconciliation policy.
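The record does not say which agreement statistic the spreadsheet uses. As an illustration only, assuming Cohen's kappa over the two annotators' labels, the sketch below lists items with opposite decisions and computes kappa as (observed − expected) / (1 − expected). The function names and label strings are hypothetical.

```python
from collections import Counter


def opposite_decisions(labels_a, labels_b):
    """Indices of items on which the two annotators disagree."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)
```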
keywords:
unreliable cited sources; knowledge maintenance; citations; scientific digital libraries; scholarly publications; reproducibility; unreliability propagation; citation contexts
published:
2017-10-11
McEntee, Kenneth B.
(2017)
The International Registry of Reproductive Pathology Database is part of pioneering work by Dr. Kenneth McEntee to comprehensively document thousands of disease case studies. His large and comprehensive collection of case reports and physical samples was complemented by the development of the International Registry of Reproductive Pathology Database in the 1980s. The original FoxPro database files and a migrated Access version were completed by the College of Veterinary Medicine in 2016. CSV files exported from the Access version were completed by the University of Illinois Library in 2017.
keywords:
Animal Pathology; Databases; Veterinary Medicine
published:
2023-05-02
Lee, Jou; Schneider, Jodi
(2023)
Tab-separated value (TSV) file.
14745 data rows. Each data row represents publication metadata retrieved from Crossref (http://crossref.org) on 2023-04-05 when searching for retracted publications.
Each row has the following columns:
Index - Our index, starting with 0.
DOI - Digital Object Identifier (DOI) for the publication
Year - Publication year associated with the DOI.
URL - Web location associated with the DOI.
Title - Title associated with the DOI. May be blank.
Author - Author(s) associated with the DOI.
Journal - Publication venue (journal, conference, ...) associated with the DOI
RetractionYear - Retraction Year associated with the DOI. May be blank.
Category - One or more categories associated with the DOI. May be blank.
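A minimal sketch for loading the file, assuming it has a header row with exactly the column names listed above. The file's quoting convention is not documented, so the `csv` module's default quoting is an assumption. Blank cells become `None`, and `Index` becomes an integer. The function name and the file name in the usage note are hypothetical.

```python
import csv


def parse_retraction_tsv(lines):
    """Parse the TSV into a list of dicts, mapping blank cells to None."""
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for record in reader:
        # Title, RetractionYear, and Category may be blank.
        row = {key: (value if value else None) for key, value in record.items()}
        if row.get("Index") is not None:
            row["Index"] = int(row["Index"])  # 0-based index assigned by the authors
        rows.append(row)
    return rows
```

Usage: `with open("retractions.tsv", newline="", encoding="utf-8") as f: rows = parse_retraction_tsv(f)`.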
Our search used the Crossref REST API and queried the following update types:
Update_type=(
'retraction',
'Retraction',
'retracion',
'retration',
'partial_retraction',
'withdrawal','removal')
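The authors' query script is not included. As a sketch, assuming the Crossref REST API `/works` endpoint and its `update-type` filter, the function below builds one query URL per update type. The misspelled values are kept verbatim, presumably to catch misspelled deposited metadata. It only constructs URLs and does not make network calls. `rows=1000` is the API's per-page maximum, and a full harvest would also need paging.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"
UPDATE_TYPES = [
    "retraction", "Retraction", "retracion", "retration",
    "partial_retraction", "withdrawal", "removal",
]


def crossref_update_type_urls(update_types=UPDATE_TYPES, rows=1000):
    """Build one Crossref /works query URL per update type (no requests sent)."""
    urls = []
    for update_type in update_types:
        query = urlencode({"filter": f"update-type:{update_type}", "rows": rows})
        urls.append(f"{CROSSREF_WORKS}?{query}")
    return urls
```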
keywords:
retraction; metadata; Crossref; RISRS
published:
2024-09-16
Wu, Steven; Smith, Hannah
(2024)
This dataset describes an analysis of research documents about the debate between hydrogen fuel cells and lithium-ion batteries within the context of electric vehicles.
To create this dataset, we first analyzed news articles on the topic of sustainable development. We searched for related science using keywords in Google Scholar. We then identified subtopics and selected one specific subtopic, electric vehicles, and began identifying the positions and players in debates about electric vehicles [1].
Within electric vehicles, we searched OpenAlex for a topic of manageable size (about 300 documents) related to a scientific or technical debate. We narrowed the focus to electric vehicles and batteries, then trained a cluster model [2] on OpenAlex’s keywords to develop candidate search queries, and chose one.
Our final search query (May 7, 2024) returned 301 documents in OpenAlex:
Title & abstract includes: Electric Vehicle + Hydrogen + Battery
Filter: Lithium-ion Battery Management in Electric Vehicle
We used a Python script and the Scopus API to find missing abstracts and DOIs [3].
To identify relevant documents, we used a combination of Abstrackr [4] and manual screening. As a starting point for Abstrackr [4], one person manually screened 200 documents by checking the abstracts for “hydrogen fuel cells” and “battery comparisons”. We then used Abstrackr [4] to predict the relevance of the remaining documents based on their titles, abstracts, and keywords. We used single screening, ordered by most likely to be relevant, with a pilot size of 0, and set a threshold of 0.6 for the predictions. After screening and prediction, 176 documents remained.
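The screening step combines manual decisions with thresholded predictions. It can be sketched as below, where manual decisions take precedence and the remaining documents are kept if their predicted relevance reaches the 0.6 threshold. Whether the threshold is inclusive is not stated, so `>=` is an assumption. This is not Abstrackr's interface, and the names are hypothetical.

```python
def screen(doc_ids, manual_decisions, predictions, threshold=0.6):
    """Return the IDs of documents kept after screening.

    manual_decisions: {doc_id: bool} for manually screened documents.
    predictions: {doc_id: float} predicted relevance for the remaining ones.
    Assumption: a prediction >= threshold counts as relevant.
    """
    kept = []
    for doc_id in doc_ids:
        if doc_id in manual_decisions:
            if manual_decisions[doc_id]:
                kept.append(doc_id)
        elif predictions.get(doc_id, 0.0) >= threshold:
            kept.append(doc_id)
    return kept
```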
keywords:
controversy mapping; sustainable development; evidence synthesis; OpenAlex; Abstrackr; Scopus; meta-analysis; electric vehicle; hydrogen fuel cells; battery
published:
2022-07-25
This dataset represents the results of manual cleaning and annotation of the entity mentions contained in the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1). Each mention has been consolidated and linked to an identifier for a matching concept from the NCBI's taxonomy database.
keywords:
synthetic biology; NERC data; chemical mentions; cleaned data; ChEBI ontology
published:
2022-07-25
This dataset is derived from the raw entity mention dataset for chemical entities (https://doi.org/10.13012/B2IDB-4163883_V1). It contains the mentions that were determined to be chemicals (i.e., were not noisy entities) but for which no corresponding concept could be found in the ChEBI ontology.
keywords:
synthetic biology; NERC data; chemical mentions; not found entities
published:
2024-11-01
Zhang, Ziliang; Eddy, William C.; Stuchiner, Emily R.; DeLucia, Evan H.; Yang, Wendy
(2024)
This dataset includes data on soil nitrous oxide fluxes, soil properties, and climate presented in the manuscript "A conceptual model explaining spatial variation in soil nitrous oxide emissions in agricultural fields," published in Communications Earth & Environment. Please refer to that publication for details about the methodologies used to generate these data and for the experimental design.
keywords:
soil nitrous oxide emissions; gross nitrous oxide production; gross nitrous oxide consumption; N2O; denitrification; maize; cannon model
published:
2025-06-24
Ge, Jiankai; Weatherspoon, Howard; Peters, Baron
(2025)
This supporting information file contains code related to a pending publication: Ge et al., Proc. Natl. Acad. Sci. USA (revisions in review). The contents include Mathematica code that solves the Laplace-transformed equations and generates figures from the paper, as well as Python code that generates Figure 5 in the main text.
keywords:
Population balance model; Covalent organic framework; Nucleation; Growth;
published:
2022-07-25
A set of gene and gene-related entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords:
synthetic biology; NERC data; gene mentions
published:
2023-06-06
Korobskiy, Dmitriy; Chacko, George
(2023)
This dataset is derived from COCI, the OpenCitations Index of Crossref open DOI-to-DOI references (opencitations.net). Silvio Peroni, David Shotton (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies, 1(1): 428-444. https://doi.org/10.1162/qss_a_00023. We have curated it to remove duplicates, self-loops, and parallel edges. These data were copied from the OpenCitations website on May 6, 2023 and subsequently processed to produce a node list and an edge list. Integer IDs have been assigned to the DOIs to reduce memory and storage needs when working with these data. As noted on the OpenCitations website, each record is a citing-cited pair that uses DOIs as persistent identifiers.
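The curation described above can be sketched as follows: drop self-loops, drop repeated citing-cited pairs (duplicates and parallel edges), and assign consecutive integer IDs to DOIs. This is not the authors' pipeline. Lowercasing DOIs before comparison is an added assumption, on the grounds that DOIs are case-insensitive. The function name is hypothetical.

```python
def build_citation_graph(pairs):
    """Return (node_list, edge_list) from (citing_doi, cited_doi) pairs.

    node_list: [(integer_id, doi)]; edge_list: [(citing_id, cited_id)].
    Assumption: DOIs are normalized by stripping whitespace and lowercasing.
    """
    doi_to_id = {}
    seen_edges = set()
    edges = []
    for citing, cited in pairs:
        citing, cited = citing.strip().lower(), cited.strip().lower()
        if citing == cited:
            continue  # self-loop
        for doi in (citing, cited):
            if doi not in doi_to_id:
                doi_to_id[doi] = len(doi_to_id)  # next consecutive integer ID
        edge = (doi_to_id[citing], doi_to_id[cited])
        if edge in seen_edges:
            continue  # duplicate / parallel edge
        seen_edges.add(edge)
        edges.append(edge)
    nodes = [(node_id, doi) for doi, node_id in doi_to_id.items()]
    return nodes, edges
```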
keywords:
open citations; bibliometrics; citation network; scientometrics
published:
2024-04-19
Zhang, Yue; Zhao, Helin; Huang, Siyuan; Hossain, Mohhamad Abir; van der Zande, Arend
(2024)
Readme file for the data repository
*******************************************************************************
This repository contains raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". The data are arranged by the figure in which they first appear. For all electrical transfer measurements, we provide the up-sweep and down-sweep data, with voltage in V and conductance in S. All Raman mode positions are in cm^-1.
*******************************************************************************
How to use this dataset
All data in this dataset are stored in NumPy's binary array format as .npy files.
To read a .npy file, use the NumPy module in Python and call np.load().
Example: suppose the filename is example_data.npy. To load it, run the following in a Jupyter notebook or Python program:
import numpy as np
data = np.load("example_data.npy")
The contents of the example file are then stored in the data object.
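To load many files at once, a minimal sketch like the following reads every .npy file in a folder into a dictionary keyed by file name (without extension). The folder layout and the folder name in the usage note are assumptions. `np.load` keeps its default `allow_pickle=False`, which is sufficient for plain numeric arrays.

```python
from pathlib import Path

import numpy as np


def load_npy_folder(folder):
    """Load every .npy file in `folder` into {file_stem: array}."""
    arrays = {}
    for path in sorted(Path(folder).glob("*.npy")):
        arrays[path.stem] = np.load(path)
    return arrays
```

Usage, with a hypothetical folder name: `for name, arr in load_npy_folder("Figure1").items(): print(name, arr.shape, arr.dtype)`.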
*******************************************************************************
published:
2025-08-08
Remmers, Justin J.; Allen, Maximilian; Green, Austin M.
(2025)
Count histories from camera traps and remotely sensed covariate data used in N-mixture modeling to assess the site use intensity of raccoons in Illinois.
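The record does not give the model specification. As a sketch, assuming the standard binomial N-mixture model, each site's latent abundance N follows Poisson(λ) and each repeated count y_t follows Binomial(N, p). The marginal likelihood of one site's count history then sums over N from max(y) up to a truncation bound K. The choice of a Poisson mixing distribution, the bound K, and the function names are assumptions.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability mass, computed in log space for stability (lam > 0)."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def site_likelihood(counts, lam, p, K=100):
    """Marginal likelihood of one site's repeated counts under a binomial N-mixture model."""
    total = 0.0
    for n in range(max(counts), K + 1):  # latent abundance cannot be below the max count
        detection = 1.0
        for y in counts:
            detection *= binomial_pmf(y, n, p)
        total += poisson_pmf(n, lam) * detection
    return total
```

Summing the log of `site_likelihood` over sites gives the log-likelihood that would be maximized over λ and p, which covariates typically enter through log and logit links.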
published:
2024-05-29
Raghavan, Arjun; Romanelli, Marisa; Madhavan, Vidya
(2024)
Data from the manuscript "Atomic-Scale Visualization of a Cascade of Magnetic Orders in the Layered Antiferromagnet GdTe3," to be published in npj Quantum Materials. The PowerPoint file describes how to open the data and how the data are labeled.
keywords:
Scanning Tunneling Microscopy; Physics; GdTe3; Rare-Earth Tritellurides
published:
2025-02-08
Anne, Lahari; Park, Minhyuk; Warnow, Tandy; Chacko, George
(2025)
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and a clustering of it (produced by any algorithm) are used to supply input parameters to a stochastic block model (SBM) generator. The SBM output is then modified to improve its fit to the input real-world clusters, after which outlier nodes are added using one of three different options. See Anne et al. (2024), in press in Complex Networks and Their Applications XIII (preprint: arXiv:2408.13647).
The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/).
Input Networks:
The CEN can be downloaded from the Illinois Data Bank:
https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv
The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where
a - name of inspirational network, e.g., cit_hepph
b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01
c - the RECCS option used to approximate edge count and connectivity in the real-world network, e.g., v1
Thus, cit_hepph_0.01_v1.tsv.gz indicates that this network was modeled on the cit_hepph network and that RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph-tool software (Peixoto, Tiago P. (2014). The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14).
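Because network names such as cit_hepph and wiki_topcats themselves contain underscores, a file name following the a_b_c scheme is safest to split from the right. The sketch below does this. The function name is hypothetical.

```python
def parse_reccs_filename(filename):
    """Split '<network>_<resolution>_<version>.tsv[.gz]' into its parts.

    Network names may contain underscores, so the split is done from the right.
    """
    stem = filename
    for suffix in (".gz", ".tsv"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
    network, resolution, version = stem.rsplit("_", 2)
    return {"network": network, "resolution": float(resolution), "reccs_version": version}
```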
Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz). The experiment evaluates the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different RECCS configurations, varying the RECCS version (v1 or v2) and whether the input clustering was pre-processed with the Connectivity Modifier (CM++, Ramavarapu et al. (2024)). Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment.
Input Network: CEN
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows:
cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv
where:
cen – Indicates the network was modeled on the Curated Exosome Network (CEN).
resolution – The resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status – Either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version – The RECCS version used to generate the synthetic network (v1 or v2).
replicate_id – The specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
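Since cm_status can itself contain an underscore (no_cm), a regular expression is a straightforward way to parse the replicate file names. The sketch below follows the naming scheme and examples above. The function and pattern names are hypothetical.

```python
import re

REPL_PATTERN = re.compile(
    r"^cen_(?P<resolution>\d+(?:\.\d+)?)_(?P<cm_status>cm|no_cm)"
    r"_(?P<reccs_version>v\d+)_sample_(?P<replicate_id>\d+)\.tsv$"
)


def parse_replicate_filename(filename):
    """Parse 'cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv'."""
    match = REPL_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized replicate file name: {filename}")
    parts = match.groupdict()
    return {
        "resolution": float(parts["resolution"]),
        "cm_status": parts["cm_status"],
        "reccs_version": parts["reccs_version"],
        "replicate_id": int(parts["replicate_id"]),
    }
```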
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
keywords:
Community Detection; Synthetic Networks; Stochastic Block Model (SBM);
published:
2020-06-03
Zhang, Jun; Wuebbles, Donald; Kinnison, Douglas; Baughcum, Steven
(2020)
These datasets provide the basis of our analysis in the paper "Potential Impacts of Supersonic Aircraft on Stratospheric Ozone and Climate." All datasets here fall into two categories: emission data and model output data (WACCM). All model simulations (background and perturbation) were run to steady state, and only the datasets used in the analysis are archived here.
keywords:
NetCDF; Supersonic aircraft; Stratospheric ozone; Climate
published:
2022-07-25
A set of cell-line entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords:
synthetic biology; NERC data; cell-line mentions
published:
2025-04-24
This dataset includes the two .csv files underlying all analyses and results in the paper published with the same title. <b>1) 'sites.species.counts'</b> is the raw 2018-2022 data from Angella Moorehouse (Illinois Nature Preserves Commission), including her 456 identified pollinator species and her raw counts per site (there may be a few errors of identification or naming, and names will always change over time). Headers in columns F through Q correspond to the remnant-site labels in Figure 1 and Table 1 of the paper. Columns R to AB are the “nonremnant” sites, which have not been uniquely labelled since the specific sites aren't referenced anywhere in the manuscript. <b>2) 'C.scores'</b> contains the 265 species assigned empirical C values (empirical.C), along with the four sets of expert C values and their confidence ranks (low, medium, high), and the Illinois/Indiana conservation ranks (S-ranks), following the methods described in the paper.
Other headers in these files:
- taxa.code: four-letter abbreviation for genus and specific name
- genus: genus name
- species: specific epithet
- common.name: English name
- group: general pollinator taxa group
- empirical.C: empirically estimated conservatism score
- expert#.C: conservatism score assigned by each of four experts
- expert#.conf: expert's confidence in their conservatism score
Blank cells in the site-species abundance matrix indicate species absence (or non-detection).
Blank cells in C.scores.csv indicate missing S-ranks and unassigned C-scores (with associated missing confidence ranks) where experts lacked knowledge or confidence.
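As a sketch of using the site-species matrix, the function below counts the species recorded at each site, treating blank cells as absence per the note above. It assumes the metadata columns in sites.species.counts are exactly those in the header list above, and that every other column is a site. That column layout is an assumption, as is the function name.

```python
import csv

# Assumed metadata columns; every other column is treated as a site.
METADATA_COLUMNS = {"taxa.code", "genus", "species", "common.name", "group"}


def site_richness(lines):
    """Count species with a non-blank, non-zero count at each site."""
    reader = csv.DictReader(lines)
    site_columns = [c for c in reader.fieldnames if c not in METADATA_COLUMNS]
    richness = {site: 0 for site in site_columns}
    for row in reader:
        for site in site_columns:
            value = (row[site] or "").strip()
            # Blank cells mean absence or non-detection.
            if value and float(value) > 0:
                richness[site] += 1
    return richness
```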
keywords:
ecological conservatism; indicator values; pollinator conservation; prairie ecosystems; protected areas; remnant communities
published:
2025-11-19
Nahid, Shahriar Muhammad; Dong, Haiyue; Nolan, Gillian; Nam, Sungwoo; Mason, Nadya; Huang, Pinshane; van der Zande, Arend
(2025)
Room-temperature transfer curves; Benchmarking conductance; STEM images of charged domain walls; Temperature-dependent transfer curves; Scaling of conductance, hopping length, threshold voltage, trap density, and field-effect mobility with temperature; Magnetotransport data; Optical, AFM, and PFM images of different field-effect transistors; STEM images of contacts; Output and transfer curves of FETs; Temperature scaling of subthreshold swing and threshold voltage difference; Comparison of maximum field-effect mobility for different structures;
published:
2022-08-05
Liu, Baqiao; Shen, Chengze; Warnow, Tandy
(2022)
Simulated sequences provide a way to evaluate multiple sequence alignment (MSA) methods because the ground truth is exactly known. However, the realism of such simulated conditions is often questioned relative to empirical datasets. In particular, simulated data often do not display heterogeneity in sequence lengths, a common feature of biological datasets. To imitate sequence length heterogeneity, we present a dataset evolved under a mixture model of indel lengths, in which indels have an occasional chance of being promoted to long indels (emulating large insertion/deletion events, e.g., domain-level gain/loss). This dataset is otherwise (e.g., in GTR parameters) analogous to the 1000M condition presented in the SATe paper (doi: 10.1126/science.1171243), but it has 5000 sequences and was simulated with INDELible (http://abacus.gene.ucl.ac.uk/software/indelible/).
For more information, see README.txt. For the INDELible control files, see https://github.com/ThisBioLife/5000M-234-het.
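The mixture idea can be illustrated with a two-component sampler. Most indels take a short, geometrically distributed length, and with a small probability an indel is promoted to a long length. The actual INDELible model and parameters are in the control files linked above. Every distribution and value below is a hypothetical placeholder, not the dataset's settings.

```python
import random


def sample_indel_length(rng, p_long=0.01, short_mean=2.0, long_range=(100, 500)):
    """Draw one indel length from a two-component mixture (hypothetical parameters).

    Short indels: geometric on {1, 2, ...} with mean `short_mean` (>= 1).
    Long indels (probability p_long): uniform integer in `long_range`.
    """
    if rng.random() < p_long:
        return rng.randint(*long_range)  # promoted to a long indel
    q = 1.0 / short_mean  # per-trial success probability of the geometric
    length = 1
    while rng.random() >= q:
        length += 1
    return length
```

Usage: `rng = random.Random(42); lengths = [sample_indel_length(rng) for _ in range(1000)]`. Seeding the generator keeps the draw reproducible.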
keywords:
simulated data; sequence length heterogeneity; multiple sequence alignment;
published:
2023-07-28
Njuguna, Joyce; Clark, Lindsay; Lipka, Alexander; Anzoua, Kossonou; Bagmet, Larisa; Chebukin, Pavel; Dwiyanti, Maria; Dzyubenko, Elena; Dzyubenko, Nicolay; Ghimire, Bimal; Jin, Xiaoli; Johnson, Douglas; Nagano, Hironori; Peng, Junhua; Petersen, Karen; Sabitov, Andrey; Seong, Eun; Yamada, Toshihiko; Yoo, Ji; Yu, Chang; Zhao, Hu; Long, Stephen; Sacks, Erik
(2023)
This dataset supports a study of genome-wide association (GWA) and genomic prediction of biomass yield and 14 yield-component traits in Miscanthus sacchariflorus. We evaluated a diversity panel of 590 M. sacchariflorus accessions grown over four years at one subtropical and three temperate locations and genotyped with 268,109 single nucleotide polymorphisms (SNPs).
keywords:
Miscanthus sacchariflorus; genome-wide association analysis; genomic prediction; bioenergy; biomass
published:
2025-09-30
Huber, George; Guest, Jeremy; Santiago-Martinez, Leoncio; Bhagwat, Sarang; Kim, Min Soo
(2025)
This study advances the production of potassium sorbate (KS) from triacetic acid lactone (TAL) using the food-grade solvents ethanol (EtOH) and isopropyl alcohol (IPA). We previously demonstrated a route to produce KS from TAL with tetrahydrofuran (THF) as the main solvent, but THF is associated with environmental and health risks, especially for food applications. The process employs a catalytic approach in food-grade solvents and includes three main steps to produce KS from TAL: hydrogenation, etherification and hydrolysis, and ring-opening hydrolysis. In the synthesis of KS from TAL, IPA gives higher yields and shorter reaction times than EtOH. As a result, the overall reaction time in IPA was reduced to 35.7 h, compared to 42.1 h in our previous study using THF and EtOH, while achieving a comparable KS yield of 84% from TAL. The synthesized KS exhibits a trans-2, trans-4 geometrical configuration, identical to that of commercially available KS. Through techno-economic analysis (TEA) and life cycle assessment (LCA), we estimated that full-scale production of KS from sugarcane with the developed process in IPA could achieve a minimum product selling price (MPSP) of $8.27 per kg, with a range of $7.06–10.16 per kg [5th–95th percentiles from 6000 Monte Carlo simulations], and a carbon intensity (CI) of 13.7 [9.6–18.6] kg CO2-eq per kg. This study demonstrates the synthesis of KS from TAL in food-grade solvents with improved economic viability and environmental sustainability compared to our previous research (MPSP of $9.68 per kg [$8.47–11.45 per kg] and CI of 16.2 [12.0–21.2] kg CO2-eq per kg), because the total required reaction time decreases while the overall KS yield from TAL remains comparable.
keywords:
bioproducts; catalysis
published:
2021-11-04
Dawson, Matthew; Guzman Ruiz, Christian; Curtis, Jeffrey H.; Acosta, Mario C.; Zhu, Shupeng; Dabdub, Donald; Conley, Andrew; West, Matthew; Riemer, Nicole; Jorba, Oriol
(2021)
This dataset contains all the data for the results section of the paper "Chemistry Across Multiple Phases (CAMP) version 1.0: An integrated multi-phase chemistry model," submitted to Geoscientific Model Development (GMD). In this paper, two sets of simulations were run to test CAMP, and their results are included here: (1) box model inputs and outputs presented in Section 4.2 for modal, binned, and particle-resolved simulations, which compare the application of identical chemical mechanisms to different aerosol representations, and (2) the 3D Eulerian output presented in Section 4.3.
keywords:
Atmospheric chemistry; Aerosols and particles; Numerical Modeling
published:
2018-09-26
Cure, Anne; Calla, Bernarda; Berenbaum, May; Schuler, Mary
(2018)
Nucleotide sequences of wild parsnip CYP71AJ4 (angelicin synthase; <a href="https://www.ncbi.nlm.nih.gov/nuccore/EF191021">GenBank EF191021</a>) were obtained by Sanger sequencing. Seeds were harvested from individual plants in different populations to obtain the corresponding cDNA. The cDNA was cloned and directly sequenced. Amino acid translations were obtained using standard codon usage. CYP71AJ4 sequences (involved in angular furanocoumarin biosynthesis) were aligned to a reference sequence. Consistent amino acid differences were found between some populations. The relationship between sequence variability and selective pressure is not yet known.
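Translation with standard codon usage can be sketched using the standard genetic code (NCBI translation table 1), built compactly from the conventional TCAG-ordered amino-acid string. Stop codons are rendered as `*`, and codons containing ambiguous bases as `X`. A trailing partial codon is ignored. This is a generic illustration, not the tool the authors used.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence in frame 1 using the standard genetic code."""
    cds = cds.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
```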
keywords:
Pastinaca sativa; parsnip; furanocoumarins; psoralen