Illinois Data Bank Dataset Search Results
Results
published:
2023-03-16
Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George
(2023)
Curated networks and clustering output from the manuscript "Well-Connected Communities in Real-World Networks" (https://arxiv.org/abs/2303.02813).
keywords:
Community detection; clustering; open citations; scientometrics; bibliometrics
published:
2024-02-16
Mohasel Arjomandi, Hossein; Korobskiy, Dmitriy; Chacko, George
(2024)
This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of OpenCitations to an edge list using integer ids that we assigned. The integer ids can be mapped to OMIDs, PMIDs, and DOIs using the open_citation_jan2024_pubs.csv and open_citations_jan2024_pub_ids.csv files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found at https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations.
(ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a PMID and a DOI and that have a title and an abstract. In each indicator column, a value of 1 indicates that the information exists in the metadata and 0 indicates that it does not. Code for generating these data: https://github.com/illinois-or-research-analytics/pubmed_etl. If you use these data or code in your work, please cite https://doi.org/10.13012/B2IDB-5216575_V1.
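A minimal Python sketch of how the integer-id edge list might be translated back to DOIs. The column names (`integer_id`, `doi`), the two-column (citing, cited) edge layout, and all function names are hypothetical assumptions, not taken from the files; check the actual headers of open_citation_jan2024_pubs.csv and the edge-list files before adapting it.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def load_id_to_doi(rows: Iterable[Dict[str, str]],
                   id_col: str = "integer_id",   # hypothetical column name
                   doi_col: str = "doi") -> Dict[int, str]:  # hypothetical column name
    """Build an integer-id -> DOI lookup, skipping rows that have no DOI."""
    mapping: Dict[int, str] = {}
    for row in rows:
        doi = row.get(doi_col, "").strip()
        if doi:
            mapping[int(row[id_col])] = doi
    return mapping


def edges_as_dois(edges: Iterable[Tuple[int, int]],
                  id_to_doi: Dict[int, str]) -> Iterator[Tuple[str, str]]:
    """Yield (citing DOI, cited DOI) pairs; edges with an unmapped endpoint are dropped."""
    for citing, cited in edges:
        if citing in id_to_doi and cited in id_to_doi:
            yield id_to_doi[citing], id_to_doi[cited]


def read_gz_csv(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows from a gzip-compressed CSV without decompressing it to disk."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle)
```

Streaming the compressed file keeps disk usage low, but a lookup over roughly 121 million nodes is still large in memory, so a database join may be more practical at full scale.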
keywords:
PubMed
published:
2025-08-16
Park, Minhyuk; Lamy, João AC; Rodrigues, Esther CC; Ferreira, Felipe Mariano; Vu-Le, The-Anh; Warnow, Tandy; Chacko, George
(2025)
The data within consist of compressed output files in the form of edgelists (*.edgelist.gz) and nodelists (*.aux.parquet) from large citation network simulations using an agent-based model. The code and instructions are available at https://github.com/illinois-or-research-analytics/SASCA. In addition, we provide a distribution of citation frequencies drawn from a random sample of PubMed journal articles (pooled_50k_pubmed_unique.csv) and a table of recencies, i.e., the frequency with which citations are made to the previous year, the year before that, and so on (recency_probs_percent_stahl_filled.csv). A manuscript describing the SASCA-s simulator has been submitted for review and will be referenced in a future version of this data repository if it is accepted. The prefixes sj and er refer to the real-world network and the Erdős-Rényi random graph, respectively, that were used to initiate simulations. These 'seed' networks are available from the GitHub site referenced above.
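The recency table can be read as a probability distribution over citation lags. The sketch below is not the SASCA-s implementation; it only illustrates how cited-paper years for one new paper could be drawn from such a table. It assumes the percentages have already been loaded into a dict keyed by lag in years (1 = previous year); the function name and that layout are hypothetical.

```python
import random
from typing import Dict, List


def sample_citation_years(publication_year: int,
                          recency_percent: Dict[int, float],
                          n_citations: int,
                          rng: random.Random) -> List[int]:
    """Draw the publication years of the papers cited by one new paper.

    recency_percent maps a lag in years (1 = previous year, 2 = the year
    before that, ...) to the percentage of citations made at that lag.
    """
    lags = sorted(recency_percent)
    weights = [recency_percent[lag] for lag in lags]
    if not any(w > 0 for w in weights):
        raise ValueError("recency table needs at least one positive weight")
    # random.choices normalizes the weights, so they need not sum to exactly 100
    drawn = rng.choices(lags, weights=weights, k=n_citations)
    return [publication_year - lag for lag in drawn]
```

Passing an explicit `random.Random` instance keeps simulation runs reproducible for a given seed.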
keywords:
benchmark networks; agent-based models; simulation; citation
published:
2025-08-17
This code implements the master equation microkinetic modeling (ME-MKM) calculations of Adams et al. (J. Phys. Chem. C 2025, 129, 15, 7285–7294), as well as the automatic derivatives for activation energies and reaction orders used in their follow-up work (in review).
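For orientation, apparent activation energy and reaction order are conventionally defined as logarithmic derivatives of the net rate r, with R the gas constant, T the temperature, and p_i the partial pressure of species i. Whether the ME-MKM code uses exactly these conventions (units, which variables are held fixed) is an assumption to verify against the paper and code.

```latex
E_{\mathrm{app}} = R T^{2} \left( \frac{\partial \ln r}{\partial T} \right)_{p_i},
\qquad
n_i = \left( \frac{\partial \ln r}{\partial \ln p_i} \right)_{T,\; p_{j \neq i}}
```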
keywords:
Microkinetic model; master equation; periodic tiling; catalysis; adsorption
published:
2025-09-08
Zinnen, Jack; Chase, Marissa; Charles, Brian; Harmon-Threatt, Alexandra; Matthews, Jeffrey
(2025)
This is the data set for the article entitled "Pollinator seed mixes are phenologically dissimilar to prairie remnants," a manuscript pending publication in Restoration Ecology. This represents the core phenology data of prairie remnant and pollinator seed mixes that were used for the main analyses. Note that additional data associated with the manuscript are intended to be published as a supplement in the journal.
* In this V2, a second tab was added to the Rest.Ecol.data.xlsx file. This new sheet lists the original data source citations that match the RELIX database, a sister project.
keywords:
native plants; ecological restoration; tallgrass prairie; native plant materials
published:
2025-09-08
Lee, DoKyoung; Heaton, Emily; Umar, Muhammad; Jang, Chunhwa; Namoi, Nictor
(2025)
Purpose-grown perennial herbaceous species are nonfood crops specifically cultivated for bioenergy production and have the potential to secure bioenergy feedstock resources while enhancing ecosystem services. This study assessed soil greenhouse gas emissions (CO2 and N2O), nitrate (NO3-N) leaching reduction potential, evapotranspiration (ET), and water-use efficiency (WUE) of bioenergy switchgrass (Panicum virgatum L.) in comparison to corn (Zea mays L.). The study was conducted on field-scale plots in Urbana, IL, during the 2020–2022 growing seasons. Switchgrass was established in 2020 and urea-fertilized at 56 kg N ha−1 year−1. Corn management followed best management practices for the US Midwest, including no-till and 202 kg N ha−1 year−1 fertilization, applied as urea–ammonium nitrate (32%). Our results showed lower direct N2O emissions in switchgrass compared to corn. Although soil CO2 emissions did not differ significantly during the establishment year, emissions in subsequent years were over 50% higher in switchgrass than in corn, likely due to increased belowground biomass, which was over five times higher in switchgrass. Nitrate-N leaching decreased as the switchgrass stand matured, reaching 80% lower than in corn by the third year. Differences in ET and WUE between corn and switchgrass were not significant; however, results indicate a trend toward reduced WUE in switchgrass under drought, driven by lower aboveground biomass production. Our study demonstrates that switchgrass can be implemented at a commercial scale without negatively impacting the hydrological cycle, while potentially reducing N losses through nitrate-N leaching and soil N2O emissions, and enhancing belowground C storage.
keywords:
field data; perennial bioenergy grasses; soil; switchgrass
published:
2025-09-08
Singh, Vijay; Raj, Tirath
(2025)
Miscanthus x giganteus (Mxg) is a promising perennial crop for producing natural colorants, renewable fuels, and bioproducts. However, its natural recalcitrance and high pretreatment cost are major barriers to its complete conversion. In this study, a green processing method was investigated for the efficient recovery of natural pigments (anthocyanins), fermentable sugars, and pure lignin from Mxg genotypes using choline chloride-based natural deep eutectic solvent (NADES) systems. Processing the biomass with choline chloride:lactic acid (ChCl:LA) NADES yielded 67.8 ± 2.1 μg g−1 of anthocyanins from dry biomass. Maximum glucose yields of 87.4%–94.1% were achieved after enzymatic saccharification. Effective extraction of high-purity lignin rich in β-aryl ether (β-O-4) bonds from advanced crops is crucial for lignin valorization. Notably, highly pure lignin (≈93.4% ± 1.4%) was obtained after low-temperature NADES pretreatment while retaining lignin’s native structure. ³¹P nuclear magnetic resonance (NMR) showed a total phenolic hydroxyl content of 1.20 mmol g−1 for ChCl:LA-lignin. Heteronuclear single quantum coherence (HSQC) analysis showed a relative monolignol composition of 19.0% syringyl (S), 65.7% guaiacyl (G), and 14.3% p-hydroxyphenyl (H) units. This study provides a novel approach for obtaining high-purity lignin for catalytic depolymerization into oligomers and bifunctional monoaromatics, and it leverages current cellulosic biorefinery technologies.
keywords:
biomass analytics; feedstock bioprocessing; inter-brc; miscanthus
published:
2025-09-06
4D-STEM datasets for the solution-treated (CrCoNi)93Al4Ti2Nb MEA along the [111], [112], and [114] zone axes. These data were used for the Ultramicroscopy article "Differentiating electron diffuse scattering via 4D-STEM spatial fluctuation and correlation analysis in complex FCC alloys". Experimental details can be found in the paper. Data-specific details are listed in the Readme file.
keywords:
4D-STEM; MEA; Electron Diffuse-Scattering; FluCor
published:
2025-08-01
Beach, Cheyenne R.; Koop, Jennifer A.H.; Fournier, Auriel M.V.
(2025)
Data from the 2025 publication in the Wilson Journal of Ornithology with the same name.
keywords:
Lesser Scaup; Waterfowl; Transmitter Effects
published:
2025-05-27
Rani, Sonia; Cao, Xi; Baptista, Alejandro E.; Hoffmann, Axel; Pfaff, Wolfgang
(2025)
This dataset contains all raw and processed data used to generate the figures in the main text and supplementary material of the paper "High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit." The data can be used to reproduce the plots and validate the analysis. Accompanying Jupyter notebooks provide step-by-step analysis pipelines for figure generation. The dataset also includes drawings for the mechanical samples used to perform the experiment. In addition, the dataset provides ANSYS HFSS electromagnetic simulation files used to design and analyze the resonator structures and estimate field distributions.
keywords:
superconducting qubit; magnon sensing; hybrid quantum systems; spin-photon coupling; magnon decay; cavity QED
published:
2025-06-26
Zhang, Ruolin; Kontou, Eleftheria
(2025)
This dataset supports the analysis presented in the study on curbside electric vehicle (EV) charging infrastructure planning in San Francisco and the published paper titled "Urban electric vehicle infrastructure: Strategic planning for curbside charging." It includes spatial data layers and tabular data used to evaluate location suitability under multiple criteria, such as demand, accessibility, and environmental benefits. This dataset can be used to replicate the multi-criteria decision-making framework, perform additional spatial analyses, or inform policy decisions related to EV infrastructure siting in urban environments.
keywords:
Electric Vehicles; Curbside Charging Stations; Multi-Criteria Decision-Making; Suitability Analysis; Urban Infrastructure
published:
2025-07-23
Dalling, James William
(2025)
Supplementary data and code associated with the Biogeosciences paper published by Cecilia Prada et al., "Soil and Biomass Carbon Storage is Much Higher in Central American than Andean Montane Forests". There are 16 files associated with this paper:
(1) AGB.csv providing the site, plot, treeID, mnemn, family, agb, and AGcarbon for each tree in the dataset. Column headings are described in the file AGB_metadata.csv
(2) AGB_metadata.csv Metadata (column descriptions) for AGB.csv
(3) CWD_D.csv Complete information on the downed coarse woody debris (CWD) measured in each plot
(4) CWD_D_metadata.csv Metadata (column descriptions) for CWD_D.csv
(5) CWD_S.csv Complete information on the standing coarse woody debris measured in each plot
(6) CWD_S_metadata.csv Metadata (column descriptions) for CWD_S.csv
(7) SoilC.csv Estimated soil carbon storage (Mg C) at each sampling location in each plot
(8) SoilC_metadata.csv Metadata (column descriptions) for SoilC.csv
(9) Table.csv Data source, soil carbon value (Mg C) and elevation from published data sources
(10) Table_metadata.csv Metadata (column descriptions) for Table.csv
(11) TableS1.csv Data source, above ground carbon value (Mg C) and elevation from published data sources
(12) TableS1_metadata.csv Metadata (column descriptions) for TableS1.csv
(13) RScript.R Annotated code for data analysis and figures
(14) Full_dataset.csv Full set of environmental data and carbon data by plot
(15) Full_dataset_metadata.csv Metadata (column descriptions) for Full_dataset.csv
(16) Species list and species codes.csv Full family, genus and species names for the species codes (column mnemn in AGB.csv)
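As an illustration of working with the tree-level file, the Python sketch below sums per-tree aboveground carbon within each plot. It assumes the headers in AGB.csv are spelled exactly as listed above (site, plot, AGcarbon) and that missing values are blank or "NA"; both are assumptions, and the units of AGcarbon should be taken from AGB_metadata.csv. The annotated RScript.R remains the authors' own analysis.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MISSING = {"", "NA"}  # assumption: how missing values might be coded


def carbon_by_plot(rows: Iterable[Dict[str, str]]) -> Dict[Tuple[str, str], float]:
    """Sum per-tree aboveground carbon (AGcarbon) within each (site, plot)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in rows:
        value = row["AGcarbon"].strip()
        if value in MISSING:
            continue  # skip trees without a carbon estimate
        totals[(row["site"], row["plot"])] += float(value)
    return dict(totals)


def load_agb(path: str = "AGB.csv") -> Dict[Tuple[str, str], float]:
    """Read AGB.csv and return per-plot aboveground carbon totals."""
    with open(path, newline="", encoding="utf-8") as handle:
        return carbon_by_plot(csv.DictReader(handle))
```

These are plot totals, not per-area values; converting to a per-hectare basis would additionally require the plot area.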
keywords:
tropical forest; carbon storage
published:
2025-08-07
Vu-Le, The-Anh; Chacko, George; Warnow, Tandy
(2025)
Dataset generated using the technique described in "EC-SBM synthetic network generator". This contains multiple synthetic networks with ground-truth community structure, which can be used to evaluate community detection methods.
Note:
* networks.zip contains the synthetic networks
keywords:
network science; synthetic networks; community detection; tsv
published:
2025-09-01
Chronic wasting disease (CWD) surveillance data from Illinois and Wisconsin, USA, for fiscal years 2003 through 2022 (calendar years 2002 through 2021). Data are reported at the township level as defined by the U.S. Public Land Survey System. CWD cases, animals tested for CWD, and the apparent prevalence calculated from these values are given by township and fiscal year. The data have been anonymized by replacing the original township names with identification numbers to protect landowner privacy. Variables include Tests, Cases, and nonlinear transformations of Tests and Cases (inverse, square root, and log transformations).
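Apparent prevalence is the number of cases divided by the number of animals tested. The Python sketch below computes it together with the three kinds of transformations named above for one township-year. The exact transform definitions used in the dataset are not stated, so the offset of 1 in the log transforms and returning None for undefined values are assumptions; the function and key names are hypothetical.

```python
import math
from typing import Dict, Optional


def township_metrics(tests: int, cases: int) -> Dict[str, Optional[float]]:
    """Apparent prevalence and transformed covariates for one township-year."""
    if cases > tests:
        raise ValueError("cases cannot exceed tests")
    return {
        # apparent prevalence is undefined when no animals were tested
        "apparent_prevalence": cases / tests if tests > 0 else None,
        "inv_tests": 1 / tests if tests > 0 else None,
        "sqrt_tests": math.sqrt(tests),
        "log_tests": math.log(tests + 1),  # assumed +1 offset so zero is allowed
        "inv_cases": 1 / cases if cases > 0 else None,
        "sqrt_cases": math.sqrt(cases),
        "log_cases": math.log(cases + 1),  # assumed +1 offset so zero is allowed
    }
```

For example, a township with 40 animals tested and 2 cases has an apparent prevalence of 0.05.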
keywords:
chronic wasting disease; cwd; white-tailed deer; deer; cervid; prion; apparent prevalence; prevalence; surveillance
published:
2025-08-04
Hartman, Theodore; Studt, Jacob; VanLoocke, Andy; McDaniel, Marshall; Howe, Adina; Masters, Michael D.; Mitchell, Corey; DeLucia, Evan H.; Heaton, Emily
(2025)
This dataset contains the data used for the publication “Aboveground rather than belowground productivity drives variability in Miscanthus x giganteus net primary productivity”. It contains Miscanthus x giganteus biomass, carbon, and nitrogen tissue data for aboveground and belowground plant parts collected in 2021 at three sites in Iowa under three nitrogen application rates. Data at the Iowa sites were collected via biometric hand harvesting, belowground excavations, and soil coring both in-clump and beside-clump. Data were collected at two timepoints to calculate the contributions of belowground parts to Miscanthus x giganteus net primary productivity. This dataset also includes Miscanthus x giganteus and switchgrass soil coring and excavation data collected in 2012 at the University of Illinois Urbana-Champaign Energy Farm.
keywords:
Miscanthus; Net Primary Productivity; Excavation; Nitrogen fertilization; Translocation; Belowground Biomass; Carbon
published:
2025-08-01
Martin, Duncan G; Aspray, Elise K; Li, Shuai; Leakey, Andrew DB; Ainsworth, Elizabeth A
(2025)
Physiological and yield data from a three-year field experiment in which soybean was exposed to elevated ozone stress and reduced soil moisture at SoyFACE.
keywords:
soybean; ozone; drought; photosynthesis; yield
published:
2025-08-28
Purba, Denissa Sari Darmawi; Pei, Xingrui; Kontou, Eleftheria
(2025)
This dataset contains both processed and raw data used in the analyses presented in full in the report "Community Vulnerability Assessment for Electric Vehicle Travelers Responsive to Extreme Flooding" and in part in the paper under review, "Vulnerability Assessment of Electric Vehicles and their Charging Station Network during Evacuations".
keywords:
electric vehicles; vulnerability assessment; flooding events; evacuation; charging infrastructure
published:
2025-08-14
Bao, Wencheng; Kontou, Eleftheria
(2025)
Data and code for the paper titled "Electric Vehicle Charging Stations at Risk from Hazardous Events and Power Outages: Analytics and Resilience Implications", published in Renewable and Sustainable Energy Reviews (https://doi.org/10.1016/j.rser.2025.116144).
keywords:
electric vehicles; hazardous events; charging infrastructure; power outages; resilience
published:
2025-07-14
Hossain, Mohammad Tanver; Piorkowski, Dakota; Lowe, Andrew; Eom, Wonsik; Shetty, Abhishek; Tawfick, Sameh; Fudge, Douglas; Ewoldt, Randy
(2025)
Data accompanying the article "Physics of Unraveling and Micromechanics of Hagfish Threads".
Abstract of the article:
Hagfish slime is a unique biological material composed of mucus and protein threads that rapidly deploy into a cohesive network when released into seawater. The forces involved in thread deployment and interactions among mucus and threads are key to understanding how hagfish slime rapidly assembles into a cohesive, functional network. Despite extensive interest in its biophysical properties, the mechanical forces governing thread deployment and interaction remain poorly quantified. Here, we present the first direct in situ measurements of the micromechanical forces involved in hagfish slime formation, including mucus mechanical properties, skein peeling force, thread–mucus adhesion, and thread–thread cohesion. Using a custom glass-rod force sensing system, we show that thread deployment initiates when peeling forces exceed a threshold of approximately 6.8 nN. To understand the flow strength required for unraveling, we used a rheo-optic setup to impose controlled shear flow, enabling us to directly observe unraveling dynamics and determine the critical shear rate for unraveling of the skeins, which we then interpreted using an updated peeling-based force balance model. Our results reveal that thread–mucus adhesion dominates over thread–thread adhesion and that deployed threads contribute minimally to bulk shear rheology at constant flow rate. These findings clarify the physics underlying the rapid, flow-triggered assembly of hagfish slime and inform future designs of synthetic deployable fiber–gel systems.
keywords:
supplementary data; hagfish slime; unraveling skeins
published:
2025-05-29
Ruess, P.J.; Hanley, Jackie; Konar, Megan
(2025)
These data support Ruess et al. (2025) "Drought impacts to water footprints and virtual water transfers of counties of the United States", Water Resources Research, 61, e2024WR037715, https://doi.org/10.1029/2024WR037715.
The dataset contains estimates for Virtual Water Content (VWC) and Virtual Water Trade (VWT) for nine unique combinations of three crop categories (cereal grains, produce, and animal feed) and three water sources (surface water withdrawals, groundwater withdrawals, and groundwater depletion) for the years 2012 and 2017 within the Continental United States. The VWC is calculated by dividing irrigation withdrawal estimates (m3) by production (tons) at county resolution. The VWT is calculated by multiplying the VWC by the estimated county-level food flows (tons) from Karakoc et al. (2022). All VWC estimates are provided at county resolution by county GEOID and are given in units of m3/ton. All VWT estimates are given as pairs of origin and destination GEOIDs and are provided in units of m3.
When using, please cite as:
Ruess, P.J., Hanley, J., and Konar, M. (2025) "Drought impacts to water footprints and virtual water transfers of counties of the United States", Water Resources Research, 61, e2024WR037715, doi: 10.1029/2024WR037715.
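The two calculations described above can be sketched in Python for a single crop category and water source. This is an illustration, not the authors' code: the function names and dictionary layouts are hypothetical, and applying the origin county's VWC to each flow is an assumption consistent with, but not stated in, the description.

```python
from typing import Dict, Tuple


def virtual_water_content(withdrawals_m3: Dict[str, float],
                          production_tons: Dict[str, float]) -> Dict[str, float]:
    """VWC (m3/ton) per county GEOID = irrigation withdrawals / production."""
    return {
        geoid: withdrawals_m3[geoid] / tons
        for geoid, tons in production_tons.items()
        if tons > 0 and geoid in withdrawals_m3  # VWC undefined without production
    }


def virtual_water_trade(vwc: Dict[str, float],
                        food_flows_tons: Dict[Tuple[str, str], float]
                        ) -> Dict[Tuple[str, str], float]:
    """VWT (m3) per (origin, destination) GEOID pair = origin VWC * food flow."""
    return {
        (origin, dest): vwc[origin] * tons
        for (origin, dest), tons in food_flows_tons.items()
        if origin in vwc  # assumption: water is attributed to the producing county
    }
```

GEOIDs are kept as strings so that leading zeros in county codes are preserved.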
keywords:
irrigation; water footprints; supply chains
published:
2025-08-26
Kraft, Mary L.; Fisher, Gregory L.; Chini, Corryn E.; Gorman, Brittney L.; Brunet, Melanie A.
(2025)
This dataset consists of the time-of-flight secondary ion mass spectrometry (TOF-SIMS) depth profiling data that was collected with a PHI nanoTOF II Parallel Imaging MS/MS instrument from a 70 micron by 70 micron region on a recombinant HEK cell labeled with a stain that accumulates in the endoplasmic reticulum (ER-Tracker Blue White DPX, Invitrogen).
keywords:
TOF-SIMS; secondary ion mass spectrometry; depth profiling; endoplasmic reticulum; fluorine; total ion count; TIC image; ion image; tandem mass spectrometry imaging; ER-tracker
published:
2025-06-23
Kleiman, Diego; Feng, Jiangyan; Xue, Zhengyuan; Shukla, Diwakar
(2025)
This repository contains data and model weights associated with the publication "ESMDynamic: Fast and Accurate Prediction of Protein Dynamic Contact Maps from Single Sequences". It includes the datasets used for training and evaluating a dynamic contact prediction model, ESMDynamic, as well as a script for conversion and usage.
keywords:
Computational biology; Structural biology; Molecular dynamics; Machine learning; Protein modeling; Bioinformatics; Biophysics; Artificial intelligence
published:
2023-09-19
Salami, Malik Oyewale; Lee, Jou; Schneider, Jodi
(2023)
We used the following keywords files to identify categories for journals and conferences not in Scopus, for our STI 2023 paper "Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science".
The first four text files each contain keywords (content words) in the form 'keyword1', 'keyword2', 'keyword3', .... The file name indicates the category:
file1: healthscience_words.txt
file2: lifescience_words.txt
file3: physicalscience_words.txt
file4: socialscience_words.txt
The first four files were generated from a combination of software and manual review in an iterative process in which we:
- Manually reviewed venue titles that we were not able to categorize automatically using the Scopus categorization or by extending it as a resource.
- Iteratively reviewed uncategorized venue titles to manually curate additional keywords as content words indicating that a venue title could be classified in the category healthscience, lifescience, physicalscience, or socialscience. We used English content words and added words we could automatically translate to identify content words. NOTE: Terms with multiple potential meanings, or containing non-English words that did not yield useful automatic translations (e.g., Al-Masāq), were not selected as content words.
The fifth text file is a list of stopwords in the form: 'stopword1', 'stopword2', 'stopword3', ...
file5: stopwords.txt
This file contains manually curated stopwords from venue titles to handle non-content words such as 'conference' and 'journal'.
This dataset is a revision of the following dataset:
Version 1: Lee, Jou; Schneider, Jodi: Keywords for manual field assignment for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign Data Bank.
Changes from Version 1 to Version 2:
- Added one author
- Added a stopwords file that was used in our data preprocessing.
- Thoroughly reviewed each of the 4 keywords lists. In particular, we added UTF-8 terminology, removed some non-content words and misclassified content words, and extensively reviewed non-English keywords.
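The Python sketch below shows one way the keyword and stopword files could be parsed and applied to a venue title. It is not the pipeline used in the paper: it assumes each file is a comma-separated sequence of quoted string literals, that matching is case-insensitive on whole words or phrases after stopword removal, and that stopwords are passed in lowercase. All function names are hypothetical.

```python
import ast
import re
from typing import Dict, Iterable, List, Set


def parse_word_list(text: str) -> List[str]:
    """Parse text of the form "'word1', 'word2', ..." into a list of strings."""
    body = text.strip().rstrip(",")
    if not body:
        return []
    # assumption: every entry is a valid quoted Python-style string literal
    return [str(word) for word in ast.literal_eval("[" + body + "]")]


def normalize(text: str, stopwords: Set[str]) -> str:
    """Lowercase, split into Unicode word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(t for t in tokens if t not in stopwords)


def categorize(title: str,
               keywords: Dict[str, Iterable[str]],
               stopwords: Set[str]) -> List[str]:
    """Return every category with a keyword (or phrase) found in the title."""
    padded_title = f" {normalize(title, stopwords)} "
    matches = []
    for category, words in keywords.items():
        for word in words:
            phrase = normalize(word, stopwords)
            # pad with spaces so only whole words or phrases match
            if phrase and f" {phrase} " in padded_title:
                matches.append(category)
                break
    return sorted(matches)
```

Keywords are normalized the same way as titles, so a multi-word keyword still matches after stopwords are removed from both.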
keywords:
health science keywords; scientometrics; stopwords; field; keywords; life science keywords; physical science keywords; science of science; social science keywords; meta-science; RISRS
published:
2025-08-21
Lu, Yi; Sweedler, Jonathan; Zhou, Shuaizhen; Zhou, Yu
(2025)
Engineering efficient biocatalysts is essential for metabolic engineering to produce valuable bioproducts from renewable resources. However, due to the complexity of cellular metabolic networks, it is challenging to translate success in vitro into high performance in cells. To meet this challenge, an accurate and efficient quantification method is needed to screen a large set of mutants from complex cell cultures, and the catalytic parameters measured in vitro must be carefully correlated with performance in cells. In this study, we employed a mass-spectrometry-based high-throughput quantitative method to screen new mutants of 2-pyrone synthase (2PS) for triacetic acid lactone (TAL) biosynthesis through directed evolution in E. coli. From the process, we discovered two mutants with, respectively, the highest improvement in titer (46-fold) and the fastest kcat (44-fold) over wild-type 2PS among those reported in the literature. A careful examination of the correlation between intracellular substrate concentration, Michaelis-Menten parameters, and TAL titer for these two mutants reveals that a fast reaction rate under limiting intracellular substrate concentrations is important for in-cell biocatalysis. Such properties can be tuned by protein engineering and synthetic biology to adapt these engineered proteins for maximum activity in different intracellular environments.
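As textbook background rather than the authors' analysis, the Michaelis-Menten rate law shows why behavior at limiting substrate matters: when the intracellular substrate concentration [S] is well below K_M, the rate no longer saturates at k_cat[E]_0 but scales with the catalytic efficiency k_cat/K_M and with [S].

```latex
v = \frac{k_{\mathrm{cat}} [E]_0 [S]}{K_M + [S]}
\quad\xrightarrow{\;[S] \ll K_M\;}\quad
v \approx \frac{k_{\mathrm{cat}}}{K_M} [E]_0 [S]
```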
keywords:
catalysis; mass spectrometry; metabolic engineering
published:
2025-08-21
Viral vectors provide an increasingly versatile platform for transformation-free reagent delivery to plants. RNA viral vectors can be used to induce gene silencing, overexpress proteins, or introduce gene editing reagents; however, they are often constrained by carrying capacity or restricted tropism in germline cells. Site-specific recombinases that catalyze precise genetic rearrangements are powerful tools for genome engineering that vary in size and, potentially, efficacy in plants. In this work, we show that viral vectors based on tobacco rattle virus (TRV) deliver and stably express four recombinases ranging in size from ∼0.6 to ∼1.5 kb and achieve simultaneous marker removal and reporter activation through targeted excision in transgenic Nicotiana benthamiana lines. TRV vectors with Cre, FLP, CinH, and Integrase13 efficiently mediated recombination in infected somatic tissue and led to heritable modifications at high frequency. An excision-activated Ruby reporter enabled simple and high-resolution tracing of infected cell lineages without the need for molecular genotyping. Together, our experiments broaden the scope of viral recombinase delivery and offer insights into infection dynamics that may be useful in developing future viral vectors.
keywords:
gene editing; genome engineering; plant transformation