Illinois Data Bank Dataset Search Results
Results
published:
2025-07-28
McCumber, Corinne; Salami, Malik Oyewale
(2025)
This project investigates retraction indexing agreement in PubMed between 2024-07-03 and 2025-05-09 in order to address an API limitation that resulted in 199 items being excluded from analysis in "Analyzing the consistency of retraction indexing". PubMed was queried on 2024-07-03 and on 2025-05-09 using the search “Retracted Publication[PT]”. PubMed is only able to return 10,000 items when queried via the E-Utilities API. When the pipeline was run on 2024-07-03, the search between 2020 and 2024 returned 10,199 items, meaning that an expected 199 items indexed as retracted in PubMed were excluded. This dataset uses and compares information from PubMed as of 2025-05-09 to attempt to identify those 199 items.
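The 10,000-item cap described above is commonly worked around by splitting a query into smaller date windows. The sketch below illustrates that general idea only; it is not the approach taken in this dataset, which re-queried PubMed on a later date. The counting function is a hypothetical stand-in, and a real one would have to ask the search service how many items fall in each window.

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple

MAX_RESULTS = 10_000  # retrieval cap per query, as described above

Window = Tuple[date, date, int]  # (first day, last day, item count)


def split_until_under_cap(
    start: date,
    end: date,
    count_items: Callable[[date, date], int],
    cap: int = MAX_RESULTS,
) -> List[Window]:
    """Bisect [start, end] until every window holds at most `cap` items."""
    n = count_items(start, end)
    # A single-day window cannot be split further, so return it even if over cap.
    if n <= cap or start == end:
        return [(start, end, n)]
    mid = start + timedelta(days=(end - start).days // 2)
    return (
        split_until_under_cap(start, mid, count_items, cap)
        + split_until_under_cap(mid + timedelta(days=1), end, count_items, cap)
    )


def fake_count(start: date, end: date) -> int:
    """Hypothetical stand-in for a real count query: 7 items per day."""
    return 7 * ((end - start).days + 1)


windows = split_until_under_cap(date(2020, 1, 1), date(2024, 7, 3), fake_count)
for first, last, n in windows:
    print(first, last, n)
```

Each returned window can then be retrieved separately, since its count fits under the cap.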
keywords:
retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS; PubMed
published:
2025-07-25
Mori, Jameson; Rivera, Nelda; Brown, William; Skinner, Daniel; Schlichting, Peter; Novakofski, Jan; Mateus-Pinilla, Nohra
(2025)
This dataset contains the pregnancy status of wild, white-tailed deer (Odocoileus virginianus) from northern Illinois culled as part of the Illinois Department of Natural Resources' chronic wasting disease (CWD) surveillance program. Fiscal years 2005 through 2024 are included. A fiscal year is the time between July 1st of one calendar year and June 30th of the next. Variables in this dataset include the pregnancy status, CWD infection status, age, weight, and day of mortality for each female deer, as well as the deer land cover utility (LCU) score for the TRS, township, or county from which the deer was culled. The deer population density of the county is also included. Data have been anonymized for landowner privacy reasons so that the location and year are not identifiable, but will give the same modeling results by maintaining how the data are grouped. The R code used to conduct the regression modeling is also included.
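The fiscal-year definition above (July 1 of one calendar year through June 30 of the next) can be sketched as a small helper. The description does not say whether a fiscal year is labeled by its starting or ending calendar year, so this hypothetical function returns both years rather than assuming a label.

```python
from datetime import date
from typing import Tuple


def fiscal_year_span(d: date) -> Tuple[int, int]:
    """Return (start_year, end_year) of the July 1 - June 30 fiscal year containing d."""
    # Dates from July onward belong to the fiscal year that starts in the same
    # calendar year; January through June belong to the one that started the year before.
    start_year = d.year if d.month >= 7 else d.year - 1
    return (start_year, start_year + 1)


print(fiscal_year_span(date(2005, 6, 30)))
print(fiscal_year_span(date(2005, 7, 1)))
```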
keywords:
cervid; Cervidae; chronic wasting disease; CWD; reproduction; white-tailed deer; Odocoileus virginianus; pregnancy; regression
published:
2025-06-22
Stickley, Samuel; Crawford, John; Peterman, William; Fraterrigo, Jennifer
(2025)
keywords:
terrestrial salamanders, microhabitat, physiology, mechanistic models, ecological niche models, climate change, Great Smoky Mountains National Park
published:
2019-09-01
Jackson, Nicole; Konar, Megan; Debaere, Peter; Estes, Lyndon
(2019)
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014.
The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
keywords:
global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
published:
2025-07-21
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W.
(2025)
This dataset includes image stacks, annotated counts, and ground-truth masks from two high-resolution sediment cores extracted from Laguna Pallcacocha, in El Cajas National Park, Ecuadorian Andes by Moy et al. (2002) and Hagemans et al. (2021). The first core (PAL 1999, from Moy et al. (2002)) extends through the Holocene (11,600 cal. yr. BP - present). There are a total of 900 annotated image stacks and masks in the PAL 1999 domain. The second core (PAL IV, from Hagemans et al. (2021)) captures the 20th century. There are 2986 annotated image stacks and masks in the PAL IV domain.
Different microscopes and annotation tools were used to image and annotate each core, and there are corresponding differences in naming conventions and file formats. Thus, we organized our data separately for the PAL 1999 and the PAL IV domains. The three-letter codes used to label our pollen annotations are in the file: “Pollen_Identification_Codes.xlsx”.
Both domain directories contain:
• Image stacks organized by subdirectory
• Annotations within each image stack directory, containing specimen identifications using a three-letter code and coordinates defining bounding boxes or circles
• Ground-truth distance-transform masks for each image stack
The zip file "bestValModel_encoder.paramOnly.zip" is the trained pollen detection model produced from the images and annotations in this dataset.
Please cite this dataset as:
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W. (2025): Slide scans, annotated pollen counts, and trained pollen detection models for fossil pollen samples from Laguna Pallcacocha, El Cajas National Park, Ecuador. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4207757_V1
Please also include citations of the original publications from which these data are taken:
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” bioRxiv, January 1, 2025. https://doi.org/10.1101/2025.01.05.631390.
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” Paleobiology, 2025 [in press].
Feng, J. T. (2023). Open-world deep learning applied to pollen detection (MS thesis, University of Illinois at Urbana-Champaign). https://hdl.handle.net/2142/120168
keywords:
continual learning; deep learning; domain gaps; open-world; palynology; pollen grain detection; taxonomic bias
published:
2024-11-15
Blanke, Steven; Ringling, Megan; Tan, Ivilyn; Oh, Seung
(2024)
This page contains the data for the manuscript "Vacuolating cytotoxin A interactions with the host cell surface". This manuscript is currently in prep.
keywords:
Steven R Blanke; Vacuolating cytotoxin A; VacA; Helicobacter pylori; protein binding; sphingomyelin; cell surface
published:
2024-11-13
Tang, Zhichu; Chen, Wenxiang; Yin, Kaijun; Busch, Robert; Hou, Hanyu; Lin, Oliver; Lyu, Zhiheng; Zhang, Cheng; Yang, Hong; Zuo, Jian-Min; Chen, Qian
(2024)
These datasets are for the four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments for cathode nanoparticles at different states. The raw 4D-STEM experiment datasets were collected by TEM image & analysis software (FEI) and were saved as SER files. The SER files can be opened and viewed in MATLAB using our imToolBox analysis software package, available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected by DigitalMicrograph software and were saved as DM4 files. They can be opened and viewed in DigitalMicrograph software or using our analysis codes, available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion" (2024).
The 4D-STEM experiment data include four example datasets for cathode nanoparticles collected at pristine and discharged states. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP.ser"
2. Pristine 200°C heated nanoparticle: "Pristine H200-NP.ser"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP.ser"
4. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP.ser"
The EELS experiment data include six example datasets for cathode nanoparticles collected at different states (in "EELS datasets.zip") as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP EELS.zip"
2. Pristine 200°C heated nanoparticle: "Prisitne H200-NP EELS.zip"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP EELS.zip"
4. Untreated nanoparticle after first charge in Zn-ion batteries: "Charged U-NP EELS.zip"
5. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP EELS.zip"
6. 200°C heated nanoparticle after first charge in Zn-ion batteries: "Charged H200-NP EELS.zip"
The details of the software package and codes that can be used to analyze the 4D-STEM datasets and EELS datasets are available at: https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update the relationship of these datasets with our paper.
keywords:
4D-STEM; EELS; defects; strain; cathode; nanoparticle; energy storage
published:
2024-10-10
Zeiri, Offer; Hatzis, Katherine Marie; Gomez, Maurea; Cook, Emily A; Kincanon, Maegen; Murphy, Catherine
(2024)
keywords:
Gold nanorods, Surface enhanced Raman spectroscopy, SERS, Polyoxometalates
published:
2025-06-24
Ge, Jiankai; Weatherspoon, Howard; Peters, Baron
(2025)
This supporting information file contains code related to the pending publication Ge et al., Proc. Natl. Acad. Sci. USA (revisions in review). The contents include Mathematica code that solves the Laplace-transformed equations and generates figures from the paper. Python code is included for generating Figure 5 in the main text.
keywords:
Population balance model; Covalent organic framework; Nucleation; Growth
published:
2024-09-16
Wu, Steven; Smith, Hannah
(2024)
This dataset describes an analysis of research documents about the debate between hydrogen fuel cells and lithium-ion batteries within the context of electric vehicles.
To create this dataset, we first analyzed news articles on the topic of sustainable development. We searched for related science using keywords in Google Scholar. We then identified subtopics and selected one specific subtopic: electric vehicles. We started to identify positions and players about electric vehicles [1].
Within electric vehicles, we started searching in OpenAlex for a topic of reasonable size (about 300 documents) related to a scientific or technical debate. We narrowed to electric vehicles and batteries, then trained a cluster model [2] on OpenAlex’s keywords to develop some possible search queries, and chose one.
Our final search query (May 7, 2024) returned 301 documents in OpenAlex:
Title & abstract includes: Electric Vehicle + Hydrogen + Battery
filter is Lithium-ion Battery Management in Electric Vehicle
We used a Python script and the Scopus API to find missing abstracts and DOIs [3].
To identify relevant documents, we used a combination of Abstrackr [4] and manual screening. As a starting point for Abstrackr [4], one person manually screened 200 documents by checking the abstracts for “hydrogen fuel cells” and “battery comparisons”. Then we used Abstrackr [4] to predict the relevance of the remaining documents based on the title, abstract, and keywords. The settings we used were single screening, ordered by most likely to be relevant, and 0 pilot size. We set a threshold of 0.6 for the predictions. After screening and predictions, 176 documents remained.
keywords:
controversy mapping; sustainable development; evidence synthesis; OpenAlex; Abstrackr; Scopus; meta-analysis; electric vehicle; hydrogen fuel cells; battery
published:
2025-02-08
Anne, Lahari; Park, Minhyuk; Warnow, Tandy; Chacko, George
(2025)
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and clustering (by any algorithm) are used to pass input parameters to a stochastic block model (SBM) generator. The output is then modified to improve fit to the input real-world clusters, after which outlier nodes are added using one of three different options. See Anne et al. (2024), in press in Complex Networks and Applications XIII (preprint: arXiv:2408.13647).
The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/).
Input Networks:
The CEN can be downloaded from the Illinois Data Bank:
https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv
The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where
a - name of inspirational network, e.g., cit_hepph
b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01
c - the RECCS option used to approximate edge count and connectivity in the real-world network, e.g., v1
Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph_tool software (Peixoto, Tiago P. 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14).
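The a_b_c naming scheme above can be decoded programmatically. The following is a minimal sketch, assuming that only the network name (a) may itself contain underscores, so the name is split from the right. The function and class names are hypothetical and are not part of the dataset.

```python
from typing import NamedTuple


class SyntheticName(NamedTuple):
    network: str        # a: name of the network the synthetic one is modeled on
    resolution: float   # b: Leiden-CPM resolution value
    reccs_version: str  # c: RECCS option, e.g. "v1"


def parse_synthetic_name(filename: str) -> SyntheticName:
    """Split an a_b_c.tsv or a_b_c.tsv.gz file name into its three fields."""
    for suffix in (".tsv.gz", ".tsv"):
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        raise ValueError(f"unexpected extension: {filename!r}")
    # Split from the right because network names such as cit_hepph contain "_".
    network, resolution, version = stem.rsplit("_", 2)
    return SyntheticName(network, float(resolution), version)


print(parse_synthetic_name("cit_hepph_0.01_v1.tsv"))
```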
Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz). The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2), and applying the Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment.
Input Network: CEN
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows:
cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv
where:
cen – Indicates the network was modeled on the Curated Exosome Network (CEN).
resolution – The resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status – Either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version – The RECCS version used to generate the synthetic network (v1 or v2).
replicate_id – The specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
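File names in the replication experiment can be decoded with a regular expression, which is convenient here because the cm_status field (cm or no_cm) may itself contain an underscore. This is a sketch based only on the two example names above; the pattern and function name are hypothetical.

```python
import re
from typing import Dict, Union

# Pattern inferred from the example file names; not taken from the dataset's own code.
REPL_PATTERN = re.compile(
    r"^cen_(?P<resolution>\d+(?:\.\d+)?)"
    r"_(?P<cm_status>cm|no_cm)"
    r"_(?P<reccs_version>v[12])"
    r"_sample_(?P<replicate_id>\d+)\.tsv$"
)


def parse_replicate_name(filename: str) -> Dict[str, Union[float, bool, str, int]]:
    """Decode a replication-experiment file name into its fields."""
    m = REPL_PATTERN.match(filename)
    if m is None:
        raise ValueError(f"not a replication-experiment file name: {filename!r}")
    return {
        "resolution": float(m["resolution"]),
        "cm_treated": m["cm_status"] == "cm",
        "reccs_version": m["reccs_version"],
        "replicate_id": int(m["replicate_id"]),
    }


print(parse_replicate_name("cen_0.01_cm_v1_sample_0.tsv"))
print(parse_replicate_name("cen_0.01_no_cm_v2_sample_1.tsv"))
```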
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
keywords:
Community Detection; Synthetic Networks; Stochastic Block Model (SBM)
published:
2025-05-21
Punyasena, Surangi W.; Adaime, Marc-Elie; Jaramillo, Carlos
(2025)
This dataset includes a total of 16 images of 2 extant species of Podocarpus (Podocarpaceae) and 23 images of fossil specimens of the morphogenus Podocarpidites.
The images were taken using a Zeiss LSM 880 microscope with Airyscan confocal superresolution at 630x magnification (63x/NA 1.4 oil DIC). The images are in the original CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or open microscopy software, such as ImageJ. More information on how to open CZI files can be found here: https://www.zeiss.com/microscopy/us/products/software/zeiss-zen/czi-image-file-format.html
For Podocarpus (modern specimens):
Each folder is labelled by genus and contains all images corresponding to that genus. Detailed information about the folders, files, and specimens can be found in the CSV file "METADATA_Podocarpus_extant.csv". This file includes metadata on: species, slide ID, collection, folder name, file name, and notes.
Images are of pollen grains from slides in the Florida Museum of Natural History collections.
For Podocarpidites (fossil specimens):
Each image is named after the sample from which it was derived. Detailed information about the specimens can be found in the CSV file "METADATA_ Podocarpidites_fossil.csv". This file includes metadata on: the fossil type (Taxon), the slide and sample name (Slide Info), the location of the sample locality (Country, Latitude, Longitude), the age of the sample (Min age, Max age), the location of the specimen on the sample slide (England Finder coordinates), and the image file name.
Images are of fossil pollen from slides in Smithsonian Tropical Research Institute collections.
Please cite this dataset and listed publications when using these images.
keywords:
optical superresolution microscopy; Zeiss Airyscan; CZI images; conifer; saccate pollen; Podocarpus; Podocarpidites
published:
2024-08-06
Xing, Yuqing; Bae, Seokjin; Madhavan, Vidya
(2024)
These are the raw topographies (without linear background subtraction) related to the publication: https://www.nature.com/articles/s41586-024-07519-5
published:
2025-04-02
Pastrana-Otero, Isamar; Godbole, Apurva R.; Kraft, Mary L.
(2025)
This dataset contains Raman spectra, each acquired from an individual living cell entrapped within a soft or stiff gelatin methacrylate hydrogel or from a cell-free region of the hydrogel sample. Spectra were acquired from the following cell types: Madin-Darby Canine Kidney cell (MDCK); Chinese hamster ovary cell (CHO-K1); transfected CHO-K1 cell that expressed the SNAP-tag and HaloTag reporter proteins fused to an organelle-specific protein (CHO-T); human monocyte-like cell (THP-1); inactive macrophage-like (M0-like); active anti-inflammatory macrophage-like (M2-like); and pro/anti-inflammatory macrophage-like (M1/M2-like). These spectra are useful for identifying whether the hydrogel matrix obscures the Raman spectral signatures that are characteristic of each of these cell types.
keywords:
Raman spectroscopy; 3D cell culture; single-cell spectrum; hydrogel scaffold; collagen scaffold; macrophage spectra; macrophage differentiation; THP-1 line; noninvasive phenotype identification; vibrational spectroscopy
published:
2025-04-30
This dataset represents the results of targeted eDNA assays via quantitative PCR for two imperiled freshwater species.
keywords:
Environmental DNA, Freshwater Mussel, Salamander, Conventional Surveys, Endangered Species, Habitat Use, Artificial Structures
published:
2025-03-28
8-bit RGB realizations of a stochastic image model (SIM) of the kinds of things seen in fluorescence microscopy of biological samples. Note that no attempt was made to model a particular tissue, sample, or microscope. Distinct image features are seen in each color channel. The first public mention of these SIMs is in "Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure" by Frank Brooks and Rucha Deshpande. Manuscript on arXiv and submitted for publication.
keywords:
image models; fluorescence microscopy; training data; image-to-image translation; generative model evaluation
published:
2025-06-16
Blanc-Betes, Elena; Gomez-Casanovas, Nuria; Bernacchi, Carl; Boughton, Elizabeth; Yang, Wendy; DeLucia, Evan
(2025)
Biometric, ground-based, and eddy covariance flux data used to investigate the impact of sugarcane expansion across subtropical Florida on the carbon (C) budget over a three-year rotation.
The dataset includes a three-year record of daily fluxes, NPP and SOC input measurements, and estimates of carbon use efficiency and net ecosystem carbon balance in sugarcane and in improved and semi-native pastures following pasture conversion to sugarcane.
keywords:
land use change; sugarcane expansion; bioenergy; carbon budget; CUE; NECB
published:
2025-02-23
Bondarenko, Nikita; Podladchikov, Yury; Williams-Stroud, Sherilyn; Makhnenko, Roman
(2025)
Dataset with numerical routines and laboratory testing data associated with the manuscript: Bondarenko, N., Podladchikov, Y., Williams‐Stroud, S., & Makhnenko, R. (2025). Stratigraphy‐induced localization of microseismicity during CO2 injection in Illinois Basin. Journal of Geophysical Research: Solid Earth, 130, e2024JB029526. https://doi.org/10.1029/2024JB029526
keywords:
Illinois Basin Decatur Project; Induced Seismicity; GPU; Numerical modeling
published:
2024-08-29
Li, Shuai; Montes, Christopher; Aspray, Elise; Ainsworth, Elizabeth
(2024)
Over the past 15 years, soybean seed yield response to season-long elevated O3 concentrations [O3] and to year-to-year weather conditions was studied using free-air O3 concentration enrichment (O3-FACE) in the field at the SoyFACE facility in Central Illinois. Elevated [O3] significantly reduced seed yield across cultivars and years. However, our results quantitatively demonstrate that weather conditions, including soil water availability and air temperature, did not alter yield sensitivity to elevated [O3] in soybean.
keywords:
drought, elevated O3, heat, O3-FACE, soybean, yield
published:
2025-05-01
Wang, Weiwei; Khanna, Madhu
(2025)
BEPAM, the Biofuel and Environmental Policy Analysis Model, models the agricultural sector and determines the economically optimal land-use and feedstock mix at the US scale. It does so by maximizing the sum of agricultural sector consumers’ and producers’ surplus subject to various resource balances, land availability, and technological constraints under a range of biomass prices, from zero to $140 Mg⁻¹, over the 2016-2030 period. Here BEPAM is used to model sustainable aviation fuel (SAF) production using energy crops and crop residues. BEPAM uses the GAMS format and uses yield and GHG balance projections from the biogeochemical model DayCent.
keywords:
BEPAM; Energy crops; direct and indirect land use change; soil carbon sequestration; fossil fuel displacement; economic incentives
published:
2025-04-28
Alvarez, Jennifer; Fraterrigo, Jennifer; Dalling, James
(2025)
Dataset of the standing dead trees at Trelease Woods in 2022. Dataset contains volume, biomass, decay class, and GPS coordinates for each standing dead tree.
keywords:
old-growth; temperate forest; standing deadwood; census data
published:
2025-04-27
Alvarez, Jennifer; Fraterrigo, Jennifer; Dalling, James
(2025)
Downed woody debris census data for Trelease Woods collected in the summer of 2022. Dataset contains volume, biomass, decay class, and GPS coordinates for each downed woody debris piece.
keywords:
Old-growth; temperate forest; downed woody debris; coarse woody debris; census data
published:
2025-04-27
Alvarez, Jennifer; Fraterrigo, Jennifer; Dalling, James
(2025)
Soil data for ten soil cores collected at Trelease Woods in 2022. Soil samples were analyzed with an elemental analyzer via combustion to obtain total carbon (C) and nitrogen. A subset of these samples was analyzed using the Walkley-Black method to obtain organic C. A calibration curve relating organic C and total C was created using these data.
keywords:
old-growth; temperate forest; soil carbon; soil nitrogen; nutrient cycling
published:
2025-01-27
Zinnen, Jack; Chase, Marissa; Charles, Brian; Meissen, Justin; Matthews, Jeffrey
(2025)
This is the core data for RELIX, a dataset of vascular plant species presence for 353 prairie remnants in the Midwestern United States and associated dataset of prairie remnant metadata. The primary data file contains a list of the vascular plant species observed in the prairie remnants, as well as a metadata table with more information about the prairie remnant in question and the species list itself. The data was compiled from a variety of written sources, private and published, chronicling observations made between the mid-twentieth century and 2021. It also contains a supplementary data table of vascular plant species observed in at least 8 of the prairie remnants in RELIX, as well as a list of acknowledgements for the associated manuscript.
keywords:
prairie peninsula; prairie relict; prairie soil; species inventories; tallgrass prairie
published:
2024-07-28
Xing, Yuqing; Bae, Seokjin; Madhavan, Vidya
(2024)
This is a set of topographies to study the magnetic field response of RbV3Sb5 (related to Fig. 4 of https://www.nature.com/articles/s41586-024-07519-5)