Illinois Data Bank Dataset Search Results
published:
2025-06-22
Stickley, Samuel; Crawford, John; Peterman, William; Fraterrigo, Jennifer
(2025)
keywords:
terrestrial salamanders, microhabitat, physiology, mechanistic models, ecological niche models, climate change, Great Smoky Mountains National Park
published:
2019-09-01
Jackson, Nicole; Konar, Megan; Debaere, Peter; Estes, Lyndon
(2019)
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014.
The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
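As a rough illustration of what a global 0.5-degree grid implies, the sketch below maps a latitude/longitude pair to a grid cell. The grid origin (row 0 at 90° N, column 0 at 180° W) and the function name `grid_index` are assumptions made for this example; the actual layout should be checked against the coordinate variables and the "crs" variable in each netCDF file.

```python
import math

CELL_SIZE_DEG = 0.5  # grid resolution stated in the dataset description


def grid_index(lat, lon, cell=CELL_SIZE_DEG):
    """Return (row, col) of the grid cell containing (lat, lon).

    Assumed layout (not confirmed by the dataset): row 0 starts at 90 N
    and column 0 starts at 180 W.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    n_rows = int(round(180.0 / cell))  # 360 rows at 0.5 degrees
    n_cols = int(round(360.0 / cell))  # 720 columns at 0.5 degrees
    # Clamp so that the southern and eastern edges fall in the last cell.
    row = min(int(math.floor((90.0 - lat) / cell)), n_rows - 1)
    col = min(int(math.floor((lon + 180.0) / cell)), n_cols - 1)
    return row, col
```

Under these assumptions the grid has 360 rows and 720 columns, so each annual file would hold one value per crop for each of 259,200 cells.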
keywords:
global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
published:
2025-07-21
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W.
(2025)
This dataset includes image stacks, annotated counts, and ground-truth masks from two high-resolution sediment cores extracted from Laguna Pallcacocha, in El Cajas National Park, Ecuadorian Andes by Moy et al. (2002) and Hagemans et al. (2021). The first core (PAL 1999, from Moy et al. (2002)) extends through the Holocene (11,600 cal. yr. BP - present). There are a total of 900 annotated image stacks and masks in the PAL 1999 domain. The second core (PAL IV, from Hagemans et al. (2021)) captures the 20th century. There are 2986 annotated image stacks and masks in the PAL IV domain.
Different microscopes and annotation tools were used to image and annotate each core, and there are corresponding differences in naming conventions and file formats. Thus, we organized our data separately for the PAL 1999 and the PAL IV domains. The three-letter codes used to label our pollen annotations are in the file “Pollen_Identification_Codes.xlsx”.
Both domain directories contain:
• Image stacks organized by subdirectory
• Annotations within each image stack directory, containing specimen identifications using a three-letter code and coordinates defining bounding boxes or circles
• Ground-truth distance-transform masks for each image stack
The zip file "bestValModel_encoder.paramOnly.zip" is the trained pollen detection model produced from the images and annotations in this dataset.
Please cite this dataset as:
Feng, Jennifer T.; van den Berg, Thya; Donders, Timme H.; Kong, Shu; Puthanveetil Satheesan, Sandeep; Punyasena, Surangi W. (2025): Slide scans, annotated pollen counts, and trained pollen detection models for fossil pollen samples from Laguna Pallcacocha, El Cajas National Park, Ecuador. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4207757_V1
Please also include citations of the original publications from which these data are taken:
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” bioRxiv, January 1, 2025. https://doi.org/10.1101/2025.01.05.631390.
Feng, Jennifer T., Sandeep Puthanveetil Satheesan, Shu Kong, Timme H. Donders, and Surangi W. Punyasena. “Addressing the ‘Open World’: Detecting and Segmenting Pollen on Palynological Slides with Deep Learning.” Paleobiology, 2025 [in press].
Feng, J. T. (2023). Open-world deep learning applied to pollen detection (MS thesis, University of Illinois at Urbana-Champaign). https://hdl.handle.net/2142/120168
keywords:
continual learning; deep learning; domain gaps; open-world; palynology; pollen grain detection; taxonomic bias
published:
2024-11-15
Blanke, Steven; Ringling, Megan; Tan, Ivilyn; Oh, Seung
(2024)
This page contains the data for the manuscript "Vacuolating cytotoxin A interactions with the host cell surface". This manuscript is currently in prep.
keywords:
Steven R Blanke; Vacuolating cytotoxin A; VacA; Helicobacter pylori; protein binding; sphingomyelin; cell surface
published:
2024-11-13
Tang, Zhichu; Chen, Wenxiang; Yin, Kaijun; Busch, Robert; Hou, Hanyu; Lin, Oliver; Lyu, Zhiheng; Zhang, Cheng; Yang, Hong; Zuo, Jian-Min; Chen, Qian
(2024)
These datasets are for the four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments for cathode nanoparticles at different states. The raw 4D-STEM experiment datasets were collected by TEM image & analysis software (FEI) and were saved as SER files. The raw 4D-STEM SER files can be opened and viewed in MATLAB using our imToolBox analysis software package, available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected by DigitalMicrograph software and were saved as DM4 files. The raw EELS datasets can be opened and viewed in DigitalMicrograph software or using our analysis codes available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion" (2024).
The 4D-STEM experiment data include four example datasets for cathode nanoparticles collected at pristine and discharged states. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP.ser"
2. Pristine 200ºC heated nanoparticle: "Pristine H200-NP.ser"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP.ser"
4. 200ºC heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP.ser"
The EELS experiment data include six example datasets for cathode nanoparticles collected at different states (in "EELS datasets.zip") as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP EELS.zip"
2. Pristine 200ºC heated nanoparticle: "Prisitne H200-NP EELS.zip"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP EELS.zip"
4. Untreated nanoparticle after first charge in Zn-ion batteries: "Charged U-NP EELS.zip"
5. 200ºC heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP EELS.zip"
6. 200ºC heated nanoparticle after first charge in Zn-ion batteries: "Charged H200-NP EELS.zip"
The details of the software package and codes that can be used to analyze the 4D-STEM datasets and EELS datasets are available at: https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update the relationship of these datasets with our paper.
keywords:
4D-STEM; EELS; defects; strain; cathode; nanoparticle; energy storage
published:
2024-10-10
Zeiri, Offer; Hatzis, Katherine Marie; Gomez, Maurea; Cook, Emily A; Kincanon, Maegen; Murphy, Catherine
(2024)
keywords:
Gold nanorods, Surface enhanced Raman spectroscopy, SERS, Polyoxometalates
published:
2025-06-24
Ge, Jiankai; Weatherspoon, Howard; Peters, Baron
(2025)
This supporting information file contains code related to the pending publication Ge et al., Proc. Nat. Acad. Sci. USA (revisions in review). The contents include Mathematica code that solves the Laplace-transformed equations and generates figures from the paper. A Python script is included for generating Figure 5 in the main text.
keywords:
Population balance model; Covalent organic framework; Nucleation; Growth
published:
2024-09-16
Wu, Steven; Smith, Hannah
(2024)
This dataset describes an analysis of research documents about the debate between hydrogen fuel cells and lithium-ion batteries within the context of electric vehicles.
To create this dataset, we first analyzed news articles on the topic of sustainable development. We searched for related science using keywords in Google Scholar. We then identified subtopics and selected one specific subtopic: electric vehicles. We started to identify positions and players about electric vehicles [1].
Within electric vehicles, we started searching in OpenAlex for a topic of reasonable size (about 300 documents) related to a scientific or technical debate. We narrowed to electric vehicles and batteries, then trained a cluster model [2] on OpenAlex’s keywords to develop some possible search queries, and chose one.
Our final search query (May 7, 2024) returned 301 documents in OpenAlex:
Title & abstract includes: Electric Vehicle + Hydrogen + Battery
filter is Lithium-ion Battery Management in Electric Vehicle
We used a Python script and the Scopus API to find missing abstracts and DOIs [3].
To identify relevant documents, we used a combination of Abstrackr [4] and manual screening. As a starting point for Abstrackr [4], one person manually screened 200 documents by checking the abstracts for “hydrogen fuel cells” and “battery comparisons”. Then we used Abstrackr [4] to predict the relevance of the remaining documents based on the title, abstract, and keywords. The settings we used were single screening, ordered by most likely to be relevant, and a pilot size of 0. We set a threshold of 0.6 for the predictions. After screening and predictions, 176 documents remained.
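The combination of manual screening and thresholded predictions described above can be sketched as follows. This is only an illustration: the document IDs and scores are invented, the function name `screen` is hypothetical, and treating a score of exactly 0.6 as relevant is an assumption, since the description does not say whether the threshold is inclusive.

```python
THRESHOLD = 0.6  # prediction threshold stated in the description


def screen(manual_decisions, predictions, threshold=THRESHOLD):
    """Combine manual screening decisions with predicted relevance scores.

    manual_decisions: dict mapping document ID -> True/False (screened by hand)
    predictions:      dict mapping document ID -> predicted relevance in [0, 1]
    Manual decisions take precedence; other documents are kept when their
    score is at or above the threshold (inclusive by assumption).
    """
    kept = {doc for doc, relevant in manual_decisions.items() if relevant}
    kept |= {
        doc
        for doc, score in predictions.items()
        if doc not in manual_decisions and score >= threshold
    }
    return sorted(kept)


# Invented example: two hand-screened documents, three predicted ones.
manual = {"W1": True, "W2": False}
predicted = {"W2": 0.9, "W3": 0.61, "W4": 0.4}
remaining = screen(manual, predicted)
```

Here the manual exclusion of "W2" overrides its high predicted score, which is one possible design choice rather than a documented behavior of the screening tool.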
keywords:
controversy mapping; sustainable development; evidence synthesis; OpenAlex; Abstrackr; Scopus; meta-analysis; electric vehicle; hydrogen fuel cells; battery
published:
2025-02-08
Anne, Lahari; Park, Minhyuk; Warnow, Tandy; Chacko, George
(2025)
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and a clustering (by any algorithm) are used to pass input parameters to a stochastic block model (SBM) generator. The output is then modified to improve fit to the input real-world clusters, after which outlier nodes are added using one of three different options. See Anne et al. (2024): in press, Complex Networks and Applications XIII (preprint: arXiv:2408.13647).
The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/).
Input Networks:
The CEN can be downloaded from the Illinois Data Bank:
https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv
The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where
a - name of inspirational network, e.g., cit_hepph
b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01
c - the RECCS option used to approximate edge count and connectivity in the real world network, e.g., v1
Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph_tool software (Peixoto, Tiago P. 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14).
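The a_b_c naming scheme above can be unpacked programmatically. The following is a minimal sketch, not part of the dataset; the function name `parse_synthetic_name` is hypothetical. Because network names such as cit_hepph themselves contain underscores, the name is split from the right.

```python
def parse_synthetic_name(filename):
    """Split an a_b_c.tsv or a_b_c.tsv.gz file name into its three parts."""
    stem = filename
    # Strip the optional .gz and then the .tsv extension.
    for suffix in (".gz", ".tsv"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
    # Split from the right so underscores inside the network name survive.
    network, resolution, version = stem.rsplit("_", 2)
    return {
        "network": network,
        "resolution": float(resolution),
        "reccs_version": version,
    }


info = parse_synthetic_name("cit_hepph_0.01_v1.tsv.gz")
```

A name with fewer than three underscore-separated parts raises a ValueError, which is a convenient signal that a file does not follow the convention.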
Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz). The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2), and applying the Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment.
Input Network : CEN
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows:
cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv
where:
cen – Indicates the network was modeled on the Curated Exosome Network (CEN).
resolution – The resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status – Either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version – The RECCS version used to generate the synthetic network (v1 or v2).
replicate_id – The specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
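File names of the form shown in the two examples above can be checked and decoded with a regular expression. This is a sketch under the assumption that every file in repl_exp.tar.gz follows the pattern of those examples exactly; the names `REPL_NAME` and `parse_replicate_name` are hypothetical.

```python
import re

# Pattern inferred from the two example names; an assumption, not a spec.
REPL_NAME = re.compile(
    r"^cen_(?P<resolution>\d+(?:\.\d+)?)_(?P<cm_status>cm|no_cm)_"
    r"(?P<reccs_version>v[12])_sample_(?P<replicate_id>\d+)\.tsv$"
)


def parse_replicate_name(filename):
    """Decode a replication-experiment file name into its fields."""
    match = REPL_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    info = match.groupdict()
    info["resolution"] = float(info["resolution"])
    # Replace the cm / no_cm string with a boolean flag.
    info["cm_treated"] = info.pop("cm_status") == "cm"
    info["replicate_id"] = int(info["replicate_id"])
    return info


first = parse_replicate_name("cen_0.01_cm_v1_sample_0.tsv")
second = parse_replicate_name("cen_0.01_no_cm_v2_sample_1.tsv")
```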
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
keywords:
Community Detection; Synthetic Networks; Stochastic Block Model (SBM)
published:
2025-05-21
Punyasena, Surangi W.; Adaime, Marc-Elie; Jaramillo, Carlos
(2025)
This dataset includes a total of 16 images of 2 extant species of Podocarpus (Podocarpaceae) and 23 images of fossil specimens of the morphogenus Podocarpidites.
The images were taken using a Zeiss LSM 880 microscope with Airyscan confocal superresolution at 630x magnification (63x/NA 1.4 oil DIC). The images are in the original CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or open microscopy software, such as ImageJ. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/us/products/software/zeiss-zen/czi-image-file-format.html]
For Podocarpus (modern specimens):
Each folder is labelled by genus and contains all images corresponding to that genus. Detailed information about the folders, files, and specimens can be found in the metadata file "METADATA_Podocarpus_extant.csv". This file includes metadata on: species, slide ID, collection, folder name, file name, and notes.
Images are of pollen grains from slides in the Florida Museum of Natural History collections.
For Podocarpidites (fossil specimens):
Each image is named after the sample from which it was derived. Detailed information about the specimens can be found in the metadata file "METADATA_ Podocarpidites_fossil.csv". This file includes metadata on: the fossil type (Taxon), the slide and sample name (Slide Info), the location of the sample locality (Country, Latitude, Longitude), the age of the sample (Min age, Max age), the location of the specimen on the sample slide (England Finder coordinates), and the image file name.
Images are of fossil pollen from slides in Smithsonian Tropical Research Institute collections.
Please cite this dataset and listed publications when using these images.
keywords:
optical superresolution microscopy; Zeiss Airyscan; CZI images; conifer; saccate pollen; Podocarpus; Podocarpidites
published:
2024-08-06
Xing, Yuqing; Bae, Seokjin; Madhavan, Vidya
(2024)
These are the raw topographies (without linear background subtraction) related to the publication: https://www.nature.com/articles/s41586-024-07519-5
published:
2025-04-02
Pastrana-Otero, Isamar; Godbole, Apurva R.; Kraft, Mary L.
(2025)
This dataset contains Raman spectra, each acquired from an individual living cell entrapped within a soft or stiff gelatin methacrylate hydrogel or from a cell-free region of the hydrogel sample. Spectra were acquired from the following cell types: Madin-Darby Canine Kidney cell (MDCK); Chinese hamster ovary cell (CHO-K1); transfected CHO-K1 cell that expressed the SNAP-tag and HaloTag reporter proteins fused to an organelle-specific protein (CHO-T); human monocyte-like cell (THP-1); inactive macrophage-like (M0-like); active anti-inflammatory macrophage-like (M2-like); and pro/anti-inflammatory macrophage-like (M1/M2-like). These spectra are useful for identifying whether the hydrogel matrix obscures the Raman spectral signatures that are characteristic of each of these cell types.
keywords:
Raman spectroscopy; 3D cell culture; single-cell spectrum; hydrogel scaffold; collagen scaffold; macrophage spectra; macrophage differentiation; THP-1 line; noninvasive phenotype identification; vibrational spectroscopy
published:
2025-04-30
This dataset represents the results of targeted eDNA assays via quantitative PCR for two imperiled freshwater species.
keywords:
Environmental DNA, Freshwater Mussel, Salamander, Conventional Surveys, Endangered Species, Habitat Use, Artificial Structures
published:
2025-03-28
8-bit RGB realizations of a stochastic image model (SIM) of the kinds of things seen in fluorescence microscopy of biological samples. Note that no attempt was made to model a particular tissue, sample, or microscope. Distinct image features are seen in each color channel. The first public mention of these SIMs is in "Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure" by Frank Brooks and Rucha Deshpande. Manuscript on ArXiv and submitted for publication.
keywords:
image models; fluorescence microscopy; training data; image-to-image translation; generative model evaluation
published:
2025-06-16
Blanc-Betes, Elena; Gomez-Casanovas, Nuria; Bernacchi, Carl; Boughton, Elizabeth; Yang, Wendy; DeLucia, Evan
(2025)
Biometric, ground-based, and eddy covariance flux data for investigating the impact of sugarcane expansion across subtropical Florida on the carbon (C) budget over a three-year rotation.
Dataset includes: three-year record of daily fluxes, NPP and SOC input measurements, and estimates of carbon use efficiency and net ecosystem carbon balance in sugarcane and improved and semi-native pastures following pasture conversion to sugarcane.
keywords:
land use change; sugarcane expansion; bioenergy; carbon budget; CUE; NECB
published:
2025-02-23
Bondarenko, Nikita; Podladchikov, Yury; Williams-Stroud, Sherilyn; Makhnenko, Roman
(2025)
Dataset with numerical routines and laboratory testing data associated with the manuscript: Bondarenko, N., Podladchikov, Y., Williams‐Stroud, S., & Makhnenko, R. (2025). Stratigraphy‐induced localization of microseismicity during CO2 injection in Illinois Basin. Journal of Geophysical Research: Solid Earth, 130, e2024JB029526. https://doi.org/10.1029/2024JB029526
keywords:
Illinois Basin Decatur Project; Induced Seismicity; GPU; Numerical modeling
published:
2024-08-29
Li, Shuai; Montes, Christopher; Aspray, Elise; Ainsworth, Elizabeth
(2024)
Over the past 15 years, soybean seed yield response to season-long elevated O3 concentrations [O3] and to year-to-year weather conditions was studied using free-air O3 concentration enrichment (O3-FACE) in the field at the SoyFACE facility in Central Illinois. Elevated [O3] significantly reduced seed yield across cultivars and years. However, our results quantitatively demonstrate that weather conditions, including soil water availability and air temperature, did not alter yield sensitivity to elevated [O3] in soybean.
keywords:
drought, elevated O3, heat, O3-FACE, soybean, yield
published:
2025-04-27
Alvarez, Jennifer; Fraterrigo, Jennifer; Dalling, James
(2025)
Downed woody debris census data for Trelease Woods collected in the summer of 2022. Dataset contains volume, biomass, decay class, and GPS coordinates for each downed woody debris piece.
keywords:
Old-growth; temperate forest; downed woody debris; coarse woody debris; census data
published:
2025-01-27
Zinnen, Jack; Chase, Marissa; Charles, Brian; Meissen, Justin; Matthews, Jeffrey
(2025)
This is the core data for RELIX, a dataset of vascular plant species presence for 353 prairie remnants in the Midwestern United States and associated dataset of prairie remnant metadata. The primary data file contains a list of the vascular plant species observed in the prairie remnants, as well as a metadata table with more information about the prairie remnant in question and the species list itself. The data was compiled from a variety of written sources, private and published, chronicling observations made between the mid-twentieth century and 2021. It also contains a supplementary data table of vascular plant species observed in at least 8 of the prairie remnants in RELIX, as well as a list of acknowledgements for the associated manuscript.
keywords:
prairie peninsula; prairie relict; prairie soil; species inventories; tallgrass prairie
published:
2024-07-28
Xing, Yuqing; Bae, Seokjin; Madhavan, Vidya
(2024)
This is a set of topographies to study the magnetic field response of RbV3Sb5 (related to Fig. 4 of https://www.nature.com/articles/s41586-024-07519-5).
published:
2024-08-15
Gounder, Babu; Kadiyan, Lakshya; Sarker, Zafar Waziha
(2024)
This study acquired publicly available Shell annual reports. Reports were selected for the years since the UN investigation in 2011, resulting in documents from 2012 to 2023.
keywords:
environmental justice; ethics of care; indigenous communities; Niger River Delta; oil spills
published:
2025-04-26
Alvarez, Jennifer; Fraterrigo, Jennifer; Dalling, James
(2025)
Census data collected at Trelease Woods in 1936 with information on tree species, stem count, diameter at breast height (DBH), and basal area. The plot boundaries from the 1936 census were georeferenced to subset 2018 census data for a direct comparison between the two census years.
keywords:
old-growth; temperate forest; species composition; forest dynamics; historical data
published:
2025-03-17
Pelech, Elena; Evers, Jochem; Bernacchi, Carl
(2025)
A mechanistic functional structural plant model. The .gsz file includes a parameterised maize and soybean model to be used in the GroIMP software (https://grogra.de/). The current model is parameterised to maize cultivar DKC63-21RIB and soybean cultivar AG36X6 for the 2019 growing season in Champaign, IL, USA.
keywords:
Functional structural plant model; intercropping; plant architecture; maize; soybean
published:
2025-01-27
Shen, Chengze; Wedell, Eleanor; Pop, Mihai; Warnow, Tandy
(2025)
The zip file contains the benchmark data used for the TIPP3 simulation study. See the README file for more information.
keywords:
TIPP3; abundance profile; reference database; taxonomic identification; simulation
published:
2024-07-09
Yan, Bin; Dietrich, Christopher; Yu, Xiaofei; Jiang, Yan; Dai, Renhuai; Du, Shiyu; Cai, Chenyang; Yang, Maofa; Zhang, Feng
(2024)
The included files are the alignments of DNA or amino acid sequences used for phylogenetic analyses of Auchenorrhyncha (Insecta: Hemiptera) in the manuscript by Bin et al. submitted to the journal “Systematic Entomology.” The files are plain text in either FASTA (.fa or .fas suffix) or PHYLIP (.phy suffix) format. Matrix0 is the set of all loci after multiple sequence alignment and trimming. Matrix1 consists of loci having 75% average bootstrap support and 80% taxon completeness. Matrix2 consists of loci having 75% average bootstrap support and 95% completeness. Matrix2_nt12 is the same as Matrix2 but with third codon positions excluded. More details on how the datasets were compiled are provided in the Methods section of the manuscript file, also included as a PDF. Supplemental figures for the submitted manuscript are also provided as a PDF.
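The Matrix1 and Matrix2 definitions can be read as a two-threshold filter over per-locus statistics. The sketch below illustrates that reading only: it assumes the stated percentages are minimum values, that bootstrap support and taxon completeness have already been computed for each locus, and it uses invented locus names and numbers. The function name `select_loci` is hypothetical and this is not the authors' pipeline.

```python
def select_loci(stats, min_bootstrap, min_completeness):
    """Return the names of loci that meet both thresholds.

    stats maps a locus name to (average bootstrap support in percent,
    taxon completeness as a fraction between 0 and 1).
    """
    return sorted(
        name
        for name, (bootstrap, completeness) in stats.items()
        if bootstrap >= min_bootstrap and completeness >= min_completeness
    )


# Invented example values, for illustration only.
example = {
    "locusA": (82.0, 0.97),
    "locusB": (76.5, 0.85),
    "locusC": (60.0, 0.99),
}
matrix1_like = select_loci(example, 75.0, 0.80)  # 80% completeness cutoff
matrix2_like = select_loci(example, 75.0, 0.95)  # 95% completeness cutoff
```

Under this reading, every locus in the stricter Matrix2-style set also belongs to the Matrix1-style set.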
keywords:
Insecta; Phylogeny; DNA sequence; Evolution