Dataset Search

Displaying 26 - 50 of 782 in total

Illinois Data Bank Dataset Search Results

Results

published: 2026-03-02

Data for Rewiring Yeast Metabolism for Producing 2,3-Butanediol and Two Downstream Applications: Techno-Economic Analysis and Life Cycle Assessment of Methyl Ethyl Ketone (MEK) and Agricultural Biostimulant Production

Lee, Jae Won; Bhagwat, Sarang; Kuanyshev, Nurzhan; Cho, Young; Sun, Liang; Lee, Ye-Gi; Cortes-Pena, Yoel; Li, Yalin; Rao, Christopher; Guest, Jeremy; Jin, Yong-Su (2026)

Rising concerns for sustainability and global climate change have driven the development of sustainable production pathways for biofuels and chemicals from lignocellulosic biomass via integrated biological and chemical processes. We constructed an engineered Saccharomyces cerevisiae capable of producing 2,3-butanediol (2,3-BDO) from glucose without accumulating ethanol and glycerol, which hinder downstream processing of 2,3-BDO, through extensive metabolic reprogramming. Specifically, we introduced heterologous 2,3-BDO biosynthetic enzymes and deleted the major isozymes of ethanol and glycerol biosynthetic enzymes. In addition, we introduced an NAD+ regenerating Pyruvate-Malate (PM) cycle and enhanced the NAD+ regenerating capability of the PM cycle to resolve the redox imbalance from the deletion of ethanol and glycerol production pathways. The resulting engineered yeast produced 109.9 g/L of 2,3-BDO with a productivity of 1.0 g/L/h and a yield of 0.36 g/g glucose in a fed-batch fermentation. We also conducted techno-economic analysis (TEA) and life cycle assessment (LCA) of the production of methyl ethyl ketone (MEK) through catalytic dehydration of 2,3-BDO. A TEA based on the experimental results indicated that the minimum product selling price (MPSP) was estimated to be $1.90/kg. Regarding cradle-to-grave LCA, 100-year global warming potential (GWP100) and fossil energy consumption (FEC) were found to be 0.37 kg CO2 eq/kg and 3.1 MJ/kg, respectively. These results demonstrated the feasibility of cost-competitive and sustainable bio-based MEK production via yeast fermentation. In addition, we explored the possibility of using the fermentation broth containing 2,3-BDO as a biostimulant inducing drought tolerance in plants. As a result, the yeast 2,3-BDO fermentation broth can induce drought tolerance in Arabidopsis thaliana without a complicated purification process.

keywords: Economics; Metabolomics

published: 2026-03-02

Data for Immediate Impacts of Soybean Cover Crop on Bacterial Community Composition and Diversity in Soil Under Long-Term Saccharum Monoculture

Mula-Michel, Himaya; White, Paul; Hale, Anna (2026)

Saccharum yield decline results from long-term monoculture practices. Changes in cropping management can improve soil health and productivity. Below-ground bacterial community diversity and composition across soybean (Glycine max (L.) Merr) cover crop, Saccharum monoculture (30+ year) and fallowed soil were determined. Near full-length (~1,400 base pairs) 16S rRNA gene sequences extracted from the rhizospheres of sugarcane and soybean and from fallowed soil were compared. Higher soil bacterial diversity was observed in the soybean cover crop than in sugarcane monoculture across all measured indices (observed operational taxonomic units, Chao1, Shannon, reciprocal Simpson and Jackknife). Acidobacteria, Proteobacteria, Bacteroidetes and Planctomycetes were the most abundant bacterial phyla across the treatments. Indicator species analysis identified nine indicator phyla. Planctomycetes, Armatimonadetes and candidate phylum FBP were associated with soybean; Proteobacteria and Firmicutes were linked with sugarcane; and Gemmatimonadetes, Nitrospirae, Rokubacteria and unclassified bacteria were associated with fallowed soil. Non-metric multidimensional scaling analysis showed distinct groupings of bacterial operational taxonomic units (97% identity) according to management system (soybean, sugarcane or fallow), indicating compositional differences among treatments. This is confirmed by the results of the multi-response permutation procedures (A = 0.541, p = 0.00045716). No correlation between soil parameters and bacterial community structure was observed according to the Mantel test (r = 211865, p = 0.14). Use of soybean cover-crop fostered bacterial diversity and altered community structure. This indicates cover crops could have a restorative effect and potentially promote sustainability in long-term Saccharum production systems.

keywords: Field Data; Genomics

published: 2026-03-02

Data for Transposon Signatures of Allopolyploid Genome Evolution

Session, Adam; Rokhsar, Daniel (2026)

Hybridization brings together chromosome sets from two or more distinct progenitor species. Genome duplication associated with hybridization, or allopolyploidy, allows these chromosome sets to persist as distinct subgenomes during subsequent meioses. Here, we present a general method for identifying the subgenomes of a polyploid based on shared ancestry as revealed by the genomic distribution of repetitive elements that were active in the progenitors. This subgenome-enriched transposable element signal is intrinsic to the polyploid, allowing broader applicability than other approaches that depend on the availability of sequenced diploid relatives. We develop the statistical basis of the method, demonstrate its applicability in the well-studied cases of tobacco, cotton, and Brassica napus, and apply it to several cases: allotetraploid cyprinids, allohexaploid false flax, and allooctoploid strawberry. These analyses provide insight into the origins of these polyploids, revise the subgenome identities of strawberry, and provide perspective on subgenome dominance in higher polyploids.

keywords: Genomics

published: 2026-03-01

Data for Boophis williamsi Abundance Estimates

Edmonds, Devin A.; Fanomezantsoa, Rebecca E.; Rabibisoa, Nirhy H. C.; Roberts, Sam H. (2026)

This dataset contains ecological and demographic data for William’s bright‑eyed frog (Boophis williamsi), a critically endangered amphibian restricted to the Ankaratra Massif in Madagascar’s central highlands. Field surveys were conducted from September 2018 to March 2019 and in July 2021 across ten 100‑m stream transects to estimate abundance and identify habitat associations for both tadpoles and adult frogs. Data include repeated counts of individuals and associated habitat variables (e.g., canopy cover, substrate type, stream depth, discharge, and temperature). Abundance was estimated using N‑mixture models implemented in R (version 4.3.1) with the ubms package, with separate models for tadpoles and frogs to account for differences in detection probability. The dataset consists of multiple CSV files capturing microhabitat, environmental variables, and raw survey count data (y_frogs.csv and y_tadpoles.csv) and an R script (boophis_abundance.R) used for model fitting. The dataset was compiled for an article accepted in the Herpetological Journal by the British Herpetological Society and is intended to support long‑term monitoring and conservation planning for B. williamsi and other threatened amphibians in Madagascar.

keywords: amphibian conservation; biodiversity conservation; detection probability; endangered species; N-mixture model

published: 2024-07-29

A Citation Graph from OpenAlex (Works)

Caetano Machado Lopes, Lorran; Chacko, George (2024)

This dataset consists of a citation graph. It was constructed by downloading and parsing the Works section of the OpenAlex catalog of the global research system. OpenAlex (see citation below) contains detailed information about scholarly research, including articles, authors, journals, institutions, and their relationships. The data were downloaded on 2024-07-15. The dataset comprises two compressed (.xz) files.

1) filename: openalexID_integer_id_hasDOI.parquet.xz. The tabular data within contains three columns: openalex_id, integer_id, and hasDOI. Each row represents a record with the following data types:
• openalex_id: A unique identifier from the OpenAlex catalog.
• integer_id: An integer representing the new identifier (assigned by the authors).
• hasDOI: An integer (0 or 1) indicating whether the record has a DOI (0 for no, 1 for yes).

2) filename: citation_table.tsv.xz. This edgelist of citations has two columns (no header) of integer values that represent citing and cited integer_id, respectively.

Summary Features
• Total Nodes (Documents): 256,997,006
• Total Edges (citations): 2,148,871,058
• Documents with DOIs: 163,495,446
• Edges between documents with DOIs: 1,936,722,541 [corrected to 2,148,788,148 edges Nov 13, 2025]
• Count of unique nodes in edgelist: 111,453,719 [updated Nov 13, 2025]

Note: Nov 13, 2025. An improved curation process will be applied to a future version of this dataset.
Note: Nov 13, 2025. The code used to generate these files can be found here: https://github.com/illinois-or-research-analytics/lorran_openalex/
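As a rough illustration of how the edgelist described above might be consumed, the sketch below streams an xz-compressed, headerless, two-column TSV and counts how often each document is cited. The function names `read_edges` and `in_degree` are hypothetical, and the tiny in-memory archive stands in for citation_table.tsv.xz; with roughly two billion edges, the real file would likely call for a streaming or out-of-core approach rather than a full in-memory counter.

```python
import csv
import io
import lzma
from collections import Counter


def read_edges(source):
    """Yield (citing_id, cited_id) integer pairs.

    `source` may be a path to an .xz file or a binary file object;
    the TSV is assumed to have exactly two columns and no header.
    """
    with lzma.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # tolerate blank lines
            yield int(row[0]), int(row[1])


def in_degree(edges):
    """Count how many times each integer_id appears as the cited node."""
    counts = Counter()
    for _citing, cited in edges:
        counts[cited] += 1
    return counts


if __name__ == "__main__":
    # Made-up three-edge graph, compressed in memory for demonstration only.
    demo = io.BytesIO(lzma.compress(b"1\t2\n3\t2\n2\t4\n"))
    print(in_degree(read_edges(demo)).most_common(1))
```

Passing a file path instead of the in-memory buffer, for example `read_edges("citation_table.tsv.xz")`, should work the same way once the file has been downloaded.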

keywords: citation networks; Open Alex

published: 2021-05-17

An Assessment of the Impacts of Climate Change in Illinois

Wuebbles, D; Angel, J; Petersen, K; Lemke, A.M. (2021)

Please cite as: Wuebbles, D., J. Angel, K. Petersen, and A.M. Lemke, (Eds.), 2021: An Assessment of the Impacts of Climate Change in Illinois. The Nature Conservancy, Illinois, USA. https://doi.org/10.13012/B2IDB-1260194_V1 Climate change is a major environmental challenge that is likely to affect many aspects of life in Illinois, ranging from human and environmental health to the economy. Illinois is already experiencing impacts from the changing climate and, as climate change progresses and temperatures continue to rise, these impacts are expected to increase over time. This assessment takes an in-depth look at how the climate is changing now in Illinois, and how it is projected to change in the future, to provide greater clarity on how climate change could affect urban and rural communities in the state. Beyond providing an overview of anticipated climate changes, the report explores predicted effects on hydrology, agriculture, human health, and native ecosystems.

keywords: Climate change; Illinois; Public health; Agriculture; Environment; Water; Hydrology; Ecosystems

published: 2026-02-25

Data for Locus Coeruleus-Amygdala Circuit Disrupts Prefrontal Control to Impair Fear Extinction

Bayer, Hugo; Binette, Annalise; Sweck, Samantha; Juliano, Vitor; Plas, Samantha; Ferst, Lara; Hassell Jr, James; Maren, Stephen (2026)

Raw data from the article "Locus Coeruleus-Amygdala Circuit Disrupts Prefrontal Control to Impair Fear Extinction", which is accepted for publication in PNAS.

keywords: Basolateral Amygdala; Fear conditioning; Infralimbic cortex; Learning and Memory; Norepinephrine

published: 2026-02-10

Triad kinetics

Ejiogu, Emmanuel; Peters, Baron (2026)

This dataset contains the Jupyter notebook and Microsoft Excel data used to reproduce the results from the eponymous paper.

1. "pourahmady data.xlsx" contains NMR data for triad and dyad sequences in a PVC/Polyethylene copolymer.
V is a vinyl chloride segment (-CH2CHCl-) and E is an ethylene segment (-CH2CH2-).
VE is the dyad -CH2CHCl-CH2CH2-.
VC_frac_1 = fraction of vinyl chloride segments obtained from 13C-NMR
VC_frac_2 = fraction of vinyl chloride segments obtained from elemental analysis

2. "Triad_Kinetics.ipynb" contains code that fits the data from "pourahmady data.xlsx".

published: 2024-12-11

Pretrained models for MMAudio

Cheng, Ho Kei (2024)

MMAudio pretrained models. These models can be used with the open-source codebase at https://github.com/hkchengrex/MMAudio Note: mmaudio_large_44k_v2.pth and Readme.txt were added in this V2. The other 4 files are unchanged.

published: 2026-02-20

Data for Yield from Iowa’s first commercial miscanthus fields: implications of spatial variability for productivity and sustainability beyond research plots

Emran, Shah-Al; Petersen, Bryan M; Roney, Heather Elizabeth; Masters, Michael David; Varela, Sebastian; Hedrick, Travis; Leakey, Andrew D.B.; VanLoocke, Andy; Heaton, Emily A. (2026)

This dataset contains biomass yield measurements and associated vegetation index data collected from commercial Miscanthus × giganteus fields in eastern Iowa during the 2022–2023 growing seasons. The data support the analyses presented in the article: “Yield From Iowa's First Commercial Miscanthus Fields: Implications of Spatial Variability for Productivity and Sustainability Beyond Research Plots.” We collected 105 ground-truth biomass samples from four mature commercial fields (>4 years old) covering 92.81 ha. Samples were taken from 3 m² quadrats that were hand-harvested in alignment with commercial harvest timing. Stem biomass (excluding leaves) was weighed, moisture-corrected, and converted to dry-matter yield expressed in Mg DM ha⁻¹. Sampling locations were selected to capture spatial variability visible in aerial imagery and were recorded using RTK GPS. Each biomass observation was paired with vegetation indices derived from high-resolution PlanetScope satellite imagery (3 m resolution). Images were acquired throughout the growing season, and indices were calculated to evaluate their ability to predict end-of-season biomass yield. Statistical and machine learning approaches were used to identify key predictors, and a linear regression model based on end-of-July Green Normalized Difference Vegetation Index (GNDVI) was developed and evaluated. This repository includes the data used in that modeling workflow. Management practices, economic data, full imagery time series, and additional methodological details are described in the associated publication and are not included here.

The dataset consists of three comma-separated value (CSV) files:

1. Combine_Groundtruth_Yield_VI_22_23.csv
This file contains ground-truth biomass yield measurements and associated key vegetation index values collected during the 2022 and 2023 growing seasons.
Rows: 105 observations
Columns:
• Year - Year of observation (2022 or 2023)
• Field - Field location identifier
• Sample_number - Unique sample identifier
• GNDVI_End_Jul - Green Normalized Difference Vegetation Index calculated at end of July
• GNDVI_End_Aug - Green Normalized Difference Vegetation Index calculated at end of August
• NDRE_End_Aug - Normalized Difference Red Edge index calculated at end of August
• Biomass_Stem_Yield_MgDM/ha - Measured stem biomass yield (megagrams dry matter per hectare)

2. trainData_GNDVI.csv
This file contains the subset of observations used to train the predictive relationship between July GNDVI and biomass yield.
Rows: 76 observations
Columns:
• Unnamed: 0 - Row index retained from the original data processing workflow
• GNDVI_End_Jul - GNDVI at end of July
• Stem_Yield_MgDM/ha - Observed stem biomass yield (Mg DM ha⁻¹)

3. testData_GNDVI.csv
This file contains the test dataset used to evaluate model performance.
Rows: 29 observations
Columns:
• Unnamed: 0 - Row index retained from the original data processing workflow
• GNDVI_End_Jul - GNDVI at end of July
• Predicted_Yield_MgDM/ha - Model-predicted stem biomass yield (Mg DM ha⁻¹)
• Observed_Yield_MgDM/ha - Measured stem biomass yield (Mg DM ha⁻¹)
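The training file layout described above lends itself to a simple least-squares fit of stem yield on end-of-July GNDVI. The sketch below is one plausible way to reproduce that kind of fit and is not the authors' workflow: the helper names `read_columns`, `fit_line`, and `rmse` are hypothetical, only the column names come from the description, and the numbers in the demo are invented placeholders rather than values from the dataset.

```python
import csv
import io
import math


def read_columns(handle, x_col, y_col):
    """Read two numeric columns from a CSV file object with a header row."""
    xs, ys = [], []
    for row in csv.DictReader(handle):
        xs.append(float(row[x_col]))
        ys.append(float(row[y_col]))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Placeholder rows that only mimic the trainData_GNDVI.csv header.
    demo = io.StringIO(
        "Unnamed: 0,GNDVI_End_Jul,Stem_Yield_MgDM/ha\n"
        "0,0.60,12.0\n1,0.70,15.5\n2,0.80,18.0\n"
    )
    x, y = read_columns(demo, "GNDVI_End_Jul", "Stem_Yield_MgDM/ha")
    slope, intercept = fit_line(x, y)
    print(slope, intercept, rmse(x, y, slope, intercept))
```

With the real files, one would presumably fit on trainData_GNDVI.csv and compute the error on the GNDVI_End_Jul and Observed_Yield_MgDM/ha columns of testData_GNDVI.csv.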

keywords: Potential yield, yield gap, in-field management, yield prediction, remote sensing, spatial variability, profitability, Miscanthus × giganteus, M×g

published: 2026-02-19

Improving individual committor estimates and data efficiency in reaction coordinate tests with the Empirical Bayes method

Gurumoorthi, Akshay; Peters, Baron (2026)

The dataset contains a Jupyter notebook with a simple and lucid Python script, intended for anyone who wants to apply the Empirical Bayes method described in the paper titled 'Data for Improving individual committor estimates and data efficiency in reaction coordinate tests with the Empirical Bayes method' to committor data.
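For readers unfamiliar with the general idea, the sketch below shows one textbook form of empirical Bayes for binomial data: beta-binomial shrinkage with a prior fitted by the method of moments. It is an assumption that this resembles the notebook's approach, and the paper may use a different prior or fitting procedure. The function names are hypothetical, every configuration is assumed to receive the same number of trial trajectories (`n_shots`, at least 2), and the counts in the demo are invented.

```python
def fit_beta_prior(successes, n_shots):
    """Method-of-moments Beta(a, b) prior from per-configuration success counts.

    Returns None when the spread of the raw fractions is no larger than
    pure binomial noise would produce (no evidence of a spread in p).
    """
    m = len(successes)
    fracs = [k / n_shots for k in successes]
    mean = sum(fracs) / m
    var = sum((f - mean) ** 2 for f in fracs) / (m - 1)
    binom_var = mean * (1.0 - mean) / n_shots
    if binom_var == 0.0:
        return None
    # Beta-binomial: Var(k/n) = binom_var * (1 + (n - 1) * rho), rho = 1/(a+b+1)
    rho = (var / binom_var - 1.0) / (n_shots - 1)
    if rho <= 0.0:
        return None
    rho = min(rho, 0.999)  # keep a + b strictly positive
    total = 1.0 / rho - 1.0
    return mean * total, (1.0 - mean) * total


def shrunk_estimates(successes, n_shots):
    """Posterior-mean estimates (k + a) / (n + a + b) for each configuration."""
    prior = fit_beta_prior(successes, n_shots)
    if prior is None:
        pooled = sum(successes) / (n_shots * len(successes))
        return [pooled] * len(successes)
    a, b = prior
    return [(k + a) / (n_shots + a + b) for k in successes]


if __name__ == "__main__":
    # Invented counts: successful trajectories out of 10 shots per configuration.
    print(shrunk_estimates([0, 2, 5, 8, 10], n_shots=10))
```

The practical effect is that extreme raw fractions such as 0/10 or 10/10 are pulled toward the pooled mean, which is the usual motivation for shrinkage when each estimate rests on few trajectories.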

published: 2026-02-17

Cline Center Coup d’État Project Dataset

Peyton, Buddy; Bajjalieh, Joseph; Martin, Michael; Gerald, Andrea (2026)

Coups d’État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011; Marshall and Marshall 2019; Chin, Carter and Wright 2021). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.

Version 2.2.2 corrects an error in version 2.2.1 in which the “conspiracy” designation was mistakenly assigned to coup_id: 40411262025. Version 2.2.2 resolves this issue by removing the incorrect designation.

Version 2.2.1 adds 67 additional coup events. 47 of these came from examining the Colpus dataset (Chin, Carter, and Wright 2021), and 20 of these events were added to the data set in the normal annual review of potential new coup events. This version also updates the coding to events in Mali in 2012, Serbia in 2000 and Chad in 1979.

Version 2.2.0 adds 94 additional coup events. 66 of these came from examining Powell and Thyne’s “discarded” events and 28 of these events were added to the data set in the normal annual review of potential new coup events. This version also updates the coding to events in Brazil in 1945 and the Congo in 1968.

Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy.

Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.

Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.

Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data set. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.

Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)

Version 1.0.0 was released in 2013. This version consolidated coup data taken from the following sources:
• The Center for Systemic Peace (Marshall and Marshall, 2007)
• The World Handbook of Political and Social Indicators (Taylor and Jodice, 1983)
• Coup d’État: A Practical Handbook (Luttwak, 1979)
• The Cline Center’s Social, Political and Economic Event Database (SPEED) Project (Nardulli, Althaus and Hayes, 2015)
• Government Change in Authoritarian Regimes – 2010 Update (Svolik and Akcinaroglu, 2006)

Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.2.2 Codebook.pdf - This 18-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2026
2. Coup Data 2.2.2.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1,161 observations. Revised February 2026
3. Source Document v2.2.2.pdf - This 365-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2026
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in Markdown language. Revised February 2026

Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2026. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.2. February 17. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V10
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Michael Martin, and Andrea Gerald. 2026. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.2. February 17. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V10

published: 2026-02-11

Data for The MagPIE2 Dataset: Magnetic Field-Based Mapping, Localization, and SLAM

Hanley, David; Lee, Jongwon; Choi, Su Yeon; Bretl, Timothy (2026)

If you use this dataset, please cite both the dataset and the associated data paper (BibTeX is below).

@ARTICLE{11386847, author={Hanley, David and Lee, Jongwon and Choi, Su Yeon and Bretl, Timothy}, journal={IEEE Transactions on Instrumentation and Measurement}, title={The MagPIE2 Dataset for Mapping, Localization, and Simultaneous Localization and Mapping Using Magnetic Fields}, year={2026}, volume={}, number={}, pages={1-1}, keywords={Magnetometers;Magnetic field measurement;Magnetic fields;Pedestrians;Location awareness;Buildings;Simultaneous localization and mapping;Measurement errors;Hardware;Calibration;Localization;mapping;SLAM;dataset;benchmark;magnetometer;magnetic field}, doi={10.1109/TIM.2026.3662919}}

We present a dataset for the evaluation of magnetic field-based robotic and pedestrian localization, mapping, and SLAM methods. This dataset contains magnetometer and inertial measurement unit data collected inside three buildings by both a pedestrian and a ground robot. Data were collected at different heights simultaneously, both with and without changes in the placement of objects that may affect magnetometer measurements. In total, approximately 689 square meters of floor space was covered by this dataset. This dataset is archivally stored. We provide a GitHub site which is meant to serve as a forum to post issues with the dataset, share code using the dataset, and resolve problems: https://github.com/hanley6/MagPIE2Forum Note that while the dataset is meant to be permanently stored, this forum is not meant to guarantee perennial support and its existence will be dependent on the policies of GitHub.

How is the dataset organized?
The data is divided into the following parts at a high level and more detailed information can be found in the Readme:
1. The walking portion of the dataset: CSL_WLK.zip, DCL_WLK.zip, Talbot_WLK.zip, and WLK_Misc.zip.
2. The robot portion of the dataset: Robot_Dataset.zip.
3. Motor interference tests: Motor_Interference_Test.zip.
4. Ground truth evaluation: Ground_Truth_Evaluation.zip.
5. Quick start results: Quick_Start_Results.zip.

How is data recorded and stored?
Data is generally collected in the form of ROS bag files. Each ROS bag has Intel Realsense camera images, magnetometer readings, IMU readings, timestamps, and more as applicable for each file in the dataset. Each bag file has an associated metadata file written as a YAML file. This contains general information about each bag file including the start and stop time, who collected the bag file (during the pedestrian portion of the dataset), and the approximate location where data was collected. In several cases, additional comma-separated value (CSV) files were included either as a convenient supplement to ROS bag files (e.g., CSV files of magnetometer calibration data) or because they serve as human-readable quick start results.

How does one set up and run files on the dataset?
The files are stored in ROS bags and are, therefore, meant to be run using the Robot Operating System. Information regarding how to use the Robot Operating System as well as installation instructions are available at: https://ros.org/

keywords: Localization; mapping; SLAM; dataset; benchmark; magnetometer; magnetic field

published: 2025-12-23

study of liquid suction cup detachment mechanism

Aly, Abdallah; A. Saif, M. Taher (2025)

The uploaded data is part of the paper titled: Self-Modifying Percolation Governs Detachment in Soft Suction Wet Adhesion, which shows the detachment mechanism of liquid suction-based adhesion.

published: 2026-01-14

Data for: Sequence constraints predispose Class D GPCR STE2 to follow an atypical activation mechanism

Bansal, Prateek; Shukla, Diwakar (2026)

This dataset contains the .npy and .pkl files required to reproduce the plots in the study.

keywords: GPCR; activation; STE2; Class D; molecular dynamics

published: 2025-05-07

Data for "Environmental DNA Metabarcoding of Vertebrates from Central Illinois, United States, 2023-2024"

Reves, Olivia; Larson, Eric (2025)

Data collected at 71 study sites from 2023 to 2024 for Reves, Olivia P. (2025): Using Environmental DNA Metabarcoding to Inform Biodiversity Conservation in Agricultural Landscapes. Master's thesis, University of Illinois Urbana-Champaign. Files include study site information, taxa by site matrices for vertebrates from environmental DNA metabarcoding using multiple mitochondrial DNA primers (COI, 12S), and bird species audibly detected by a phone app at study sites.

keywords: agricultural conservation; biodiversity; eDNA; environmental DNA; Illinois; metabarcoding; riparian buffers; stream flow; vertebrates

published: 2016-05-19

New York City Taxi Trip Data (2010-2013)

Donovan, Brian; Work, Dan (2016)

This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.

keywords: taxi;transportation;New York City;GPS

published: 2025-02-07

Arecibo ISR lag profile data 2016 September Campaign

Wang, Binghui; Kudeki, Erhan (2025)

Incoherent scatter radar datasets collected during the September 2016 campaign at Arecibo have been deposited in this databank. The lag products of the ISR data are stored as lag profile matrices with 5 minutes of integration time. The data is organized in a Python dictionary format, with each file containing 12 lag profile matrices representing one hour of observation. A sample Python script is provided to illustrate its usage.

published: 2025-05-14

Data for egg hyperspectral image

Song, Di (2025)

1,228 hyperspectral images of eggs, covering wavelengths from 400 nm to 900 nm.

published: 2026-01-22

Data and Code for Tracking the Hidden Trade of Non-native Pet Amphibians in the United States

Edmonds, Devin; Du, Jane; Stickley, Samuel; Sucre, Samuel (2026)

This dataset contains data and R scripts used to analyze the trade of non-native pet amphibians in the United States by integrating online classified advertisements with U.S. Fish and Wildlife Service import records. The data include records of amphibian advertisements, U.S. imports, taxonomic reference lists, and conservation status information. The dataset supports analyses identifying domestically produced species, species entering U.S. markets through unrecorded or unofficial trade pathways, and price differences associated with documented and undocumented trade. The dataset supports the analyses presented in an associated peer-reviewed publication in Biological Conservation.

keywords: amphibian; biocommerce; biosecurity; conservation; LEMIS; pet trade; species laundering; wildlife trade

published: 2026-01-23

Code for "Emulating 2D Materials with Magnons"

Kaman, Bobby; Lim, Jinho; Liu, Yingkai; Hoffmann, Axel (2026)

Data related to the forthcoming publication "Emulating 2D Materials with Magnons", which is also available as a preprint on arXiv: https://arxiv.org/abs/2601.03210 The dataset contains scripts for the simulation program Mumax3 and Python scripts for conversion and analysis.

keywords: micromagnetics; mumax; tight-binding; spin waves; magnons

published: 2026-01-20

Dataset for "CAMUS: Scalable Phylogenetic Network Estimation"

Willson, James; Warnow, Tandy (2026)

Dataset from "CAMUS: Scalable Phylogenetic Network Estimation." This dataset contains simulated phylogenetic networks, gene trees, and sequence data.
- camus-dataset.tar.xz is the main archive containing all the simulated data. More details about the files and directories it contains can be found in README.md.
- scripts.zip contains various scripts used in the simulation study.

keywords: evolution; computational biology; bioinformatics; phylogenetics

published: 2026-01-21

Data for "Examining Organic Acid Production Potential and Growth-Coupled Strategies in Issatchenkia orientalis Using Constraint-Based Modeling"

Suthers, Patrick; Maranas, Costas (2026)

Growth-coupling product formation can facilitate strain stability by aligning industrial objectives with biological fitness. Organic acids make up many building block chemicals that can be produced from sugars obtainable from renewable biomass. Issatchenkia orientalis is a yeast strain tolerant to acidic conditions and is thus a promising host for industrial production of organic acids. Here, we use constraint-based methods to assess the potential of computationally designing growth-coupled production strains for I. orientalis that produce 22 different organic acids under aerobic or microaerobic conditions. We explore native and engineered pathways using glucose or xylose as the carbon substrates as proxy constituents of hydrolyzed biomass. We identified growth-coupled production strategies for 37 of the substrate-product pairs, with 15 pairs achieving production for any growth rate. We systematically assess the strain design solutions and categorize the underlying principles involved.

keywords: Bioproducts; Modeling

published: 2026-01-19

Data for International (Fair) Trade in Air-Quality-Related Mortality

Wang, Shiyuan (2026)

Note: The GTAP dataset includes a total of 140 regions, some of which are aggregated regions. For all map-related supplementary files (S11, S12, S13), we assign values to each individual country to enhance visualization. Countries within the same aggregated region are assigned the same regional value to maintain consistency across the map.

Data S1 (separate file): S1.csv, a CSV file detailing production-related deaths for the GTAP dataset. Rows: Each row represents a country where deaths occur as a result of production activities. Columns: Each column represents a country-sector pair on the production side. Values: The values indicate the number of deaths caused by production activities in the country-sector listed in each column and occurring in the country listed in each row.

Data S2 (separate file): S2.csv, a CSV file detailing production-related deaths for the EORA dataset. Structure: The file has the same structure as S1.csv.

Data S3 (separate file): S3.csv, a CSV file detailing consumption-related deaths for the GTAP dataset. Rows: Each row represents a country where deaths occur as a result of consumption activities. Columns: Each column represents a consumption country. Values: The values indicate the number of deaths caused by consumption activities in the country listed in the column and occurring in the country listed in the row.

Data S4 (separate file): S4.csv, a CSV file detailing consumption-related deaths for the EORA dataset. Structure: The file has the same structure as S3.csv.

Data S5 (folder of files): S5.zip, a folder containing 141 CSV files, each named after a country's 3-letter code (e.g., USA.csv, CHN.csv), representing production-related spatial PM₂.₅ concentration patterns for all GTAP countries. Rows: Each row corresponds to a grid cell. Columns: Each column represents an industrial sector. The final column, "geometry," contains the spatial coordinates (latitude and longitude) for each grid cell. Values: Each value indicates the PM₂.₅ concentration level (in µg/m³) attributable to emissions from the specified sector in the given country, as they occur in each grid cell.

Data S6 (folder of files): S6.zip, a folder containing 188 CSV files, each named after a country's 3-letter code, representing production-related spatial PM₂.₅ concentration patterns for all EORA countries. Structure: Each file follows the same format as those in S5.zip, with rows representing grid cells and columns representing industrial sectors, plus a "geometry" column containing spatial coordinates.

Data S7 (separate file): S7.csv, a CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all GTAP countries. Rows: Each row represents a grid cell. Columns: Apart from the last column ("geometry"), which contains spatial information for each grid cell in latitude-longitude coordinates, each column represents a consumption country. Values: Each value indicates the PM₂.₅ concentration level caused by each country's consumption process and occurring in each grid cell, measured in µg/m³.

Data S8 (separate file): S8.csv, a CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all EORA countries. Structure: The file has the same structure as S7.csv.

Data S9 (separate file): S9.csv, a CSV file listing the total net bidirectional export of deaths for all countries in GTAP, displaying only positive values. Columns: "from": The country that exports more consumption-related deaths. "to": The country that imports more consumption-related deaths. "values": The net export of deaths between these two countries, calculated as the difference between the deaths flowing from "from" to "to" and those from "to" to "from."

Data S10 (separate file): S10.csv, a CSV file listing the total net bidirectional export of deaths for all countries in EORA, displaying only positive values. Structure: The file has the same structure as S9.csv.

Data S11 (separate file): S11.csv, a CSV file listing the Value of Statistical Lives (VSLs) and consumption-related externalities under three scenarios, namely Business as Usual (BAU), Global Community (GC), and Fair Trade in Deaths (FTD), along with externalities per GDP and their differences for GTAP countries. Columns: VSL, BAU_Externality, GC_Externality, FTD_Externality, BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP, Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC.

Data S12 (separate file): S12.csv, the same as S11.csv, but for EORA countries. Structure: Identical to S11.csv.

Data S13 (separate file): S13.csv, which includes data used to generate Figures 1, 2, 3, and 5 in the main text. Columns: country_code (3-letter country code), GTAP_region, continent, population, GDP, GDP_capita, VSL, export_of_death, import_of_death, net_export, net_export_capita, allforeign_world, G50foreign_world, G100foreign_world, cause_allforeign_world, cause_L30foreign_world, cause_L50foreign_world, BAU_Externality, GC_Externality, FTD_Externality, BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP, Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC, geometry (used for visualization).

Data S14 (separate file): S14.xlsx, an Excel file containing six sheets summarizing cross-model Pearson correlation coefficients between sectoral economic activity fractions and transboundary mortality impact metrics, based on both GTAP and EORA datasets. Sheets: Output_fraction_GTAP, Direct_demand_fraction_GTAP, Final_demand_fraction_GTAP, Output_fraction_EORA, Direct_demand_fraction_EORA, Final_demand_fraction_EORA. Rows: Each row represents an economic sector. Columns: G50foreign_world: Fraction of deaths attributable to final demand from regions where demand per capita is more than 50% higher than in the current country. cause_L50foreign_world: Fraction of deaths caused by consumption within the current country but occurring in countries with more than 50% lower demand per capita. Values: Each value represents the Pearson correlation between the sectoral fraction and the corresponding transboundary mortality metric.

Data S15 (separate file): S15.csv, a CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of production-based premature deaths. Column Producer: The producing country–sector pair responsible for the emissions leading to health impacts. Column Affected Country: The country where the resulting premature deaths occur. Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis. Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each producer–affected country pair.

Data S16 (separate file): S16.csv, a CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of consumption-based premature deaths. Column Consumer: The consuming country whose final demand drives the global production and associated health impacts. Column Affected Country: The country where the resulting premature deaths occur. Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis. Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each consumer–affected country combination.
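The relationship between the consumption-related death matrix (S3.csv / S4.csv) and the net bidirectional export table (S9.csv / S10.csv) can be illustrated with a minimal Python sketch. It is not the author's code. It assumes that the flow from country A to country B is read as deaths occurring in B that are caused by consumption in A, and it uses a small made-up matrix; the country codes AAA, BBB, CCC, the numbers, and the function name `net_bidirectional_exports` are all hypothetical.

```python
def net_bidirectional_exports(deaths):
    """Return (from, to, value) rows with strictly positive net exports.

    `deaths[affected][consumer]` is the number of deaths occurring in
    `affected` that are caused by consumption in `consumer`, mirroring the
    row/column layout described for S3.csv (assumed orientation).
    """
    countries = sorted(deaths)
    rows = []
    for i, a in enumerate(countries):
        for b in countries[i + 1:]:
            a_to_b = deaths[b][a]  # deaths in b caused by consumption in a
            b_to_a = deaths[a][b]  # deaths in a caused by consumption in b
            net = a_to_b - b_to_a
            if net > 0:
                rows.append((a, b, net))   # a is the net exporter
            elif net < 0:
                rows.append((b, a, -net))  # b is the net exporter
            # pairs with a net flow of exactly zero are omitted
    return rows


# Hypothetical toy matrix: outer key = affected country, inner key = consumer.
toy_deaths = {
    "AAA": {"AAA": 10.0, "BBB": 4.0, "CCC": 1.0},
    "BBB": {"AAA": 7.0, "BBB": 20.0, "CCC": 2.0},
    "CCC": {"AAA": 3.0, "BBB": 2.0, "CCC": 5.0},
}

for origin, destination, value in net_bidirectional_exports(toy_deaths):
    print(origin, destination, value)
```

Each unordered country pair appears at most once, always oriented from the net exporter to the net importer, which matches the description of the table as displaying only positive values.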

published: 2025-09-18

Data from Assessing Precipitation, Evapotranspiration, and NDVI as Controls of U.S. Great Plains Plant Production

Chen, Maosi; Parton, William J.; Hartman, Melannie D.; Del Grosso, Stephen J.; Smith, William K.; Knapp, Alan; Lutz, Susan; Derner, Justin; Tucker, Compton; Ojima, Dennis; Volesky, Jerry; Stephenson, Mitchell B.; Schacht, Walter H.; Gao, Wei (2025)

Productivity throughout the North American Great Plains grasslands is generally considered to be water limited, with the strength of this limitation increasing as precipitation decreases. We hypothesize that cumulative actual evapotranspiration water loss (AET) from April to July is the precipitation‐related variable most correlated to aboveground net primary production (ANPP) in the U.S. Great Plains (GP). We tested this by evaluating the relationship of ANPP to AET, precipitation, and plant transpiration (Tr). We used multi‐year ANPP data from five sites ranging from semiarid grasslands in Colorado and Wyoming to mesic grasslands in Nebraska and Kansas, mean annual NRCS ANPP, and satellite‐derived normalized difference vegetation index (NDVI) data. Results from the five sites showed that cumulative April‐to‐July AET, precipitation, and Tr were well correlated (R²: 0.54–0.70) to annual changes in ANPP for all but the wettest site. AET and Tr were better correlated to annual changes in ANPP compared to precipitation for the drier sites, and precipitation in August and September had little impact on productivity in drier sites. April‐to‐July cumulative precipitation was best correlated (R² = 0.63) with interannual variability in ANPP in the most mesic site, while AET and Tr were poorly correlated with ANPP at this site. Cumulative growing season (May‐to‐September) NDVI (iNDVI) was strongly correlated with annual ANPP at the five sites (R² = 0.90). Using iNDVI as a surrogate for ANPP, we found that county‐level cumulative April–July AET was more strongly correlated to ANPP than precipitation for more than 80% of the GP counties, with precipitation tending to perform better in the eastern, more mesic portion of the GP. Including the ratio of AET to potential evapotranspiration (PET) improved the correlation of AET to both iNDVI and mean county‐level NRCS ANPP. Accounting for how different precipitation‐related variables control ANPP (AET in the drier portion, precipitation in the wetter portion) provides an opportunity to develop spatially explicit forecasting of ANPP across the GP for enhancing decision‐making by land managers and use of grassland ANPP for biofuels.

keywords: Sustainability; Field Data; Modeling