Dataset Search

Displaying 26 - 50 of 583 in total

Illinois Data Bank Dataset Search Results

Results

published: 2024-07-29

A Citation Graph from OpenAlex (Works)

Caetano Machado Lopes, Lorran; Chacko, George (2024)

This dataset consists of a citation graph. It was constructed by downloading and parsing the Works section of the OpenAlex catalog of the global research system. OpenAlex (see citation below) contains detailed information about scholarly research, including articles, authors, journals, institutions, and their relationships. The data were downloaded on 2024-07-15. The dataset comprises two compressed (.xz) files.
1) filename: openalexID_integer_id_hasDOI.parquet.xz. The tabular data within contains three columns: openalex_id, integer_id, and hasDOI. Each row represents a record with the following data types:
• openalex_id: A unique identifier from the OpenAlex catalog.
• integer_id: An integer representing the new identifier (assigned by the authors).
• hasDOI: An integer (0 or 1) indicating whether the record has a DOI (0 for no, 1 for yes).
2) filename: citation_table.tsv.xz. This edge list of citations has two columns (no header) of integer values that represent the citing and cited integer_id, respectively.
Summary Features
• Total Nodes (Documents): 256,997,006
• Total Edges (citations): 2,148,871,058
• Documents with DOIs: 163,495,446
• Edges between documents with DOIs: 1,936,722,541 [corrected to 2,148,788,148 edges Nov 13, 2025]
• Count of unique nodes in edgelist: 111,453,719 [updated Nov 13, 2025]
Note (Nov 13, 2025): An improved curation process will be applied to a future version of this dataset.
Note (Nov 13, 2025): The code used to generate these files can be found here: https://github.com/illinois-or-research-analytics/lorran_openalex/

keywords: citation networks; Open Alex
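The edge-list layout described above (two headerless, tab-separated integer columns inside an .xz archive) can be read without decompressing the file to disk first. The following is a minimal sketch using only the Python standard library; the function name `summarize_edgelist` is hypothetical and is not part of the dataset or of the authors' code.

```python
import csv
import lzma


def summarize_edgelist(path):
    """Stream a headerless, tab-separated .xz edge list.

    Returns (edge_count, unique_node_count). Each row is assumed to hold
    two integers: the citing integer_id and the cited integer_id.
    """
    nodes = set()
    edge_count = 0
    # lzma.open in text mode decompresses on the fly, so the full
    # uncompressed table never has to be written to disk.
    with lzma.open(path, mode="rt", newline="") as handle:
        for citing, cited in csv.reader(handle, delimiter="\t"):
            nodes.add(int(citing))
            nodes.add(int(cited))
            edge_count += 1
    return edge_count, len(nodes)
```

For a file with roughly two billion edges, the in-memory set of node identifiers will be large, so this sketch is better suited to checking a sample of the file than to summarizing the full graph on a laptop.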

published: 2021-05-17

An Assessment of the Impacts of Climate Change in Illinois

Wuebbles, D; Angel, J; Petersen, K; Lemke, A.M. (2021)

Please cite as: Wuebbles, D., J. Angel, K. Petersen, and A.M. Lemke, (Eds.), 2021: An Assessment of the Impacts of Climate Change in Illinois. The Nature Conservancy, Illinois, USA. https://doi.org/10.13012/B2IDB-1260194_V1 Climate change is a major environmental challenge that is likely to affect many aspects of life in Illinois, ranging from human and environmental health to the economy. Illinois is already experiencing impacts from the changing climate and, as climate change progresses and temperatures continue to rise, these impacts are expected to increase over time. This assessment takes an in-depth look at how the climate is changing now in Illinois, and how it is projected to change in the future, to provide greater clarity on how climate change could affect urban and rural communities in the state. Beyond providing an overview of anticipated climate changes, the report explores predicted effects on hydrology, agriculture, human health, and native ecosystems.

keywords: Climate change; Illinois; Public health; Agriculture; Environment; Water; Hydrology; Ecosystems

published: 2026-02-25

Data for Locus Coeruleus-Amygdala Circuit Disrupts Prefrontal Control to Impair Fear Extinction

Bayer, Hugo; Binette, Annalise; Sweck, Samantha; Juliano, Vitor; Plas, Samantha; Ferst, Lara; Hassell Jr, James; Maren, Stephen (2026)

Raw data from the article "Locus Coeruleus-Amygdala Circuit Disrupts Prefrontal Control to Impair Fear Extinction", which is accepted for publication in PNAS.

keywords: Basolateral Amygdala; Fear conditioning; Infralimbic cortex; Learning and Memory; Norepinephrine

published: 2026-02-10

Triad kinetics

Ejiogu, Emmanuel; Peters, Baron (2026)

This dataset contains the Jupyter notebook and Microsoft Excel data used to reproduce the results from the eponymous paper.
1. "pourahmady data.xlsx" contains NMR data for triad and dyad sequences in a PVC/polyethylene copolymer.
V is a vinyl chloride segment (-CH2CHCl-) and E is an ethylene segment (-CH2CH2-).
VE is the dyad -CH2CHCl-CH2CH2-.
VC_frac_1 = fraction of vinyl chloride segments obtained from 13C-NMR
VC_frac_2 = fraction of vinyl chloride segments obtained from elemental analysis
2. "Triad_Kinetics.ipynb" contains code that fits the data from "pourahmady data.xlsx".

published: 2026-02-20

Data for Yield from Iowa’s first commercial miscanthus fields: implications of spatial variability for productivity and sustainability beyond research plots

Emran, Shah-Al; Petersen, Bryan M; Roney, Heather Elizabeth; Masters, Michael David; Varela, Sebastian; Hedrick, Travis; Leakey, Andrew D.B.; VanLoocke, Andy; Heaton, Emily A. (2026)

This dataset contains biomass yield measurements and associated vegetation index data collected from commercial Miscanthus × giganteus fields in eastern Iowa during the 2022–2023 growing seasons. The data support the analyses presented in the article: “Yield From Iowa's First Commercial Miscanthus Fields: Implications of Spatial Variability for Productivity and Sustainability Beyond Research Plots.” We collected 105 ground-truth biomass samples from four mature commercial fields (>4 years old) covering 92.81 ha. Samples were taken from 3 m² quadrats that were hand-harvested in alignment with commercial harvest timing. Stem biomass (excluding leaves) was weighed, moisture-corrected, and converted to dry-matter yield expressed in Mg DM ha⁻¹. Sampling locations were selected to capture spatial variability visible in aerial imagery and were recorded using RTK GPS. Each biomass observation was paired with vegetation indices derived from high-resolution PlanetScope satellite imagery (3 m resolution). Images were acquired throughout the growing season, and indices were calculated to evaluate their ability to predict end-of-season biomass yield. Statistical and machine learning approaches were used to identify key predictors, and a linear regression model based on end-of-July Green Normalized Difference Vegetation Index (GNDVI) was developed and evaluated. This repository includes the data used in that modeling workflow. Management practices, economic data, full imagery time series, and additional methodological details are described in the associated publication and are not included here.
The dataset consists of three comma-separated value (CSV) files:
1. Combine_Groundtruth_Yield_VI_22_23.csv
This file contains ground-truth biomass yield measurements and associated key vegetation index values collected during the 2022 and 2023 growing seasons.
Rows: 105 observations
Columns:
• Year: Year of observation (2022 or 2023)
• Field: Field location identifier
• Sample_number: Unique sample identifier
• GNDVI_End_Jul: Green Normalized Difference Vegetation Index calculated at end of July
• GNDVI_End_Aug: Green Normalized Difference Vegetation Index calculated at end of August
• NDRE_End_Aug: Normalized Difference Red Edge index calculated at end of August
• Biomass_Stem_Yield_MgDM/ha: Measured stem biomass yield (megagrams dry matter per hectare)
2. trainData_GNDVI.csv
This file contains the subset of observations used to train the predictive relationship between July GNDVI and biomass yield.
Rows: 76 observations
Columns:
• Unnamed: 0: Row index retained from the original data processing workflow
• GNDVI_End_Jul: GNDVI at end of July
• Stem_Yield_MgDM/ha: Observed stem biomass yield (Mg DM ha⁻¹)
3. testData_GNDVI.csv
This file contains the test dataset used to evaluate model performance.
Rows: 29 observations
Columns:
• Unnamed: 0: Row index retained from the original data processing workflow
• GNDVI_End_Jul: GNDVI at end of July
• Predicted_Yield_MgDM/ha: Model-predicted stem biomass yield (Mg DM ha⁻¹)
• Observed_Yield_MgDM/ha: Measured stem biomass yield (Mg DM ha⁻¹)

keywords: Potential yield, yield gap, in-field management, yield prediction, remote sensing, spatial variability, profitability, Miscanthus × giganteus, M×g
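The description above mentions a linear regression of stem yield on end-of-July GNDVI, trained on trainData_GNDVI.csv. As a rough illustration of what such a fit involves, here is a minimal ordinary-least-squares sketch in plain Python. The column names are taken from the file description; the helper names `load_training_pairs` and `fit_line` are hypothetical, and this is not the authors' modeling code, so it will not necessarily reproduce their published coefficients.

```python
import csv


def load_training_pairs(path, x_col="GNDVI_End_Jul", y_col="Stem_Yield_MgDM/ha"):
    """Read (GNDVI, yield) pairs from a CSV laid out like trainData_GNDVI.csv."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    xs = [float(row[x_col]) for row in rows]
    ys = [float(row[y_col]) for row in rows]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A fitted line could then be applied to the GNDVI_End_Jul column of testData_GNDVI.csv and compared against Observed_Yield_MgDM/ha, which is the kind of check the Predicted_Yield_MgDM/ha column appears to support.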

published: 2026-02-17

Cline Center Coup d’État Project Dataset

Peyton, Buddy; Bajjalieh, Joseph; Martin, Michael; Gerald, Andrea (2026)

Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019, Chin, Carter and Wright 2021). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.
Version 2.2.2 corrects an error in version 2.2.1 in which the “conspiracy” designation was mistakenly assigned to coup_id: 40411262025. Version 2.2.2 resolves this issue by removing the incorrect designation.
Version 2.2.1 adds 67 additional coup events. 47 of these came from examining the Colpus dataset (Chin, Carter, and Wright 2021), and 20 of these events were added to the data set in the normal annual review of potential new coup events. This version also updates the coding of events in Mali in 2012, Serbia in 2000 and Chad in 1979.
Version 2.2.0 adds 94 additional coup events. 66 of these came from examining Powell and Thyne’s “discarded” events and 28 of these events were added to the data set in the normal annual review of potential new coup events. This version also updates the coding of events in Brazil in 1945 and the Congo in 1968.
Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy.
Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.
Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.
Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data set. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Version 1.0.0 was released in 2013. This version consolidated coup data taken from the following sources:
• The Center for Systemic Peace (Marshall and Marshall, 2007)
• The World Handbook of Political and Social Indicators (Taylor and Jodice, 1983)
• Coup d’État: A Practical Handbook (Luttwak, 1979)
• The Cline Center’s Social, Political and Economic Event Database (SPEED) Project (Nardulli, Althaus and Hayes, 2015)
• Government Change in Authoritarian Regimes – 2010 Update (Svolik and Akcinaroglu, 2006)
Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.2.2 Codebook.pdf - This 18-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2026
2. Coup Data 2.2.2.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1,161 observations. Revised February 2026
3. Source Document v2.2.2.pdf - This 365-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2026
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in Markdown language. Revised February 2026
Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2026. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.2. February 17. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V10
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Michael Martin, and Andrea Gerald. 2026. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.2. February 17. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V10

published: 2026-02-11

Data for The MagPIE2 Dataset: Magnetic Field-Based Mapping, Localization, and SLAM

Hanley, David; Lee, Jongwon; Choi, Su Yeon; Bretl, Timothy (2026)

If you use this dataset, please cite both the dataset and the associated data paper (bibtex is below).
@ARTICLE{11386847,
  author={Hanley, David and Lee, Jongwon and Choi, Su Yeon and Bretl, Timothy},
  journal={IEEE Transactions on Instrumentation and Measurement},
  title={The MagPIE2 Dataset for Mapping, Localization, and Simultaneous Localization and Mapping Using Magnetic Fields},
  year={2026},
  volume={},
  number={},
  pages={1-1},
  keywords={Magnetometers;Magnetic field measurement;Magnetic fields;Pedestrians;Location awareness;Buildings;Simultaneous localization and mapping;Measurement errors;Hardware;Calibration;Localization;mapping;SLAM;dataset;benchmark;magnetometer;magnetic field},
  doi={10.1109/TIM.2026.3662919}}
We present a dataset for the evaluation of magnetic field-based robotic and pedestrian localization, mapping, and SLAM methods. This dataset contains magnetometer and inertial measurement unit data collected from inside three buildings by both a pedestrian and a ground robot. Data were collected at different heights simultaneously, both with and without changes in the placement of objects that may affect magnetometer measurements. In total, approximately 689 square meters of floor space was covered by this dataset. This dataset is archivally stored. We provide a GitHub site which is meant to serve as a forum to post issues with the dataset, share code using the dataset, and to resolve problems: https://github.com/hanley6/MagPIE2Forum Note that while the dataset is meant to be permanently stored, this forum is not meant to guarantee perennial support and its existence will be dependent on the policies of GitHub.
How is the dataset organized? The data is divided into the following parts at a high level and more detailed information can be found in the Readme:
1. The walking portion of the dataset: CSL_WLK.zip, DCL_WLK.zip, Talbot_WLK.zip, and WLK_Misc.zip.
2. The robot portion of the dataset: Robot_Dataset.zip.
3. Motor interference tests: Motor_Interference_Test.zip.
4. Ground truth evaluation: Ground_Truth_Evaluation.zip.
5. Quick start results: Quick_Start_Results.zip.
How is data recorded and stored? Data is generally collected in the form of ROS bag files. Each ROS bag has Intel Realsense camera images, magnetometer readings, IMU readings, timestamps, and more as applicable for each file in the dataset. Each bag file has an associated metadata file written as a YAML file. This contains general information about each bag file including the start and stop time, who collected the bag file (during the pedestrian portion of the dataset), and the approximate location where data was collected. In several cases, additional comma separated (csv) files of the dataset were included either as a convenient supplement to ROS bag files (e.g., csv files of magnetometer calibration data) or because they serve as human readable quick start results.
How does one set up and run files on the dataset? The files are stored in ROS bags and are, therefore, meant to be run using the Robot Operating System. Information regarding how to use the Robot Operating System as well as installation instructions are available at: https://ros.org/

keywords: Localization; mapping; SLAM; dataset; benchmark; magnetometer; magnetic field

published: 2025-12-23

study of liquid suction cup detachment mechanism

Aly, Abdallah; A. Saif, M. Taher (2025)

The uploaded data is part of the paper titled: Self-Modifying Percolation Governs Detachment in Soft Suction Wet Adhesion, which shows the detachment mechanism of liquid suction-based adhesion.

published: 2025-05-07

Data for "Environmental DNA Metabarcoding of Vertebrates from Central Illinois, United States, 2023-2024"

Reves, Olivia; Larson, Eric (2025)

Data collected at 71 study sites from 2023 to 2024 for Reves, Olivia P. (2025): Using Environmental DNA Metabarcoding to Inform Biodiversity Conservation in Agricultural Landscapes. Master's thesis, University of Illinois Urbana-Champaign. Files include study site information, taxa by site matrices for vertebrates from environmental DNA metabarcoding using multiple mitochondrial DNA primers (COI, 12S), and bird species audibly detected by a phone app at study sites.

keywords: agricultural conservation; biodiversity; eDNA; environmental DNA; Illinois; metabarcoding; riparian buffers; stream flow; vertebrates

published: 2025-02-07

Arecibo ISR lag profile data 2016 September Campaign

Wang, Binghui; Kudeki, Erhan (2025)

Incoherent scatter radar datasets collected during the September 2016 campaign at Arecibo have been deposited in this databank. The lag products of the ISR data are stored as lag profile matrices with 5 minutes of integration time. The data is organized in a Python dictionary format, with each file containing 12 lag profile matrices representing one hour of observation. A sample Python script is provided to illustrate its usage.

published: 2026-01-20

Dataset for "CAMUS: Scalable Phylogenetic Network Estimation"

Willson, James; Warnow, Tandy (2026)

Dataset from "CAMUS: Scalable Phylogenetic Network Estimation." This dataset contains simulated phylogenetic networks, gene trees, and sequence data.
- camus-dataset.tar.xz is the main archive containing all the simulated data. More details about the files and directories it contains can be found in README.md.
- scripts.zip contains various scripts used in the simulation study.

keywords: evolution; computational biology; bioinformatics; phylogenetics

published: 2026-01-21

Data for "Examining Organic Acid Production Potential and Growth-Coupled Strategies in Issatchenkia orientalis Using Constraint-Based Modeling"

Suthers, Patrick; Maranas, Costas (2026)

Growth-coupling product formation can facilitate strain stability by aligning industrial objectives with biological fitness. Organic acids make up many building block chemicals that can be produced from sugars obtainable from renewable biomass. Issatchenkia orientalis is a yeast strain tolerant to acidic conditions and is thus a promising host for industrial production of organic acids. Here, we use constraint-based methods to assess the potential of computationally designing growth-coupled production strains for I. orientalis that produce 22 different organic acids under aerobic or microaerobic conditions. We explore native and engineered pathways using glucose or xylose as the carbon substrates as proxy constituents of hydrolyzed biomass. We identified growth-coupled production strategies for 37 of the substrate-product pairs, with 15 pairs achieving production for any growth rate. We systematically assess the strain design solutions and categorize the underlying principles involved.

keywords: Bioproducts; Modeling

published: 2026-01-19

Data for International (Fair) Trade in Air-Quality-Related Mortality

Wang, Shiyuan (2026)

Note: The GTAP dataset includes a total of 140 regions, some of which are aggregated regions. For all map-related supplementary files (S11, S12, S13), we assign values to each individual country to enhance visualization. Countries within the same aggregated region are assigned the same regional value to maintain consistency across the map.
Data S1 (separate file): S1.csv: CSV file detailing production-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of production activities.
Columns: Each column represents a country-sector pair on the production side.
Values: The values indicate the number of deaths caused by production activities in the country-sector listed in each column and occurring in the country listed in each row.
Data S2 (separate file): S2.csv: CSV file detailing production-related deaths for the EORA dataset.
Structure: The file has the same structure as S1.csv.
Data S3 (separate file): S3.csv: CSV file detailing consumption-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of consumption activities.
Columns: Each column represents a consumption country.
Values: The values indicate the number of deaths caused by consumption activities in the country listed in the column and occurring in the country listed in the row.
Data S4 (separate file): S4.csv: CSV file detailing consumption-related deaths for the EORA dataset.
Structure: The file has the same structure as S3.csv.
Data S5 (folder of files): S5.zip: a folder containing 141 CSV files, each named after a country's 3-letter code (e.g., USA.csv, CHN.csv), representing production-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row corresponds to a grid cell.
Columns: Each column represents an industrial sector. The final column, "geometry," contains the spatial coordinates (latitude and longitude) for each grid cell.
Values: Each value indicates the PM₂.₅ concentration level (in µg/m³) attributable to emissions from the specified sector in the given country, as they occur in each grid cell.
Data S6 (folder of files): S6.zip: a folder containing 188 CSV files, each named after a country's 3-letter code, representing production-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: Each file follows the same format as those in S5.zip, with rows representing grid cells and columns representing industrial sectors, plus a "geometry" column containing spatial coordinates.
Data S7 (separate file): S7.csv: CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row represents a grid cell.
Columns: Apart from the last column ("geometry"), which contains spatial information for each grid cell in latitude-longitude coordinates, each column represents a consumption country.
Values: Each value indicates the PM₂.₅ concentration level caused by each country’s consumption process and occurring in each grid cell, measured in µg/m³.
Data S8 (separate file): S8.csv: CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: The file has the same structure as S7.csv.
Data S9 (separate file): S9.csv: CSV file listing the total net bidirectional export of deaths for all countries in GTAP, displaying only positive values.
Columns:
"from": The country that exports more consumption-related deaths.
"to": The country that imports more consumption-related deaths.
"values": The net export of deaths between these two countries, calculated as the difference between the deaths flowing from "from" to "to" and those from "to" to "from."
Data S10 (separate file): S10.csv: CSV file listing the total net bidirectional export of deaths for all countries in EORA, displaying only positive values.
Structure: The file has the same structure as S9.csv.
Data S11 (separate file): S11.csv: CSV file listing the Value of Statistical Lives (VSLs), and consumption-related externalities under three scenarios (Business as Usual (BAU), Global Community (GC), and Fair Trade in Deaths (FTD)), along with externalities per GDP and their differences for GTAP countries.
Columns:
VSL, BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
Data S12 (separate file): S12.csv: Same as S11.csv, but for EORA countries.
Structure: Identical to S11.csv.
Data S13 (separate file): S13.csv: Includes data used to generate Figures 1, 2, 3, and 5 in the main text.
Columns:
country_code: 3-letter country code
GTAP_region, continent, population, GDP, GDP_capita, VSL
export_of_death, import_of_death, net_export, net_export_capita
allforeign_world, G50foreign_world, G100foreign_world
cause_allforeign_world, cause_L30foreign_world, cause_L50foreign_world
BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
geometry (used for visualization)
Data S14 (separate file): S14.xlsx: this Excel file contains six sheets summarizing cross-model Pearson correlation coefficients between sectoral economic activity fractions and transboundary mortality impact metrics, based on both GTAP and EORA datasets.
Sheets: Output_fraction_GTAP, Direct_demand_fraction_GTAP, Final_demand_fraction_GTAP, Output_fraction_EORA, Direct_demand_fraction_EORA, Final_demand_fraction_EORA
Rows: Each row represents an economic sector.
Columns:
G50foreign_world: Fraction of deaths attributable to final demand from regions where demand per capita is more than 50% higher than in the current country.
cause_L50foreign_world: Fraction of deaths caused by consumption within the current country but occurring in countries with more than 50% lower demand per capita.
Values: Each value represents the Pearson correlation between the sectoral fraction and the corresponding transboundary mortality metric.
Data S15 (separate file): S15.csv: CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of production-based premature deaths.
Column Producer: The producing country–sector pair responsible for the emissions leading to health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each producer–affected country pair.
Data S16 (separate file): S16.csv: CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of consumption-based premature deaths.
Column Consumer: The consuming country whose final demand drives the global production and associated health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each consumer–affected country combination.
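The Deaths_median, Deaths_low95, and Deaths_high95 columns in S15.csv and S16.csv are described as the median, 2.5th percentile, and 97.5th percentile across Monte Carlo draws. A minimal sketch of that kind of summary, using Python's standard library, is shown below. The function name `summarize_draws` is hypothetical, and the percentile interpolation rule (linear interpolation between order statistics) is an assumption; the dataset description does not say which rule the author used.

```python
import statistics


def summarize_draws(draws):
    """Summarize Monte Carlo draws as a median and a central 95% interval.

    Assumption: percentiles are linearly interpolated between sorted draws
    (the "inclusive" method of statistics.quantiles).
    """
    # Splitting into 40 equal-probability groups puts the first cut point
    # at the 2.5th percentile and the last at the 97.5th percentile.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return {
        "median": statistics.median(draws),
        "low95": cuts[0],
        "high95": cuts[-1],
    }
```

For example, `summarize_draws(list(range(101)))` yields a median of 50 with interval bounds of 2.5 and 97.5, which is a quick sanity check that the cut points fall where intended.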

published: 2025-09-18

Data from Assessing Precipitation, Evapotranspiration, and NDVI as Controls of U.S. Great Plains Plant Production

Chen, Maosi; Parton, William J.; Hartman, Melannie D.; Del Grosso, Stephen J.; Smith, William K.; Knapp, Alan; Lutz, Susan; Derner, Justin; Tucker, Compton; Ojima, Dennis; Volesky, Jerry; Stephenson, Mitchell B.; Schacht, Walter H.; Gao, Wei (2025)

Productivity throughout the North American Great Plains grasslands is generally considered to be water limited, with the strength of this limitation increasing as precipitation decreases. We hypothesize that cumulative actual evapotranspiration water loss (AET) from April to July is the precipitation‐related variable most correlated to aboveground net primary production (ANPP) in the U.S. Great Plains (GP). We tested this by evaluating the relationship of ANPP to AET, precipitation, and plant transpiration (Tr). We used multi‐year ANPP data from five sites ranging from semiarid grasslands in Colorado and Wyoming to mesic grasslands in Nebraska and Kansas, mean annual NRCS ANPP, and satellite‐derived normalized difference vegetation index (NDVI) data. Results from the five sites showed that cumulative April‐to‐July AET, precipitation, and Tr were well correlated (R²: 0.54–0.70) to annual changes in ANPP for all but the wettest site. AET and Tr were better correlated to annual changes in ANPP compared to precipitation for the drier sites, and precipitation in August and September had little impact on productivity in drier sites. April‐to‐July cumulative precipitation was best correlated (R² = 0.63) with interannual variability in ANPP in the most mesic site, while AET and Tr were poorly correlated with ANPP at this site. Cumulative growing season (May‐to‐September) NDVI (iNDVI) was strongly correlated with annual ANPP at the five sites (R² = 0.90). Using iNDVI as a surrogate for ANPP, we found that county‐level cumulative April–July AET was more strongly correlated to ANPP than precipitation for more than 80% of the GP counties, with precipitation tending to perform better in the eastern more mesic portion of the GP. Including the ratio of AET to potential evapotranspiration (PET) improved the correlation of AET to both iNDVI and mean county‐level NRCS ANPP. Accounting for how different precipitation‐related variables control ANPP (AET in drier portion, precipitation in wetter portion) provides opportunity to develop spatially explicit forecasting of ANPP across the GP for enhancing decision‐making by land managers and use of grassland ANPP for biofuels.

keywords: Sustainability; Field Data; Modeling

published: 2026-01-12

Data for "Metabolic Engineering Strategies to Produce Medium-Chain Oleochemicals via Acyl-ACP:CoA Transacylase Activity"

Yan, Qiang; Cordell, William; Jindra, Michael; Pfleger, Brian (2026)

Microbial lipid metabolism is an attractive route for producing oleochemicals. The predominant strategy centers on heterologous thioesterases to synthesize desired chain-length fatty acids. To convert acids to oleochemicals (e.g., fatty alcohols, ketones), the narrowed fatty acid pool needs to be reactivated as coenzyme A thioesters at the cost of one ATP per reactivation, an expense that could be saved if the acyl-chain was directly transferred from ACP- to CoA-thioester. Here, we demonstrate such an alternative acyl-transferase strategy by heterologous expression of PhaG, an enzyme first identified in Pseudomonads, that transfers 3-hydroxy acyl-chains between acyl-carrier protein and coenzyme A thioester forms for creating polyhydroxyalkanoate monomers. We use it to create a pool of acyl-CoAs that can be redirected to oleochemical products. Through bioprospecting, mutagenesis, and metabolic engineering, we develop three strains of Escherichia coli capable of producing over 1 g/L of medium-chain free fatty acids, fatty alcohols, and methyl ketones.

keywords: Bioproducts; Metabolomics

published: 2025-10-22

Data for Evaluation of 1,2-Diacyl-3-Acetyl Triacylglycerol Production in Yarrowia lipolytica

Yan, Qiang; Jacobson, Tyler B.; Ye, Zhou; Cortes-Peña, Yoel R.; Bhagwat, Sarang; Hubbard, Susan; Cordell, William T.; Oleniczak, Rebecca E.; Gambacorta, Francesca V.; Rivera-Vasquez, Julio; Shusta, Eric V.; Amador-Noguez, Daniel; Guest, Jeremy; Pfleger, Brian (2025)

Plants produce many high-value oleochemical molecules. While oil-crop agriculture is performed at industrial scales, suitable land is not available to meet global oleochemical demand. Worse, establishing new oil-crop farms often comes with the environmental cost of tropical deforestation. The field of metabolic engineering offers tools to transplant oleochemical metabolism into tractable hosts while simultaneously providing access to molecules produced by non-agricultural plants. Here, we evaluate strategies for rewiring metabolism in the oleaginous yeast Yarrowia lipolytica to synthesize a foreign lipid, 3-acetyl-1,2-diacyl-sn-glycerol (acTAG). Oils made up of acTAG have a reduced viscosity and melting point relative to traditional triacylglycerol oils making them attractive as low-grade diesels, lubricants, and emulsifiers. This manuscript describes a metabolic engineering study that established acTAG production at g/L scale, exploration of the impact of lipid bodies on acTAG titer, and a techno-economic analysis that establishes the performance benchmarks required for microbial acTAG production to be economically feasible.

keywords: Conversion;Sustainability;Biomass Analytics;Lipidomics;Metabolomics

published: 2025-11-03

Data for Tolerance of Engineered Rhodosporidium toruloides to Sorghum Hydrolysates During Batch and Fed-Batch Lipid Production

Woodruff, William; Deshavath, Narendra Naik; Susanto, Vionna; Rao, Christopher V.; Singh, Vijay (2025)

Oleaginous yeasts are a promising candidate for the sustainable conversion of lignocellulosic feedstocks into fuels and chemicals, but their growth on these substrates can be inhibited as a result of upstream pretreatment and enzymatic hydrolysis conditions. Previous studies indicate a high citrate buffer concentration during hydrolysis inhibits downstream cell growth and ethanol fermentation in Saccharomyces cerevisiae. In this study, an engineered Rhodosporidium toruloides strain with enhanced lipid accumulation was grown on sorghum hydrolysate with high and low citrate buffer concentrations. Both hydrolysis conditions resulted in similar sugar recovery rates and concentrations. No significant differences in cell growth, sugar utilization rates, or lipid production rates were observed between the two citrate buffer conditions during batch fermentation of R. toruloides. Under fed-batch growth on low-citrate hydrolysate, a lipid titer of 16.7 g/L was obtained. Citrate buffer was not found to inhibit growth or lipid production in this engineered R. toruloides strain, nor did reducing the citrate buffer concentration negatively affect sugar yields in the hydrolysate. As this process is scaled up, $131 per ton of hydrothermally pretreated biomass can be saved by use of the lower citrate buffer concentration during enzymatic hydrolysis.

keywords: Conversion;Hydrolysate;Lipidomics

published: 2025-10-15

Notothenia coriiceps and Paranotothenia angustata genome assemblies

York, Julia M.; Bhat, Shriram; Kim, Jinmu; Cardenas, Leyla; Cheng, Chi-Hing Christina (2025)

This repository contains supplementary information, alternate genome assemblies, annotation, and predicted protein datasets for Notothenia coriiceps and Paranotothenia angustata genome assemblies. Primary assemblies, mitochondrial assemblies, RNA-Seq data, and raw read data can be found under NCBI Bioproject PRJNA1310647.

keywords: notothenioid; Antarctic; fish; genome; DNA

published: 2025-10-16

Data for Optimizing Chemical-Free Pretreatment for Maximizing Oil/Lipid Recovery from Transgenic Bioenergy Crops and its Rapid Analysis Using Time Domain-NMR

Maitra, Shraddha; Long, Stephen P.; Singh, Vijay (2025)

Transgenic bioenergy crops have shown the potential to produce vegetative oil by accumulating energy-rich triacylglyceride molecules that can be converted into biofuels (biodiesel and biojet). These transgenic crops cater to improved biofuel yield by providing lipids along with cellulosic sugars. Efficient bioprocessing technologies are needed to utilize these transgenic plants to their maximum potential. To this end, this study investigates a low- and high-severity chemical-free hydrothermal pretreatment of transgenic oilcane 1566 bagasse with in situ lipids to maximize the recovery of lipids for biodiesel and fermentable sugars for ethanol with minimal inhibitor generation. Hydrothermal pretreatment at 170°C recovered ∼25% of total lipids in the pretreatment liquor, leaving the remainder in bagasse residue for hexane recovery post fermentation. The recovery of lipids in pretreatment liquor remained constant beyond 170°C. Along with lipids, ∼35% w/w and ∼50% w/w fermentable sugars were recovered post saccharification from bagasse pretreated at 170°C and 210°C for 20 min, respectively. Hydrothermal pretreatment at 170°C for 20 min provided the optimum conditions for maximum recovery of lipids and cellulosic sugars that resulted in enhanced biofuel yield per unit biomass. High severity pretreatment increased the generation of inhibitors beyond the tolerance of fermentation microorganisms. In addition, the application of time-domain proton NMR spectroscopy was extended to bioprocessing. NMR technology facilitated the analysis of total lipids, the composition of fatty acids, and the characterization of free and bound lipids in untreated and pretreated oilcane 1566 bagasse subsequent to each step of biomass to biofuel conversion.

keywords: Conversion;Feedstock Bioprocessing

published: 2025-11-03

Data for Pilot-Scale Processing of Miscanthus x giganteus for Recovery of Anthocyanins Integrated with Production of Microbial Lipids and Lignin-Rich Residue

Banerjee, Shivali; Dien, Bruce; Eilts, Kristen; Sacks, Erik; Singh, Vijay (2025)

Chemical-free hydrothermal pretreatment of Miscanthus x giganteus (Mxg) at the lab scale using high liquid-to-solid ratios resulted in the recovery of anthocyanins and enhanced enzymatic digestibility of residual biomass. In this study, the process is scaled up by using a continuous hydrothermal pretreatment reactor operated at a low liquid-to-solid ratio (50 % w/w solids) as an important step towards commercialization. Anthocyanin yield was 70 % w/w at the pilot scale (50 kg of Mxg), compared to the 94 % w/w yield achieved at the lab scale (0.5 g of Mxg). The pretreated biomass was subsequently refined mechanically using a disc mill to increase the accessibility of cellulose by cellulases. Enzymatic saccharification of the pretreated and disc-milled residue yielded 238 g/L sugar concentration by operating in fed-batch mode at 50 % w/v solids content. Two strains of Rhodosporidium toruloides were evaluated for converting the hydrolysate sugars into microbial lipids, and strain Y-6987 had the highest lipid titer (11.0 g/L). Further, the residue left after enzymatic saccharification was determined to be enriched 1.7-fold in the lignin content. This lignin-rich residue has value as a feedstock for the production of sustainable aviation fuel precursors and other high-value lignin-based chemicals. Hence the proposed biorefinery based on Mxg creates an opportunity for generating revenue from multiple high-value products. As the demand for biofuels and biobased products is rising, the biorefinery products from Mxg would create a niche in the industrial sector.

keywords: Conversion;Feedstock Production;Feedstock Bioprocessing;Hydrolysate;Lipidomics

published: 2025-11-12

Data for Spatially Varying Costs of GHG Abatement with Alternative Cellulosic Feedstocks for Sustainable Aviation Fuels

Fan, Xinxin; Khanna, Madhu; Lee, Yuanyao; Kent, Jeffrey; Shi, Rui; Guest, Jeremy; Lee, DoKyoung (2025)

Cellulosic biomass-based sustainable aviation fuels (SAFs) can be produced from various feedstocks. The breakeven price and carbon intensity of these feedstock-to-SAF pathways are likely to differ across feedstocks and across spatial locations due to differences in feedstock attributes, productivity, opportunity costs of land for feedstock production, soil carbon effects, and feedstock composition. We integrate feedstock to fuel supply chain economics and life-cycle carbon accounting using the same system boundary to quantify and compare the spatially varying greenhouse gas (GHG) intensities and costs of GHG abatement with SAFs derived from four feedstocks (switchgrass, miscanthus, energy sorghum, and corn stover) at 4 km resolution across the U.S. rainfed region. We show that the optimal feedstock for each location differs depending on whether the incentive is to lower breakeven price, carbon intensity, or cost of carbon abatement with biomass or to have high biomass production per unit land. The cost of abating GHG emissions with SAF ranges from $181 Mg−1 CO2e to more than $444 Mg−1 CO2e and is lowest with miscanthus in the Midwest, switchgrass in the south, and energy sorghum in a relatively small region in the Great Plains. While corn stover-based SAF has the lowest breakeven price per gallon, it has the highest cost of abatement due to its relatively high GHG intensity. Our findings imply that different types of policies, such as volumetric targets, tax credits, and low carbon fuel standards, will differ in the mix of feedstocks they incentivize and locations where they are produced in the U.S. rainfed region. Note: Column V in TableS7_DayCentSimulatedYield.csv should be labelled Corn Stover CoSo-NT-50% Max.

keywords: Sustainability;Geospatial;Modeling

published: 2025-09-30

Data from Reactive Species and Reaction Pathways for the Oxidative Cleavage of 4-Octene and Oleic Acid with H2O2 over Tungsten Oxide Catalysts

Yun, Danim; Ayla, E. Zeynep; Bregante, Daniel T.; Flaherty, David W. (2025)

Oxidative cleavage of carbon–carbon double bonds (C═C) in alkenes and fatty acids produces aldehydes and acids valued as chemical intermediates. Solid tungsten oxide catalysts are low cost, nontoxic, and selective for the oxidative cleavage of C═C bonds with hydrogen peroxide (H2O2) and are, therefore, a promising option for continuous processes. Despite the relevance of these materials, the elementary steps involved and their sensitivity to the form of W sites present on surfaces have not been described. Here, we combine in situ spectroscopy and rate measurements to identify significant steps in the reaction and the reactive species present on the catalysts and examine differences between the kinetics of this reaction on isolated W atoms grafted to alumina and on those exposed on crystalline WO3 nanoparticles. Raman spectroscopy shows that W–peroxo complexes (W–(η2-O2)) formed from H2O2 react with alkenes in a kinetically relevant step to produce epoxides, which undergo hydrolysis at protic surface sites. Subsequently, the CH3CN solvent deprotonates diols to form alpha-hydroxy ketones that react to form aldehydes and water following nucleophilic attack of H2O2. Turnover rates for oxidative cleavage, determined by in situ site titrations, on WOx–Al2O3 are 75% greater than those on WO3 at standard conditions. These differences reflect the activation enthalpies (ΔH‡) for the oxidative cleavage of 4-octene that are much lower for the isolated WOx sites (36 ± 3 and 60 ± 6 kJ·mol–1 for WOx–Al2O3 and WO3, respectively) and correlate strongly with the difference between the enthalpies of adsorption for epoxyoctane (ΔHads,epox), which resembles the transition state for epoxidation. The WOx–Al2O3 catalysts mediate oxidative cleavage of oleic acid with H2O2 following a mechanism comparable to that for the oxidative cleavage of 4-octene. The WO3 materials, however, form only the epoxide and do not cleave the C–C bond or produce aldehydes and acids. These differences reflect the distinct site requirements for these reaction pathways and indicate that acid sites required for diol formation are strongly inhibited by oleic acids and epoxides on WO3, whereas the Al2O3 support provides sites competent for this reaction and increases the yield of the oxidative cleavage products.

keywords: Catalysis;Conversion

published: 2025-11-03

Data for Catalytic Strategy for Conversion of Triacetic Acid Lactone to Potassium Sorbate

Kim, Min Soo; Choi, Dasol; Ha, Jihyo; Choi, Kyuhyeok; Yu, Jae-Hyuk; Dumesic, James; Huber, George (2025)

This study shows a new route to produce potassium sorbate (KS) from triacetic acid lactone (TAL), which is a chemical platform that can be biologically synthesized from natural sources. Sorbic acid and its potassium salt (KS) are widely used as preservatives in foods and pharmaceuticals. Three steps are used to produce KS from TAL: 1) hydrogenation of TAL into 4-hydroxy-6-methyltetrahydro-2-pyrone (HMP), 2) dehydration of HMP to parasorbic acid (PSA), and 3) ring-opening and hydrolysis of PSA to KS. TAL can be fully hydrogenated over Ni/SiO2 to give near quantitative yields of HMP. A three-step reaction kinetics model was developed for dehydration of HMP into PSA. This model was used to show that the highest PSA yield occurs at low temperatures. An experimental PSA yield of 84.2% with respect to TAL was obtained, which agreed with the prediction of the reaction kinetics model. KOH was used as a coreactant for the ring-opening hydrolysis of PSA to produce >99.9% yield of KS from PSA. Tetrahydrofuran (THF) was used to purify the TAL-derived KS (TAL-KS). The TAL-KS had a KS purity of 95.5%. The overall yield of TAL-KS with respect to TAL was calculated to be 77.3%. TAL-KS produced in this study had antimicrobial activities similar to those of commercial KS.

keywords: Conversion;Catalysis;Modeling

published: 2025-11-12

Data for Carbon-negative Hydrogen: Aqueous Phase Reforming (APR) of Glycerol over NiPt Bimetallic Catalyst Coupled with CO2 Sequestration

Santiago-Martinez, Leoncio; Li, Mengting; Munoz-Briones, Paola; Vergara Zambrano, Javiera; Avraamidou, Styliani; Dumesic, James; Huber, George (2025)

Herein we report the production of high-pressure (19.3 bar), carbon-negative hydrogen (H2) from glycerol with a purity of 98.2 mol% H2, 1.8 mol% light hydrocarbons (mainly methane), and 400 ppm of CO. Aqueous phase reforming (APR) of 10 wt% glycerol solution was studied with a series of NiPt bimetallic catalysts supported on alumina. The Ni8Pt1-450 catalyst had the highest hydrogen selectivity (95.6%) and the lowest alkanes selectivity (3.7%) of the tested catalysts. The hydrogen selectivity decreased in the order of Ni8Pt1-450 > Ni8Pt1-260 > Ni1Pt1-260 > Pt-260. The CO2 was sequestered with CaO adsorbent which formed CaCO3. We measured the adsorption capacity of the CaO adsorbent at different temperatures. Life cycle analysis showed that the APR of glycerol coupled with CO2 capture has net negative CO2 equivalent greenhouse gas emissions. The CO2 emissions are −9.9 kg CO2 eq./kg H2 and −50.1 kg CO2 eq./kg H2 when grid electricity and renewable electricity are used, respectively, and the CO2 is allocated respectively to the mass of products produced. The cost of this H2 (denoted as “green-emerald”) was estimated to be 2.4 USD per kg H2 when grid electricity is used and 2.7 USD per kg H2 when using renewable electricity. The cost of glycerol has the highest contribution of 1.71 USD per kg H2. Participation in the carbon credit markets can further decrease the price of the produced H2.

keywords: Conversion;Catalysis

published: 2025-09-08

Dataset on tracking the data quality landscape of retracted papers: Flag usage in titles and changes in DOI retraction status

Si, Luyang; Salami, Malik Oyewale; Schneider, Jodi (2025)

This work evaluates the consistency and reliability of the title flag, i.e., retraction labeling that appears in the title of retracted publications, using 925 sampled retracted publications indexed in Crossref only (Lee & Schneider, 2023) that are indexed in three other sources (Retraction Watch, Scopus, and Web of Science) as of April 2023. We presume the retraction status of an item based on its title flag. For example, the flag "removal notice" is a retraction notice, and "retracted article" is a retracted paper. We compared the item's likely retraction status from the flag with the item's actual retraction status from the publisher's website.

keywords: Crossref; Data Quality; Title flag; Retraction flag; Retraction flag assessment; Retraction labeling; Retraction indexing; Retracted papers; Retraction notices; Retraction status; RISRS