Illinois Data Bank Dataset Search Results
published:
2026-02-13
Frederick, Samuel; Mohebalhojeh, Matin; Curtis, Jeffrey; West, Matthew; Riemer, Nicole
(2026)
This dataset contains the data files necessary to replicate the figures from "Idealized Particle-Resolved Large-Eddy Simulations to Evaluate the Impact of Emissions Spatial Heterogeneity on CCN Activity," submitted to Atmospheric Chemistry and Physics.
Within the compressed folder data.zip are two subdirectories, "processed_data" and "spatial-het". The "processed_data" directory contains netCDF files holding the subset of simulation output used in figure generation. The "spatial-het" subdirectory contains a .csv file with spatial heterogeneity values computed using an exact algorithm for the spatial heterogeneity metric described by Mohebalhojeh et al. (2025). The subdirectory "sh-patterns" contains one .csv file for each emissions scenario. Each entry corresponds to a single grid cell of a 100 × 100 domain (the lateral resolution of the computational domain employed in this paper).
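A minimal sketch for reading one "sh-patterns" file into a 100 × 100 grid in Python follows. It assumes the CSV contains only numeric values and no header row. The values may be laid out either as 100 rows of 100 values or as one value per line in row-major order. The function name `parse_sh_pattern` is hypothetical and is not part of the helper modules shipped in scripts.zip.

```python
import csv

GRID_SIZE = 100  # lateral grid is 100 x 100 cells, per the dataset description


def parse_sh_pattern(lines, grid_size=GRID_SIZE):
    """Parse one emissions-scenario CSV (any iterable of text lines) into a grid.

    Hypothetical layout: numeric values only, no header, either grid_size rows of
    grid_size values or one value per line (row-major). Returns grid_size rows.
    """
    values = [float(cell) for row in csv.reader(lines) for cell in row if cell.strip()]
    expected = grid_size * grid_size
    if len(values) != expected:
        raise ValueError(f"expected {expected} grid cells, found {len(values)}")
    # Row-major reshape: cell (i, j) is values[i * grid_size + j]
    return [values[i * grid_size:(i + 1) * grid_size] for i in range(grid_size)]


# Usage (the path is a placeholder):
# with open("sh-patterns/<scenario>.csv", newline="") as f:
#     grid = parse_sh_pattern(f)
```

If the real files include a header or an index column, those would need to be skipped before parsing.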
Within scripts.zip are Python notebooks for generating figures, along with additional Python modules containing helper functions for the notebooks. A Fortran version of the spatial heterogeneity metric is also included, together with shell scripts that create a Python environment in which the code can be compiled and converted into a Python module. Note that the create_env.sh and compile_nsh.sh scripts must be run before executing notebook cells that use the spatial heterogeneity subroutines.
<b>*Note*:</b> New in V3: During review, a bug affecting the vertical diffusion of particles was discovered in WRF-PartMC, which necessitated re-running the simulations. We present new simulations with the diffusion bug fixed. In response to reviewer comments, we also ran additional simulations. In one set, emissions were turned off at t = 4 h to investigate reversible partitioning. In another, the relative humidity (RH) was raised to near saturation throughout the domain to model the effects of co-condensation. The README PDF has been updated to reflect changes to the dataset collection. We have also added a shell script in scripts_v3.zip that was used to process simulation output and create the data subsets contained in data_v3.zip. Lastly, the notebooks were re-run with the updated datasets to create the manuscript figures, and plotting routines were added for the new figures pertaining to the requested simulations.
keywords:
Atmospheric chemistry; aerosols; Particle-resolved modeling; spatial heterogeneity
published:
2026-02-11
Kim, Hyunhwa; Purba, Denissa Sari Darmawi; Kontou, Eleftheria
(2026)
The dataset and code enable replication of the case study in Section 6 titled "California wildfire energy supply logistics" of the Transportation Research Part E: Logistics and Transportation Review published paper "Bidirectional Energy Supply Logistics Using Uncrewed Electric Aerial and Ground Vehicles: A Two-Echelon Location-Routing Problem with Resource-Constrained Demand Allocation and Time Windows."
keywords:
electric vehicle; energy supply logistics; location-routing problem; bidirectional energy; uncrewed aerial vehicle
published:
2026-02-11
Hanley, David; Lee, Jongwon; Choi, Su Yeon; Bretl, Timothy
(2026)
If you use this dataset, please cite both the dataset and the associated data paper (bibtex is below).
@ARTICLE{11386847,
author={Hanley, David and Lee, Jongwon and Choi, Su Yeon and Bretl, Timothy},
journal={IEEE Transactions on Instrumentation and Measurement},
title={The MagPIE2 Dataset for Mapping, Localization, and Simultaneous Localization and Mapping Using Magnetic Fields},
year={2026},
volume={},
number={},
pages={1-1},
keywords={Magnetometers;Magnetic field measurement;Magnetic fields;Pedestrians;Location awareness;Buildings;Simultaneous localization and mapping;Measurement errors;Hardware;Calibration;Localization;mapping;SLAM;dataset;benchmark;magnetometer;magnetic field},
doi={10.1109/TIM.2026.3662919}}
We present a dataset for the evaluation of magnetic field-based robotic and pedestrian localization, mapping, and SLAM methods. This dataset contains magnetometer and inertial measurement unit data collected inside three buildings by both a pedestrian and a ground robot. Data were collected at different heights simultaneously, both with and without changes in the placement of objects that may affect magnetometer measurements. In total, the dataset covers approximately 689 square meters of floor space.
This dataset is archivally stored. We provide a GitHub site which is meant to serve as a forum to post issues with the dataset, share code using the dataset, and to resolve problems: <a href="https://github.com/hanley6/MagPIE2Forum">https://github.com/hanley6/MagPIE2Forum</a>
Note that while the dataset is meant to be permanently stored, this forum is not meant to guarantee perennial support and its existence will be dependent on the policies of GitHub.
<b>How is the dataset organized?</b> At a high level, the data is divided into the following parts; more detailed information can be found in the Readme:
1. The walking portion of the dataset: CSL_WLK.zip, DCL_WLK.zip, Talbot_WLK.zip, and WLK_Misc.zip.
2. The robot portion of the dataset: Robot_Dataset.zip.
3. Motor interference tests: Motor_Interference_Test.zip.
4. Ground truth evaluation: Ground_Truth_Evaluation.zip.
5. Quick start results: Quick_Start_Results.zip.
<b>How is data recorded and stored?</b> Data is generally collected in the form of ROS bag files. Each ROS bag contains Intel RealSense camera images, magnetometer readings, IMU readings, timestamps, and other data as applicable for each file in the dataset. Each bag file has an associated metadata file in YAML format. This file contains general information about the bag file, including the start and stop time, who collected it (for the pedestrian portion of the dataset), and the approximate location where the data was collected. In several cases, additional comma-separated (csv) files were included, either as a convenient supplement to the ROS bag files (e.g., csv files of magnetometer calibration data) or because they serve as human-readable quick start results.
<b>How does one set up and run files on the dataset?</b> The files are stored in ROS bags and are, therefore, meant to be run using the Robot Operating System. Information regarding how to use the Robot Operating System as well as installation instructions are available at: <a href="https://ros.org/">https://ros.org/</a>
keywords:
Localization; mapping; SLAM; dataset; benchmark; magnetometer; magnetic field
published:
2026-02-09
Park, Minhyuk; Chacko, George
(2026)
This dataset consists of a directed network in edge list format, where nodes correspond to articles in the scientific literature and edges represent citations. The network was constructed by seed set expansion (two rounds of citing and cited papers) of the article (seed node) reporting the discovery of PI 3-Kinase activity: "Malcolm Whitman, C Peter Downes, Marilyn Keeler, Tracy Keller, and Lewis Cantley. (1988) Type I phosphatidylinositol kinase makes a novel inositol phospholipid, phosphatidylinositol-3-phosphate. Nature, 332(6165):644–646." The edge list comprises 17,970,340 nodes and 127,255,020 edges.
The dataset was obtained from the Dimensions database via a two-level expansion of the seed node (article). The first expansion included four groups of nodes: the seed node; all publications cited by the seed node; all publications citing the seed node; and all publications cited by publications citing the seed node. The second expansion included all nodes that either cited or were cited by a node in the first expansion set.
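The two-level expansion described above can be sketched on a small in-memory citation graph. This is an illustration under stated assumptions, not the authors' retrieval code. The graph here is a hypothetical dict mapping each paper to the set of papers it cites, whereas the actual expansion was run against the Dimensions database.

```python
from collections import defaultdict


def two_level_expansion(references, seed):
    """Sketch of the two-round seed set expansion.

    `references` (hypothetical input) maps each paper to the set of papers it cites.
    Returns (first_expansion, second_expansion) as sets of node identifiers.
    """
    # Reverse index: paper -> set of papers that cite it
    citers = defaultdict(set)
    for paper, cited in references.items():
        for c in cited:
            citers[c].add(paper)

    # First expansion: the seed, papers it cites, papers citing it,
    # and papers cited by the papers citing it
    first = {seed} | references.get(seed, set()) | citers[seed]
    for citing_paper in citers[seed]:
        first |= references.get(citing_paper, set())

    # Second expansion: every node citing or cited by a first-expansion node
    second = set(first)
    for node in first:
        second |= references.get(node, set()) | citers[node]
    return first, second
```

In this toy version, unrelated components of the graph never enter either set, which mirrors the way the expansion bounds the retrieved network around the seed article.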
Node IDs were converted from the proprietary Dimensions identifiers to a zero-based sequence of integer IDs in the range [0, n-1], where n is the number of nodes. Access to the original identifiers requires a license from Digital Science.
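The zero-based relabeling can be illustrated with a short sketch. It assigns integers in first-seen order, which is an assumption because the ordering actually used is not stated. The string identifiers in the usage line are hypothetical placeholders, not real Dimensions IDs.

```python
def remap_to_zero_based(edges):
    """Map arbitrary node identifiers to consecutive integers 0..n-1.

    `edges` is an iterable of (citing, cited) identifier pairs. Returns the
    remapped edge list and the identifier-to-integer mapping.
    """
    mapping = {}
    remapped = []
    for src, dst in edges:
        for node in (src, dst):
            if node not in mapping:
                mapping[node] = len(mapping)  # next unused integer ID
        remapped.append((mapping[src], mapping[dst]))
    return remapped, mapping


# Usage with placeholder identifiers:
# edges, mapping = remap_to_zero_based([("pub.a", "pub.b"), ("pub.c", "pub.a")])
```

Because only the integer edge list is published, the inverse mapping is not recoverable from the dataset alone.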
published:
2025-12-23
Aly, Abdallah; A. Saif, M. Taher
(2025)
The uploaded data is part of the paper titled: Self-Modifying Percolation Governs Detachment in Soft Suction Wet Adhesion, which shows the detachment mechanism of liquid suction-based adhesion.
published:
2026-01-28
Nahid, Shahriar Muhammad; Dong, Haiyue; Nolan, Gillian; Nam, Sungwoo; Mason, Nadya; Huang, Pinshane; van der Zande, Arend
(2026)
Room-temperature transfer curves; Benchmarking conductance; STEM images of charged domain walls; Temperature-dependent transfer curves; Scaling of conductance, hopping length, threshold voltage, trap density, and field-effect mobility with temperature; Magnetotransport data; Optical, AFM, and PFM image of different field-effect transistors; STEM images of contacts; Output and transfer curves of FETs; Additional STEM images of charged domain walls; Temperature scaling of subthreshold swing and threshold voltage difference; Comparison of maximum field-effect mobility for different structures
published:
2025-10-29
Chen, Chu-Chun; Dominguez, Francina; Matus, Sean
(2025)
This dataset contains variables from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5; Hersbach et al., 2020). These data were used for the analysis in “The impact of large-scale land surface conditions on the South American low-level jet” published in Geophysical Research Letters.
Acknowledgments:
This work was supported by NSF Award AGS-1852709. We thank Dr. Zhuo Wang and Dr. Divyansh Chug for their valuable feedback and insightful discussions.
References:
Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc. 2020; 146: 1999–2049. https://doi.org/10.1002/qj.3803
keywords:
atmospheric sciences; South American low-level jet; land-atmosphere interactions; soil moisture; regional atmospheric circulation; southeastern South America
published:
2026-01-14
Bansal, Prateek; Shukla, Diwakar
(2026)
This dataset contains the .npy and .pkl files required to reproduce the plots in the study.
keywords:
GPCR; activation; STE2; Class D; molecular dynamics
published:
2026-02-01
Xu, Xiaotian; Yao, Yu; Liu, Yicen; Curtis, Jeffrey; West, Matthew; Riemer, Nicole
(2026)
This dataset contains simulation results from PartMC-MOSAIC and WRF-PartMC that were used in the journal article: Quantifying the Impact of Surfactants on Cloud Condensation Nuclei Activity Using a Particle-Resolved Model. Two compressed folders are uploaded here: one contains the data used in the article, and the other contains the Python scripts used to process the data. For more details on the uploaded files, please see the README file.
keywords:
Surfactants; CCN; Effective surface tension
published:
2026-01-27
Trivellone, Valeria; Canuto, Francesca; Lucetti, Giulia; Dietrich, Christopher H.; Galetto, Luciana; Marzachì, Cristina
(2026)
Trivellone_etal_Full_PaperList_SystRev.xlsx: This dataset contains the list of peer-reviewed studies selected and critically appraised for a systematic review of quantitative PCR (qPCR) investigations tracking phytoplasma load dynamics in insect vectors. The dataset includes bibliographic information and selection status for each study, reflecting the inclusion and exclusion criteria applied during the review process. The literature search was completed on December 15, 2025. The inclusion and exclusion criteria are listed in the second spreadsheet.
Further methodological details, including search strategy, screening workflow, and appraisal criteria, are described in the associated paper, “Tracking the early spatio-temporal dynamics of phytoplasma multiplication within its leafhopper vector”, as well as in the Supplementary Materials (see below), by Valeria Trivellone, Francesca Canuto, Giulia Lucetti, Christopher H. Dietrich, Luciana Galetto, Cristina Marzachì.
keywords:
qPCR; systematic review; phytoplasma; multiplication; vector
published:
2025-05-07
Reves, Olivia; Larson, Eric
(2025)
Data collected at 71 study sites from 2023 to 2024 for Reves, Olivia P. (2025): Using Environmental DNA Metabarcoding to Inform Biodiversity Conservation in Agricultural Landscapes. Master's thesis, University of Illinois Urbana-Champaign. Files include study site information, taxa by site matrices for vertebrates from environmental DNA metabarcoding using multiple mitochondrial DNA primers (COI, 12S), and bird species audibly detected by a phone app at study sites.
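As one example of using a taxa-by-site matrix, per-site taxon richness can be counted as the number of taxa with a positive detection value. The sketch below assumes a hypothetical layout with one row per site, a leading site-ID column, taxa as the remaining columns, and numeric detection values. If the files are oriented with taxa as rows, the matrix would need to be transposed first.

```python
import csv


def richness_by_site(lines):
    """Count taxa detected (value > 0) at each site in a taxa-by-site matrix.

    Hypothetical layout: header row "site,<taxon 1>,<taxon 2>,...", then one row
    per site with numeric detection values (blank cells are ignored).
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row of taxon names
    richness = {}
    for row in reader:
        site, values = row[0], row[1:]
        richness[site] = sum(1 for v in values if v.strip() and float(v) > 0)
    return richness


# Usage (the path is a placeholder):
# with open("<taxa_by_site_matrix>.csv", newline="") as f:
#     print(richness_by_site(f))
```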
keywords:
agricultural conservation; biodiversity; eDNA; environmental DNA; Illinois; metabarcoding; riparian buffers; stream flow; vertebrates
published:
2016-05-19
Donovan, Brian; Work, Dan
(2016)
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission.
The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
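Each trip records pickup and drop-off coordinates alongside the taximeter's metered distance. One quick sanity check is therefore to compare the metered distance with the straight-line (great-circle) distance. The sketch below uses hypothetical field names (`pickup_lat`, `pickup_lon`, `dropoff_lat`, `dropoff_lon`, `trip_distance`) and assumes metered distance is in miles. Check the actual column names and units in the files produced by `decompress.py`.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def detour_ratio(trip):
    """Metered distance divided by straight-line distance for one trip record.

    Field names are hypothetical placeholders. Returns None when pickup and
    drop-off coincide, since the ratio is undefined there.
    """
    straight = haversine_miles(trip["pickup_lat"], trip["pickup_lon"],
                               trip["dropoff_lat"], trip["dropoff_lon"])
    if straight == 0:
        return None
    return trip["trip_distance"] / straight
```

Ratios below 1 would indicate a metered distance shorter than the straight-line distance, which usually points to a GPS or meter recording error.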
keywords:
taxi;transportation;New York City;GPS
published:
2025-02-07
Wang, Binghui; Kudeki, Erhan
(2025)
Incoherent scatter radar datasets collected during the September 2016 campaign at Arecibo have been deposited in this databank. The lag products of the ISR data are stored as lag profile matrices with 5 minutes of integration time. The data is organized in a Python dictionary format, with each file containing 12 lag profile matrices representing one hour of observation. A sample Python script is provided to illustrate its usage.
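Each file holds 12 five-minute lag profile matrices, and 12 × 5 min = 60 min. A one-hour integration can therefore be sketched as the element-wise mean of the 12 matrices. The sketch assumes the dictionary has already been loaded, for example as in the provided sample script, and that its values are equally shaped matrices given as lists of rows. The key names, the dimension order (for example, range gate by lag), and the element type are assumptions. Complex-valued lag products also work, because `sum` and division apply to complex numbers.

```python
def hourly_average(lag_profiles):
    """Element-wise mean of the 5-minute lag profile matrices in one hourly file.

    `lag_profiles` is assumed to be a dict whose 12 values are equally shaped
    matrices (lists of rows); key names are not used.
    """
    matrices = list(lag_profiles.values())
    if len(matrices) != 12:
        raise ValueError(f"expected 12 five-minute matrices, got {len(matrices)}")
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n_cols)]
            for i in range(n_rows)]
```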
published:
2025-12-18
Marshalla, Dan; Fraterrigo, Jennifer
(2025)
This dataset includes data from a study conducted in southern Illinois, USA, which was published in the Journal of Applied Ecology. The study investigated the interactive effects of fire history and invasion by the non-native grass Microstegium vimineum on fire intensity and oak regeneration in central hardwood forests. The dataset includes data on environmental conditions, historical fire occurrence, experimental fire intensity and fuel load, seedling and juvenile oak characteristics, Microstegium cover, and plot descriptions.
keywords:
Fire-grass-tree interactions; Historical fire regime; Invasive grasses; Microstegium vimineum; Post-fire oak survival; Prescribed fire
published:
2025-05-14
1,228 hyperspectral images of eggs, covering wavelengths from 400 nm to 900 nm.
published:
2026-01-22
Edmonds, Devin; Du, Jane; Stickley, Samuel; Sucre, Samuel
(2026)
This dataset contains data and R scripts used to analyze the trade of non-native pet amphibians in the United States by integrating online classified advertisements with U.S. Fish and Wildlife Service import records. The data include records of amphibian advertisements, U.S. imports, taxonomic reference lists, and conservation status information. The dataset supports analyses identifying domestically produced species, species entering U.S. markets through unrecorded or unofficial trade pathways, and price differences associated with documented and undocumented trade. The dataset supports the analyses presented in an associated peer-reviewed publication in Biological Conservation.
keywords:
amphibian; biocommerce; biosecurity; conservation; LEMIS; pet trade; species laundering; wildlife trade
published:
2026-01-23
Kaman, Bobby; Lim, Jinho; Liu, Yingkai; Hoffmann, Axel
(2026)
Data related to the forthcoming publication "Emulating 2D Materials with magnons," which is also available as a preprint on arXiv: https://arxiv.org/abs/2601.03210.
It contains scripts for the simulation program Mumax3, and python scripts for conversion and analysis.
keywords:
micromagnetics; mumax; tight-binding; spin waves; magnons
published:
2026-01-20
Willson, James; Warnow, Tandy
(2026)
Dataset from "CAMUS: Scalable Phylogenetic Network Estimation." This dataset contains simulated phylogenetic networks, gene trees, and sequence data.
- camus-dataset.tar.xz is the main archive containing all the simulated data. More details about the files and directories it contains can be found in README.md
- scripts.zip contains various scripts used in the simulation study.
keywords:
evolution; computational biology; bioinformatics; phylogenetics
published:
2026-01-21
Suthers, Patrick; Maranas, Costas
(2026)
Growth-coupling product formation can facilitate strain stability by aligning industrial objectives with biological fitness. Organic acids make up many building block chemicals that can be produced from sugars obtainable from renewable biomass. Issatchenkia orientalis is a yeast strain tolerant to acidic conditions and is thus a promising host for industrial production of organic acids. Here, we use constraint-based methods to assess the potential of computationally designing growth-coupled production strains for I. orientalis that produce 22 different organic acids under aerobic or microaerobic conditions. We explore native and engineered pathways using glucose or xylose as the carbon substrates as proxy constituents of hydrolyzed biomass. We identified growth-coupled production strategies for 37 of the substrate-product pairs, with 15 pairs achieving production for any growth rate. We systematically assess the strain design solutions and categorize the underlying principles involved.
keywords:
Bioproducts; Modeling
published:
2026-01-19
Note: The GTAP dataset includes a total of 140 regions, some of which are aggregated regions. For all map-related supplementary files (S11, S12, S13), we assign values to each individual country to enhance visualization. Countries within the same aggregated region are assigned the same regional value to maintain consistency across the map.
<b>Data S1 (separate file): S1.csv</b>- CSV file detailing production-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of production activities.
Columns: Each column represents a country-sector pair on the production side.
Values: The values indicate the number of deaths caused by production activities in the country-sector listed in each column and occurring in the country listed in each row.
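Two useful totals follow directly from this layout. Column sums give the deaths caused by each producing country-sector, and row sums give the deaths occurring in each country. The sketch assumes a header row whose first cell labels the row-name column, followed by one row per affected country. The column labels in the test data are hypothetical. Because S2.csv has the same structure, the same function should apply to it.

```python
import csv


def summarize_production_deaths(lines):
    """Totals from an S1/S2-style matrix.

    Rows: affected country; columns: producing country-sector (per the
    description above). Returns (deaths caused per column, deaths occurring per row).
    """
    reader = csv.reader(lines)
    producers = next(reader)[1:]
    deaths_caused = dict.fromkeys(producers, 0.0)  # column sums
    deaths_occurring = {}                          # row sums
    for row in reader:
        country, values = row[0], [float(v) for v in row[1:]]
        deaths_occurring[country] = sum(values)
        for producer, v in zip(producers, values):
            deaths_caused[producer] += v
    return deaths_caused, deaths_occurring


# Usage:
# with open("S1.csv", newline="") as f:
#     caused, occurring = summarize_production_deaths(f)
```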
<b>Data S2 (separate file): S2.csv</b>- CSV file detailing production-related deaths for the EORA dataset.
Structure: The file has the same structure as S1.csv.
<b>Data S3 (separate file): S3.csv</b>- CSV file detailing consumption-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of consumption activities.
Columns: Each column represents a consumption country.
Values: The values indicate the number of deaths caused by consumption activities in the country listed in the column and occurring in the country listed in the row.
<b>Data S4 (separate file): S4.csv</b>- CSV file detailing consumption-related deaths for the EORA dataset.
Structure: The file has the same structure as S3.csv.
<b>Data S5 (folder of files): S5.zip</b>- a folder containing 141 CSV files, each named after a country's 3-letter code (e.g., USA.csv, CHN.csv), representing production-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row corresponds to a grid cell.
Columns: Each column represents an industrial sector. The final column, "geometry," contains the spatial coordinates (latitude and longitude) for each grid cell.
Values: Each value indicates the PM₂.₅ concentration level (in µg/m³) attributable to emissions from the specified sector in the given country, as they occur in each grid cell.
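To obtain the total production-related PM₂.₅ in each grid cell from one country's file, the sector columns can be summed row by row. This assumes the sector contributions are additive and that the only non-sector column is "geometry". If the files also carry an unnamed index column, it would need to be excluded. The sector names in the test data are hypothetical.

```python
import csv


def total_pm25_by_cell(lines):
    """Sum the sector columns per grid cell in an S5/S6-style file (e.g. USA.csv).

    Returns a list of (geometry, total PM2.5 in ug/m3) pairs in file order.
    """
    totals = []
    for row in csv.DictReader(lines):
        geometry = row.pop("geometry")  # spatial coordinates, not a concentration
        totals.append((geometry, sum(float(v) for v in row.values())))
    return totals


# Usage (the path is a placeholder for a file extracted from S5.zip):
# with open("USA.csv", newline="") as f:
#     cells = total_pm25_by_cell(f)
```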
<b>Data S6 (folder of files): S6.zip</b>- a folder containing 188 CSV files, each named after a country's 3-letter code, representing production-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: Each file follows the same format as those in S5.zip, with rows representing grid cells and columns representing industrial sectors, plus a "geometry" column containing spatial coordinates.
<b>Data S7 (separate file): S7.csv</b>- CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row represents a grid cell.
Columns: Apart from the last column ("geometry"), which contains spatial information for each grid cell in latitude-longitude coordinates, each column represents a consumption country.
Values: Each value indicates the PM₂.₅ concentration level caused by each country’s consumption process and occurring in each grid cell, measured in µg/m³.
<b>Data S8 (separate file): S8.csv</b>- CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: The file has the same structure as S7.csv.
<b>Data S9 (separate file): S9.csv</b>- CSV file listing the total net bidirectional export of deaths for all countries in GTAP, displaying only positive values.
Columns:
"from": The country that exports more consumption-related deaths.
"to": The country that imports more consumption-related deaths.
"values": The net export of deaths between these two countries, calculated as the difference between the deaths flowing from "from" to "to" and those from "to" to "from."
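The net bidirectional export defined above can be sketched as follows. For every pair of countries, subtract the reverse flow from the forward flow and keep only the positive direction. The input is a hypothetical nested dict in which `flows[a][b]` holds the deaths flowing from a to b. How that direction corresponds to the consumer and affected-country axes of S3/S4 is an assumption to check against the paper.

```python
def net_death_exports(flows):
    """Net bidirectional export of deaths, keeping only positive values.

    `flows[a][b]` = deaths flowing from country a to country b (hypothetical input).
    Returns (from, to, value) rows with value = flow(from->to) - flow(to->from) > 0.
    """
    countries = sorted(set(flows) | {b for row in flows.values() for b in row})
    rows = []
    for i, a in enumerate(countries):
        for b in countries[i + 1:]:  # each unordered pair once, diagonal skipped
            net = flows.get(a, {}).get(b, 0.0) - flows.get(b, {}).get(a, 0.0)
            if net > 0:
                rows.append((a, b, net))
            elif net < 0:
                rows.append((b, a, -net))
    return rows
```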
<b>Data S10 (separate file): S10.csv</b>- CSV file listing the total net bidirectional export of deaths for all countries in EORA, displaying only positive values.
Structure: The file has the same structure as S9.csv.
<b>Data S11 (separate file): S11.csv</b>- CSV file listing, for GTAP countries, the Value of Statistical Life (VSL) and consumption-related externalities under three scenarios: Business as Usual (BAU), Global Community (GC), and Fair Trade in Deaths (FTD). It also includes externalities per GDP and the differences between scenarios.
Columns:
VSL, BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
<b>Data S12 (separate file): S12.csv</b>- Same as S11.csv, but for EORA countries.
Structure: Identical to S11.csv.
<b>Data S13 (separate file): S13.csv</b>- includes the data used to generate Figures 1, 2, 3, and 5 in the main text.
Columns:
country_code: 3-letter country code
GTAP_region, continent, population, GDP, GDP_capita, VSL
export_of_death, import_of_death, net_export, net_export_capita
allforeign_world, G50foreign_world, G100foreign_world
cause_allforeign_world, cause_L30foreign_world, cause_L50foreign_world
BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
geometry (used for visualization)
<b>Data S14 (separate file): S14.xlsx</b>- this Excel file contains six sheets summarizing cross-model Pearson correlation coefficients between sectoral economic activity fractions and transboundary mortality impact metrics, based on both GTAP and EORA datasets.
Sheets:
Output_fraction_GTAP
Direct_demand_fraction_GTAP
Final_demand_fraction_GTAP
Output_fraction_EORA
Direct_demand_fraction_EORA
Final_demand_fraction_EORA
Rows: Each row represents an economic sector.
Columns:
G50foreign_world: Fraction of deaths attributable to final demand from regions where demand per capita is more than 50% higher than in the current country.
cause_L50foreign_world: Fraction of deaths caused by consumption within the current country but occurring in countries with more than 50% lower demand per capita.
Values: Each value represents the Pearson correlation between the sectoral fraction and the corresponding transboundary mortality metric.
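For reference, each S14 value is a Pearson correlation coefficient, which can be computed as shown below. The assumption that each correlation is taken across countries, pairing a sector's activity fraction with a transboundary metric such as G50foreign_world, is an interpretation and is not stated above. Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```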
<b>Data S15 (separate file): S15.csv</b>- CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of production-based premature deaths.
Column Producer: The producing country–sector pair responsible for the emissions leading to health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each producer–affected country pair.
<b>Data S16 (separate file): S16.csv</b>- CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of consumption-based premature deaths.
Column Consumer: The consuming country whose final demand drives the global production and associated health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each consumer–affected country combination.
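The Deaths_median, Deaths_low95, and Deaths_high95 columns summarize 500 Monte Carlo draws by their median and their 2.5th and 97.5th percentiles. The sketch below shows how such a summary can be computed from a list of draws. The authors' percentile interpolation method is not stated, so the use of `statistics.quantiles` with `method="inclusive"` (linear interpolation) is an assumption.

```python
import statistics


def summarize_draws(draws):
    """Median and 95% interval (2.5th/97.5th percentiles) of Monte Carlo draws.

    statistics.quantiles with n=40 returns 39 cut points spaced 2.5% apart, so the
    first is the 2.5th percentile and the last is the 97.5th.
    """
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return {
        "Deaths_median": statistics.median(draws),
        "Deaths_low95": cuts[0],    # 1/40 = 2.5th percentile
        "Deaths_high95": cuts[-1],  # 39/40 = 97.5th percentile
    }
```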
published:
2025-09-18
Chen, Maosi; Parton, William J.; Hartman, Melannie D.; Del Grosso, Stephen J.; Smith, William K.; Knapp, Alan; Lutz, Susan; Derner, Justin; Tucker, Compton; Ojima, Dennis; Volesky, Jerry; Stephenson, Mitchell B.; Schacht, Walter H.; Gao, Wei
(2025)
Productivity throughout the North American Great Plains grasslands is generally considered to be water limited, with the strength of this limitation increasing as precipitation decreases. We hypothesize that cumulative actual evapotranspiration water loss (AET) from April to July is the precipitation‐related variable most correlated to aboveground net primary production (ANPP) in the U.S. Great Plains (GP). We tested this by evaluating the relationship of ANPP to AET, precipitation, and plant transpiration (Tr). We used multi‐year ANPP data from five sites ranging from semiarid grasslands in Colorado and Wyoming to mesic grasslands in Nebraska and Kansas, mean annual NRCS ANPP, and satellite‐derived normalized difference vegetation index (NDVI) data. Results from the five sites showed that cumulative April‐to‐July AET, precipitation, and Tr were well correlated (R²: 0.54–0.70) to annual changes in ANPP for all but the wettest site. AET and Tr were better correlated to annual changes in ANPP compared to precipitation for the drier sites, and precipitation in August and September had little impact on productivity in drier sites. April‐to‐July cumulative precipitation was best correlated (R² = 0.63) with interannual variability in ANPP in the most mesic site, while AET and Tr were poorly correlated with ANPP at this site. Cumulative growing season (May‐to‐September) NDVI (iNDVI) was strongly correlated with annual ANPP at the five sites (R² = 0.90). Using iNDVI as a surrogate for ANPP, we found that county‐level cumulative April–July AET was more strongly correlated to ANPP than precipitation for more than 80% of the GP counties, with precipitation tending to perform better in the eastern more mesic portion of the GP. Including the ratio of AET to potential evapotranspiration (PET) improved the correlation of AET to both iNDVI and mean county‐level NRCS ANPP.
Accounting for how different precipitation‐related variables control ANPP (AET in drier portion, precipitation in wetter portion) provides opportunity to develop spatially explicit forecasting of ANPP across the GP for enhancing decision‐making by land managers and use of grassland ANPP for biofuels.
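The core quantities in this analysis are a cumulative April-to-July sum and an R² between that sum and annual ANPP. Both can be sketched briefly. The sketch assumes monthly values keyed by month number and assumes R² comes from a simple linear regression across years, where R² equals the squared Pearson correlation. The regression form used by the authors is not stated here, and the function names are illustrative.

```python
def cumulative(monthly, months=(4, 5, 6, 7)):
    """Sum monthly values (dict: month number -> value) over the given months."""
    return sum(monthly[m] for m in months)


def r_squared(x, y):
    """R^2 of an ordinary least-squares fit y ~ a + b*x (equals Pearson r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Usage sketch: one cumulative AET value and one ANPP value per year
# aet_totals = [cumulative(year_aet) for year_aet in monthly_aet_by_year]
# print(r_squared(aet_totals, anpp_by_year))
```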
keywords:
Sustainability;Field Data;Modeling
published:
2026-01-19
Fourkas, Austen; Looney, Leslie
(2026)
This dataset includes the FITS files for all ALMA images used in the ApJ publication "Multiband ALMA Polarization Observations of BHB 07-11 Reveal Aligned Dust Grains in Complex Spiral Arm Structures". Additionally, this dataset includes details regarding the data reduction process so that interested users can perform the reduction and imaging themselves.
keywords:
FITS files; ALMA data; reduction instructions
published:
2026-01-12
Yan, Qiang; Cordell, William; Jindra, Michael; Pfleger, Brian
(2026)
Microbial lipid metabolism is an attractive route for producing oleochemicals. The predominant strategy centers on heterologous thioesterases to synthesize fatty acids of the desired chain length. To convert acids to oleochemicals (e.g., fatty alcohols, ketones), the narrowed fatty acid pool needs to be reactivated as coenzyme A thioesters at a cost of one ATP per reactivation, an expense that could be saved if the acyl chain were directly transferred from ACP- to CoA-thioester. Here, we demonstrate such an alternative acyl-transferase strategy by heterologous expression of PhaG, an enzyme first identified in Pseudomonads, that transfers 3-hydroxy acyl-chains between acyl-carrier protein and coenzyme A thioester forms for creating polyhydroxyalkanoate monomers. We use it to create a pool of acyl-CoAs that can be redirected to oleochemical products. Through bioprospecting, mutagenesis, and metabolic engineering, we develop three strains of Escherichia coli capable of producing over 1 g/L of medium-chain free fatty acids, fatty alcohols, and methyl ketones.
keywords:
Bioproducts; Metabolomics
published:
2025-10-22
Yan, Qiang; Jacobson, Tyler B.; Ye, Zhou; Cortes-Peña, Yoel R.; Bhagwat, Sarang; Hubbard, Susan; Cordell, William T.; Oleniczak, Rebecca E.; Gambacorta, Francesca V.; Rivera-Vasquez, Julio; Shusta, Eric V.; Amador-Noguez, Daniel; Guest, Jeremy; Pfleger, Brian
(2025)
Plants produce many high-value oleochemical molecules. While oil-crop agriculture is performed at industrial scales, suitable land is not available to meet global oleochemical demand. Worse, establishing new oil-crop farms often comes with the environmental cost of tropical deforestation. The field of metabolic engineering offers tools to transplant oleochemical metabolism into tractable hosts while simultaneously providing access to molecules produced by non-agricultural plants. Here, we evaluate strategies for rewiring metabolism in the oleaginous yeast Yarrowia lipolytica to synthesize a foreign lipid, 3-acetyl-1,2-diacyl-sn-glycerol (acTAG). Oils made up of acTAG have a reduced viscosity and melting point relative to traditional triacylglycerol oils making them attractive as low-grade diesels, lubricants, and emulsifiers. This manuscript describes a metabolic engineering study that established acTAG production at g/L scale, exploration of the impact of lipid bodies on acTAG titer, and a techno-economic analysis that establishes the performance benchmarks required for microbial acTAG production to be economically feasible.
keywords:
Conversion;Sustainability;Biomass Analytics;Lipidomics;Metabolomics
published:
2025-11-20
Yan, Qiang; Cordell, William; Breckner, Christian; Chen, Xuanqi; Jindra, Michael; Pfleger, Brian
(2025)
Medium-chain length methyl ketones are potential blending fuels due to their cetane numbers and low melting temperatures. Biomanufacturing offers the potential to produce these molecules from renewable resources such as lignocellulosic biomass. In this work, we designed and tested metabolic pathways in Escherichia coli to specifically produce 2-heptanone, 2-nonanone and 2-undecanone. We achieved substantial production of each ketone by introducing chain-length specific acyl-ACP thioesterases, blocking the β-oxidation cycle at an advantageous reaction, and introducing active β-ketoacyl-CoA thioesterases. Using a bioprospecting approach, we identified 15 homologs of E. coli β-ketoacyl-CoA thioesterase (FadM) and evaluated the in vivo activity of each against various chain length substrates. The FadM variant from Providencia sneebia produced the most 2-heptanone, 2-nonanone, and 2-undecanone, suggesting it has the highest activity on the corresponding β-ketoacyl-CoA substrates. We tested enzyme variants, including acyl-CoA oxidases, thiolases, and bi-functional 3-hydroxyacyl-CoA dehydratases to maximize conversion of fatty acids to β-keto acyl-CoAs for 2-heptanone, 2-nonanone, and 2-undecanone production. To address product loss during fermentation, we applied a 20% (v/v) dodecane layer in the bioreactor and connected an external water-cooled condenser to the bioreactor's heat-transfer condenser. Using these modifications, we were able to generate up to 4.4 g/L total medium-chain length methyl ketones.
keywords:
Metabolomics; Metabolic Engineering