Illinois Data Bank
Displaying 26 - 50 of 840 in total
Illinois Data Bank Dataset Search Results

published: 2025-09-10
 
Conversion of corn fiber to ethanol in the dry grind process could increase ethanol yields, reduce downstream processing costs, and improve overall process profitability. This work investigates the in-situ conversion of corn fiber into ethanol (cellulase addition during simultaneous saccharification and fermentation) during the dry grind process. Addition of 30 FPU/g fiber cellulase resulted in a 4.6% increase in ethanol yield compared to the conventional process. Use of excess cellulase (120 FPU/g fiber) resulted in incomplete fermentation and lower ethanol yield compared to the conventional process. Multiple factors, including high concentrations of ethanol and phenolic compounds, were responsible for yeast stress and incomplete fermentation in the excess cellulase experiments.
keywords: Conversion;Feedstock Bioprocessing
published: 2025-09-09
 
Most native producers of ribosomally synthesized and post-translationally modified peptides (RiPPs) utilize N-terminal leader peptides to avoid potential cytotoxicity of mature products to the hosts. Unfortunately, the native machinery of leader peptide removal is often difficult to reconstitute in heterologous hosts. Here we devised a general method to produce bioactive lanthipeptides, a major class of RiPP molecules, in Escherichia coli colonies using synthetic biology principles, where leader peptide removal is programmed temporally by protease compartmentalization and inducible cell autolysis. We demonstrated the method for producing two lantibiotics, haloduracin and lacticin 481, and performed analog screening for haloduracin. This method enables facile, high throughput discovery, characterization, and engineering of RiPPs.
keywords: Conversion;Genome Engineering;Genomics
published: 2024-06-04
 
This dataset contains files and relevant metadata for real-world and synthetic LFR networks used in the manuscript "Well-Connectedness and Community Detection" (2024) by Park et al., presently under review at PLOS Complex Systems. The manuscript is an extended version of Park, M. et al. (2024). Identifying Well-Connected Communities in Real-World and Synthetic Networks. In Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-031-53499-7_1. The Overview of Real-World Networks image provides high-level information about the seven real-world networks. TSVs of the seven real-world networks are provided as [network-name]_cleaned to indicate that duplicated edges and self-loops were removed, where column 1 is source and column 2 is target. LFR datasets are contained within the zipped file.
LFR datasets for the Connectivity Modifier (CM) paper
File organization: each directory `[network-name]_[resolution-value]_lfr` includes the following files:
* `network.dat`: LFR network edge-list
* `community.dat`: LFR ground-truth communities
* `time_seed.dat`: time seed used in the LFR software
* `statistics.dat`: statistics generated by the LFR software
* `cmd.stat`: command used to run the LFR software as well as time and memory usage information
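The cleaning step described above (removing duplicated edges and self-loops from a two-column TSV) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and treating (a, b) and (b, a) as the same edge is an assumption the dataset description does not confirm.

```python
import csv
import io


def read_tsv_edges(handle):
    """Read (source, target) pairs from a headerless two-column TSV."""
    reader = csv.reader(handle, delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def clean_edges(rows):
    """Drop self-loops and duplicate edges, keeping first occurrences.

    Assumption: edges are undirected, so (a, b) and (b, a) are duplicates.
    """
    seen = set()
    cleaned = []
    for source, target in rows:
        if source == target:
            continue  # self-loop
        key = (source, target) if source <= target else (target, source)
        if key in seen:
            continue  # duplicate edge
        seen.add(key)
        cleaned.append((source, target))
    return cleaned


# Small in-memory example with one reversed duplicate, one exact
# duplicate, and one self-loop.
sample = io.StringIO("1\t2\n2\t1\n3\t3\n2\t4\n1\t2\n")
edges = clean_edges(read_tsv_edges(sample))
print(edges)
```

If the networks are meant to be read as directed, the `key` line would simply use `(source, target)` unchanged.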
published: 2023-03-16
 
Curated networks and clustering output from the manuscript: Well-Connected Communities in Real-World Networks https://arxiv.org/abs/2303.02813
keywords: Community detection; clustering; open citations; scientometrics; bibliometrics
published: 2024-02-16
 
This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of Open Citations to an edge list using integer ids assigned by us. The integer ids can be mapped to omids, pmids, and dois using the open_citation_jan2024_pubs.csv and open_citations_jan2024_pub_ids.csv files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found at https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations. (ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a pmid and a doi and have a title and an abstract. A value of 1 in a column indicates that the information exists in the metadata, and a zero indicates otherwise. Code for generating these data: https://github.com/illinois-or-research-analytics/pubmed_etl. If you use these data or code in your work, please cite https://doi.org/10.13012/B2IDB-5216575_V1.
keywords: PubMed
published: 2024-07-29
 
This dataset consists of a citation graph. It was constructed by downloading and parsing the Works section of the Open Alex catalog of the global research system. Open Alex (see citation below) contains detailed information about scholarly research, including articles, authors, journals, institutions, and their relationships. The data were downloaded on 2024-07-15. The dataset comprises two compressed (.xz) files.
1) filename: openalexID_integer_id_hasDOI.parquet.xz. The tabular data within contains three columns: openalex_id, integer_id, and hasDOI. Each row represents a record with the following data types:
• openalex_id: A unique identifier from the Open Alex catalog.
• integer_id: An integer representing the new identifier (assigned by the authors)
• hasDOI: An integer (0 or 1) indicating whether the record has a DOI (0 for no, 1 for yes).
2) filename: citation_table.tsv.xz. This edgelist of citations has two columns (no header) of integer values that represent citing and cited integer_id, respectively.
Summary Features
• Total Nodes (Documents): 256,997,006
• Total Edges (citations): 2,148,871,058
• Documents with DOIs: 163,495,446
• Edges between documents with DOIs: 1,936,722,541
The code used to generate these files can be found here: https://github.com/illinois-or-research-analytics/lorran_openalex/
keywords: citation networks; Open Alex
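One way to read an edgelist like the citation_table.tsv.xz file described above is to stream it line by line rather than decompressing it to disk. The sketch below assumes tab-separated integer pairs with no header, as the description states; the function names and the demo file are hypothetical, and the in-memory counter is only practical for a small sample, not for the full graph of roughly two billion edges.

```python
import lzma
import os
import tempfile


def iter_citations(path):
    """Yield (citing, cited) integer pairs from an xz-compressed TSV."""
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            citing, cited = line.split("\t")
            yield int(citing), int(cited)


def count_in_degree(path):
    """Count how many times each integer_id is cited."""
    counts = {}
    for _, cited in iter_citations(path):
        counts[cited] = counts.get(cited, 0) + 1
    return counts


# Demo on a tiny stand-in file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_citation_table.tsv.xz")
    with lzma.open(demo_path, "wt", encoding="utf-8") as out:
        out.write("1\t2\n3\t2\n3\t1\n")
    in_degree = count_in_degree(demo_path)

print(in_degree)
```

Because `iter_citations` is a generator, only one line is held in memory at a time, which is the main reason to prefer it over loading the whole table.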
published: 2025-08-16
 
The data within consist of compressed output files in the form of edgelists (*.edgelist.gz) and nodelists (*.aux.parquet) from large citation network simulations using an agent-based model. The code and instructions are available at: https://github.com/illinois-or-research-analytics/SASCA. In addition, we provide a distribution of citation frequencies drawn from a random sample of PubMed journal articles (pooled_50k_pubmed_unique.csv) and a table of recencies, that is, the frequency with which citations are made to the previous year, the year before that, and so on (recency_probs_percent_stahl_filled.csv). A manuscript describing the SASCA-s simulator has been submitted for review and will be referenced in a future version of this data repository if it is accepted. The prefixes sj and er refer to the real-world and Erdos-Renyi random graphs, respectively, that were used to initiate simulations. These 'seed' networks are available from the GitHub site referenced above.
keywords: benchmark networks; agent-based models; simulation; citation
published: 2025-08-17
 
These codes implement the master equation microkinetic modeling (ME-MKM) calculations of Adams et al. (J. Phys. Chem. C 2025, 129, 15, 7285–7294), as well as the automatic derivatives for activation energies and reaction orders in their follow-up work (in review).
keywords: Microkinetic model; master equation; periodic tiling; catalysis; adsorption;
published: 2025-09-08
 
This is the data set for the article entitled "Pollinator seed mixes are phenologically dissimilar to prairie remnants," a manuscript pending publication in Restoration Ecology. This represents the core phenology data of prairie remnant and pollinator seed mixes that were used for the main analyses. Note that additional data associated with the manuscript are intended to be published as a supplement in the journal. * In this V2, a second tab was added to the Rest.Ecol.data.xlsx file. This new sheet lists original data source citations that match the RELIX database, a sister project.
keywords: native plants; ecological restoration; tallgrass prairie; native plant materials
published: 2025-09-08
 
Purpose-grown perennial herbaceous species are nonfood crops specifically cultivated for bioenergy production and have the potential to secure bioenergy feedstock resources while enhancing ecosystem services. This study assessed soil greenhouse gas emissions (CO2 and N2O), nitrate (NO3-N) leaching reduction potential, evapotranspiration (ET), and water-use efficiency (WUE) of bioenergy switchgrass (Panicum virgatum L.) in comparison to corn (Zea mays L.). The study was conducted on field-scale plots in Urbana, IL, during the 2020–2022 growing seasons. Switchgrass was established in 2020 and urea-fertilized at 56 kg N ha−1 year−1. Corn management followed best management practices for the US Midwest, including no-till and 202 kg N ha−1 year−1 fertilization, applied as urea–ammonium nitrate (32%). Our results showed lower direct N2O emissions in switchgrass compared to corn. Although soil CO2 emissions did not differ significantly during the establishment year, emissions in subsequent years were over 50% higher in switchgrass than in corn, likely due to increased belowground biomass, which was over five times higher in switchgrass. Nitrate-N leaching decreased as the switchgrass stand matured, reaching 80% lower than in corn by the third year. Differences in ET and WUE between corn and switchgrass were not significant; however, results indicate a trend toward reduced WUE in switchgrass under drought, driven by lower aboveground biomass production. Our study demonstrates that switchgrass can be implemented at a commercial scale without negatively impacting the hydrological cycle, while potentially reducing N losses through nitrate-N leaching and soil N2O emissions, and enhancing belowground C storage.
keywords: field data; perennial bioenergy grasses; soil; switchgrass
published: 2025-09-08
 
Miscanthus x giganteus (Mxg) is a promising perennial crop for producing natural colorants, renewable fuels, and bioproducts. However, natural recalcitrance and high pretreatment cost are major barriers to their complete conversion. In this study, a green processing method has been investigated for efficient recovery of natural pigments (anthocyanins), fermentable sugars, and pure lignin from Mxg genotypes using choline chloride-based natural deep eutectic solvents (NADES) systems. Interestingly, choline chloride: lactic acid (ChCl: LA) NADES-processed biomass resulted in 67.8 ± 2.1 μg g−1 of anthocyanins from dry biomass. A maximum of 87.4%–94.1% glucose yield was achieved after enzymatic saccharification. The effective extraction of lignin with high purity with higher β-aryl ether (βO4) bonds from advanced crops is crucial for lignin valorization. Notably, highly pure lignin (≈93.4% ± 1.4%) is achieved after low-temperature NADES pretreatment while retaining lignin’s native structure. 31P nuclear magnetic resonance demonstrated that total phenolics for ChCl: LA-lignin resulted in 1.20 mmol g−1 hydroxyls. The relative monolignol composition of syringyl (S), guaiacyl (G), and p-hydroxyphenyl (H) is 19.0, 65.7, and 14.3%, respectively, as evidenced by heteronuclear single quantum coherence analysis. This study provides a novel approach for obtaining high-purity lignin for catalytic depolymerization for oligomers and bifunctional monoaromatics production and leverages current cellulosic biorefinery technologies.
keywords: biomass analytics; feedstock bioprocessing; inter-brc; miscanthus
published: 2025-09-06
 
4D-STEM datasets for solution-treated (CrCoNi)93Al4Ti2Nb MEA in the [111], [112], and [114] zone axes. Data used for the Ultramicroscopy article "Differentiating electron diffuse scattering via 4D-STEM spatial fluctuation and correlation analysis in complex FCC alloys". Experiment details can be found in the paper. Data-specific details are listed in the Readme file.
keywords: 4D-STEM; MEA; Electron Diffuse-Scattering; FluCor
published: 2025-05-27
 
This dataset contains all raw and processed data used to generate the figures in the main text and supplementary material of the paper "High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit." The data can be used to reproduce the plots and validate the analysis. Accompanying Jupyter notebooks provide step-by-step analysis pipelines for figure generation. The dataset also includes drawings for the mechanical samples used to perform the experiment. In addition, the dataset provides ANSYS HFSS electromagnetic simulation files used to design and analyze the resonator structures and estimate field distributions.
keywords: superconducting qubit; magnon sensing; hybrid quantum systems; spin-photon coupling; magnon decay; cavity QED
published: 2025-05-21
 
Raw data of Auchenorrhyncha (Hemiptera) species presence and abundance from samples collected as part of Morgan Brown's M.S. thesis entitled "Investigating changes in Auchenorrhyncha (Hemiptera) communities in Illinois prairies over 25 years." Collection_Events_MBrown.pdf contains information that corresponds to each collection event code listed in the raw data files, including coordinates, date of collection, collection method, and name of collector. Each CSV file contains Auchenorrhyncha species presence and abundance data from one sampling area in Illinois: Route 45 Railroad Prairie, Richardson Wildlife Foundation, Mason County nature preserves, and Twelve Mile Prairie. Variables in the CSV files include:
* Family: Taxonomic family to which each species belongs
* Subfamily: Taxonomic subfamily to which each species belongs
* Tribe: Taxonomic tribe to which each species belongs
* Species: Lowest taxonomic level to which individuals were identified
The first row, from column 5 to the end, contains collection event codes, which correspond to the codes listed in the PDF.
* New in V2: The CSV files originally uploaded in V1 contained outdated species names. V2 provides updated CSV files with the corrected names.
keywords: Biodiversity; Entomology; Conservation
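The CSV layout described above (four taxonomic columns followed by one column per collection event code) can be summarized per species with a short script. This is a sketch under stated assumptions: cells in the event columns are taken to be integer counts with blanks meaning zero, and the event codes and species names in the sample are placeholders, not values from the dataset.

```python
import csv
import io


def total_abundance(handle):
    """Return (event_codes, totals) where totals maps species to summed counts.

    Assumes columns 1-4 are Family, Subfamily, Tribe, Species and that
    every later column holds a count for one collection event.
    """
    reader = csv.reader(handle)
    header = next(reader)
    event_codes = header[4:]
    totals = {}
    for row in reader:
        species = row[3]
        # Blank cells are treated as zero (an assumption).
        counts = [int(cell) if cell.strip() else 0 for cell in row[4:]]
        totals[species] = totals.get(species, 0) + sum(counts)
    return event_codes, totals


# Placeholder data: EVT1 and EVT2 are hypothetical event codes.
sample = io.StringIO(
    "Family,Subfamily,Tribe,Species,EVT1,EVT2\n"
    "FamA,SubA,TribeA,Species one,3,\n"
    "FamA,SubA,TribeB,Species two,0,5\n"
)
codes, totals = total_abundance(sample)
print(codes, totals)
```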
published: 2025-06-10
 
This dataset contains all the raw and processed data used to generate the figures presented in the main text and the supplementary information of the paper "Operation of a high frequency, phase slip qubit." It also includes code for data analysis and code for generating the figures.
keywords: phase slip qubit; superconducting qubit; quantum information; disordered superconductors
published: 2025-06-26
 
This dataset supports the analysis presented in the study on curbside electric vehicle (EV) charging infrastructure planning in San Francisco and the published paper titled "Urban electric vehicle infrastructure: Strategic planning for curbside charging." It includes spatial data layers and tabular data used to evaluate location suitability under multiple criteria, such as demand, accessibility, and environmental benefits. This dataset can be used to replicate the multi-criteria decision-making framework, perform additional spatial analyses, or inform policy decisions related to EV infrastructure siting in urban environments.
keywords: Electric Vehicles; Curbside Charging Stations; Multi-Criteria Decision-Making; Suitability Analysis; Urban Infrastructure
published: 2025-07-23
 
Supplementary data and code associated with the Biogeosciences paper published by Cecilia Prada et al. "Soil and Biomass Carbon Storage is Much Higher in Central American than Andean Montane Forests". There are 16 files associated with this paper:
(1) AGB.csv: the site, plot, treeID, mnemn, family, agb, and AGcarbon for each tree in the dataset. Column headings are described in the file AGB_metadata.csv
(2) AGB_metadata.csv: Metadata (column descriptions) for AGB.csv
(3) CWD_D.csv: Complete information on the downed coarse woody debris (CWD) measured in each plot
(4) CWD_D_metadata.csv: Metadata (column descriptions) for CWD_D.csv
(5) CWD_S.csv: Complete information on the standing coarse woody debris measured in each plot
(6) CWD_S_metadata.csv: Metadata (column descriptions) for CWD_S.csv
(7) SoilC.csv: Estimated soil carbon storage (Mg C) at each sampling location in each plot
(8) SoilC_metadata.csv: Metadata (column descriptions) for SoilC.csv
(9) Table.csv: Data source, soil carbon value (Mg C) and elevation from published data sources
(10) Table_metadata.csv: Metadata (column descriptions) for Table.csv
(11) TableS1.csv: Data source, above ground carbon value (Mg C) and elevation from published data sources
(12) TableS1_metadata.csv: Metadata (column descriptions) for TableS1.csv
(13) RScript.R: Annotated code for data analysis and figures
(14) Full_dataset.csv: Full set of environmental data and carbon data by plot
(15) Full_dataset_metadata.csv: Metadata (column descriptions) for Full_dataset.csv
(16) Species list and species codes.csv: Full family, genus and species names for the species codes (column mnemn in AGB.csv)
keywords: tropical forest; carbon storage
published: 2025-07-30
 
This dataset includes three data files for linking species' climate sensitivity, trait combinations, and listing status. It contains species occurrence data within Hydrologic Unit Code 12 (HUC12) watersheds, along with trait information and Rarity and Climate Sensitivity (RCS) index scores for lotic caddisflies, stoneflies, mussels, dragonflies, and crayfish across all Midwest Climate Adaptation Science Center states: Minnesota, Iowa, Missouri, Wisconsin, Illinois, Indiana, Michigan, and Ohio. For mussels, the geographic scope is expanded to include all Midwest Regional Species of Greatest Conservation Need (RSGCN) states—North Dakota, South Dakota, Nebraska, Kansas, and Kentucky. However, occurrence data for mussels is not included due to data-sharing agreements. Metadata are included with each data file. Please refer to the associated manuscript for original data sources, trait references, and details on the RCS index calculation.
keywords: climate sensitivity; conservation status; traits; aquatic invertebrates; Midwest
published: 2025-08-07
 
Dataset generated using the technique described in "EC-SBM synthetic network generator". This contains multiple synthetic networks with ground-truth community structure, which can be used to evaluate community detection methods. Note: * networks.zip contains the synthetic networks
keywords: network science; synthetic networks; community detection; tsv
published: 2016-05-19
 
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
keywords: taxi;transportation;New York City;GPS
published: 2025-09-01
 
Chronic wasting disease (CWD) surveillance data from Illinois and Wisconsin, USA between the fiscal years 2003 and 2022 (calendar years 2002 and 2021). Data are reported at the township level as defined by the US Public Land Survey System. CWD cases, animals tested for CWD, and the apparent prevalence calculated from these values are given by township and fiscal year. Data have been anonymized by replacing original township names with identification numbers to maintain the privacy of landowners. Variables include Tests, Cases, and nonlinear transformations of Tests and Cases (inverse, square root, and log transformations).
keywords: chronic wasting disease; cwd; white-tailed deer; deer; cervid; prion; apparent prevalence; prevalence; surveillance
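Apparent prevalence is conventionally the number of positive cases divided by the number of animals tested, and the description above suggests that is how it was derived here. The sketch below illustrates that ratio along with plain inverse, square root, and natural log transformations. The exact forms used in the dataset (for example, whether an offset is added before taking the log of zero) are not stated, so the handling of zeros here is an assumption, and the function names are hypothetical.

```python
import math


def apparent_prevalence(cases, tests):
    """Cases divided by tests; None when no animals were tested."""
    if tests == 0:
        return None
    return cases / tests


def transforms(value):
    """Inverse, square root, and natural log of a non-negative count.

    Assumption: inverse and log are left undefined (None) at zero.
    """
    return {
        "inverse": 1 / value if value != 0 else None,
        "sqrt": math.sqrt(value),
        "log": math.log(value) if value > 0 else None,
    }


# Example: 5 positive animals out of 200 tested in one township-year.
prevalence = apparent_prevalence(5, 200)
tests_transformed = transforms(200)
print(prevalence, tests_transformed)
```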
published: 2025-08-04
 
This dataset contains the data used for the publication “Aboveground rather than belowground productivity drives variability in Miscanthus x giganteus net primary productivity”. This dataset contains Miscanthus x giganteus biomass, carbon, and nitrogen tissue data for aboveground and belowground plant parts collected in 2021 for three different sites in Iowa with three different nitrogen application rates. Data at the Iowa sites were collected via biometric hand harvesting, belowground excavations, and soil coring both in-clump and beside-clump. Data were collected at two collection timepoints to calculate the contributions of belowground parts to Miscanthus x giganteus net primary productivity. This dataset also includes Miscanthus x giganteus and Switchgrass soil coring and excavation data collected in 2012 at the University of Illinois Urbana Champaign Energy Farm.
keywords: Miscanthus; Net Primary Productivity; Excavation; Nitrogen fertilization; Translocation; Belowground Biomass; Carbon
published: 2025-08-01
 
Physiological and yield data from a three-year field experiment of soybean exposed to elevated ozone stress and reduced soil moisture at the SoyFACE experiment.
keywords: soybean; ozone; drought; photosynthesis; yield