Illinois Data Bank Dataset Search Results
published:
2022-08-08
Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy
(2022)
This upload contains all datasets used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".
The zip file has the following structure (presented as an example):
salma_paper_datasets/
|_README.md
|_10aa/
|_crw/
|_homfam/
  |_aat/
  | |_...
  |_...
|_het/
  |_5000M2-het/
  | |_...
  |_5000M3-het/
  ...
|_rec_res/
Generally, the structure can be viewed as:
[category]/[dataset]/[replicate]/[alignment files]
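As a minimal sketch, this layout can be walked in Python. The sketch assumes the zip has been extracted to a local `salma_paper_datasets/` directory and that every dataset has its own replicate sub-directory, including single-replicate ones, as the pattern above suggests. The function name `list_replicates` is hypothetical and is not part of the upload.

```python
from pathlib import Path


def list_replicates(root):
    """Yield (category, dataset, replicate, fasta_files) for every
    [category]/[dataset]/[replicate]/ directory under root.
    Plain files at the top level (e.g. README.md) are skipped."""
    root = Path(root)
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        for dataset in sorted(p for p in category.iterdir() if p.is_dir()):
            for replicate in sorted(p for p in dataset.iterdir() if p.is_dir()):
                fasta = sorted(f.name for f in replicate.glob("*.fasta"))
                yield category.name, dataset.name, replicate.name, fasta


if __name__ == "__main__":
    root = Path("salma_paper_datasets")  # assumed extraction location
    if root.is_dir():
        for cat, ds, rep, files in list_replicates(root):
            print(f"{cat}/{ds}/{rep}: {', '.join(files)}")
```

Because the number of `.fasta` files varies per replicate (see the notes on missing files below), the sketch lists whatever is present rather than expecting a fixed set.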
# Categories:
1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.
2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned versions from Shen et al. 2022 (MAGUS+eHMM).
3. homfam: These are the 10 largest Homfam datasets, each with one replicate.
4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.
5. rec\_res: This category contains the Rec and Res datasets. Details of how they were generated can be found in the supplementary materials of the paper.
# Alignment files
There are at most 6 `.fasta` files in each sub-directory:
1. `all.unaln.fasta`: All unaligned sequences.
2. `all.aln.fasta`: Reference alignment of all sequences. If not every sequence has a reference alignment, only the sequences that do are included.
3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences whose lengths are not within 25% of the median length (i.e., sequences that are not full-length).
4. `all-queries.aln.fasta`: Reference alignment of the query sequences. If not every query has a reference alignment, only the queries that do are included.
5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences whose lengths are within 25% of the median length (i.e., full-length sequences).
6. `backbone.aln.fasta`: Reference alignment of the backbone sequences. If not every backbone sequence has a reference alignment, only the backbone sequences that do are included.
>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.
>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.
>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.
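The backbone/query split described above can be sketched in Python. This is an illustrative reimplementation, not the authors' script. It assumes that the median is taken over the ungapped lengths of all sequences in a dataset and that a length exactly 25% from the median counts as "within" range. `read_fasta` and `split_backbone_queries` are hypothetical names.

```python
import statistics


def read_fasta(path):
    """Minimal FASTA reader returning {name: sequence}; multi-line records
    are joined, and the name is the first word after '>'."""
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            else:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def split_backbone_queries(seqs, tolerance=0.25):
    """Backbone: ungapped length within +/- tolerance of the median length.
    Queries: every other sequence (i.e., not full-length)."""
    lengths = {k: len(s.replace("-", "")) for k, s in seqs.items()}
    median = statistics.median(lengths.values())
    lo, hi = (1 - tolerance) * median, (1 + tolerance) * median
    backbone = {k: s for k, s in seqs.items() if lo <= lengths[k] <= hi}
    queries = {k: s for k, s in seqs.items() if k not in backbone}
    return backbone, queries


if __name__ == "__main__":
    demo = {"s1": "A" * 100, "s2": "A" * 110, "s3": "A" * 90, "frag": "A" * 40}
    backbone, queries = split_backbone_queries(demo)
    print(sorted(backbone), sorted(queries))  # → ['s1', 's2', 's3'] ['frag']
```

When every sequence falls inside the range, `queries` is empty. This matches the note that `all-queries.unaln.fasta` is then missing.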
# Additional file(s)
1. `350378genomes.txt`: contains the names of all 350,378 bacterial and archaeal genomes in which Prodigal (Hyatt et al. 2010) was used to search for protein sequences.
keywords:
SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity
published:
2022-12-21
Sherwood, Joshua; Tiemann, Jeremy; Stein, Jeffrey
(2022)
This dataset is associated with a larger manuscript, published in 2022 in the Illinois Natural History Survey Bulletin, that summarized the Fishes of Champaign County project from 2012–2015. With data spanning more than 120 years, the Fishes of Champaign County is a comprehensive, long-term investigation into the changing fish communities of east-central Illinois. Surveys first occurred in Champaign County in the late 1880s (40 sites), with subsequent surveys in 1928–1929 (125 sites), 1959–1960 (143 sites), and 1987–1988 (141 sites). Between 2012 and 2015, we resampled 122 sites across Champaign County. Together, the data from these five surveys provide a unique perspective not only on the fish communities of the region but also on in-stream habitat changes over the past 120 years.
The dataset is in Microsoft Access format, with five data tables, one for each survey period. Field names are self-explanatory, but the types of data collected varied among surveys as follows: Forbes & Richardson (1880s) collected presence/absence only; Thompson & Hunt (1928–1929) collected abundance only; Larimore & Smith (1959–1960) collected length and weight for some samples but only presence/absence at others. In some cases, fish of the same species were weighed in bulk, with the fields “LOW” and “HIGH” indicating the lower and upper limits of total length in the batch, and weight indicating the gross weight of all fish in the batch. Larimore and Bayley (1987–1988) collected length and weight for all surveys, and Sherwood and Stein (2012–2015) collected length and weight for all surveys except in cases where an extremely abundant single species was subsampled. Lengths are reported in millimeters and weights in grams. Two lookup tables provide the species codes used in the data tables and the sample site locations and notes.
keywords:
fishes of Champaign County; streams; anthropogenic disturbances; long-term dataset
published:
2024-06-17
Stuchiner, Emily; Jernigan, Wyatt; Zhang, Ziliang; Eddy, William; DeLucia, Evan; Yang, Wendy
(2024)
Data includes carbon mineralization rates, potential denitrification rates, net nitrous oxide fluxes, and soil chemical properties from a laboratory incubation of soil samples collected from 20 locations across an Illinois maize field.
keywords:
denitrification; nitrous oxide; dissolved organic carbon; maize
published:
2025-10-30
Dwivedi, Nidhi; Yamamoto, Senri; Zhao, Yunjun; Hou, Guichuan; Bowling, Forrest; Tobimatsu, Yuki; Liu, Chang-Jun
(2025)
Grass lignocelluloses feature complex compositions and structures. In addition to conventional lignin units derived from monolignols, acylated monolignols and the flavonoid tricin are also incorporated into the lignin polymer; moreover, hydroxycinnamates, particularly ferulate, cross-link arabinoxylan chains with each other and/or with lignin polymers. These structural complexities make grass lignocellulosics difficult to optimize for effective agro-industrial applications. In the present study, we assess the application of two engineered monolignol 4-O-methyltransferases (MOMTs) to modifying rice lignocellulosic properties. The two MOMTs confer regiospecific para-methylation of monolignols but with different catalytic preferences. Expression of the MOMTs in rice resulted in differential but drastic suppression of lignin deposition, with more than a 50% decrease in guaiacyl lignin and up to a 90% reduction in syringyl lignin in transgenic lines. Moreover, the levels of arabinoxylan-bound ferulate were reduced by up to 50%, and the levels of tricin in the lignin fraction were also substantially reduced. Concomitantly, up to 11 μmol/g of methanol-extractable 4-O-methylated ferulic acid and 5–7 μmol/g of 4-O-methylated sinapic acid accumulated in MOMT transgenic lines. In vitro, both MOMTs displayed discernible substrate promiscuity towards a range of phenolics in addition to their dominant monolignol substrates, which partially explains their broad effects on grass phenolic biosynthesis. The cell wall structural and compositional changes resulted in up to a 30% increase in saccharification yield of the de-starched rice straw biomass after dilute acid pretreatment. These results demonstrate an effective strategy for tailoring complex grass cell walls to generate improved cellulosic feedstocks for the fermentable sugar-based production of biofuels and bio-chemicals.
keywords:
Feedstock Production;Biomass Analytics;Genome Engineering
published:
2025-09-25
Vu-Le, The-Anh; Park, Minhyuk; Chen, Ian; Warnow, Tandy
(2025)
Dataset for "Using Stochastic Block Models for Community Detection". This contains synthetic networks with ground-truth community structure generated using synthetic network generators (specifically, ABCD+o) based on real-world networks and computed clusterings on these real-world networks.
Note:
* networks.zip contains the synthetic networks
published:
2025-10-10
Singh, Ramkrishna; Liu, Hui; Shanklin, John; Singh, Vijay
(2025)
Lipids accumulated in the vegetative tissues of cellulosic feedstocks can be a potential raw material for biodiesel and bioethanol production. In this work, bagasse of genetically engineered sorghum was subjected to liquid hot-water pretreatment at 170, 180, and 190 °C for different reaction times. Under the optimal pretreatment condition (170 °C, 20 min), the residue was enriched in glucan (57.39 ± 2.63% w/w) and xylan (13.38 ± 0.49% w/w). The total lipid content of the pretreated residue was 6.81% w/w, similar to that of untreated bagasse (6.30% w/w). Pretreatment improved the enzymatic digestibility of bagasse, allowing recovery of 79% w/w of glucose and 86% w/w of xylose. Pretreatment followed by enzymatic saccharification produced a 2-fold increase in total lipid content in the enzymatic residue compared with the original bagasse. Thus, pretreatment and enzymatic hydrolysis enabled high sugar recovery while concentrating triglycerides and free fatty acids in the residue.
keywords:
Conversion;Feedstock Production;Feedstock Bioprocessing
published:
2018-11-21
Clark, Lindsay V.; Lipka, Alexander E.; Sacks, Erik J.
(2018)
This set of scripts accompanies the manuscript describing the R package polyRAD, which uses DNA sequence read depth to estimate allele dosage in diploids and polyploids. Using several high-confidence SNP datasets from various species, allelic read depth from a typical RAD-seq dataset was simulated, then genotypes were estimated with polyRAD and other software and compared to the true genotypes, yielding error estimates.
keywords:
R programming language; genotyping-by-sequencing (GBS); restriction site-associated DNA sequencing (RAD-seq); polyploidy; single nucleotide polymorphism (SNP); Bayesian genotype calling; simulation
published:
2025-10-24
Choe, Kisurb; Jindra, Michael A.; Hubbard, Susan; Pfleger, Brian; Sweedler, Jonathan
(2025)
Creating controlled lipid unsaturation locations in oleochemicals can be a key to many bioengineered products. However, evaluating the effects of modifications to the acyl-ACP desaturase on lipid unsaturation is not currently amenable to high-throughput assays, limiting the scale of redesign efforts to <200 variants. Here, we report a rapid mass spectrometry (MS) assay for profiling the positions of double bonds in membrane lipids produced by Escherichia coli colonies after treatment with ozone gas. Using MS measurement of the ozonolysis products of Δ6 and Δ8 isomers of membrane lipids from colonies expressing recombinant Thunbergia alata desaturase, we screened a randomly mutagenized library of the desaturase gene at 5 s per sample. Two variants with altered regiospecificity were isolated, as indicated by an increase in the 16:1 Δ8 proportion. We also demonstrated the ability of these desaturase variants to influence the membrane composition and fatty acid distribution of E. coli strains deficient in the native acyl-ACP desaturase gene, fabA. Finally, we used the fabA-deficient chassis to concomitantly express a non-native acyl-ACP desaturase and a medium-chain thioesterase from Umbellularia californica, demonstrating production of only saturated free fatty acids.
keywords:
Conversion;Lipidomics;Mass Spectrometry
published:
2025-11-06
Deshavath, Narendra Naik; Woodruff, William; Eller, Fred; Susanto, Vionna; Yang, Cindy; Rao, Christopher V.; Singh, Vijay
(2025)
Microbial oils are a sustainable biomass-derived substitute for liquid fuels and vegetable oils. Oilcane, an engineered sugarcane with superior feedstock characteristics for biodiesel production, is a promising candidate for bioconversion. This study describes the processing of oilcane stems into juice and hydrothermally pretreated lignocellulosic hydrolysate and their valorization to ethanol and microbial oil using Saccharomyces cerevisiae and engineered Rhodosporidium toruloides strains, respectively. A bioethanol titer of 106 g/L was obtained from S. cerevisiae grown on oilcane juice in a 3 L fermenter, and a lipid titer of 8.8 g/L was obtained from R. toruloides grown on oilcane hydrolysate in a 75 L fermenter. Oil was extracted from the R. toruloides cells using supercritical CO2, and the observed fatty acid profile was consistent with previous studies on this strain. These results demonstrate the feasibility of pilot-scale lipid production from oilcane hydrolysate as part of an integrated bioconversion strategy.
keywords:
Conversion;Bioproducts;Feedstock Bioprocessing;Hydrolysate
published:
2026-01-15
Singh, Ramkrishna; Bhagwat, Sarang; Viswanathan, Mothi Bharath; Cortes-Pena, Yoel; Eilts, Kristen; Cao, Mingfeng; Guest, Jeremy; Zhao, Huimin; Singh, Vijay
(2026)
Triacetic acid lactone (TAL) can be microbially produced and further chemically upgraded to several high-value chemicals. In this work, several acidic and basic ion-exchange resins and activated charcoal were evaluated for their ability to adsorb microbially produced TAL. Activated charcoal and a weak base resin, Dowex 66, showed a similar TAL adsorption capacity of 0.18 ± 0.002 g/g. At 15% w/v activated charcoal, about 98% of the TAL present in fermentation broth could be adsorbed. Further, ethanol washing allowed recovery of 72% of the adsorbed TAL. A biorefinery producing TAL from sucrose was designed, simulated, and evaluated through technoeconomic analysis under uncertainty, yielding an estimated TAL minimum product selling price (MPSP) of $4.27/kg [$3.71–4.94/kg; 5th–95th percentiles] for the current state of technology and $2.83/kg [$2.46–3.29/kg] following potential near-term improvements to fermentation. Thus, this work provides an adsorptive process to recover microbially produced TAL that can be chemically upgraded to several industrial products.
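As a rough back-of-envelope illustration, the two reported percentages can be combined into an approximate overall recovery of TAL from broth. This assumes that adsorption and ethanol washing are applied in sequence with no other losses. The result is our own arithmetic, not a figure reported in the study.

```latex
R_{\text{overall}} \approx f_{\text{adsorbed}} \times f_{\text{eluted}} = 0.98 \times 0.72 \approx 0.71
```

That is, roughly 71% of the TAL originally in the broth would be recovered, under these assumptions.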
keywords:
Bioproducts; Feedstock Bioprocessing
published:
2024-10-07
Kole Aspray, Elise; Ainsworth, Elizabeth; McGrath, Jesse; McGrath, Justin; Montes, Christopher; Whetten, Andrew; Ort, Donald; Long, Stephen; Puthuval, Kannan; Mies, Timothy; Bernacchi, Carl; DeLucia, Evan; Dalsing, Bradley; Leakey, Andrew; Li, Shuai; Herriott, Jelena; Miglietta, Franco
(2024)
This data set is related to the SoyFACE experiments, which are open-air agricultural climate change experiments that have been conducted since 2001. The fumigation experiments take place at the SoyFACE farm and facility in Champaign County, Illinois during the growing season of each year, typically between June and October.
This V4 contains new experimental data files, hourly fumigation files, and weather/ambient files for 2022 and 2023; the previous version only included files for 2001–2021. The MATLAB code has also been updated for efficiency, and the explanatory files have been updated accordingly. The changes in V4 are as follows:
- The "SoyFACE Plot Information 2001 to 2021" file has been renamed to “SoyFACE ring information 2001 to 2023.xlsx”, and data for 2022 and 2023 were added. The file contains information about each year of the SoyFACE experiments, including the fumigation treatment type (CO2, O3, or a combination treatment), the crop species, the plots (also referred to as 'rings' and labeled with numbers between 2 and 31) used in each experiment, important experiment dates, and the target concentration levels or 'setpoints' for CO2 and O3 in each experiment.
- The "SoyFACE 1-Minute Fumigation Data Files" were updated to contain sub-folders for each year of the experiments (2001-2023), each of which contains sub-folders for each ring used in that year's experiments. This data set also includes hourly data files for the fumigation experiments ("SoyFACE Hourly Fumigation Data Files" folder) created from the 1-minute files, and hourly ambient/weather data files for each year of the experiments ("Hourly Weather and Ambient Data Files" folder which has also been updated to include 2022 and 2023 data). The ambient CO2 and O3 data are collected at SoyFACE, and the weather data are collected from the SURFRAD and WARM weather stations located near the SoyFACE farm.
- “Rings.xlsx” is new in this version. This file lists the rings and treatments used in each year of the SoyFACE experiments between 2001 and 2023 and is used in several of the MATLAB codes.
- “CMI Weather Data Explanation.docx” is newly added. This file contains specific information about the processing of raw weather data, which is used in the hourly weather and ambient data files.
- Files that were in RAR format in V3 are now saved in ZIP format, including Hourly Weather and Ambient Data Files.zip, SoyFACE 1-Minute Fumigation Data Files.zip, SoyFACE Hourly Fumigation Data Files.zip, and Matlab Files.zip.
- The "Fumigation Target Percentages" file was updated to add data for 2022 and 2023. This file shows how much of the time the CO2 and O3 fumigation levels are within a 10 or 20 percent margin of the target levels when the fumigation system is turned on.
- The "Matlab Files" folder contains custom code (Aspray, E.K.) that was used to clean the "SoyFACE 1-Minute Fumigation Data" files and to generate the "SoyFACE Hourly Fumigation Data" and "Fumigation Target Percentages" files. Code information can be found in the various "Explanation" files. The Matlab code changes are as follows:
1. “Data_Issues_Finder.m” code was changed to use the “Rings.xlsx” file to gather ring and treatment information from the contents of the file rather than having it hardcoded in the Matlab code itself.
2. “Data_Issues_Finder_all.m” code is new. This code is the same as the “Data_Issues_Finder.m” code except that it identifies all CO2 and O3 repeats. In contrast, the “Data_Issues_Finder.m” code only identifies CO2 and O3 repeats that occur when the fumigation system is turned on.
3. “Target_Yearly.m” code was changed to use the “Rings.xlsx” file to gather ring and treatment information from the contents of the file rather than having it hardcoded in the Matlab code itself.
4. “HourlyFumCode.m” code is new. This code uses the “Rings.xlsx” file to gather ring and treatment information from the contents of the file instead of requiring the user to define these values explicitly. It also defines a list of all ring folders for the selected year and runs the hourly code for each ring, so the user no longer has to run the hourly code for each ring individually. Finally, the code generates two dialog boxes: one that lets the user specify whether the hourly code should be run on 1-minute fumigation files or 1-minute ambient files, and another that lets the user specify whether hourly fumigation averages should be replaced with hourly ambient averages when the fumigation system is turned off.
5. “HourlyDataFun.m” code was changed to run either “HourlyData.m” code or “HourlyDataAmb.m” code, depending on user input in the first dialog box.
6. “HourlyData.m” code was changed to replace hourly fumigation averages with hourly ambient averages when the fumigation system is turned off, depending on user input in the second dialog box.
7. “HourlyDataAmb.m” code is new. This code is similar to “HourlyData.m” but calculates hourly averages for 1-minute ambient files instead of 1-minute fumigation files.
8. “batch.m” code was changed to account for new function input variables in “HourlyDataFun.m” code, along with adding header columns for “FumOutput.xlsx” and “AmbOutput.xlsx” output files generated by “HourlyData.m” and “HourlyDataAmb.m” code.
- Finally, the “* Explanation” files contain information about the column names, units of measurement, steps needed to use the Matlab code, and other pertinent information for each data file. Some of them have been updated to reflect the data changes in this version.
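The hourly averaging and target-percentage logic described in this list can be sketched in Python. This is not the authors' MATLAB code (`HourlyData.m`, `Target_Yearly.m`); it is a simplified illustration under several assumptions:

- 1-minute records are `(timestamp, value, fumigation_on)` tuples.
- An hour counts as "off" only when fumigation was off for every minute in it.
- The 10 or 20 percent margin is measured relative to the setpoint.

All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hour_of(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_means(samples):
    """samples: iterable of (datetime, value). Returns {hour_start: mean}."""
    acc = defaultdict(lambda: [0.0, 0])  # hour -> [running sum, count]
    for ts, value in samples:
        acc[hour_of(ts)][0] += value
        acc[hour_of(ts)][1] += 1
    return {hour: total / n for hour, (total, n) in acc.items()}


def hourly_fumigation(fum, amb, replace_when_off=True):
    """fum: list of (datetime, value, fumigation_on) 1-minute ring samples.
    amb: list of (datetime, value) 1-minute ambient samples.
    If replace_when_off, an hour with fumigation off for every minute
    takes the ambient hourly mean instead (when one exists)."""
    fum_means = hourly_means((ts, v) for ts, v, _ in fum)
    amb_means = hourly_means(amb)
    on_any = defaultdict(bool)
    for ts, _, on in fum:
        on_any[hour_of(ts)] |= on
    return {
        hour: (amb_means[hour]
               if replace_when_off and not on_any[hour] and hour in amb_means
               else mean)
        for hour, mean in fum_means.items()
    }


def target_percentage(fum, setpoint, margin):
    """Percent of fumigation-on minutes whose value lies within
    +/- margin (e.g. 0.10 or 0.20) of the setpoint."""
    on_values = [v for _, v, on in fum if on]
    if not on_values:
        return float("nan")
    hits = sum(abs(v - setpoint) <= margin * setpoint for v in on_values)
    return 100.0 * hits / len(on_values)


if __name__ == "__main__":
    t0 = datetime(2023, 7, 1, 10)
    fum = [(t0.replace(minute=m), 550.0 if m < 30 else 600.0, True) for m in range(60)]
    print("hourly:", hourly_fumigation(fum, []))
    print("within 10%:", target_percentage(fum, 550.0, 0.10))
```

Keeping the "replace when off" behaviour behind a flag mirrors the second dialog box described for “HourlyFumCode.m”, where the user chooses whether the substitution happens.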
keywords:
SoyFACE; agriculture; agricultural; climate; climate change; atmosphere; atmospheric change; CO2; carbon dioxide; O3; ozone; soybean; fumigation; treatment
published:
2025-09-17
Zhao, Huimin; Rabinowitz, Joshua; Guest, Jeremy; Zhu, Zhixin; Bhagwat, Sarang; Li, Xi; Weilandt, Daniel; Xu, Hao; Tan, Shih-I; Tran, Vinh
(2025)
Microbial production of chemicals may suffer from inadequate cofactor provision, a challenge further exacerbated in yeasts by compartmentalized cofactor metabolism. Here, we perform cofactor engineering through the decompartmentalization of mitochondrial metabolism to improve succinic acid (SA) production in Issatchenkia orientalis. We localize the reducing equivalents of mitochondrial NADH to the cytosol through cytosolic expression of its pyruvate dehydrogenase (PDH) complex and couple a reductive tricarboxylic acid pathway with a glyoxylate shunt, partially bypassing an NADH-dependent malate dehydrogenase to conserve NADH. Cytosolic SA production reaches a titer of 104 g/L and a yield of 0.85 g/g glucose, surpassing the yield of 0.66 g/g glucose constrained by cytosolic NADH availability. Additionally, by expressing cytosolic PDH, we expand our I. orientalis platform to enhance production of acetyl-CoA-derived citramalic acid and triacetic acid lactone by 1.22- and 4.35-fold, respectively. Our work establishes I. orientalis as a versatile platform to produce markedly reduced and acetyl-CoA-derived chemicals.
keywords:
bioproducts; metabolic engineering
published:
2025-09-15
Kantola, Ilsa; Masters, Michael; DeLucia, Evan
(2025)
Datasets supporting "A 13-year record indicates differences in the duration and depth of soil carbon accrual among potential bioenergy crops" by Kantola et al., 2025, in Global Change Biology Bioenergy. Data include soil organic carbon (SOC), carbon stable isotope ratios, annual belowground biomass, and annual post-harvest litter for four crops (maize/soybean, miscanthus, switchgrass, and prairie) between 2008 and 2021.
keywords:
bioenergy crops; soil organic carbon; miscanthus; switchgrass; prairie
published:
2025-09-17
Avalos, Jose L; Mantri, Krishi
(2025)
Microbial fermentation provides a sustainable method of producing valuable chemicals. Adding dynamic control to fermentations can significantly improve titers, but most systems rely on transcriptional control of metabolic enzymes, leaving existing intracellular enzymes unregulated. This limits the ability of transcriptional controls to switch off metabolic pathways, especially when metabolic enzymes have long half-lives. We developed a two-layer transcriptional/post-translational control system for yeast fermentations. Specifically, the system uses blue light to transcriptionally activate the major pyruvate decarboxylase PDC1, which is required for cell growth and concomitant ethanol production. Switching to darkness transcriptionally inactivates PDC1 and instead activates the anti-Pdc1p nanobody NbJRI, which acts as a genetically encoded inhibitor of the Pdc1p accumulated during the growth phase. This dual transcriptional/post-translational control improves the production of 2,3-BDO and citramalate by up to 100% and 92%, respectively, compared with transcriptional controls alone in dynamic two-phase fermentations. This study establishes the NbJRI nanobody as an effective genetically encoded inhibitor of Pdc1p that can enhance the production of pyruvate-derived chemicals.
keywords:
metabolic engineering
published:
2017-09-28
Price, Edward P. F.; Spyreas, Greg; Matthews, Jeffrey
(2017)
This is the dataset used in the Journal of Ecology publication of the same name. It is a site-by-species matrix of species relative abundances.
The file BH.veg.data.csv contains a site-by-species matrix of species relative abundance (percent cover across all sampling quadrats within a site). The Year column refers to sampling periods: Year 1 is the first set of samples, taken between 1997 and 2000; Year 2 is the second set, taken between 2002 and 2005; Year 3 is the third set, taken between 2007 and 2010; and Year 4 is the fourth set, taken between 2012 and 2015. All sites met the Critical Trends Assessment Program (CTAP) size criteria of being at least 2 ha in size with a minimum of 500 m² of suitable sampling area.
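The keywords mention a similarity index, but the entry does not say which one the paper used. As an illustrative assumption only, the sketch below computes Bray–Curtis similarity between two rows of a site-by-species percent-cover matrix such as BH.veg.data.csv. It also records the Year codes listed above. The species codes and function name are hypothetical placeholders.

```python
# "Year" codes in BH.veg.data.csv mapped to their sampling windows,
# as listed in the description above.
SAMPLING_PERIODS = {1: (1997, 2000), 2: (2002, 2005), 3: (2007, 2010), 4: (2012, 2015)}


def bray_curtis_similarity(site_a, site_b):
    """Bray-Curtis similarity (1 - dissimilarity) between two sites given as
    {species: percent cover}; a species absent from a site counts as 0.
    Two empty sites are treated as identical (similarity 1.0)."""
    species = set(site_a) | set(site_b)
    shared = sum(min(site_a.get(s, 0.0), site_b.get(s, 0.0)) for s in species)
    total = sum(site_a.values()) + sum(site_b.values())
    return 2.0 * shared / total if total else 1.0


if __name__ == "__main__":
    site_y1 = {"sp1": 50.0, "sp2": 50.0}  # hypothetical species codes
    site_y4 = {"sp1": 50.0, "sp3": 50.0}
    print(SAMPLING_PERIODS[1], SAMPLING_PERIODS[4], bray_curtis_similarity(site_y1, site_y4))
    # → (1997, 2000) (2012, 2015) 0.5
```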
The file BH.site.location.csv contains the Public Land Survey System ranges and townships in which the sites are located. All sites are in the U.S. state of Illinois.
More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for the data on the wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage.
Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
keywords:
biodiversity; biotic homogenization; invasive species; Phalaris arundinacea; plant population and community dynamics; similarity index; wetlands
published:
2020-07-01
Rykhlevskii, Andrei; Huff, Kathryn D.
(2020)
keywords:
molten salt; fuel cycle; reprocessing; refueling
published:
2025-11-03
Blanc-Betes, Elena; Gomez-Casanovas, Nuria; Hartman, Melannie D.; Hudiburg, Tara W.; Khanna, Madhu; Parton, William; DeLucia, Evan H.
(2025)
Bioenergy with carbon capture and storage (BECCS) sits at the nexus of climate and energy security. We evaluated trade-offs between scenarios that support climate stabilization (negative emissions and net climate benefit) or energy security (ethanol production). Our spatially explicit model indicates that the foregone climate benefit from abandoned cropland (opportunity cost) increased carbon emissions per unit of energy produced by 14–36%, making geologic carbon capture and storage necessary to achieve negative emissions from any given energy crop. The toll of opportunity costs on the climate benefit of BECCS from set-aside land was offset through the spatial allocation of crops based on their individual biophysical constraints. Dedicated energy crops consistently outperformed mixed grasslands. We estimate that BECCS allocation to land enrolled in the Conservation Reserve Program (CRP) could capture up to 9 Tg C year⁻¹ from the atmosphere, deliver up to 16 Tg CE year⁻¹ in emissions savings, and meet up to 10% of US statutory energy targets, but contributions varied substantially as the priority shifted from climate stabilization to energy provision. Our results indicate a significant potential to integrate energy security targets into sustainable pathways to climate stabilization but underscore the trade-offs between divergent policy-driven agendas.
keywords:
Sustainability;Field Data;Modeling
published:
2019-01-27
Le, Thien; Sy, Aaron; Molloy, Erin K.; Zhang, Qiuyi; Rao, Satish; Warnow, Tandy
(2019)
This repository includes the datasets studied with INC/INC-ML/INC-NJ in the paper "Using INC within Divide-and-Conquer Phylogeny Estimation", which was submitted to AICoB 2019. Each dataset has its own readme.txt that further describes the creation process and the other parameters and software used in making it. The latest implementation of INC/INC-ML/INC-NJ can be found at https://github.com/steven-le-thien/constraint_inc. Note: the datasets may contain .DS_Store files; please ignore them.
keywords:
phylogenetics; gene tree estimation; divide-and-conquer; absolute fast converging
published:
2021-04-16
Xia, Yushu; Wander, Michelle; Kwon, Hoyoung
(2021)
This dataset includes five files developed using the procedures described in the article 'Developing County-level Data of Nitrogen Fertilizer and Manure Inputs for Corn Production in the United States' and Supplemental Information published in the Journal of Cleaner Production in 2021.
Citation: Xia, Yushu, Hoyoung Kwon, and Michelle Wander. "Developing county-level data of nitrogen fertilizer and manure inputs for corn production in the United States." Journal of Cleaner Production 309 (2021): e126957.
Brief method: The fertilizer and manure inputs for corn were generated with a top-down approach by assigning county-level total N inputs reported by USGS to different crops using state- and county-level survey data. The corn N needs were estimated using empirical extension-based equations coupled with soil and environmental covariates. The estimates of fertilizer N inputs were further refined for corn grain and silage production at the county level and gap-filling (using state-level averages) was carried out to generate final files for U.S. county-level N inputs.
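The two generic steps named in this method, top-down proportional allocation of a county total and gap-filling with a state average, can be sketched in Python. This is a simplified illustration, not the authors' procedure. The per-crop weights, function names, and data layout are hypothetical. The extension-based N-need equations and the USGS and survey inputs used in the paper are not reproduced here.

```python
def allocate_county_n(county_total_n, crop_weights):
    """Top-down step: split one county's total N input among crops in
    proportion to hypothetical per-crop weights (e.g. a survey-based share
    of N use). Returns zeros if all weights are zero."""
    total_weight = sum(crop_weights.values())
    if total_weight == 0:
        return {crop: 0.0 for crop in crop_weights}
    return {crop: county_total_n * w / total_weight for crop, w in crop_weights.items()}


def gap_fill(county_values, county_to_state, state_averages):
    """Replace missing (None) county estimates with the state-level average."""
    return {
        county: value if value is not None else state_averages[county_to_state[county]]
        for county, value in county_values.items()
    }


if __name__ == "__main__":
    print(allocate_county_n(1000.0, {"corn": 3.0, "soybean": 1.0}))
    # → {'corn': 750.0, 'soybean': 250.0}
```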
The dataset is provided in an alternative format in Google Earth Engine: https://code.earthengine.google.com/13a0078e7ee727bc001e045ad0e8c6fc
keywords:
Corn; Nitrogen Fertilizer; Manure; Conterminous U.S.
published:
2022-02-10
Sharma, Bijay P.; Zhang, Na; Lee, DoKyoung; Heaton, Emily; DeLucia, Evan H.; Sacks, Erik J.; Kantola, Ilsa B.; Boersma, Nicholas N.; Long, Stephen P.; Voigt, Thomas B.; Khanna, Madhu
(2022)
The compiled datasets include plot-level observations of energy crops (miscanthus and switchgrass) from recent experimental field trials in the US, including dry biomass yield, location, state, region, harvest year, growing season degree days (GDD), winter season heating degree days (HDD), growing season cumulative precipitation, annual nitrogen application rate, age of the plant when harvested, National Commodity Crop Productivity Index (NCCPI) values, and cultivar type (switchgrass), from various published and unpublished sources.
The Stata code includes estimation procedures for four different specifications: Model A includes deterministic effects without interaction terms; Model B includes deterministic effects with interaction terms (N², age², N × age, GDD², precip², N × NCCPI); Model C includes deterministic effects with interaction terms, plus study and location random effects; Model D includes deterministic effects with interaction terms, plus harvest-year-augmented study and location random effects.
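Written out as equations, the four specifications might look like the following. This is an illustrative reading, not the authors' Stata code. It assumes that each compiled covariate enters Model A linearly: nitrogen rate $N$, plant age $a$, GDD $G$, HDD $H$, precipitation $P$, and NCCPI $C$. It omits cultivar type. It also interprets the "harvest-year-augmented study" effect in Model D as a random effect for each study and harvest-year combination. Here $i$ indexes plots and $t$ harvest years, and $u$ and $v$ denote study and location random effects.

```latex
\begin{aligned}
\text{A:}\quad Y_{it} &= \beta_0 + \beta_N N_{it} + \beta_a a_{it} + \beta_G G_{it} + \beta_H H_{it} + \beta_P P_{it} + \beta_C C_i + \varepsilon_{it} \\
\text{B:}\quad Y_{it} &= (\text{A fixed terms}) + \gamma_1 N_{it}^2 + \gamma_2 a_{it}^2 + \gamma_3 N_{it} a_{it} + \gamma_4 G_{it}^2 + \gamma_5 P_{it}^2 + \gamma_6 N_{it} C_i + \varepsilon_{it} \\
\text{C:}\quad Y_{it} &= (\text{B fixed terms}) + u_{s(i)} + v_{\ell(i)} + \varepsilon_{it} \\
\text{D:}\quad Y_{it} &= (\text{B fixed terms}) + u_{s(i),t} + v_{\ell(i)} + \varepsilon_{it}
\end{aligned}
```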
keywords:
Age; Miscanthus; Nitrogen; Switchgrass; Yield; Center for Advanced Bioenergy and Bioproducts Innovation
published:
2025-10-09
Namoi, Nictor; Jang, Chunhwa; Voigt, Thomas; Lee, DoKyoung
(2025)
Aging-related yield decline in Miscanthus × giganteus (miscanthus) remains a major constraint to sustainable biomass production. This study evaluated how nitrogen (N) management and soil fertility influence yield-component traits and productivity in aging miscanthus. Trials were conducted at two sites established in 2008 at the University of Illinois Energy Farm, Urbana, IL. (i) The Sun Grant trial received 0, 60, and 120 kg N ha−1 annually until 2015. Starting in 2021, half of each plot received 60 or 120 kg N ha−1, resulting in six legacy-contemporary treatments: 0N–0N, 0N–120N, 60N–0N, 60N–60N, 120N–0N, and 120N–120N. (ii) The Energy Farm trial remained unfertilized until 2014, when one half of each plot received 56 kg N ha−1, forming two treatments: 0N–0N and 0N–56N. Sun Grant trial results showed that N fertilization increased tiller density (tillers m−2) and tiller weight (g tiller−1) in juvenile to early-mature miscanthus (2011–2015). After N withdrawal, both traits declined (by 20 % and 40 %, respectively), though legacy effects persisted in tiller weight in the aging stands (2020–2023). Contemporary N had little effect on tiller density but increased tiller weight by 34 %–77 %, resulting in 23 %–106 % higher machine-harvested biomass yield in 0N–120N, 60N–60N, and 120N–120N plots. At the Energy Farm trial, 0N–56N plots yielded 59 %–108 % more biomass than 0N–0N plots. Soil total N increased (Sun Grant: 47 % by 2020; Energy Farm: 58 % by 2023), while Mehlich-3 P (42 %–44 %) and K (21 %–46 %) declined. These findings identify tiller weight as a key determinant of biomass yield in aging miscanthus and highlight the need for P and K management for long-term productivity.
keywords:
miscanthus; nitrogen; soil
published:
2018-08-01
Clark, Lindsay V.; Lipka, Alexander E.; Sacks, Erik J.
(2018)
This set of scripts accompanies the manuscript describing the R package polyRAD, which uses DNA sequence read depth to estimate allele dosage in diploids and polyploids. Using several high-confidence SNP datasets from various species, allelic read depth from a typical RAD-seq dataset was simulated, then genotypes were estimated with polyRAD and other software and compared to the true genotypes, yielding error estimates.
keywords:
R programming language; genotyping-by-sequencing (GBS); restriction site-associated DNA sequencing (RAD-seq); polyploidy; single nucleotide polymorphism (SNP); Bayesian genotype calling; simulation
published:
2025-02-23
Bondarenko, Nikita; Podladchikov, Yury; Williams-Stroud, Sherilyn; Makhnenko, Roman
(2025)
Dataset with numerical routines and laboratory testing data associated with the manuscript: Bondarenko, N., Podladchikov, Y., Williams‐Stroud, S., & Makhnenko, R. (2025). Stratigraphy‐induced localization of microseismicity during CO2 injection in Illinois Basin. Journal of Geophysical Research: Solid Earth, 130, e2024JB029526. https://doi.org/10.1029/2024JB029526
keywords:
Illinois Basin Decatur Project; Induced Seismicity; GPU; Numerical modeling
published:
2016-07-22
Clark, Lindsay V.; Dzyubenko, Elena; Dzyubenko, Nikolay; Bagmet, Larisa; Sabitov, Andrey; Chebukin, Pavel; Johnson, Douglas A.; Kjeldsen, Jens Bonderup; Petersen, Karen Koefoed; Jørgensen, Uffe; Yoo, Ji Hye; Heo, Kweon; Yu, Chang Yeon; Zhao, Hua; Jin, Xiaoli; Peng, Junhua; Yamada, Toshihiko; Sacks, Erik J.
(2016)
Datasets and R scripts relating to the manuscript "Ecological characteristics and in situ genetic associations for yield-component traits of wild Miscanthus from eastern Russia" published in Annals of Botany, 10.1093/aob/mcw137. Field data, including collection locations, physical and ecological information for each location, and plant phenotypes relating to biomass are included. Genetic data in this repository include single nucleotide polymorphisms (SNPs) derived from restriction site-associated DNA sequencing (RAD-seq), as well as plastid microsatellites. A file is also included listing the DNA sequences of all RAD-seq markers generated to-date by the Sacks lab, including those from this publication.
keywords:
Miscanthus sacchariflorus; Miscanthus sinensis; Russia; germplasm; RAD-seq; SNP
published:
2022-09-19
Data characterize zooplankton in Shelbyville Reservoir, Illinois, United States of America. Zooplankton were sampled with a conical zooplankton net (0.5 m diameter mouth) when water was deeper than 2 m and by grab sample when water was shallower. Zooplankton samples were concentrated and subsampled with a Hensen-Stempel pipette following protocols described in Detmer et al. (2019). Zooplankton were identified to the lowest feasible taxonomic unit according to Pennak (1989) and Thorp and Covich (2001) and were enumerated in a 1 mL Sedgewick-Rafter cell. Subsamples were analyzed until at least 200 individuals, counted across the three main taxonomic groups (cladocerans, copepods, and rotifers), had been enumerated from each site. Given the variation in zooplankton concentrations among sites, this process often led to far more than 200 individuals being counted (x̄ = 269, min = 200, max = 487). A summary of the sample size from each site can be found in Supplementary Table S2. Abundances were corrected for the volume of water filtered. For rare taxa (< 20 individuals per sample), all individuals were measured for length. For abundant taxa, length measurements were collected on the first 20 organisms of each taxon encountered in a subsample. Dry mass was calculated from equations for microcrustaceans, rotifers, and Chaoborus sp. (Rosen, 1981; Bottrell et al., 1976; Dumont and Balvay, 1979).
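The counting, volume-correction, measurement-cap, and dry-mass steps described above can be sketched in Python. The sketch rests on these assumptions:

- The net is towed vertically and filters a cylinder of water (π r² × depth) at 100% efficiency.
- The subsample count is scaled by a simple volume ratio.
- The cited length–mass equations are represented by the common power-law form W = a·L^b with placeholder coefficients.

None of these details are stated in the entry, and all function names are hypothetical.

```python
import math


def net_volume_liters(tow_depth_m, mouth_diameter_m=0.5):
    """Volume filtered by a vertical tow of a conical net, assuming the net
    filters a cylinder of water (pi * r^2 * depth) at 100% efficiency.
    Converts m^3 to liters."""
    radius = mouth_diameter_m / 2.0
    return math.pi * radius ** 2 * tow_depth_m * 1000.0


def density_per_liter(count, subsample_ml, concentrate_ml, volume_filtered_l):
    """Scale a subsample count to the whole concentrated sample, then
    divide by the volume of reservoir water that produced it."""
    total_in_sample = count * concentrate_ml / subsample_ml
    return total_in_sample / volume_filtered_l


def individuals_to_measure(n_in_subsample, cap=20):
    """Rare taxa (< 20 individuals) are measured in full; for abundant taxa
    only the first 20 encountered are measured."""
    return min(n_in_subsample, cap)


def dry_mass(length, a, b):
    """Assumed power-law length-mass relation W = a * L**b. The taxon-specific
    a and b (and the units) must come from the cited references."""
    return a * length ** b


if __name__ == "__main__":
    print(round(net_volume_liters(tow_depth_m=3.0), 1))  # → 589.0
```

Grab samples from water shallower than 2 m would use the grab volume directly as `volume_filtered_l` instead of `net_volume_liters`.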
keywords:
Reservoir; Zooplankton