Illinois Data Bank Dataset Search Results
Results
published:
2018-08-01
Clark, Lindsay V.; Lipka, Alexander E.; Sacks, Erik J.
(2018)
This set of scripts accompanies the manuscript describing the R package polyRAD, which uses DNA sequence read depth to estimate allele dosage in diploids and polyploids. Using several high-confidence SNP datasets from various species, allelic read depth from a typical RAD-seq dataset was simulated, then genotypes were estimated with polyRAD and other software and compared to the true genotypes, yielding error estimates.
keywords:
R programming language; genotyping-by-sequencing (GBS); restriction site-associated DNA sequencing (RAD-seq); polyploidy; single nucleotide polymorphism (SNP); Bayesian genotype calling; simulation
published:
2025-02-23
Bondarenko, Nikita; Podladchikov, Yury; Williams-Stroud, Sherilyn; Makhnenko, Roman
(2025)
Dataset with numerical routines and laboratory testing data associated with the manuscript: Bondarenko, N., Podladchikov, Y., Williams‐Stroud, S., & Makhnenko, R. (2025). Stratigraphy‐induced localization of microseismicity during CO2 injection in Illinois Basin. Journal of Geophysical Research: Solid Earth, 130, e2024JB029526. https://doi.org/10.1029/2024JB029526
keywords:
Illinois Basin Decatur Project; Induced Seismicity; GPU; Numerical modeling
published:
2025-01-17
Suski, Cory; Dennis, Clark
(2025)
This is the data set for a publication titled, "Coupling carbon dioxide gas within a bubble curtain enhances its effectiveness to deter fish." The current study sought to quantify whether adding carbon dioxide gas (CO2) to a bubble curtain would enhance its efficacy to block fish. For this, a choice tank was outfitted with bubble curtains infused with either compressed air alone, or with two different concentrations of CO2 [30 or 100 mg/L]. Passage rates and position of common carp (an invasive Cyprinid) and black bullhead (a native Ictalurid) exposed to these treatments were compared. The data set consists of data from each of the experiments performed during the study.
keywords:
invasive species; multimodal barriers; deterrents; biodiversity; species range; distribution
published:
2024-12-05
Meacham-Hensold, Katherine; Ort, Donald
(2024)
Data consists of RNA expression, tuber mass, photosynthetic capacity and diurnal CO2 assimilation calculations, potato tuber nutrient content, photorespiratory metabolite analysis and meteorological data to support the increase in yield and thermotolerance observed in potato plants with an introduced photorespiratory bypass. Data was collected between 2019 and 2024 at University of Illinois at Urbana-Champaign, IL, USA.
keywords:
Photorespiratory bypass; photosynthesis; photorespiration; food security; potato
published:
2019-05-16
Molloy, Erin K.; Warnow, Tandy
(2019)
This repository includes scripts and datasets for the paper, "Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge." All data files in this repository are for analyses using the logdet distance matrix computed on the concatenated alignment. Data files for analyses using the average gene-tree internode distance matrix can be downloaded from the Illinois Data Bank (https://doi.org/10.13012/B2IDB-1424746_V1). The latest version of NJMerge can be downloaded from Github (https://github.com/ekmolloy/njmerge).
<strong>List of Changes:</strong>
• Updated timings for NJMerge pipelines to include the time required to estimate distance matrices; this impacted files in the following folder: <strong>data.zip</strong>
• Replaced "Robinson-Foulds" distance with "Symmetric Difference"; this impacted files in the following folders: <strong> tools.zip; data.zip; scripts.zip</strong>
• Added some additional information about the java command used to run ASTRAL-III; this impacted files in the following folders: <strong>data.zip; astral64-trees.tar.gz (new)</strong>
keywords:
divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published:
2016-07-22
Clark, Lindsay V.; Dzyubenko, Elena; Dzyubenko, Nikolay; Bagmet, Larisa; Sabitov, Andrey; Chebukin, Pavel; Johnson, Douglas A.; Kjeldsen, Jens Bonderup; Petersen, Karen Koefoed; Jørgensen, Uffe; Yoo, Ji Hye; Heo, Kweon; Yu, Chang Yeon; Zhao, Hua; Jin, Xiaoli; Peng, Junhua; Yamada, Toshihiko; Sacks, Erik J.
(2016)
Datasets and R scripts relating to the manuscript "Ecological characteristics and in situ genetic associations for yield-component traits of wild Miscanthus from eastern Russia" published in Annals of Botany, 10.1093/aob/mcw137. Field data, including collection locations, physical and ecological information for each location, and plant phenotypes relating to biomass are included. Genetic data in this repository include single nucleotide polymorphisms (SNPs) derived from restriction site-associated DNA sequencing (RAD-seq), as well as plastid microsatellites. A file is also included listing the DNA sequences of all RAD-seq markers generated to-date by the Sacks lab, including those from this publication.
keywords:
Miscanthus sacchariflorus; Miscanthus sinensis; Russia; germplasm; RAD-seq; SNP
published:
2023-09-21
Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi
(2023)
The relationship between physical activity and mental health, especially depression, is one of the most studied topics in the field of exercise science and kinesiology. Although there is strong consensus that regular physical activity improves mental health and reduces depressive symptoms, some debate the mechanisms involved in this relationship as well as the limitations and definitions used in such studies. Meta-analyses and systematic reviews continue to examine the strength of the association between physical activity and depressive symptoms for the purpose of improving exercise prescription as treatment or combined treatment for depression. This dataset covers 27 review articles (either systematic review, meta-analysis, or both) and 365 primary study articles addressing the relationship between physical activity and depressive symptoms. Primary study articles are manually extracted from the review articles. We used a custom-made workflow (Fu, Yuanxi. (2022). Scopus author info tool (1.0.1) [Python]. <a href="https://github.com/infoqualitylab/Scopus_author_info_collection">https://github.com/infoqualitylab/Scopus_author_info_collection</a>) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 392 reports. The author information file (author_list.csv) is the product of this workflow and can be used to compute the co-author network of the 392 articles.
This dataset can be used to construct the inclusion network and the co-author network of the 27 review articles and 365 primary study articles. A primary study article is "included" in a review article if it is considered in the review article's evidence synthesis. Each included primary study article is cited in the review article, but not all references cited in a review article are included in the evidence synthesis or primary study articles. The inclusion network is a bipartite network with two types of nodes: one type represents review articles, and the other represents primary study articles. In an inclusion network, if a review article includes a primary study article, there is a directed edge from the review article node to the primary study article node. The attribute file (article_list.csv) includes attributes of the 392 articles, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network.
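The inclusion network described above could be assembled from an edge list roughly as follows. This is a minimal sketch, not the authors' workflow: the column names `review_id` and `primary_id` and the sample rows are hypothetical, and the actual header of inclusion_net_edges.csv should be checked against the dataset's README.txt.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for inclusion_net_edges.csv;
# the real file's column names and identifiers may differ.
SAMPLE_EDGES = """review_id,primary_id
R1,P1
R1,P2
R2,P2
R2,P3
"""


def load_inclusion_network(handle, source_col="review_id", target_col="primary_id"):
    """Return a mapping {review article: set of included primary study articles}."""
    network = defaultdict(set)
    for row in csv.DictReader(handle):
        # Directed edge: review article node -> primary study article node.
        network[row[source_col]].add(row[target_col])
    return dict(network)


def inclusion_counts(network):
    """Count how many review articles include each primary study article."""
    counts = defaultdict(int)
    for included in network.values():
        for primary in included:
            counts[primary] += 1
    return dict(counts)


network = load_inclusion_network(io.StringIO(SAMPLE_EDGES))
counts = inclusion_counts(network)
```

With the real file, the `io.StringIO` object would be replaced by an open file handle, and the two column arguments adjusted to the actual header.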
Collectively, this dataset reflects the evidence production and use patterns within the exercise science and kinesiology scientific community, investigating the relationship between physical activity and depressive symptoms.
FILE FORMATS
1. article_list.csv - Unicode CSV
2. author_list.csv - Unicode CSV
3. Chinese_author_name_reference.csv - Unicode CSV
4. inclusion_net_edges.csv - Unicode CSV
5. review_article_details.csv - Unicode CSV
6. supplementary_reference_list.pdf - PDF
7. README.txt - text file
8. systematic_review_inclusion_criteria.csv - Unicode CSV
<b>UPDATES IN THIS VERSION COMPARED TO V3</b> (Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi (2023): The Inclusion Network of 27 Review Articles Published between 2013-2018 Investigating the Relationship Between Physical Activity and Depressive Symptoms. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4614455_V3)
- We added a new file systematic_review_inclusion_criteria.csv.
keywords:
systematic reviews; meta-analyses; evidence synthesis; network visualization; tertiary studies; physical activity; depressive symptoms; exercise; review articles
published:
2025-01-27
Zinnen, Jack; Chase, Marissa; Charles, Brian; Meissen, Justin; Matthews, Jeffrey
(2025)
This is the core data for RELIX, a dataset of vascular plant species presence for 353 prairie remnants in the Midwestern United States and associated dataset of prairie remnant metadata. The primary data file contains a list of the vascular plant species observed in the prairie remnants, as well as a metadata table with more information about the prairie remnant in question and the species list itself. The data was compiled from a variety of written sources, private and published, chronicling observations made between the mid-twentieth century and 2021. It also contains a supplementary data table of vascular plant species observed in at least 8 of the prairie remnants in RELIX, as well as a list of acknowledgements for the associated manuscript.
keywords:
prairie peninsula; prairie relict; prairie soil; species inventories; tallgrass prairie
published:
2024-11-15
Blanke, Steven; Ringling, Megan; Tan, Ivilyn; Oh, Seung
(2024)
This page contains the data for the manuscript "Vacuolating cytotoxin A interactions with the host cell surface". This manuscript is currently in prep.
keywords:
Steven R Blanke; Vacuolating cytotoxin A; VacA; Helicobacter pylori; protein binding; sphingomyelin; cell surface
published:
2019-09-01
Jackson, Nicole; Konar, Megan; Debaere, Peter; Estes, Lyndon
(2019)
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014.
The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
keywords:
global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
published:
2024-11-14
Matthews, Jeffrey W.; Huang, Annie H.
(2024)
These data represent the raw data from the paper “The invasion of Japanese hop (Humulus japonicus) in a restored floodplain forest” published in Invasive Plant Science and Management by Annie H. Huang and Jeffrey W. Matthews.
keywords:
invasive plants; restored wetlands
published:
2025-06-30
Li, Shengyun; Liao, Ling-Hsiu; Wu, Wen-Yen; Berenbaum, May
(2025)
This dataset is associated with the manuscript "Residual tau-fluvalinate, a beehive acaricide, disrupts growth and metabolism in the greater wax moth, Galleria mellonella"
This dataset includes 2 Excel files:
1) raw_data_bioassay.xlsx: this file contains the raw data for waxworm bioassay. There are 2 worksheets within this file:
- LC50: raw data for measuring the LC50 of Galleria mellonella (greater wax moth) in laboratory and field strains exposed to tau-fluvalinate.
- RGR: Relative Growth Rate, raw data for measuring body weight of field strain of Galleria mellonella exposed to tau-fluvalinate.
2) raw-data_RT-qPCR.xlsx: this file contains raw data (Ct value) of RT-qPCR.
keywords:
Apis mellifera; cytochrome P450; tau-fluvalinate; detoxification genes; waxworm
published:
2024-07-11
Schneider, Amy; Suski, Cory
(2024)
published:
2024-08-12
Hartman, Jordan H; Davis, Mark A; Iacaruso, Nicholas J; Tiemann, Jeremy S; Larson, Eric R
(2024)
Data associated with the manuscript "Stable isotopes and diet metabarcoding reveal trophic overlap between native and invasive Banded Killifish (Fundulus diaphanus) subspecies." by Jordan H. Hartman, Mark A. Davis, Nicholas J. Iacaruso, Jeremy S. Tiemann, Eric R. Larson. For this project, we sampled six locations in Michigan and Illinois for Eastern and Western Banded Killifish and primary consumers. Using stable isotope analysis we found that Eastern Banded Killifish had higher variance in littoral dependence and trophic position than Western Banded Killifish, but both stable isotope and gut content metabarcoding analyses revealed an overlap in the diet composition and trophic position between the subspecies. This dataset provides the sampling locations, accession numbers for gut content metabarcoding data from the National Center for Biotechnology Information Sequence Read Archive, the assignment of each family used in the gut content metabarcoding analysis as littoral, pelagic, terrestrial, or parasite, and the raw stable isotope data from University of California Davis.
keywords:
non-game fish; invasive species; imperiled species; stable isotope analysis; gut content metabarcoding
published:
2024-10-10
Zeiri, Offer; Hatzis, Katherine Marie; Gomez, Maurea; Cook, Emily A; Kincanon, Maegen; Murphy, Catherine
(2024)
keywords:
Gold nanorods; Surface enhanced Raman spectroscopy; SERS; Polyoxometalates
published:
2022-09-19
Data characterize zooplankton in Shelbyville Reservoir, Illinois, United States of America. Zooplankton were sampled with a conical zooplankton net (0.5 m diameter mouth) when water was deeper than 2 m and by grab sample when water was shallower. Zooplankton samples were concentrated and subsampled with a Hensen-Stempel pipette following protocols described in Detmer et al. (2019). Zooplankton were identified to the lowest feasible taxonomic unit according to Pennak (1989) and Thorp and Covich (2001) and were enumerated in a 1 mL Sedgewick-Rafter cell. Subsamples were analyzed until at least 200 individuals from each site were counted across the three main taxonomic groups (cladocerans, copepods, and rotifers). Given the variation in zooplankton concentrations at each site, this process often led to far more than 200 individuals being counted (x̄ = 269, min = 200, max = 487). A summary of the sample size from each site can be found in Supplementary Table S2. Abundances were corrected for volume of water filtered. For rare taxa (< 20 individuals per sample), all individuals were measured for length. For abundant taxa, length measurements were collected on the first 20 organisms of each abundant taxon encountered in a subsample. Dry mass was calculated from equations for microcrustaceans, rotifers, and Chaoborus sp. (Rosen, 1981; Botrell et al., 1976; Dumont and Balvay, 1979).
keywords:
Reservoir; Zooplankton
published:
2025-01-01
Smith, Rebecca; Hussain, Abrar
(2025)
Raw data from a survey of para-veterinary workers in Pakistan regarding knowledge, attitudes, and practices around ticks and tick-borne diseases. Between March and August 2023, we conducted a web-based survey among para-veterinarians recruited via email, text message, and face-to-face conversations.
keywords:
ticks; survey; tick-borne disease; para-veterinary workers
published:
2025-10-10
Cheng, Ming-Hsun; Dien, Bruce; Jin, Yong-Su; Thompson, Stephanie R.; Shin, Jonghyeok; Slininger, Patricia J.; Qureshi, Nasib; Singh, Vijay
(2025)
Glucose and xylose are the major sugars present in cellulosic hydrolysates. The cellulosic sugars can be used for the production of platform chemicals. In this study, the production of lipids and ethanol by yeasts was compared for concentrated bioenergy sorghum syrup. Bioenergy sorghum was hydrothermally pretreated at 50% w/w solids in a continuous industrial reactor and sequentially mechanically refined using a burr mill to improve biomass accessibility for hydrolysis. Fed-batch enzymatic hydrolysis was conducted with 50% w/v solids loading and cellulase cocktail (50 FPU/g biomass) to achieve 230 g/L sugar concentration. Various strains of Rhodosporidium toruloides were evaluated for converting sugars into lipids, and strain Y-6987 had the highest lipid titer (9.2 g/L). The lipid titer was improved to 19.0 g/L by implementing a two-stage culture scheme, where the first stage was optimized for yeast growth and the second for lipid production. For ethanol production, the engineered Saccharomyces cerevisiae SR8ΔADH6 was used to coferment glucose and xylose. Ethanol fermentation was optimized for media nutrients (YP, YNB/urea, and urea), cellulosic sugar concentration, and sulfite conditioning to maximize the ethanol concentration from sorghum syrups. Fermentation of 70% v/v concentrated hydrolysate conditioned with sulfite produced 50.1 g/L ethanol from 141 g/L of sugars.
keywords:
Conversion; Feedstock Bioprocessing
published:
2018-11-20
Corey, Ryan M.; Tsuda, Naoki; Singer, Andrew C.
(2018)
A dataset of acoustic impulse responses for microphones worn on the body. Microphones were placed at 80 positions on the body of a human subject and a plastic mannequin. The impulse responses can be used to study the acoustic effects of the body and can be convolved with sound sources to simulate wearable audio devices and microphone arrays. The dataset also includes measurements with different articles of clothing covering some of the microphones and with microphones placed on different hats and accessories. The measurements were performed from 24 angles of arrival in an acoustically treated laboratory.
Related Paper: Ryan M. Corey, Naoki Tsuda, and Andrew C. Singer. "Acoustic Impulse Responses for Wearable Audio Devices," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, May 2019.
All impulse responses are sampled at 48 kHz and truncated to 500 ms. The impulse response data is provided in WAVE audio and MATLAB data file formats. The microphone locations are provided in tab-separated-value files for each experiment and are also depicted graphically in the documentation.
The file wearable_mic_dataset_full.zip contains both WAVE- and MATLAB-format impulse responses.
The file wearable_mic_dataset_matlab.zip contains only MATLAB-format impulse responses.
The file wearable_mic_dataset_wave.zip contains only WAVE-format impulse responses.
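As an illustration of how an impulse response can be convolved with a sound source to simulate a wearable microphone, here is a minimal sketch in plain Python. The sample rate and 500 ms duration come from the description above; the toy sample values are hypothetical, and loading the actual WAVE or MATLAB files is left out.

```python
SAMPLE_RATE_HZ = 48_000  # stated in the dataset description
IR_DURATION_S = 0.5      # impulse responses are truncated to 500 ms
IR_LENGTH = int(SAMPLE_RATE_HZ * IR_DURATION_S)  # samples per impulse response


def convolve(source, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(impulse_response):
            # Each source sample excites a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


# Toy (hypothetical) data: a direct path plus one weaker reflection two samples later.
toy_ir = [1.0, 0.0, 0.5]
toy_source = [1.0, 2.0, 3.0]
simulated = convolve(toy_source, toy_ir)
```

The double loop is quadratic in signal length, so for full-length responses an FFT-based routine such as `scipy.signal.fftconvolve` would be a more practical choice.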
keywords:
Acoustic impulse responses; microphone arrays; wearables; hearing aids; audio source separation
published:
2022-03-25
Shen, Chengze; Park, Minhyuk; Warnow, Tandy
(2022)
This upload includes the 16S.B.ALL in 100-HF condition (referred to as 16S.B.ALL-100-HF) used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). The 100-HF condition refers to making sequences fragmentary with an average length of 100 bp and a standard deviation of 60 bp. Additionally, we required all fragmentary sequences to have lengths > 50 bp. Thus, the final average length of the fragments is slightly higher than 100 bp (~120 bp).
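The shift from 100 bp to about 120 bp can be reproduced with a short calculation. The sketch below assumes that fragment lengths are drawn from a normal distribution and that draws at or below 50 bp are simply redrawn; the description gives only the mean, standard deviation, and minimum, so both assumptions are illustrative rather than the authors' exact procedure.

```python
import math
import random

MEAN_BP = 100.0  # target average fragment length
SD_BP = 60.0     # target standard deviation
MIN_BP = 50.0    # fragments must be longer than this


def truncated_normal_mean(mu, sigma, lower):
    """Mean of a normal(mu, sigma) variable conditioned on exceeding `lower`."""
    alpha = (lower - mu) / sigma
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(alpha / math.sqrt(2.0))  # P(Z > alpha)
    return mu + sigma * pdf / upper_tail


def sample_fragment_length(rng):
    """Redraw until the length exceeds the minimum (rejection sampling)."""
    while True:
        length = rng.gauss(MEAN_BP, SD_BP)
        if length > MIN_BP:
            return length


expected_mean = truncated_normal_mean(MEAN_BP, SD_BP, MIN_BP)

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n_draws = 20_000
simulated_mean = sum(sample_fragment_length(rng) for _ in range(n_draws)) / n_draws
```

Under these assumptions the analytic mean comes out at roughly 121 bp, in line with the ~120 bp quoted above.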
In this case (i.e., 16S.B.ALL-100-HF), 1,000 sequences with lengths 25% around the median length are retained as "backbone sequences", while the remaining sequences are considered "query sequences" and made fragmentary using the "100-HF" procedure. Backbone sequences are aligned using MAGUS (or we extract their reference alignment). Then, the fragmentary versions of the query sequences are added back to the backbone alignment using either MAGUS+UPP or WITCH.
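The backbone/query split can be sketched as below. This is an illustration under stated assumptions: "25% around the median length" is read as within ±25% of the median, backbone sequences are chosen at random from the eligible set, and the function name, parameters, and toy lengths are all hypothetical.

```python
import random
import statistics


def split_backbone_query(lengths, n_backbone=1000, tolerance=0.25, seed=0):
    """Split sequence names into (backbone, query) lists.

    lengths: dict mapping sequence name -> full sequence length.
    Sequences within +/- `tolerance` of the median length are eligible
    for the backbone; everything not selected becomes a query sequence.
    """
    median = statistics.median(lengths.values())
    low, high = (1 - tolerance) * median, (1 + tolerance) * median
    eligible = sorted(name for name, n in lengths.items() if low <= n <= high)
    rng = random.Random(seed)
    backbone = set(rng.sample(eligible, min(n_backbone, len(eligible))))
    query = [name for name in lengths if name not in backbone]
    return sorted(backbone), query


# Toy (hypothetical) lengths; the median is 1000, so the eligible range is 750-1250.
toy_lengths = {"s1": 1000, "s2": 1100, "s3": 400, "s4": 950, "s5": 1600}
backbone, query = split_backbone_query(toy_lengths, n_backbone=2)
```

The query sequences returned here would then be made fragmentary with the 100-HF procedure before being added back to the backbone alignment.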
More details of the tar.gz file are described in README.txt.
keywords:
MAGUS; UPP; Multiple Sequence Alignment; eHMMs
published:
2022-07-25
This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4950847_V1) and collects entity mentions that were manually determined to be noisy, non-species entities.
keywords:
synthetic biology; NERC data; species mentions, noisy entities
published:
2022-07-25
This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4950847_V1) for species entities and represents those that were determined to be species (i.e., were not noisy entities) but for which no corresponding concept could be found in the NCBI taxonomy database.
keywords:
synthetic biology; NERC data; species mentions, not found entities
published:
2023-06-29
Pandit, Akshay; Karakoc, Deniz Berfin; Konar, Megan
(2023)
This database provides estimates of agricultural and food commodity flows [in both tons and $US] between the US and China for the year 2017. Pairwise information is provided between US states and Chinese provinces, and US counties and Chinese provinces for 7 Standardized Classification of Transported Goods (SCTG) commodity categories. Additionally, crosswalks are provided to match Harmonized System (HS) codes and China's Multi-Regional Input Output (MRIO) commodity sectors to their corresponding SCTG commodity codes. The included SCTG commodities are:
- SCTG 01: live animals and fish
- SCTG 02: cereal grains
- SCTG 03: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 04: animal feed, eggs, honey, and other products of animal origin
- SCTG 05: meat, poultry, fish, seafood, and their preparations
- SCTG 06: milled grain products and preparations, and bakery products
- SCTG 07: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Pandit et al. (2022) in Environmental Research Letters.
keywords:
Food flows; High-resolution; County-scale; Bilateral; United States; China
published:
2024-11-07
Fernandez-Materan, Francelys; Olivos-Caicedo, Kelly; Daniel, Steven; Walden, Kimberly; Fields, Christopher; Hernandez, Alvaro; Alves, Joao; Ridlon, Jason
(2024)
This dataset is part of a genome announcement. The main folder PROKKA_results contains nine Prokka v.1.14.6 annotation files from nine Clostridium scindens genome sequences. Each file provides 12 output files, including predicted protein sequences (.faa), nucleotide sequences of the predicted coding regions (.ffn), the nucleotide sequence of the genome (.fna and .fsa), the annotated genome in GenBank format (.gbk), a record of the steps performed during the annotation process (.log), error messages or warnings (.err), annotations in Sequin format (.sqn), and summaries of the annotations in tabular (.tbl), tab-separated values (.tsv), and plain text (.txt) formats.
keywords:
Clostridium scindens; genome annotation; PROKKA
published:
2024-11-07
Zheng, Heng; Fu, Yuanxi; Vandel, Ellie; Schneider, Jodi
(2024)
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as citations for Willoughby et al., 2014:
Patrick H. Willoughby, Matthew J. Jansma, and Thomas R. Hoye (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042
We added the DOIs of the citing publications into a Zotero collection. Then we exported all 286 DOIs in two formats: a .csv file (data export) and an .rtf file (bibliography).
<b>Willoughby2014_286citing_publications.csv</b> is a Zotero data export of the citing publications.
<b>Willoughby2014_286citing_publications.rtf</b> is a bibliography of the citing publications, using a variation of the American Psychological Association style (7th edition) with full names instead of initials.
To create <b>Willoughby2014_citation_contexts.csv</b>, HZ manually extracted the paragraphs that contain a citation marker of Willoughby et al., 2014. We refer to these paragraphs as the citation contexts of Willoughby et al., 2014. Manual extraction started with 286 citing publications but excluded 2 publications that are not in English, those with DOIs 10.13220/j.cnki.jipr.2015.06.004 and 10.19540/j.cnki.cjcmm.20200604.201
The silver standard aimed to triage the citing publications of Willoughby et al., 2014 that are at risk of propagating unreliability due to a code glitch in a computational chemistry protocol introduced in Willoughby et al., 2014. The silver standard was created stepwise:
First one chemistry expert (YF) manually annotated the corpus of 284 citing publications in English, using their full text and citation contexts. She manually categorized publications as either at risk of propagating unreliability or not at risk of propagating unreliability, with a rationale justifying each category.
Then we selected a representative sample of citation contexts to be double annotated. To do this, MJS turned the full dataset of citation contexts (Willoughby2014_citation_contexts.csv) into word embeddings, clustered them by similarity using BERTopic's HDBSCAN, and selected representative citation contexts based on the centroids of the clusters.
Next the second chemistry expert (EV) annotated the 77 publications associated with the citation contexts, considering the full text as well as the citation contexts.
<b>double_annotated_subset_77_before_reconciliation.csv</b> provides EV and YF's annotation before reconciliation.
To create the silver standard, YF, EV, and JS discussed the differences and reconciled most of them. YF and EV had principled reasons for disagreeing on 9 publications; to handle these, YF updated the annotations to create the silver standard we use for evaluation in the remainder of our JCDL 2024 paper (<b>silver_standard.csv</b>).
<b>Inter_Annotator_Agreement.xlsx</b> indicates publications where the two annotators made opposite decisions and calculates the inter-annotator agreement before and after reconciliation together.
<b>double_annotated_subset_77_before_reconciliation.csv</b> provides EV and YF's annotation after reconciliation, including applying the reconciliation policy.
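The description does not say which inter-annotator agreement statistic is reported in Inter_Annotator_Agreement.xlsx. As one common choice for two annotators and categorical labels, the following sketch computes Cohen's kappa; the label strings and toy annotations are hypothetical and do not come from the dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: based on each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy (hypothetical) annotations for four publications.
annotator_1 = ["at risk", "at risk", "not at risk", "not at risk"]
annotator_2 = ["at risk", "not at risk", "not at risk", "not at risk"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Running the same function on the annotations before and after reconciliation would give two directly comparable values.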
keywords:
unreliable cited sources; knowledge maintenance; citations; scientific digital libraries; scholarly publications; reproducibility; unreliability propagation; citation contexts