Illinois Data Bank Dataset Search Results
Results
published:
2023-07-01
Tonks, Adam; Hwang, Jeongwoo
(2023)
This is the data used in the paper "Assessment of spatiotemporal flood risk due to compound precipitation extremes across the contiguous United States".
Code from the Github repository https://github.com/adtonks/precip_extremes can be used with the data here to reproduce the paper's results. v1.0.0 of the code is also archived at https://doi.org/10.5281/zenodo.8104252
This dataset is derived from NOAA-CIRES-DOE 20th Century Reanalysis V3. The NOAA-CIRES-DOE Twentieth Century Reanalysis Project version 3 used resources of the National Energy Research Scientific Computing Center managed by Lawrence Berkeley National Laboratory which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 and used resources of NOAA's Remotely Deployed High Performance Computing Systems.
keywords:
spatiotemporal; CONUS; United States; precipitation; extremes; flooding
published:
2025-04-17
Mollenhauer, Michael; Pfaff, Wolfgang
(2025)
This dataset includes analysis code used to analyze the data involved with swapping photons between superconducting qubits in separate modules through a superconducting coaxial cable bus. The dataset includes Python code to model and plot the data, CAD designs of the modules that hold the superconducting qubits, and high-frequency simulation software files to model the electric fields of the superconducting circuits.
keywords:
superconducting qubits; quantum information; modular architecture
published:
2025-05-27
Rani, Sonia; Cao, Xi; Baptista, Alejandro E.; Hoffmann, Axel; Pfaff, Wolfgang
(2025)
This dataset contains all raw and processed data used to generate the figures in the main text and supplementary material of the paper "High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit." The data can be used to reproduce the plots and validate the analysis. Accompanying Jupyter notebooks provide step-by-step analysis pipelines for figure generation. The dataset also includes drawings for the mechanical samples used to perform the experiment. In addition, the dataset provides ANSYS HFSS electromagnetic simulation files used to design and analyze the resonator structures and estimate field distributions.
keywords:
superconducting qubit; magnon sensing; hybrid quantum systems; spin-photon coupling; magnon decay; cavity QED
published:
2019-10-15
Choi, Sang Hyun; Rao, Vikyath; Gernat, Tim; Hamilton, Adam; Robinson, Gene; Goldenfeld, Nigel
(2019)
Filtered trophallaxis interactions for two honeybee colonies, each containing 800 worker bees and one queen. Each colony consists of bees that were administered a juvenile hormone analog, a vehicle treatment, or a sham treatment to determine the effect of colony perturbation on the duration of trophallaxis interactions. Columns one and two display the unique identifiers for each bee involved in a particular trophallaxis exchange, and columns three and four display the Unix timestamps (in milliseconds) of the beginning and end of the interaction, respectively.
Note: the queen interactions were omitted from the uploaded dataset for reasons that are described in the submitted manuscript. Those bees that performed poorly are also omitted from the final dataset.
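The four-column layout described above lends itself to a quick duration calculation. The following is a minimal sketch only: it assumes the file is comma-delimited with no header row, which the record does not state, and the sample rows are made up for illustration.

```python
import csv
import io


def interaction_durations(rows):
    """Yield (bee_a, bee_b, duration_seconds) for each interaction row.

    Each row is expected to hold two bee identifiers followed by the
    start and end Unix timestamps in milliseconds.
    """
    for bee_a, bee_b, start_ms, end_ms in rows:
        yield bee_a, bee_b, (int(end_ms) - int(start_ms)) / 1000.0


# Hypothetical sample data (not taken from the dataset).
sample = io.StringIO(
    "12,57,1500000000000,1500000004500\n"
    "57,301,1500000010000,1500000012000\n"
)
durations = list(interaction_durations(csv.reader(sample)))
for bee_a, bee_b, seconds in durations:
    print(f"{bee_a} <-> {bee_b}: {seconds:.1f} s")
```

If the actual file uses a different delimiter or includes a header, the `csv.reader` call would need to be adjusted accordingly.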
keywords:
honey bee; trophallaxis; social network
published:
2020-09-25
This repository contains the datasets and corresponding results for the paper "MAGUS: Multiple Sequence Alignment using Graph Clustering".
The Datasets.zip archive contains the ROSE, BAliBASE, Gutell, and RNASim datasets used in our experiments.
The Results.zip archive contains the outputs of running our methods against these datasets.
Datasets used:
ROSE: 10 simulated nucleotide model conditions from the SATe paper, each with 20 replicates, and with 1000 sequences per replicate.
The ROSE datasets were originally taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i
RNASim: This is a collection of simulated nucleotide datasets that were generated under a model of evolution that reflects selection due to RNA structural constraints. We sampled 20 subsets of 1000 sequences each, as well as 10 subsets of 10000 each, by randomly sampling from the original million-sequence RNASim dataset.
Gutell: 16S.M, 16S.3, 16S.T, 16S.B.ALL: Four biological nucleotide datasets from the Comparative Ribosomal Website (CRW) with cleaned reference alignments from SATe. Since PASTA is restricted to datasets without sequence length heterogeneity, these were modified to remove sequences that deviate by more than 20% from the median length. The scrubbed datasets range from 740 to 24,246 sequences. The pre-screened 16S datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/16s23s
BAliBASE: We use eight BAliBASE amino acid datasets used in the PASTA paper. As above, we remove outlier sequences, which leaves us with sizes ranging from 195 to 732 sequences. The pre-screened BAliBASE datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp
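The length screening described for the Gutell and BAliBASE datasets (dropping sequences that deviate by more than 20% from the median length) can be illustrated as follows. This is a sketch under stated assumptions, not the authors' script: the function name is hypothetical, lengths are taken from unaligned sequences, and a sequence exactly 20% away from the median is kept.

```python
from statistics import median


def filter_by_median_length(seqs, tolerance=0.20):
    """Keep sequences whose length is within `tolerance` of the median length.

    `seqs` maps sequence names to unaligned sequence strings.
    """
    med = median(len(s) for s in seqs.values())
    return {
        name: s
        for name, s in seqs.items()
        if abs(len(s) - med) <= tolerance * med
    }


# Hypothetical toy input: the median length is 100, so 80..120 is retained.
toy = {
    "a": "A" * 100,
    "b": "A" * 110,
    "c": "A" * 90,
    "d": "A" * 50,   # too short, removed
    "e": "A" * 130,  # too long, removed
}
kept = filter_by_median_length(toy)
print(sorted(kept))
```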
published:
2024-04-05
Sinaiko, Guy; Cao, Yanghui; Dietrich, Christopher H.
(2024)
The following files include specimen information, DNA sequence data, and additional information on the analyses used to reconstruct the phylogeny of the leafhopper genus Neoaliturus as described in the Methods section of the original paper:
1. Taxon_sampling.csv: contains data on the individual specimens from which DNA was extracted, including sample code, taxon name, collection data (locality, date and name of collector) and museum unique identifier.
2. Alignments.zip: a ZIP archive containing 432 separate FASTA files representing the aligned nucleotide sequences of individual gene loci used in the analysis.
3. Concatenated_Matrix.fa: is a FASTA file containing the concatenated individual gene alignments used for the maximum likelihood analysis in IQ-TREE.
4. Genes_and_Loci.rtf: identifies the individual genes and loci used in the analysis. The partition name is the same as the name of the individual alignment file in the zipped Alignments folder.
5. Partitions_best_scheme.nex: is a text file in the standard NEXUS format that indicates the names of the individual data partitions and their locations in the concatenated matrix, and also indicates the substitution model for each partition.
6. (New in this version 2) Scripts & Description.zip includes 8 custom shell or Perl scripts used to assemble the DNA sequence data by performing reciprocal BLAST searches between the reference sequences and assemblies for each sample, extracting the best sequences based on the BLAST searches, screening the hits for each locus and keeping only the best result, and generating the nucleotide sequence dataset for the predicted orthologues (see the file description.txt for details).
7. (New in this version 2) Full_genetic_distances_matrix.csv shows the genetic distances between pairs of samples in the dataset (proportion of nucleotides that differ between samples).
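The distance measure in item 7, the proportion of nucleotides that differ between two samples, can be sketched for a pair of aligned sequences as below. The record does not say how gaps or ambiguous bases were handled, so skipping positions that contain `-`, `?`, or `N` is an assumption made here, and the function name is hypothetical.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a character in `skip_chars` are ignored
    (an assumption; the dataset description does not specify this).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        return float("nan")  # no comparable sites
    return differing / compared


# Toy example: 2 of 8 sites differ.
print(p_distance("ACGTACGT", "ACGTTCGA"))
```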
keywords:
leafhopper; phylogeny; anchored-hybrid-enrichment; DNA sequence; insect
published:
2025-03-14
Mishra, Apratim; Diesner, Jana; Torvik, Vetle I.
(2025)
Hype - PubMed dataset
Prepared by Apratim Mishra
This dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection chosen is ‘journal articles’ written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their abstract location. Therefore, each article (PMID) might have multiple instances in the dataset due to the presence of multiple hype words in different abstract sentences.
The candidate hype words are 35 in count: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful'.
This is version 3 of the dataset. Added new file - WSD_hype.tsv
File 1: hype_dataset_final.tsv
Primary dataset. It has the following columns:
1. PMID: represents unique article ID in PubMed
2. Year: Year of publication
3. Hype_word: Candidate hype word, such as ‘novel.’
4. Sentence: Sentence in abstract containing the hype word.
5. Hype_percentile: Abstract relative position of hype word.
6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.
7. Introduction: The ‘I’ component of the hype word based on IMRaD
8. Methods: The ‘M’ component of the hype word based on IMRaD
9. Results: The ‘R’ component of the hype word based on IMRaD
10. Discussion: The ‘D’ component of the hype word based on IMRaD
File 2: hype_removed_phrases_final.tsv
Secondary dataset with same columns as File 1.
Hype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately. Removed phrases:
1. Major: histocompatibility, component, protein, metabolite, complex, surgery
2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid
3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment
4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values
5. Essential: medium, features, properties, opportunities, oil
6. Unique: model, amino
7. Robust: regression
8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information
9. Outstanding: questions, issues, question, questions, challenge, problems, problem, remains
10. Remarkable: properties
11. Definite: radiotherapy, surgery
File 3: WSD_hype.tsv
Includes hype-based disambiguation for candidate words targeted for WSD (Word sense disambiguation)
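Because one article (PMID) can appear in several rows, a common first step is to group rows by article. The sketch below assumes the TSV has a header row whose names match the column names listed for File 1 (`PMID`, `Hype_word`, and so on), which the record does not confirm; the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def hype_words_by_article(tsv_file):
    """Map each PMID to the list of candidate hype words found in its abstract."""
    by_pmid = defaultdict(list)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        by_pmid[row["PMID"]].append(row["Hype_word"])
    return dict(by_pmid)


# Hypothetical rows using a subset of the documented columns.
sample = io.StringIO(
    "PMID\tYear\tHype_word\tSentence\n"
    "111\t2001\tnovel\tWe report a novel finding.\n"
    "111\t2001\trobust\tThe effect was robust across cohorts.\n"
    "222\t2015\tpromising\tThis is a promising approach.\n"
)
grouped = hype_words_by_article(sample)
print(grouped)
```

To run this on the real file, one would pass an open file handle, for example `open("hype_dataset_final.tsv", newline="", encoding="utf-8")`, in place of the in-memory sample.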
keywords:
Hype; PubMed; Abstracts; Biomedicine
published:
2025-12-14
Fraterrigo, Jennifer; Chen, Weile
(2025)
This dataset contains information about absorptive roots from 170 plots along a latitudinal and temperature gradient in northern Alaska, including tussock sedges and deciduous alder, birch, and willow shrubs. This dataset accompanies the paper "Impacts of Arctic Shrubs on Root Traits and Belowground Nutrient Cycles Across a Northern Alaskan Climate Gradient," which was published in Frontiers in Plant Science.
Note: in the "patch coordinates" tab, the same coordinates/elevation ("Long", "Lat", and "Elev (m)") apply to all patches that share a number. For example, "Patch" W1, B1, and G1 share the same "Long", "Lat", and "Elev (m)" values as "Patch" A1.
keywords:
absorptive root traits; shrub expansion; Arctic; Alaskan tundra
published:
2020-04-20
Supplemental data sets for the Manuscript entitled "Contribution of fungal and invertebrate communities to mass loss and wood depolymerization in tropical terrestrial and aquatic habitats"
keywords:
Coiba Island; wood decomposition; cellulose; hemicellulose; lignin breakdown; aquatic fungi
published:
2020-06-19
This dataset includes data pulled from the World Bank (2009), the World Values Survey wave 6, and Transparency International (2009). The data were used to measure perceptions of expertise from individuals in nations that are recipients of development aid as measured by the World Bank.
keywords:
World Values Survey; World Bank; expertise; development
published:
2023-04-12
Towns, John; Hart, David
(2023)
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating to the start of the TeraGrid program in 2004 through the XSEDE operational period, which ended August 31, 2022. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
Because the XSEDE program has ended, the allocation_award_history file includes all allocations activity initiated via XSEDE processes through August 31, 2022. The Resource Providers and successor program to XSEDE agreed to honor all project allocations made during XSEDE. Thus, allocation awards that extend beyond the end of XSEDE may not reflect all activity that may ultimately be part of the project award. Similarly, allocation usage data only reflects usage reported through August 31, 2022, and may not reflect all activity that may ultimately be conducted by projects that were active beyond XSEDE.
keywords:
allocations; cyberinfrastructure; XSEDE
published:
2025-09-17
Kamara, Shasta; Glomb, Jackson; Suski, Cory
(2025)
Data were generated from juvenile paddlefish acclimated to one of three temperatures (13.0°C, 17.5°C, or 22.0°C) for two weeks. Fish were then subjected to one of two experiments: simulated angling, in which physiological parameters (stress hormones, lactate, glucose, ions, and oxygen transport parameters) were evaluated in plasma or whole blood, or critical thermal maxima tests. The data set includes physiological parameters, water quality temperatures, and morphometric data generated from these experiments and fish.
keywords:
Sport fish, critical thermal maximum, exercise, recovery, conservation, fisheries, management
published:
2018-07-28
Hoang, Linh; Schneider, Jodi
(2018)
This dataset presents a citation analysis and citation context analysis used in Linh Hoang, Frank Scannapieco, Linh Cao, Yingjun Guan, Yi-Yun Cheng, and Jodi Schneider. Evaluating an automatic data extraction tool based on the theory of diffusion of innovation. Under submission. We identified the papers that directly describe or evaluate RobotReviewer from the list of publications on the RobotReviewer website <http://www.robotreviewer.net/publications>, resulting in 6 papers grouped into 5 studies (we collapsed a conference and journal paper with the same title and authors into one study). We found 59 citing papers, combining results from Google Scholar on June 05, 2018 and from Scopus on June 23, 2018. We extracted the citation context around each citation to the RobotReviewer papers and categorized these quotes into emergent themes.
keywords:
RobotReviewer; citation analysis; citation context analysis
published:
2025-08-04
Hartman, Theodore; Studt, Jacob; VanLoocke, Andy; McDaniel, Marshall; Howe, Adina; Masters, Michael D.; Mitchell, Corey; DeLucia, Evan H.; Heaton, Emily
(2025)
This dataset contains the data used for the publication “Aboveground rather than belowground productivity drives variability in Miscanthus x giganteus net primary productivity”. It includes Miscanthus x giganteus biomass, carbon, and nitrogen tissue data for aboveground and belowground plant parts collected in 2021 at three sites in Iowa with three nitrogen application rates. Data at the Iowa sites were collected via biometric hand harvesting, belowground excavations, and soil coring both in-clump and beside-clump. Data were collected at two timepoints to calculate the contributions of belowground parts to Miscanthus x giganteus net primary productivity. This dataset also includes Miscanthus x giganteus and switchgrass soil coring and excavation data collected in 2012 at the University of Illinois Urbana-Champaign Energy Farm.
keywords:
Miscanthus; Net Primary Productivity; Excavation; Nitrogen fertilization; Translocation; Belowground Biomass; Carbon
published:
2025-09-24
Lee, Jaewon; Kwak, Suryang; Liu, Jing-Jing; Yu, Sora; Yun, Eun Ju; Kim, Dong Hyun; Liu, Cassie; Kim, Kyoung Heon; Jin, Yong-Su
(2025)
2′-Fucosyllactose (2′-FL), a human milk oligosaccharide with confirmed benefits for infant health, is a promising infant formula ingredient. Although Escherichia coli, Saccharomyces cerevisiae, Corynebacterium glutamicum, and Bacillus subtilis have been engineered to produce 2′-FL, their titers and productivities need to be improved for economic production. Glucose along with lactose has been used as a substrate for producing 2′-FL, but accumulation of by-products due to overflow metabolism of glucose has hampered efficient production of 2′-FL regardless of the host strain. To circumvent this problem, we used xylose, which is the second most abundant sugar in plant cell wall hydrolysates and is metabolized through oxidative metabolism, for the production of 2′-FL by engineered yeast. Specifically, we modified an engineered S. cerevisiae strain capable of assimilating xylose to produce 2′-FL from a mixture of xylose and lactose. First, a lactose transporter (Lac12) from Kluyveromyces lactis was introduced. Second, a heterologous 2′-FL biosynthetic pathway consisting of enzymes Gmd, WcaG, and WbgL from E. coli was introduced. Third, we adjusted expression levels of the heterologous genes to maximize 2′-FL production. The resulting engineered yeast produced 25.5 g/L of 2′-FL with a volumetric productivity of 0.35 g/L∙h in a fed-batch fermentation with lactose and xylose feeding to mitigate glucose repression. Interestingly, the major location of the 2′-FL produced by the engineered yeast can be changed by using different culture media. While 72% of the produced 2′-FL was secreted when a complex medium was used, 82% of the produced 2′-FL remained inside the cells when a minimal medium was used. As yeast extract is already used as a food and animal feed ingredient, 2′-FL-enriched yeast extract can be produced cost-effectively using the 2′-FL-accumulating yeast cells.
keywords:
Conversion; Genome Engineering
published:
2022-05-20
Haselhorst, Derek; Moreno, J. Enrique; Tcheng, David K.; Punyasena, Surangi W.
(2022)
This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.
keywords:
aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology
published:
2022-08-20
Jones, Todd; Ward, Michael
(2022)
Dataset associated with Jones and Ward BEAS-D-21-00106R2 submission: Parasitic cowbird development up to fledging and subsequent post-fledging survival reflect life history variation found across host species. Includes Excel CSV files and an .inp file with data used in the nest survival and Brown-headed Cowbird post-fledging analyses, plus a file with descriptions of each column. The CSV file is set up for logistic exposure models in SAS or R, and the .inp file is set up to be uploaded into Program MARK for multi-state recaptures-only analysis. Species included in the analyses: American Robin, Blue Grosbeak, Brown Thrasher, Blue-winged Warbler, Carolina Chickadee, Chipping Sparrow, Common Yellowthroat, Dickcissel, Eastern Bluebird, Eastern Phoebe, Eastern Towhee, Field Sparrow, Gray Catbird, House Wren, Indigo Bunting, Northern Cardinal, Red-winged Blackbird, Tree Swallow, Yellow-breasted Chat, and Yellow Warbler.
keywords:
brood parasitism; cowbird; carryover effects; phenotypic plasticity; post-fledging; songbirds
published:
2024-07-11
Pelech, Elena; Long, Steve
(2024)
This dataset includes the gas exchange and TDL (tunable diode laser) files for 4 accessions of Glycine soja and 1 elite accession of Glycine max (soybean) during light induction.
In this V2, code files for Matlab and R are also included to calculate mesophyll conductance and calculate the limitation on photosynthesis, respectively.
keywords:
photosynthesis; mesophyll conductance; soybean; light induction
published:
2023-06-01
Pan, Chao; Peng, Jianhao; Chien, Eli; Milenkovic, Olgica
(2023)
This dataset contains four real-world sub-datasets with data embedded into Poincare ball models, including Olsson's single-cell RNA expression data, CIFAR10, Fashion-MNIST and mini-ImageNet. Each sub-dataset has two corresponding files: one is the data file, the other one is the pre-computed reference points for each class in the sub-dataset. Please refer to our paper (https://arxiv.org/pdf/2109.03781.pdf) and codes (https://github.com/thupchnsky/PoincareLinearClassification) for more details.
keywords:
Hyperbolic space; Machine learning; Poincare ball models; Perceptron algorithm; Support vector machine
published:
2019-12-20
Wang, Yu; Burgess, Steven J.; de Becker, Elsa; Long, Stephen P.
(2019)
This dynamic photosynthesis model of the soybean canopy was developed by Yu Wang (yuwangcn@illinois.edu), IGB, University of Illinois.
For more details, please see the following publication:
Yu Wang, Steven J. Burgess, Elsa de Becker, Stephen P. Long. Photosynthesis in the fleeting shadows: An overlooked opportunity for increasing crop productivity? The Plant Journal.
keywords:
Matlab; Soybean canopy; photosynthesis model
published:
2020-08-01
Rhoads, Bruce; Lewis, Quinn; Sukhodolov, Alexander; Constantinescu, George
(2020)
This data set includes information used to determine patterns of mixing at three small confluences in East Central Illinois based on differences in the temperature or turbidity of the two confluent flows.
keywords:
mixing; confluences; flow structure
published:
2023-01-05
This is the data used in the paper "Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data". A preprint may be found at https://doi.org/10.48550/arXiv.2212.11367
Code from the Github repository https://github.com/adtonks/mosquito_GNN can be used with the data here to reproduce the paper's results. v1.0.0 of the code is also archived at https://doi.org/10.5281/zenodo.7897830
keywords:
west nile virus; machine learning; gnn; mosquito; trap; graph neural network; illinois; geospatial
published:
2024-05-30
Lyu, Fangzheng; Zhou, Lixuanwu; Park, Jinwoo; Baig, Furqan; Wang, Shaowen
(2024)
This dataset contains all the datasets used in the study conducted for the research publication titled "Mapping dynamic human sentiments of heat exposure with location-based social media data". This paper develops a cyberGIS framework to analyze and visualize human sentiments of heat exposure dynamically based on near real-time location-based social media (LBSM) data. Large volumes and low-cost LBSM data, together with a content analysis algorithm based on natural language processing are used effectively to generate heat exposure maps from human sentiments on social media.
## What’s inside - A quick explanation of the components of the zip file
* US folder includes the shapefile for the United States with county as the spatial unit
* Census_tract folder includes the shapefile for Cook County with census tract as the spatial unit
* data/data.txt includes instructions to retrieve the sample data either from Keeling or figshare
* geo/data20000.txt is the heat dictionary created in this paper; please refer to the corresponding publication for the data creation process
Jupyter notebook and code attached to this publication can be found at: https://github.com/cybergis/real_time_heat_exposure_with_LBSMD
keywords:
CyberGIS; Heat Exposure; Location-based Social Media Data; Urban Heat
published:
2020-03-13
Sweet, Andrew; Johnson, Kevin; Cameron, Stephen
(2020)
Data files associated with the assembly of mitochondrial minicircles from five species of parasitic lice. This includes data from four species in the genus Columbicola and from the human louse (Pediculus humanus). The files include FASTA sequences for all five species, reference sequences for read mapping approaches, resulting contigs produced by various assembly approaches, and alignments of human louse minicircles mapped to published sequences of the same species.
keywords:
mitochondria; FASTA; nucleotide sequences; alignment; Columbicola; Pediculus
published:
2021-05-12
Clem, Scott; Harmon-Threatt, Alexandra
(2021)
These are the data sets associated with our publication "Field borders provide winter refuge for beneficial predators and parasitoids: a case study on organic farms." For this project, we compared the communities of overwintering arthropod natural enemies in organic cultivated fields and wildflower-strip field borders at five different sites in central Illinois.
Abstract:
Semi-natural field borders are frequently used in midwestern U.S. sustainable agriculture. These habitats are meant to help diversify otherwise monocultural landscapes and provision them with ecosystem services, including biological control. Predatory and parasitic arthropods (i.e., potential natural enemies) often flourish in these habitats and may move into crops to help control pests. However, detailed information on the capacity of semi-natural field borders for providing overwintering refuge for these arthropods is poorly understood. In this study, we used soil emergence tents to characterize potential natural enemy communities (i.e., predacious beetles, wasps, spiders, and other arthropods) overwintering in cultivated organic crop fields and adjacent field borders. We found a greater abundance, species richness, and unique community composition of predatory and parasitic arthropods in field borders compared to arable crop fields, which were generally poorly suited as overwintering habitat. Furthermore, potential natural enemies tended to be positively associated with forb cover and negatively associated with grass cover, suggesting that grassy field borders with less forb cover are less well-suited as winter refugia. These results demonstrate that semi-natural habitats like field borders may act as a source for many natural enemies on a year-to-year basis and are important for conserving arthropod diversity in agricultural landscapes.
keywords:
Natural enemy; wildflower strips; conservation biological control; semi-natural habitat; field border; organic farming