Illinois Data Bank Dataset Search Results
Results
published:
2020-10-01
Acevedo-Siaca, Liana; Long, Stephen
(2020)
Raw gas exchange data for photosynthetic induction in 6 rice accession flag leaves. Photosynthetic induction and point measurements were made at ambient [CO2]. Two accessions (AUS 278 and IR64) were selected to screen in greater detail in which photosynthetic induction was measured at six [CO2].
published:
2021-05-01
Cheng, Ti-Chung; Li, Tiffany Wenting; Karahalios, Karrie; Sundaram, Hari
(2021)
This is the first version of the dataset.
This dataset contains anonymized data collected during the experiments described in the publication “I can show what I really like.”: Eliciting Preferences via Quadratic Voting, which was expected to appear in April 2021.
Once the publication link is public, we will provide an update here.
These data were collected through our open-source online systems, which are available at https://github.com/a2975667/QV-app (experiment 1) and https://github.com/a2975667/QV-buyback (experiment 2).
There are two folders in this dataset. The first folder (exp1_data) contains data collected during experiment 1; the second folder (exp2_data) contains data collected during experiment 2.
keywords:
Quadratic Voting; Likert scale; Empirical studies; Collective decision-making
published:
2025-10-01
Schetter, August; Lin, Cheng-Hsien; Zumpf, Colleen; Jang, Chunhwa; Hoffmann Jr., Leo; Rooney, William; Lee, DoKyoung
(2025)
Recently introduced photoperiod-sensitive (PS) biomass sorghum (Sorghum bicolor L. Moench) needs to be investigated for yield potential under different cultivation environments with reasonable nitrogen (N) inputs. The objectives of this study were to (1) evaluate the biomass yield and feedstock quality of four sorghum hybrids with different levels of PS, ranging from very PS (VPS) hybrids to moderate PS (MPS) hybrids, and (2) determine the optimal N inputs (0~168 kg N ha−1) under four environments: combinations of both temperate (Urbana, IL) and subtropical (College Station, TX) regions during 2018 and 2019. Compared to TX, the PS sorghums in central IL showed higher yield potential and steady feedstock production with an extended day length and with less precipitation variability, especially for the VPS hybrids. The mean dry matter (DM) yields of VPS hybrids were 20.5 Mg DM ha−1 and 17.7 Mg DM ha−1 in IL and TX, respectively. The highest N use efficiency occurred at a low N rate of 56 kg N ha−1 by improving approximately 33 kg DM ha−1 per 1.0 kg N ha−1 input. Approximately 70% of the PS sorghum biomass can be utilized for biofuel production, consisting of 58-65% of the cell-wall components and 4-11% of the soluble sugar. This study demonstrated that the rainfed temperate area (e.g., IL) has great potential for the sustainable cultivation of PS energy sorghum due to its observed high yield potential, stable production, and low N requirements.
keywords:
Sustainability; Biomass Analytics; Field Data
published:
2019-09-17
Mishra, Shubhanshu
(2019)
Trained models for multi-task multi-dataset learning for text classification as well as sequence tagging in tweets.
Classification tasks include sentiment prediction, abusive content, sarcasm, and veridicality.
Sequence tagging tasks include POS, NER, Chunking, and SuperSenseTagging.
Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification_tagging.py
See https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details.
If you are using this data, please also cite the related article:
Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
keywords:
twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning; classification; sequence tagging
published:
2020-07-15
Legried, Brandon; Molloy, Erin K.; Warnow, Tandy; Roch, Sebastien
(2020)
This repository includes scripts and datasets for the paper, "Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss."
keywords:
Species tree estimation; gene duplication and loss; identifiability; statistical consistency; quartets; ASTRAL
published:
2023-07-01
Tonks, Adam; Hwang, Jeongwoo
(2023)
This is the data used in the paper "Assessment of spatiotemporal flood risk due to compound precipitation extremes across the contiguous United States".
Code from the Github repository https://github.com/adtonks/precip_extremes can be used with the data here to reproduce the paper's results. v1.0.0 of the code is also archived at https://doi.org/10.5281/zenodo.8104252
This dataset is derived from NOAA-CIRES-DOE 20th Century Reanalysis V3. The NOAA-CIRES-DOE Twentieth Century Reanalysis Project version 3 used resources of the National Energy Research Scientific Computing Center managed by Lawrence Berkeley National Laboratory which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 and used resources of NOAA's Remotely Deployed High Performance Computing Systems.
keywords:
spatiotemporal; CONUS; United States; precipitation; extremes; flooding
published:
2025-04-17
Mollenhauer, Michael; Pfaff, Wolfgang
(2025)
This dataset includes analysis code used to analyze the data involved with swapping photons between superconducting qubits in separate modules through a superconducting coaxial cable bus. The dataset includes Python code to model and plot the data, CAD designs of the modules that hold the superconducting qubits, and high-frequency simulation software files to model the electric fields of the superconducting circuits.
keywords:
superconducting qubits; quantum information; modular architecture
published:
2025-05-27
Rani, Sonia; Cao, Xi; Baptista, Alejandro E.; Hoffmann, Axel; Pfaff, Wolfgang
(2025)
This dataset contains all raw and processed data used to generate the figures in the main text and supplementary material of the paper "High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit." The data can be used to reproduce the plots and validate the analysis. Accompanying Jupyter notebooks provide step-by-step analysis pipelines for figure generation. The dataset also includes drawings for the mechanical samples used to perform the experiment. In addition, the dataset provides ANSYS HFSS electromagnetic simulation files used to design and analyze the resonator structures and estimate field distributions.
keywords:
superconducting qubit; magnon sensing; hybrid quantum systems; spin-photon coupling; magnon decay; cavity QED
published:
2025-06-03
White, Andrew; Lambert, John
(2025)
GIS data and geoprocessing tools associated with the White and Lambert (2025) modeling paper, which assesses the potential impact of development on the archaeological resources of Illinois.
keywords:
development; archaeology; climate change; GIS
published:
2019-10-15
Choi, Sang Hyun; Rao, Vikyath; Gernat, Tim; Hamilton, Adam; Robinson, Gene; Goldenfeld, Nigel
(2019)
Filtered trophallaxis interactions for two honeybee colonies, each containing 800 worker bees and one queen. Each colony consists of bees that were administered a juvenile hormone analog, a vehicle treatment, or a sham treatment to determine the effect of colony perturbation on the duration of trophallaxis interactions. Columns one and two give the unique identifiers of the two bees involved in a particular trophallaxis exchange, and columns three and four give the Unix timestamps (in milliseconds) of the beginning and end of the interaction, respectively.
Note: the queen interactions were omitted from the uploaded dataset for reasons that are described in the submitted manuscript. Bees that performed poorly are also omitted from the final dataset.
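As a minimal sketch of how the four-column layout described in this entry might be read, the following Python snippet parses interaction rows and computes each interaction's duration. The comma delimiter, the absence of a header row, the function name, and the sample values are all assumptions made for illustration; they are not confirmed properties of the dataset files.

```python
import csv
import io

def interaction_durations(text, delimiter=","):
    """Return (bee_a, bee_b, duration_ms) for each interaction row.

    Assumed column order: bee ID, bee ID, start (Unix ms), end (Unix ms).
    """
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    result = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        bee_a, bee_b, start_ms, end_ms = row[:4]
        duration = int(end_ms) - int(start_ms)
        result.append((bee_a.strip(), bee_b.strip(), duration))
    return result

# Made-up sample rows, not taken from the dataset.
sample = (
    "12,345,1500000000000,1500000004500\n"
    "7,12,1500000010000,1500000011250\n"
)
durations = interaction_durations(sample)
```

If the real files use tabs or include a header line, the `delimiter` argument and the row loop would need to be adjusted accordingly.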
keywords:
honey bee; trophallaxis; social network
published:
2020-03-14
Rhoads, Bruce; Lindroth, Evan
(2020)
Data on bank elevations determined from lidar data for the Upper Sangamon River in Illinois, the Mission River in Texas, and the White River in Indiana.
keywords:
bank elevations, rivers, meandering, lowland
published:
2020-09-25
This repository contains the datasets and corresponding results for the paper "MAGUS: Multiple Sequence Alignment using Graph Clustering".
The Datasets.zip archive contains the ROSE, BAliBASE, Gutell, and RNASim datasets used in our experiments.
The Results.zip archive contains the outputs of running our methods against these datasets.
Datasets used:
ROSE: 10 simulated nucleotide model conditions from the SATe paper, each with 20 replicates, and with 1000 sequences per replicate.
The ROSE datasets were originally taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i
RNASim: This is a collection of simulated nucleotide datasets that were generated under a model of evolution that reflects selection due to RNA structural constraints. We sampled 20 subsets of 1000 sequences each, as well as 10 subsets of 10000 each, by randomly sampling from the original million-sequence RNASim dataset.
Gutell: 16S.M, 16S.3, 16S.T, 16S.B.ALL: Four biological nucleotide datasets from the Comparative Ribosomal Website (CRW) with cleaned reference alignments from SATe. Since PASTA is restricted to datasets without sequence length heterogeneity, these were modified to remove sequences that deviate by more than 20% from the median length. The scrubbed datasets range from 740 to 24,246 sequences. The pre-screened 16S datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/16s23s
BAliBASE: We use eight BAliBASE amino acid datasets used in the PASTA paper. As above, we remove outlier sequences, which leaves us with sizes ranging from 195 to 732 sequences. The pre-screened BAliBASE datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp
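The length-based screening described for the Gutell and BAliBASE datasets can be illustrated with a short Python sketch. This is not the authors' script: the function name, the dictionary input, and the choice to measure length on raw (unaligned) sequences are assumptions, and the toy sequences are invented.

```python
from statistics import median

def remove_length_outliers(sequences, tolerance=0.20):
    """Keep sequences whose length is within `tolerance` of the median length.

    `sequences` maps a sequence name to its (assumed unaligned) string.
    """
    med = median(len(seq) for seq in sequences.values())
    return {
        name: seq
        for name, seq in sequences.items()
        if abs(len(seq) - med) <= tolerance * med
    }

# Toy input with lengths 100, 110, 95, 60, and 130 (median 100).
toy = {
    "s1": "A" * 100,
    "s2": "A" * 110,
    "s3": "A" * 95,
    "s4": "A" * 60,
    "s5": "A" * 130,
}
kept = remove_length_outliers(toy)
```

With a 20% tolerance around a median of 100, only the sequences of length 95, 100, and 110 are retained in this toy case.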
published:
2024-04-05
Sinaiko, Guy; Cao, Yanghui; Dietrich, Christopher H.
(2024)
The following files include specimen information, DNA sequence data, and additional information on the analyses used to reconstruct the phylogeny of the leafhopper genus Neoaliturus as described in the Methods section of the original paper:
1. Taxon_sampling.csv: contains data on the individual specimens from which DNA was extracted, including sample code, taxon name, collection data (locality, date and name of collector) and museum unique identifier.
2. Alignments.zip: a ZIP archive containing 432 separate FASTA files representing the aligned nucleotide sequences of individual gene loci used in the analysis.
3. Concatenated_Matrix.fa: is a FASTA file containing the concatenated individual gene alignments used for the maximum likelihood analysis in IQ-TREE.
4. Genes_and_Loci.rtf: identifies the individual genes and loci used in the analysis. The partition name is the same as the name of the individual alignment file in the zipped Alignments folder.
5. Partitions_best_scheme.nex: is a text file in the standard NEXUS format that indicates the names of the individual data partitions and their locations in the concatenated matrix, and also indicates the substitution model for each partition.
6. (New in this version 2) Scripts & Description.zip includes 8 custom shell or Perl scripts used to assemble the DNA sequence data by performing reciprocal BLAST searches between the reference sequences and assemblies for each sample, extracting the best sequences based on the BLAST searches, screening the hits for each locus and keeping only the best result, and generating the nucleotide sequence dataset for the predicted orthologues (see the file description.txt for details).
7. (New in this version 2) Full_genetic_distances_matrix.csv shows the genetic distances between pairs of samples in the dataset (proportion of nucleotides that differ between samples).
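The genetic distance in item 7, the proportion of nucleotides that differ between two samples, can be sketched in Python as follows. How the original analysis treated gaps and ambiguous bases is not stated here, so skipping positions that contain `-`, `?`, or `N` is an assumption, as are the function name and the toy sequences.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a character in `skip_chars`
    are ignored (an assumed convention, not taken from the dataset).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else float("nan")

# Toy aligned sequences: 8 comparable sites, 1 difference.
d = p_distance("ACGT-ACGT", "ACGTTACCT")
```

Applying such a function to every pair of rows in a concatenated alignment would yield a symmetric matrix comparable in shape to the one in the CSV file.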
keywords:
leafhopper; phylogeny; anchored-hybrid-enrichment; DNA sequence; insect
published:
2025-03-14
Mishra, Apratim; Diesner, Jana; Torvik, Vetle I.
(2025)
Hype - PubMed dataset
Prepared by Apratim Mishra
This dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection chosen is ‘journal articles’ written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their abstract location. Therefore, each article (PMID) might have multiple instances in the dataset due to the presence of multiple hype words in different abstract sentences.
The candidate hype words are 35 in count: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful'.
This is version 3 of the dataset. Added new file - WSD_hype.tsv
File 1: hype_dataset_final.tsv
Primary dataset. It has the following columns:
1. PMID: represents unique article ID in PubMed
2. Year: Year of publication
3. Hype_word: Candidate hype word, such as ‘novel.’
4. Sentence: Sentence in abstract containing the hype word.
5. Hype_percentile: Relative position of the hype word within the abstract.
6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.
7. Introduction: The ‘I’ component of the hype word based on IMRaD
8. Methods: The ‘M’ component of the hype word based on IMRaD
9. Results: The ‘R’ component of the hype word based on IMRaD
10. Discussion: The ‘D’ component of the hype word based on IMRaD
File 2: hype_removed_phrases_final.tsv
Secondary dataset with same columns as File 1.
Hype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately. Removed phrases:
1. Major: histocompatibility, component, protein, metabolite, complex, surgery
2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid
3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment
4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values
5. Essential: medium, features, properties, opportunities, oil
6. Unique: model, amino
7. Robust: regression
8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information
9. Outstanding: questions, issues, question, challenge, problems, problem, remains
10. Remarkable: properties
11. Definitive: radiotherapy, surgery
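To illustrate how removed phrases could be separated from other candidate hype mentions, here is a minimal Python sketch. It assumes that a removed phrase is a hype word immediately followed by one of its listed terms, which the description does not state explicitly. The tokenization is deliberately simple, the dictionary holds only a small subset of the lists above, and the example sentence is invented.

```python
import re

# Small subset of the removed-phrase lists above, for illustration only.
REMOVED = {
    "major": {"histocompatibility", "component", "protein"},
    "novel": {"assay", "mutation", "method"},
    "robust": {"regression"},
}

def classify_mentions(sentence):
    """Return (hype_word, next_word, is_removed_phrase) for each candidate word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    mentions = []
    for i, token in enumerate(tokens):
        if token in REMOVED:
            next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
            mentions.append((token, next_word, next_word in REMOVED[token]))
    return mentions

# Invented example sentence, not drawn from the dataset.
mentions = classify_mentions(
    "A novel method reveals a major role for robust regression."
)
```

Under this assumption, mentions flagged `True` would correspond to rows of File 2 and the rest to rows of File 1.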
File 3: WSD_hype.tsv
Includes hype-based disambiguation for candidate words targeted for WSD (Word sense disambiguation)
keywords:
Hype; PubMed; Abstracts; Biomedicine
published:
2025-12-14
Fraterrigo, Jennifer; Chen, Weile
(2025)
This dataset contains information about absorptive roots from 170 plots along a latitudinal and temperature gradient in northern Alaska, including tussock sedges and deciduous alder, birch, and willow shrubs. This dataset accompanies the paper "Impacts of Arctic Shrubs on Root Traits and Belowground Nutrient Cycles Across a Northern Alaskan Climate Gradient," which was published in Frontiers in Plant Sciences.
Note: in the "patch coordinates" tab, the same coordinates/elevation ("Long", "Lat", and "Elev (m)") apply to all patches that share a number. For example, "Patch" W1, B1, and G1 share the same "Long", "Lat", and "Elev (m)" values as "Patch" A1.
keywords:
absorptive root traits; shrub expansion; Arctic; Alaskan tundra
published:
2020-04-20
Supplemental data sets for the manuscript entitled "Contribution of fungal and invertebrate communities to mass loss and wood depolymerization in tropical terrestrial and aquatic habitats".
keywords:
Coiba Island; wood decomposition; cellulose; hemicellulose; lignin breakdown; aquatic fungi
published:
2020-01-31
Bradshaw, Therin M.; Blake-Bradshaw, Abigail G.; Fournier, Auriel M.V.; Lancaster, Joseph D.; O'Connell, John; Jacques, Christopher N.; Eicholtz, Michael W.; Hagy, Heath M
(2020)
Data inputs and scripts for the analysis detailed in Bradshaw et al., published in PLOS ONE in 2020.
keywords:
Marsh birds; wetlands
published:
2020-06-19
This dataset includes data pulled from the World Bank (2009), the World Values Survey wave 6, and Transparency International (2009). The data were used to measure perceptions of expertise from individuals in nations that are recipients of development aid as measured by the World Bank.
keywords:
World Values Survey; World Bank; expertise; development
published:
2025-02-07
Huang, Annie H.; Matthews, Jeffrey W.
(2025)
These data represent the raw data from the paper “Influence of light availability and water depth on competition between Phalaris arundinacea and herbaceous vines” published in Wetlands by Annie H. Huang and Jeffrey W. Matthews. The data are archived in one file: Huang&Matthews_mesocosm_data_archive. This file includes raw data collected during a greenhouse experiment described in the paper.
published:
2020-01-27
Morphologic data of dunes in the world's big rivers. Morphologic descriptors for large dunes include: dune height, dune mean leeside angle, dune maximum leeside angle, dune wavelength, dune flow depth (at the crest), and the fractional height of the maximum slope on the leeside for each dune. Morphologic descriptors for small dunes include: dune height, dune mean leeside angle, dune maximum leeside angle, dune wavelength, and dune flow depth (at the crest).
keywords:
dune; bedform; rivers; morphology
published:
2023-04-12
Towns, John; Hart, David
(2023)
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating to the start of the TeraGrid program in 2004 through the XSEDE operational period, which ended August 31, 2022. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen year span along with the evolution of researchers, fields of science, and institutional representation.
Because the XSEDE program has ended, the allocation_award_history file includes all allocations activity initiated via XSEDE processes through August 31, 2022. The Resource Providers and successor program to XSEDE agreed to honor all project allocations made during XSEDE. Thus, allocation awards that extend beyond the end of XSEDE may not reflect all activity that may ultimately be part of the project award. Similarly, allocation usage data only reflects usage reported through August 31, 2022, and may not reflect all activity that may ultimately be conducted by projects that were active beyond XSEDE.
keywords:
allocations; cyberinfrastructure; XSEDE
published:
2025-09-17
Kamara, Shasta; Glomb, Jackson; Suski, Cory
(2025)
Data were generated from juvenile paddlefish acclimated to one of three temperatures (13.0°C, 17.5°C, or 22.0°C) for two weeks. Fish were then subjected to one of two experiments: simulated angling, in which physiological parameters (stress hormones, lactate, glucose, ions, and oxygen transport parameters) were evaluated in plasma or whole blood, or critical thermal maxima tests. The data set includes physiological parameters, water quality temperatures, and morphometric data generated from these experiments and fish.
keywords:
Sport fish, critical thermal maximum, exercise, recovery, conservation, fisheries, management
published:
2025-07-31
Gibson, Jared; Jiang, Zhanzhi; Kou, Angela
(2025)
This repository includes data files and analysis and plotting code for reproducing the figures in the paper "A scanning resonator for probing quantum coherent devices" (arXiv:2506.22620).
published:
2025-08-01
Beach, Cheyenne R.; Koop, Jennifer A.H.; Fournier, Auriel M.V.
(2025)
Data from the 2025 publication in the Wilson Journal of Ornithology with the same name.
keywords:
Lesser Scaup; Waterfowl; Transmitter Effects
published:
2018-07-28
Hoang, Linh; Schneider, Jodi
(2018)
This dataset presents a citation analysis and citation context analysis used in Linh Hoang, Frank Scannapieco, Linh Cao, Yingjun Guan, Yi-Yun Cheng, and Jodi Schneider. Evaluating an automatic data extraction tool based on the theory of diffusion of innovation. Under submission. We identified the papers that directly describe or evaluate RobotReviewer from the list of publications on the RobotReviewer website <http://www.robotreviewer.net/publications>, resulting in 6 papers grouped into 5 studies (we collapsed a conference and journal paper with the same title and authors into one study). We found 59 citing papers, combining results from Google Scholar on June 05, 2018 and from Scopus on June 23, 2018. We extracted the citation context around each citation to the RobotReviewer papers and categorized these quotes into emergent themes.
keywords:
RobotReviewer; citation analysis; citation context analysis