Illinois Data Bank Dataset Search Results
published:
2019-01-07
Carlstone, Jamie; Kenfield, Ayla Stein; Norman, Michael; Wilkin, John
(2019)
Vendor transcription of the Catalogue of Copyright Entries, Part 1, Group 1, Books: New Series, Volume 29 for the Year 1932. This file contains all of the entries from the indicated volume.
keywords:
copyright; Catalogue of Copyright Entries; Copyright Office
published:
2023-01-01
Cao, Yanghui; Dietrich, Christopher H.; Kits, Joel; Dmitriev, Dmitry A.; Xu, Ye; Huang, Min
(2023)
The following files were used to reconstruct the phylogeny of the leafhopper subfamily Typhlocybinae, using IQ-TREE v1.6.12 and ASTRAL v 4.10.5.
<b>1) Taxon_sampling.csv:</b> contains the sample IDs (1st column) and the taxonomic information (2nd column). Sample IDs were used in the alignment files and partition files.
<b>2) concatenated_nt_complete.phy:</b> a complete concatenated nucleotide dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 154,992 nucleotide positions (intron included) from 665 loci. Hyphens are used to represent gaps.
<b>3) concatenated_nt_complete_partition.nex:</b> the partitioning schemes for concatenated_nt_complete.phy. The file partitions the 154,992 nucleotide characters into 426 character sets, and defines the best substitution model for each character set.
<b>4) concatenated_cds_complete.phy:</b> a complete concatenated coding DNA sequence dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 153,525 nucleotide positions (intron excluded) from 665 loci. Hyphens are used to represent gaps.
<b>5) concatenated_cds_complete_partition.nex:</b> the partitioning schemes for concatenated_cds_complete.phy. The file partitions the 153,525 nucleotide characters into 426 character sets, and defines the best substitution model for each character set.
<b>6) concatenated_nt_reduced.phy:</b> a reduced concatenated nucleotide dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 95,076 nucleotide positions (intron included) from 374 loci. Hyphens are used to represent gaps.
<b>7) concatenated_nt_reduced_partition.nex:</b> the partitioning schemes for concatenated_nt_reduced.phy. The file partitions the 95,076 nucleotide characters into 312 character sets, and defines the best substitution model for each character set.
<b>8) concatenated_aa_complete.phy:</b> a complete concatenated amino acid dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12, corresponding to concatenated_cds_complete.phy. The file lists the sequences of 248 samples with 51,175 amino acid positions from 665 loci. Hyphens are used to represent gaps.
<b>9) concatenated_aa_complete_partition.nex:</b> the partitioning schemes for concatenated_aa_complete.phy. The file partitions the 51,175 amino acid characters into 426 character sets, and defines the best substitution model for each character set.
<b>10) concatenated_aa_reduced.phy:</b> a reduced concatenated amino acid dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12, corresponding to concatenated_nt_reduced.phy. The file lists the sequences of 248 samples with 31,384 amino acid positions from 374 loci. Hyphens are used to represent gaps.
<b>11) concatenated_aa_reduced_partition.nex:</b> the partitioning schemes for concatenated_aa_reduced.phy. The file partitions the 31,384 amino acid characters into 312 character sets, and defines the best substitution model for each character set.
<b>12) Individual_gene_alignment.zip:</b> contains 426 FASTA files, each of which is an alignment for one gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v 4.10.5 based on the consensus trees with a minimum average bootstrap value of 70.
keywords:
Auchenorrhyncha, Cicadomorpha, Membracoidea, anchored hybrid enrichment
published:
2026-01-09
Schultz, J Carl; Cao, Mingfeng; Zhao, Huimin
(2026)
Rhodotorula toruloides has been increasingly explored as a host for bioproduction of lipids, fatty acid derivatives and terpenoids. Various genetic tools have been developed, but neither a centromere nor an autonomously replicating sequence (ARS), both necessary elements for stable episomal plasmid maintenance, has yet been reported. In this study, cleavage under targets and release using nuclease (CUT&RUN), a method used for genome-wide mapping of DNA–protein interactions, was used to identify R. toruloides IFO0880 genomic regions associated with the centromeric histone H3 protein Cse4, a marker of centromeric DNA. Fifteen putative centromeres ranging from 8 to 19 kb in length were identified and analyzed, and four were tested for, but did not show, ARS activity. These centromeric sequences contained below average GC content, corresponded to transcriptional cold spots, were primarily nonrepetitive and shared some vestigial transposon-related sequences but otherwise did not show significant sequence conservation. Future efforts to identify an ARS in this yeast can utilize these centromeric DNA sequences to improve the stability of episomal plasmids derived from putative ARS elements.
keywords:
Genome Engineering; Genomics
published:
2024-05-23
Xing, Yuqing; Bae, Seokjin; Ritz, Ethan; Yang, Fan; Birol, Turan; Salinas, Andrea N. Capa; Ortiz, Brenden R.; Wilson, Stephen D.; Wang, Ziqiang; Fernandes, Rafael M.; Madhavan, Vidya
(2024)
This dataset consists of all the figure files that are part of the main text and supplementary of the manuscript titled "Optical manipulation of the charge density wave state in RbV3Sb5". For detailed information on the individual files refer to the readme file.
keywords:
kagome superconductor; optics; charge density wave
published:
2019-06-11
Wang, Wenrui; Wang, Tao; Amin, Vivek P.; Wang, Yang; Radhakrishnan, Anil; Davidson, Angie; Allen, Shane R.; Silva, T. J.; Ohldag, Hendrik; Balzar, Davor; Zink, Barry L.; Haney, Paul M.; Xiao, John Q.; Cahill, David G.; Lorenz, Virginia O.; Fan, Xin
(2019)
This dataset provides the raw data, code and related figures for the paper, "Anomalous Spin-Orbit Torques in Magnetic Single-Layer Films."
keywords:
spintronics; spin-orbit torques; magnetic materials
published:
2018-11-18
Kwang, Jeffrey; Parker, Gary
(2018)
This dataset contains experimental measurements used in the paper, "Ultra-sensitivity of Numerical Landscape Evolution Models to their Initial Conditions." (to be submitted).
The data is taken from experimental runs in a miniature landscape model named the eXperimental Landscape Evolution (XLE) facility. In this facility, we completed five >24hr runs at 5 minute temporal resolution. Every five minutes, a planform image was captured, and a digital elevation model (DEM) was generated. For each run, images and a corresponding animation of images are documented. In addition, ASCII formatted DEMs along with color hillshade maps were generated. The hillshade map images were also made into an animation.
This dataset is associated with the following publication: https://doi.org/10.1029/2019GL083305
keywords:
landscape evolution model; digital elevation model; geomorphology
published:
2019-02-02
The bee visitation data includes the percentage of each bee pollinator group in bee bowl samples and in observations. The data are referenced in the article with the following citation:
Bennett, A.B., Lovell, S.T. 2019. Landscape and local site variables differentially influence pollinators and pollination services in urban agricultural sites. Accepted for publication in: PLOS ONE.
published:
2018-05-16
Lewis, Quinn; Bruce, Rhoads
(2018)
These data are for two companion papers on the use of LSPIV obtained from UAS (i.e. drones) to measure flow structure in streams. The LSPIV1 folder contains spreadsheet data used in each case referred to in Table 1 in the manuscript. In the spreadsheets, there is a cell that denotes which figure was constructed with which data. The LSPIV2 folder contains spreadsheets with the data used to construct the figures; the spreadsheets are labeled by figure.
keywords:
LSPIV; drone; UAS; flow structure; rivers
published:
2018-12-13
Yin, Dandong; Wang, Shaowen
(2018)
The dataset contains a complete example (inputs, outputs, codes, intermediate results, visualization webpage) of executing the Height Above Nearest Drainage (HAND) workflow with CyberGIS-Jupyter.
keywords:
cybergis; hydrology; Jupyter
published:
2019-07-27
Clark, Lindsay V.; Dwiyanti, Maria Stefanie; Anzoua, Kossonou G.; Brummer, Joe E.; Glowacka, Katarzyna; Hall, Megan; Heo, Kweon; Jin, Xiaoli; Lipka, Alexander E.; Peng, Junhua; Yamada, Toshihiko; Yoo, Ji Hye; Yu, Chang Yeon; Zhao, Hua; Long, Stephen P.; Sacks, Erik J.
(2019)
Genotype calls are provided for a collection of 583 Miscanthus sinensis clones across 1,108,836 loci mapped to version 7 of the Miscanthus sinensis reference genome. Sequence and alignment information for all unique RAD tags is also provided to facilitate cross-referencing to other genomes.
keywords:
variant call format (VCF); sequence alignment/map format (SAM); miscanthus; single nucleotide polymorphism (SNP); restriction site-associated DNA sequencing (RAD-seq); bioenergy; grass
published:
2018-06-02
Palmer, Ryan; Albarracin, Dolores
(2018)
keywords:
conspiracy theory; trust in science
published:
2017-12-04
Zaya, David N.; Leicht-Young, Stacey A.; Pavlovic, Noel; Hetrea, Christopher S.; Ashley, Mary V.
(2017)
Data used for Zaya et al. (2018), published in Invasive Plant Science and Management DOI 10.1017/inp.2017.37, are made available here. There are three spreadsheet files (CSV) available, as well as a text file that has detailed descriptions for each file ("readme.txt"). One spreadsheet file ("prices.csv") gives pricing information, associated with Figure 3 in Zaya et al. (2018). The other two spreadsheet files are associated with the genetic analysis, where one file contains raw data for biallelic microsatellite loci ("genotypes.csv") and the other ("structureResults.csv") contains the results of Bayesian clustering analysis with the program STRUCTURE. The genetic data may be especially useful for future researchers. The genetic data contain the genotypes of the horticultural samples that were the focus of the published article, and also genotypes of nearly 400 wild plants. More information on the location of the wild plant collections can be found in the Supplemental information for Zaya et al. (2015) Biological Invasions 17:2975–2988 DOI 10.1007/s10530-015-0926-z. See "readme.txt" for more information.
keywords:
Horticultural industry; invasive species; microsatellite DNA; mislabeling; molecular testing
published:
2024-10-08
Mersich, Ina; Bishop, Rebecca; Diaz Yucupicio, Sandra; Nobrega, Ana D.; Austin, Scott; Barger, Anne; Fick, Megan E.; Wilkins, Pamela
(2024)
Acepromazine was administered to healthy adult horses to induce transient anemia secondary to splenic sequestration. Data was collected at baseline (T0), 1 hour (T1) and 12 hours (T2) post acepromazine administration. Data collection included PCV, TP, CBC, fibrinogen, PT, PTT and viscoelastic coagulation profiles (VCM Vet) as well as ultrasonographic measurements of the spleen at all 3 time points.
keywords:
horse; coagulation; viscoelastic testing; anemia; acepromazine
published:
2021-03-15
Stodola, Alison P.; Lydeard, Charles; Lamer, James T.; Douglass, Sarah A.; Cummings, Kevin; Campbell, David
(2021)
Dataset associated with "Hiding in plain sight: genetic confirmation of putative Louisiana Fatmucket Lampsilis hydiana in Illinois" as submitted to Freshwater Mollusk Biology and Conservation by Stodola et al. Images are from cataloged specimens from the Illinois Natural History Survey (INHS) Mollusk Collection in Champaign, Illinois that were used for genetic research. File names indicate the species as confirmed in Stodola et al. (i.e., Lampsilis siliquoidea or Lampsilis hydiana) followed by the INHS Mollusk Collection catalog number, followed by the individual specimen number, followed by shell view (interior or exterior). If no specimen number is noted in the file name, there is only one specimen for that catalog number. For example: Lsiliquoidea_46515_1_2_3_exterior.
Images were created by photographing specimens on a metric grid in an OrTech Photo-e-Box Plus with a Nikon D610 single lens reflex camera using a 60mm lens. Post-processing of images (cropping, image rotation, and auto contrast) occurred in Adobe Photoshop, and the images were saved as TIFF files using no image compression, interleaved pixel order, and IBM PC Byte Order. One additional partial lot, INHS Mollusk Catalog No. 37059 (shown with both interior and exterior view in one image), is included for reference but was not genetically sequenced. A .csv file contains an index of all specimens photographed.
SPECIES: species confirmed using genetic analyses
GENE: cox1 or nad1 mitochondrial gene
ACCESSION: GenBank accession number
INHS CATALOG NO: Illinois Natural History Survey Mollusk Collection Catalog number
WATERBODY: waterbody where specimen was collected
PUTATIVE SPECIES: species determination based on morphological characters prior to genetic analysis
Phylogenetic sequence data (.nex files) were aligned using BioEdit (Hall, T.A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41:95-98.). Pertinent methodology for the analysis is contained within the manuscript submittal for Stodola et al. to Freshwater Mollusk Biology and Conservation. In these files, "N" is a standard symbol for an unknown base.
keywords:
Lampsilis hydiana; Lampsilis siliquoidea; unionid; Louisiana Fatmucket; Fatmucket; genetic confirmation
published:
2018-12-20
Dong, Xiaoru; Xie, Jingyi; Hoang, Linh; Schneider, Jodi
(2018)
File Name: Error_Analysis.xslx
Data Preparation: Xiaoru Dong
Date of Preparation: 2018-12-12
Data Contributions: Xiaoru Dong, Linh Hoang, Jingyi Xie, Jodi Schneider
Data Source: The classification prediction results on the testing data set
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews
Description: The file contains lists of the wrong and correct predictions of inclusion criteria of Cochrane Systematic Reviews from the testing data set, along with the length (number of words) of the inclusion criteria.
Notes: To reproduce the data in this file, please get the project code published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run the code following the instructions provided.
keywords:
Inclusion criteria, Randomized controlled trials, Machine learning, Systematic reviews
published:
2024-01-31
Wang, Xiudan; Dietrich, Christopher; Zhang, Yalin
(2024)
The included files were used to reconstruct the phylogeny of Coelidiinae using combined morphological and molecular data, estimate divergence times and reconstruct ancestral biogeographic areas as described in the manuscript submitted for publication. The file “Coelidiinae_dna_morph_combined.nex” is a text file in standard NEXUS format used by various phylogenetic analysis programs. This file includes the aligned and concatenated nucleotide sequences of five gene regions (mitochondrial COI and 16S, and nuclear 28S D-2, histone H3, histone H2A and wingless) indicated by standard “ACGT” nucleotide symbols with missing data indicated by “?”, and morphological character data as defined in Table S3 used in the analyses. The data partitions are indicated toward the end of the file by ranges of numbers (“charset Subset 1 – 4” for the DNA data and “charset morph” for the morphological characters) followed by commands for the phylogenetic analysis program MrBayes that specify the model settings for each data partition. Detailed data on species included (as rows) in the dataset, including collection localities and GenBank accession numbers, are provided in the Table_S1_Specimen_information.csv file. The file "TablesS2-S4.pdf" lists the primers used for polymerase chain reaction amplification, the list of morphological character definitions, and the morphological character matrix. The file “RASP_Distribution.csv” contains a list of the species included in the phylogenetic dataset (first column) and a code (second column) indicating their distributions as follows: (A) Oriental, (B) Palaearctic, (C) Australian, (D) Afrotropical, (E) Neotropical, and (F) Nearctic. More than one letter indicates that the species occurs in more than one region. The file "infile_for_BEAST.txt" is the input file in XML format used for the molecular divergence time analysis using the program BEAST (Bayesian Evolutionary Analysis by Sampling Trees) as described in the Methods section of the manuscript. This file includes comments that document the steps of the analysis.
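As a minimal sketch of how the distribution codes described for “RASP_Distribution.csv” could be expanded into region names, the Python snippet below maps each letter to its region. The function name `decode_distribution` and the choice to ignore unrecognized characters are assumptions made for illustration; they are not part of the dataset.

```python
# Letter codes as listed in the dataset description.
REGIONS = {
    "A": "Oriental",
    "B": "Palaearctic",
    "C": "Australian",
    "D": "Afrotropical",
    "E": "Neotropical",
    "F": "Nearctic",
}


def decode_distribution(code: str) -> list[str]:
    """Expand a code such as "AB" into a list of region names.

    Hypothetical helper: characters that are not one of the six
    documented letters are skipped rather than raising an error.
    """
    return [REGIONS[ch] for ch in code.strip().upper() if ch in REGIONS]


if __name__ == "__main__":
    # A species coded "AB" occurs in two regions.
    print(decode_distribution("AB"))
```

A code with more than one letter yields more than one region, matching the description above.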
keywords:
leafhopper; phylogeny; DNA sequence; insect; timetree; biogeography
published:
2018-07-13
Hensley, Merinda Kaye; Johnson, Heidi R.
(2018)
Qualitative data collected from the websites of undergraduate research journals between October 2014 and May 2015. There are two CSV files. The first file, "Sample", includes the sample of journals for which secondary data was collected. The second file, "Population", includes the remainder of the population, for which secondary data was not collected. Note: The rows do not add up to 800 as indicated in the article, because rows were deleted for journals that had broken links or defunct websites during the random sampling process.
keywords:
undergraduate research; undergraduate journals; scholarly communication; libraries; liaison librarianship
published:
2016-08-18
Copyright Review Management System renewals by year, data from Table 2 of the article "How Large is the ‘Public Domain’? A comparative Analysis of Ringer’s 1961 Copyright Renewal Study and HathiTrust CRMS Data."
keywords:
copyright; copyright renewals; HathiTrust
published:
2018-04-23
Contains a series of datasets that score pairs of tokens (words, journal names, and controlled vocabulary terms) based on how often they co-occur within versus across authors' collections of papers. The tokens derive from four different fields of PubMed papers: journal, affiliation, title, MeSH (medical subject headings). Thus, there are 10 different datasets, one for each pair of token type: affiliation-word vs affiliation-word, affiliation-word vs journal, affiliation-word vs mesh, affiliation-word vs title-word, mesh vs mesh, mesh vs journal, etc.
Using authors to link papers, and in turn pairs of tokens, is an alternative to the usual within-document co-occurrences and to using, e.g., citations to link papers. This is particularly striking for journal pairs because a paper almost always appears in a single journal and so within-document co-occurrences are 0, i.e., useless.
The tokens are taken from the Author-ity 2009 dataset which has a cluster of papers for each inferred author, and a summary of each field. For MeSH, title-words, affiliation-words that summary includes only the top-20 most frequent tokens after field-specific stoplisting (e.g., university is stoplisted from affiliation and Humans is stoplisted from MeSH). The score for a pair of tokens A and B is defined as follows. Suppose Ai and Bi are the number of occurrences of token A (and B, respectively) across the i-th author's papers, then
nA = sum(Ai); nB = sum(Bi)
nAB = sum(Ai*Bi) if A not equal B; nAA = sum(Ai*(Ai-1)/2) otherwise
nAnB = nA*nB if A not equal B; nAnA = nA*(nA-1)/2 otherwise
score = 1000000*nAB/nAnB if A is not equal B; 1000000*nAA/nAnA otherwise
Token pairs are excluded when: score < 5, or nA < cut-off, or nB < cut-off, or nAB < cut-offAB.
The cut-offs differ for token types and can be inferred from the datasets. For example, cut-off = 200 and cut-offAB = 20 for journal pairs.
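The score definition above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function name `pair_score` and its arguments are hypothetical, the per-author counts are passed as two equal-length lists, and nB is taken to be the sum of the Bi counts.

```python
from typing import Sequence


def pair_score(a_counts: Sequence[int],
               b_counts: Sequence[int],
               same_token: bool = False) -> float:
    """Co-occurrence score in parts per million for tokens A and B.

    a_counts[i] and b_counts[i] are the occurrences of A and B across
    the i-th author's papers. When same_token is True, b_counts is
    ignored and the nAA / nAnA form of the definition is used.
    """
    n_a = sum(a_counts)
    if same_token:
        # nAA = sum(Ai*(Ai-1)/2); nAnA = nA*(nA-1)/2
        n_ab = sum(a * (a - 1) // 2 for a in a_counts)
        n_pairs = n_a * (n_a - 1) // 2
    else:
        # nAB = sum(Ai*Bi); nAnB = nA*nB
        n_b = sum(b_counts)
        n_ab = sum(a * b for a, b in zip(a_counts, b_counts))
        n_pairs = n_a * n_b
    if n_pairs == 0:
        return 0.0  # assumption: no possible pairs means a score of 0
    return 1_000_000 * n_ab / n_pairs


if __name__ == "__main__":
    # Three authors: A occurs 2, 0, 1 times; B occurs 1, 3, 0 times.
    print(pair_score([2, 0, 1], [1, 3, 0]))
```

In this toy example nAB = 2 and nAnB = 12, so the score is 1,000,000 × 2 / 12. The exclusion thresholds (score < 5 and the cut-offs) are not applied here.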
Each dataset has the following 7 tab-delimited all-ASCII columns
1: score: roughly the number of co-occurrences of the two tokens divided by the total number of pairs, in parts per million (ppm), ranging from 5 to 1,000,000
2: nAB: total number of co-occurrences
3: nAnB: total number of pairs
4: nA: number of occurrences of token A
5: nB: number of occurrences of token B
6: A: token A
7: B: token B
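A short Python sketch of reading the seven tab-delimited columns listed above is given below. It assumes the count columns hold integers and that any row that does not parse (for example a header line, if a file has one) can be skipped; the names `TokenPair` and `read_pairs` are hypothetical.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class TokenPair(NamedTuple):
    score: float  # column 1, parts per million
    n_ab: int     # column 2, co-occurrences
    n_anb: int    # column 3, total number of pairs
    n_a: int      # column 4, occurrences of token A
    n_b: int      # column 5, occurrences of token B
    a: str        # column 6, token A
    b: str        # column 7, token B


def read_pairs(lines: Iterable[str]) -> Iterator[TokenPair]:
    """Yield one TokenPair per well-formed tab-delimited line."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 7:
            continue  # skip blank or malformed lines
        try:
            yield TokenPair(float(row[0]), int(row[1]), int(row[2]),
                            int(row[3]), int(row[4]), row[5], row[6])
        except ValueError:
            continue  # skip rows with non-numeric fields, e.g. a header


if __name__ == "__main__":
    # Made-up sample line for illustration only, not real data.
    sample = ["166666.7\t2\t12\t3\t4\ttoken one\ttoken two\n"]
    for pair in read_pairs(sample):
        print(pair.a, pair.b, pair.score)
```

To read a real file, an open file handle can be passed in place of the sample list, e.g. `read_pairs(open(path, encoding="ascii", newline=""))`, where `path` is whichever dataset file was downloaded.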
We made some of these datasets as early as 2011 as we were working to link PubMed authors with USPTO inventors, where the vocabulary usage is strikingly different, but also more recently to create links from PubMed authors to their dissertations and NIH/NSF investigators, and to help disambiguate PubMed authors. Going beyond explicit (exact within-field match) is particularly useful when data is sparse (think old papers lacking controlled vocabulary and affiliations, or papers with metadata written in different languages) and when making links across databases with different kinds of fields and vocabulary (think PubMed vs USPTO records). We never published a paper on this but our work inspired the more refined measures described in:
<a href="https://doi.org/10.1371/journal.pone.0115681">D′Souza JL, Smalheiser NR (2014) Three Journal Similarity Metrics and Their Application to Biomedical Journals. PLOS ONE 9(12): e115681. https://doi.org/10.1371/journal.pone.0115681</a>
<a href="http://dx.doi.org/10.5210/disco.v7i0.6654">Smalheiser, N., & Bonifield, G. (2016). Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation. DISCO: Journal of Biomedical Discovery and Collaboration, 7. doi:http://dx.doi.org/10.5210/disco.v7i0.6654</a>
keywords:
PubMed; MeSH; token; name disambiguation
published:
2020-10-01
Fraterrigo, Jennifer; Rembelski, Mara
(2020)
We measured the effects of fire or drought treatment on plant, microbial and biogeochemical responses in temperate deciduous forests invaded by the annual grass Microstegium vimineum with a history of either frequent fire or fire exclusion.
Please note, on Documentation tab / Experimental or Sampling Design, “15 (XVI)” should be “16 (XVI)”.
keywords:
plant-soil interaction; grass-fire cycle; Microstegium; carbon and nitrogen cycling; microbial decomposers
published:
2018-04-05
GBS data from Phaseolus accessions, for a study led by Dr. Glen Hartman, UIUC. The (zipped) fastq file can be processed with the TASSEL GBS pipeline or other pipelines for SNP calling. The related article has been submitted and the methods section describes the data processing in detail.
published:
2025-10-10
Yang, Pan; Cai, Ximing; Leibensperger, Carrie; Khanna, Madhu
(2025)
The success of a bioenergy policy relies largely on the wide adoption of perennial energy crops at the farm scale. This study uses survey data to examine potential adoption decisions by farmers in the U.S. Midwest and the causal effects of various direct and indirect influencing factors, especially heterogeneous preferences of farmers. A Bayesian network (BN) model is developed to delineate the causal relationship between farmers' adoption decisions and the influencing factors. We find a dominating role of economic factors and a non-negligible impact of non-economic factors, such as the perceived environmental benefits and the extent of familiarity with perennial energy crops. To examine the effect of heterogeneity in farmer preferences, we classify the surveyed farmers into four categories based on their attitudes toward the economic, social, and environmental dimensions of perennial energy crops. We identified statistically significant between-group differences in the responses of the four types of farmers to the various influencing factors. Our findings contribute to disentangling the complicated motivations that will influence perennial energy crop adoption decisions and provide implications for more targeted policy development that needs to consider the heterogeneous drivers of farmer decisions about land use.
keywords:
Sustainability; Modeling
published:
2025-10-29
Chen, Chu-Chun; Dominguez, Francina; Matus, Sean
(2025)
This dataset contains variables from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5; Hersbach et al., 2020). These data were used for the analysis in “The impact of large-scale land surface conditions on the South American low-level jet” published in Geophysical Research Letters.
Acknowledgments:
This work was supported by NSF Award AGS-1852709. We thank Dr. Zhuo Wang and Dr. Divyansh Chug for their valuable feedback and insightful discussions.
References:
Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc. 2020; 146: 1999–2049. https://doi.org/10.1002/qj.3803
keywords:
atmospheric sciences; South American low-level jet; land-atmosphere interactions; soil moisture; regional atmospheric circulation; southeastern South America
published:
2019-09-06
This is a dataset of 1101 comments from The New York Times (May 1, 2015-August 31, 2015) that contain a mention of the stemmed words vaccine or vaxx.
keywords:
vaccine; online comments