Illinois Data Bank
University Library, University of Illinois at Urbana-Champaign
Nowak, Jennifer E.; Sweet, Andrew D.; Weckstein, Jason D.; Johnson, Kevin P. (2019): Data for: A molecular phylogenetic analysis of the genera of fruit doves and their allies using dense taxonomic sampling. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9797270_V1
Multiple sequence alignments from concatenated nuclear and mitochondrial genes and resulting phylogenetic tree files of fruit doves and their close relatives. Files include: BEAST input XML file (fruit_dove_beast_input.xml); a maximum clade credibility tree from a BEAST analysis (fruit_dove_beast_mcc.tre); concatenated multiple sequence alignment NEXUS files for the novel dataset (fruit_dove_concatenated_alignment.nex, 76 taxa, 4,277 characters) and the dataset with additional sequences (fruit_dove_plus_cibois_data_concatenated_alignment.nex, 204 taxa, 4,277 characters), both of which contain a MrBayes block including partition information; and 50% majority-rule consensus trees generated from MrBayes analyses, using the NEXUS alignment files as inputs (fruit_dove_mrbayes_consensus.tre, fruit_dove_plus_cibois_data_mrbayes_consensus.tre).
fruit doves; multiple sequence alignment; phylogeny; Aves: Columbidae
Smith, Rebecca (2019): Mastitis risk effect on the economic consequences of paratuberculosis control in dairy cattle: A stochastic modeling study. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7539223_V1
Simulation data related to the paper "Mastitis risk effect on the economic consequences of paratuberculosis control in dairy cattle: A stochastic modeling study"
Skinner, Rachel; Dietrich, Christopher; Walden, Kimberly; Gordon, Eric; Sweet, Andrew; Podsiadlowski, Lars; Petersen, Malte; Simon, Chris; Takiya, Daniela; Johnson, Kevin (2019): Data for Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1461292_V1
The data in this directory correspond to: Skinner, R.K., Dietrich, C.H., Walden, K.K.O., Gordon, E., Sweet, A.D., Podsiadlowski, L., Petersen, M., Simon, C., Takiya, D.M., and Johnson, K.P. Phylogenomics of Auchenorrhyncha (Insecta: Hemiptera) using Transcriptomes: Examining Controversial Relationships via Degeneracy Coding and Interrogation of Gene Conflict. Systematic Entomology. Correspondence should be directed to: Rachel K. Skinner, email@example.com. If you use these data, please cite our paper in Systematic Entomology. The following files can be found in this dataset: Amino_acid_concatenated_alignment.phy: the amino acid alignment used in this analysis, in phylip format. Amino_acid_raxml_partitions.txt (for reference only): the partitions for the amino acid alignment; a partitioned amino acid analysis was not performed in this study. Amino_acid_concatenated_tree.newick: the best maximum likelihood tree with bootstrap values, in newick format. ASTRAL_input_gene_trees.tre: the concatenated gene tree input file for ASTRAL. README_pie_charts.md: explains the scripts and data needed to recreate the pie charts figure from our paper.
README_pie_charts.md corresponds to the following files: ASTRAL_species_tree_EN_only.newick: the species tree with only the effective number (EN) annotation. ASTRAL_species_tree_pp1_only.newick: the species tree with only the posterior probability 1 (main topology) annotation. ASTRAL_species_tree_q1_only.newick: the species tree with only the quartet scores for the main topology (q1). ASTRAL_species_tree_q2_only.newick: the species tree with only the quartet scores for the first alternative topology (q2). ASTRAL_species_tree_q3_only.newick: the species tree with only the quartet scores for the second alternative topology (q3). print_node_key_files.py: script needed to create the following files: node_keys.key (text file with node IDs and topologies), complete_q_scores.key (text file with node IDs and multiplied q scores), and EN_node_vals.key (text file with node IDs and EN values). create_pie_charts_tree.py: script needed to visualize the tree with pie charts, pp1, and EN values plotted at nodes. ASTRAL_species_tree_full_annotation.newick: the species tree with full annotation from the ASTRAL analysis. NOTE: It may be more useful to examine the individual value files if you want to visualize the tree, e.g., in FigTree, since the full annotations are extensive and can make viewing difficult. Complete_NT_concatenated_alignment.phy: the nucleotide alignment that includes unmodified third codon positions. The alignment is in phylip format.
Complete_NT_raxml_partitions.txt: the raxml-style partition file of the nucleotide partitions. Complete_NT_concatenated_tree.newick: the best maximum likelihood tree from the concatenated complete NT analysis, with bootstrap values, in newick format. Complete_NT_partitioned_tree.newick: the best maximum likelihood tree from the partitioned complete NT analysis, with bootstrap values, in newick format. Degeneracy_coded_nt_concatenated_alignment.phy: the degeneracy-coded nucleotide alignment in phylip format. Degeneracy_coded_nt_raxml_partitions.txt: the raxml-style partition file for the degeneracy-coded nucleotide alignment. Degeneracy_coded_nt_concatenated_tree.newick: the best maximum likelihood tree from the degeneracy-coded concatenated analysis, with bootstrap values, in newick format. Degeneracy_coded_nt_partitioned_tree.newick: the best maximum likelihood tree from the degeneracy-coded partitioned analysis, with bootstrap values, in newick format. count_ingroup_taxa.py: script that counts the number of ingroup and/or outgroup taxa present in an alignment.
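The dataset's count_ingroup_taxa.py is not reproduced here. As a rough illustration of what counting ingroup and outgroup taxa in a phylip alignment can involve, the sketch below assumes a relaxed sequential phylip layout (a header line with the taxon and site counts, then one `name sequence` record per line) and a caller-supplied set of outgroup names. The function name, the toy alignment, and the taxon names are all hypothetical, and the authors' script may work quite differently.

```python
def count_taxa(phylip_text, outgroup_names):
    """Count ingroup and outgroup taxa in a relaxed sequential phylip alignment.

    Assumption: after the header, each taxon occupies exactly one
    'name sequence' line (interleaved files are not handled).
    """
    lines = [ln for ln in phylip_text.splitlines() if ln.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])
    names = [ln.split()[0] for ln in lines[1:1 + n_taxa]]
    if len(names) != n_taxa:
        raise ValueError("header taxon count does not match the records")
    outgroup = sum(1 for name in names if name in outgroup_names)
    return {"ingroup": n_taxa - outgroup, "outgroup": outgroup, "sites": n_sites}


# Toy alignment with made-up taxon names, not taken from the dataset.
example = """4 8
Taxon_A ACGTACGT
Taxon_B ACGTACGA
Taxon_C ACGTTCGT
Outgroup_X ACGAACGT
"""
counts = count_taxa(example, {"Outgroup_X"})
```

For the real alignments, the outgroup names would have to come from the paper or the alignment itself; they are not listed in this description.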
Auchenorrhyncha; Hemiptera; alignment; trees
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2019): Data from TRACTION: Fast non-parametric improvement of estimated gene trees. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1747658_V1
Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.
Gene tree correction; horizontal gene transfer; incomplete lineage sorting
Clark, Lindsay V.; Dwiyanti, Maria Stefanie; Anzoua, Kossonou G.; Brummer, Joe E.; Glowacka, Katarzyna; Hall, Megan; Heo, Kweon; Jin, Xiaoli; Lipka, Alexander E.; Peng, Junhua; Yamada, Toshihiko; Yoo, Ji Hye; Yu, Chang Yeon; Zhao, Hua; Long, Stephen P.; Sacks, Erik J. (2019): RAD-seq genotypes for a Miscanthus sinensis diversity panel. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1402948_V1
Genotype calls are provided for a collection of 583 Miscanthus sinensis clones across 1,108,836 loci mapped to version 7 of the Miscanthus sinensis reference genome. Sequence and alignment information for all unique RAD tags is also provided to facilitate cross-referencing to other genomes.
variant call format (VCF); sequence alignment/map format (SAM); miscanthus; single nucleotide polymorphism (SNP); restriction site-associated DNA sequencing (RAD-seq); bioenergy; grass
Buckles, Brittany J; Harmon-Threatt, Alexandra (2019): Data files for "Bee diversity in tallgrass prairies affected by management and its effects on above‐ and below‐ground resources". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0016089_V2
Data used in the paper published in the Journal of Applied Ecology titled "Bee diversity in tallgrass prairies affected by management and its effects on above- and below-ground resources". The Bee Community file contains information on the bees sampled at each site. The first column contains the tallgrass prairie sites sampled; each additional column contains a bee species name in the first row, followed by the individuals recorded. The Plant Community file contains information on the plants sampled at each site. The first column contains the tallgrass prairie sites sampled; each additional column contains a plant species name in the first row, followed by the individuals recorded. The Soil PC1 file contains the soil PC1 values used in the analyses. The first column contains the tallgrass prairie sites sampled, and the second column contains the calculated soil PC1 values.
bee; community; tallgrass prairie; grazing
Kehoe, Adam K.; Torvik, Vetle I. (2019): Datasets from "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8020612_V1
# Overview

These datasets were created in conjunction with the dissertation "Predicting Controlled Vocabulary Based on Text and Citations: Case Studies in Medical Subject Headings in MEDLINE and Patents," by Adam Kehoe. The datasets consist of the following:

* twin_not_abstract_matched_complete.tsv: a tab-delimited file consisting of pairs of MEDLINE articles with identical titles, authors and years of publication. This file contains the PMIDs of the duplicate publications, as well as their medical subject headings (MeSH) and three measures of their indexing consistency.
* twin_abstract_matched_complete.tsv: the same as above, except that the MEDLINE articles also have matching abstracts.
* mesh_training_data.csv: a comma-separated file containing the training data for the model discussed in the dissertation.
* mesh_scores.tsv: a tab-delimited file containing a pairwise similarity score based on word embeddings, and the MeSH hierarchy relationship.

## Duplicate MEDLINE Publications

Both twin_not_abstract_matched_complete.tsv and twin_abstract_matched_complete.tsv have the same structure. They have the following columns:

1. pmid_one: the PubMed unique identifier of the first paper
2. pmid_two: the PubMed unique identifier of the second paper
3. mesh_one: a list of medical subject headings (MeSH) from the first paper, delimited by the "|" character
4. mesh_two: a list of medical subject headings from the second paper, delimited by the "|" character
5. hoopers_consistency: Hooper's consistency between the MeSH of the first and second paper
6. nonhierarchicalfree: a word-embedding-based consistency score described in the dissertation
7. hierarchicalfree: a word-embedding-based consistency score additionally limited by the MeSH hierarchy, described in the dissertation

## MeSH Training Data

The mesh_training_data.csv file contains the training data for the model discussed in the dissertation. It has the following columns:

1. pmid: the PubMed unique identifier of the paper
2. term: a candidate MeSH term
3. cit_count: the log of the frequency of the term in the citation candidate set
4. total_cit: the log of the total number of the paper's citations
5. citr_count: the log of the frequency of the term in the citations of the paper's citations
6. total_citofcit: the log of the total number of the citations of the paper's citations
7. absim_count: the log of the frequency of the term in the AbSim candidate set
8. total_absim_count: the log of the total number of AbSim records for the paper
9. absimr_count: the log of the frequency of the term in the citations of the AbSim records
10. total_absimr_count: the log of the total number of citations of the AbSim record
11. log_medline_frequency: the log of the frequency of the candidate term in MEDLINE
12. relevance: a binary indicator (True/False) of whether the candidate term was assigned to the target paper

## Cosine Similarity

The mesh_scores.tsv file contains a pairwise list of all MeSH terms, including their cosine similarity based on the word embedding described in the dissertation. Because the MeSH hierarchy is also used in many of the evaluation measures, the relationship of the term pair is also included. It has the following columns:

1. mesh_one: a string of the first MeSH heading
2. mesh_two: a string of the second MeSH heading
3. cosine_similarity: the cosine similarity between the terms
4. relationship_type: a string identifying the relationship type, consisting of none, parent/child, sibling, ancestor and direct (terms are identical, i.e. a direct hierarchy match)

The mesh_model.bin file is a binary word2vec C format file containing the MeSH term embeddings. It was generated using version 3.7.2 of the Python gensim library (https://radimrehurek.com/gensim/). For an example of how to load the model file, see https://radimrehurek.com/gensim/models/word2vec.html#usage-examples, specifically the directions for loading the "word2vec C format."
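To make the hoopers_consistency column easier to interpret, here is a minimal sketch that recomputes a consistency score from the "|"-delimited mesh_one and mesh_two fields. It assumes the common textbook form of Hooper's measure, A / (A + M + N), where A is the number of terms shared by both records and M and N are the terms unique to each. The function name and the example headings are illustrative only, and the dissertation may treat details such as qualifiers or major-topic markers differently.

```python
def hoopers_consistency(mesh_one, mesh_two, delimiter="|"):
    """Hooper's consistency A / (A + M + N) for two delimited MeSH lists.

    A = terms in both lists, M and N = terms unique to each list, so the
    denominator is the size of the union of the two term sets.
    """
    terms_one = {t.strip() for t in mesh_one.split(delimiter) if t.strip()}
    terms_two = {t.strip() for t in mesh_two.split(delimiter) if t.strip()}
    union = terms_one | terms_two
    if not union:
        return 0.0  # assumption: two empty term lists score zero
    return len(terms_one & terms_two) / len(union)


# Made-up row shaped like the twin_*.tsv files.
row = {
    "mesh_one": "Humans|Mice|Neoplasms",
    "mesh_two": "Humans|Neoplasms|Animals",
}
score = hoopers_consistency(row["mesh_one"], row["mesh_two"])
```

In this toy row, two headings are shared out of four distinct headings, so the score is 0.5.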
MEDLINE;MeSH;Medical Subject Headings;Indexing
planned publication date: 2019-12-22
Zachwieja, Alexandra (2019): Data for: Climate, competition, and environment shaped human land use in Late Pleistocene Southeast Asia. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8284108_V1
Dataset providing the calculation of a Competition Index (CI) for Late Pleistocene carnivore guilds in Laos and Vietnam and their relationship to humans. Prey mass spectra, prey focus masses, and prey class raw data can be used to calculate the CI following Hemmer (2004). Mass estimates were calculated for each species following Van Valkenburgh (1990). Full citations to the methodological papers are included as relationships with other resources.
competition; Southeast Asia; carnivores; humans
Mishra, Shubhanshu (2019): Wikipedia category embeddings - Node2Vec, Poincare, Elmo. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4551278_V1
Wikipedia category tree embeddings based on the Wikipedia SQL dump dated 2017-09-20 (https://archive.org/download/enwiki-20170920), created using the following algorithms:
* Node2vec
* Poincare embedding
* Elmo model on the category title

The following files are present:
* wiki_cat_elmo.txt.gz (15G) - Elmo embeddings. Format: category_name (space replaced with "_") <tab> 300 dim space separated embedding.
* wiki_cat_elmo.txt.w2v.gz (15G) - Elmo embeddings. Format: word2vec format, can be loaded using Gensim Word2VecKeyedVector.load_word2vec_format.
* elmo_keyedvectors.tar.gz - Gensim Word2VecKeyedVector format of Elmo embeddings. Nodes are indexed using
* node2vec.tar.gz (3.4G) - Gensim word2vec model which has node2vec embedding for each category identified using the position (starting from 0) in category.txt
* poincare.tar.gz (1.8G) - Gensim poincare embedding model which has poincare embedding for each category identified using the position (starting from 0) in category.txt
* wiki_category_random_walks.txt.gz (1.5G) - Random walks generated by node2vec algorithm (https://github.com/aditya-grover/node2vec/tree/master/node2vec_spark), each category identified using the position (starting from 0) in category.txt
* categories.txt - One category name per line (with spaces). The line number (starting from 0) is used as category ID in many other files.
* category_edges.txt - Category edges based on category names (with spaces). Format: from_category <tab> to_category
* category_edges_ids.txt - Category edges based on category ids, each category identified using the position (starting from 1) in category.txt
* wiki_cats-G.json - NetworkX format of category graph, each category identified using the position (starting from 1) in category.txt

Software used:
* https://github.com/napsternxg/WikiUtils - Processing sql dumps
* https://github.com/napsternxg/node2vec - Generate random walks for node2vec
* https://github.com/RaRe-Technologies/gensim (version 3.4.0) - Generating node2vec embeddings from random walks generated using node2vec algorithm
* https://github.com/allenai/allennlp (version 0.8.2) - Generate elmo embeddings for each category title

Code used:
* wiki_cat_node2vec_commands.sh - Commands used to
* wiki_cat_generate_elmo_embeddings.py - generate elmo embeddings
* wiki_cat_poincare_embedding.py - generate poincare embeddings
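Because several files refer to categories by their line position, a small lookup table is usually the first thing a user needs. The sketch below shows one way to build it and to parse a single line of the tab-separated Elmo text format described above. The function names and the toy category names are hypothetical. The description gives a 0-based position for some files and a 1-based position for others, so the starting ID is left as a parameter rather than assumed.

```python
def build_category_index(lines, first_id=0):
    """Map category ID -> category name from categories.txt-style lines.

    first_id is 0 or 1 depending on which file the IDs come from
    (the dataset description mentions both conventions).
    """
    return {first_id + i: line.rstrip("\n") for i, line in enumerate(lines)}


def parse_elmo_line(line):
    """Split 'category_name<TAB>v1 v2 ... vN' into (name, list of floats)."""
    name, vector_text = line.rstrip("\n").split("\t", 1)
    return name, [float(x) for x in vector_text.split()]


# Toy inputs; real files are far larger and real vectors have 300 dimensions.
categories = ["Main topic classifications\n", "Natural sciences\n"]
ids_from_zero = build_category_index(categories)
ids_from_one = build_category_index(categories, first_id=1)
name, vector = parse_elmo_line("Natural_sciences\t0.1 -0.2 0.3\n")
```

For the multi-gigabyte .gz files, the same functions can be fed line by line from `gzip.open(path, "rt")` instead of loading everything into memory.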
Wikipedia; Wikipedia Category Tree; Embeddings; Elmo; Node2Vec; Poincare;
Daniels, Melissa; Larson, Eric (2019): Data for "Effects of forest windstorm disturbance on invasive plants in protected areas of southern Illinois, USA". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1401121_V1
We studied the effect of windstorm disturbance on forest invasive plants in southern Illinois. This data includes raw data on plant abundance at survey points, compiled data used in statistical analyses, and spatial data for surveyed plots and units. This file package also includes a readme.doc file that describes the data in detail, including attribute descriptions.
tornado, blowdowns, derecho, invasive plants, Shawnee National Forest, southern Illinois
MacDonald, Sean; Ward, Michael; Sperry, Jinelle (2019): Manipulating social information to promote frugivory by birds on a Hawaiian Island. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9223847_V1
conspecific attraction; fruit-eating bird; Hawaiian flora; playback experiment; seed dispersal; social information; Zosterops japonicus
Miller, Andrew; Raudabaugh, Daniel (2019): Data from Species Distribution, Phylogenetic Structure, and Functional Roles of Detritius Inhabiting Fungi Across Contrasting Aquatic Environments. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6862941_V1
This dataset contains the data files for the PhD thesis entitled: Species Distribution, Phylogenetic Structure and Functional Roles of Detritus Inhabiting Fungi Across Contrasting Aquatic Environments by Daniel Bruce Raudabaugh. More specifically, it contains the forward Illumina reads for ITS1, ITS2, Beta-tubulin and LSU in addition to the index files and map files needed to process the reads in QIIME 1.9.1. The sequences represent environmental sequencing from detrital samples from Black Moshannon State park (Pennsylvania), Pepper Run (Pennsylvania), Nescopeck State park (Pennsylvania), Tannersville Cranberry bog (Pennsylvania), Beulah bog (Wisconsin) and Honey Creek Nature preserve (Wisconsin). The term Peatland includes both bog and fen habitats. Peatland sites consisted of Black Moshannon State park, Tannersville Cranberry bog and Beulah bog. Stream sites consisted of Pepper Run, Nescopeck State park and Honey Creek Nature preserve. The data set also includes each OTU table with taxonomic determination and the OTU representative sequence, the ITS1 alignment files for each fungal class, and the final RAxML phylogenetic trees. MiSeq v2 platform run number 1: QIIME Map file.txt: the map file needed to parse information in QIIME 1.9.1. This file is used in conjunction with the ITS1, ITS2, beta-tubulin, and LSU R1 forward read and Index files. ITS1 Index file.fastq: file is associated with ITS1 R1 forward Illumina reads.fastq. ITS1 R1 forward Illumina reads.fastq: ITS1 forward reads from the Illumina MiSeq v2 250 bp run. Reads were generated using PCR product amplified using ITS1F (5'-CTTGGTCATTTAGAGGAAGTAA-3') and ITS2 (5'-GCTGCGTTCTTCATCGATGC-3') primers. ITS2 Index file.fastq: file is associated with ITS2 R1 forward Illumina reads.fastq. ITS2 R1 forward Illumina reads.fastq: ITS2 forward reads from the Illumina MiSeq v2 250 bp run. Reads were generated using PCR product amplified using fITS7 (5'-GTGARTCATCGAATCTTTG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3') primers.
Beta tubulin Index file.fastq: file is associated with Beta tubulin forward Illumina reads.fastq. Beta tubulin forward Illumina reads.fastq: Beta tubulin forward reads from the Illumina MiSeq v2 250 bp run. Reads were generated using PCR product amplified using BT2AF (5'-GGTAACCAAATCGGTGCTGCTTTC-3') and BT2BR (5'-ACCCTCAGTGTAGTGACCCTTGGC-3') primers. MiSeq v2 platform run number 2: QIIME Map file: the map file needed to parse information in QIIME 1.9.1. This file is used in conjunction with the LSU R1 forward read and Index files. LSU Index file.fastq: file is associated with LSU R1 forward Illumina reads.fastq. LSU R1 forward Illumina reads.fastq: LSU forward reads from the Illumina MiSeq v2 250 bp run. Reads were generated using PCR product amplified using LROR (5'-CCGCTGAACTTAAGCATATCA-3') and LR3 (5'-CCGTGTTTCAAGACGGG-3') primers. OTU tables: ITS1 OTU table with taxonomy and sequence data.csv: Standard OTU table with assigned taxonomy from Unite, NCBI, and CONSTAX. The representative sequence for each OTU is included. ITS2 OTU table with taxonomy and sequence data.csv: Standard OTU table with assigned taxonomy from Unite, NCBI, and CONSTAX. The representative sequence for each OTU is included. Beta tubulin OTU table with taxonomy and sequence data.csv: Standard OTU table with assigned taxonomy from NCBI. The representative sequence for each OTU is included. LSU OTU table with taxonomy and sequence data.csv: Standard OTU table with assigned taxonomy from SILVA and NCBI. The representative sequence for each OTU is included. Alignment files and resulting RAxML trees for Class level community phylogenetic analyses: Alignments were completed in PASTA using the MAFFT alignment option. All alignment files contain backbone sequences obtained from TBAS or NCBI in addition to OTU sequences. All phylogenetic trees were completed in PASTA using the RAxML post-processing option.
Alignment_file_Agarcomycetes_trimmed_Phylip RAxML_Agaricomycetes_ITS1_Tree.tre Alignment_file_Dothidiomycetes_trimmed_Phylip RAxML_Dothideomycetes_ITS1_Tree.tre Alignment_file_Eurotiomycetes_trimmed_Phylip: RAxML_Eurotiomycetes_ITS1_Tree.tre Alignment_file_Leotiomycetes_Trimmed_Phylip RAxML_Leotiomycetes_ITS1_Tree.tre Alignment_file_Microbotryomycetes_trimmed_phylip RAxML_Microbotryomycetes_ITS1_Tree.tre Alignment_file_Mortierellomycetes_Trimmed_phylip RAxML_Mortierella_ITS1_Tree.tre Alignment_file_Saccharomycetes_Trimmed_Phylip RAxML_Saccharomycetes_ITS1_Tree.tre Alignment_file_Sordariomycetes_trimmed_Phylip RAxML_Sordriomycetes_ITS1_Tree.tre Alignment_file_Tremellomycetes_trimmed_phylip RAxML_Tremellomycetes_ITS1_Tree.tre Alignment file and resulting RAxML tree for Community level phylogenetic analyses Alignment was completed in PASTA using the MAFFT alignment option and resulting tree was completed using the RAxML post-processing option. Alignment_file_ LSU_RDP_fungal_community.aln LSU_RDP_fungal_community_Tree.tre
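One detail worth noting for anyone working with these reads outside QIIME: the fITS7 primer contains the IUPAC ambiguity code R (A or G), so a plain string comparison will miss half of the valid primer sequences. The sketch below expands IUPAC codes into a regular expression and checks whether a read begins with the primer. It is only an illustration with a made-up read; it is not part of the authors' QIIME 1.9.1 workflow, and the function name is hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a primer containing IUPAC codes into a regex pattern."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


FITS7 = "GTGARTCATCGAATCTTTG"  # primer sequence as given in the description
pattern = primer_to_regex(FITS7)

# Made-up read that begins with one of the two sequences fITS7 can represent.
read = "GTGAGTCATCGAATCTTTGAACGCACATTG"
starts_with_primer = pattern.match(read) is not None
```

`pattern.match` anchors at the start of the read; `pattern.search` would find the primer anywhere in the sequence.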
ITS1 forward reads; Illumina; peatlands; streams; bogs; fens
Rapti, Zoi (2019): Control of bacterial infections via antibiotic-induced proviruses. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9721455_V1
Software (Matlab .m files) for the article: Modeling the control of bacterial infections via antibiotic-induced proviruses. The files can be used to reproduce the analysis and figures in the article.
Matlab codes; antibiotic-induced dynamics
Krichels, Alexander (2019): Data For: Iron redox reactions can drive microtopographic variation in upland soil carbon dioxide and nitrous oxide emissions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8512100_V1
These files contain the data presented in the manuscript entitled "Iron redox reactions can drive microtopographic variation in upland soil carbon dioxide and nitrous oxide emissions".
Iron; redox; carbon dioxide; nitrous oxide; chemodenitrification; Feammox; dissimilatory iron reduction; upland soils; flooding; global change
Sashittal, Palash; El-Kebir, Mohammed (2019): SharpTNI Results. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9734610_V1
Results generated using SharpTNI on data collected from the 2014 Ebola outbreak in Sierra Leone.
Miller, Andrew; Raudabaugh, Daniel (2019): Supplemental data sets for Raudabaugh et al., Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1530363_V1
The data set contains the supplemental data sets for the manuscript entitled "Where are they hiding? Testing the body snatchers hypothesis in pyrophilous fungi." Environmental sampling: Amplification of nuclear DNA regions (ITS1 and ITS2) was completed using the Fluidigm Access Array, and the resulting amplicons were sequenced on an Illumina MiSeq v2 platform using rapid 2 × 250 nt paired-end reads. Amplicons for the Illumina sequencing run were size-selected into <500 nt and >500 nt sub-pools, then remixed (<500 nt : >500 nt) by nM concentration in a 1x:3x proportion. All amplification and sequencing steps were performed at the Roy J. Carver Biotechnology Center at the University of Illinois Urbana-Champaign. ITS1 region primers consisted of ITS1F (5'-CTTGGTCATTTAGAGGAAGTAA-3') and ITS2 (5'-GCTGCGTTCTTCATCGATGC-3'). ITS2 region primers consisted of fITS7 (5'-GTGARTCATCGAATCTTTG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3'). Supplemental files 1 through 5 contain the raw data files. Supplemental 1 is the ITS1 Illumina MiSeq forward reads and Supplemental 2 is the corresponding index files. Supplemental 3 is the ITS2 Illumina MiSeq forward reads and Supplemental 4 is the corresponding index files. Supplemental 5 is the map file needed to process the forward reads and index files in QIIME. Supplemental 6 and 7 contain the resulting QIIME 1.9.1 OTU tables along with UNITE, NCBI, and CONSTAX taxonomic assignments in addition to the representative OTU sequence. Numeric samples within the OTU tables correspond to the following: 1 Brachythecium sp. 2 Usnea cornuta 3 Dicranum sp. 4 Leucodon julaceus 5 Lobaria quercizans 6 Rhizomnium sp. 7 Dicranum sp. 8 Thuidium delicatulum 9 Myelochroa aurulenta 10 Atrichum angustatum 11 Dicranum sp. 12 Hypnum sp. 13 Atrichum angustatum 14 Hypnum sp. 15 Thuidium delicatulum 16 Leucobryum sp. 17 Polytrichum commune 18 Atrichum angustatum 19 Atrichum angustatum 20 Atrichum crispulum 21 Bryaceae 22 Leucobryum sp.
23 Conocephalum conicum 24 Climacium americanum 25 Atrichum angustatum 26 Huperzia serrata 27 Polytrichum commune 28 Diphasiastrum sp. 29 Anomodon attenuatus 30 Bryoandersonia sp. 31 Polytrichum commune 32 Thuidium delicatulum 33 Brachythecium sp. 34 Leucobryum glaucum 35 Bryoandersonia sp. 36 Anomodon attenuatus 37 Pohlia sp. 38 Cinclidium sp. 39 Hylocomium splendens 40 Polytrichum commune 41 negative control 42 Soil 43 Soil 44 Soil 45 Soil 46 Soil 47 Soil. If a sample number is not present within the OTU table, either no sequences were obtained or no sequences passed the quality filtering step in QIIME. Supplemental 8 contains the summary of unique species per location.
Rezapour, Rezvaneh; Diesner, Jana (2019): Expanded Morality Lexicon. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3805242_V1.1
This lexicon is the expanded/enhanced version of the Moral Foundation Dictionary created by Graham and colleagues (Graham et al., 2013). Our Enhanced Morality Lexicon (EML) contains a list of 4,636 morality related words. This lexicon was used in the following paper; please cite this paper if you use this resource in your work. Rezapour, R., Shah, S., & Diesner, J. (2019). Enhancing the measurement of social effects by capturing morality. Proceedings of the 10th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN. In addition, please consider citing the original MFD paper: Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology (Vol. 47, pp. 55-130). https://doi.org/10.1016/B978-0-12-407236-7.00002-4
Soliman, Aiman; Mackay, Andrew; Schmidt, Arthur; Allan, Brian; Wang, Shaowen (2018): Dataset for: Quantifying the geographic distribution of building coverage across the US for urban sustainability studies. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4137411_V1
A complete building coverage area dataset (i.e. area occupied by building structures, excluding other built surfaces such as roads, parking lots, and public parks) at the level of census block groups for the contiguous United States (CONUS). The dataset was assembled based on an ensemble prediction of nonlinear hierarchical models to account for spatial heterogeneities in the distribution of built surfaces across different urban communities. Percentage of impervious land and housing density were used as predictors of the estimated area of buildings. Cross-validation results showed that the product estimates the area covered by buildings with a mean error of 0.049%.
Building Coverage Area; Urban Geography; Regional; Sustainability; US Census Block Groups; CONUS Data
Tomkin, Jonathan (2018): COPUS observations for NSF WIDER study. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5634345_V1
Sixty undergraduate STEM lecture classes were observed across 14 departments at the University of Illinois Urbana-Champaign in 2015 and 2016. We selected the classes to observe using purposive sampling techniques with the objectives of (1) collecting classroom observations that were representative of the STEM courses offered; (2) conducting observations on non-test, typical class days; and (3) comparing these classroom observations using the Classroom Observation Protocol for Undergraduate STEM (COPUS) to record the presence and frequency of active learning practices utilized by Community of Practice (CoP) and non-CoP instructors. Decimal values are the result of combined observations. All COPUS codes listed are from the Smith et al. (2013) paper, "The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize STEM Classroom Practices". For more information on the data collection process, see "Evidence that communities of practice are associated with active learning in large STEM lectures" by Tomkin et al. (2019) in the International Journal of STEM Education.
COPUS, Community of Practice
Wang, Wenrui; Wang, Tao; Amin, Vivek P.; Wang, Yang; Radhakrishnan, Anil; Davidson, Angie; Allen, Shane R.; Silva, T. J.; Ohldag, Hendrik; Balzar, Davor; Zink, Barry L.; Haney, Paul M.; Xiao, John Q.; Cahill, David G.; Lorenz, Virginia O.; Fan, Xin (2019): Dataset for "Anomalous Spin-Orbit Torques in Magnetic Single-Layer Films". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7281207_V1
This dataset provides the raw data, code and related figures for the paper, "Anomalous Spin-Orbit Torques in Magnetic Single-Layer Films."
spintronics; spin-orbit torques; magnetic materials
Rando, Halie; Wadlington, William; Johnson, Jennifer; Stutchman, Jeremy; Trut, Lyudmila; Farré, Marta; Kukekova, Anna (2019): Red Fox (Vulpes vulpes) Y-Chromosome Sequence. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4447017_V1
This dataset contains raw data associated with the red fox Y-chromosome assembly (see https://doi.org/10.3390/genes10060409). It includes a fasta file of the 171 scaffolds from the red fox reference genome assembly identified as likely to contain Y-chromosome sequence, the raw BLAST results, and the ABySS assemblies described in the manuscript.
Y-chromosome; carnivore; Vulpes vulpes; sex chromosomes; MSY; Y-chromosome genes; copy-number variation; BCORY2; UBE1Y; next-generation sequencing
Hahn, Jim (2019): Frequent pattern subject transactions from the University of Illinois Library (2016 - 2018). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9440404_V1
The data are provided to illustrate methods in evaluating systematic transactional data reuse in machine learning. A library account-based recommender system was developed using machine learning processing over transactional data of 383,828 transactions (or check-outs) sourced from a large multi-unit research library. The machine learning process utilized the FP-growth algorithm over the subject metadata associated with physical items that were checked-out together in the library. The purpose of this research is to evaluate the results of systematic transactional data reuse in machine learning. The analysis herein contains a large-scale network visualization of 180,441 subject association rules and corresponding node metrics.
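For readers unfamiliar with association rules, the sketch below shows what the support and confidence of a subject-pair rule mean, using brute-force pair counting over a handful of invented check-out transactions. This is deliberately not FP-growth: FP-growth finds the same frequent itemsets far more efficiently by compressing transactions into a prefix tree, which matters at the scale of 383,828 transactions. The function name, threshold, and subject headings are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Invented check-outs: each list holds the subjects of items borrowed together.
checkouts = [
    ["Botany", "Ecology"],
    ["Botany", "Ecology", "Soils"],
    ["Botany", "Soils"],
    ["Ecology"],
]
rules = pair_rules(checkouts, min_support=0.5)
```

Here every check-out that includes "Soils" also includes "Botany", so the rule Soils -> Botany has confidence 1.0 even though its support is only 0.5.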
evaluating machine learning; network science; FP-growth; WEKA; Gephi; personalization; recommender systems
Krichels, Alexander (2019): Data for: Dynamic controls on field-scale soil nitrous oxide hot spots and hot moments across a microtopographic gradient. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9733959_V1
This dataset includes all data presented in the manuscript entitled: "Dynamic controls on field-scale soil nitrous oxide hot spots and hot moments across a microtopographic gradient"
denitrification; depressions; microtopography; nitrous oxide; soil oxygen; soil temperature
Detmer, Thomas; Wahl, David (2019): Trophic cascade strength is influenced by size frequency distribution of primary consumers and size-selective predation: examined with mesocosms and modeling. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3292716_V1
Data set of trophic cascades in mesocosm experiments for zooplankton (biomass and body size) and phytoplankton (chlorophyll a concentration) caused by Bluegill, as well as zooplankton production in those same treatment groups. Zooplankton were collected by tube sampler and phytoplankton were collected through grab samples.
Trophic cascades; size-selective predation; compensatory mechanisms; biomanipulation; invasive fish; Daphnia; Moina
Kansara, Yogeshwar; Hoang, Linh (2019): Articles With PubMed Identifiers. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4623305_V1
The data file contains a list of articles that have PubMed identifiers, which were used in a project associated with the manuscript "An in-situ evaluation of the RCT Tagger using 7413 articles included in 570 Cochrane reviews with RCT-only inclusion criteria".
Cochrane reviews; Randomized controlled trials; RCT; Automation; Systematic reviews