Illinois Data Bank
Deposit Dataset
Find Data
Policies
Guides
Contact Us
Log in with NetID
University Library, University of Illinois at Urbana-Champaign
Displaying datasets 1 - 25 of 585 in total
Subject Area
Life Sciences (312)
Social Sciences (128)
Physical Sciences (85)
Technology and Engineering (51)
Uncategorized (8)
Arts and Humanities (1)
Funder
U.S. National Science Foundation (NSF) (175)
Other (174)
U.S. Department of Energy (DOE) (60)
U.S. National Institutes of Health (NIH) (52)
U.S. Department of Agriculture (USDA) (33)
Illinois Department of Natural Resources (IDNR) (14)
U.S. Geological Survey (USGS) (6)
U.S. National Aeronautics and Space Administration (NASA) (5)
Illinois Department of Transportation (IDOT) (3)
U.S. Army (2)
Publication Year
2021 (108)
2022 (108)
2020 (96)
2019 (72)
2023 (72)
2018 (59)
2017 (35)
2016 (30)
2024 (5)
License
CC0 (328)
CC BY (240)
custom (17)
published: 2023-12-06
Starbuck, Clarissa; DeSchepper, Logan; Hoggatt, Meredith; O'Keefe, Joy (2023): Data for Tradeoffs in sound quality and cost for passive acoustic devices. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4200947_V1
This dataset accompanies an article published in the journal Bioacoustics: "Tradeoffs in sound quality and cost for passive acoustic devices", https://doi.org/10.1080/09524622.2023.2290715. The dataset contains measurements for acoustic call files for free-flying bats simultaneously recorded on both Audiomoth and Anabat Swift passive acoustic recording devices in a conservation area in northeastern Missouri, USA. We paired calls from the two devices and compared indicators of recording quality measured in a proprietary program (Bat Call Identification Software). The dataset also contains a file enumerating the proportions of calls classified as low frequency, mid frequency, or Myotis (three phonic groups) for each type of recording device. The data were used to compare the quality and sensitivity of the two devices. The scripts for modeling procedures and figures are included in the dataset.
keywords:
Bats; echolocation; passive acoustic monitoring; sensors
published: 2023-10-26
Digrado, Anthony; Montes, Christopher M.; Baxter, Ivan; Ainsworth, Elizabeth (2023): Soybean seed quality response to eCO2 data files. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6453957_V1
This data set is related to a SoyFACE experiment conducted in 2004, 2006, 2007, and 2008 with the soybean cultivars Loda and HS93-4118. The experiment looked at how seed elements were affected by elevated CO2 and yield. ---- The ionomic_data.txt file contains the ionomic data (mg/kg) for the two cultivars. The cultivars, years, treatment, and the plot from which the samples were collected are given for each entry. ---- The yield_data.txt file contains the yield data for the two cultivars (seed yield in kg/ha, seed yield in bu/a, Protein (%), Oil (%)). The cultivars, years, treatment, and the plot from which the samples were collected are given for each entry. ---- The meteorological_data.txt file contains the meteorological data recorded by a weather station located ~3 km from the experimental site (Willard Airport Champaign). The data used cover the periods May 28 to September 24 in 2004, May 25 to September 24 in 2006, May 23 to September 17 in 2007, and June 16 to October 24 in 2008.
The headers are explained below:
year = year
month = month
day = day
max_wind_gust = maximum daily wind gust (miles per hour)
xwser = error flag for maximum daily wind gust
avg_wind_speed = average daily wind speed (miles per hour)
awser = error flag for average daily wind speed
avg_wind_dir = average daily wind direction (degrees, clockwise from north)
awder = error flag for average daily wind direction
sol_rad = total daily solar radiation (megajoules per square meter)
soler = error flag for total daily solar radiation
max_air_temp = daily maximum air temperature (degrees Fahrenheit)
xater = error flag for daily maximum air temperature
min_air_temp = daily minimum air temperature (degrees Fahrenheit)
nater = error flag for daily minimum air temperature
avg_air_temp = average daily air temperature (degrees Fahrenheit)
aater = error flag for average daily air temperature
max_rel_hum = daily maximum relative humidity (percent)
xrher = error flag for daily maximum relative humidity
min_rel_hum = daily minimum relative humidity (percent)
nrher = error flag for daily minimum relative humidity
avg_rel_hum = average daily relative humidity (percent)
arher = error flag for average daily relative humidity
avg_dewpt_temp = average daily dew point temperature (degrees Fahrenheit)
adper = error flag for average daily dew point temperature
precip = total daily precipitation (inches)
pcer = error flag for total daily precipitation
pot_evapot = total potential evapotranspiration (inches)
pevaper = error flag for total potential evapotranspiration
max_soiltemp_4in = daily maximum 4-inch soil temperature under sod (degrees Fahrenheit)
xst4er = error flag for daily maximum 4-inch soil temperature under sod
min_soiltemp_4in = daily minimum 4-inch soil temperature under sod (degrees Fahrenheit)
nst4er = error flag for daily minimum 4-inch soil temperature under sod
avg_soiltemp_4in = average daily 4-inch soil temperature under sod (degrees Fahrenheit)
ast4er = error flag for average daily 4-inch soil temperature under sod
max_soiltemp_8in = daily maximum 8-inch soil temperature under sod (degrees Fahrenheit)
xst8er = error flag for daily maximum 8-inch soil temperature under sod
min_soiltemp_8in = daily minimum 8-inch soil temperature under sod (degrees Fahrenheit)
nst8er = error flag for daily minimum 8-inch soil temperature under sod
avg_soiltemp_8in = average daily 8-inch soil temperature under sod (degrees Fahrenheit)
ast8er = error flag for average daily 8-inch soil temperature under sod
max_soiltemp_4in_bare = daily maximum 4-inch soil temperature under bare soil (degrees Fahrenheit)
xst4bareer = error flag for daily maximum 4-inch soil temperature under bare soil
min_soiltemp_4in_bare = daily minimum 4-inch soil temperature under bare soil (degrees Fahrenheit)
nst4bareer = error flag for daily minimum 4-inch soil temperature under bare soil
avg_soiltemp_4in_bare = average daily 4-inch soil temperature under bare soil (degrees Fahrenheit)
ast4bareer = error flag for average daily 4-inch soil temperature under bare soil
max_soiltemp_2in_bare = daily maximum 2-inch soil temperature under bare soil (degrees Fahrenheit)
xst2bareer = error flag for daily maximum 2-inch soil temperature under bare soil
min_soiltemp_2in_bare = daily minimum 2-inch soil temperature under bare soil (degrees Fahrenheit)
nst2bareer = error flag for daily minimum 2-inch soil temperature under bare soil
avg_soiltemp_2in_bare = average daily 2-inch soil temperature under bare soil (degrees Fahrenheit)
ast2bareer = error flag for average daily 2-inch soil temperature under bare soil
site = station name
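Because the meteorological file reports temperatures in degrees Fahrenheit, wind speeds in miles per hour, and precipitation in inches, a reader working in SI units may want to convert each record before analysis. The following is a minimal Python sketch of that conversion. The function name `to_si` and the row-as-dictionary layout are illustrative assumptions; only the column names are taken from the header descriptions, and the file's delimiter and the meaning of the error-flag values are not specified here.

```python
# Illustrative sketch: convert selected columns of one meteorological record to SI units.
# Column names come from the header descriptions; the dict-based row is an assumption.

FAHRENHEIT_COLUMNS = ("max_air_temp", "min_air_temp", "avg_air_temp", "avg_dewpt_temp")
MPH_COLUMNS = ("max_wind_gust", "avg_wind_speed")
INCH_COLUMNS = ("precip", "pot_evapot")


def to_si(row):
    """Return a copy of row with known columns converted to degrees C, m/s, and mm."""
    converted = dict(row)
    for name in FAHRENHEIT_COLUMNS:
        if name in row:
            converted[name] = (float(row[name]) - 32.0) * 5.0 / 9.0
    for name in MPH_COLUMNS:
        if name in row:
            converted[name] = float(row[name]) * 0.44704  # 1 mph = 0.44704 m/s
    for name in INCH_COLUMNS:
        if name in row:
            converted[name] = float(row[name]) * 25.4  # 1 inch = 25.4 mm
    return converted


# Hypothetical record, for illustration only (not taken from the dataset).
example = to_si({"year": "2004", "max_air_temp": "86", "avg_wind_speed": "10", "precip": "0.5"})
```

Columns that are not listed in the three tuples, such as `year` or the error flags, pass through unchanged.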
keywords:
protein; oil; mineral; SoyFACE; nutrient; Glycine max; soybean; yield; CO2; agriculture; climate change
published: 2023-12-01
Hohoff, Tara; Deppe, Jill (2023): Little brown occupancy and associated landcover data from McHenry County, Illinois. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0365076_V1
Mist netting data for little brown bats (Myotis lucifugus) in McHenry County, Illinois and output of acoustic data processed using Kaleidoscope (Version 5.1.9, Bats of North America 5.1.0; Wildlife Acoustics) auto-identification software. Associated survey metadata and landcover metrics calculated using Fragstats included.
keywords:
little brown bats; mist netting; acoustics
planned publication date: 2024-01-31
Kent, Angela; Bohn, Martin (2024): Nitrogen cycling activity associated with nitrification-inhibiting maize near-isogenic lines. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4878391_V1
This dataset contains field study design parameters, plant performance metrics, and nitrogen cycling rates associated with a field experiment that compared nitrification rates between maize lines with and without nitrification inhibition loci, and nitrogen fixation rates with and without a nitrogen-fixing inoculant product. The overarching goal was to evaluate nitrogen fixation by a diazotroph inoculant and retention of nitrogen in the rhizosphere via a novel nitrification inhibition phenotype of maize.
keywords:
maize; microbiome; nitrogen cycling; nitrification; nitrogen fixation
published: 2023-11-14
Gotsis, Dimitrios; Kelkar, Varun; Deshpande, Rucha; Brooks, Frank; KC, Prabhat; Myers, Kyle; Zeng, Rongping; Anastasio, Mark (2023): Data for the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2773204_V3
This repository contains the training dataset associated with the 2023 Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics (DGM-Image Challenge), hosted by the American Association of Physicists in Medicine. This dataset contains more than 100,000 8-bit images of size 512x512. These images emulate coronal slices from anthropomorphic breast phantoms adapted from the VICTRE toolchain [1], with assigned X-ray attenuation coefficients relevant for breast computed tomography. Also included are the labels indicating the breast type. The challenge has now concluded. More information about the challenge can be found here: https://www.aapm.org/GrandChallenge/DGM-Image/. * New in V3: we added a CSV file containing the image breast type labels and example images (PNG).
keywords:
Deep generative models; breast computed tomography
planned publication date: 2023-12-18
Edmonds, Devin; Adamovicz, Laura; Allender, Matthew; Colton, Andrea; Randy, Nyboer; Michael, Dreslik (2023): Data for Evaluating Population Persistence of Ornate Box Turtles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6384815_V1
We conducted long-term capture-mark-recapture surveys on two isolated ornate box turtle (Terrapene ornata) populations in northern Illinois, USA. This dataset provides the capture history strings and additional demographic information used for estimating population vital rates with robust design capture-mark-recapture models. The vital rates were then used in a stage-based population projection matrix model for each population.
keywords:
demography; capture-mark-recapture; vital rates; conservation; wildlife ecology
published: 2023-10-26
Louie, Allison Y.; Rund, Laurie A.; Komiyama-Kasai, Karin A.; Weisenberger, Kelsie E.; Stanke, Kayla L.; Larsen, Ryan J.; Leyshon, Brian J.; Kuchan, Matthew J.; Das, Tapas; Steelman, Andrew J. (2023): Data for "A hydrolyzed lipid blend diet promotes myelination in neonatal piglets in a region and concentration-dependent manner.". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4218705_V1
This dataset contains MRI data and Imaris modeling analysis of CLARITY-cleared, immunostained tissue associated with a study that assessed the effects of lipid blends containing various levels of a hydrolyzed fat system on myelin development in healthy neonatal piglets. Data are from thirty-two piglets of mixed sexes across four diet treatment groups and include a sow-fed reference group. MRI data (presented in Figure 2 of the associated article) consist of volumetric data from Voxel-Based Morphometry analysis in brain grey matter and white matter, as well as mean fractional anisotropy and mean orientation dispersion index data from Tract-Based Spatial Statistics analysis. Imaris data (presented in Figure 3 of the associated article) consist of twenty-one select output measures from 3D modeling analysis of PLP-stained prefrontal cortex tissue. All methods used for collection/generation/processing of data are described in the associated article: Louie AY, Rund LA, Komiyama-Kasai KA, Weisenberger KE, Stanke KL, Larsen RJ, Leyshon BJ, Kuchan MJ, Das T, Steelman AJ. A hydrolyzed lipid blend diet promotes myelination in neonatal piglets in a region and concentration-dependent manner. J Neurosci Res. 2023.
keywords:
myelin; dietary lipid; white matter; CLARITY; Imaris; voxel-based morphometry; diffusion tensor imaging
published: 2023-10-26
Maffeo, Christopher; Aksimentiev, Aleksei (2023): Simulation trajectories for "A DNA turbine powered by a transmembrane potential across a nanopore". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3458097_V1
Simulation trajectory data and scripts for Nature Nanotechnology manuscript "A DNA turbine powered by a transmembrane potential across a nanopore" that demonstrates a rationally designed nanoscale DNA-origami turbine with three chiral blades that uses a transmembrane electrochemical potential across a nanopore to drive a DNA bundle into sustained unidirectional rotations of up to 10 revolutions/s. Driven by the asymmetric mobility of a DNA duplex, the rotation direction of the turbine is set by its designed chirality and the salinity of the solvent.
keywords:
All-atom MD simulation; DNA; nanotechnology; motors and rotors
planned publication date: 2024-01-01
Edmonds, Devin; Bach, Elizabeth; Colton, Andrea; Jaquet, Izabelle; Kessler, Ethan; Dreslik, Michael (2024): Data for Ornate Box Turtle (Terrapene ornata) Emergence. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7298951_V1
These data were used to make a predictive model of when ornate box turtles (Terrapene ornata) are likely to be above ground and at risk from fire. The data were generated using shell temperatures, soil temperatures at 0.35 m deep from known overwintering sites, and the spring and fall soil temperature inversion dates during 2019–2022 to infer if 26 individual radio-tracked turtles were above or below ground at three sites in Illinois.
keywords:
turtle; conservation; controlled burn; fire management; ectotherm; hibernation; brumation; reptile
published: 2023-10-22
Davidson, Ruth; Vachaspati, Pranjal; Mirarab, Siavash; Warnow, Tandy (2023): Data from: Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6670066_V1
HGT+ILS datasets from Davidson, R., Vachaspati, P., Mirarab, S., & Warnow, T. (2015). Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC genomics, 16(10), 1-12. Contains model species trees, true and estimated gene trees, and simulated alignments.
keywords:
evolution; computational biology; bioinformatics; phylogenetics
planned publication date: 2024-10-16
Smith, Rebecca; Huang, Conghui (2024): Data for A modeling study on SARS-CoV-2 transmission in primary and middle schools in Illinois. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3705306_V1
School testing data were provided by Shield Illinois (ShieldIL), which conducted weekly in-school testing on behalf of the Illinois Department of Public Health (IDPH) for all participating schools in the state excluding Chicago Public Schools. The populations and proportions of students and employees in the studied school districts are reported by the Elementary/Secondary Information System (ElSi) database.
keywords:
COVID-19; school testing
published: 2023-10-16
Rasoarimanana, Tantely; Edmonds, Devin; Marquis, Olivier (2023): Data for Mantella baroni Habitat Preference and Abundance. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2234820_V1
This dataset provides microhabitat and environmental variables collected in the habitat of the poison frog Mantella baroni from 155 1-meter square quadrats in Vohimana Reserve along forest valleys, on slopes, and on ridgelines. We also provide data from photographic capture-recapture surveys used for estimating abundance.
keywords:
occupancy; abundance; amphibian; Madagascar; microhabitat; capture-recapture
published: 2019-07-11
Daniels, Melissa; Larson, Eric (2019): Data for Effects of forest windstorm disturbance on invasive plants in protected areas of southern Illinois, USA. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1401121_V1
We studied the effect of windstorm disturbance on forest invasive plants in southern Illinois. This data includes raw data on plant abundance at survey points, compiled data used in statistical analyses, and spatial data for surveyed plots and units. This file package also includes a readme.doc file that describes the data in detail, including attribute descriptions.
keywords:
tornado; blowdowns; derecho; invasive plants; Shawnee National Forest; southern Illinois
published: 2023-09-21
Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi (2023): The Inclusion Network of 27 Review Articles Published between 2013-2018 Investigating the Relationship Between Physical Activity and Depressive Symptoms. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4614455_V4
The relationship between physical activity and mental health, especially depression, is one of the most studied topics in the field of exercise science and kinesiology. Although there is strong consensus that regular physical activity improves mental health and reduces depressive symptoms, some debate the mechanisms involved in this relationship as well as the limitations and definitions used in such studies. Meta-analyses and systematic reviews continue to examine the strength of the association between physical activity and depressive symptoms for the purpose of improving exercise prescription as treatment or combined treatment for depression. This dataset covers 27 review articles (either systematic review, meta-analysis, or both) and 365 primary study articles addressing the relationship between physical activity and depressive symptoms. Primary study articles are manually extracted from the review articles. We used a custom-made workflow (Fu, Yuanxi. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 392 reports. The author information file (author_list.csv) is the product of this workflow and can be used to compute the co-author network of the 392 articles. This dataset can be used to construct the inclusion network and the co-author network of the 27 review articles and 365 primary study articles. A primary study article is "included" in a review article if it is considered in the review article's evidence synthesis. Each included primary study article is cited in the review article, but not all references cited in a review article are included in the evidence synthesis or primary study articles. The inclusion network is a bipartite network with two types of nodes: one type represents review articles, and the other represents primary study articles. In an inclusion network, if a review article includes a primary study article, there is a directed edge from the review article node to the primary study article node. The attribute file (article_list.csv) includes attributes of the 392 articles, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Collectively, this dataset reflects the evidence production and use patterns within the exercise science and kinesiology scientific community, investigating the relationship between physical activity and depressive symptoms.
FILE FORMATS
1. article_list.csv - Unicode CSV
2. author_list.csv - Unicode CSV
3. Chinese_author_name_reference.csv - Unicode CSV
4. inclusion_net_edges.csv - Unicode CSV
5. review_article_details.csv - Unicode CSV
6. supplementary_reference_list.pdf - PDF
7. README.txt - text file
8. systematic_review_inclusion_criteria.csv - Unicode CSV
UPDATES IN THIS VERSION COMPARED TO V3 (Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi (2023): The Inclusion Network of 27 Review Articles Published between 2013-2018 Investigating the Relationship Between Physical Activity and Depressive Symptoms. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4614455_V3)
- We added a new file systematic_review_inclusion_criteria.csv.
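The directed, bipartite structure described for the inclusion network can be sketched in Python as a mapping from each review article to the set of primary studies it includes. This is an illustration only: the column names `review_id` and `study_id`, the sample rows, and both function names are hypothetical, since the actual headers of inclusion_net_edges.csv are not listed in this description.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge-list content. The column names "review_id" and "study_id"
# are assumptions, not the documented headers of inclusion_net_edges.csv.
SAMPLE_EDGES = """review_id,study_id
R1,S1
R1,S2
R2,S2
"""


def build_inclusion_network(handle, source_col="review_id", target_col="study_id"):
    """Map each review article to the set of primary studies it includes."""
    includes = defaultdict(set)
    for record in csv.DictReader(handle):
        # One directed edge: review article -> included primary study.
        includes[record[source_col]].add(record[target_col])
    return dict(includes)


def studies_shared(includes, review_a, review_b):
    """Return the primary studies included by both reviews."""
    return includes.get(review_a, set()) & includes.get(review_b, set())


network = build_inclusion_network(io.StringIO(SAMPLE_EDGES))
shared = studies_shared(network, "R1", "R2")
```

To apply the same idea to the real file, one would pass an open file handle and the actual column names instead of the sample string.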
keywords:
systematic reviews; meta-analyses; evidence synthesis; network visualization; tertiary studies; physical activity; depressive symptoms; exercise; review articles
published: 2023-09-20
Chase, Marissa H. ; Charles, Brian; Harmon-Threatt, Alexandra; Fraterrigo, Jennifer (2023): Diverse forest management strategies support functionally and temporally distinct bee communities. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8891496_V1
Dataset includes bee trait information and species abundance information for bees collected at 29 forest plots in southern Illinois, USA. Plots are located within three public land sites. Environmental data were also collected for each of the 29 plots.
keywords:
wild bees; forest management; functional traits
published: 2023-09-19
Salami, Malik; Lee, Jou; Schneider, Jodi (2023): Stopwords and keywords for manual field assignment for the STI 2023 paper Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8847584_V2
We used the following keywords files to identify categories for journals and conferences not in Scopus, for our STI 2023 paper "Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science".
The first four text files each contain keywords/content words in the form: 'keyword1', 'keyword2', 'keyword3', .... The file title indicates the name of the category:
file1: healthscience_words.txt
file2: lifescience_words.txt
file3: physicalscience_words.txt
file4: socialscience_words.txt
The first four files were generated from a combination of software and manual review in an iterative process in which we:
- Manually reviewed venue titles that we were not able to categorize automatically using the Scopus categorization or by extending it as a resource.
- Iteratively reviewed uncategorized venue titles to manually curate additional keywords as content words indicating a venue title could be classified in the category healthscience, lifescience, physicalscience, or socialscience.
We used English content words and added words we could automatically translate to identify content words. NOTE: Terminology that has multiple potential meanings or contains non-English words that did not yield useful automatic translations (e.g., Al-Masāq) was not selected as content words.
The fifth text file is a list of stopwords in the form: 'stopword1', 'stopword2', 'stopword3', ...
file5: stopwords.txt
This file contains manually curated stopwords from venue titles to handle non-content words such as 'conference' and 'journal'.
This dataset is a revision of the following dataset: Version 1: Lee, Jou; Schneider, Jodi: Keywords for manual field assignment for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois at Urbana-Champaign Data Bank.
Changes from Version 1 to Version 2:
- Added one author.
- Added a stopwords file that was used in our data preprocessing.
- Thoroughly reviewed each of the 4 keywords lists. In particular, we added UTF-8 terminology, removed some non-content words and misclassified content words, and extensively reviewed non-English keywords.
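As a rough illustration of how keyword and stopword files in the quoted-list form might be used, the Python sketch below parses such a list and assigns a venue title to every category whose keywords appear among its content words. This is not the authors' workflow: the function names, the whole-word matching rule, and the sample keywords are all assumptions made for the example, and the simple pattern used here would not handle terms that themselves contain an apostrophe.

```python
import re


def parse_quoted_list(text):
    """Extract lowercase terms from text of the form 'term1', 'term2', ..."""
    return {term.lower() for term in re.findall(r"'([^']+)'", text)}


def categorize(title, keywords_by_category, stopwords):
    """Return the sorted categories whose keywords match the title's content words."""
    # Content words are the title's words with stopwords removed.
    words = [w for w in re.findall(r"\w+", title.lower()) if w not in stopwords]
    return sorted(
        category
        for category, keywords in keywords_by_category.items()
        if keywords.intersection(words)
    )


# Hypothetical file contents, for illustration only (not taken from the dataset).
keywords = {
    "healthscience": parse_quoted_list("'nursing', 'oncology'"),
    "physicalscience": parse_quoted_list("'chemistry', 'physics'"),
}
stopwords = parse_quoted_list("'journal', 'conference', 'of'")
result = categorize("Journal of Oncology Nursing", keywords, stopwords)
```

A title can match more than one category under this rule, which is why the function returns a list; a title with no matching content words yields an empty list.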
keywords:
health science keywords; scientometrics; stopwords; field; keywords; life science keywords; physical science keywords; science of science; social science keywords; meta-science; RISRS
published: 2023-09-13
Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2023): Additional datasets (RNASim10k) for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4194451_V1
This upload contains one additional set of datasets (RNASim10k, ten replicates) used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".
The zipped file has the following structure:
10k
|__R0
   |__unaln.fas
   |__true.fas
   |__true.tre
|__R1
...
# Alignment files:
1. `unaln.fas`: all unaligned sequences.
2. `true.fas`: the reference alignment of all sequences.
3. `true.tre`: the reference tree on all sequences.
For other datasets that uniquely appeared in EMMA, please refer to the related dataset (which is linked below): Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1
keywords:
SALMA; MAFFT; alignment; eHMM; sequence length heterogeneity
published: 2022-08-08
Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1
This upload contains all datasets used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".
The zip file has the following structure (presented as an example):
salma_paper_datasets/
|_README.md
|_10aa/
|_crw/
|_homfam/
   |_aat/
   |  |_...
   |_...
|_het/
   |_5000M2-het/
   |  |_...
   |_5000M3-het/
   ...
|_rec_res/
Generally, the structure can be viewed as: [category]/[dataset]/[replicate]/[alignment files]
# Categories:
1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.
2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et al. 2022 (MAGUS+eHMM).
3. homfam: There are the 10 largest Homfam datasets, each with one replicate.
4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.
5. rec_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.
# Alignment files
There are at most 6 `.fasta` files in each sub-directory:
1. `all.unaln.fasta`: All unaligned sequences.
2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.
3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).
4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.
5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).
6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.
- If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.
- If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.
- If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.
# Additional file(s)
1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et al. 2010) to search for protein sequences.
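The backbone/query distinction above is defined by sequence length relative to the median. A minimal Python sketch of that split follows. It reads "within 25% of the median length" as |length - median| <= 0.25 x median, which is an interpretation rather than something stated in the description, and the function name and toy sequences are hypothetical.

```python
from statistics import median


def split_backbone_queries(sequences, tolerance=0.25):
    """Split {name: sequence} into (backbone, queries) by length relative to the median.

    Assumption: a sequence is full-length (backbone) when its length differs
    from the median length by at most tolerance * median.
    """
    med = median(len(seq) for seq in sequences.values())
    backbone, queries = {}, {}
    for name, seq in sequences.items():
        if abs(len(seq) - med) <= tolerance * med:
            backbone[name] = seq
        else:
            queries[name] = seq
    return backbone, queries


# Toy sequences with lengths 100, 110, 90, and 40 (median length 95).
toy = {"a": "A" * 100, "b": "A" * 110, "c": "A" * 90, "d": "A" * 40}
backbone, queries = split_backbone_queries(toy)
```

In this toy input, only the length-40 sequence falls outside the tolerance band and is treated as a query sequence.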
keywords:
SALMA; MAFFT; alignment; eHMM; sequence length heterogeneity
published: 2023-09-01
Chakraborty, Sulagna; Steckler, Teresa; Gronemeyer, Peg; Mateus-Pinilla, Nohra; Smith, Rebecca (2023): Farmers’ knowledge, attitudes, and prevention practices regarding ticks and tickborne diseases in Illinois. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3988796_V1
An online and paper knowledge, attitudes, and practices (KAP) survey on ticks and tick-borne diseases (TBD) was distributed to farmers in Illinois from summer 2020 to spring 2022 (paper version titled Final Draft Farmer KAP_v.SoftCopy_Revised.docx). These are the raw data associated with that survey and the survey questions used (FarmerTickKAPdata.csv, data dictionary in Data Description.docx). We have added calculated values (columns 286 to end, code for calculation in FarmerKAPvariableCalculation.R), including the tick knowledge score, TBD knowledge score, and total knowledge score, each of which is the number of correct answers in that category, and the score percent, which is the proportion of correct answers in each category.
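The score definitions above amount to simple counting. The Python sketch below illustrates them under stated assumptions: each answer is represented as a boolean (correct or not), the function names and example answers are hypothetical, and the authors' actual calculation is the R script FarmerKAPvariableCalculation.R, which this sketch does not reproduce.

```python
def score(answers):
    """Return (number correct, proportion correct) for a list of booleans."""
    correct = sum(1 for answer in answers if answer)
    proportion = correct / len(answers) if answers else 0.0
    return correct, proportion


def knowledge_scores(tick_answers, tbd_answers):
    """Compute tick, TBD, and total knowledge scores with their proportions."""
    tick_score, tick_prop = score(tick_answers)
    tbd_score, tbd_prop = score(tbd_answers)
    total_score, total_prop = score(list(tick_answers) + list(tbd_answers))
    return {
        "tick_score": tick_score,
        "tick_percent": tick_prop,
        "tbd_score": tbd_score,
        "tbd_percent": tbd_prop,
        "total_score": total_score,
        "total_percent": total_prop,
    }


# Hypothetical respondent: 3 of 4 tick questions and 1 of 2 TBD questions correct.
example = knowledge_scores([True, True, False, True], [True, False])
```

Here the proportions are returned as fractions between 0 and 1; whether the dataset stores them as fractions or as percentages is not stated in the description.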
keywords:
ticks; survey; tick-borne disease; farmer
planned publication date: 2024-08-24
Jones, Todd; Llamas, Alfredo; Phillips, Jennifer (2024): Data for Jones et al. GCB-23-1273.R1. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6010827_V1
Dataset associated with Jones et al. GCB-23-1273.R1 submission: Phenotypic signatures of urbanization? Resident, but not migratory, songbird eye size varies with urban-associated light pollution levels. Excel CSV file with all of the data used in analyses and file with descriptions of each column.
keywords:
body size; demographics; eye size; phenotypic divergence; songbirds; sensory pollution; urbanization
published: 2023-08-24
Kim, Hyunchul; Zhao, Helin; van der Zande, Arend (2023): Data for Strain-resilient field-effect transistors based on wrinkled graphene/MoS2 heterostructures. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6434046_V1
This data set includes all of the data related to strain-resilient FETs based on 2D heterostructures, including optical images of FETs, Raman characteristics data, transport measurement data, and AFM topography data.
keywords:
2D materials; Stretchable electronics
published: 2023-08-11
Li, Shuai; Leakey, Andrew D.B.; Moller, Christopher A.; Montes, Christopher M.; Sacks, Erik J.; DeKyoung, Lee; Ainsworth, Elizabeth A. (2023): Similar photosynthetic, but different yield responses of C3 and C4 crops to elevated O3. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9446886_V1
This dataset contains leaf photosynthetic and biochemical traits, plant biomass, and yield in five C3 crops (chickpea, rice, snap bean, soybean, wheat) and four C4 crops (sorghum, maize, Miscanthus × giganteus, switchgrass) grown under ambient and elevated O3 concentration ([O3]) in the field at free-air O3 concentration enrichment (O3-FACE) facilities over the past 20 years.
keywords:
C3 and C4 crops; elevated O3; FACE; photosynthesis; yield
published: 2023-08-04
Zinnen, Jack; Matthews, Jeffrey W.; Zaya, David N. (2023): Genetic, demographic, and spatial information for a study of Phlox pilosa ssp. sangamonensis, and congeners. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5376622_V1
Data are provided that are relevant to the rare plant Phlox pilosa ssp. sangamonensis, or Sangamon phlox, and other members of the genus that occur in its native range. Sangamon phlox is a state-endangered subspecies that is only known to occur in two Illinois counties. Data provided come from all known Sangamon phlox populations, which we estimate as 10 separate populations. Data include genetic data from DNA microsatellite loci (allele sizes and basic summaries), flowering population size estimates, rates of fruit set, and rates of seed set. Additionally, genetic data (from microsatellites) are provided for Phlox divaricata ssp. laphamii (three populations), Phlox pilosa ssp. pilosa (two populations), and Phlox pilosa ssp. fulgida (two populations).
keywords:
Phlox; conservation genetics; microsatellites; endemism; rare plants
published: 2023-08-03
Dalling, James William (2023): Data for Zombie leaves: novel repurposing of senescent fronds in the tree fern Cyathea rojasiana for nutrient uptake in a tropical montane forest. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2925327_V1
This file contains the delta 15N values for leaf material collected from Cyathea rojasiana tree ferns before and after fertilization with ammonium-15N chloride solution, to determine whether 15N uptake is possible from senescent leaves. Details of the experiment are provided in the online supplement to the published paper. Briefly, in February 2022 we selected three mature C. rojasiana individuals 1-1.5 m in height that had leaves rooted in the soil and one new developing (but unexpanded) leaf. For each fern, two plastic pots (10 x 10 x 12 cm) were filled with a 50:50 mixture of washed river sand and soil from the Chorro watershed. For each pot, one senescent leaf that was rooted in the soil was carefully excavated and its roots transplanted into the pot. Pots were then fertilized by adding 30 ml of a 0.02 M 15N solution of ammonium-15N chloride (98% 15N; Sigma-Aldrich 299251; St Louis, MO) to yield a target concentration of 2 µg 15N cm-3 of soil. After fertilization, pots were carefully enclosed within thick plastic bags and sealed around the senescent leaf rachis to prevent any 15N from leaching from the pot into the surrounding soil. At the time of N fertilization, pinnae of the youngest fully expanded leaf were collected from each fern: one pinna from the base of the leaf and one from the distal end. In March 2022, after 28 days, the roots were removed from the pots and two additional leaf pinnae were sampled from each fern: one from the base and one from the distal end of the youngest (now fully expanded) leaf. Leaf samples were dried for 72 hours at 60 °C, and the leaf lamina tissue was then finely ground with a bead beater. The delta 15N for each leaf sample was determined at the University of Illinois, Urbana-Champaign using a Thermo Delta V Advantage IRMS run in combination with a Costech 4010 Elemental Analyzer. Samples were run in continuous flow relative to laboratory standards that were calibrated with USGS 40, 41, and NBS 19 reference materials.
keywords:
15N; Cyathea rojasiana; N fertilization; montane forest
published: 2023-08-02
Jeng, Amos; Bosch, Nigel; Perry, Michelle (2023): Data for: Phatic Expressions Influence Perceived Helpfulness in Online Peer Help-Giving: A Mixed Methods Study. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6591732_V1
This dataset was developed as part of an online survey study that investigates how phatic expressions—comments that are social rather than informative in nature—influence the perceived helpfulness of online peer help-giving replies in an asynchronous college course discussion forum. During the study, undergraduate students (N = 320) rated and described the helpfulness of examples of replies to online requests for help, both with and without four types of phatic expressions: greeting/parting tokens, other-oriented comments, self-oriented comments, and neutral comments.
keywords:
help-giving; phatic expression; discussion forum; online learning; engagement