Illinois Data Bank Dataset Search Results

published: 2023-11-14
 
This repository contains the training dataset associated with the 2023 Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics (DGM-Image Challenge), hosted by the American Association of Physicists in Medicine. This dataset contains more than 100,000 8-bit images of size 512x512. These images emulate coronal slices from anthropomorphic breast phantoms adapted from the VICTRE toolchain [1], with assigned X-ray attenuation coefficients relevant for breast computed tomography. Also included are the labels indicating the breast type. The challenge has now concluded. More information about the challenge can be found here: https://www.aapm.org/GrandChallenge/DGM-Image/. * New in V3: we added a CSV file containing the image breast type labels and example images (PNG).
keywords: Deep generative models; breast computed tomography
published: 2019-06-13
 
This lexicon is the expanded/enhanced version of the Moral Foundation Dictionary created by Graham and colleagues (Graham et al., 2013). Our Enhanced Morality Lexicon (EML) contains a list of 4,636 morality related words. This lexicon was used in the following paper - please cite this paper if you use this resource in your work. Rezapour, R., Shah, S., & Diesner, J. (2019). Enhancing the measurement of social effects by capturing morality. Proceedings of the 10th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN. In addition, please consider citing the original MFD paper: Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology (Vol. 47, pp. 55-130). https://doi.org/10.1016/B978-0-12-407236-7.00002-4
keywords: lexicon; morality
published: 2024-05-07
 
Photographs and video of two Lesser Chameleons (Furcifer minor) nesting together at the same time near Itremo, Madagascar.
keywords: reproductive biology; ecology; Madagascar; lizard; eggs; reptile
published: 2024-05-07
 
This dataset builds on an existing dataset that captures the demographics of artists represented by top-tier galleries in the 2016–2017 New York art season (Case-Leal, 2017, https://web.archive.org/web/20170617002654/http://www.havenforthedispossessed.org/), adding a census of reviews and catalogs about those exhibitions to assess the proportionality of media coverage across race and gender. The readme file explains the variables, the collection process, the relationship between the datasets, and an example of how the Case-Leal dataset was transformed. The ArticleDataset.csv provides all articles with citation information as well as artist, artistic identity characteristic, and gallery. The ExhibitionCatalog.csv provides exhibition catalog citation information for each identified artist.
keywords: diversity and inclusion; diversity audit; contemporary art; art exhibitions; art exhibition reviews; exhibition catalogs; magazines; newspapers; demographics
published: 2024-04-19
 
Read me file for the data repository
This repository has raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". We arrange the data following the figure in which it first appeared. For all electrical transfer measurements, we provide the up-sweep and down-sweep data, with voltage in units of V and conductance in units of S. All Raman modes are in units of cm^-1.
How to use this dataset
All data in this dataset are stored in binary NumPy array format as .npy files. To read a .npy file, use the NumPy module of the Python language and its np.load() command. Example: suppose the filename is example_data.npy. To load it, run the following in a Jupyter notebook or a Python program:
    import numpy as np
    data = np.load("example_data.npy")
The contents of the example file are then stored in the data object.
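As a minimal sketch of the loading step described above, the snippet below writes a small placeholder array to a temporary .npy file and reads it back with np.load(). The array values and the temporary path are illustrative assumptions; the shapes and column layouts of the real files are not documented in this description.

```python
import os
import tempfile

import numpy as np

# Placeholder array standing in for one measurement file (assumption:
# the real files' shapes and column meanings may differ).
example = np.array([[0.0, 1.0e-6], [0.5, 2.0e-6], [1.0, 4.0e-6]])

# Write the placeholder to a temporary .npy file, then load it back.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example_data.npy")
np.save(path, example)

data = np.load(path)
print(data.shape, data.dtype)
```

Inspecting data.shape and data.dtype right after loading is a quick way to see how a given file is organized before plotting it.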
published: 2024-02-08
 
Photographs and video of the snake Compsophis infralineatus predating upon the chameleons Calumma crypticum and Calumma gastrotaenia near Mandraka, Madagascar.
keywords: predation; reptile; diet
published: 2024-01-01
 
These data were used to make a predictive model of when ornate box turtles (Terrapene ornata) are likely to be above ground and at risk from fire. The data were generated using shell temperatures, soil temperatures at 0.35 m deep from known overwintering sites, and the spring and fall soil temperature inversion dates during 2019–2022 to infer if 26 individual radio-tracked turtles were above or below ground at three sites in Illinois.
keywords: turtle; conservation; controlled burn; fire management; ectotherm; hibernation; brumation; reptile
published: 2024-01-30
 
This data set includes the cochlear implant (CI) electrodograms recorded in 2 different acoustic conditions using acoustic head KEMAR. It is a part of a study intended to explore the effect of interaural asymmetry on interaural coherence after CI processing.
keywords: cochlear implant; electrodogram; KEMAR; interaural coherence
published: 2024-01-31
 
This dataset contains field study design parameters, plant performance metrics, and nitrogen cycling rates associated with a field experiment that compared nitrification rates between maize lines with and without nitrification inhibition loci, and nitrogen fixation rates with and without a nitrogen-fixing inoculant product. The overarching goal was to evaluate nitrogen fixation by a diazotroph inoculant and retention of nitrogen in the rhizosphere via a novel nitrification inhibition phenotype of maize.
keywords: maize; microbiome; nitrogen cycling; nitrification; nitrogen fixation
published: 2024-03-06
 
These data are the result of analyses of the metagenome of North American bats, including 18s and 16s barcode genes designed to target microorganisms of the gut. These files are Phyloseq import files created by the DADA2 program. Each barcode gene is uploaded separately as the four files required to build a phyloseq object. For each barcode gene, the files include amplicon sequence variant (ASV) sequences, sequence tables (seqtab) which connect individual samples to the ASVs, tax tables (taxtab) which identify the taxa present as determined by a Bayesian RDP classifier, and rooted phylogenetic trees for the ASVs. Additionally, we have included a "sample_data" file which is necessary for sorting samples across all four sequence analysis data sets by study and species. Some sample information which could identify the location of endangered species has been restricted. Multiple studies are represented in the data, which can be accessed using standard methods in the Phyloseq program (e.g., for a study of bats, parasites, and gut microbiome dysregulation by Bennett, Suski, and OKeefe 2024 [in prep March 2024], study-specific data can be accessed using the Study variable "DYSBIOMICS"). File names include reference to the primer set used to generate them (18s primer sets: G3, G4, G6; 16s primer set: 341F3_806R5).
keywords: metagenomics
published: 2023-10-22
 
HGT+ILS datasets from Davidson, R., Vachaspati, P., Mirarab, S., & Warnow, T. (2015). Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC genomics, 16(10), 1-12. Contains model species trees, true and estimated gene trees, and simulated alignments.
keywords: evolution; computational biology; bioinformatics; phylogenetics
published: 2023-08-03
 
This file contains the delta 15N values for leaf material collected from Cyathea rojasiana tree ferns before and after fertilization using ammonium-15N chloride solution to determine whether 15N uptake is possible from senescent leaves. Details of the experiment are provided in the online supplement to the published paper. Briefly, in February 2022 we selected three mature C. rojasiana individuals 1-1.5 m in height that had leaves rooted in the soil and one new developing (but unexpanded) leaf. For each fern, two plastic pots (10 x 10 x 12 cm) were filled with a 50:50 mixture of washed river sand and soil from the Chorro watershed. For each pot, one senescent leaf that was rooted in the soil was carefully excavated and its roots transplanted into the pot. Pots were then fertilized by adding 30 ml of a 0.02 M 15N solution of ammonium-15N chloride (98% 15N; Sigma-Aldrich 299251; St Louis, MO) to yield a target concentration of 2 µg15N cm-3 of soil. After fertilization, pots were carefully enclosed within thick plastic bags and sealed around the senescent leaf rachis to prevent leaching of any 15N from the pot to the surrounding soil. At the time of N fertilization, pinnae of the youngest fully expanded leaf were collected from each fern. One pinna was collected from the base of the leaf and one from the distal end of the leaf. In March 2022, after 28 days, the roots were removed from the pots and two additional leaf pinnae were sampled from each fern: one from the base and one from the distal end of the youngest (now fully expanded) leaf. Leaf samples were dried for 72 hours at 60 C, and the leaf lamina tissue was then finely ground with a bead beater. The delta 15N for each leaf sample was determined at the University of Illinois, Urbana-Champaign using a Thermo Delta V Advantage IRMS run in combination with a Costech 4010 Elemental Analyzer. Samples were run in continuous flow relative to laboratory standards that were calibrated with USGS 40, 41, and NBS 19 reference materials.
keywords: 15N; Cyathea rojasiana; N fertilization; montane forest
planned publication date: 2025-01-23
 
These are the responses to an open, convenience sample survey of residents of Illinois to understand their interactions with wild deer. The survey was available on REDCap between December 19, 2022 and December 19, 2023, and was publicized through listserves, Facebook groups, and media reporting. The file "COVID Deer Survey _ REDCap.pdf" contains the codebook for the survey, including the questions; all factor variables have ".factor" added to their name in the dataset. The file "DeerSurveyData.csv" contains the dataset. The file "Score_calculation_for_sharing.R" is the code to create the cleaned dataset used for analysis from the raw survey responses. Throughout, NA is used to represent null/not available/not applicable; this is most likely either a failure to answer the question or, in some cases, a question that was not presented as it is not relevant based on answers to previous questions.
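The two conventions described above (the ".factor" suffix and the use of NA) can be handled when reading the CSV. The following is only a sketch: the column names record_id and saw_deer and the inline sample rows are hypothetical stand-ins, since the real variable names are defined in the codebook PDF.

```python
import csv
import io

# Hypothetical excerpt; the real column names are listed in the codebook.
SAMPLE = """record_id,saw_deer,saw_deer.factor
1,1,Yes
2,0,No
3,NA,NA
"""


def read_survey(handle):
    """Read rows, mapping the literal string 'NA' to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({k: (None if v == "NA" else v) for k, v in row.items()})
    return rows


rows = read_survey(io.StringIO(SAMPLE))

# Columns holding the labelled version of a coded variable.
factor_columns = [name for name in rows[0] if name.endswith(".factor")]

# Count respondents with no recorded answer for one question.
missing = sum(1 for row in rows if row["saw_deer"] is None)
print(factor_columns, missing)
```

To read the deposited file instead of the inline sample, the same function could be passed an open handle to DeerSurveyData.csv.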
keywords: deer; survey
planned publication date: 2025-04-24
 
These are the datasets underlying the figures in the manuscript "Methods of active surveillance for hard ticks and associated tick-borne pathogens of public health importance in the contiguous United States: A Comprehensive Systematic Review". The review considered only publications reporting on active tick or tick-borne pathogen surveillance in the contiguous United States published between 1944 and 2018. For the purposes of this review, we were only concerned with studies of Ixodidae (hard ticks) and/or studies of tick-borne pathogens (in humans, animals, or hard ticks) of public health importance to humans. Study designs included cross-sectional, serological, epidemiological, ecological, or observational studies. Only peer-reviewed publications published in the English language were included. Studies were excluded if they focused on a tick that is not a vector of a human pathogen or on a pathogen that does not cause disease in humans, if the tick or tick-borne pathogen findings were incidental, or if they did not include quantitative surveillance data. For the purpose of this study, we defined surveillance data as information on ticks or pathogens provided through active sampling in natural areas; it should be noted that this does not match the strict definition used by the CDC, which requires sustained sampling efforts across time. Studies were also excluded if they: explored regions other than the contiguous US; focused on treatment, vaccine, or therapeutics development and/or diagnostics of human disease; focused on tick or pathogen genetics; focused on experimental studies with ticks or hosts; were tick control and/or management studies; performed only passive surveillance; were review articles; were not peer reviewed; were in a language other than English; the full text was not available; and if the disease was not a risk to the general public. 
In addition, for articles which reported data that had previously been published, we only included previously unreported information collected by the authors, and we referenced the specific period of collection for these data to ensure we were not double-recording data. Due to publication delays, we also performed a non-systematic review of the literature of articles published between 2019 – 2023 on tick and tickborne pathogen surveillance methods conducted in the contiguous United States. Keyword search was performed in PubMed Central and Web of Science Core Collection databases. The search algorithm keywords included tick(s), Amblyomma, Dermacentor, Ixodes, Rhipicephalus, Acari Ixodidea, tick host(s), Lyme disease, Rocky Mountain Spotted Fever, Spotted Fever Group, Rickettsiosis, Ehrlichiosis, Anaplasmosis, Borreliosis, Tularemia, Babesiosis, tick-borne pathogen, Powassan, Heartland, Bourbon, Colorado tick fever, Pacific Coast tick fever, tick surveillance, surveillance, (sero)epidemiology, prevalence, distribution, ecology, United States. The search algorithm utilized is provided as follows: TI= ((ticks OR Ixodes OR Amblyomma OR Dermacentor OR Rhipicephalus OR "Acari Ixodidi" OR "tick hosts" OR "tick host") OR ("Lyme Disease" OR "Rocky Mountain Spotted Fever" OR "Spotted Fever Group" OR Rickettsiosis OR Rickettsial OR Ehrlichiosis OR Anaplasmosis OR Borreliosis OR Tularemia OR Babesiosis OR Borrelia OR Ehrlichia OR Anaplasma OR Rickettsia OR Babesia OR "tick-borne pathogen" OR "tick borne pathogen")) AND TS= ("tick surveillance" OR surveillance OR epidemiology OR seroepidemiology OR ecology) AND CU=("United States of America" OR "USA" OR "United States" OR United-States). These datasets are the collated data underlying the figures in the manuscript. For more details, please see the publication. 
The following are explanations for variables used in all the CSV files:
Tick: Species of tick collected
Tick_Method: Method of collecting ticks
Pathogen: Species of pathogen tested for
Path_Method: Method of testing for pathogens
Decade: Decade of publication
n: Number of publications
STATE: state in which study was conducted
COUNTY: county in which study was conducted
1944 - 2018 (Was surveillance performed?): was there at least one publication included with a publication date within the 1944-2018 period in this geographic region?
2019 - 2023 (Was surveillance performed?): was there at least one publication included with a publication date within the 2019-2023 period in this geographic region?
keywords: ticks; systematic review; surveillance
published: 2024-03-25
 
The accompanying study is published under the title "Estimating soil N2O emissions induced by organic and inorganic fertilizer inputs using a Tier-2, regression-based meta-analytic approach for U.S. agricultural lands" in Science of the Total Environment. The study is authored by Dr. Yushu Xia, Dr. Hoyoung Kwon, and Dr. Michelle Wander. The DOI for this study is https://doi.org/10.1016/j.scitotenv.2024.171930.
keywords: soil; nitrous oxide; agriculture; fertilizers; meta-analysis
published: 2019-02-19
 
The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991 and 2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article (published prior to 2017), and organizations operating the databases were determined through review of database websites.
keywords: databases; research infrastructure; sustainability; data sharing; molecular biology; bioinformatics; bibliometrics
published: 2019-03-22
 
This data publication provides example video clips related to research on the association between flight ability of juvenile songbirds at fledging and juvenile morphological traits (wing emergence, wing length, body condition, mass, and tarsus length). File names reflect the species dropped in each video. These videos are supplemental material for scientific publications by the authors and reflect an example subset of all videos collected from 2017-2018 as part of a larger study on the post-fledging ecology of grassland and shrubland birds in east-central Illinois, USA. No birds were harmed/injured in the production of these videos and procedures were approved by the Illinois Institutional Animal Care and Use Committee (IACUC), protocol no. 18221. Individuals depicted in the videos have given consent for the videos to be shared (talent/model release form; https://publicaffairs.illinois.edu/resources/release/).
keywords: songbirds; flight ability; wing development; wing length; wing emergence; nestling development; post-fledging
published: 2022-10-13
 
The text file contains the original DNA nucleotide sequence data used in the phylogenetic analyses of Xue et al. (in review), comprising the 13 protein-coding genes and 2 ribosomal gene subunits of the mitochondrial genome. The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 30 taxa (species) and 13078 characters, indicate that the characters are DNA sequence, that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes (version 3.2.6) beginning near the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the Methods section of the submitted manuscript. Two supplementary tables in the provided PDF file provide additional information on the species in the dataset, including the GenBank accession numbers for the sequence data (Table S1) and the DNA substitution models used for each of the individual mitochondrial genes and for different codon positions of the protein-coding genes used for analyses in the programs MrBayes and IQ-Tree (version 1.6.8) (Table S2). Full citations for references listed in Table S1 can be found by searching GenBank using the corresponding accession number. The supplemental tables will also be linked to the article upon publication at the journal website.
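As a rough illustration of the header described above, the sketch below parses the taxon count, character count, data type, gap symbol, and missing-data symbol from a NEXUS header. The header text is an assumption reconstructed from the description (30 taxa, 13078 characters, dash for gaps, question mark for missing data); it is not copied from the deposited file, whose exact layout may differ, and dedicated phylogenetics libraries handle the full format far more robustly.

```python
import re

# Illustrative header reconstructed from the description; not copied
# from the deposited file.
HEADER = """#NEXUS
begin data;
dimensions ntax=30 nchar=13078;
format datatype=dna gap=- missing=?;
matrix
"""


def parse_nexus_header(text):
    """Return ntax, nchar, datatype, gap and missing from a NEXUS header."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file")
    info = {}
    for key in ("ntax", "nchar", "datatype", "gap", "missing"):
        # Each setting has the form key=value, ending at whitespace or ';'.
        match = re.search(rf"\b{key}\s*=\s*([^\s;]+)", text, re.IGNORECASE)
        if match:
            info[key] = match.group(1)
    info["ntax"] = int(info["ntax"])
    info["nchar"] = int(info["nchar"])
    return info


header = parse_nexus_header(HEADER)
print(header)
```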
keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published: 2023-07-10
 
Bee movement between habitat patches in a naturally fragmented ecosystem depended on species, patch, and matrix variables. Using a mark-recapture methodology in the naturally fragmented Ozark glade ecosystem, we assessed the importance of bee size, nesting biology, the distance between patches (e.g., isolation), and nesting and floral resources in habitat patches and the surrounding matrix on bee movement. This dataset includes seven data files, three R code files, and a QGIS tool. Three of the data files include information collected at the study sites with regard to bees and matrix and patch characteristics. The other four data files are spatial files used to quantify the characteristics of the forest canopy between the study sites and the edge-to-edge distances between the study sites. R code in the R Markdown file recreates the analysis and data presentation for the associated publication. R script files contain processes for calculating some of the explanatory variables used in the analysis. The QGIS tool can be used as the first step to obtaining average values from a raster file where the cells are large relative to the areas of interest (AOI) that you would like to characterize. The second step is contained in one of the aforementioned R scripts. Detected effects included: Larger bees were more likely to move between patches. Bee movement was less likely as the distance between patches increased. However, relatively short distances (~50 m) inhibited movement more than our a priori expectations. Bees were unlikely to move away from home patches with abundant and diverse floral and below-ground nesting resources. When home patches were less resource-rich, bee movement depended on the characteristics of the away patch or the matrix. In these cases, bees were more likely to move to away patches with greater below-ground nesting and floral resources. 
Matrix habitats with more available floral and below-ground nesting resources appear to impede movement to neighboring patches, potentially because they already provide supplemental resources for bees.
keywords: habitat fragmentation; bees; movement; mark-recapture; nesting resources; floral resources; isolation
published: 2019-05-16
 
This repository includes scripts and datasets for the paper, "Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge." All data files in this repository are for analyses using the logdet distance matrix computed on the concatenated alignment. Data files for analyses using the average gene-tree internode distance matrix can be downloaded from the Illinois Data Bank (https://doi.org/10.13012/B2IDB-1424746_V1). The latest version of NJMerge can be downloaded from GitHub (https://github.com/ekmolloy/njmerge).
List of Changes:
- Updated timings for NJMerge pipelines to include the time required to estimate distance matrices; this impacted files in the following folder: data.zip
- Replaced "Robinson-Foulds" distance with "Symmetric Difference"; this impacted files in the following folders: tools.zip; data.zip; scripts.zip
- Added some additional information about the java command used to run ASTRAL-III; this impacted files in the following folders: data.zip; astral64-trees.tar.gz (new)
keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published: 2019-05-31
 
The data are provided to illustrate methods in evaluating systematic transactional data reuse in machine learning. A library account-based recommender system was developed using machine learning processing over transactional data of 383,828 transactions (or check-outs) sourced from a large multi-unit research library. The machine learning process utilized the FP-growth algorithm over the subject metadata associated with physical items that were checked-out together in the library. The purpose of this research is to evaluate the results of systematic transactional data reuse in machine learning. The analysis herein contains a large-scale network visualization of 180,441 subject association rules and corresponding node metrics.
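To make the idea of subject association rules concrete, the sketch below derives pairwise rules with support and confidence from a handful of made-up check-out transactions. It uses brute-force pair counting, not FP-growth itself, and the subject headings and thresholds are hypothetical; it only illustrates what a rule such as "A -> B" and its metrics mean.

```python
from collections import Counter
from itertools import combinations

# Hypothetical check-outs: each set holds the subject headings of items
# borrowed together in one transaction.
transactions = [
    {"Statistics", "Machine learning"},
    {"Statistics", "Machine learning", "Python"},
    {"Statistics", "Ecology"},
    {"Machine learning", "Python"},
]


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Brute-force A -> B rules over subject pairs (not FP-growth itself)."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both subjects
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the transactions with lhs, the share also with rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Each resulting rule can be read as a directed edge between two subject nodes, which is the kind of structure a network visualization of association rules displays.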
keywords: evaluating machine learning; network science; FP-growth; WEKA; Gephi; personalization; recommender systems
published: 2023-12-20
 
Important Note: the raw transient files need to be downloaded through this separate link: https://uofi.box.com/s/oagdxhea1wi8tvfij4robj0z0w8wq7j4. Once downloaded, place the file within the .d folder in the unzipped 20210930_ShortTransient_S3_5 folder to perform the reconstruction step. This repository contains the minimal datasets needed to run the computational pipeline MEISTER introduced in the manuscript titled "Integrative Multiscale Biochemical Mapping of the Brain via Deep-Learning-Enhanced High-Throughput Mass Spectrometry". The key steps of our computational pipeline include (1) tissue mass spectrometry imaging (MSI) reconstruction; (2) multimodal image registration and 3D reconstruction; (3) regional analysis; and (4) single-cell and tissue data integration. Detailed protocols to reproduce our results in the manuscript are provided with an example data set shared for learning the protocols. Our computational processing codes are implemented mostly in Python as well as MATLAB (for image registration).
keywords: deep learning; mass spectrometry; single cells
published: 2024-02-21
 
Data associated with the manuscript "Niche conservatism and spread explain hybridization and introgression between native and invasive fish" by Jordan H. Hartman, Joel B. Corush, Eric R. Larson, Jeremy S. Tiemann, Philip Willink, and Mark A. Davis. For this project, we combined results of ecological niche models (ENMs) and next-generation restriction site-associated DNA sequencing (RADseq) to test theories of niche conservatism and biotic resistance on the success of invasion, hybridization, and extent of introgression between native Western Banded Killifish and non-native Eastern Banded Killifish. This dataset provides the sampling locations and number of Banded Killifish in each population, accession numbers for RADseq from the National Center for Biotechnology Information Sequence Read Archive and the assignment of each Banded Killifish, the habitat associations of each population from the ENMs, and the occurrence points used to build the ENMs.
keywords: Banded Killifish; ecological niche model; Fundulus diaphanus; hybrid swarm; invasive species; Laurentian Great Lakes
published: 2018-09-06
 
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 to the present, with awards continuing through the end of the second XSEDE award in 2021. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
keywords: allocations; cyberinfrastructure; XSEDE
published: 2023-06-29
 
This database provides estimates of agricultural and food commodity flows [in both tons and $US] between the US and China for the year 2017. Pairwise information is provided between US states and Chinese provinces, and US counties and Chinese provinces for 7 Standardized Classification of Transported Goods (SCTG) commodity categories. Additionally, crosswalks are provided to match Harmonized System (HS) codes and China's Multi-Regional Input Output (MRIO) commodity sectors to their corresponding SCTG commodity codes. The included SCTG commodities are:
- SCTG 01: live animals and fish
- SCTG 02: cereal grains
- SCTG 03: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 04: animal feed, eggs, honey, and other products of animal origin
- SCTG 05: meat, poultry, fish, seafood, and their preparations
- SCTG 06: milled grain products and preparations, and bakery products
- SCTG 07: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Pandit et al. (2022) in Environmental Research Letters. ADD DOI WHEN RECEIVED
keywords: Food flows; High-resolution; County-scale; Bilateral; United States; China