Displaying 276 - 300 of 707 in total
Subject Area
Funder
Publication Year
License
Illinois Data Bank Dataset Search Results

Dataset Search Results

published: 2022-04-29
 
Thank you for using these datasets! These files contain trees and reference alignments, as well as the selected query sequences for testing phylogenetic placement methods against and within the SCAMPP framework. There are four datasets from three different sources, each containing their source alignment and "true" tree, any estimated trees that may have been generated, and any re-estimated branch lengths that were created to be used with their requisite phylogenetic placement method. Three biological datasets (16S.B.ALL, PEWO/LTP_s128_SSU, and PEWO/green85) and one simulated dataset (nt78) is contained. See README.txt in each file for more information.
keywords: Phylogenetic Placement; Phylogenetics; Maximum Likelihood; pplacer; EPA-ng
published: 2022-07-08
 
Dataset for "Spatial drivers of wetland bird occupancy within an urbanized matrix in the Upper Midwestern United States" manuscript contains occupancy data for ten wetland bird species used in single-species occupancy models at four spatial scales and four wetland habitat types. Data were collected from 2017-2019 in NE Illinois and NW Indiana. Dataset includes wetland bird occupancy data, habitat parameter values for each survey location, and R code used to run analyses.
keywords: wetland birds; occupancy; emergent wetland; urbanization; Great Lakes region
published: 2022-08-05
 
Simulated sequences provide a way to evaluate multiple sequence alignment (MSA) methods where the ground truth is exactly known. However, the realism of such simulated conditions often comes under question compared to empirical datasets. In particular, simulated data often does not display heterogeneity in the sequence lengths, a common feature in biological datasets. In order to imitate sequence length heterogeneity, we here present a set of data that are evolved under a mixture model of indel lengths, where indels have an occasional chance of being promoted to long indels (emulating large insertion/deletion events, e.g., domain-level gain/loss). This dataset is otherwise (e.g., in GTR parameters) analogous to the 1000M condition as presented in the SATe paper (doi: 10.1126/science.1171243) but with 5000 sequences and simulated with INDELible (http://abacus.gene.ucl.ac.uk/software/indelible/). For more information, see README.txt. For the INDELible control files, see https://github.com/ThisBioLife/5000M-234-het.
keywords: simulated data; sequence length heterogeneity; multiple sequence alignment;
published: 2022-08-22
 
This dataset contains Raman spectra, each acquired from an individual, living, primary murine cell belonging to one of the six most immature hematopoietic cell populations found in the body: hematopoietic stem cell (HSC), mutipotent progenitor 1 (MPP1), multipotent progenitor 2 (MPP2), multipotent progenitor 3 (MPP3), common lymphoid progenitor, common myeloid progenitor (CLP). These spectra are useful for identifying spectral signatures that are characteristic of each hematopoietic stem or early progenitor cell population. *NOTE: __MACOSX folder and files start with “._[file name]” found in "Raman spectra of single cells text files.zip" were created by the computer operation system, in unreadable format, which are not part of the data and can be removed/ignored when using the data.
keywords: Raman spectroscopy; single-cell spectrum; hematopoietic cell; hematopoietic stem cell; multipotent progenitor cell; common myeloid progenitor; common lymphoid progenitor
published: 2023-07-20
 
This is a dataset from a choice experiment survey on family forest landowner preferences for managing invasive species.
keywords: ecosystem services, forests, invasive species control, neighborhood effect
published: 2021-10-04
 
This dataset contains all the necessary information to recreate the study presented in the paper entitled "Learning coagulation processes with combinatorially-invariant neural networks". This consists of (1) the aggregated output files used for machine learning, (2) the machine learning codes used to learn the presented models, (3) the PartMC model source code that was used to generate the simulation data and (4) the Python scripts used construct the scenario library for training and testing simulations. This data was used to investigate a method (combinatorally-invariant neural network) for learning the aerosol process of coagulation. This data may be useful for application of other methods.
keywords: Machine learning; Atmospheric chemistry; Particle-resolved modeling; Coagulation; Atmospheric Science
published: 2022-04-21
 
This dataset was created based on the publicly available microdata from PNS-2019, a national health survey conducted by the Instituto Brasileiro de Geografia e Estatistica (IBGE, Brazilian Institute of Geography and Statistics). IBGE is a federal agency responsible for the official collection of statistical information in Brazil – essentially, the Brazilian census bureau. Data on selected variables focusing on biopsychosocial domains related to pain prevalence, limitations and treatment are available. The Fundação Instituto Oswaldo Cruz has detailed information about the PNS, including questionnaires, survey design, and datasets (www.pns.fiocruz.br). The microdata can be found on the IBGE website (https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html?caminho=PNS/2019/Microdados/Dados).
keywords: back pain; health status disparities; biopsychosocial; Brazil
published: 2022-04-26
 
ICoastalDB, which was developed using Microsoft structured query language (SQL) Server, consists of water quality and related data in the Illinois coastal zone that were collected by various organizations. The information in the dataset includes, but is not limited to, sample data type, method of data sampling, location, time and date of sampling and data units.
keywords: Illinois Coastal Zone; Water Quality Data
published: 2022-05-26
 
The data files are for the paper entitled: Long-lifetime spin excitations near domain walls in 1T-TaS2 to be published in PNAS. The data was obtained on a 300 mK custom designed Unisoku scanning tunneling microscope using the Nanonis module. All the data files have been named based on the Figure numbers that they represent.
keywords: Mott Insulator; Spins; Charge Density Wave; Domain walls; Long lifetime
published: 2023-06-21
 
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries. Additional Resources: - Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the <a href="https://docs.google.com/forms/d/e/1FAIpQLSf-J937V6I4sMSxQt7gR3SIbUASR26KXxqSurrkBvlF-CIQnQ/viewform?usp=pp_url"><b>Archer Access Request Form</b></a> so we can determine if you are eligible for access or not. - Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the <a href="https://forms.gle/6eA2yJUGFMtj5swY7"><b>Archer User Feedback Form</b></a>. - The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this <a href="https://groups.webservices.illinois.edu/subscribe/123172"><b>form</b></a> to subscribe to it. <b>Citation Guidelines:</b> 1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [codebook], v1.2.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V5 2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [database], v1.2.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V5 *NOTE: V4 is suppressed and V5 is replacing V4 with updated ‘Archer’ documents.
published: 2021-04-28
 
An Atlas.ti dataset and accompanying documentation of a thematic analysis of problems and opportunities associated with retracted research and its continued citation.
keywords: Retraction; Citation; Problems and Opportunities
published: 2021-10-24
 
This dataset contains daily and hourly temperature measurements in twenty different bat box designs deployed in central Indiana, USA from May to September 2018. Daily and hourly environmental data (temperature, solar radiation, wind speed and direction) are also included for days and hours sampled. Bat box temperature data were reclassified to cool (</= 30°C), permissive (30.1–39.9°C), and stressful (>/= 40°C) categories according to known temperature tolerances of temperate-zone bats.
keywords: bat box; design; environmental variables; microclimate; temperature
published: 2023-03-06
 
This dataset includes mass spectrometry, library screening, and gas chromatography data used for creating a high-throughput screening in metabolic engineering.
keywords: mass spectrometry; gas chromatography
published: 2019-12-10
 
The dataset consists of two types of data: the estimate of land productivity (the maximum productivity, MP) and the estimate of land that has low productivity for any major crops planted in the Contiguous United States and then may be available for growing bioenergy crops (the marginal land, ML). All data items are in GeoTiff format, under the World Geodetic System (WGS) 84 project, and with a resolution of 0.0020810045 degree (~250 m). The MP values are calculated based on machine learning model estimated yields of major crops in the CONUS, and its expected value (MP_mean.tif), and associated uncertainty (MP_IDP.tif). The ML availability data have two versions: a deterministic version and a version with uncertainty. The deterministic MLs are determined as the land pixels with expected MP values falling in the range defined in the following criteria, and the MLs with uncertainty are determined as the probability that the MP value of a land pixel falls in the range defined in the following criteria: Criteria_____Description S1________ Current crop and pasture land with MP <= P50 S2________ Current crop and pasture land with MP <= P25 S3________ S1 + current grass and shrub land with P25 < MP < P50 S4________ S2 + current grass and shrub land with P10 < MP < P25 Economic__ Current crop and pasture land with potential profitability < 0 Here P10, P25 and P50 are the 10th, 25th and 50th percentile of crop MP values
keywords: Land productivity;marginal land;land use
published: 2021-11-05
 
This data set contains survey results from a 2021 survey of University of Illinois University Library patrons who identify as transgender or gender non-conforming conducted as part of the Becoming a Trans Inclusive Library Project to assess the experiences of transgender patrons seeking information and services in the University Library. Survey instruments are available in the IDEALS repository: http://hdl.handle.net/2142/110081.
keywords: transgender awareness; academic library; gender identity awareness; patron experience
published: 2021-08-24
 
This repository includes datasets for the paper "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling" accepted for RECOMB2021 and submitted to Journal of Computational Biology. Each zipped file contains a README.
keywords: deep neural networks; heterotachy; GHOST; quartet estimation; phylogeny estimation
published: 2021-11-04
 
This dataset contains all the data for the results section in the study presented in the paper entitled "Chemistry Across Multiple Phases (CAMP) version 1.0: An integrated multi-phase chemistry mode" submitted to Geoscientific Model Development (GMD). In this paper, two sets of simulations were run to test CAMP with this results included here. This consists of (1) box model inputs and outputs presented in Section 4.2 for modal, binned and particle-resolved simulations to compare the application of identical chemical mechanisms to different aerosol representations and (2) the 3D Eulerian output presented in Section 4.3.
keywords: Atmospheric chemistry; Aerosols and particles; Numerical Modeling
published: 2022-06-10
 
This dataset contains nucleotide sequences of 16S rRNA gene from phytoplasmas and other bacteria detected in phloem-feeding insects (Hemiptera, Auchenorrhyncha). The datasets were used to compare traditional Sanger sequencing with a next-generation sequencing method, Anchored Hybrid Enrichment (AHE) for detecting and characterizing phytoplasmas in insect DNA samples. The file “Trivellone_etal_SangerSequencing.fas”, comprising 1397 positions (the longest sequence), includes 35 not aligned bacterial 16S rRNA sequences (16 phytoplasmas and 19 other bacterial strains) yielded using Sanger sequencing. The file “Trivellone_etal_AHEmethod1.fas” includes 34 not aligned bacterial 16S rRNA sequences (28 phytoplasmas and 6 other bacterial strains) and it contains 1530 positions (the longest sequence). Each sequence was assembled using assembled based on ABySS v2.1.0 pipeline. The file “Trivellone_etal_AHEmethod2.fas” includes 31 not aligned bacterial 16S rRNA sequences (27 phytoplasmas and 4 other bacterial strains) and it contains 1530 positions (the longest sequence). Each sequence was assembled based on the HybPiper v2.0.1 pipeline . Additional details in the "read_me_trivellone.txt" file attached below.
keywords: anchored hybrid enrichment; biodiversity, biorepository; nested PCR; Sanger sequencing
published: 2023-03-04
 
These data represent the raw data from the paper “Evaluating the ability of wetland mitigation banks to replace plant species lost from destroyed wetlands” published in Journal of Applied Ecology in 2023 by Stephen C. Tillman and Jeffrey W. Matthews.
published: 2019-09-01
 
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014. The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
keywords: global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
published: 2022-02-07
 
This dataset provides estimates of agricultural and food commodity flows [kg] between all county pairs within the United States for the years 2007, 2012, and 2017. The database provides 206.3 million data points, since pairwise information is provided between 3134 counties, for 7 commodity categories, and 3 time periods. The commodity categories correspond to the Standardized Classification of Transported Goods and are: - SCTG 1: Iive animals and fish - SCTG 2: cereal grains - SCTG 3: agricultural products (except for animal feed, cereal grains, and forage products) - SCTG 4: animal feed, eggs, honey, and other products of animal origin - SCTG 5: meat, poultry, fish, seafood, and their preparations - SCTG 6: milled grain products and preparations, and bakery products - SCTG 7: other prepared foodstuffs, fats and oils For additional information, please see the related paper by Karakoc et al. (2022) in Environmental Research Letters.
keywords: food flows; high-resolution; county-scale; time-series; United States
published: 2022-02-08
 
Matlab codes for the article "Phage-antibiotic synergy inhibited by temperate and chronic virus competition". Code can be used to reproduce the article figures, perform the parameter sensitivity analysis and simulate the model.
keywords: bacterium-phage-antibiotic model; ODEs; Matlab; sensitivity analysis
published: 2023-04-19
 
Supplemental data sets for the Manuscript entitled " Assembly of wood-inhabiting archaeal, bacterial and fungal communities along a salinity gradient: common taxa are broadly distributed but locally abundant in preferred habitats"
keywords: wood decomposition; aquatic fungi; aquatic bacteria; aquatic archaea; microbial succession; microbial life-history
published: 2022-07-25
 
This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1) and collects entity mentions that were manually determined to be noisy, non-chemical entities.
keywords: synthetic biology; NERC data; chemical mentions, noisy entities