Displaying datasets 51 - 75 of 510 in total

published: 2022-07-25
 
This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4950847_V1) for species entities and represents those that were determined to be species (i.e., were not noisy entities) but for which no corresponding concept could be found in the NCBI taxonomy database.
keywords: synthetic biology; NERC data; species mentions; not found entities
published: 2022-07-25
 
Related to the raw entity mentions (https://doi.org/10.13012/B2IDB-4163883_V1), this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the ChEBI ontology.
keywords: synthetic biology; NERC data; chemical mentions; ambiguous entities
published: 2022-07-25
 
A set of chemical entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; chemical mentions
published: 2022-07-25
 
A set of cell-line entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; cell-line mentions
published: 2022-07-25
 
This dataset represents the results of manual cleaning and annotation of the entity mentions contained in the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1). Each mention has been consolidated and linked to an identifier for a matching concept from the ChEBI ontology.
keywords: synthetic biology; NERC data; chemical mentions; cleaned data; ChEBI ontology
published: 2022-07-25
 
This dataset is derived from the raw dataset (https://doi.org/10.13012/B2IDB-4163883_V1) and collects entity mentions that were manually determined to be noisy, non-chemical entities.
keywords: synthetic biology; NERC data; chemical mentions; noisy entities
published: 2022-07-25
 
This dataset is derived from the raw entity mention dataset (https://doi.org/10.13012/B2IDB-4163883_V1) for chemical entities and represents those that were determined to be chemicals (i.e., were not noisy entities) but for which no corresponding concept could be found in the ChEBI ontology.
keywords: synthetic biology; NERC data; chemical mentions; not found entities
published: 2022-07-25
 
A set of gene and gene-related entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; gene mentions
published: 2021-05-10
 
This dataset contains data used in the publication "Institutional Data Repository Development, a Moving Target" submitted to Code4Lib Journal. It is a tabular data file describing attributes of data files in datasets published in the Illinois Data Bank from 2016-04-01 to 2021-04-01.
keywords: institutional repository
published: 2022-07-11
 
This dataset was developed as part of an online survey study that explores student characteristics that may predict what one finds helpful in replies to requests for help posted to an online college course discussion forum. 223 college students enrolled in an introductory statistics course were surveyed on their sense of belonging to their course community, as well as how helpful they found 20 examples of replies to requests for help posted to a statistics course discussion forum.
keywords: help-giving; discussion forums; sense of belonging; college student
published: 2022-07-19
 
#### Details of Pseudomonas aeruginosa biofilm dataset ####
----------------*Folder Structure*-------------------------------------
This dataset contains peak intensity tables extracted from mass spectrometry imaging (MSI) data using the tools SCiLS and MSI reader. There are 2 folders in "MSI-Data-Paeruginosa-biofilms-UIUC-DP-JVS-July2022.zip"; each folder contains 3 sub-folders, as listed below.
1. PellicleBiofilms-and-Supernatant [Pellicle biofilms collected from the air-liquid interface and spent supernatant medium after a 96 h incubation period]: (1) Full-Scan-Data-96h; (2) MSMS-data-from-C7-Quinolones-96h; and (3) MSMS-data-from-C9-Quinolones-96h
2. StaticBiofilms [Static biofilms grown on mucin surface]: (1) Full-Scan-Data; (2) MSMS-data-from-C7-Quinolones; and (3) MSMS-data-from-C9-Quinolones
----------------*File name*----------------------------------------------
Sample information is included in the file names for easy identification and processing. Attributes covered in file names are explained in the example below.
*Example file name "Rep1-Stat-FRD1-mPat-48-FS"*
~ Each unit of information is separated by "-"
~ Unit 1 - "Rep1" - Biological replicate (Rep1, Rep2, and Rep3)
~ Unit 2 - "Stat" - Sample type (Stat = Static biofilm, Pel = Pellicle biofilm, Sup = Supernatant)
~ Unit 3 - "FRD1" - Strain (FRD1 = Mucoid strain, PAO1C = Non-mucoid strain)
~ Unit 4 - "mPat" - Type of mucin surface used (mPat = patterned mucin surface, mUni = uniform mucin surface)
~ Unit 5 - "48" - Sample time point (hours = 48, 72, 96)
~ Unit 6 - "FS" - Scan type used in MSI (FS = high resolution full-scan, 260 = targeted MS/MS of C7 quinolones (m/z 260), 288 = targeted MS/MS of C9 quinolones (m/z 288))
----------------*File structure*------------------------------------------
All MSI data have been exported to CSV format. Each CSV file contains information about scan number, coordinates (x, y, z), m/z values, extraction window (absolute), and corresponding intensities in the form of a matrix.
----------------*End of Information*--------------------------------------
keywords: mass spectrometry imaging (MSI); biofilm; antibiotic resistance; Pseudomonas aeruginosa; quorum sensing; rhamnolipids
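The file-name convention described for the Pseudomonas aeruginosa biofilm dataset can be decoded programmatically. The following is a minimal Python sketch, not code shipped with the dataset. The names `SampleInfo` and `parse_sample_name` are hypothetical, and the sketch assumes that every file name carries all six units exactly as in the documented example, optionally followed by a ".csv" extension.

```python
from dataclasses import dataclass

# Code tables copied from the dataset description.
SAMPLE_TYPES = {"Stat": "Static biofilm", "Pel": "Pellicle biofilm", "Sup": "Supernatant"}
STRAINS = {"FRD1": "Mucoid strain", "PAO1C": "Non-mucoid strain"}
SURFACES = {"mPat": "patterned mucin surface", "mUni": "uniform mucin surface"}
SCAN_TYPES = {
    "FS": "high resolution full-scan",
    "260": "targeted MS/MS of C7 quinolones (m/z 260)",
    "288": "targeted MS/MS of C9 quinolones (m/z 288)",
}


@dataclass(frozen=True)
class SampleInfo:  # hypothetical container, not part of the dataset
    replicate: int
    sample_type: str
    strain: str
    surface: str
    hours: int
    scan_type: str


def parse_sample_name(name: str) -> SampleInfo:
    """Decode a name such as 'Rep1-Stat-FRD1-mPat-48-FS' into its six units."""
    # Assumption: files may carry a .csv extension that is not part of the units.
    stem = name[:-4] if name.lower().endswith(".csv") else name
    units = stem.split("-")
    if len(units) != 6:
        raise ValueError(f"expected 6 units separated by '-', got {len(units)}: {name!r}")
    rep, sample, strain, surface, hours, scan = units
    if not rep.startswith("Rep"):
        raise ValueError(f"first unit should look like 'Rep1', got {rep!r}")
    # Unknown codes raise KeyError, which flags names outside the documented scheme.
    return SampleInfo(
        replicate=int(rep[3:]),
        sample_type=SAMPLE_TYPES[sample],
        strain=STRAINS[strain],
        surface=SURFACES[surface],
        hours=int(hours),
        scan_type=SCAN_TYPES[scan],
    )
```

Calling `parse_sample_name("Rep1-Stat-FRD1-mPat-48-FS")` yields replicate 1, a static biofilm of the mucoid strain on a patterned mucin surface, sampled at 48 hours with a full scan. Raising on unknown codes rather than guessing keeps files that deviate from the documented scheme visible.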
published: 2022-06-20
 
This is a sentence-level parallel corpus in support of research on OCR quality. The source data comes from: (1) Project Gutenberg for human-proofread "clean" sentences; and (2) HathiTrust Digital Library for the paired sentences with OCR errors. In total, this corpus contains 167,079 sentence pairs from 189 sampled books in four domains (i.e., agriculture, fiction, social science, world war history) published from 1793 to 1984. There are 36,337 sentences that have two OCR views paired with each clean version. In addition to sentence texts, this corpus also provides the location (i.e., sentence and chapter index) of each sentence in the Gutenberg volume it belongs to.
keywords: sentence-level parallel corpus; optical character recognition; OCR errors; Project Gutenberg; HathiTrust Digital Library; digital libraries; digital humanities
published: 2022-06-22
 
This dataset supports investigation of spatial accessibility to HIV testing, treatment, and prevention services in Illinois and Chicago, USA. It has four main components: population data, healthcare data, GTFS feeds, and road network data.
1) `GTFS` contains GTFS (General Transit Feed Specification, https://gtfs.org/) data provided by the Chicago Transit Authority (CTA) from Google's GTFS feeds (https://developers.google.com/transit/gtfs). Documentation defining the format and structure of the files that comprise a GTFS dataset: https://developers.google.com/transit/gtfs/reference?csw=1
2) `HealthCare` contains shapefiles describing HIV healthcare providers in Chicago and Illinois, respectively. The services come from Locator.HIV.gov (https://locator.hiv.gov/).
3) `PopData` contains population data for Chicago and Illinois, respectively. Data come from the American Community Survey (ACS) and AIDSVu (https://map.aidsvu.org/map). AIDSVu provides data on PLWH in Chicago at the census tract level for the year 2017 and in the State of Illinois at the county level for the year 2016. The ACS provided the number of people aged 15 to 64 at the census tract level for the year 2017 and at the county level for the year 2016. The ACS provides annually updated information on demographic and socioeconomic characteristics of people and housing in the U.S.
4) `RoadNetwork` contains the road networks for Chicago and Illinois, respectively, obtained from OpenStreetMap (https://www.openstreetmap.org/copyright) using the Python osmnx package (https://osmnx.readthedocs.io/en/stable/).
The abstract for our paper is: Accomplishing the goals outlined in “Ending the HIV (Human Immunodeficiency Virus) Epidemic: A Plan for America Initiative” will require properly estimating and increasing access to HIV testing, treatment, and prevention services. In this research, a computational spatial method for estimating access was applied to measure distance to services from all points of a city or state while considering the size of the population in need of services as well as both driving and public transportation. Specifically, this study employed the enhanced two-step floating catchment area (E2SFCA) method to measure spatial accessibility to HIV testing, treatment (i.e., Ryan White HIV/AIDS program), and prevention (i.e., Pre-Exposure Prophylaxis [PrEP]) services. The method considered the spatial location of MSM (Men Who have Sex with Men), PLWH (People Living with HIV), and the general adult population aged 15-64, depending on what HIV services the U.S. Centers for Disease Control (CDC) recommends for each group. The study delineated service- and population-specific accessibility maps, demonstrating the method’s utility by analyzing data corresponding to the city of Chicago and the state of Illinois. Findings indicated health disparities in the south and the northwest of Chicago and particular areas in Illinois, as well as unique health disparities for public transportation compared to driving. The methodology details and computer code are shared for use in research and public policy.
keywords: HIV; spatial accessibility; spatial analysis; public transportation; GIS
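The E2SFCA method named in the HIV accessibility abstract can be illustrated with a small Python sketch. This is a generic textbook-style formulation, not the authors' shared code: the function names, the three travel-time zones (10, 20, and 30 minutes), and the zone weights are illustrative assumptions. Step 1 computes a supply-to-demand ratio for each provider using distance-weighted population within its catchment; step 2 sums the weighted ratios of all providers reachable from each population location.

```python
def zone_weight(minutes, zone_bounds, zone_weights):
    """Step-function distance decay: weight of the first zone whose upper bound covers `minutes`."""
    for upper, weight in zip(zone_bounds, zone_weights):
        if minutes <= upper:
            return weight
    return 0.0  # outside the catchment


def e2sfca(supply, demand, travel_time,
           zone_bounds=(10.0, 20.0, 30.0),   # assumed zone limits in minutes
           zone_weights=(1.0, 0.68, 0.22)):  # assumed weights, for illustration only
    """Return one accessibility score per population location.

    supply[j]         capacity of provider j
    demand[i]         population in need at location i
    travel_time[i][j] travel time from location i to provider j
    """
    n_pop, n_prov = len(demand), len(supply)
    w = [[zone_weight(travel_time[i][j], zone_bounds, zone_weights)
          for j in range(n_prov)] for i in range(n_pop)]

    # Step 1: provider-to-population ratio within each provider's catchment.
    ratios = []
    for j in range(n_prov):
        weighted_demand = sum(w[i][j] * demand[i] for i in range(n_pop))
        ratios.append(supply[j] / weighted_demand if weighted_demand > 0 else 0.0)

    # Step 2: sum the weighted ratios reachable from each population location.
    return [sum(w[i][j] * ratios[j] for j in range(n_prov)) for i in range(n_pop)]
```

Passing one travel-time matrix for driving and another for public transportation would give the two kinds of accessibility surface that the abstract compares.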
published: 2022-07-08
 
Dataset for the manuscript "Spatial drivers of wetland bird occupancy within an urbanized matrix in the Upper Midwestern United States". It contains occupancy data for ten wetland bird species used in single-species occupancy models at four spatial scales and four wetland habitat types. Data were collected from 2017-2019 in NE Illinois and NW Indiana. The dataset includes wetland bird occupancy data, habitat parameter values for each survey location, and R code used to run analyses.
keywords: wetland birds; occupancy; emergent wetland; urbanization; Great Lakes region
published: 2022-05-20
 
This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.
keywords: aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology
published: 2022-02-08
 
MATLAB code for the article "Phage-antibiotic synergy inhibited by temperate and chronic virus competition". The code can be used to reproduce the article figures, perform the parameter sensitivity analysis, and simulate the model.
keywords: bacterium-phage-antibiotic model; ODEs; Matlab; sensitivity analysis
published: 2021-11-18
 
This dataset contains sequencing data obtained from an Illumina MiSeq device to prove the concept of the proposed 2DDNA framework. Please refer to README.txt for a detailed description of each file.
keywords: machine learning; image processing; computer vision; rewritable storage system; 2D DNA-based data storage
published: 2022-03-09
 
MATLAB files for the analysis of an ODE model for disease transmission. The codes may be used to find equilibrium points, study transient dynamics, evaluate the basic reproductive number (R0), and simulate the model when parameters depend on the independent variables. In addition, the codes may be used to perform local sensitivity analysis of R0 on the model parameters.
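To illustrate what a local sensitivity analysis of R0 involves, here is a minimal Python sketch. It is not the dataset's MATLAB code, and the dataset description does not specify the model: the SIR-type formula R0 = beta / (gamma + mu) and all parameter names and values are assumptions chosen only for illustration. The sketch computes the normalized sensitivity index (dR0/dp) * (p / R0) with a central finite difference.

```python
def r0_sir(params):
    """R0 for a hypothetical SIR model with demography (an assumed stand-in model)."""
    return params["beta"] / (params["gamma"] + params["mu"])


def sensitivity_index(r0_func, params, name, rel_step=1e-6):
    """Normalized local sensitivity of R0 to one parameter: (dR0/dp) * (p / R0)."""
    base = r0_func(params)
    h = params[name] * rel_step
    up = dict(params, **{name: params[name] + h})
    down = dict(params, **{name: params[name] - h})
    derivative = (r0_func(up) - r0_func(down)) / (2.0 * h)  # central difference
    return derivative * params[name] / base


# Illustrative parameter values only (per-day rates).
example_params = {"beta": 0.5, "gamma": 0.2, "mu": 0.05}
indices = {p: sensitivity_index(r0_sir, example_params, p) for p in example_params}
```

An index of 1 means a 1% increase in the parameter raises R0 by about 1%; a negative index means that R0 decreases as the parameter grows.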
published: 2022-03-20
 
Data for "Generic character of charge and spin density waves in superconducting cuprates".
- Neutron scattering data for SDW
- RSXS scans of CDW of LESCO x=0.10, 0.125, 0.15, 0.17, 0.20 at various temperatures
- Temperature dependence of CDW peak intensity, correlation length, Qcdw (Lorentzian fit, S(q,T) fit, Landau-Ginzburg fit)
- XAS data of LESCO x=0.10, 0.125, 0.15, 0.17, 0.20
published: 2022-03-31
 
This dataset contains our bi-hourly temperature recordings from 40 rocket box style artificial roosts of 5 designs deployed in Indiana and Kentucky, USA from April through September 2019. This dataset also includes our endothermic and facultatively heterothermic daily energy expenditure datasets used in our bioenergetic analysis, which were calculated from the bi-hourly rocket box temperature data. Lastly, we include our overheating counts dataset, which summarizes daily overheating events (i.e., temperatures > 40 Celsius) in each rocket box style bat box over the course of the study period; these daily summaries were also calculated from the bi-hourly rocket box temperature recordings.
keywords: artificial roost; bat box; microclimate; temperature
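The daily overheating summary described for the rocket box dataset can be approximated as follows. This Python sketch is an illustration, not the authors' procedure: the function name and the (box_id, timestamp, temperature) record layout are hypothetical, and it assumes that an "overheating event" means a single recording above 40 Celsius.

```python
from collections import Counter
from datetime import datetime

OVERHEAT_THRESHOLD_C = 40.0  # threshold stated in the dataset description


def daily_overheating_counts(readings):
    """Count recordings above the threshold per (box_id, calendar date).

    `readings` is an iterable of (box_id, timestamp, temp_celsius) tuples;
    this layout is an assumption, not the dataset's actual column structure.
    """
    counts = Counter()
    for box_id, timestamp, temp_c in readings:
        if temp_c > OVERHEAT_THRESHOLD_C:  # strictly greater, as in "> 40 Celsius"
            counts[(box_id, timestamp.date())] += 1
    return dict(counts)


# Small made-up example: two boxes on the same day.
example_readings = [
    ("box01", datetime(2019, 7, 1, 12, 0), 39.5),
    ("box01", datetime(2019, 7, 1, 14, 0), 41.2),
    ("box01", datetime(2019, 7, 1, 16, 0), 40.4),
    ("box02", datetime(2019, 7, 1, 14, 0), 40.0),
]
example_counts = daily_overheating_counts(example_readings)
```

In the made-up example, box01 has two recordings above the threshold on 2019-07-01, and box02 has none because exactly 40.0 does not exceed it.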
published: 2022-04-15
 
This dataset is provided to support the statements in Kim, H., and R.Y. Makhnenko. 2022. "Evaluation of CO2 sealing potential of heterogeneous Eau Claire shale". Journal of the Geological Society. In geologic carbon dioxide (CO2) storage in deep saline aquifers, buoyant CO2 tends to float upwards in reservoirs overlain by low-permeability formations called caprocks. Caprocks should serve as barriers to potential CO2 leakage, which can happen through diffusion loss and permeation through faults, fractures, or pore spaces. Leakage through intact caprock would mainly depend on its permeability and CO2 breakthrough pressure, and is affected by heterogeneities in the material. Here, we study the sealing potential of a caprock from the Illinois Basin, Eau Claire shale, with sandy and shaly fractions distinguished via electron microscopy and grain/pore size and surface area characterization. Direct measurements of the permeability of the sandy shale give values of ~10^-15 m^2, while clayey specimens are three orders of magnitude less permeable. The CO2 breakthrough pressure under in-situ stress conditions is 0.1 MPa for the sandy shale and 0.4 MPa for the clayey counterpart; these values are higher than those predicted by the porosimetry methods performed on the unconfined specimens. Sandy Eau Claire shale would allow penetration of large CO2 volumes at low overpressures, while the clayey formation can serve as a caprock in the absence of faults and fractures in it.
keywords: Geologic carbon storage; Caprock; Shale; CO2 breakthrough pressure; Porosimetry
published: 2022-04-29
 
Thank you for using these datasets! These files contain trees and reference alignments, as well as the selected query sequences for testing phylogenetic placement methods against and within the SCAMPP framework. There are four datasets from three different sources, each containing their source alignment and "true" tree, any estimated trees that may have been generated, and any re-estimated branch lengths that were created to be used with their requisite phylogenetic placement method. Three biological datasets (16S.B.ALL, PEWO/LTP_s128_SSU, and PEWO/green85) and one simulated dataset (nt78) are included. See README.txt in each file for more information.
keywords: Phylogenetic Placement; Phylogenetics; Maximum Likelihood; pplacer; EPA-ng
published: 2022-05-13
 
The files are plain text and contain the original data used in phylogenetic analyses of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files with aligned sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software. For more information on data file content, see the included "readme" files.
keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper