Displaying datasets 1 - 25 of 456 in total

published: 2022-07-01
 
The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: the reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (<a href="https://doi.org/10.1093/ije/dyv184">Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184</a>), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. (2016), not the other types of publications, and it adds the information noted below. This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not every reference cited in an SRR is a PSR or is included in the evidence synthesis. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset; these unused PSRs appear as isolated nodes when visualized with the inclusion network.
We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. <a href="https://github.com/infoqualitylab/Scopus_author_info_collection">https://github.com/infoqualitylab/Scopus_author_info_collection</a>) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports. We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016).
keywords: systematic reviews; evidence synthesis; network analysis; public health; salt controversy;
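The inclusion network described in the salt controversy entry above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset authors' code: the column names (`srr_id`, `psr_id`, `report_id`, `type`) and the miniature in-memory CSV contents are assumptions standing in for inclusion_net_edges.csv and report_list.csv, whose real headers may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature of inclusion_net_edges.csv; the column names
# "srr_id" and "psr_id" are assumptions, not the dataset's real headers.
EDGES_CSV = """srr_id,psr_id
SRR1,PSR1
SRR1,PSR2
SRR2,PSR2
"""

# Hypothetical miniature of report_list.csv with an assumed "type" column.
REPORTS_CSV = """report_id,type
SRR1,SRR
SRR2,SRR
PSR1,PSR
PSR2,PSR
PSR3,PSR
"""


def build_inclusion_network(edges_text, reports_text):
    """Return (includes, unused): SRR -> set of included PSRs, and the
    sorted list of PSRs that no SRR includes (isolated nodes)."""
    report_type = {
        row["report_id"]: row["type"]
        for row in csv.DictReader(io.StringIO(reports_text))
    }
    includes = defaultdict(set)
    for row in csv.DictReader(io.StringIO(edges_text)):
        # One directed edge from the SRR to the PSR it includes.
        includes[row["srr_id"]].add(row["psr_id"])
    included = set().union(*includes.values())
    unused = sorted(
        report for report, kind in report_type.items()
        if kind == "PSR" and report not in included
    )
    return dict(includes), unused


includes, unused_psrs = build_inclusion_network(EDGES_CSV, REPORTS_CSV)
print(unused_psrs)
```

With the real files, the same function could be fed the file contents read from disk once the column names are adjusted; the unused list would then correspond to the PSRs the entry describes as isolated nodes.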
published: 2022-06-22
 
This dataset supports the investigation of spatial accessibility to HIV testing, treatment, and prevention services in Illinois and Chicago, USA. It has four core components: GTFS feeds, healthcare data, population data, and road network data. 1) `GTFS` contains GTFS (<a href="https://gtfs.org/">General Transit Feed Specification</a>) data provided by the Chicago Transit Authority (CTA) from <a href="https://developers.google.com/transit/gtfs">Google's GTFS feeds</a>. The documentation that defines the format and structure of the files that comprise a GTFS dataset is at <a href="https://developers.google.com/transit/gtfs/reference?csw=1">https://developers.google.com/transit/gtfs/reference?csw=1</a>. 2) `HealthCare` contains shapefiles describing HIV healthcare providers in Chicago and Illinois, respectively. The services come from <a href="https://locator.hiv.gov/">Locator.HIV.gov</a>. 3) `PopData` contains population data for Chicago and Illinois, respectively. Data come from the American Community Survey (ACS) and <a href="https://map.aidsvu.org/map">AIDSVu</a>. AIDSVu (https://map.aidsvu.org/map) provides data on people living with HIV (PLWH) in Chicago at the census tract level for the year 2017 and in the State of Illinois at the county level for the year 2016. The ACS provided the number of people aged 15 to 64 at the census tract level for the year 2017 and at the county level for the year 2016. The ACS provides annually updated information on demographic and socioeconomic characteristics of people and housing in the U.S. 4) `RoadNetwork` contains the road networks for Chicago and Illinois, respectively, retrieved from <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> using the Python <a href="https://osmnx.readthedocs.io/en/stable/">osmnx</a> package.
<b>The abstract for our paper is:</b> Accomplishing the goals outlined in “Ending the HIV (Human Immunodeficiency Virus) Epidemic: A Plan for America Initiative” will require properly estimating and increasing access to HIV testing, treatment, and prevention services. In this research, a computational spatial method for estimating access was applied to measure distance to services from all points of a city or state while considering the size of the population in need for services as well as both driving and public transportation. Specifically, this study employed the enhanced two-step floating catchment area (E2SFCA) method to measure spatial accessibility to HIV testing, treatment (i.e., Ryan White HIV/AIDS program), and prevention (i.e., Pre-Exposure Prophylaxis [PrEP]) services. The method considered the spatial location of MSM (Men Who have Sex with Men), PLWH (People Living with HIV), and the general adult population 15-64 depending on what HIV services the U.S. Centers for Disease Control (CDC) recommends for each group. The study delineated service- and population-specific accessibility maps, demonstrating the method’s utility by analyzing data corresponding to the city of Chicago and the state of Illinois. Findings indicated health disparities in the south and the northwest of Chicago and particular areas in Illinois, as well as unique health disparities for public transportation compared to driving. The methodology details and computer code are shared for use in research and public policy.
keywords: HIV; spatial accessibility; spatial analysis; public transportation; GIS
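The enhanced two-step floating catchment area (E2SFCA) method named in the abstract above can be illustrated with a small pure-Python sketch. It is a generic rendering of the two steps, not the paper's shared code: the zone thresholds, the weights in `zone_weight`, and the toy supply, population, and travel-time values are all illustrative assumptions. Travel times could come from either a driving or a public transportation network.

```python
def zone_weight(minutes):
    """Stepwise distance-decay weight.

    The thresholds (10/20/30 minutes) and the weights are illustrative
    assumptions, not values taken from the paper.
    """
    if minutes <= 10:
        return 1.0
    if minutes <= 20:
        return 0.5
    if minutes <= 30:
        return 0.25
    return 0.0  # outside the catchment


def e2sfca(supply, population, travel_time):
    """Compute an accessibility score for each population location.

    supply[j]: capacity of provider j
    population[i]: people in need at location i
    travel_time[i][j]: minutes from location i to provider j
    """
    n_pop, n_sup = len(population), len(supply)

    # Step 1: supply-to-demand ratio of each provider, with demand
    # weighted by how far each population location is from it.
    ratios = []
    for j in range(n_sup):
        demand = sum(
            population[i] * zone_weight(travel_time[i][j])
            for i in range(n_pop)
        )
        ratios.append(supply[j] / demand if demand > 0 else 0.0)

    # Step 2: for each population location, sum the weighted ratios of
    # all providers that fall inside its catchment.
    return [
        sum(ratios[j] * zone_weight(travel_time[i][j]) for j in range(n_sup))
        for i in range(n_pop)
    ]


# Toy inputs (hypothetical): 2 providers, 3 population locations.
supply = [2.0, 1.0]
population = [100.0, 200.0, 50.0]
travel_time = [[5, 25], [15, 8], [40, 12]]

access = e2sfca(supply, population, travel_time)
print(access)
```

One useful sanity check on this formulation: when every provider has some demand inside its catchment, the population-weighted sum of the accessibility scores equals the total supply.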
published: 2022-05-20
 
This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.
keywords: aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology
published: 2022-06-20
 
This is a sentence-level parallel corpus in support of research on OCR quality. The source data come from (1) Project Gutenberg for human-proofread "clean" sentences and (2) HathiTrust Digital Library for the paired sentences with OCR errors. In total, this corpus contains 167,079 sentence pairs from 189 sampled books in four domains (i.e., agriculture, fiction, social science, world war history) published from 1793 to 1984. There are 36,337 sentences that have two OCR views paired with each clean version. In addition to sentence texts, this corpus also provides the location (i.e., sentence and chapter index) of each sentence in the Gutenberg volume it belongs to.
keywords: sentence-level parallel corpus; optical character recognition; OCR errors; Project Gutenberg; HathiTrust Digital Library; digital libraries; digital humanities;
published: 2022-02-08
 
MATLAB code for the article "Phage-antibiotic synergy inhibited by temperate and chronic virus competition". The code can be used to reproduce the article figures, perform the parameter sensitivity analysis, and simulate the model.
keywords: bacterium-phage-antibiotic model; ODEs; Matlab; sensitivity analysis
published: 2021-11-18
 
This dataset contains sequencing data obtained from an Illumina MiSeq device to prove the concept of the proposed 2DDNA framework. Please refer to README.txt for a detailed description of each file.
keywords: machine learning; image processing; computer vision; rewritable storage system; 2D DNA-based data storage
published: 2022-03-09
 
MATLAB files for the analysis of an ODE model for disease transmission. The codes may be used to find equilibrium points, study transient dynamics, evaluate the basic reproductive number (R0), and simulate the model when parameters depend on the independent variables. In addition, the codes may be used to perform local sensitivity analysis of R0 on the model parameters.
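To make the idea of a local sensitivity analysis of R0 concrete, here is a minimal Python sketch. It does not reproduce the dataset's MATLAB code or its model, which the entry does not specify. As a stand-in it assumes a textbook SIR model with demography, where R0 = beta / (gamma + mu), and computes the normalized sensitivity index (dR0/dp)(p/R0) by central finite differences. The parameter names and values are hypothetical.

```python
def r0(params):
    """Basic reproductive number for an SIR model with demography.

    This model is an assumption used for illustration only.
    """
    return params["beta"] / (params["gamma"] + params["mu"])


def sensitivity_index(func, params, name, rel_step=1e-6):
    """Normalized local sensitivity (dF/dp) * (p / F).

    The derivative is approximated with a central finite difference.
    """
    p = params[name]
    h = rel_step * p
    up = dict(params, **{name: p + h})
    down = dict(params, **{name: p - h})
    derivative = (func(up) - func(down)) / (2 * h)
    return derivative * p / func(params)


# Hypothetical parameter values (per day).
params = {"beta": 0.5, "gamma": 0.2, "mu": 0.05}

indices = {name: sensitivity_index(r0, params, name) for name in params}
print(r0(params), indices)
```

For this assumed model the indices can be checked by hand: the index for beta is 1, and the indices for gamma and mu are -gamma/(gamma + mu) and -mu/(gamma + mu), so a 1% increase in beta raises R0 by about 1%, while a 1% increase in gamma lowers it by about 0.8%.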
published: 2022-03-20
 
Data for "Generic character of charge and spin density waves in superconducting cuprates".
- Neutron scattering data for SDW
- RSXS scans of CDW of LESCO x=0.10, 0.125, 0.15, 0.17, 0.20 at various temperatures
- Temperature dependence of CDW peak intensity, correlation length, Qcdw (Lorentzian fit, S(q,T) fit, Landau-Ginzburg fit)
- XAS data of LESCO x=0.10, 0.125, 0.15, 0.17, 0.20
published: 2022-03-31
 
This dataset contains our bi-hourly temperature recordings from 40 rocket box style artificial roosts of 5 designs deployed in Indiana and Kentucky, USA from April through September 2019. This dataset also includes our endothermic and facultatively heterothermic daily energy expenditure datasets used in our bioenergetic analysis, which were calculated from the bi-hourly rocket box temperature data. Lastly, we include our overheating counts dataset, which summarizes daily overheating events (i.e., temperatures > 40 Celsius) in each rocket box style bat box over the course of the study period. These daily summaries were also calculated from the bi-hourly rocket box temperature recordings.
keywords: artificial roost; bat box; microclimate; temperature
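The daily overheating summary described in the bat box entry above can be sketched as a simple count of readings above the 40 Celsius threshold per box and day. This is an illustration under stated assumptions: the record layout and the sample readings are hypothetical, and treating each above-threshold reading as one event is an assumption, since the entry does not say how consecutive readings were grouped.

```python
from collections import Counter
from datetime import datetime

# From the entry's description: overheating means temperature > 40 Celsius.
OVERHEAT_THRESHOLD_C = 40.0

# Hypothetical records: (box_id, ISO timestamp, temperature in Celsius).
readings = [
    ("box01", "2019-07-01T12:00", 39.5),
    ("box01", "2019-07-01T14:00", 41.2),
    ("box01", "2019-07-01T16:00", 42.0),
    ("box01", "2019-07-02T14:00", 40.0),
    ("box02", "2019-07-01T14:00", 40.1),
]


def daily_overheating_counts(records, threshold=OVERHEAT_THRESHOLD_C):
    """Count readings strictly above the threshold per (box, day).

    Assumption: each above-threshold reading counts as one event.
    """
    counts = Counter()
    for box_id, timestamp, temp_c in records:
        if temp_c > threshold:
            day = datetime.fromisoformat(timestamp).date().isoformat()
            counts[(box_id, day)] += 1
    return counts


counts = daily_overheating_counts(readings)
print(dict(counts))
```

Because the comparison is strict, a reading of exactly 40.0 Celsius is not counted, which matches the "> 40 Celsius" wording of the entry.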
published: 2022-04-15
 
This dataset is provided to support the statements in Kim, H., and R.Y. Makhnenko. 2022. "Evaluation of CO2 sealing potential of heterogeneous Eau Claire shale". Journal of the Geological Society. In geologic carbon dioxide (CO2) storage in deep saline aquifers, buoyant CO2 tends to float upwards in reservoirs overlain by low-permeability formations called caprocks. Caprocks should serve as barriers to potential CO2 leakage, which can happen through diffusion loss and permeation through faults, fractures, or pore spaces. Leakage through intact caprock would mainly depend on its permeability and CO2 breakthrough pressure, and is affected by heterogeneities in the material. Here, we study the sealing potential of a caprock from the Illinois Basin, the Eau Claire shale, with sandy and shaly fractions distinguished via electron microscopy and grain/pore size and surface area characterization. Direct measurements of the permeability of the sandy shale provide values of ~10^-15 m^2, while clayey specimens are three orders of magnitude less permeable. The CO2 breakthrough pressure under in-situ stress conditions is 0.1 MPa for the sandy shale and 0.4 MPa for the clayey counterpart; these values are higher than those predicted by the porosimetry methods performed on the unconfined specimens. Sandy Eau Claire shale would allow penetration of large CO2 volumes at low overpressures, while the clayey formation can serve as a caprock in the absence of faults and fractures in it.
keywords: Geologic carbon storage; Caprock; Shale; CO2 breakthrough pressure; Porosimetry.
published: 2022-04-29
 
Thank you for using these datasets! These files contain trees and reference alignments, as well as the selected query sequences for testing phylogenetic placement methods against and within the SCAMPP framework. There are four datasets from three different sources, each containing its source alignment and "true" tree, any estimated trees that may have been generated, and any re-estimated branch lengths that were created for use with the requisite phylogenetic placement method. Three biological datasets (16S.B.ALL, PEWO/LTP_s128_SSU, and PEWO/green85) and one simulated dataset (nt78) are included. See README.txt in each file for more information.
keywords: Phylogenetic Placement; Phylogenetics; Maximum Likelihood; pplacer; EPA-ng
published: 2022-05-13
 
The files are plain text and contain the original data used in phylogenetic analyses of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files with aligned DNA sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software. For more information on data file content, see the included "readme" files.
keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published: 2022-06-10
 
This dataset contains nucleotide sequences of the 16S rRNA gene from phytoplasmas and other bacteria detected in phloem-feeding insects (Hemiptera, Auchenorrhyncha). The datasets were used to compare traditional Sanger sequencing with a next-generation sequencing method, Anchored Hybrid Enrichment (AHE), for detecting and characterizing phytoplasmas in insect DNA samples. The file “Trivellone_etal_SangerSequencing.fas”, comprising 1397 positions (the longest sequence), includes 35 unaligned bacterial 16S rRNA sequences (16 phytoplasmas and 19 other bacterial strains) obtained using Sanger sequencing. The file “Trivellone_etal_AHEmethod1.fas” includes 34 unaligned bacterial 16S rRNA sequences (28 phytoplasmas and 6 other bacterial strains) and contains 1530 positions (the longest sequence). Each sequence was assembled using the ABySS v2.1.0 pipeline. The file “Trivellone_etal_AHEmethod2.fas” includes 31 unaligned bacterial 16S rRNA sequences (27 phytoplasmas and 4 other bacterial strains) and contains 1530 positions (the longest sequence). Each sequence was assembled using the HybPiper v2.0.1 pipeline. Additional details are in the "read_me_trivellone.txt" file attached below.
keywords: anchored hybrid enrichment; biodiversity; biorepository; nested PCR; Sanger sequencing
published: 2021-04-16
 
This dataset includes five files developed using the procedures described in the article 'Developing County-level Data of Nitrogen Fertilizer and Manure Inputs for Corn Production in the United States' and Supplemental Information published in the Journal of Cleaner Production in 2021. Citation: Xia, Yushu, Hoyoung Kwon, and Michelle Wander. "Developing county-level data of nitrogen fertilizer and manure inputs for corn production in the United States." Journal of Cleaner Production 309 (2021): e126957. Brief method: The fertilizer and manure inputs for corn were generated with a top-down approach by assigning county-level total N inputs reported by USGS to different crops using state- and county-level survey data. The corn N needs were estimated using empirical extension-based equations coupled with soil and environmental covariates. The estimates of fertilizer N inputs were further refined for corn grain and silage production at the county level and gap-filling (using state-level averages) was carried out to generate final files for U.S. county-level N inputs. The dataset is provided in an alternative format in Google Earth Engine: https://code.earthengine.google.com/13a0078e7ee727bc001e045ad0e8c6fc
keywords: Corn; Nitrogen Fertilizer; Manure; Conterminous U.S.
planned publication date: 2023-06-01
 
An online knowledge, attitudes, and practices survey on ticks and tick-borne diseases (TBDs) was distributed to medical professionals in Illinois from summer 2020 to fall 2021. These are the raw data associated with that survey and the survey questions used. Age, gender, and county of practice have been removed to prevent identification of respondents. We have added calculated values (columns 165 to end), including: the tick knowledge score, TBD knowledge score, and total knowledge score, each of which is the total number of correct answers in its category; score percent, which is the proportion of correct answers in each category; region, which is determined from the county of practice; TBD relevant practice, which separates the practice variable into TBD primary, secondary, and non-responders; and several variables which group categories.
keywords: ticks; medicine; tick-borne disease; survey
published: 2022-06-01
 
This dataset contains information for the paper "Changes in neuropeptide prohormone genes among Cetartiodactyla livestock and wild species associated with evolution and domestication", Veterinary Sciences, MDPI. Protein sequences were predicted using GeneWise for 98 neuropeptide prohormone genes from publicly available genomes of 118 Cetartiodactyla species. All predictions (CetartiodactylaSequences2022.zip) were manually verified. Sequences were aligned within each prohormone using MAFFT (MDPImultalign2022.zip includes the multiple sequence alignment of all species available for each prohormone). Phylogenetic gene trees were constructed using PhyML and the species tree was constructed using ASTRAL (MDPItree2022.zip). The data are released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
keywords: prohormone; neuropeptide; Cetartiodactyla; phylogenetics; gene tree; species tree
published: 2022-05-26
 
The data files are for the paper entitled "Long-lifetime spin excitations near domain walls in 1T-TaS2", to be published in PNAS. The data were obtained on a 300 mK custom-designed Unisoku scanning tunneling microscope using the Nanonis module. All the data files have been named based on the figure numbers that they represent.
keywords: Mott Insulator; Spins; Charge Density Wave; Domain walls; Long lifetime
published: 2022-05-16
 
This dataset is for the publication "Do Nearctic hover flies (Diptera: Syrphidae) engage in long-distance migration? An assessment of evidence and mechanisms." It consists of 11 Excel spreadsheets and 4 R scripts which correspond to the analyses which were conducted. Paper abstract: Long-distance insect migration is poorly understood despite its tremendous ecological and economic importance. As a group, Nearctic hover flies (Diptera: Syrphidae: Syrphinae), which are crucial pollinators as adults and biological control agents as larvae, are almost entirely unrecognized as migratory despite examples of highly migratory behavior among several Palearctic species. Here, we examined evidence and mechanisms of migration for four hover fly species (Allograpta obliqua, Eupeodes americanus, Syrphus rectus, and Syrphus ribesii) common throughout eastern North America using stable hydrogen isotope (δ2H) measurements of chitinous tissue, morphological assessments, abundance estimations, and cold-tolerance assays. While further studies are needed, non-local isotopic values obtained from hover fly specimens collected in central Illinois support the existence of long-distance fall migratory behavior in Eu. americanus, and to a lesser extent S. ribesii and S. rectus. Elevated abundance of Eu. americanus during the expected autumn migratory period further supports the existence of such behavior. Moreover, high phenotypic plasticity of morphology associated with dispersal coupled with significant differences between local and non-local specimens suggest that Eu. americanus exhibits a unique suite of morphological traits that decrease costs associated with long-distance flight. Finally, compared to the ostensibly non-migratory A. obliqua, Eu. americanus was less cold tolerant, a factor that may be associated with migratory behavior. 
Collectively, our findings imply that fall migration occurs in Nearctic hover flies, but we consider methodological limitations of our study in addition to potential ecological and economic consequences of these novel findings.
keywords: Insect migration; hover fly; Syrphidae; stable isotopes; deuterium; morphometrics; cold tolerance
published: 2022-05-04
 
This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript, "Refining the role of nitrogen mineralization in mycorrhizal nutrient syndromes". Please refer to that publication for details about methodologies used to generate these data and for the experimental design.
keywords: Nitrogen cycling; Ectomycorrhizal fungi; Arbuscular mycorrhizal fungi; Nitrogen fertilization; Gross mineralization
published: 2022-04-26
 
ICoastalDB, which was developed using Microsoft structured query language (SQL) Server, consists of water quality and related data in the Illinois coastal zone that were collected by various organizations. The information in the dataset includes, but is not limited to, sample data type, method of data sampling, location, time and date of sampling and data units.
keywords: Illinois Coastal Zone; Water Quality Data
published: 2022-04-21
 
This dataset was created based on the publicly available microdata from PNS-2019, a national health survey conducted by the Instituto Brasileiro de Geografia e Estatistica (IBGE, Brazilian Institute of Geography and Statistics). IBGE is a federal agency responsible for the official collection of statistical information in Brazil – essentially, the Brazilian census bureau. Data on selected variables focusing on biopsychosocial domains related to pain prevalence, limitations and treatment are available. The Fundação Instituto Oswaldo Cruz has detailed information about the PNS, including questionnaires, survey design, and datasets (www.pns.fiocruz.br). The microdata can be found on the IBGE website (https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html?caminho=PNS/2019/Microdados/Dados).
keywords: back pain; health status disparities; biopsychosocial; Brazil
published: 2022-04-20
 
This is the core data for Zinnen et al., "Functional traits and responses to nutrient and mycorrhizal addition are inconsistently related to wetland plant species’ coefficients of conservatism." The paper has been submitted to Wetlands Ecology and Management. Two datasets are provided here. The first is greenhouse-collected data on 9 plant traits and concurrent treatment responses of Illinois wetland plant species. The second is field-collected leaf trait data of Illinois wetland plant species. These data are analyzed in the paper. Please refer to the main manuscript for how these data were produced and for the specific analyses.
keywords: ecological indicators; Floristic Quality Assessment; Floristic Quality Index; wetland degradation