published: 2022-04-21
This dataset was created based on the publicly available microdata from PNS-2019, a national health survey conducted by the Instituto Brasileiro de Geografia e Estatistica (IBGE, Brazilian Institute of Geography and Statistics). IBGE is a federal agency responsible for the official collection of statistical information in Brazil – essentially, the Brazilian census bureau. Data on selected variables focusing on biopsychosocial domains related to pain prevalence, limitations and treatment are available. The Fundação Instituto Oswaldo Cruz has detailed information about the PNS, including questionnaires, survey design, and datasets (www.pns.fiocruz.br). The microdata can be found on the IBGE website (https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html?caminho=PNS/2019/Microdados/Dados).
keywords: back pain; health status disparities; biopsychosocial; Brazil
published: 2022-04-20
This is the core data for Zinnen et al., "Functional traits and responses to nutrient and mycorrhizal addition are inconsistently related to wetland plant species’ coefficients of conservatism." This is submitted to Wetlands Ecology and Management. Two datasets are submitted here. The first contains greenhouse-collected data on 9 plant traits and concurrent treatment responses of Illinois wetland plant species. The second contains field-collected leaf trait data for Illinois wetland plant species. These data are analyzed in the paper. Please refer to the main manuscript for how these data were produced and for the specific analyses.
keywords: ecological indicators; Floristic Quality Assessment; Floristic Quality Index; wetland degradation
published: 2022-04-19
This data repository includes the features and the trained backbone parameters used in the ICLR 2022 paper "On the Importance of Firth Bias Reduction in Few-Shot Classification". The code accompanying this data is open-source and available at https://github.com/ehsansaleh/firth_bias_reduction. The code and the data have three modules:
1. The "code_firth" module (10 files) relates to the basic ResNet backbones and logistic classifiers (e.g., Figures 2 and 3 in the main paper).
2. The "code_s2m2rf" module (2 files) relates to the S2M2R feature backbones and cosine classifiers (e.g., Figure 4 in the main paper).
3. The "code_dcf" module (3 files) relates to the few-shot Distribution Calibration (DC) method (e.g., Table 1 in the main paper).
The relevant files for each module have the module name as a prefix in their name.
1. For instance, the "code_dcf_features.tar" file should be placed in the "features" directory of the "code_dcf" module.
2. As another example, "code_firth_features_cifarfs_novel.tar" should be placed in the "features" directory of the "code_firth" module, and it includes the features extracted from the novel split of mini-ImageNet dataset.
Each tarball should be extracted in its relevant directory, and the md5 checksums of the extracted files are also provided in the open-source code repository for verification. Please note that the actual datasets of images are not included here (since we do not own those datasets). However, helper scripts for automatically downloading the original datasets are provided in every module and sub-directory of the GitHub code repository.
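The extract-then-verify step described for this dataset could be sketched as follows. This is a minimal illustration, not the repository's own tooling: the function names, the example paths, and the shape of the `expected_md5` mapping are assumptions, and the real checksums would have to be taken from the GitHub code repository.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_and_verify(tar_path: Path, dest_dir: Path,
                       expected_md5: dict[str, str]) -> list[str]:
    """Extract tar_path into dest_dir and return the names that fail the md5 check.

    expected_md5 maps a path relative to dest_dir to its expected hex digest
    (a hypothetical format; adapt it to however the checksums are published).
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as archive:
        archive.extractall(dest_dir)
    mismatched = []
    for name, expected in expected_md5.items():
        extracted = dest_dir / name
        if not extracted.is_file() or md5_of_file(extracted) != expected:
            mismatched.append(name)
    return mismatched


# Hypothetical usage (paths and digest are placeholders, not real values):
# bad = extract_and_verify(Path("code_dcf_features.tar"),
#                          Path("code_dcf/features"),
#                          {"some_file.pkl": "<md5 from the repository>"})
# print("all files verified" if not bad else f"checksum mismatch: {bad}")
```

An empty return value means every listed file was extracted and matched its checksum.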
keywords: Computer Vision; Few-Shot Classification; Few-Shot Learning; Firth Bias Reduction
published: 2022-04-19
List of differentially expressed genes in human endometrial stromal cells with knockdown of Basigin (BSG) gene expression during decidualization. The BSG siRNA or negative scrambled control siRNA were transfected into human endometrial stromal cells (HESCs) following the protocol of siLentFect™ Lipid (Bio-Rad, Hercules, CA). Following complete knockdown of BSG in HESCs (72 hours after adding siRNA), HESCs were treated with medium containing estrogen, progesterone and cAMP to induce decidualization. BSG siRNA and negative control scrambled siRNA were added to the cells every four days (day 0, 4) over the course of the decidualization protocol. Total RNA was harvested at day 6 of the decidualization protocol for microarray analysis. Microarray analysis was performed at the University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center. Briefly, 0.2 micrograms of total RNA were labeled using the Agilent two color QuickAmp labeling kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s protocol. The optional spike-in controls were not used. Samples were hybridized to Human Gene Expression 4x44K v2 Microarray (Agilent Technologies, Santa Clara, CA) in an Agilent Hybridization Cassette according to standard protocols. The arrays were then scanned on an Axon GenePix 4000B scanner and the images were quantified using Axon GenePix 6.1. Microarray data pre-processing and statistical analyses were done in R (v3.6.2) using the limma package (v3.42.0; Ritchie et al., 2015). Median foreground and median background values from the 4 arrays were read into R and any spots that had been manually flagged (-100 values) were given a weight of zero. The background values were ignored because investigations showed that trying to use them to adjust for background fluorescence added more noise to the data; background was low and even for all arrays, therefore no background correction was done.
The individual Cy5 and Cy3 fluorescence for each array were normalized together using the quantile method 3 (Yang and Thorne, 2003). Agilent's Human Gene Expression 4x44K v2 Microarray has a total of 45,220 probes: 1224 probes for positive controls, 153 for negative controls, 823 labeled “ignore” and 43,118 labeled “cDNA”. The pos+neg+ignore probes were used to ascertain the background level of fluorescence (6, on the log2 scale) and then discarded. The cDNA probes comprise 34,127 unique 60mer probes, of which 999 probes are spotted 10 times each and the rest one time each. We averaged the replicate probes for those spotted 10 times and then fit a mixed model that had treatment and dye as fixed effects and array pairing as a random effect (Phipson et al., 2016; Smyth et al., 2005). After fitting the model but before False Discovery Rate (FDR) correction (Benjamini and Hochberg, 1995), probes were filtered out by the following criteria: 1) did not have at least 4/8 samples with expression values > 6 (14,105 probes removed), 2) no longer had an assigned Entrez Gene ID in Bioconductor’s HsAgilentDesign026652.db annotation package (v3.2.3; 2,152 probes removed) (Huber et al., 2015), 3) mapped to the same Entrez Gene ID as another probe but had a larger p-value for treatment effect (4,141 probes removed). This left 13,729 probes representing 13,729 unique genes. *Please note that there is a discrepancy between the file and the readme, as this plain text is the actual data file of this dataset.
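The probe bookkeeping in this description can be checked with a few lines of arithmetic. The sketch below only re-derives two totals from the counts quoted above (the number of cDNA spots and the number of probes left after filtering). The variable names are illustrative, and it does not reproduce the limma analysis itself.

```python
# Counts quoted in the dataset description.
unique_cdna_probes = 34_127   # unique 60mer cDNA probes
replicated_probes = 999       # probes spotted 10 times each
copies_per_replicated = 10

# Total cDNA spots: replicated probes count 10 times, the rest once.
cdna_spots = (replicated_probes * copies_per_replicated
              + (unique_cdna_probes - replicated_probes))
print(cdna_spots)  # 43118, the number of spots labeled "cDNA"

# Probes removed by each of the three filtering criteria.
removed = {
    "fewer than 4/8 samples with expression > 6": 14_105,
    "no assigned Entrez Gene ID": 2_152,
    "duplicate Entrez Gene ID with larger p-value": 4_141,
}
remaining = unique_cdna_probes - sum(removed.values())
print(remaining)  # 13729 probes, one per unique gene
```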
keywords: Basigin; endometrium; decidualization; human
published: 2022-04-11
This data set contains all the map data used for "Quantifying transportation energy vulnerability and its spatial patterns in the United States". The multiple dimensions (i.e., exposure, sensitivity, adaptive capacity) of transportation energy vulnerability (TEV) at the census tract level in the United States, the changes in TEV with electric vehicles adoption, and the detailed data for Chicago, Los Angeles, and New York are in the dataset.
keywords: Transport energy; Vulnerability; Fuel costs; Electric vehicles
published: 2022-03-25
Ground-based radar data sets collected during the 2013 NASA EVEX Campaign, conducted on Roi-Namur island of the Kwajalein Atoll in the Republic of the Marshall Islands, are deposited in this databank. Radar data were collected with the IRIS VHF and ALTAIR VHF/UHF systems.
published: 2022-03-23
This dataset is an estimate of county-to-county commodity delivery through the cold chain in 2017. For each county pair, our model estimates the weight [kg] and value [$] of the cold chain flow between origin and destination for SCTG 5 and SCTG 7 commodities:
- SCTG 5: Meat, poultry, fish, seafood, and their preparations
- SCTG 7: Other prepared foodstuffs, fats, and oils
keywords: food flows; cold chain; county-scale; United States; carbon footprint
published: 2022-03-19
Raw arthroscopic scores, histologic scores, cytokine measurements, and performance data for the study cohort described in the accompanying publication.
keywords: horse; metatarsophalangeal joint; arthroscopy; exercise; developmental orthopedic disease
published: 2022-03-11
Data sets relating to the manuscript “Long-term yields in annual and perennial bioenergy crops in the Midwestern USA” published in Global Change Biology Bioenergy. Field data, including annual peak biomass and harvest yields from maize/soy, miscanthus, switchgrass, and prairie field trials from 2008-2018 are included. Peak and harvest biomass for fertilized and unfertilized miscanthus are included from 2014-2018.
keywords: miscanthus; switchgrass; yield; drought; crop; perennial; bioenergy
published: 2022-03-01
The following files were used to reconstruct the phylogeny of the leafhopper subfamily Deltocephalinae, using IQ-TREE v1.6.12 and ASTRAL v 4.10.5.
1) taxon_sampling.csv: contains the sequencing ids (1st column) and the taxonomic information (2nd column) of each sample. Sequencing ids were used in the alignment files and partition files.
2) concatenated_nt.phy: concatenated nucleotide alignment used for the maximum likelihood analysis of Deltocephalinae by IQ-TREE v1.6.12. The file lists the sequences of 163,365 nucleotide positions from 429 genes in 730 samples. Hyphens are used to represent gaps.
3) concatenated_nt_partition.nex: the partitions for the concatenated nucleotide alignment. The file partitions the 163,365 nucleotide characters into 429 character sets, and defines the best substitution model for each character set.
4) concatenated_aa.phy: concatenated amino acid alignment used for the maximum likelihood analysis of Deltocephalinae by IQ-TREE v1.6.12. The file gives the sequences of 53,969 amino acids from 429 genes in 730 samples. Hyphens are used to represent gaps.
5) concatenated_aa_partition.nex: the partitions for the concatenated amino acid alignment. The file partitions the 53,969 characters into 429 character sets, and defines the best substitution model for each character set.
6) concatenated_nt_106taxa.phy: a reduced concatenated nucleotide alignment representing 107 samples x 86 genes. This alignment is used to estimate the divergence times of Deltocephalinae using MCMCTree in PAML v4.9. The file lists the sequences of 79,239 nucleotide positions from 86 genes in 107 samples. Hyphens are used to represent gaps.
7) concatenated_nt_106taxa_partition.nex: the partitions for the nucleotide alignment concatenated_nt_106taxa.phy. The file partitions the 79,239 nucleotide characters into 86 character sets, and defines the best substitution model for each character set.
8) individual_gene_alignment.zip: contains 429 FAS files, one for each of the partitioned nucleotide character sets in the concatenated_nt_partition.nex file. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v 4.10.5.
published: 2022-02-20
This dataset contains the files used to perform the work savings and recall evaluation in the study titled "Data from Testing a filtering strategy for systematic reviews: Evaluating work savings and recall."
keywords: systematic reviews; machine learning; work savings; recall; search results filtering
published: 2022-02-14
Dataset associated with Allen et al. (In Review): Food caching by a solitary large carnivore supports optimal foraging theory. If using this dataset, please cite this manuscript.
published: 2022-02-14
This dataset contains simulation results from the numerical model PartMC-MOSAIC used in the article "Quantifying the effects of mixing state on aerosol optical properties". This article is submitted to the journal Atmospheric Physics and Chemistry. There are 100 scenario directories in total in this dataset, numbered 00-99. Each scenario contains 25 NetCDF files of hourly output from PartMC-MOSAIC simulations, containing the simulated gas and particle information. The data were produced using version 2.5.0 of PartMC-MOSAIC. Instructions to compile and run PartMC-MOSAIC are available at https://github.com/compdyn/partmc. The chemistry code MOSAIC is available by request from Rahul.Zaveri@pnl.gov. For more details on reproducing the cases, please contact nriemer@illinois.edu and yuyao3@illinois.edu.
keywords: Aerosol mixing state; Aerosol optical properties; Mie calculation; Black Carbon
published: 2022-02-11
The Culex_Trivellone_etal.fas fasta file contains the original final sequence alignment used in the haplotype analyses of Trivellone et al. (Frontiers in Public Health, under review). The 492 sequences (from specimens of the Culex pipiens complex collected in different habitat types using BG-Sentinel traps) were aligned using PASTA v1.8.5 under default settings. The final dataset contains 686 positions of the cytochrome c oxidase subunit I (COI) mitochondrial gene. The data analyses are further described in the cited original paper.
keywords: Culex; Culicidae; COI; mosquito surveillance; species assemblages
published: 2022-02-11
Upon treatment removal, spontaneous and random reactivation of latently infected T cells remains a major barrier toward curing HIV. Due to its stochastic nature, fluctuations in gene expression (or “noise”) can bias HIV reactivation from latency, and conventional drug screens for mean gene expression neglect compounds that modulate noise. Here we present a time-lapse fluorescence microscopy image set obtained from a Jurkat T-cell line, infected with a minimal HIV gene circuit, treated with 1,806 small molecule compounds, and imaged for 48 hours. In addition, the single-cell time-dependent reporter dynamics (single-cell gene expression intensity and noise trajectories) extracted from the image dataset are included. Based on this dataset, a total of 5 latency-promoting agents of HIV were found through further experimentation in Lu et al., PNAS 2021 (doi: 10.1073/pnas.2012191118). For a detailed description of the dataset, please refer to the readme file.
keywords: HIV; latency; drug screen; fluorescence microscopy; time-lapse; microscopy; single-cell data; noise; gene expression fluctuation
published: 2022-02-11
The data contain a list of articles given low scores by the RCT Tagger and an error analysis of them, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews". The change made in V3 is that the data are divided into two parts:
- Error Analysis of 44 Low Scoring Articles with MEDLINE RCT Publication Type.
- Error Analysis of 244 Low Scoring Articles without MEDLINE RCT Publication Type.
keywords: Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews
published: 2022-02-10
The compiled datasets include plot-level observations of energy crops (miscanthus and switchgrass) from recent experimental field trials in the US, including dry biomass yield, location, state, region, harvest year, growing season degree days (GDD), winter season heating degree days (HDD), growing season cumulative precipitation, annual nitrogen application rate, age of the plant when harvested, National Commodity Crop Productivity Index (NCCPI) values, and cultivar type (switchgrass) from various published and unpublished sources. The Stata codes include estimation procedures for four different specifications, i.e., Model A includes deterministic effect without interaction terms; Model B includes deterministic effect with interaction terms (N², age², N × age, GDD², precip², N × NCCPI); Model C includes deterministic effect with interaction terms, study, and location random effect; Model D includes deterministic effect with interaction terms, harvest year augmented study, and location random effect.
keywords: Age; Miscanthus; Nitrogen; Switchgrass; Yield; Center for Advanced Bioenergy and Bioproducts Innovation
published: 2022-02-09
The data file contains a list of articles with PMID information, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
keywords: Cochrane reviews; Randomized controlled trials; RCT; Automation; Systematic reviews
published: 2022-02-09
The data file contains a list of articles and their RCT Tagger prediction scores, which were used in a project associated with the manuscript "Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews".
keywords: Cochrane reviews; automation; randomized controlled trial; RCT; systematic reviews
published: 2021-11-19
This is a general description of the datasets included in this upload; details of each dataset can be found in the individual README.txt in each compressed folder. We have: 1. ROSE-HF.tar.gz 2. ROSE-LF.tar.gz HF (high fragmentary): 50% of the sequences are made fragmentary, which have average lengths of 25% of the original lengths with a standard deviation of 60 bp. LF (low fragmentary): 25% of the sequences are made fragmentary, which have average lengths of 50% of the original lengths with a standard deviation of 60 bp. The seven ROSE datasets made fragmentary are: 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2 and 1000S4. "ROSE-HF.tar.gz" contains HF versions of the seven ROSE datasets. "ROSE-LF.tar.gz" contains LF versions of the seven ROSE datasets.
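The HF and LF conditions above can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the procedure used to build the deposited files: it assumes fragment lengths are drawn from a normal distribution around the stated mean, clipped to the sequence length, and that each fragment is a contiguous substring starting at a uniformly random position. The function name and parameters are hypothetical.

```python
import random


def fragment_sequences(sequences, fraction, mean_length_ratio, length_sd, seed=0):
    """Return (new_sequences, chosen_indices) with a fraction of sequences truncated.

    fraction          -- share of sequences made fragmentary (0.5 for HF, 0.25 for LF)
    mean_length_ratio -- mean fragment length relative to the original (0.25 HF, 0.5 LF)
    length_sd         -- standard deviation of fragment length in bp (60 in both cases)
    """
    rng = random.Random(seed)
    n_fragment = round(len(sequences) * fraction)
    chosen = set(rng.sample(range(len(sequences)), n_fragment))
    result = []
    for index, seq in enumerate(sequences):
        if index not in chosen:
            result.append(seq)  # left at full length
            continue
        # Assumption: normally distributed length, clipped to [1, len(seq)].
        target = rng.gauss(mean_length_ratio * len(seq), length_sd)
        length = max(1, min(len(seq), round(target)))
        # Assumption: contiguous fragment at a uniformly random start position.
        start = rng.randint(0, len(seq) - length)
        result.append(seq[start:start + length])
    return result, chosen


# Toy input: 20 random sequences of 1000 bp each.
toy_rng = random.Random(1)
toy = ["".join(toy_rng.choice("ACGT") for _ in range(1000)) for _ in range(20)]
hf, hf_chosen = fragment_sequences(toy, fraction=0.5, mean_length_ratio=0.25, length_sd=60)
lf, lf_chosen = fragment_sequences(toy, fraction=0.25, mean_length_ratio=0.5, length_sd=60)
print(len(hf_chosen), len(lf_chosen))  # 10 5
```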
keywords: ROSE; simulation; fragmentary
published: 2022-01-30
This dataset contains temperature measurements in four different bat box designs deployed in central Indiana, USA from May to September 2018. Hourly environmental data (temperature, solar radiation, and wind speed) are also included for days and hours sampled. Bat box temperature data were used as inputs in a free program, GNU Octave, to assess design performance with respect to suitability indices for endothermic metabolism and pup development. Scripts are included in the dataset.
keywords: bats; thermal refuge; reproduction; conservation; bat box; microclimate
published: 2022-02-07
This dataset provides estimates of agricultural and food commodity flows [kg] between all county pairs within the United States for the years 2007, 2012, and 2017. The database provides 206.3 million data points, since pairwise information is provided between 3134 counties, for 7 commodity categories, and 3 time periods. The commodity categories correspond to the Standardized Classification of Transported Goods and are:
- SCTG 1: live animals and fish
- SCTG 2: cereal grains
- SCTG 3: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 4: animal feed, eggs, honey, and other products of animal origin
- SCTG 5: meat, poultry, fish, seafood, and their preparations
- SCTG 6: milled grain products and preparations, and bakery products
- SCTG 7: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Karakoc et al. (2022) in Environmental Research Letters.
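The figure of 206.3 million data points follows directly from the counts given in the description. The short check below assumes that "pairwise" means every ordered origin-destination pair, including a county paired with itself. That assumption is not stated in the description, but it reproduces the quoted total.

```python
counties = 3134
commodity_categories = 7   # SCTG 1 through SCTG 7
time_periods = 3           # 2007, 2012, 2017

# Assumption: ordered origin-destination pairs, including same-county flows.
county_pairs = counties ** 2
data_points = county_pairs * commodity_categories * time_periods

print(f"{county_pairs:,} county pairs")   # 9,821,956 county pairs
print(f"{data_points:,} data points")     # 206,261,076 data points (about 206.3 million)
```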
keywords: food flows; high-resolution; county-scale; time-series; United States
published: 2022-01-31
This dataset contains results from WRF simulations over northern South America. The Orinoco Low-Level Jet (OLLJ) and the Cross-Equatorial Moisture Transport are important circulation structures of the climate of tropical South America. We explore the sensitivity of the OLLJ and cross-equatorial transport to the representation of surface fluxes and turbulence by using two different Land Surface Model (LSM) schemes (Noah and CLM) and three Planetary Boundary Layer (PBL) schemes (YSU, QNSE and MYNN).
keywords: WRF; Orinoco LLJ; precipitation
published: 2022-01-27
Twenty-two genotypes of C4 species grown under ambient and elevated O3 concentrations were studied at the SoyFACE facility (40°02’N, 88°14’W) in 2019. This dataset contains leaf morphology, photosynthesis and nutrient contents measured at three time points. The results of CO2 response curves are also included.
keywords: C4; O3; photosynthesis