published: 2021-05-10
UAV-based, high-resolution, multispectral time-series orthophotos used to understand the relationship between growth dynamics, imagery temporal resolution, and end-of-season biomass productivity of biomass sorghum as a bioenergy crop. The sensor was a MicaSense RedEdge camera flown at 40 meters above ground level at the Energy Farm, UIUC, in 2019.
keywords: Unmanned aerial vehicles; High throughput phenotyping; Machine learning; Bioenergy crops
published: 2021-07-30
This data comes from a scoping review associated with the project called Reducing the Inadvertent Spread of Retracted Science. The data summarizes the fields that have been explored by existing research on retraction, a list of studies comparing retraction in different fields, and a list of studies focused on retraction of COVID-19 articles.
keywords: retraction; fields; disciplines; research integrity
published: 2021-04-30
This repository includes scripts and datasets for the paper, "Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy" submitted to Bioinformatics.
keywords: BAli-Phy; Bayesian co-estimation; multiple sequence alignment
published: 2021-05-26
Steady-state and dynamic gas exchange data for maize (B73), sugarcane (CP88-1762) and sorghum (Tx430)
keywords: C4 plants; gas exchange
published: 2022-03-23
This dataset is an estimation of county-to-county commodity delivery through the cold chain in 2017. For each county pair, our model estimates the weight [kg] and value [$] of the cold chain flow between origin and destination for two commodity groups: SCTG 5 (meat, poultry, fish, seafood, and their preparations) and SCTG 7 (other prepared foodstuffs, fats, and oils).
keywords: food flows; cold chain; county-scale; United States; carbon footprint
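A common first step with county-pair flow tables is to aggregate flows by origin county. The following is a minimal sketch of that step. It assumes a CSV layout with columns `origin`, `destination`, `sctg`, `weight_kg`, and `value_usd`. These column names and the toy rows are hypothetical; they are not the released file format or data.

```python
import csv
import io
from collections import defaultdict

# Toy rows for illustration only; the column names and county codes are
# hypothetical and do not come from the released files.
SAMPLE = """origin,destination,sctg,weight_kg,value_usd
A,B,5,1200.0,5400.0
A,C,7,300.0,900.0
B,A,5,800.0,4000.0
"""


def outbound_totals(rows, sctg=None):
    """Sum cold-chain weight (kg) and value ($) leaving each origin county.

    If `sctg` is given (e.g. "5" or "7"), only flows of that commodity
    group are counted.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        if sctg is not None and row["sctg"] != sctg:
            continue
        entry = totals[row["origin"]]
        entry[0] += float(row["weight_kg"])
        entry[1] += float(row["value_usd"])
    return {county: (kg, usd) for county, (kg, usd) in totals.items()}


totals = outbound_totals(csv.DictReader(io.StringIO(SAMPLE)))
```

The same function handles inbound totals if it keys on `destination` instead of `origin`.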
published: 2024-01-01
Supplementary data tables for the dissertation "Hybridization dynamics and population genomics of a Manacus hybrid zone." This work focuses on the dynamics of hybridization over time between two species of tropical birds, the golden-collared manakin (Manacus vitellinus) and the white-collared manakin (Manacus candei), comparing data from historical museum samples and contemporary wild-caught birds. Table A1 contains the sample metadata for the Manacus restriction site-associated DNA sequencing (RAD-seq) dataset used in the dissertation. It includes NCBI BioSample accession numbers, Smithsonian Museum of Natural History numbers (where applicable), sample IDs, sampling site locations, and the sampling year, age, and sex of each bird. Table A6 contains phenotypic measurements of male manakin plumage traits used in cline analyses to assess hybrid zone movement over time in the historical and contemporary datasets. These traits include beard length (mm), epaulet width (mm), tail length (mm), collar color (nm), and belly color (nm). Table A7 summarizes male plumage measurements across the hybrid zone. Table C1 lists annotated protein-coding genes in candidate regions of interest in Manacus genomes. These regions were identified using outlier regions of genomic divergence, linkage disequilibrium, and enrichment of parental private alleles.
keywords: csv; manacus; manakin; genomics; dissertation
published: 2020-08-22
We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster of 15 heterogeneous nodes. The dataset is not sampled and covers selected request types in each benchmark: compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://gitlab.engr.illinois.edu/DEPEND/firm.git). The dataset was preprocessed from the raw data generated by FIRM's tracing system. It is split according to the microservice component in which the performance anomaly is located, as the file names indicate. Each file is in CSV format with comma-separated fields. Each line consists of the tracing ID followed by the duration of each component in units of 10^(-3) ms (i.e., microseconds). Execution paths are specified in `execution_paths.txt` in each directory.
keywords: Microservices; Tracing; Performance
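The following is a minimal sketch of loading one trace file and converting the durations from 10^(-3) ms to milliseconds. It assumes the first row is a header whose first field labels the trace ID and whose remaining fields name the components. That header row, its names, and the toy values are hypothetical. If a file has no header, the component order would instead have to come from the accompanying documentation.

```python
import csv
import io
from statistics import mean

# Toy input; the header names and durations are hypothetical.
SAMPLE = """trace_id,nginx,compose-post,text-service
t1,1500,900,200
t2,2500,1100,400
"""


def load_traces(fh):
    """Return {trace_id: {component: duration_ms}} from one trace CSV.

    Durations in the file are in units of 10^-3 ms (microseconds), so each
    value is divided by 1000 to give milliseconds.
    """
    reader = csv.reader(fh)
    components = next(reader)[1:]  # assumed header row
    traces = {}
    for row in reader:
        if not row:
            continue
        traces[row[0]] = {c: float(v) / 1000.0 for c, v in zip(components, row[1:])}
    return traces


def mean_latency_ms(traces):
    """Mean duration per component across all traces, in milliseconds."""
    components = next(iter(traces.values())).keys()
    return {c: mean(t[c] for t in traces.values()) for c in components}


traces = load_traces(io.StringIO(SAMPLE))
```

Per-component durations are not summed into an end-to-end latency here, because calls along an execution path may overlap. The paths in `execution_paths.txt` would be needed to combine them correctly.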
published: 2020-10-16
Video footage of an Eastern Box Turtle (Terrapene carolina carolina) partially predating a Field Sparrow (Spizella pusilla) nest at 0845 h on 31 May 2020. Please note that the date shown on the video footage is incorrect due to user error, but the time is correct.
keywords: nest predation; turtle; songbird; nest camera; Terrapene carolina carolina; Spizella pusilla
published: 2020-12-30
High-speed X-ray videos of four E. abruptus specimens recorded at the Advanced Photon Source (Argonne National Laboratory) in the summer of 2018, with the corresponding position data of landmarks tracked during the motion. See the README file for more details.
published: 2020-12-31
This dataset contains the amino acid and nucleotide alignments corresponding to the phylogenetic analyses of South et al. 2020 in Systematic Entomology. This dataset also includes the gene trees that were used as input for coalescent analysis in ASTRAL.
keywords: Plecoptera; stoneflies; phylogeny; insects
published: 2020-11-25
Video recorded by Louise Barker using a Canon PowerShot camera documents late-season combat behavior in Agkistrodon contortrix. Recorded in Beaufort County, North Carolina, 11.1 km SE of downtown Washington, on 21 October 2020.
keywords: Agkistrodon contortrix; combat; mating; reproduction; copperhead; pit viper; Viperidae
published: 2020-12-15
The dataset consists of results and various input data used in the GAMS model for the publication "Repeal of the Clean Power Plan: Social Cost and Distributional Implications". All data are provided either as Excel files or as .inc files, which can be read in GAMS or in a text editor such as Notepad. The main input data cover agriculture, transportation, and electricity. Model details can be found in the paper and the GAMS model package.
keywords: carbon abatement; welfare cost; electricity sector; partial equilibrium model
published: 2021-01-23
Data sets from "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss." It contains data simulated with gene duplication and loss under a variety of different conditions.
keywords: gene duplication and loss; species-tree inference
published: 2021-06-16
Thank you for using these datasets. These RNAsim aligned fragmentary sequences were generated from the query sequences selected by Balaban et al. (2019) for their variable-size datasets (https://doi.org/10.5061/dryad.78nf7dq). They were created for phylogenetic placement with the multiple sequence alignments and backbone trees provided by Balaban et al. (2019). The file structure also corresponds to that of the data provided by Balaban et al. (2020).

There are directories for five backbone tree sizes, named 5000, 10000, 50000, 100000, and 200000. Balaban et al. (2019) use the same directory names to indicate the size of the backbone tree in their data. Each size directory contains subdirectories for its replicates, labelled 0 through 4. The four smaller backbone tree sizes have five replicates each, and the largest has a single replicate. Each replicate contains 200 text files, each holding one aligned query sequence fragment in FASTA format.
keywords: Fragmentary Sequences; RNAsim
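The directory layout described above can be walked programmatically. The following minimal sketch enumerates the replicate directories and reads one single-record FASTA file. The root path is a hypothetical placeholder. Because the names of the 200 query files are not given in the description, the sketch simply reads every file in a replicate directory.

```python
from pathlib import Path

BACKBONE_SIZES = [5000, 10000, 50000, 100000, 200000]


def replicate_dirs(root):
    """Yield root/<size>/<replicate>: five replicates (0-4) per size, one for 200000."""
    root = Path(root)
    for size in BACKBONE_SIZES:
        n_reps = 1 if size == 200000 else 5
        for rep in range(n_reps):
            yield root / str(size) / str(rep)


def read_single_fasta(text):
    """Parse a FASTA string holding exactly one record; return (name, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one record")
    return lines[0][1:], "".join(lines[1:])


def load_replicate(rep_dir):
    """Read every query-fragment file in one replicate directory into {name: sequence}."""
    files = sorted(p for p in Path(rep_dir).iterdir() if p.is_file())
    return dict(read_single_fasta(p.read_text()) for p in files)
```

Gap characters are kept in the returned sequences, since the fragments are already aligned.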
published: 2019-10-23
Raw MD simulation trajectory, input and configuration files, SEM current data, and experimental raw data accompanying the publication, "Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore". README.md contains a description of all associated files.
keywords: molecular dynamics; protein sequencing; aerolysin; nanopore sequencing
published: 2019-10-05
This dataset contains network information collected and aggregated from NCSA's Blue Waters system from 1 January 2017 to 31 May 2017. Blue Waters comprises 27,648 nodes connected via a Cray Gemini 3D torus interconnect (dimensions 24x24x24). Network performance counters for links are exposed via Cray's gpcdr kernel module (https://github.com/ovis-hpc/ovis/wiki/gpcdr-kernel-module). The Lightweight Distributed Metric Service (LDMS, https://github.com/ovis-hpc/ovis) is used to sample the performance counters at 60-second intervals. Please read the "README.md" file. Acknowledgement: This dataset was collected as part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
keywords: HPC; Interconnect; Network; Congestion; Blue Waters; Dataset
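Two calculations come up often with this kind of data: locating a router's neighbours in the torus, and turning sampled counters into rates. Each Gemini router serves two nodes, so a 24x24x24 torus accounts for 24³ × 2 = 27,648 nodes, which matches the node count above. The following is a minimal sketch of both calculations. It assumes that the sampled link counters are cumulative and monotonically increasing. That assumption and the function names are illustrative; they do not describe the README's actual column layout.

```python
DIMS = (24, 24, 24)      # Gemini 3D torus dimensions from the description
NODES_PER_ROUTER = 2     # each Gemini router connects two nodes
SAMPLE_INTERVAL_S = 60   # LDMS sampling interval from the description


def torus_neighbors(coord, dims=DIMS):
    """Return the six neighbours of router (x, y, z), wrapping around each axis."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(coord)
            c[axis] = (c[axis] + step) % dims[axis]
            neighbors.append(tuple(c))
    return neighbors


def counter_rates(samples, interval_s=SAMPLE_INTERVAL_S):
    """Convert successive cumulative counter readings into per-second rates.

    Assumes counters only increase; a decrease (e.g. a reset or a missed
    sample) yields None for that interval instead of a negative rate.
    """
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        delta = cur - prev
        rates.append(delta / interval_s if delta >= 0 else None)
    return rates
```

If the released files already store per-interval deltas rather than cumulative values, `counter_rates` would reduce to dividing each value by the interval.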
published: 2021-11-19
This is a general description of the datasets included in this upload; details of each dataset can be found in the README.txt inside each compressed folder. The upload contains two archives: (1) ROSE-HF.tar.gz and (2) ROSE-LF.tar.gz.
- HF (high fragmentary): 50% of the sequences are made fragmentary. Their lengths average 25% of the original lengths, with a standard deviation of 60 bp.
- LF (low fragmentary): 25% of the sequences are made fragmentary. Their lengths average 50% of the original lengths, with a standard deviation of 60 bp.

The seven ROSE datasets made fragmentary are 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2, and 1000S4. "ROSE-HF.tar.gz" contains the HF versions of these seven datasets, and "ROSE-LF.tar.gz" contains the LF versions.
keywords: ROSE; simulation; fragmentary
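The HF and LF parameters can be read as a simple fragmentation procedure. The following is a hypothetical sketch of one, not the authors' script. It picks a fraction of sequences at random and replaces each with a contiguous substring. The substring length is drawn from a normal distribution with the stated mean fraction and an SD of 60 bp, and the start position is uniform. Uniform start positions and length clipping are assumptions of this sketch.

```python
import random


def make_fragmentary(seqs, frac_fragmentary, mean_frac_len, sd_bp=60, seed=0):
    """Return a copy of `seqs` ({name: unaligned sequence}) with some made fragmentary.

    A fraction `frac_fragmentary` of sequences is chosen at random. Each chosen
    sequence is replaced by a contiguous substring whose length is drawn from
    Normal(mean_frac_len * len(seq), sd_bp), clipped to [1, len(seq)], with a
    uniformly random start position (an assumption of this sketch).
    """
    rng = random.Random(seed)
    names = sorted(seqs)
    chosen = set(rng.sample(names, round(frac_fragmentary * len(names))))
    out = {}
    for name in names:
        s = seqs[name]
        if name in chosen:
            length = int(round(rng.gauss(mean_frac_len * len(s), sd_bp)))
            length = max(1, min(len(s), length))
            start = rng.randint(0, len(s) - length)
            s = s[start:start + length]
        out[name] = s
    return out


# Parameter sets matching the HF and LF descriptions above.
HF = dict(frac_fragmentary=0.50, mean_frac_len=0.25)
LF = dict(frac_fragmentary=0.25, mean_frac_len=0.50)
```

Fixing `seed` makes a given fragmentation reproducible.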
published: 2022-03-20
Data for "Generic character of charge and spin density waves in superconducting cuprates":
- neutron scattering data for the SDW;
- RSXS scans of the CDW in LESCO x = 0.10, 0.125, 0.15, 0.17, and 0.20 at various temperatures;
- temperature dependence of the CDW peak intensity, correlation length, and Qcdw (Lorentzian fit, S(q,T) fit, Landau-Ginzburg fit);
- XAS data of LESCO x = 0.10, 0.125, 0.15, 0.17, and 0.20.
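A Lorentzian fit is one of the fits listed above. The following minimal sketch shows how a Lorentzian yields a peak position (Qcdw) and a width from which a correlation length is derived. It requires NumPy. It assumes a background-free peak and uses the common convention ξ = 1/HWHM. It is not necessarily the fit procedure or convention used in the paper.

For a background-free Lorentzian, 1/I(q) is exactly quadratic in q, so a parabola fit gives the parameters in closed form. With noisy data, nonlinear least squares (for example `scipy.optimize.curve_fit`) is usually preferable, because the linearization re-weights the noise.

```python
import numpy as np


def lorentzian(q, amp, q0, hwhm):
    """Background-free Lorentzian peak centred at q0 with half-width at half-maximum hwhm."""
    return amp * hwhm**2 / ((q - q0) ** 2 + hwhm**2)


def fit_lorentzian_linearized(q, intensity):
    """Fit a background-free Lorentzian by fitting a parabola to 1/I(q).

    1/I = (q^2 - 2*q0*q + q0^2 + hwhm^2) / (amp * hwhm^2), so the quadratic
    coefficients (a, b, c) give q0 = -b/(2a), hwhm^2 = c/a - q0^2 and
    amp = 1/(a * hwhm^2).
    """
    a, b, c = np.polyfit(np.asarray(q), 1.0 / np.asarray(intensity), 2)
    q0 = -b / (2 * a)
    hwhm = np.sqrt(c / a - q0**2)
    amp = 1.0 / (a * hwhm**2)
    return amp, q0, hwhm


def correlation_length(hwhm):
    """Correlation length under the convention xi = 1 / HWHM (inverse units of q)."""
    return 1.0 / hwhm
```

The S(q,T) and Landau-Ginzburg fits mentioned in the data description are not covered by this sketch.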
published: 2020-09-18
Restriction site-associated DNA sequencing (RAD-seq) data from 643 Miscanthus accessions from a diversity panel, including 613 Miscanthus sacchariflorus, three M. sinensis, and 27 M. ×giganteus. DNA was digested with PstI and MspI, and single-end Illumina sequencing was performed adjacent to the PstI site. Variant and genotype calling was performed with TASSEL-GBSv2, using the Miscanthus sinensis v7.1 reference genome from Phytozome 12 (https://phytozome.jgi.doe.gov). Additional ploidy-aware genotype calling was performed with polyRAD v1.1.
keywords: variant call format (VCF); genotyping-by-sequencing (GBS); single nucleotide polymorphism (SNP); grass; genetic diversity; biomass
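As a small example of working with the VCF genotypes, the following minimal sketch computes the alternate-allele frequency at each biallelic site from the `GT` field. It splits genotypes on `/` or `|`, so it does not assume diploid calls. Multiallelic sites and missing (`.`) alleles are skipped. The toy VCF records and sample names are hypothetical. Ploidy-aware genotype estimates from polyRAD may be stored differently and are not handled here.

```python
import io

# Toy VCF text; chromosome names, positions, and samples are hypothetical.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "chr01\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:10\t1/1:8\n"
    "chr01\t200\t.\tC\tT,G\t.\tPASS\t.\tGT\t0/1\t0/2\n"
    "chr02\t50\t.\tT\tC\t.\tPASS\t.\tGT\t./.\t0/0\n"
)


def alt_allele_freqs(vcf_lines):
    """Yield (chrom, pos, alt_freq) for each biallelic site in VCF text lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if "," in alt:
            continue  # this sketch skips multiallelic sites
        gt_idx = fields[8].split(":").index("GT")
        n_alt = n_called = 0
        for sample in fields[9:]:
            gt = sample.split(":")[gt_idx]
            for allele in gt.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call
                n_called += 1
                n_alt += allele != "0"
        if n_called:
            yield chrom, pos, n_alt / n_called


freqs = list(alt_allele_freqs(io.StringIO(SAMPLE_VCF)))
```

For files of this size, a dedicated VCF library would normally be faster. The sketch only shows which fields carry the genotype information.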
published: 2020-08-01
The Empoascini_morph_data.nex text file contains the original data used in the phylogenetic analyses of Xu et al. (Systematic Entomology, in review). The text file is marked up in the standard NEXUS format used by many phylogenetic analysis software packages, and programs that recognize NEXUS as a standard bioinformatics file format will parse it automatically. The first nine lines of the file indicate:
- the file type (NEXUS);
- the number of taxa analyzed (110);
- the total number of characters analyzed (99);
- the format of the data;
- the symbols used in the dataset to indicate different character states.

For species that have more than one state for a particular character, the states are enclosed in square brackets. Question marks represent missing data. The PDF file Appendix1.pdf, also available here, describes the morphological characters and character states that were scored in the dataset. The data analyses are described in the cited original paper.
keywords: Hemiptera; Cicadellidae; morphology; biogeography; evolution
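The state-coding conventions above can be made concrete. The following minimal sketch tokenizes one matrix row. It returns one token per character: a bracketed group such as `[01]` becomes a set of states, and `?` becomes `None`. It assumes each taxon's row sits on a single line with an unquoted taxon name, and the taxon name in the usage line is hypothetical. In the NEXUS standard, square brackets normally delimit comments. Check how a given program treats them in this file before relying on its parse.

```python
def parse_state_row(row):
    """Split one row of character states into tokens.

    Each token is a frozenset of states; '?' (missing) becomes None, and a
    bracketed group such as '[01]' becomes the polymorphic set {'0', '1'}.
    """
    tokens, i = [], 0
    while i < len(row):
        ch = row[i]
        if ch.isspace():
            i += 1
        elif ch == "[":
            j = row.index("]", i)  # raises ValueError if the bracket is unclosed
            tokens.append(frozenset(row[i + 1:j].replace(" ", "")))
            i = j + 1
        elif ch == "?":
            tokens.append(None)
            i += 1
        else:
            tokens.append(frozenset(ch))
            i += 1
    return tokens


def split_matrix_line(line):
    """Split 'TaxonName 01[12]?0...' into (taxon, tokens); names assumed unquoted."""
    name, row = line.strip().split(None, 1)
    return name, parse_state_row(row)


name, tokens = split_matrix_line("Taxon_example  01[12]?0")
```

For this file, each parsed row should contain 99 tokens, matching the character count declared in the header.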
published: 2021-02-28
This dataset contains the RegCM4 simulations used in the article "Implementation of dynamic ageing of carbonaceous aerosols in regional climate model RegCM". It was used to investigate the impact of a new ageing parameterisation scheme implemented in the regional climate model RegCM4. The dataset contains two sets of simulations, Expt_fix and Expt_dyn. They consist of the seasonal mean and daily mean values of the variables used to create the visualizations in this study. The Expt_fix and Expt_dyn datasets contain 34 and 38 NetCDF files, respectively. The CERES_vs_2expts_new.mat file compares the CERES surface shortwave downward flux with the corresponding outputs of the two experiments for clear-sky and all-sky conditions.
The following information about the dataset was generated on 2021-01-08 by Sudipta Ghosh.
GENERAL INFORMATION
1. Date of data collection (single date, range, approximate date): 2019-01-01 to 2019-12-31
2. Geographic location of data collection: Urbana-Champaign, Illinois, USA
3. Information about funding sources that supported the collection of the data: This work is supported by the MoEFCC under the NCAP-COALESCE project [Grant No. 14/10/2014-CC]. The first author acknowledges the DST-INSPIRE fellowship [IF150055] and the Fulbright-Kalam Climate Doctoral fellowship. N. R. acknowledges funding from NSF AGS-1254428 and DOE grant DE-SC0019192. The Department of Science and Technology, Funds for Improvement of Science and Technology infrastructure in universities and higher educational institutions (DST-FIST) grant (SR/FST/ESII-016/2014) is acknowledged for computing support.
DATA & FILE OVERVIEW
1. File list: The Expt_fix and Expt_dyn datasets contain the analysed seasonal means and daily means of the variables used to create the visualizations in this study, in 34 and 38 NetCDF files, respectively.
2. Relationship between files, if important: NA
3. Additional related data collected that was not included in the current data package: No
METHODOLOGICAL INFORMATION
1. Description of methods used for collection/generation of data: The RegCM4 model code is freely available online from http://gforge.ictp.it/gf/project/regcm/. The anthropogenic aerosol emissions used in the simulations are taken from the IIASA inventory; these data can be accessed online at http://clima-dods.ictp.it/regcm4/. TRMM observed precipitation data can be accessed from https://giovanni.gsfc.nasa.gov/giovanni/. CRU temperature data are available at https://crudata.uea.ac.uk/cru/data/hrg/. CERES satellite surface shortwave downward fluxes are available at https://ceres.larc.nasa.gov/data/. Input files for the RegCM4 model are archived at http://clima-dods.ictp.it/regcm4/.
2. Methods for processing the data: Seasonal mean and daily average values were extracted from 6-hourly model output.
3. Instrument- or software-specific information needed to interpret the data: CDO 1.7.1, GrADS 2.0.a9, MATLAB R2016b
4. Standards and calibration information, if appropriate: NA
5. Environmental/experimental conditions: NA
6. Describe any quality-assurance procedures performed on the data: NA
7. People involved with sample collection, processing, analysis and/or submission: Sudipta Ghosh, Nicole Riemer, Graziano Giuliani, Filippo Giorgi, Dilip Ganguly, Sagnik Dey
DATA-SPECIFIC INFORMATION FOR: Expt_fix_data.tar.gz
1. Number of variables: 29
2. Number of cases/rows: NA
3. Variable list: mass concentration (kg m-3) of BC, BC_HB, BC_HL, OC, OC_HB, OC_HL; columnar burden (mg m-2) of BC, BC_HL, BC_HB, OC; dry deposition flux (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to washout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to rainout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; AOD (unitless); precipitation (kg m-2 s-1); temperature (K); v-wind (m s-1); u-wind (m s-1); surface shortwave downward flux (W m-2); shortwave radiative forcing at the surface and top of atmosphere (W m-2)
DATA-SPECIFIC INFORMATION FOR: Expt_dyn_data.tar.gz
1. Number of variables: 30
2. Number of cases/rows: NA
3. Variable list: mass concentration (kg m-3) of BC, BC_HB, BC_HL, OC, OC_HB, OC_HL; columnar burden (mg m-2) of BC, BC_HL, BC_HB, OC; dry deposition flux (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to washout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; wet deposition flux due to rainout (mg m-2 day-1) of BC_HB, BC_HL, OC_HB, OC_HL; AOD (unitless); precipitation (kg m-2 s-1); temperature (K); v-wind (m s-1); u-wind (m s-1); surface shortwave downward flux (W m-2); shortwave radiative forcing at the surface and top of atmosphere (W m-2); ageingscale (s-1)
DATA-SPECIFIC INFORMATION FOR: CERES_vs_2expts_new.mat
1. Number of variables: 12
2. Number of cases/rows: NA
3. Variable list: surface shortwave downward flux for clear-sky conditions (W m-2) for CERES, Expt_fix, and Expt_dyn (for the winter JF and monsoon JJAS seasons); surface shortwave downward flux for all-sky conditions (W m-2) for CERES, Expt_fix, and Expt_dyn (for the winter JF and monsoon JJAS seasons).
NOTE: The following information applies to all three files. Missing data codes: NA. Specialized formats or other abbreviations used: NA.
keywords: Carbonaceous aerosols; ageing parameterisation scheme; regional climate model; NetCDF
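The processing step "seasonal mean and daily average values were extracted from 6-hourly model output" can be illustrated with a short example. The following minimal sketch works on a single time series in plain Python rather than with CDO, which the README lists as the software used. It averages 6-hourly values by calendar date and over the JF and JJAS months. The toy series is hypothetical, and grouping by the timestamp's calendar date is an assumption about the time convention.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

SEASONS = {"JF": {1, 2}, "JJAS": {6, 7, 8, 9}}  # seasons named in the README


def daily_means(times, values):
    """Average 6-hourly values into calendar-day means: {date: mean}."""
    by_day = defaultdict(list)
    for t, v in zip(times, values):
        by_day[t.date()].append(v)
    return {d: mean(vs) for d, vs in sorted(by_day.items())}


def seasonal_means(times, values, seasons=SEASONS):
    """Average all 6-hourly values falling in each season's months (None if empty)."""
    out = {}
    for name, months in seasons.items():
        selected = [v for t, v in zip(times, values) if t.month in months]
        out[name] = mean(selected) if selected else None
    return out


# Toy 6-hourly series covering two days in January (hypothetical values).
TIMES = [datetime(2019, 1, 1) + timedelta(hours=6 * i) for i in range(8)]
VALUES = [1, 2, 3, 4, 5, 6, 7, 8]
```

When every day has all four 6-hourly samples, the seasonal mean of the raw values equals the mean of the daily means.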
published: 2021-08-05
This geodatabase serves two purposes. First, it gives State of Illinois agencies a quick resource for preparing maps and figures that require shape or line files from federal agencies, the State of Illinois, or the City of Chicago. Second, it offers a starting point for social scientists interested in exploring how geographic information systems can bring new meaning to the interpretation of their data, whether through data visualization or geographically weighted regression. All included layer files are relevant to the State of Illinois. Sources for this geodatabase include the U.S. Census Bureau, U.S. Geological Survey, City of Chicago, Chicago Public Schools, Chicago Transit Authority, Regional Transportation Authority, and Bureau of Transportation Statistics.
keywords: State of Illinois; City of Chicago; Chicago Public Schools; GIS; Statistical tabulation areas; hydrography
published: 2021-03-08
A set of field studies across four years, at sites in Champaign County, IL, examined the effect of self-shading on photosynthetic performance in lower-canopy sorghum leaves. Photosynthetic parameters of upper- and lower-canopy leaves were compared across a genetically diverse range of accessions varying widely in canopy architecture, and thereby in the degree of self-shading. These parameters were carbon assimilation, electron transport, stomatal conductance, and the activity of three C4-specific photosynthetic enzymes. Accessions with erect leaves and high light transmission through the canopy are henceforth referred to as ‘erectophile’, and those with low leaf erectness as ‘planophile’. In the final year of the study, bundle sheath leakiness was also compared between erectophile and planophile accessions.
keywords: Sorghum; Photosynthetic Performance; Leaf Inclination
published: 2019-09-17
Trained models for multi-task, multi-dataset learning for text classification and sequence tagging in tweets. Classification tasks include sentiment prediction, abusive content, sarcasm, and veridicality. Sequence tagging tasks include POS tagging, NER, chunking, and supersense tagging. Models were trained using https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification_tagging.py. See https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details. If you use this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning; classification; sequence tagging