published: 2024-04-10
This dataset provides estimates of total Irrigation Water Use (IWU) by crop, county, water source, and year for the Continental United States. Total irrigation from Surface Water Withdrawals (SWW), total Groundwater Withdrawals (GWW), and nonrenewable Groundwater Depletion (GWD) are provided for 20 crops and crop groups from 2008 to 2020 at the county spatial resolution. In total, the dataset contains nearly 2.5 million data points (3,142 counties; 13 years; 3 water sources; and 20 crops). This dataset supports the paper by Ruess et al. (2024), "Total irrigation by crop in the Continental United States from 2008 to 2020", Scientific Data, doi: 10.1038/s41597-024-03244-w. When using this dataset, please cite it as: Ruess, P.J., Konar, M., Wanders, N., and Bierkens, M.F.P. (2024) Total irrigation by crop in the Continental United States from 2008 to 2020, Scientific Data, doi: 10.1038/s41597-024-03244-w
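As a quick check of the data-point count quoted above, the total is simply the product of the four dimensions. The Python snippet below only restates those counts; it does not read the dataset files, whose layout is not described here.

```python
# Dimensions of the dataset as stated in the description.
n_counties = 3142
n_years = len(range(2008, 2021))  # 2008 through 2020 inclusive
n_sources = 3                     # SWW, GWW, GWD
n_crops = 20                      # crops and crop groups

n_points = n_counties * n_years * n_sources * n_crops
print(f"{n_points:,}")  # 2,450,760
```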
keywords: water use; irrigation; surface water; groundwater; groundwater depletion; counties; crops; time series
published: 2024-03-19
This dataset contains all material required to produce the figures in the manuscript submitted to Geoscientific Model Development entitled “Explicit stochastic advection algorithms for the regional scale particle-resolved atmospheric aerosol model WRF-PartMC (v1.0)”. The dataset consists of Python Jupyter notebooks and the applicable WRF-PartMC output. It covers the manuscript's three numerical examples: 1D advection by a uniform constant wind, a 2D rotational flow, and a 3D time-evolving WRF-simulated flow.
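For readers unfamiliar with the idea, the sketch below illustrates stochastic advection in the simplest setting named above, 1D advection by a uniform constant wind. It is a generic Monte Carlo illustration under stated assumptions (integer particle counts per grid cell, periodic boundaries, positive wind, Courant number u*dt/dx between 0 and 1), not the WRF-PartMC algorithm from the manuscript; the function name `stochastic_advect_1d` is hypothetical. Each particle independently moves one cell downwind with probability equal to the Courant number, so the expected behavior matches first-order upwind advection.

```python
import numpy as np

def stochastic_advect_1d(counts, courant, n_steps, rng):
    """Advect integer particle counts on a periodic 1D grid with a uniform,
    positive wind (hypothetical illustration). Each step, every particle
    independently moves one cell downwind with probability `courant`."""
    if not 0.0 <= courant <= 1.0:
        raise ValueError("Courant number must lie in [0, 1]")
    counts = np.asarray(counts, dtype=np.int64).copy()
    for _ in range(n_steps):
        # Number of particles leaving each cell this step (one binomial draw per cell).
        leaving = rng.binomial(counts, courant)
        # Remove them from their cell and add them to the downwind neighbour;
        # np.roll wraps around, which gives periodic boundaries.
        counts = counts - leaving + np.roll(leaving, 1)
    return counts

rng = np.random.default_rng(0)
initial = np.zeros(50, dtype=np.int64)
initial[10:15] = 1000
final = stochastic_advect_1d(initial, courant=0.5, n_steps=20, rng=rng)
print(initial.sum(), final.sum())  # 5000 5000: particles are moved, never created or lost
```

Because particles are only moved, the total count is conserved exactly, while the mean displacement after n steps is n times the Courant number (10 cells in the usage example), with random spread around that mean.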
keywords: Atmospheric chemistry; Atmospheric Science; Particle-resolved modeling; Numerical modeling; Advection;
published: 2024-04-19
Readme file for the data repository ******************************************************************************* This repository contains the raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". The data are organized by the figure in which they first appear. For all electrical transfer measurements, we provide both the up-sweep and down-sweep data, with voltage in V and conductance in S. All Raman modes are in cm^-1. ******************************************************************************* How to use this dataset All data in this dataset are stored in NumPy's binary array format as .npy files. To read a .npy file, use the np.load() function from the NumPy module in Python. Example: suppose the file is named example_data.npy. To load it in a Jupyter notebook or a Python program, run: `import numpy as np; data = np.load("example_data.npy")` The array is then stored in the data object. *******************************************************************************
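When a figure's data are split across several files, it can be convenient to load every .npy file in a folder at once. The helper below is a small sketch; its name `load_npy_dir` is hypothetical, and it assumes the .npy files sit directly in one directory.

```python
from pathlib import Path
import numpy as np

def load_npy_dir(directory):
    """Return {file stem: array} for every .npy file in `directory`."""
    arrays = {}
    for path in sorted(Path(directory).glob("*.npy")):
        # allow_pickle defaults to False, the safe choice for plain numeric arrays.
        arrays[path.stem] = np.load(path)
    return arrays

data = load_npy_dir(".")
for name, arr in data.items():
    print(name, arr.shape, arr.dtype)
```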
published: 2023-10-26
This dataset contains MRI data and Imaris modeling analysis of CLARITY-cleared, immunostained tissue associated with a study that assessed the effects of lipid blends containing various levels of a hydrolyzed fat system on myelin development in healthy neonatal piglets. Data are from thirty-two piglets of mixed sexes across four diet treatment groups and include a sow-fed reference group. MRI data (presented in Figure 2 of the associated article) consist of volumetric data from Voxel-Based Morphometry analysis in brain grey matter and white matter, as well as mean fractional anisotropy and mean orientation dispersion index data from Tract-Based Spatial Statistics analysis. Imaris data (presented in Figure 3 of the associated article) consist of twenty-one selected output measures from 3D modeling analysis of PLP-stained prefrontal cortex tissue. All methods used to collect, generate, and process the data are described in the associated article: Louie AY, Rund LA, Komiyama-Kasai KA, Weisenberger KE, Stanke KL, Larsen RJ, Leyshon BJ, Kuchan MJ, Das T, Steelman AJ. A hydrolyzed lipid blend diet promotes myelination in neonatal piglets in a region and concentration-dependent manner. J Neurosci Res. 2023.
keywords: myelin; dietary lipid; white matter; CLARITY; Imaris; voxel-based morphometry; diffusion tensor imaging
published: 2020-08-22
We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster of 15 heterogeneous nodes. The traces are not sampled and cover selected request types in each benchmark: compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://gitlab.engr.illinois.edu/DEPEND/firm.git). The dataset was preprocessed from the raw data generated by FIRM's tracing system. The dataset is split according to the microservice component in which the performance anomaly is located, as the file names indicate. Each file is in CSV format with comma-separated fields. Each line consists of the tracing ID followed by the duration of each component in units of 10^(-3) ms (i.e., microseconds). Execution paths are specified in `execution_paths.txt` in each directory.
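A minimal parsing sketch, assuming each CSV row has no header and holds a trace ID followed by one duration per component (which component each column refers to is given by `execution_paths.txt`, which this sketch does not read). The function name `read_traces` and the sample rows are hypothetical; durations are divided by 1000 to convert from 10^(-3) ms to ms.

```python
import csv
import io

def read_traces(lines, has_header=False):
    """Parse rows of the form trace_id,d1,d2,... into {trace_id: [durations in ms]}.
    Input durations are assumed to be in units of 10^-3 ms (microseconds)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)
    traces = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        trace_id, *durations = row
        traces[trace_id] = [float(d) / 1000.0 for d in durations]
    return traces

# Hypothetical sample rows; for a real file use:
# with open(path, newline="") as f: traces = read_traces(f)
sample = io.StringIO("abc123,1500,250\ndef456,980,4000\n")
print(read_traces(sample))
```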
keywords: Microservices; Tracing; Performance
published: 2022-12-07
The Morrow Plots at the University of Illinois at Urbana-Champaign are the longest-running continuous experimental plots in the Americas. In continuous operation since 1876, the plots were established to explore the impact of crop rotation and soil treatment on corn crop yields. In 2018, the Morrow Plots Data Curation Working Group began to identify, collect and curate the various data records created over the history of the experiment. The resulting data table published here includes planting, treatment and yield data for the Morrow Plots since 1888. Please see the included codebook for a detailed explanation of the data sources and their content. This dataset will be updated as new yield data become available. *NOTE: While digitized and accessed through IDEALS, the physical copy of the field notebook: <a href="https://archon.library.illinois.edu/archives/index.php?p=collections/controlcard&id=11846">Morrow Plots Notebook, 1876-1913, 1967</a> is also held at the University of Illinois Archives.
keywords: Corn; Crop Science; Experimental Fields; Crop Yields; Agriculture; Illinois; Morrow Plots
published: 2023-04-06
This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.
keywords: sequence length heterogeneity;indelible;computational biology;multiple sequence alignment
published: 2023-07-01
This is the data used in the paper "Assessment of spatiotemporal flood risk due to compound precipitation extremes across the contiguous United States". Code from the GitHub repository https://github.com/adtonks/precip_extremes can be used with the data here to reproduce the paper's results. Version 1.0.0 of the code is also archived at https://doi.org/10.5281/zenodo.8104252. This dataset is derived from NOAA-CIRES-DOE 20th Century Reanalysis V3. The NOAA-CIRES-DOE Twentieth Century Reanalysis Project version 3 used resources of the National Energy Research Scientific Computing Center managed by Lawrence Berkeley National Laboratory which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 and used resources of NOAA's Remotely Deployed High Performance Computing Systems.
keywords: spatiotemporal; CONUS; United States; precipitation; extremes; flooding
published: 2023-07-11
The dissertation_demo.zip file contains the base code and demonstrations for the dissertation "A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning". Each chapter has a demo folder demonstrating provenance queries or tools. The Airbnb dataset used for demonstration and simulation is not included in this demo but can be accessed directly from the reference website. Updates to the demonstrations and examples are available online at: https://github.com/nikolausn/dissertation_demo
published: 2023-09-13
This upload contains one additional set of datasets (RNASim10k, ten replicates) used in Experiment 2 of the EMMA paper (published in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment". The zipped file has a top-level `10k` directory with one subdirectory per replicate (`R0`, `R1`, ...), and each replicate directory holds `unaln.fas`, `true.fas`, and `true.tre`. Alignment files: (1) `unaln.fas`: all unaligned sequences; (2) `true.fas`: the reference alignment of all sequences; (3) `true.tre`: the reference tree on all sequences. For other datasets that appeared only in EMMA, please refer to the related dataset (which is linked below): Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1
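The sketch below checks one replicate for internal consistency: every row of `true.fas` should have the same length, and deleting gap characters from each aligned row should recover the corresponding sequence in `unaln.fas`. It assumes gaps are written as `-` and that sequence names match between the two files; the helper names are hypothetical and this is not part of EMMA.

```python
from pathlib import Path

def read_fasta(path):
    """Return {name: sequence} for a FASTA file, joining multi-line sequences."""
    seqs, name, chunks = {}, None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs

def check_replicate(rep_dir, gap="-"):
    """Verify that true.fas is an alignment of the sequences in unaln.fas.
    Returns (number of sequences, alignment length)."""
    rep_dir = Path(rep_dir)
    unaln = read_fasta(rep_dir / "unaln.fas")
    aln = read_fasta(rep_dir / "true.fas")
    if set(unaln) != set(aln):
        raise ValueError(f"{rep_dir}: sequence names differ")
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError(f"{rep_dir}: aligned sequences have different lengths")
    for name, seq in aln.items():
        # Removing gap characters from an aligned row should give back the raw sequence.
        if seq.replace(gap, "").upper() != unaln[name].upper():
            raise ValueError(f"{rep_dir}: sequence {name!r} does not match")
    return len(aln), lengths.pop()

root = Path("10k")
if root.is_dir():
    for rep in sorted(root.glob("R*")):
        print(rep.name, *check_replicate(rep))
```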
keywords: SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity
published: 2024-03-01
This dataset contains model output from the Community Earth System Model, Version 1 (CESM1; Hurrell et al., 2013) and variables from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5; Hersbach et al., 2020). These data were used in the analysis for “The location of large-scale soil moisture anomalies affects moisture transport and precipitation over southeastern South America”, published in Geophysical Research Letters. Acknowledgments: This work was supported by NSF Award AGS-1852709. We acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the NSF. We thank Dr. Haiyan Teng for providing guidance on setting up the CESM experiments and offering valuable advice. References: Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc. 2020; 146: 1999–2049. https://doi.org/10.1002/qj.3803 Hurrell, J. W., and Coauthors, 2013: The Community Earth System Model: A Framework for Collaborative Research. Bull. Amer. Meteor. Soc., 94, 1339–1360, https://doi.org/10.1175/BAMS-D-12-00121.1
keywords: atmospheric sciences; climate modeling; land-atmosphere interactions; soil moisture; regional atmospheric circulation; southeastern South America
published: 2023-12-20
Important Note: the raw transient files need to be downloaded separately through this link: https://uofi.box.com/s/oagdxhea1wi8tvfij4robj0z0w8wq7j4. Once downloaded, place the file within the .d folder in the unzipped 20210930_ShortTransient_S3_5 folder to perform the reconstruction step. This dataset provides the minimal data needed to run MEISTER, the computational pipeline introduced in the manuscript titled "Integrative Multiscale Biochemical Mapping of the Brain via Deep-Learning-Enhanced High-Throughput Mass Spectrometry". The key steps of our computational pipeline include (1) tissue mass spectrometry imaging (MSI) reconstruction; (2) multimodal image registration and 3D reconstruction; (3) regional analysis; and (4) single-cell and tissue data integration. Detailed protocols to reproduce our results in the manuscript are provided, together with an example dataset for learning the protocols. Our computational processing code is implemented mostly in Python, with MATLAB used for image registration.
keywords: deep learning;mass spectrometry;single cells
published: 2024-02-26
Traces created using the DeathStarBench (https://github.com/delimitrou/DeathStarBench) benchmark suite of microservice applications, with failures injected into containers. The injected failures are disk, CPU, and memory failures.
keywords: Murphy;Performance Diagnosis;Microservice;Failures
published: 2024-02-16
This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of OpenCitations to an edge list using integer ids that we assigned. The integer ids can be mapped to omids, pmids, and dois using the open_citation_jan2024_pubs.csv.gz and open_citations_jan2024_pub_ids.csv.gz files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found at https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations. (ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a pmid and a doi and that have a title and an abstract. A value of 1 in a column indicates that the information exists in the metadata, and 0 indicates that it does not. Code for generating this data: https://github.com/illinois-or-research-analytics/pubmed_etl
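Given the size of the edge list (nearly 2 billion edges), streaming the gzipped file row by row avoids loading it into memory. The sketch below counts incoming citations for a chosen set of integer ids. It assumes, hypothetically, that open_citations_jan2024_iid_el.csv.gz has a header row followed by `citing,cited` integer pairs; check the actual column order before relying on it. Pure-Python streaming over the full file will be slow, so this is meant for illustration or for subsets.

```python
import csv
import gzip
from collections import Counter

def count_citations(edge_file, targets, has_header=True):
    """Stream a gzipped CSV edge list and count incoming edges for `targets`.

    Assumes (hypothetically) that each row is `citing_id,cited_id` with integer ids.
    Only the target ids are counted, so memory stays small even for huge files.
    """
    targets = set(targets)
    counts = Counter()
    with gzip.open(edge_file, "rt", newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)
        for row in reader:
            cited = int(row[1])
            if cited in targets:
                counts[cited] += 1
    return counts
```

Ids that are never cited simply have a count of 0 in the returned `Counter`.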
keywords: PubMed
published: 2024-02-29
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as citations of (Willoughby et al., 2014): Willoughby, Patrick H., Jansma, Matthew J., & Hoye, Thomas R. (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042 We added the DOIs of the citing publications to a Zotero collection, which we exported as a .csv file and an .rtf file. Willoughby2014_286citing_publications.csv is a Zotero data export of the citing publications. Willoughby2014_286citing_publications.rtf is a bibliography of the citing publications, using a variation of American Psychological Association style (7th edition) with full names instead of initials.
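To work with the citing publications programmatically, the Zotero CSV export can be read with Python's standard csv module. This sketch assumes the export has a column named `DOI`, as Zotero's CSV exports normally do; the helper name `read_dois` is hypothetical. DOIs are lower-cased because DOI names are case-insensitive.

```python
import csv
from pathlib import Path

def read_dois(path):
    """Return the lower-cased, non-empty DOIs listed in a Zotero CSV export."""
    # utf-8-sig also handles a leading byte-order mark, if the export has one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.DictReader(f))
    return [(row.get("DOI") or "").strip().lower()
            for row in rows if (row.get("DOI") or "").strip()]

csv_path = Path("Willoughby2014_286citing_publications.csv")
if csv_path.exists():
    dois = read_dois(csv_path)
    print(len(dois), "DOIs;", len(set(dois)), "unique")
```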
keywords: scientific publications; arguments; citation contexts; defeasible reasoning; Zotero; Web of Science; Scopus;
published: 2021-01-04
This dataset contains the emulated global multi-model urban climate projections under RCP 8.5 and RCP 4.5 used in the article "Global multi-model projections of local urban climates" (https://www.nature.com/articles/s41558-020-00958-8). Details about this dataset and the local urban climate emulator are described in the article. This dataset documents the monthly mean projections of urban temperatures and urban relative humidity of 26 CMIP5 Earth system models (ESMs) from 2006 to 2100 across the globe. This dataset may be useful for multiple communities regarding urban climate change, impacts, vulnerability, risks, and adaptation applications.
keywords: Urban climate; multi-model climate projections; CMIP; urban warming; heat stress
published: 2020-11-18
This is the dataset that accompanies the paper titled "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network", submitted for peer review in August 2020. Please see the GitHub repository for the most up-to-date data after the revision process: https://github.com/dopplerchase/Chase_et_al_2021_NN Authors: Randy J. Chase, Stephen W. Nesbitt and Greg M. McFarquhar Corresponding author: Randy J. Chase (randyjc2@illinois.edu) Here we provide the data used in the manuscript. Please email me if you have specific questions about units, etc. 1) DDA/GMM database of scattering properties: base_df_DDA.csv This is the combined dataset from the following papers: Leinonen & Moisseev, 2015; Leinonen & Szyrmer, 2015; Lu et al., 2016; Kuo et al., 2016; Eriksson et al., 2018. The column names are D: maximum dimension in meters, M: particle mass in grams kg, sigma_ku: backscatter cross-section at Ku band in m^2, sigma_ka: backscatter cross-section at Ka band in m^2, sigma_w: backscatter cross-section at W band in m^2. The first column is just an index column. 2) Synthetic data used to train and test the neural network: Unrimed_simulation_wholespecturm_train_V2.nc, Unrimed_simulation_wholespecturm_test_V2.nc These were produced by randomly combining the PSDs and DDA/GMM particles to build the training and test datasets. 3) Notebook for training the network using the synthetic database and Google Colab (TensorFlow): Train_Neural_Network_Chase2020.ipynb This is the notebook used to train the neural network. 4) Trained TensorFlow neural network: NN_6by8.h5 This is the HDF5 TensorFlow model that resulted from the training. You will need this to run the retrieval. 5) Scalers needed to apply the neural network: scaler_X_V2.pkl, scaler_y_V2.pkl These are the sklearn scalers used in training the neural network. You will need these to scale your data if you wish to run the retrieval. 6) <b>New in this version</b> - Example notebook showing how to run the trained neural network on Ku- and Ka-band observations. 
We showed this with the 3rd case in the paper: Run_Chase2021_NN.ipynb 7) <b>New in this version</b> - APR data used to show how to run the neural network retrieval: Chase_2021_NN_APR03Dec2015.nc The observational data used in the analysis are not provided here because of the size of the radar data. Please see the GHRC website (<a href="https://ghrc.nsstc.nasa.gov/home/">https://ghrc.nsstc.nasa.gov/home/</a>) if you wish to download the radar and in-situ data, or contact me. We can coordinate transferring the exact data files used. The GPM-DPR data are available here: <a href="http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05">http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05</a>
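The items above list the pieces needed to run the retrieval: the trained model (NN_6by8.h5) and the two scalers. The sketch below shows the usual scikit-learn/Keras pattern for combining them, assuming scaler_X_V2.pkl scales the network inputs and scaler_y_V2.pkl scales its outputs, so predictions are mapped back with `inverse_transform`. The function names are hypothetical, the input features and their order must match those used in training (see Run_Chase2021_NN.ipynb), and unpickling the scalers requires a compatible scikit-learn version.

```python
import pickle
import numpy as np

def load_retrieval(model_path="NN_6by8.h5",
                   scaler_x_path="scaler_X_V2.pkl",
                   scaler_y_path="scaler_y_V2.pkl"):
    """Load the trained network and the two pickled sklearn scalers."""
    import tensorflow as tf  # imported lazily: only needed to load the .h5 model
    # compile=False: the training configuration is not needed for prediction.
    model = tf.keras.models.load_model(model_path, compile=False)
    with open(scaler_x_path, "rb") as f:
        scaler_x = pickle.load(f)
    with open(scaler_y_path, "rb") as f:
        scaler_y = pickle.load(f)
    return model, scaler_x, scaler_y

def run_retrieval(X, model, scaler_x, scaler_y):
    """Scale inputs, run the network, and map outputs back to physical units.
    X is a 2D array (n_samples, n_features) in the feature order used in training."""
    X_scaled = scaler_x.transform(np.asarray(X, dtype=float))
    y_scaled = model.predict(X_scaled)
    return scaler_y.inverse_transform(y_scaled)
```

Typical use: `model, scaler_x, scaler_y = load_retrieval()` followed by `run_retrieval(X, model, scaler_x, scaler_y)`.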
published: 2021-03-17
This dataset was developed as part of a study that assessed data reuse. First, through bibliometric analysis, corresponding authors of highly cited papers published in 2015 at the University of Illinois at Urbana-Champaign in nine STEM disciplines were identified and then surveyed to determine whether data were generated for their article and what they knew about reuse of those data by other researchers. Second, the corresponding authors who cited those 2015 articles were identified and surveyed to ascertain whether they reused data from the original article and how they obtained the data. The project goal was to better understand data reuse in practice and to explore whether research data from an initial publication were reused in subsequent publications.
keywords: data reuse; data sharing; data management; data services; Scopus API
published: 2021-05-10
This dataset contains the emulated global multi-model urban daily temperature projections under the RCP 8.5 scenario. The dataset is derived from the study "Large model structural uncertainty in global projections of urban heat waves" (XXXX). Details about this dataset and the local urban climate emulator are described in the article. This dataset documents the global urban daily temperatures of 17 CMIP5 Earth system models for 2006-2015 and 2061-2070. This dataset may be useful for multiple communities regarding urban climate change, heat waves, impacts, vulnerability, risks, and adaptation applications.
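One common way to compare the two periods is to fix a hot-day threshold from the earlier decade and count heat-wave days in each. The sketch below uses a hypothetical definition (at least three consecutive days above the 98th percentile of 2006-2015 daily temperatures) that is not necessarily the one used in the article, and synthetic temperatures stand in for the dataset, whose file layout is not described here. The function name `heatwave_days` is also hypothetical.

```python
import numpy as np

def heatwave_days(temps, threshold, min_length=3):
    """Return a boolean mask of days belonging to a heat wave, defined here
    (hypothetically) as a run of at least `min_length` consecutive days
    with temperature above `threshold`."""
    hot = np.asarray(temps) > threshold
    mask = np.zeros_like(hot)
    run_start = None
    for i, is_hot in enumerate(np.append(hot, False)):  # sentinel closes a final run
        if is_hot and run_start is None:
            run_start = i
        elif not is_hot and run_start is not None:
            if i - run_start >= min_length:
                mask[run_start:i] = True
            run_start = None
    return mask

# Synthetic stand-ins for one city's daily temperatures in each decade.
rng = np.random.default_rng(1)
baseline = rng.normal(25, 3, 3650)  # 2006-2015 (synthetic)
future = rng.normal(28, 3, 3650)    # 2061-2070 (synthetic)
thr = np.percentile(baseline, 98)
print(heatwave_days(baseline, thr).sum(), heatwave_days(future, thr).sum())
```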
keywords: Urban heat waves; CMIP; urban warming; heat stress; urban climate change
published: 2021-07-15
The dataset contains the high-throughput matrix-assisted laser desorption/ionization mass spectrometry XmL files for the atrial gland and red hemiduct of Aplysia californica.
keywords: Dense-core vesicle; High-throughput; Mass Spectrometry; MALDI; Organelle; Image-Guided; Atrial gland; red hemiduct; Lucent Vesicle
published: 2021-08-24
This repository includes datasets for the paper "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling" accepted for RECOMB2021 and submitted to Journal of Computational Biology. Each zipped file contains a README.
keywords: deep neural networks; heterotachy; GHOST; quartet estimation; phylogeny estimation
published: 2021-12-09
These data were collected in 2018 and 2019 at the University of Illinois Energy Farm (N 40.063607, W 88.206926). During each growing season, bulk and rhizosphere soil were collected from replicate Sorghum bicolor nitrogen use efficiency trial plots at three separate time points (approximately July 1, August 1, and September 1). We measured soil moisture, pH, soil nitrate and ammonium, potential nitrification, potential denitrification, and extracted and sequenced the V4 region of the 16S rRNA gene for microbial community analysis. All microbial sequence data are archived in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (accession number SRP326979, project number PRJNA741261).
keywords: soil nitrogen; nitrification; nitrogen cycle; sorghum; bioenergy; Center for Advanced Bioenergy and Bioproducts Innovation
published: 2023-01-12
This dataset was developed as part of a study that examined the correlational relationships between local journal authorship, local and external citation counts, full-text downloads, link-resolver clicks, and four global journal impact factor indices within an all-disciplines journal collection of 12,200 titles and six subject subsets at the University of Illinois at Urbana-Champaign (UIUC) Library. While earlier investigations of the relationships between usage (downloads) and citation metrics have been inconclusive, this study shows strong correlations in the all-disciplines set and most subject subsets. The normalized Eigenfactor was the only global impact factor index that correlated highly with local journal metrics. Some of the identified disciplinary variances among the six subject subsets may be explained by the journal publication aspirations of UIUC researchers. The correlations between authorship and local citations in the six specific subject subsets closely match national department or program rankings. All the raw data used in this analysis are provided as relational database tables with multiple columns, which can be opened using MS Access. Descriptions of the variables can be viewed in "Design View" (right-click the selected table and choose "Design View"). The two PDF files provide an overview of the tables included in each MDB file. In addition, the processing scripts and Pearson correlation code are available at <a href="https://doi.org/10.13012/B2IDB-0931140_V1">https://doi.org/10.13012/B2IDB-0931140_V1</a>.
keywords: Usage and local citation relationships; publication; citation and usage metrics; publication; citation and usage correlation analysis; Pearson correlation analysis
published: 2023-01-12
These processing and Pearson correlational scripts were developed to support the study that examined the correlational relationships between local journal authorship, local and external citation counts, full-text downloads, link-resolver clicks, and four global journal impact factor indices within an all-disciplines journal collection of 12,200 titles and six subject subsets at the University of Illinois at Urbana-Champaign (UIUC) Library. This study shows strong correlations in the all-disciplines set and most subject subsets. Special processing scripts and web site dashboards were created, including Pearson correlational analysis scripts for reading values from relational databases and displaying tabular results. The raw data used in this analysis, in the form of relational database tables with multiple columns, is available at <a href="https://doi.org/10.13012/B2IDB-6810203_V1">https://doi.org/10.13012/B2IDB-6810203_V1</a>.
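For reference, the Pearson coefficient these scripts compute is the covariance of two variables divided by the product of their standard deviations. The pure-Python sketch below shows that calculation on hypothetical per-journal values; it is not the archived scripts, which read their inputs from relational databases, and the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

downloads = [120, 340, 560, 80, 910]  # hypothetical per-journal values
citations = [15, 40, 66, 9, 101]
print(round(pearson_r(downloads, citations), 3))
```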
keywords: Pearson Correlation Analysis Scripts; Journal Publication; Citation and Usage Data; University of Illinois at Urbana-Champaign Scholarly Communication