Displaying 26 - 50 of 68 in total
Illinois Data Bank Dataset Search Results


published: 2022-08-08
 
This upload contains all datasets used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".

The zip file has the following structure (presented as an example):

salma_paper_datasets/
|_README.md
|_10aa/
|_crw/
|_homfam/
    |_aat/
    |   |_...
    |_...
|_het/
    |_5000M2-het/
    |   |_...
    |_5000M3-het/
    ...
|_rec_res/

Generally, the structure can be viewed as: [category]/[dataset]/[replicate]/[alignment files]

# Categories:
1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.
2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned versions from Shen et al. 2022 (MAGUS+eHMM).
3. homfam: There are the 10 largest Homfam datasets, each with one replicate.
4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.
5. rec_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.

# Alignment files
There are at most 6 `.fasta` files in each sub-directory:
1. `all.unaln.fasta`: All unaligned sequences.
2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have one will be included.
3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).
4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have one will be included.
5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).
6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have one will be included.

>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.
>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.
>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.

# Additional file(s)
1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et al. 2010) to search for protein sequences.
keywords: SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity
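The backbone/query split described above (lengths within 25% of the median length versus everything else) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the helper names `read_fasta` and `split_by_length` are hypothetical, and treating the 25% bounds as inclusive is an assumption that the description does not settle.

```python
from statistics import median


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def split_by_length(seqs, tolerance=0.25):
    """Split sequences into (backbone, queries) by length relative to the median.

    Assumption: a sequence is 'full-length' when its length lies in
    [(1 - tolerance) * median, (1 + tolerance) * median], bounds inclusive.
    """
    med = median(len(s) for s in seqs.values())
    lo, hi = (1 - tolerance) * med, (1 + tolerance) * med
    backbone = {n: s for n, s in seqs.items() if lo <= len(s) <= hi}
    queries = {n: s for n, s in seqs.items() if n not in backbone}
    return backbone, queries


# Toy input; real files such as all.unaln.fasta would be read from disk.
example = """>a
ACGTACGTAC
>b
ACGTACGTA
>c
ACGTACGTACG
>d
ACG
"""
backbone, queries = split_by_length(read_fasta(example))
print(sorted(backbone), sorted(queries))
```

In the toy input the median length is 9.5, so the three sequences of length 9 to 11 are treated as backbone and the length-3 fragment as a query.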
published: 2015-12-16
 
This dataset contains the data for PASTA and UPP. PASTA data was used in the following articles: Mirarab, Siavash, Nam Nguyen, Sheng Guo, Li-San Wang, Junhyong Kim, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.” Journal of Computational Biology 22, no. 5 (2015): 377–86. doi:10.1089/cmb.2014.0156. Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment.” Edited by Roded Sharan. Research in Computational Molecular Biology, 2014, 177–91. UPP data was used in: Nguyen, Nam-phuong D., Siavash Mirarab, Keerthana Kumar, and Tandy Warnow. “Ultra-Large Alignments Using Phylogeny-Aware Profiles.” Genome Biology 16, no. 1 (December 16, 2015): 124. doi:10.1186/s13059-015-0688-z.
published: 2014-10-29
 
This dataset provides the data for Nguyen, Nam-phuong, et al. "TIPP: taxonomic identification and phylogenetic profiling." Bioinformatics 30.24 (2014): 3548-3555.
published: 2019-02-22
 
This dataset includes measurements taken during experiments on patterns of alluvial cover over bedrock. The dataset includes an hour's worth of timelapse images taken every 10 s for each of eight experimental conditions. It also includes the instantaneous water surface elevations measured with eTapes at a frequency of 10 Hz for each experiment. The 'Read me Data.txt' file explains the contents of the dataset in more detail.
keywords: bedrock; erosion; alluvial; meandering; alluvial cover; sinuosity; flume; experiments; abrasion
published: 2024-02-16
 
Sample data from one typical phantom test and one deidentified shunt patient test (shown in Fig. 8 of the MRM paper), with the corresponding analysis code for the Shunt-FENSI technique. The data accompany the MRM paper “Measuring CSF Shunt Flow with MRI Using Flow Enhancement of Signal Intensity (FENSI)”.
keywords: Shunt-FENSI; MRM; Hydrocephalus; VP Shunt; Flow Quantification; Pediatric Neurosurgery; Pulse Sequence; Signal Simulation
published: 2016-12-20
 
Scripts and example data for AIDData (aiddata.org) processing in support of the forthcoming Nakamura dissertation. This dataset includes two sets of scripts and example data files from an aiddata.org data dump. Fuller documentation of what these scripts do is in the readme file. Additional background information and a description of usage will be in the forthcoming Nakamura dissertation (link will be added when available). Data originally supplied by Nakamura. Python code and this readme file created by Wickes. Data included within this deposit are examples to demonstrate execution. Roughly, there are two Python tools in here: keyword_search.py, designed to assist in finding records matching specific keywords, and matching_tool.ipynb, designed to assist in detecting which records are and are not contained within a keyword results file and an aiddata project data file.
keywords: aiddata; natural resources
published: 2022-11-11
 
This dataset is for characterizing chemical short-range-ordering in CrCoNi medium entropy alloys. It has three sub-folders: 1. code, 2. sample WQ, 3. sample HT. The software needed to run the files is Gatan Microscopy Suite® (GMS). Please follow the instructions on this page to install the DM3 GMS: https://www.gatan.com/installation-instructions#Step1

1. The code folder contains three DM scripts to be installed in Gatan DigitalMicrograph software to analyze scanning electron nanobeam diffraction (SEND) datasets:
Cepstrum.s: needs [EF-SEND_sampleWQ_cropped_aligned.dm3] in Sample WQ and the average image from [EF-SEND_sampleWQ_cropped_aligned.dm3]. Same for the Sample HT folder.
log_BraggRemoval.s: same as above.
Patterson.s: needs the refined diffuse patterns in the Sample HT folder.

2. and 3. The Sample WQ and Sample HT folders both contain the SEND data (.ser) and the binned SEND data (.dm3), as well as our calculated strain maps as the strain measurement reference. The Sample WQ folder additionally has atomic resolution STEM images; the Sample HT folder additionally has three refined diffuse patterns as references for diffraction data processing.

* Only the .ser file is needed to perform the strain measurement using imToolBox as listed in the manuscript. The .emi file contains the metadata of the microscope and can be opened together with the .ser file using FEI TIA software.
keywords: Medium entropy alloy; CrCoNi; chemical short-range-ordering; CSRO; TEM
published: 2023-09-13
 
This upload contains one additional set of datasets (RNASim10k, ten replicates) used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".

The zipped file has the following structure:

10k
|__R0
    |__unaln.fas
    |__true.fas
    |__true.tre
|__R1
...

# Alignment files:
1. `unaln.fas`: all unaligned sequences.
2. `true.fas`: the reference alignment of all sequences.
3. `true.tre`: the reference tree on all sequences.

For other datasets that uniquely appeared in EMMA, please refer to the related dataset (which is linked below): Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1
keywords: SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity
published: 2017-09-16
 
This dataset contains the data for 16S and 23S rRNA alignments including their reference trees. The original alignments are from the Gutell Lab CRW, currently located at https://crw-site.chemistry.gatech.edu/DAT/3C/Alignment/.
published: 2009-06-19
 
This dataset contains the data for SATe-I. SATe-I data was used in the following article: K. Liu, S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow, "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.
published: 2024-02-26
 
Traces created using DeathStarBench (https://github.com/delimitrou/DeathStarBench) benchmark of microservice applications with injected failures on containers. Failures consist of disk/CPU/memory failures.
keywords: Murphy;Performance Diagnosis;Microservice;Failures
published: 2019-10-27
 
This dataset accompanies the paper "STREETS: A Novel Camera Network Dataset for Traffic Flow" at Neural Information Processing Systems (NeurIPS) 2019. Included are:
* Over four million still images from publicly accessible cameras in Lake County, IL. The images were collected across 2.5 months in 2018 and 2019.
* Directed graphs describing the camera network structure in two communities in Lake County.
* Documented non-recurring traffic incidents in Lake County coinciding with the 2018 data.
* Traffic counts for each day of images in the dataset. These counts track the volume of traffic in each community.
* Other annotations and files useful for computer vision systems.
Refer to the accompanying "readme.txt" or "readme.pdf" for further details.
keywords: camera network; suburban vehicular traffic; roadways; computer vision
published: 2018-10-03
 
This dataset is the result of three crawls of the web performed in May 2018. The data contains raw crawl data and instrumentation captured by OpenWPM-Mobile, as well as analysis that identifies which scripts access mobile sensors and which ones perform some form of browser fingerprinting, together with a clustering of scripts based on their intended use. The dataset is described in the included README.md file; more details about the methodology can be found in our ACM CCS'18 paper: Anupam Das, Gunes Acar, Nikita Borisov, Amogh Pradeep. The Web's Sixth Sense: A Study of Scripts Accessing Smartphone Sensors. In Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS), Toronto, Canada, October 15–19, 2018. (Forthcoming)
keywords: mobile sensors; web crawls; browser fingerprinting; javascript
published: 2022-08-05
 
Simulated sequences provide a way to evaluate multiple sequence alignment (MSA) methods where the ground truth is exactly known. However, the realism of such simulated conditions often comes under question compared to empirical datasets. In particular, simulated data often does not display heterogeneity in the sequence lengths, a common feature in biological datasets. In order to imitate sequence length heterogeneity, we here present a set of data that are evolved under a mixture model of indel lengths, where indels have an occasional chance of being promoted to long indels (emulating large insertion/deletion events, e.g., domain-level gain/loss). This dataset is otherwise (e.g., in GTR parameters) analogous to the 1000M condition as presented in the SATe paper (doi: 10.1126/science.1171243) but with 5000 sequences and simulated with INDELible (http://abacus.gene.ucl.ac.uk/software/indelible/). For more information, see README.txt. For the INDELible control files, see https://github.com/ThisBioLife/5000M-234-het.
keywords: simulated data; sequence length heterogeneity; multiple sequence alignment
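The idea of a mixture model of indel lengths, in which an indel is occasionally promoted to a long indel, can be illustrated with a small sampler. This is a conceptual sketch only: it is not INDELible's model or the control files used for this dataset, and the function name, the geometric length distributions, and all parameter values (`p_long`, `short_mean`, `long_mean`) are hypothetical choices made for illustration.

```python
import random


def sample_indel_length(rng, p_long=0.01, short_mean=2.0, long_mean=50.0):
    """Draw one indel length from a two-component mixture.

    With probability p_long the indel is 'promoted' to the long component;
    otherwise it comes from the short component. Each component is a
    geometric distribution on {1, 2, ...} with the given mean
    (all values here are hypothetical, not taken from the dataset).
    """
    mean = long_mean if rng.random() < p_long else short_mean
    stop_prob = 1.0 / mean  # geometric success probability for this mean
    length = 1
    while rng.random() >= stop_prob:
        length += 1
    return length


rng = random.Random(0)
lengths = [sample_indel_length(rng) for _ in range(10_000)]
print("mean length:", sum(lengths) / len(lengths))
print("longest indel:", max(lengths))
```

Most draws are short, while the rare promoted draws produce the occasional long insertion or deletion that creates sequence length heterogeneity.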
published: 2019-09-01
 
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014. The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
keywords: global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
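For orientation, the dimensions implied by a global 0.5-degree grid and the 1961-2014 time span can be worked out directly. The sketch below is an assumption-laden illustration: the row/column origin (south-west corner) and the function name `cell_index` are hypothetical, and the actual coordinate ordering should be read from each netCDF file and its "crs" variable.

```python
RES = 0.5                      # grid resolution in degrees (from the description)
N_LON = int(360 / RES)         # number of columns on a global grid
N_LAT = int(180 / RES)         # number of rows on a global grid
YEARS = range(1961, 2015)      # 1961-2014 inclusive, one file per year


def cell_index(lat, lon):
    """Map a latitude/longitude to a (row, col) grid cell.

    Assumption: row 0 starts at -90 degrees latitude and column 0 at
    -180 degrees longitude; the real files may order axes differently.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    row = min(int((lat + 90) / RES), N_LAT - 1)
    col = min(int((lon + 180) / RES), N_LON - 1)
    return row, col


print(N_LAT, N_LON, len(YEARS))
print(cell_index(40.1, -88.2))
```

Under these assumptions each annual grid has 360 x 720 cells, and the series covers 54 years.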
published: 2022-08-31
 
These datasets are for the four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments for cathode nanoparticles at different cutoff voltages and in different electrolytes. The raw 4D-STEM experiment datasets were collected by TEM image & analysis software (FEI) and were saved as SER files. The raw 4D-STEM datasets of SER files can be opened and viewed in MATLAB using our analysis software package of imToolBox available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected by DigitalMicrograph software and were saved as DM4 files. The raw EELS datasets can be opened and viewed in DigitalMicrograph software or using our analysis codes available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Formation and impact of nanoscopic oriented phase domains in electrochemical crystalline electrodes" (2022).

The 4D-STEM experiment data include four example datasets for cathode nanoparticles collected at different cutoff voltages and in different electrolytes as described below. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine cathode particle: "Pristine particle 4D-STEM.ser"
2. Cathode particle at the cutoff voltage of 0.09V during discharge at C/10 in the aqueous electrolyte: "Intermediate cutoff0_09V discharge (aqueous) 4D-STEM.ser"
3. Fully discharged cathode particle at C/10 in the aqueous electrolyte: "Fully discharged particle 4D-STEM.ser"
4. Fully discharged cathode particle at C/10 in the dry organic electrolyte: "Fully discharge particle (dry organic electrolyte).ser"

The EELS experiment data include three example datasets for cathode nanoparticles collected at different cutoff voltages during discharge in the aqueous electrolyte (in "EELS datasets.zip") as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine cathode particle: "Pristine particle EELS.zip"
2. Cathode particle at the cutoff voltage of 0.09V during discharge at C/10 in the aqueous electrolyte: "intermediate discharge (aqueous) EELS.zip"
3. Fully discharged cathode particle at C/10 in the aqueous electrolyte: "fully discharge (aqueous) EELS.zip"

The details of the software package and codes that can be used to analyze the 4D-STEM datasets and EELS datasets are available at: https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update the relationship of these datasets with our paper.
keywords: 4D-STEM; microstructure; phase transformation; strain; cathode; nanoparticle; energy storage
published: 2021-03-06
 
This dataset consists of raw ADC readings from a 3-transmitter, 4-receiver 77 GHz FMCW radar, together with synchronized RGB camera and depth (active stereo) measurements. The data is grouped into 4 distinct radar configurations:
- "indoor" configuration with range <14m
- "30m" with range <38m
- "50m" with range <63m
- "high_res" with doppler resolution of 0.043m/s

# Related code
https://github.com/moodoki/radical_sdk

# Hardware Project Page
https://publish.illinois.edu/radicaldata
keywords: radar; FMCW; sensor-fusion; autonomous driving; dataset; RGB-D; object detection; odometry
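As a rough back-of-the-envelope check, the quoted Doppler (velocity) resolution can be related to the radar's observation time per frame through the standard FMCW relation Δv = λ / (2 · T_frame). The sketch below assumes a carrier of exactly 77 GHz and uses hypothetical function names; the actual chirp parameters of each configuration are defined in the dataset and the linked SDK, not here.

```python
C = 299_792_458.0  # speed of light in m/s


def wavelength(carrier_hz):
    """Free-space wavelength for a given carrier frequency."""
    return C / carrier_hz


def frame_time_for_velocity_resolution(dv_m_per_s, carrier_hz=77e9):
    """Observation time needed for a velocity resolution dv.

    Uses the textbook FMCW relation dv = wavelength / (2 * T_frame).
    Assumption: carrier fixed at 77 GHz; real sweeps cover a band.
    """
    return wavelength(carrier_hz) / (2.0 * dv_m_per_s)


t_frame = frame_time_for_velocity_resolution(0.043)
print(f"wavelength: {wavelength(77e9) * 1e3:.2f} mm")
print(f"implied observation time: {t_frame * 1e3:.1f} ms")
```

Under these assumptions, a 0.043 m/s resolution corresponds to an active observation time on the order of 45 ms per frame.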
published: 2019-10-19
 
Large, distributed microphone arrays could offer dramatic advantages for audio source separation, spatial audio capture, and human and machine listening applications. This dataset contains acoustic measurements and speech recordings from 10 loudspeakers and 160 microphones spread throughout a large, reverberant conference room. The distributed microphone system contains two types of array: four wearable microphone arrays of 16 sensors each placed near the ears and across the upper body, and twelve tabletop arrays of 8 microphones each in enclosures designed to resemble voice-assistant speakers. The dataset includes recordings of chirps that can be used to measure impulse responses and of speech clips derived from the CSTR VCTK corpus. The speech clips are recorded both individually and as a mixture to support source separation experiments. The uncompressed files are about 13.4 GB.
keywords: microphone arrays; audio source separation; augmented listening; wireless sensor networks
published: 2021-03-23
 
DNN weights used in the evaluation of the ApproxTuner system. Link to paper: https://dl.acm.org/doi/10.1145/3437801.3446108
published: 2021-01-27
 
*This is the third version of the dataset*. New changes in this 3rd version:
1. replaces simulations where the initial condition consists of a sinusoidal channel with topographic perturbations with simulations where the initial condition consists of a sinusoidal channel without topographic perturbations. These simulations better illustrate the transformation of a nondendritic network into a dendritic one.
2. contains two additional simulations showing how total domain size affects the landscape's dynamism.
3. changes dataset title to reflect the publication's title

This dataset contains data from 18 simulations using a landscape evolution model. A landscape evolution model simulates how uplift and rock incision shape the surface of the Earth (or other planets). To date, most landscape evolution models exhibit "extreme memory" (paper: https://doi.org/10.1029/2019GL083305 and dataset: https://doi.org/10.13012/B2IDB-4484338_V1). Extreme memory in landscape evolution models causes initial conditions to be unrealistically preserved. This dataset contains simulations from a new landscape evolution model that incorporates a sub-model that allows bedrock channels to erode laterally. With this addition, the landscapes no longer exhibit extreme memory. Initial conditions are erased over time, and the landscapes tend towards a dynamic steady state instead of a static one. The model with lateral erosion is named LEM-wLE (Landscape Evolution Model with Lateral Erosion) and the model without lateral erosion is named LEM-woLE (Landscape Evolution Model without Lateral Erosion).

There are 16 folders in total. Here are the descriptions:

>LEM-woLE_simulations: This folder contains simulations using LEM-woLE. Inside the folder are 5 subfolders containing 100 elevation rasters, 100 drainage area rasters, and 100 plots showing the slope-area relationship. Elevation depicts the height of the landscape, and drainage area represents a contributing area that is upslope.
Each folder corresponds to a different initial condition. Driver files and code for these simulations can be found at https://github.com/jeffskwang/LEM-wLE.

>MOVIE_S#_data: There are 13 data folders that contain raster data for 13 simulations using LEM-wLE. Inside each folder are 1000 elevation rasters, 1000 drainage area rasters, and 1000 plots showing the slope-area relationship. Driver files and code for these simulations can be found at https://github.com/jeffskwang/LEM-wLE.

>movies_mp4_format: For each data folder there are 3 movies generated that show elevation (a), drainage area (b), and erosion rates (c). These files are formatted in the mp4 format and are best viewed using VLC media player (https://www.videolan.org/vlc/index.html).

>movies_wmv_format: This folder contains the same movies as the "movies_mp4_format" folder, but they are in a wmv format. These movies can be viewed using Windows media player or other Windows platform movie software.

Here are the captions for the 13 movies:

Movie S1. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.

Movie S2. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Inclined with small, randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.

Movie S3. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Inclined with large, randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.

Movie S4. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: V-shaped valley with randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.

Movie S5. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel with randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.

Movie S6. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 0.25.

Movie S7. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 0.5.

Movie S8. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Sinusoidal channel without randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 0.75.

Movie S9. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 1 open boundary at the bottom of the domain, and 3 closed boundaries elsewhere. KL/KV = 1.

Movie S10. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 2 open boundaries at the top and bottom of the domain, and 2 closed boundaries on the left and right sides. KL/KV = 1.

Movie S11. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 4 open boundaries. KL/KV = 1.

Movie S12. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 4 open boundaries. KL/KV = 1. Compared to Movie S11, the length of the domain is 50% shorter, decreasing the total domain area.

Movie S13. 200 MYR (1,000 RUs eroded) simulation showing elevation (a), logarithm of drainage area (b), and change in elevation (c). Initial Condition: Flat with randomized perturbations. Boundary Condition: 4 open boundaries. KL/KV = 1. Compared to Movie S11, the length of the domain is 50% longer, increasing the total domain area.

The associated publication for this dataset has not yet been published, and we will update this description with a link when it is.
keywords: landscape evolution; drainage networks; lateral migration; geomorphology
published: 2019-10-23
 
Raw MD simulation trajectory, input and configuration files, SEM current data, and experimental raw data accompanying the publication, "Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore". README.md contains a description of all associated files.
keywords: molecular dynamics; protein sequencing; aerolysin; nanopore sequencing
published: 2019-10-05
 
This dataset contains collected and aggregated network information from NCSA’s Blue Waters system, which comprises 27,648 nodes connected via a Cray Gemini 3D torus (dimension 24x24x24) interconnect, from Jan/01/2017 to May/31/2017. Network performance counters for links are exposed via Cray's gpcdr kernel module (https://github.com/ovis-hpc/ovis/wiki/gpcdr-kernel-module). Lightweight Distributed Metric Service (LDMS, https://github.com/ovis-hpc/ovis) is used to sample the performance counters at 60-second intervals. Please read the "README.md" file. Acknowledgement: This dataset is collected as a part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
keywords: HPC; Interconnect; Network; Congestion; Blue Waters; Dataset
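To make the 24x24x24 torus geometry concrete, the sketch below counts router positions and computes the minimal hop distance between two torus coordinates, taking the wraparound links into account. The function name and coordinate convention are hypothetical, and the figure of two nodes per Gemini router is an assumption used only to show that it is consistent with the stated 27,648 nodes.

```python
DIMS = (24, 24, 24)          # torus dimensions from the description
NODES_PER_ROUTER = 2         # assumption: two nodes share one Gemini router


def torus_hops(a, b, dims=DIMS):
    """Minimal hop count between coordinates a and b on a wraparound torus.

    In each dimension the shorter of the direct path and the
    wraparound path is taken.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


routers = DIMS[0] * DIMS[1] * DIMS[2]
print("router positions:", routers)
print("nodes under the two-per-router assumption:", routers * NODES_PER_ROUTER)
print("hops (0,0,0) -> (23,0,0):", torus_hops((0, 0, 0), (23, 0, 0)))
print("hops (0,0,0) -> (12,12,12):", torus_hops((0, 0, 0), (12, 12, 12)))
```

The wraparound links make opposite edges adjacent, so the largest possible distance in this geometry is 12 hops per dimension.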