Datasets

published: 2020-08-22

Qiu, Haoran; Banerjee, Subho S.; Jha, Saurabh; Kalbarczyk, Zbigniew T.; Iyer, Ravishankar K. (2020): Pre-processed Tracing Data for Popular Microservice Benchmarks. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6738796_V1

We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster consisting of 15 heterogeneous nodes. The dataset is not sampled and covers selected request types in each benchmark, i.e., compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://gitlab.engr.illinois.edu/DEPEND/firm.git). The dataset was preprocessed from the raw data generated by FIRM's tracing system. It is split into files according to the microservice component in which the performance anomaly is located (as the file names suggest). Each file is in CSV format with comma-separated fields. Each line consists of the tracing ID and the duration of each component, in units of 10^(-3) ms (i.e., microseconds). Execution paths are specified in `execution_paths.txt` in each directory.

keywords: Microservices; Tracing; Performance
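As a rough sketch of how one of these CSV files might be read: the snippet below assumes that the first field of each row is the tracing ID, that every remaining field is one component's duration, and that there is no header row. None of that is confirmed here, so check the actual files and `execution_paths.txt` first. The function name and sample rows are made up for illustration.

```python
import csv
import io


def read_trace_rows(handle):
    """Yield (trace_id, durations_ms) pairs from a comma-separated trace file.

    Assumption: field 0 is the tracing ID and the remaining fields are
    per-component durations in units of 10^(-3) ms (microseconds).
    """
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        trace_id = row[0]
        # Convert microseconds to milliseconds.
        durations_ms = [float(value) / 1000.0 for value in row[1:]]
        yield trace_id, durations_ms


# Made-up sample rows, not taken from the dataset.
sample = io.StringIO("a1b2,1500,250,4000\nc3d4,900,100,2500\n")
rows = list(read_trace_rows(sample))
```

For a real file, the same function can be given `open(path, newline="")` in place of the in-memory sample.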

published: 2023-04-06

Warnow, Tandy; Park, Minhyuk (2023): INDELible simulated datasets with sequence length heterogeneity. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0900513_V1

This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.

keywords: sequence length heterogeneity;indelible;computational biology;multiple sequence alignment

published: 2023-09-13

Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2023): Additional datasets (RNASim10k) for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4194451_V1

This upload contains one additional set of datasets (RNASim10k, ten replicates) used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".

The zipped file has the following structure:

    10k
    |__R0
        |__unaln.fas
        |__true.fas
        |__true.tre
    |__R1
    ...

Alignment files:

1. `unaln.fas`: all unaligned sequences.
2. `true.fas`: the reference alignment of all sequences.
3. `true.tre`: the reference tree on all sequences.

For other datasets that uniquely appeared in EMMA, please refer to the related dataset (which is linked below): Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1

keywords: SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity

published: 2024-02-26

Harsh, Vipul; Zhou, Wenxuan; Ashok, Sachin; Mysore, Radhika Niranjan; Godfrey, Brighten; Banerjee, Sujata (2024): Murphy traces. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6641912_V1

Traces created using the DeathStarBench (https://github.com/delimitrou/DeathStarBench) benchmark of microservice applications, with failures injected on containers. The injected failures are disk, CPU, and memory failures.

keywords: Murphy;Performance Diagnosis;Microservice;Failures

published: 2024-02-16

Mohasel Arjomandi, Hossein; Korobskiy, Dmitriy; Chacko, George (2024): Parsed Open Citations and PubMed Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5216575_V1

This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of Open Citations to an edge list using integer ids assigned by us. The integer ids can be mapped to omids, pmids, and dois using the open_citation_jan2024_pubs.csv and open_citations_jan2024_pub_ids.csv files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found at https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations. (ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a pmid and a doi and have a title and an abstract. A value of 1 in a column indicates that the information exists in the metadata; a value of 0 indicates that it does not. Code for generating this data: https://github.com/illinois-or-research-analytics/pubmed_etl

keywords: PubMed
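The mapping from integer ids back to external identifiers described above can be sketched as follows. This is only an illustration: it assumes a two-column id table (integer id, external id) and a two-column edge list (citing id, cited id) with no header rows, and none of that is confirmed by the description. The function names and toy data are hypothetical.

```python
import csv
import io


def load_id_map(handle):
    """Build {integer_id: external_id} from rows of (integer_id, external_id)."""
    return {int(iid): ext for iid, ext in csv.reader(handle)}


def translate_edges(edge_handle, id_map):
    """Yield edges with integer ids replaced by external identifiers.

    Assumption: each row is (citing_id, cited_id).
    """
    for citing, cited in csv.reader(edge_handle):
        yield id_map[int(citing)], id_map[int(cited)]


# Made-up toy data standing in for the id table and the edge list.
ids = io.StringIO("1,10.1000/a\n2,10.1000/b\n3,10.1000/c\n")
edges = io.StringIO("1,2\n3,2\n")

id_map = load_id_map(ids)
translated = list(translate_edges(edges, id_map))
```

The real files are gzip-compressed, so they would be opened with `gzip.open(path, "rt", newline="")` from the standard library. At nearly two billion edges, streaming row by row as above avoids holding the whole edge list in memory, although the id table itself is still large.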

published: 2014-10-29

Nguyen, Nam-phuong; Mirarab, Siavash; Bo, Liu; Pop, Mihai; Warnow, Tandy (2014): Data for Taxonomic Identification and Phylogenetic Profiling. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8783447_V1

This dataset provides the data for Nguyen, Nam-phuong, et al. "TIPP: taxonomic identification and phylogenetic profiling." Bioinformatics 30.24 (2014): 3548-3555.

published: 2012-07-01

Mirarab, Siavash; Nguyen, Nam-Phuong; Warnow, Tandy (2012): Data for SEPP: SATé-Enabled Phylogenetic Placement. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9316702_V1

This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.

published: 2023-01-16

Xie, Yuxuan Richard; Chari, Varsha K.; Castro, Daniel C.; Grant, Romans; Rubakhin, Stanislav S.; Sweedler, Jonathan V. (2023): Data-Driven and Machine Learning Based Framework for Image-Guided Single-Cell Mass Spectrometry. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7302959_V1

Data sets to reproduce the results provided by the tutorial in the paper "Data-Driven and Machine Learning Based Framework for Image-Guided Single-Cell Mass Spectrometry".

published: 2023-11-14

Gotsis, Dimitrios; Kelkar, Varun; Deshpande, Rucha; Brooks, Frank; KC, Prabhat; Myers, Kyle; Zeng, Rongping; Anastasio, Mark (2023): Data for the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2773204_V3

This repository contains the training dataset associated with the 2023 Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics (DGM-Image Challenge), hosted by the American Association of Physicists in Medicine. This dataset contains more than 100,000 8-bit images of size 512x512. These images emulate coronal slices from anthropomorphic breast phantoms adapted from the VICTRE toolchain [1], with assigned X-ray attenuation coefficients relevant for breast computed tomography. Also included are the labels indicating the breast type. The challenge has now concluded. More information about the challenge can be found here: https://www.aapm.org/GrandChallenge/DGM-Image/.

* New in V3: we added a CSV file containing the image breast type labels and example images (PNG).

keywords: Deep generative models; breast computed tomography

published: 2016-05-19

Donovan, Brian; Work, Dan (2016): New York City Taxi Trip Data (2010-2013). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/J8PN93H8

This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.

keywords: taxi;transportation;New York City;GPS

published: 2015-12-16

Nguyen, Nam-phuong; Mirarab, Siavash; Kumar, Keerthana; Warnow, Tandy (2015): Data for Ultra-Large Alignments Using Phylogeny-Aware Profiles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3174395_V1

This dataset contains the data for PASTA and UPP. PASTA data was used in the following articles: Mirarab, Siavash, Nam Nguyen, Sheng Guo, Li-San Wang, Junhyong Kim, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.” Journal of Computational Biology 22, no. 5 (2015): 377–86. doi:10.1089/cmb.2014.0156. Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment.” Edited by Roded Sharan. Research in Computational Molecular Biology, 2014, 177–91. UPP data was used in: Nguyen, Nam-phuong D., Siavash Mirarab, Keerthana Kumar, and Tandy Warnow. “Ultra-Large Alignments Using Phylogeny-Aware Profiles.” Genome Biology 16, no. 1 (December 16, 2015): 124. doi:10.1186/s13059-015-0688-z.

published: 2017-09-16

Mirarab, Siavash; Warnow, Tandy (2017): Data for 16S and 23S rRNA alignments. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1614388_V1

This dataset contains the data for 16S and 23S rRNA alignments including their reference trees. The original alignments are from the Gutell Lab CRW, currently located at https://crw-site.chemistry.gatech.edu/DAT/3C/Alignment/.

published: 2009-06-19

Liu, Kevin; Raghavan, Sindhu; Nelesen, Serita; Linder, C. Randall; Warnow, Tandy (2009): Data for Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5139418_V1

This dataset contains the data for SATe-I. SATe-I data was used in the following article: K. Liu, S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow, "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.

published: 2024-02-16

Zhang, Mingxiao; Sutton, Bradley (2024): Sample Data for “Measuring CSF Shunt Flow with MRI Using Flow Enhancement of Signal Intensity (FENSI)”. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7252521_V1

Sample data from one typical phantom test and one deidentified shunt patient test (shown in Fig. 8 of the paper), with the corresponding analysis code for the Shunt-FENSI technique. Accompanies the MRM paper “Measuring CSF Shunt Flow with MRI Using Flow Enhancement of Signal Intensity (FENSI)”.

keywords: Shunt-FENSI; MRM; Hydrocephalus; VP Shunt; Flow Quantification; Pediatric Neurosurgery; Pulse Sequence; Signal Simulation

published: 2011-09-20

Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San (2011): Data for SuperFine, DACTAL, and BeeTLe. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2952208_V1

This page provides the data for the SuperFine, DACTAL, and BeeTLe publications:

- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.

published: 2019-02-22

Fernández, Roberto; Parker, Gary; Stark, Colin (2019): Experiments on patterns of alluvial cover and bedrock erosion in a meandering channel. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2-3044828_V1

This dataset includes measurements taken during the experiments on patterns of alluvial cover over bedrock. The dataset includes an hour's worth of timelapse images taken every 10 s for eight different experimental conditions. It also includes the instantaneous water surface elevations measured with eTapes at a frequency of 10 Hz for each experiment. The 'Read me Data.txt' file explains the contents of the dataset in more detail.

keywords: bedrock; erosion; alluvial; meandering; alluvial cover; sinuosity; flume; experiments; abrasion

published: 2018-04-06

Collins, Kodi; Warnow, Tandy (2018): PASTA For Proteins Data (BALiBASE). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4074787_V1

keywords: protein; multiple sequence alignment; balibase

published: 2018-06-06

Balasubramanian, Srinidhi; Nelson, Andrew; Koloutsou-Vakakis, Sotiria; Lin, Jie; Rood, Mark; Myles, LaToya; Bernacchi, Carl (2018): Dataset for Evaluation of DeNitrification DeComposition Model for Estimating Ammonia Fluxes from Chemical Fertilizer Application. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3773381_V1

DNDC scripts and outputs that were generated as a part of the research publication 'Evaluation of DeNitrification DeComposition Model for Estimating Ammonia Fluxes from Chemical Fertilizer Application'.

keywords: DNDC; REA; ammonia emissions; fertilizers; uncertainty analysis

published: 2018-04-24

Sun, Tianye; Liu, Liang; Flanner, Mark; Kirchstetter, Thomas; Jiao, Chaoyi; Preble, Chelsea; Chang, Wayne; Bond, Tami (2018): Constraining a Historical Black Carbon Emission Inventory of U.S. for 1960 to 2000 data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9686195_V1

keywords: Black carbon; Emission Inventory; Observations; Climate change; Diesel engine; Coal burning

published: 2018-12-20

Sun, Tianye; Liu, Liang; Flanner, Mark; Kirchstetter, Thomas; Jiao, Chaoyi; Preble, Chelsea; Chang, Wayne; Bond, Tami (2018): Constraining a Historical Black Carbon Emission Inventory of U.S. for 1960 to 2000 data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9686195_V2

This dataset contains data used to generate figures and tables in the corresponding paper.

keywords: Black carbon; Emission Inventory; Observations; Climate change; Diesel engine; Coal burning

published: 2018-11-20

Corey, Ryan M.; Tsuda, Naoki; Singer, Andrew C. (2018): Wearable Microphone Impulse Responses. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1932389_V1

A dataset of acoustic impulse responses for microphones worn on the body. Microphones were placed at 80 positions on the body of a human subject and a plastic mannequin. The impulse responses can be used to study the acoustic effects of the body and can be convolved with sound sources to simulate wearable audio devices and microphone arrays. The dataset also includes measurements with different articles of clothing covering some of the microphones and with microphones placed on different hats and accessories. The measurements were performed from 24 angles of arrival in an acoustically treated laboratory. Related Paper: Ryan M. Corey, Naoki Tsuda, and Andrew C. Singer. "Acoustic Impulse Responses for Wearable Audio Devices," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, May 2019. All impulse responses are sampled at 48 kHz and truncated to 500 ms. The impulse response data is provided in WAVE audio and MATLAB data file formats. The microphone locations are provided in tab-separated-value files for each experiment and are also depicted graphically in the documentation. The file wearable_mic_dataset_full.zip contains both WAVE- and MATLAB-format impulse responses. The file wearable_mic_dataset_matlab.zip contains only MATLAB-format impulse responses. The file wearable_mic_dataset_wave.zip contains only WAVE-format impulse responses.

keywords: Acoustic impulse responses; microphone arrays; wearables; hearing aids; audio source separation
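The description above notes that the impulse responses can be convolved with sound sources to simulate wearable devices. A minimal sketch of that operation follows, using a tiny made-up impulse response instead of one from the dataset. The direct-form `convolve` helper is written out only to keep the example dependency-free; an FFT-based routine from a numerical library would normally be used for real 24,000-sample responses.

```python
SAMPLE_RATE_HZ = 48_000   # sampling rate stated in the description
IR_LENGTH_S = 0.5         # responses are truncated to 500 ms

# Number of samples in one truncated impulse response.
ir_samples = int(SAMPLE_RATE_HZ * IR_LENGTH_S)


def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Made-up source signal and toy impulse response (direct path plus one echo).
source = [1.0, 0.5, 0.0, -0.5]
toy_ir = [1.0, 0.0, 0.25]

# What a microphone with this response would record for the source.
simulated = convolve(source, toy_ir)
```

To simulate a microphone array, the same source would be convolved with each microphone's impulse response for a given angle of arrival, producing one output channel per microphone.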

published: 2019-09-01

Jackson, Nicole; Konar, Megan; Debaere, Peter; Estes, Lyndon (2019): Data for: Probabilistic global maps of crop-specific areas from 1961 to 2014. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7439710_V1

Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014. The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).

keywords: global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
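To give a sense of the 0.5-degree grid mentioned above, the sketch below maps a latitude/longitude pair to a grid cell. It assumes cell edges aligned to whole half-degrees, with row 0 at the southern edge and column 0 at 180°W. The actual orientation and coordinate variables should be read from the netCDF files themselves, and the function name is hypothetical.

```python
import math

CELL_DEG = 0.5
N_LAT = int(180 / CELL_DEG)  # rows on a global 0.5-degree grid
N_LON = int(360 / CELL_DEG)  # columns on a global 0.5-degree grid


def cell_index(lat, lon):
    """Return (row, col) of the 0.5-degree cell containing the point.

    Assumption: row 0 starts at 90°S and column 0 starts at 180°W.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Clamp so that lat == 90 or lon == 180 falls in the last cell.
    row = min(math.floor((lat + 90.0) / CELL_DEG), N_LAT - 1)
    col = min(math.floor((lon + 180.0) / CELL_DEG), N_LON - 1)
    return row, col


total_cells = N_LAT * N_LON
example = cell_index(40.11, -88.24)  # a point in central Illinois
```

Under these assumptions the grid has 360 rows and 720 columns, so each yearly file would carry values for up to 259,200 cells per crop variable.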

published: 2019-10-19

Corey, Ryan M.; Skarha, Matthew D.; Singer, Andrew C. (2019): Massive Distributed Microphone Array Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6216881_V1

Large, distributed microphone arrays could offer dramatic advantages for audio source separation, spatial audio capture, and human and machine listening applications. This dataset contains acoustic measurements and speech recordings from 10 loudspeakers and 160 microphones spread throughout a large, reverberant conference room. The distributed microphone system contains two types of array: four wearable microphone arrays of 16 sensors each placed near the ears and across the upper body, and twelve tabletop arrays of 8 microphones each in enclosures designed to resemble voice-assistant speakers. The dataset includes recordings of chirps that can be used to measure impulse responses and of speech clips derived from the CSTR VCTK corpus. The speech clips are recorded both individually and as a mixture to support source separation experiments. The uncompressed files are about 13.4 GB.

keywords: microphone arrays; audio source separation; augmented listening; wireless sensor networks

published: 2019-10-23

Ouldali, Hadjer; Sarthak, Kumar; Ensslen, Tobias; Piguet, Fabien; Manivet, Philippe; Pelta, Juan; Behrends, Jan C.; Aksimentiev, Aleksei; Oukhaled, Abdelghani (2019): Experiment and simulation raw data for Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4905767_V1

Raw MD simulation trajectory, input and configuration files, SEM current data, and experimental raw data accompanying the publication, "Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore". README.md contains a description of all associated files.

keywords: molecular dynamics; protein sequencing; aerolysin; nanopore sequencing

published: 2019-10-05

Jha, Saurabh; Patke, Archit; Showerman, Mike; Enos, Jeremy; Bauer, Greg; Kalbarczyk, Zbigniew; Iyer, Ravishankar; Kramer, William (2019): Monet - Blue Waters Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2921318_V1

This dataset contains collected and aggregated network information from NCSA’s Blue Waters system, which comprises 27,648 nodes connected via a Cray Gemini 3D torus (dimension 24x24x24) interconnect, from Jan/01/2017 to May/31/2017. Network performance counters for links are exposed via Cray's gpcdr (https://github.com/ovis-hpc/ovis/wiki/gpcdr-kernel-module) kernel module. Lightweight Distributed Metric Service ([LDMS](https://github.com/ovis-hpc/ovis)) is used to sample the performance counters at 60-second intervals. Please read the "README.md" file.

Acknowledgement: This dataset is collected as a part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.

keywords: HPC; Interconnect; Network; Congestion; Blue Waters; Dataset
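For readers unfamiliar with torus interconnects, the sketch below shows how neighbours and minimal hop counts work on a 24x24x24 3D torus with wraparound links. It is a generic illustration of the topology, not code from the dataset or from LDMS. The coordinate convention and function names are assumptions, and the hop count ignores how the real network actually routes traffic.

```python
DIMS = (24, 24, 24)  # torus dimensions stated in the description


def neighbors(coord, dims=DIMS):
    """Return the six torus positions one link away from coord."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size  # wrap around the ring
            result.append(tuple(nxt))
    return result


def torus_hops(a, b, dims=DIMS):
    """Minimal number of links between two positions, using wraparound."""
    total = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y) % size
        total += min(d, size - d)  # go whichever way round is shorter
    return total


positions = DIMS[0] * DIMS[1] * DIMS[2]
nodes_per_position = 27_648 // positions
```

The torus has 13,824 positions, so the stated 27,648 nodes work out to two per position. That pairing is inferred from the numbers here and is not stated in the description.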
