published: 2016-05-19
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. The script file also contains additional documentation and contact information that may help if you have trouble accessing the contents of the zip files.
keywords: taxi;transportation;New York City;GPS
published: 2020-08-22
We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster consisting of 15 heterogeneous nodes. The dataset is not sampled and covers selected types of requests in each benchmark, i.e., compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://gitlab.engr.illinois.edu/DEPEND/firm.git). The dataset was preprocessed from the raw data generated in FIRM's tracing system. The dataset is split into files according to the microservice component in which the performance anomaly is located (as the file names indicate). Each file is in CSV format, with fields separated by commas. Each line consists of the tracing ID followed by the duration of each component in units of 10^(-3) ms (i.e., microseconds). Execution paths are specified in `execution_paths.txt` in each directory.
keywords: Microservices; Tracing; Performance
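As a rough illustration of the tracing CSV layout described above, the following sketch converts per-component durations from units of 10^(-3) ms into milliseconds. The exact column order, the absence of a header row, and the sample values are assumptions made for illustration; `read_trace_durations_ms` is a hypothetical helper, not part of the dataset.

```python
import csv
import io


def read_trace_durations_ms(lines):
    """Yield (trace_id, [durations in ms]) from comma-separated rows.

    Assumes each row is: trace_id, duration_1, duration_2, ...
    with durations given in units of 10^-3 ms (microseconds).
    """
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        trace_id, raw_durations = row[0], row[1:]
        yield trace_id, [float(value) / 1000.0 for value in raw_durations]


# Invented sample rows standing in for one of the dataset's CSV files.
sample = io.StringIO("abc123,1500,250,40\ndef456,900,100,10\n")
rows = list(read_trace_durations_ms(sample))
for trace_id, durations_ms in rows:
    print(trace_id, durations_ms)
```

To read a real file, pass an open file handle instead of the in-memory `sample`; check the first line of the file first in case it carries a header.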
published: 2015-12-16
This dataset contains the data for PASTA and UPP. PASTA data was used in the following articles: Mirarab, Siavash, Nam Nguyen, Sheng Guo, Li-San Wang, Junhyong Kim, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.” Journal of Computational Biology 22, no. 5 (2015): 377–86. doi:10.1089/cmb.2014.0156. Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment.” Edited by Roded Sharan. Research in Computational Molecular Biology, 2014, 177–91. UPP data was used in: Nguyen, Nam-phuong D., Siavash Mirarab, Keerthana Kumar, and Tandy Warnow. “Ultra-Large Alignments Using Phylogeny-Aware Profiles.” Genome Biology 16, no. 1 (December 16, 2015): 124. doi:10.1186/s13059-015-0688-z.
published: 2017-09-16
This dataset contains the data for 16S and 23S rRNA alignments including their reference trees. The original alignments are from the Gutell Lab CRW, currently located at https://crw-site.chemistry.gatech.edu/DAT/3C/Alignment/.
published: 2009-06-19
This dataset contains the data for SATe-I. SATe-I data was used in the following article: K. Liu, S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow, "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.
published: 2024-02-26
Traces created using the DeathStarBench (https://github.com/delimitrou/DeathStarBench) benchmark of microservice applications, with failures injected into containers. The injected failures are disk, CPU, and memory failures.
keywords: Murphy;Performance Diagnosis;Microservice;Failures
published: 2024-02-16
This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of Open Citations to an edge list using integer ids assigned by us. The integer ids can be mapped to omids, pmids, and dois using the open_citation_jan2024_pubs.csv and open_citations_jan2024_pub_ids.csv files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found at https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations. (ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a pmid, a doi, and have a title and an abstract. A value of 1 in a column indicates that the information exists in the metadata, and 0 indicates otherwise. Code for generating this data: https://github.com/illinois-or-research-analytics/pubmed_etl
keywords: PubMed
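An integer-id edge list like the one in the Open Citations entry above can be streamed straight from a gzipped CSV without unpacking it to disk. The sketch below counts incoming citations per node; the two-column `citing,cited` layout, the header row, and the function name `count_citations` are assumptions for illustration and should be checked against the actual files.

```python
import csv
import gzip
import io
from collections import Counter


def count_citations(gz_stream, has_header=True):
    """Count incoming edges per integer node id in a gzipped two-column CSV.

    Assumes each row is: citing_id, cited_id (both integers).
    """
    counts = Counter()
    with gzip.open(gz_stream, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the assumed header row
        for citing, cited in reader:
            counts[int(cited)] += 1
    return counts


# Tiny in-memory stand-in for a *.csv.gz edge list.
demo = io.BytesIO(gzip.compress(b"citing,cited\n1,3\n2,3\n3,4\n"))
in_degree = count_citations(demo)
print(in_degree.most_common(1))
```

With roughly two billion edges, an in-memory `Counter` over all nodes may be too large for a laptop; the same streaming loop can instead write partial counts to disk or restrict itself to a set of node ids of interest.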
published: 2024-02-16
Sample data from one typical phantom test and one deidentified shunt patient test (shown in Fig. 8 of the MRM paper), with the corresponding analysis code for the Shunt-FENSI technique. The data accompany the MRM paper “Measuring CSF Shunt Flow with MRI Using Flow Enhancement of Signal Intensity (FENSI)”.
keywords: Shunt-FENSI; MRM; Hydrocephalus; VP Shunt; Flow Quantification; Pediatric Neurosurgery; Pulse Sequence; Signal Simulation
published: 2011-09-20
This page provides the data for the SuperFine, DACTAL, and BeeTLe publications:
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.
published: 2014-10-29
This dataset provides the data for Nguyen, Nam-phuong, et al. "TIPP: taxonomic identification and phylogenetic profiling." Bioinformatics 30.24 (2014): 3548-3555.
published: 2012-07-01
This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.
published: 2019-02-22
This dataset includes measurements taken during the experiments on patterns of alluvial cover over bedrock. The dataset includes an hour's worth of time-lapse images taken every 10 s for eight different experimental conditions. It also includes the instantaneous water surface elevations measured with eTapes at a frequency of 10 Hz for each experiment. The 'Read me Data.txt' file explains the contents of the dataset in more detail.
keywords: bedrock; erosion; alluvial; meandering; alluvial cover; sinuosity; flume; experiments; abrasion
published: 2018-06-06
DNDC scripts and outputs that were generated as a part of the research publication 'Evaluation of DeNitrification DeComposition Model for Estimating Ammonia Fluxes from Chemical Fertilizer Application'.
keywords: DNDC; REA; ammonia emissions; fertilizers; uncertainty analysis
published: 2018-12-20
This dataset contains data used to generate figures and tables in the corresponding paper.
keywords: Black carbon; Emission Inventory; Observations; Climate change; Diesel engine; Coal burning
published: 2018-11-20
A dataset of acoustic impulse responses for microphones worn on the body. Microphones were placed at 80 positions on the body of a human subject and a plastic mannequin. The impulse responses can be used to study the acoustic effects of the body and can be convolved with sound sources to simulate wearable audio devices and microphone arrays. The dataset also includes measurements with different articles of clothing covering some of the microphones and with microphones placed on different hats and accessories. The measurements were performed from 24 angles of arrival in an acoustically treated laboratory. Related Paper: Ryan M. Corey, Naoki Tsuda, and Andrew C. Singer. "Acoustic Impulse Responses for Wearable Audio Devices," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, May 2019. All impulse responses are sampled at 48 kHz and truncated to 500 ms. The impulse response data is provided in WAVE audio and MATLAB data file formats. The microphone locations are provided in tab-separated-value files for each experiment and are also depicted graphically in the documentation. The file wearable_mic_dataset_full.zip contains both WAVE- and MATLAB-format impulse responses. The file wearable_mic_dataset_matlab.zip contains only MATLAB-format impulse responses. The file wearable_mic_dataset_wave.zip contains only WAVE-format impulse responses.
keywords: Acoustic impulse responses; microphone arrays; wearables; hearing aids; audio source separation
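The wearable-microphone entry above notes that the impulse responses can be convolved with sound sources to simulate a device. A minimal sketch of that operation follows, using only the standard library so the arithmetic is visible. The synthetic impulse response (a direct path plus one echo) and the three-sample source are invented for illustration; only the 48 kHz sample rate and 500 ms length come from the description.

```python
SAMPLE_RATE_HZ = 48_000   # stated in the dataset description
IR_LENGTH_S = 0.5         # impulse responses are truncated to 500 ms


def convolve(source, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        if s == 0.0:
            continue  # zero samples contribute nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# 500 ms at 48 kHz gives 24,000 samples per impulse response.
ir_samples = int(SAMPLE_RATE_HZ * IR_LENGTH_S)

# Hypothetical impulse response: direct path plus an echo 10 ms (480 samples) later.
impulse_response = [0.0] * ir_samples
impulse_response[0] = 1.0
impulse_response[480] = 0.5

source = [1.0, -0.5, 0.25]  # invented dry signal
received = convolve(source, impulse_response)
print(len(received), received[0], received[480])
```

For real recordings, an FFT-based routine such as `scipy.signal.fftconvolve` is a far faster choice than this direct loop; the pure-Python version is only meant to show what the operation computes.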
published: 2019-09-01
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014. The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
keywords: global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
published: 2019-10-19
Large, distributed microphone arrays could offer dramatic advantages for audio source separation, spatial audio capture, and human and machine listening applications. This dataset contains acoustic measurements and speech recordings from 10 loudspeakers and 160 microphones spread throughout a large, reverberant conference room. The distributed microphone system contains two types of array: four wearable microphone arrays of 16 sensors each placed near the ears and across the upper body, and twelve tabletop arrays of 8 microphones each in enclosures designed to resemble voice-assistant speakers. The dataset includes recordings of chirps that can be used to measure impulse responses and of speech clips derived from the CSTR VCTK corpus. The speech clips are recorded both individually and as a mixture to support source separation experiments. The uncompressed files are about 13.4 GB.
keywords: microphone arrays; audio source separation; augmented listening; wireless sensor networks
published: 2019-10-23
Raw MD simulation trajectory, input and configuration files, SEM current data, and experimental raw data accompanying the publication, "Electrical recognition of the twenty proteinogenic amino acids using an aerolysin nanopore". README.md contains a description of all associated files.
keywords: molecular dynamics; protein sequencing; aerolysin; nanopore sequencing
published: 2019-10-05
This dataset contains collected and aggregated network information from NCSA’s Blue Waters system, which comprises 27,648 nodes connected via a Cray Gemini* 3D torus (dimension 24x24x24) interconnect, from Jan/01/2017 to May/31/2017. Network performance counters for links are exposed via Cray's gpcdr (https://github.com/ovis-hpc/ovis/wiki/gpcdr-kernel-module) kernel module. Lightweight Distributed Metric Service ([LDMS](https://github.com/ovis-hpc/ovis)) is used to sample the performance counters at 60-second intervals. Please read the "README.md" file. Acknowledgement: This dataset was collected as a part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
keywords: HPC; Interconnect; Network; Congestion; Blue Waters; Dataset
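To make the 24x24x24 torus topology in the Blue Waters entry above concrete, the sketch below enumerates the six wraparound neighbors of a torus coordinate and works out the ratio of nodes to torus vertices. The `(x, y, z)` coordinate convention and the helper name are assumptions for illustration; the "two nodes per vertex" figure is simply what the stated numbers imply, not something the description spells out.

```python
DIMS = (24, 24, 24)      # torus dimensions from the description
TOTAL_NODES = 27_648     # node count from the description


def torus_neighbors(coord, dims=DIMS):
    """Return the six neighbors of a 3D torus coordinate (with wraparound)."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size  # wrap at the edges
            neighbors.append(tuple(nxt))
    return neighbors


vertices = DIMS[0] * DIMS[1] * DIMS[2]     # 24^3 torus vertices
nodes_per_vertex = TOTAL_NODES // vertices

print(vertices, nodes_per_vertex)
print(torus_neighbors((0, 0, 0)))
```

The wraparound is what distinguishes a torus from a mesh: the coordinate `(0, 0, 0)` is adjacent to `(23, 0, 0)`, so link-level counters near the "edges" of the coordinate space are not special.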
published: 2019-10-27
This dataset accompanies the paper "STREETS: A Novel Camera Network Dataset for Traffic Flow" at Neural Information Processing Systems (NeurIPS) 2019. Included are:
* Over four million still images from publicly accessible cameras in Lake County, IL. The images were collected across 2.5 months in 2018 and 2019.
* Directed graphs describing the camera network structure in two communities in Lake County.
* Documented non-recurring traffic incidents in Lake County coinciding with the 2018 data.
* Traffic counts for each day of images in the dataset. These counts track the volume of traffic in each community.
* Other annotations and files useful for computer vision systems.
Refer to the accompanying "readme.txt" or "readme.pdf" for further details.
keywords: camera network; suburban vehicular traffic; roadways; computer vision
published: 2017-11-14
If you use this dataset, please cite the IJRR data paper (BibTeX is below).

We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an IMU, and a GPS device, which provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a canoe trip up and down the river for 44 minutes covering 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments and is unique in that it was recorded on a river, which provides its own set of challenges and constraints that are described in this paper. The data is divided into subsets, which can be downloaded individually.

Video previews are available on YouTube: https://www.youtube.com/channel/UCOU9e7xxqmL_s4QX6jsGZSw

The information below can also be found in the README files provided in the 527 dataset and each of its subsets. The purpose of this document is to assist researchers in using this dataset.

Images
======

Raw
---
The raw images are stored in the cam0 and cam1 directories in bmp format. They are bayered images that need to be debayered and undistorted before they are used. The camera parameters for these images can be found in camchain-imucam.yaml. Note that the camera intrinsics describe a 1600x1200 resolution image, so the focal length and center pixel coordinates must be scaled by 0.5 before they are used. The distortion coefficients remain the same even for the scaled images. The camera-to-IMU transformation matrix is also in this file. cam0/ refers to the left camera, and cam1/ refers to the right camera.

Rectified
---------
Stereo rectified, undistorted, row-aligned, debayered images are stored in the rectified/ directory in the same way as the raw images except that they are in png format. The params.yaml file contains the projection and rotation matrices necessary to use these images. These parameters do not need to be scaled for resolution, as is necessary for the raw images.

params.yml
----------
The stereo rectification parameters. R0, R1, P0, P1, and Q correspond to the outputs of the OpenCV stereoRectify function except that 1s and 2s are replaced by 0s and 1s, respectively.
R0: The rectifying rotation matrix of the left camera.
R1: The rectifying rotation matrix of the right camera.
P0: The projection matrix of the left camera.
P1: The projection matrix of the right camera.
Q: Disparity to depth mapping matrix.
T_cam_imu: Transformation matrix for a point in the IMU frame to the left camera frame.

camchain-imucam.yaml
--------------------
The camera intrinsic and extrinsic parameters and the camera-to-IMU transformation usable with the raw images.
T_cam_imu: Transformation matrix for a point in the IMU frame to the camera frame.
distortion_coeffs: lens distortion coefficients using the radial tangential model.
intrinsics: focal length x, focal length y, principal point x, principal point y
resolution: resolution of calibration. Scale the intrinsics for use with the raw 800x600 images. The distortion coefficients do not change when the image is scaled.
T_cn_cnm1: Transformation matrix from the right camera to the left camera.

Sensors
-------
Here, each message in name.csv is described.

###rawimus###
time # GPS time in seconds
message name # rawimus
acceleration_z # m/s^2 IMU uses right-forward-up coordinates
-acceleration_y # m/s^2
acceleration_x # m/s^2
angular_rate_z # rad/s IMU uses right-forward-up coordinates
-angular_rate_y # rad/s
angular_rate_x # rad/s

###IMG###
time # GPS time in seconds
message name # IMG
left image filename
right image filename

###inspvas###
time # GPS time in seconds
message name # inspvas
latitude
longitude
altitude # ellipsoidal height WGS84 in meters
north velocity # m/s
east velocity # m/s
up velocity # m/s
roll # right hand rotation about y axis in degrees
pitch # right hand rotation about x axis in degrees
azimuth # left hand rotation about z axis in degrees clockwise from north

###inscovs###
time # GPS time in seconds
message name # inscovs
position covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz m^2
attitude covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz deg^2
velocity covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz (m/s)^2

###bestutm###
time # GPS time in seconds
message name # bestutm
utm zone # numerical zone
utm character # alphabetical zone
northing # m
easting # m
height # m above mean sea level

Camera logs
-----------
The files name.cam0 and name.cam1 are text files that correspond to cameras 0 and 1, respectively. The columns are defined by:
unused: The first column is all 1s and can be ignored.
software frame number: This number increments at the end of every iteration of the software loop.
camera frame number: This number is generated by the camera and increments each time the shutter is triggered. The software and camera frame numbers do not have to start at the same value, but if the difference between the initial and final values is not the same, it suggests that frames may have been dropped.
camera timestamp: This is the camera's internal timestamp of the frame capture in units of 100 milliseconds.
PC timestamp: This is the PC time of arrival of the image.

name.kml
--------
The kml file is a mapping file that can be read by software such as Google Earth. It contains the recorded GPS trajectory.

name.unicsv
-----------
This is a csv file of the GPS trajectory in UTM coordinates that can be read by gpsbabel, software for manipulating GPS paths.

@article{doi:10.1177/0278364917751842,
author = {Martin Miller and Soon-Jo Chung and Seth Hutchinson},
title = {The Visual–Inertial Canoe Dataset},
journal = {The International Journal of Robotics Research},
volume = {37},
number = {1},
pages = {13-20},
year = {2018},
doi = {10.1177/0278364917751842},
URL = {https://doi.org/10.1177/0278364917751842},
eprint = {https://doi.org/10.1177/0278364917751842}
}
keywords: slam;sangamon;river;illinois;canoe;gps;imu;stereo;monocular;vision;inertial
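Two steps from the canoe dataset README above lend themselves to a short sketch: scaling the calibrated intrinsics by 0.5 for the raw 800x600 images, and comparing software and camera frame numbers to spot possibly dropped frames. The function names and all numeric values below are hypothetical; only the `[fx, fy, cx, cy]` ordering, the 0.5 factor, and the frame-number rule come from the README text.

```python
def scale_intrinsics(intrinsics, factor=0.5):
    """Scale [fx, fy, cx, cy] from the calibration resolution to the raw image size.

    Distortion coefficients are left untouched, as the README states.
    """
    fx, fy, cx, cy = intrinsics
    return [fx * factor, fy * factor, cx * factor, cy * factor]


def frames_possibly_dropped(software_frames, camera_frames):
    """Compare the first-to-last span of the two frame counters.

    A mismatch suggests (but does not prove) that frames were dropped.
    """
    software_span = software_frames[-1] - software_frames[0]
    camera_span = camera_frames[-1] - camera_frames[0]
    return software_span != camera_span


# Hypothetical intrinsics for a 1600x1200 calibration.
calibrated = [1400.0, 1400.0, 800.0, 600.0]
raw_intrinsics = scale_intrinsics(calibrated)
print(raw_intrinsics)

# Hypothetical frame-number columns from a name.cam0 log.
print(frames_possibly_dropped([10, 11, 12], [100, 101, 102]))
print(frames_possibly_dropped([10, 11, 12], [100, 101, 103]))
```

The rectified images need neither step for calibration purposes, since the README says their parameters are already expressed at the image resolution.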
published: 2024-01-01
Contains scattering data obtained for (TaSe4)2I at the Advanced Photon Source at Argonne National Laboratory. Beamline 6ID-D was used with a beam energy of 64.8 keV in a transmission geometry. Data was obtained at temperatures between 28 and 300 K. See the readme.txt file for more information.
keywords: X-ray diffraction
published: 2023-11-14
This repository contains the training dataset associated with the 2023 Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics (DGM-Image Challenge), hosted by the American Association of Physicists in Medicine. This dataset contains more than 100,000 8-bit images of size 512x512. These images emulate coronal slices from anthropomorphic breast phantoms adapted from the VICTRE toolchain [1], with assigned X-ray attenuation coefficients relevant for breast computed tomography. Also included are the labels indicating the breast type. The challenge has now concluded. More information about the challenge can be found here: https://www.aapm.org/GrandChallenge/DGM-Image/.
* New in V3: we added a CSV file containing the image breast type labels and example images (PNG).
keywords: Deep generative models; breast computed tomography