Displaying 26 - 50 of 77 in total

Illinois Data Bank Dataset Search Results

Dataset Search Results

published: 2017-11-14

Miller, Martin; Chung, Soon-Jo; Hutchinson, Seth (2017): The Visual-Inertial Canoe Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9342111_V1

If you use this dataset, please cite the IJRR data paper (bibtex is below). We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an IMU, and a GPS device, which provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a canoe trip up and down the river for 44 minutes covering 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments and is unique in that it is recorded on a river, which provides its own set of challenges and constraints that are described in this paper. The data is divided into subsets, which can be downloaded individually. Video previews are available on Youtube: https://www.youtube.com/channel/UCOU9e7xxqmL_s4QX6jsGZSw The information below can also be found in the README files provided in the 527 dataset and each of its subsets. The purpose of this document is to assist researchers in using this dataset. Images ====== Raw --- The raw images are stored in the cam0 and cam1 directories in bmp format. They are bayered images that need to be debayered and undistorted before they are used. The camera parameters for these images can be found in camchain-imucam.yaml. Note that the camera intrinsics describe a 1600x1200 resolution image, so the focal length and center pixel coordinates must be scaled by 0.5 before they are used. The distortion coefficients remain the same even for the scaled images. The camera to imu tranformation matrix is also in this file. cam0/ refers to the left camera, and cam1/ refers to the right camera. Rectified --------- Stereo rectified, undistorted, row-aligned, debayered images are stored in the rectified/ directory in the same way as the raw images except that they are in png format. The params.yaml file contains the projection and rotation matrices necessary to use these images. The resolution of these parameters do not need to be scaled as is necessary for the raw images. params.yml ---------- The stereo rectification parameters. R0,R1,P0,P1, and Q correspond to the outputs of the OpenCV stereoRectify function except that 1s and 2s are replaced by 0s and 1s, respectively. R0: The rectifying rotation matrix of the left camera. R1: The rectifying rotation matrix of the right camera. P0: The projection matrix of the left camera. P1: The projection matrix of the right camera. Q: Disparity to depth mapping matrix T_cam_imu: Transformation matrix for a point in the IMU frame to the left camera frame. camchain-imucam.yaml -------------------- The camera intrinsic and extrinsic parameters and the camera to IMU transformation usable with the raw images. T_cam_imu: Transformation matrix for a point in the IMU frame to the camera frame. distortion_coeffs: lens distortion coefficients using the radial tangential model. intrinsics: focal length x, focal length y, principal point x, principal point y resolution: resolution of calibration. Scale the intrinsics for use with the raw 800x600 images. The distortion coefficients do not change when the image is scaled. T_cn_cnm1: Transformation matrix from the right camera to the left camera. Sensors ------- Here, each message in name.csv is described ###rawimus### time # GPS time in seconds message name # rawimus acceleration_z # m/s^2 IMU uses right-forward-up coordinates -acceleration_y # m/s^2 acceleration_x # m/s^2 angular_rate_z # rad/s IMU uses right-forward-up coordinates -angular_rate_y # rad/s angular_rate_x # rad/s ###IMG### time # GPS time in seconds message name # IMG left image filename right image filename ###inspvas### time # GPS time in seconds message name # inspvas latitude longitude altitude # ellipsoidal height WGS84 in meters north velocity # m/s east velocity # m/s up velocity # m/s roll # right hand rotation about y axis in degrees pitch # right hand rotation about x axis in degrees azimuth # left hand rotation about z axis in degrees clockwise from north ###inscovs### time # GPS time in seconds message name # inscovs position covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz m^2 attitude covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz deg^2 velocity covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz (m/s)^2 ###bestutm### time # GPS time in seconds message name # bestutm utm zone # numerical zone utm character # alphabetical zone northing # m easting # m height # m above mean sea level Camera logs ----------- The files name.cam0 and name.cam1 are text files that correspond to cameras 0 and 1, respectively. The columns are defined by: unused: The first column is all 1s and can be ignored. software frame number: This number increments at the end of every iteration of the software loop. camera frame number: This number is generated by the camera and increments each time the shutter is triggered. The software and camera frame numbers do not have to start at the same value, but if the difference between the initial and final values is not the same, it suggests that frames may have been dropped. camera timestamp: This is the cameras internal timestamp of the frame capture in units of 100 milliseconds. PC timestamp: This is the PC time of arrival of the image. name.kml -------- The kml file is a mapping file that can be read by software such as Google Earth. It contains the recorded GPS trajectory. name.unicsv ----------- This is a csv file of the GPS trajectory in UTM coordinates that can be read by gpsbabel, software for manipulating GPS paths. @article{doi:10.1177/0278364917751842, author = {Martin Miller and Soon-Jo Chung and Seth Hutchinson}, title ={The Visual–Inertial Canoe Dataset}, journal = {The International Journal of Robotics Research}, volume = {37}, number = {1}, pages = {13-20}, year = {2018}, doi = {10.1177/0278364917751842}, URL = {https://doi.org/10.1177/0278364917751842}, eprint = {https://doi.org/10.1177/0278364917751842} }

keywords: slam;sangamon;river;illinois;canoe;gps;imu;stereo;monocular;vision;inertial

published: 2017-05-01

Freedman, Ryan (2017): Smartphone recorded driving sensor data: Indianapolis International Airport to Urbana, IL. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4650469_V1

Indianapolis Int'l Airport to Urbana: Sampling Rate: 2 Hz Total Travel Time: 5901534 ms or 98.4 minutes Number of Data Points: 11805 Distance Traveled: 124 miles via I-74 Device used: Samsung Galaxy S6 Date Recorded: 2016-11-27 Parameters Recorded: * ACCELEROMETER X (m/s²) * ACCELEROMETER Y (m/s²) * ACCELEROMETER Z (m/s²) * GRAVITY X (m/s²) * GRAVITY Y (m/s²) * GRAVITY Z (m/s²) * LINEAR ACCELERATION X (m/s²) * LINEAR ACCELERATION Y (m/s²) * LINEAR ACCELERATION Z (m/s²) * GYROSCOPE X (rad/s) * GYROSCOPE Y (rad/s) * GYROSCOPE Z (rad/s) * LIGHT (lux) * MAGNETIC FIELD X (microT) * MAGNETIC FIELD Y (microT) * MAGNETIC FIELD Z (microT) * ORIENTATION Z (azimuth °) * ORIENTATION X (pitch °) * ORIENTATION Y (roll °) * PROXIMITY (i) * ATMOSPHERIC PRESSURE (hPa) * SOUND LEVEL (dB) * LOCATION Latitude * LOCATION Longitude * LOCATION Altitude (m) * LOCATION Altitude-google (m) * LOCATION Altitude-atmospheric pressure (m) * LOCATION Speed (kph) * LOCATION Accuracy (m) * LOCATION ORIENTATION (°) * Satellites in range * GPS NMEA * Time since start in ms * Current time in YYYY-MO-DD HH-MI-SS_SSS format Quality Notes: There are some things to note about the quality of this data set that you may want to consider while doing preprocessing. This dataset was taken continuously as a single trip, no stop was made for gas along the way making this a very long continuous dataset. It starts in the parking lot of the Indianapolis International Airport and continues directly towards a gas station on Lincoln Avenue in Urbana, IL. There are a couple parts of the trip where the phones orientation had to be changed because my navigation cut out. These times are easy to account for based on Orientation X/Y/Z change. I would also advise cutting out the first couple hundred points or the points leading up to highway speed. The phone was mounted in the cupholder in the front seat of the car.

keywords: smartphone; sensor; driving; accelerometer; gyroscope; magnetometer; gps; nmea; barometer; satellite

published: 2017-02-28

Freedman, Ryan (2017): Smartphone recorded driving sensor data: Leesburg, VA to Indianapolis, IN. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5975383_V1

Leesburg, VA to Indianapolis, Indiana: Sampling Rate: 0.1 Hz Total Travel Time: 31100007 ms or 518 minutes or 8.6 hours Distance Traveled: 570 miles via I-70 Number of Data Points: 3112 Device used: Samsung Galaxy S4 Date Recorded: 2017-01-15 Parameters Recorded: * ACCELEROMETER X (m/s²) * ACCELEROMETER Y (m/s²) * ACCELEROMETER Z (m/s²) * GRAVITY X (m/s²) * GRAVITY Y (m/s²) * GRAVITY Z (m/s²) * LINEAR ACCELERATION X (m/s²) * LINEAR ACCELERATION Y (m/s²) * LINEAR ACCELERATION Z (m/s²) * GYROSCOPE X (rad/s) * GYROSCOPE Y (rad/s) * GYROSCOPE Z (rad/s) * LIGHT (lux) * MAGNETIC FIELD X (microT) * MAGNETIC FIELD Y (microT) * MAGNETIC FIELD Z (microT) * ORIENTATION Z (azimuth °) * ORIENTATION X (pitch °) * ORIENTATION Y (roll °) * PROXIMITY (i) * ATMOSPHERIC PRESSURE (hPa) * Relative Humidity (%) * Temperature (F) * SOUND LEVEL (dB) * LOCATION Latitude * LOCATION Longitude * LOCATION Altitude (m) * LOCATION Altitude-google (m) * LOCATION Altitude-atmospheric pressure (m) * LOCATION Speed (kph) * LOCATION Accuracy (m) * LOCATION ORIENTATION (°) * Satellites in range * GPS NMEA * Time since start in ms * Current time in YYYY-MO-DD HH-MI-SS_SSS format Quality Notes: There are some things to note about the quality of this data set that you may want to consider while doing preprocessing. This dataset was taken continuously but had multiple stops to refuel (without the data recording ceasing). This can be removed by parsing out all data that has a speed of 0. The mount for this dataset was fairly stable (as can be seen by the consistent orientation angle throughout the dataset). It was mounted tightly between two seats in the back of the vehicle. Unfortunately, the frequency for this dataset was set fairly low at one per ten seconds.

keywords: smartphone; sensor; driving; accelerometer; gyroscope; magnetometer; gps; nmea; barometer; satellite; temperature; humidity

published: 2024-12-11

Cheng, Ho Kei (2024): Pretrained models for MMAudio. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-1292191_V2

MMAudio pretrained models. These models can be used in the open-sourced codebase https://github.com/hkchengrex/MMAudio <b>Note:</b> mmaudio_large_44k_v2.pth and Readme.txt are added to this V2. Other 4 files stay the same.

published: 2024-06-04

Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George (2024): Data for Well-Connectedness and Community Detection. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-6271968_V1

This dataset contains files and relevant metadata for real-world and synthetic LFR networks used in the manuscript "Well-Connectedness and Community Detection (2024) Park et al. presently under review at PLOS Complex Systems. The manuscript is an extended version of Park, M. et al. (2024). Identifying Well-Connected Communities in Real-World and Synthetic Networks. In Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-031-53499-7_1. “The Overview of Real-World Networks image provides high-level information about the seven real-world networks. TSVs of the seven real-world networks are provided as [network-name]_cleaned to indicate that duplicated edges and self-loops were removed, where column 1 is source and column 2 is target. LFR datasets are contained within the zipped file. Real-world networks are labeled _cleaned_ to indicate that duplicate edges and self loops were removed. #LFR datasets for the Connectivity Modifier (CM) paper ### File organization Each directory `[network-name]_[resolution-value]_lfr` includes the following files: * `network.dat`: LFR network edge-list * `community.dat`: LFR ground-truth communities * `time_seed.dat`: time seed used in the LFR software * `statistics.dat`: statistics generated by the LFR software * `cmd.stat`: command used to run the LFR software as well as time and memory usage information

published: 2018-12-20

Sun, Tianye; Liu, Liang; Flanner, Mark; Kirchstetter, Thomas; Jiao, Chaoyi; Preble, Chelsea; Chang, Wayne; Bond, Tami (2018): Constraining a Historical Black Carbon Emission Inventory of U.S. for 1960 to 2000 data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9686195_V2

This dataset contains data used to generate figures and tables in the corresponding paper.

keywords: Black carbon; Emission Inventory; Observations; Climate change, Diesel engine, Coal burning

published: 2023-03-16

Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George (2023): Data For Well-Connected Communities In Real Networks.. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0908742_V1

Curated networks and clustering output from the manuscript: Well-Connected Communities in Real-World Networks https://arxiv.org/abs/2303.02813

keywords: Community detection; clustering; open citations; scientometrics; bibliometrics

published: 2024-02-16

Mohasel Arjomandi, Hossein; Korobskiy, Dmitriy; Chacko, George (2024): Parsed Open Citations and PubMed Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5216575_V1

This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of Open Citations to an edge list using integer ids assigned by us. The integer ids can be mapped to omids, pmids, and dois using the open_citation_jan2024_pubs.csv and open_citations_jan2024_pub_ids.scv files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations. (ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a pmid, a doi, and has a title and an abstract. A value of 1 in columns indicates that the information exists in metadata and a zero indicates otherwise. Code for generating this data: https://github.com/illinois-or-research-analytics/pubmed_etl. If you use these data or code in your work, please cite https://doi.org/10.13012/B2IDB-5216575_V1.

keywords: PubMed

published: 2024-05-23

Park, Manho; Zheng, Zhonghua; Riemer, Nicole; Tessum, Christopher (2024): Data for: Learned 1-D passive scalar advection to accelerate chemical transport modeling: a case study with GEOS-FP horizontal wind fields. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4743181_V1

This dataset contains the training results (model parameters, outputs), datasets for generalization testing, and 2-D implementation used in the article "Learned 1-D passive scalar advection to accelerate chemical transport modeling: a case study with GEOS-FP horizontal wind fields." The article will be submitted to Artificial Intelligence for Earth Systems. The datasets are saved as CSV for 1-D time-series data and *netCDF for 2-D time series dataset. The model parameters are saved in every training epoch tested in the study.

keywords: Air quality modeling; Coarse-graining; GEOS-Chem; Numerical advection; Physics-informed machine learning; Transport operator

published: 2023-11-14

Gotsis, Dimitrios; Kelkar, Varun; Deshpande, Rucha; Brooks, Frank; KC, Prabhat; Myers, Kyle; Zeng, Rongping; Anastasio, Mark (2023): Data for the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2773204_V3

This repository contains the training dataset associated with the 2023 Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics (DGM-Image Challenge), hosted by the American Association of Physicists in Medicine. This dataset contains more than 100,000 8-bit images of size 512x512. These images emulate coronal slices from anthropomorphic breast phantoms adapted from the VICTRE toolchain [1], with assigned X-ray attenuation coefficients relevant for breast computed tomography. Also included are the labels indicating the breast type. The challenge has now concluded. More information about the challenge can be found here: <a href="https://www.aapm.org/GrandChallenge/DGM-Image/">https://www.aapm.org/GrandChallenge/DGM-Image/</a>. * New in V3: we added a CSV file containing the image breast type labels and example images (PNG).

keywords: Deep generative models; breast computed tomography

published: 2022-03-25

Shen, Chengze; Park, Minhyuk; Warnow, Tandy (2022): The 16S.B.ALL dataset in 100-HF condition. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6604429_V1

This upload includes the 16S.B.ALL in 100-HF condition (referred to as 16S.B.ALL-100-HF) used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). 100-HF condition refers to making sequences fragmentary with an average length of 100 bp and a standard deviation of 60 bp. Additionally, we enforced that all fragmentary sequences to have lengths > 50 bp. Thus, the final average length of the fragments is slightly higher than 100 bp (~120 bp). In this case (i.e., 16S.B.ALL-100-HF), 1,000 sequences with lengths 25% around the median length are retained as "backbone sequences", while the remaining sequences are considered "query sequences" and made fragmentary using the "100-HF" procedure. Backbone sequences are aligned using MAGUS (or we extract their reference alignment). Then, the fragmentary versions of the query sequences are added back to the backbone alignment using either MAGUS+UPP or WITCH. More details of the tar.gz file are described in README.txt.

keywords: MAGUS;UPP;Multiple Sequence Alignment;eHMMs

published: 2022-08-08

Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1

This upload contains all datasets used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment". The zip file has the following structure (presented as an example): salma_paper_datasets/ |_README.md |_10aa/ |_crw/ |_homfam/ |_aat/ | |_... |_... |_het/ |_5000M2-het/ | |_... |_5000M3-het/ ... |_rec_res/ Generally, the structure can be viewed as: [category]/[dataset]/[replicate]/[alignment files] # Categories: 1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate. 2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM). 3. homfam: There are the 10 largest Homfam datasets, each with one replicate. 4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates. 5. rec\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper. # Alignment files There are at most 6 `.fasta` files in each sub-directory: 1. `all.unaln.fasta`: All unaligned sequences. 2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included. 3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences). 4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included. 5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences). 6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included. >If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing. >If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing. >If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing. # Additional file(s) 1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences.

keywords: SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity

published: 2015-12-16

Nguyen, Nam-phuong; Mirarab, Siavash; Kumar, Keerthana; Warnow, Tandy (2015): Data for Ultra-Large Alignments Using Phylogeny-Aware Profiles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3174395_V1

This dataset contains the data for PASTA and UPP. PASTA data was used in the following articles: Mirarab, Siavash, Nam Nguyen, Sheng Guo, Li-San Wang, Junhyong Kim, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.” Journal of Computational Biology 22, no. 5 (2015): 377–86. doi:10.1089/cmb.2014.0156. Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. “PASTA: Ultra-Large Multiple Sequence Alignment.” Edited by Roded Sharan. Research in Computational Molecular Biology, 2014, 177–91. UPP data was used in: Nguyen, Nam-phuong D., Siavash Mirarab, Keerthana Kumar, and Tandy Warnow. “Ultra-Large Alignments Using Phylogeny-Aware Profiles.” Genome Biology 16, no. 1 (December 16, 2015): 124. doi:10.1186/s13059-015-0688-z.

published: 2014-10-29

Nguyen, Nam-phuong; Mirarab, Siavash; Bo, Liu; Pop, Mihai; Warnow, Tandy (2014): Data for Taxonomic Identification and Phylogenetic Profiling. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8783447_V1

This dataset provides the data for Nguyen, Nam-phuong, et al. "TIPP: taxonomic identification and phylogenetic profiling." Bioinformatics 30.24 (2014): 3548-3555.

published: 2019-02-22

Fernández, Roberto; Parker, Gary; Stark, Colin (2019): Experiments on patterns of alluvial cover and bedrock erosion in a meandering channel. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2-3044828_V1

This dataset includes measurements taken during the experiments on patterns of alluvial cover over bedrock. The dataset includes an hour worth of timelapse images taken every 10s for eight different experimental conditions. It also includes the instantaneous water surface elevations measured with eTapes at a frequency of 10Hz for each experiment. The 'Read me Data.txt' file explains in more detail the contents of the dataset.

keywords: bedrock; erosion; alluvial; meandering; alluvial cover; sinuosity; flume; experiments; abrasion;

published: 2024-02-16

Zhang, Mingxiao; Sutton, Bradley (2024): Sample Data for “Measuring CSF Shunt Flow with MRI Using Flow Enhancement of Signal Intensity (FENSI)”. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7252521_V1

Sample data from one typical phantom test and one deidentified shunt patient test (shown in Fig. 8 of the MRM paper), with the corresponding analysis code for the Shunt-FENSI technique. For the MRM paper “Measuring CSF Shunt Flow with MRI Using Flow Enhancement of Signal Intensity (FENSI)”

keywords: Shunt-FENSI; MRM; Hydrocephalus; VP Shunt; Flow Quantification; Pediatric Neurosurgery; Pulse Sequence; Signal Simulation

published: 2016-12-20

Wickes, Elizabeth; Nakamura, Katia (2016): Supporting data processing scripts and example data for Peru AIDData analysis. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7860393_V1

Scripts and example data for AIDData (aiddata.org) processing in support of forthcoming Nakamura dissertation. This dataset includes two sets of scripts and example data files from an aiddata.org data dump. Fuller documentation about the functionality for these scripts is within the readme file. Additional background information and description of usage will be in the forthcoming Nakamura dissertation (link will be added when available). Data originally supplied by Nakamura. Python code and this readme file created by Wickes. Data included within this deposit are examples to demonstrate execution. Roughly, there are two python scripts in here: keyword_search.py, designed to assist in finding records matching specific keywords, and matching_tool.ipynb, designed to assist in detection of which records are and are not contained within a keyword results file and an aiddata project data file.

keywords: aiddata; natural resources

published: 2022-11-11

Hsiao, Haw-Wen; Zuo, Jian-Min (2022): Data for Chemical Short-Range Ordering in a CrCoNi Medium-Entropy Alloy. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4432073_V1

This dataset is for characterizing chemical short-range-ordering in CrCoNi medium entropy alloys. It has three sub-folders: 1. code, 2. sample WQ, 3. sample HT. The software needed to run the files is Gatan Microscopy Suite® (GMS). Please follow the instruction on this page to install the DM3 GMS: <a href="https://www.gatan.com/installation-instructions#Step1">https://www.gatan.com/installation-instructions#Step1</a> 1. Code folder contains three DM scripts to be installed in Gatan DigitalMicrograph software to analyze scanning electron nanobeam diffraction (SEND) dataset: Cepstrum.s: need [EF-SEND_sampleWQ_cropped_aligned.dm3] in Sample WQ and the average image from [EF-SEND_sampleWQ_cropped_aligned.dm3]. Same for Sample HT folder. log_BraggRemoval.s: same as above. Patterson.s: Need refined diffuse patterns in Sample HT folder. 2. Sample WQ and 3. Sample HT folders both contain the SEND data (.ser) and the binned SEND data (.dm3) as well as our calculated strain maps as the strain measurement reference. The Sample WQ folder additionally has atomic resolution STEM images; the Sample HT folder additionally has three refined diffuse patterns as references for diffraction data processing. * Only .ser file is needed to perform the strain measurement using imToolBox as listed in the manuscript. .emi file contains the meta data of the microscope, which can be opened together with .ser file using FEI TIA software.

keywords: Medium entropy alloy; CrCoNi; chemical short-range-ordering; CSRO; TEM

published: 2023-09-13

Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2023): Additional datasets (RNASim10k) for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4194451_V1

This upload contains one additional set of datasets (RNASim10k, ten replicates) used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment". The zipped file has the following structure: 10k |__R0 |__unaln.fas |__true.fas |__true.tre |__R1 ... # Alignment files: 1. `unaln.fas`: all unaligned sequences. 2. `true.fas`: the reference alignment of all sequences. 3. `true.tre`: the reference tree on all sequences. For other datasets that uniquely appeared in EMMA, please refer to the related dataset (which is linked below): Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1

keywords: SALMA;MAFFT;alignment;eHMM;sequence length heterogeneity

published: 2017-09-16

Mirarab, Siavash; Warnow, Tandy (2017): Data for 16S and 23S rRNA alignments. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1614388_V1

This dataset contains the data for 16S and 23S rRNA alignments including their reference trees. The original alignments are from the Gutell Lab CRW, currently located at https://crw-site.chemistry.gatech.edu/DAT/3C/Alignment/.

published: 2009-06-19

Liu, Kevin; Raghavan, Sindhu; Nelesen, Serita; Linder, C. Randall; Warnow, Tandy (2009): Data for Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5139418_V1

This dataset contains the data for SATe-I. SATe-I data was used in the following article: K. Liu, S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow, "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.

published: 2023-01-16

Xie, Yuxuan Richard; Chari, Varsha.K; Castro, Daniel.C; Grant, Romans; Rubakhin , Stanislav S. ; Sweedler, Jonathan V. (2023): Data-Driven and Machine Learning Based Framework for Image-Guided Single-Cell Mass Spectrometry. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7302959_V1

Data sets to reproduce the results provided by the tutorial in paper "Data-Driven and Machine Learning Based Framework for Image-Guided Single-Cell Mass Spectrometry"

published: 2024-02-26

Harsh, Vipul; Zhou, Wenxuan; Ashok, Sachin; Mysore, Radhika Niranjan; Godfrey, Brighten; Banerjee, Sujata (2024): Murphy traces. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6641912_V1

Traces created using DeathStarBench (https://github.com/delimitrou/DeathStarBench) benchmark of microservice applications with injected failures on containers. Failures consist of disk/CPU/memory failures.

keywords: Murphy;Performance Diagnosis;Microservice;Failures

published: 2022-10-14

Zhou, Shan; Li, Jiahui; Lu, Jun; Liu, Haihua; Kim, Ji-Young; Kim, Ahyoung; Yao, Lehan; Liu, Chang; Qian, Chang; Hood, Zachary D. ; Lin, Xiaoying; Chen, Wenxiang; Gage, Thomas E. ; Arslan, Ilke; Travesset, Alex; Sun, Kai; Kotov, Nicholas A.; Chen, Qian (2022): Chiral Assemblies of Pinwheel Superlattices on Substrates. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0873473_V1

This dataset is the raw data including SEM, TEM, PINEM images and FDTD simulation as well as pairwise interaction calculation results.

published: 2018-10-03

Das, Anupam; Acar, Gunes; Borisov, Nikita; Pradeep, Amogh (2018): A Crawl of the Mobile Web Measuring Sensor Accesses. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9213932_V1

This dataset is the result of three crawls of the web performed in May 2018. The data contains raw crawl data and instrumentation captured by OpenWPM-Mobile, as well as analysis that identifies which scripts access mobile sensors, which ones perform some of browser fingerprinting, as well as clustering of scripts based on their intended use. The dataset is described in the included README.md file; more details about the methodology can be found in our ACM CCS'18 paper: Anupam Das, Gunes Acar, Nikita Borisov, Amogh Pradeep. The Web's Sixth Sense: A Study of Scripts Accessing Smartphone Sensors. In Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS), Toronto, Canada, October 15–19, 2018. (Forthcoming)

keywords: mobile sensors; web crawls; browser fingerprinting; javascript