Illinois Data Bank
Displaying 1 - 25 of 77 in total
Illinois Data Bank Dataset Search Results
published: 2025-08-07
Vu-Le, The-Anh; Chacko, George; Warnow, Tandy (2025): EC-SBM Benchmark Networks. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-3284069_V1
Dataset generated using the technique described in "EC-SBM synthetic network generator". This contains multiple synthetic networks with ground-truth community structure, which can be used to evaluate community detection methods. Note: * networks.zip contains the synthetic networks
keywords:
network science; synthetic networks; community detection; tsv
published: 2025-08-05
Zhu, Minjiang; Sanders, Derrick M.; Kim, Yun Seong; Shah, Rohan; Hossain, Mohammad Tanver; Ewoldt, Randy H.; Tawfick, Sameh H.; Geubelle, Philippe H. (2025): Supplemental data for curvature effect in frontal polymerization. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4186044_V1
published: 2019-09-01
Jackson, Nicole; Konar, Megan; Debaere, Peter; Estes, Lyndon (2019): Data for: Probabilistic global maps of crop-specific areas from 1961 to 2014. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-7439710_V1
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014. The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
keywords:
global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
published: 2024-07-29
Caetano Machado Lopes, Lorran; Chacko, George (2024): A Citation Graph from OpenAlex (Works). University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-7362697_V1
This dataset consists of a citation graph. It was constructed by downloading and parsing the Works section of the Open Alex catalog of the global research system. Open Alex (see citation below) contains detailed information about scholarly research, including articles, authors, journals, institutions, and their relationships. The data were downloaded on 2024-07-15.
The dataset comprises two compressed (.xz) files.
1) filename: openalexID_integer_id_hasDOI.parquet.xz. The tabular data within contains three columns: openalex_id, integer_id, and hasDOI. Each row represents a record with the following data types:
• openalex_id: A unique identifier from the Open Alex catalog.
• integer_id: An integer representing the new identifier (assigned by the authors).
• hasDOI: An integer (0 or 1) indicating whether the record has a DOI (0 for no, 1 for yes).
2) filename: citation_table.tsv.xz. This edgelist of citations has two columns (no header) of integer values that represent citing and cited integer_id, respectively.
Summary Features
• Total Nodes (Documents): 256,997,006
• Total Edges (citations): 2,148,871,058
• Documents with DOIs: 163,495,446
• Edges between documents with DOIs: 1,936,722,541
The code used to generate these files can be found here: https://github.com/illinois-or-research-analytics/lorran_openalex/
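As an illustration of the edgelist layout described above (two header-less, tab-separated integer columns: citing, then cited), the following minimal Python sketch counts incoming citations per document. The function names and the tiny in-memory sample are hypothetical and are not part of the dataset or its accompanying code.

```python
import csv
import io
import lzma
from collections import Counter


def count_citations(stream):
    """Count incoming citations per cited integer_id from a two-column TSV stream."""
    cited_counts = Counter()
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        citing, cited = int(row[0]), int(row[1])  # column order: citing, cited
        cited_counts[cited] += 1
    return cited_counts


def count_citations_in_file(path):
    """Stream an .xz-compressed edgelist without decompressing it to disk first."""
    with lzma.open(path, mode="rt", newline="") as stream:
        return count_citations(stream)


# Hypothetical three-edge sample: documents 1 and 3 cite 2, and 2 cites 4.
sample = io.StringIO("1\t2\n3\t2\n2\t4\n")
counts = count_citations(sample)
```

With roughly two billion edges in the full file, an in-memory counter like this may need a great deal of RAM; the sketch is meant to show the column layout, not a scalable pipeline.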
keywords:
citation networks; Open Alex
published: 2024-11-13
Tang, Zhichu; Chen, Wenxiang; Yin, Kaijun; Busch, Robert; Hou, Hanyu; Lin, Oliver; Lyu, Zhiheng; Zhang, Cheng; Yang, Hong; Zuo, Jian-Min; Chen, Qian (2024): Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-8188066_V1
These datasets are for the four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments for cathode nanoparticles at different states. The raw 4D-STEM experiment datasets were collected by TEM image & analysis software (FEI) and were saved as SER files. The raw 4D-STEM datasets of SER files can be opened and viewed in MATLAB using our analysis software package of imToolBox available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected by DigitalMicrograph software and were saved as DM4 files. The raw EELS datasets can be opened and viewed in DigitalMicrograph software or using our analysis codes available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion" (2024).
The 4D-STEM experiment data include four example datasets for cathode nanoparticles collected at pristine and discharged states. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP.ser"
2. Pristine 200°C heated nanoparticle: "Pristine H200-NP.ser"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP.ser"
4. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP.ser"
The EELS experiment data includes six example datasets for cathode nanoparticles collected at different states (in "EELS datasets.zip") as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP EELS.zip"
2. Pristine 200°C heated nanoparticle: "Prisitne H200-NP EELS.zip"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP EELS.zip"
4. Untreated nanoparticle after first charge in Zn-ion batteries: "Charged U-NP EELS.zip"
5. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP EELS.zip"
6. 200°C heated nanoparticle after first charge in Zn-ion batteries: "Charged H200-NP EELS.zip"
The details of the software package and codes that can be used to analyze the 4D-STEM datasets and EELS datasets are available at: https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update the relationship of these datasets with our paper.
keywords:
4D-STEM; EELS; defects; strain; cathode; nanoparticle; energy storage
published: 2025-02-08
Anne, Lahari; Park, Minhyuk; Warnow, Tandy; Chacko, George (2025): Synthetic Networks For Benchmarking. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9805305_V1
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and clustering (by any algorithm) is used to pass input parameters to a stochastic block model (SBM) generator. The output is then modified to improve fit to the input real world clusters after which outlier nodes are added using one of three different options. See Anne et al. (2024): in press Complex Networks and Applications XIII (preprint : arXiv:2408.13647). The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/). Input Networks: The CEN can be downloaded from the Illinois Data Bank: https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where a - name of inspirational network, e.g., cit_hepph b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01 c- the RECCS option used to approximate edge count and connectivity in the real world network, e.g., v1 Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph_tool software (P. Peixoto, Tiago 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14) Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz). 
The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2), and applying the Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment. Input Network: CEN
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows: cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv where:
cen – Indicates the network was modeled on the Curated Exosome Network (CEN).
resolution – The resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status – Either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version – The RECCS version used to generate the synthetic network (v1 or v2).
replicate_id – The specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
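The replicate file names can be decoded mechanically. The sketch below is a hypothetical helper, not part of the dataset: its regular expression is inferred from the two example names given above (cen_0.01_cm_v1_sample_0.tsv and cen_0.01_no_cm_v2_sample_1.tsv), so the underscore placement is an assumption.

```python
import re

# Pattern inferred from the two example file names in the description (an assumption).
REPL_NAME = re.compile(
    r"^cen_(?P<resolution>\d+(?:\.\d+)?)"
    r"_(?P<cm_status>no_cm|cm)"
    r"_(?P<reccs_version>v[12])"
    r"_sample_(?P<replicate_id>\d+)\.tsv$"
)


def parse_repl_name(file_name):
    """Return the fields encoded in a replicate file name, or None if it does not match."""
    match = REPL_NAME.match(file_name)
    if match is None:
        return None
    return {
        "resolution": float(match["resolution"]),
        "cm_treated": match["cm_status"] == "cm",  # True for CM-treated input clustering
        "reccs_version": match["reccs_version"],
        "replicate_id": int(match["replicate_id"]),
    }


first = parse_repl_name("cen_0.01_cm_v1_sample_0.tsv")
second = parse_repl_name("cen_0.01_no_cm_v2_sample_1.tsv")
```

Returning None for non-matching names makes it easy to skip unrelated files when scanning an extracted archive.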
keywords:
Community Detection; Synthetic Networks; Stochastic Block Model (SBM)
published: 2025-03-28
Brooks, Frank (2025): Realizations from Stochastic Image Models of Some Features Seen in Fluorescence Microscopy. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2642688_V1
8-bit RGB realizations of a stochastic image model (SIM) of the **kinds** of things seen in fluorescence microscopy of biological samples. Note that no attempt was made to model a particular tissue, sample, or microscope. Distinct image features are seen in each color channel. The first public mention of these SIMs is in "Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure" by Frank Brooks and Rucha Deshpande. Manuscript on ArXiv and submitted for publication.
keywords:
image models; fluorescence microscopy; training data; image-to-image translation; generative model evaluation
published: 2025-01-27
Shen, Chengze; Wedell, Eleanor; Pop, Mihai; Warnow, Tandy (2025): TIPP3 Benchmark Data and Simulated Reads. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-5467027_V1
The zip file contains the benchmark data used for the TIPP3 simulation study. See the README file for more information.
keywords:
TIPP3; abundance profile; reference database; taxonomic identification; simulation
published: 2025-07-12
Xiang, Jingyi; Dinkel, Holly; Zhao, Harry; Gao, Naixiang; Coltin, Brian; Smith, Trey; Bretl, Timothy (2025): Data for TrackDLO: Tracking Deformable Linear Objects Under Occlusion with Motion Coherence. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2916472_V1
The TrackDLO data release supports the paper, "TrackDLO: Tracking Deformable Linear Objects Under Occlusion with Motion Coherence," published in IEEE Robotics and Automation Letters (RA-L). The TrackDLO data release includes the raw image and depth data for tracking Deformable Linear Objects (DLOs) under tip occlusion, large-scale mid-section occlusion, and self-occlusion. The released data are Robot Operating System (ROS1) bag files containing raw color images and point clouds. The data were collected using a static Intel RealSense D435 RGB-D camera while DLOs in the field of view of the camera were manipulated. The data can be used to benchmark the performance of future vision-only DLO tracking algorithms in several manipulation scenarios relevant to DLOs and to verify existing vision-only DLO tracking algorithms. Please see the RA-L paper, the code repository on GitHub, the conference presentation, and the supplementary demonstration video for more information.
keywords:
rosbag; perception for grasping and manipulation; RGBD perception; visual tracking; deformable linear objects; robotic manipulation
published: 2025-07-11
Xiang, Jingyi; Dinkel, Holly (2025): Data for MultiDLO: Simultaneous Shape Tracking of Multiple Deformable Linear Objects with Global-Local Topology Preservation. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-6432640_V1
The MultiDLO data release supports the paper, "MultiDLO: Simultaneous Shape Tracking of Multiple Deformable Linear Objects with Global-Local Topology Preservation," presented at the IEEE International Conference on Robotics and Automation Workshop on Representing and Manipulating Deformable Objects in May 2023. The data release includes the raw image and depth data for simultaneously tracking multiple Deformable Linear Objects (DLOs). The released data are Robot Operating System (ROS1) bag files containing raw color images and point clouds. The data were collected using a static Intel RealSense D435 RGB-D camera while DLOs in the field of view of the camera were manipulated. The data can be used to benchmark the performance of future DLO tracking or prediction algorithms in two manipulation scenarios relevant to DLOs and to verify existing DLO tracking algorithms. Please see the accompanying extended abstract, the code repository on GitHub, and the conference presentation video referenced in the `multidlo_data_release.pdf` document for more information.
keywords:
rosbag; perception for grasping and manipulation; RGBD perception; visual tracking; deformable linear objects; robotic manipulation
published: 2016-05-19
Donovan, Brian; Work, Dan (2016): New York City Taxi Trip Data (2010-2013). University of Illinois Urbana-Champaign. https://doi.org/10.13012/J8PN93H8
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
keywords:
taxi; transportation; New York City; GPS
published: 2019-10-27
Snyder, Corey; Do, Minh (2019): Data for STREETS: A Novel Camera Network Dataset for Traffic Flow. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-3671567_V1
This dataset accompanies the paper "STREETS: A Novel Camera Network Dataset for Traffic Flow" at Neural Information Processing Systems (NeurIPS) 2019. Included are:
* Over four million still images from publicly accessible cameras in Lake County, IL. The images were collected across 2.5 months in 2018 and 2019.
* Directed graphs describing the camera network structure in two communities in Lake County.
* Documented non-recurring traffic incidents in Lake County coinciding with the 2018 data.
* Traffic counts for each day of images in the dataset. These counts track the volume of traffic in each community.
* Other annotations and files useful for computer vision systems.
Refer to the accompanying "readme.txt" or "readme.pdf" for further details.
keywords:
camera network; suburban vehicular traffic; roadways; computer vision
published: 2025-04-21
Shen, Chengze; Wedell, Eleanor; Warnow, Tandy (2025): TIPP3 Reference Package for Abundance Profiling. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4931852_V2
#Overview
These are reference packages for the TIPP3 software for abundance profiling and/or species detection from metagenomic reads (e.g., Illumina, PacBio, Nanopore, etc.). Different refpkg versions are listed.
TIPP3 software: https://github.com/c5shen/TIPP3
#Changelog
V1.2 (`tipp3-refpkg-1-2.zip`)
>>Fixed old typos in the file mapping text.
>>Added new files `taxonomy/species_to_marker.tsv` for new function `run_tipp3.py detection [...parameters]`. Please use the latest release of the TIPP3 software for this new function.
V1 (`tipp3-refpkg.zip`)
>>Initial release of the TIPP3 reference package.
#Usage
1. unzip the file to a local directory (will get a folder named "tipp3-refpkg").
2. use with TIPP3 software: `run_tipp3.py -r [path/to/tipp3-refpkg] [other parameters]`
keywords:
TIPP3; abundance profile; reference database; taxonomic identification
published: 2020-08-22
Qiu, Haoran; Banerjee, Subho S.; Jha, Saurabh; Kalbarczyk, Zbigniew T.; Iyer, Ravishankar K. (2020): Pre-processed Tracing Data for Popular Microservice Benchmarks. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-6738796_V1
We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster consisting of 15 heterogeneous nodes. The dataset is not sampled and is from selected types of requests in each benchmark, i.e., compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://gitlab.engr.illinois.edu/DEPEND/firm.git). The dataset was preprocessed from the raw data generated in FIRM's tracing system. The dataset is split by the microservice component in which the performance anomaly is located (as the file names suggest). Each dataset is in CSV format and fields are separated by commas. Each line consists of the tracing ID and the duration (in 10^(-3) ms) of each component. Execution paths are specified in `execution_paths.txt` in each directory.
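Because the durations are stored in units of 10^(-3) ms, a conversion step is usually needed before analysis. The following Python sketch shows one way to do that. It assumes each line holds the tracing ID in the first field followed by one duration per component, with no header row; that layout, the function name, and the sample row are illustrative assumptions, so check the actual files before relying on them.

```python
import csv
import io


def read_trace_durations_ms(stream):
    """Map each tracing ID to its per-component durations, converted to milliseconds."""
    traces = {}
    for row in csv.reader(stream):
        if not row:
            continue  # skip empty lines
        trace_id, raw_durations = row[0], row[1:]
        # Stored values are in 10^(-3) ms, so divide by 1000 to obtain ms.
        traces[trace_id] = [float(value) / 1000.0 for value in raw_durations]
    return traces


# Hypothetical row: trace "t1" with two component durations.
sample = io.StringIO("t1,1500,250\n")
durations = read_trace_durations_ms(sample)
```

To read a real file, pass an open file object (for example from `open(path, newline="")`) in place of the in-memory sample.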
keywords:
Microservices; Tracing; Performance
published: 2025-03-05
Li, Fu; Villa, Umberto; Park, Seonyeong; Jeong, Gangwon; Anastasio, Mark A. (2025): 2D Acoustic Numerical Breast Phantoms for Ultrasound Computed Tomography. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-5648161_V1
References
- Li, Fu, Umberto Villa, Seonyeong Park, and Mark A. Anastasio. "3-D stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography." IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 69, no. 1 (2021): 135-146. DOI: 10.1109/TUFFC.2021.3112544
- Li, Fu; Villa, Umberto; Park, Seonyeong; Anastasio, Mark, 2021, "2D Acoustic Numerical Breast Phantoms and USCT Measurement Data", https://doi.org/10.7910/DVN/CUFVKE, Harvard Dataverse, V1
Overview
- This dataset includes 1,089 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for ultrasound computed tomography (USCT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in USCT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as an HDF5 .mat file. The filenames follow this format: {type}{subject_id}.mat where {type} indicates the breast type (A, B, C, or D), and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type, and "510022534" is the sample ID.
File Contents
- Each file contains the following variables:
> "type": Breast type
> "sos": Speed-of-sound map [mm/μs]
> "den": Ambient density map [kg/mm³]
> "att": Acoustic attenuation (power-law prefactor) map [dB/ MHzʸ mm]
> "y": power-law exponent
> "label": Tissue label map. Tissue types are denoted using the following labels: water (0), fat (1), skin (2), glandular tissue (29), ligament (88), lesion (200).
- All spatial maps ("sos", "den", "att", and "label") have the same spatial dimensions of 2560 x 2560 pixels, with a pixel size of 0.1 mm x 0.1 mm.
- "sos", "den", and "att" are float32 arrays, and "label" is an 8-bit unsigned integer array.
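The file-name convention and grid geometry above can be captured in a few lines of Python. This is a minimal sketch with hypothetical helper names: it assumes the sample ID consists only of digits (as in the D510022534.mat example) and does not read the HDF5 contents, which would require a separate HDF5 reader.

```python
import re

# Assumes the subject ID is purely numeric, as in the example D510022534.mat.
PHANTOM_NAME = re.compile(r"^(?P<type>[ABCD])(?P<subject_id>\d+)\.mat$")

# Tissue labels as listed in the dataset description.
TISSUE_LABELS = {
    0: "water",
    1: "fat",
    2: "skin",
    29: "glandular tissue",
    88: "ligament",
    200: "lesion",
}

GRID_PIXELS = 2560   # pixels per side of each spatial map
PIXEL_SIZE_MM = 0.1  # pixel edge length in millimetres


def parse_phantom_name(file_name):
    """Split a phantom file name into (breast type, subject ID)."""
    match = PHANTOM_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unexpected phantom file name: {file_name!r}")
    return match["type"], match["subject_id"]


def field_of_view_mm():
    """Physical side length of one map: number of pixels times pixel size."""
    return GRID_PIXELS * PIXEL_SIZE_MM


breast_type, subject_id = parse_phantom_name("D510022534.mat")
side_mm = field_of_view_mm()
```

The subject ID is kept as a string so that any leading zeros would survive; the side length works out to 2560 pixels at 0.1 mm each, i.e. 256 mm.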
keywords:
Medical imaging; Ultrasound computed tomography; Numerical phantom
published: 2024-10-31
Liu, Shanshan; Vlachokostas, Alex; Kontou, Eleftheria (2024): Data for Resilience and environmental benefits of electric school buses as backup power for educational functions continuation during outages. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4925630_V1
School buses transport 20 million students annually and are currently undergoing electrification in the US. With Vehicle-to-Building (V2B) technology, electric school buses (ESBs) can supply energy to school buildings during power outages, ensuring continued operation and safety. This study proposes assessing the resilience of secondary schools during outages by leveraging ESB fleets as backup power across various US climate regions. The findings indicate that the current fleet of ESBs in representative cities across different climate regions in the US is insufficient to meet the power demands of an entire school or even its HVAC system. However, we estimated the number of ESBs required to support the school's power needs, and we showed that the use of V2B technology significantly reduces carbon emissions compared to backup diesel generators. While adjusting HVAC setpoints and installing solar panels have limited impacts on enhancing school resilience, gathering students in classrooms during outages significantly improved resilience in our case study in Houston, Texas. Given the ongoing electrification of school buses, it is essential for schools to complement ESBs with stationary batteries and other backup power sources, such as solar and/or diesel generators, to effectively address prolonged outages. Determining the deployment of direct current fast and Level 2 chargers can reduce infrastructure costs while maintaining the resilience benefits of ESBs. This dataset includes the simulation process and results of this study.
keywords:
Electric school bus; Power outages; Vehicle-to-Building technology; Carbon emission reduction; Backup power source
published: 2023-10-22
Davidson, Ruth; Vachaspati, Pranjal; Mirarab, Siavash; Warnow, Tandy (2023): Data from: Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6670066_V1
HGT+ILS datasets from Davidson, R., Vachaspati, P., Mirarab, S., & Warnow, T. (2015). Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC genomics, 16(10), 1-12. Contains model species trees, true and estimated gene trees, and simulated alignments.
keywords:
evolution; computational biology; bioinformatics; phylogenetics
published: 2022-09-29
Levine, Nathaniel (2022): 3DIFICE: A Synthetic Dataset for Training Computer Vision Algorithms to Recognize Earthquake Damage to Reinforced Concrete Structures. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6415287_V1
3DIFICE: 3-dimensional Damage Imposed on Frame structures for Investigating Computer vision-based Evaluation methods This dataset contains 1,396 synthetic images and label maps with various types of earthquake damage imposed on reinforced concrete frame structures. Damage includes: cracking, spalling, exposed transverse rebar, and exposed longitudinal rebar. Each image has an associated label map that can be used for training machine learning algorithms to recognize the various types of damage.
keywords:
computer vision; earthquake engineering; structural health monitoring; civil engineering; structural engineering
published: 2022-04-29
Wedell, Eleanor; Warnow, Tandy (2022): Biological and Simulated datasets for testing the SCAMPP framework for phylogenetic placement methods. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9257957_V1
Thank you for using these datasets! These files contain trees and reference alignments, as well as the selected query sequences for testing phylogenetic placement methods against and within the SCAMPP framework. There are four datasets from three different sources, each containing their source alignment and "true" tree, any estimated trees that may have been generated, and any re-estimated branch lengths that were created to be used with their requisite phylogenetic placement method. Three biological datasets (16S.B.ALL, PEWO/LTP_s128_SSU, and PEWO/green85) and one simulated dataset (nt78) are included. See README.txt in each file for more information.
keywords:
Phylogenetic Placement; Phylogenetics; Maximum Likelihood; pplacer; EPA-ng
published: 2021-11-18
Pan, Chao; Tabatabaei, S Kasra; Tabatabaei Yazdi, S. M. Hossein; Hernandez, Alvaro; Schroeder, Charles; Milenkovic, Olgica (2021): Rewritable Two-Dimensional DNA-Based Data Storage System (2DDNA) Sequencing Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2308557_V1
This dataset contains sequencing data obtained from an Illumina MiSeq device to prove the concept of the proposed 2DDNA framework. Please refer to README.txt for a detailed description of each file.
keywords:
machine learning; image processing; computer vision; rewritable storage system; 2D DNA-based data storage
published: 2023-06-01
Pan, Chao; Peng, Jianhao; Chien, Eli; Milenkovic, Olgica (2023): Embedded dataset in Poincare Balls. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6901251_V1
This dataset contains four real-world sub-datasets with data embedded into Poincare ball models, including Olsson's single-cell RNA expression data, CIFAR10, Fashion-MNIST and mini-ImageNet. Each sub-dataset has two corresponding files: one is the data file, the other one is the pre-computed reference points for each class in the sub-dataset. Please refer to our paper (https://arxiv.org/pdf/2109.03781.pdf) and codes (https://github.com/thupchnsky/PoincareLinearClassification) for more details.
keywords:
Hyperbolic space; Machine learning; Poincare ball models; Perceptron algorithm; Support vector machine
published: 2021-06-14
Kelkar, Varun A.; Anastasio, Mark A. (2021): StyleGAN2 trained on MR brain and face images: network weights. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4499850_V1
This repository contains the weights for two StyleGAN2 networks trained on two composite T1 and T2 weighted open-source brain MR image datasets, and one StyleGAN2 network trained on the Flickr Face HQ image dataset. Example images sampled from the respective StyleGANs are also included. The datasets themselves are not included in this repository. The weights are stored as `.pkl` files. The code and instructions to load and use the weights can be found at https://github.com/comp-imaging-sci/pic-recon . Additional details and citations can be found in the file "README.md".
keywords:
StyleGAN2; Generative adversarial network (GAN); MRI; Medical imaging
published: 2020-07-01
Rykhlevskii, Andrei; Huff, Kathryn D. (2020): SaltProc output for TAP MSR and MSBR online reprocessing depletion simulations. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7364919_V1
keywords:
molten salt; fuel cycle; reprocessing; refueling
published: 2021-10-11
Peng, Jianhao; Ochoa, Idoia (2021): ClonalKinetic Data and Intermediate Results of SimiC. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3975180_V1
This dataset contains the ClonalKinetic dataset that was used in SimiC and its intermediate results for comparison. A detailed description can be found in the text files 'clonalKinetics_Example_data_description.txt' and 'ClonalKinetics_filtered.DF_data_description.txt'.
The required input data for SimiC contain:
1. ClonalKinetics_filtered.clustAssign.txt => cluster assignment for each cell.
2. ClonalKinetics_filtered.DF.pickle => filtered scRNAseq matrix.
3. ClonalKinetics_filtered.TFs.pickle => list of driver genes.
The results after running SimiC contain:
1. ClonalKinetics_filtered_L10.01_L20.01_Ws.pickle => inferred GRNs for each cluster.
2. ClonalKinetics_filtered_L10.01_L20.01_AUCs.pickle => regulon activity scores for each cell and each driver gene.
NOTE: The “ClonalKinetics_filtered.rds” file mentioned in “ClonalKinetics_filtered.DF_data_description.txt” is an intermediate file; the authors have put all the processed data in the pickle/txt files as described in the filtered data text.
keywords:
GRNs; SimiC; RDS; ClonalKinetic
published: 2024-01-01
Christensen, Jacob; Bettler, Simon; Qu, Kejian; Huang, Jeffrey; Kim, Soyeun; Lu, Yinchuan; Zhao, Chengxi; Chen, Jin; Krogstad, Matthew; Woods, Toby; Mahmood, Fahad; Huang, Pinshane; Abbamonte, Peter; Shoemaker, Daniel (2024): Data for Disorder and diffuse scattering in single-chirality (TaSe4)2I crystals. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5148684_V1
Contains scattering data obtained for (TaSe4)2I at the Advanced Photon Source at Argonne National Laboratory. Beamline 6ID-D was used with a beam energy of 64.8 keV in a transmission geometry. Data was obtained at temperatures between 28 and 300 K. See the readme.txt file for more information.
keywords:
X-ray diffraction