Illinois Data Bank
Displaying 1 - 25 of 78 in total
Illinois Data Bank Dataset Search Results
published: 2024-06-04
Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George (2024): Data for Well-Connectedness and Community Detection. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-6271968_V1
This dataset contains files and relevant metadata for real-world and synthetic LFR networks used in the manuscript "Well-Connectedness and Community Detection" (2024) by Park et al., presently under review at PLOS Complex Systems. The manuscript is an extended version of Park, M. et al. (2024). Identifying Well-Connected Communities in Real-World and Synthetic Networks. In Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-031-53499-7_1. The Overview of Real-World Networks image provides high-level information about the seven real-world networks. TSVs of the seven real-world networks are provided as [network-name]_cleaned to indicate that duplicated edges and self-loops were removed, where column 1 is source and column 2 is target. LFR datasets are contained within the zipped file.
# LFR datasets for the Connectivity Modifier (CM) paper
### File organization
Each directory `[network-name]_[resolution-value]_lfr` includes the following files:
* `network.dat`: LFR network edge-list
* `community.dat`: LFR ground-truth communities
* `time_seed.dat`: time seed used in the LFR software
* `statistics.dat`: statistics generated by the LFR software
* `cmd.stat`: command used to run the LFR software as well as time and memory usage information
published: 2023-03-16
Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George (2023): Data For Well-Connected Communities In Real Networks. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-0908742_V1
Curated networks and clustering output from the manuscript: Well-Connected Communities in Real-World Networks https://arxiv.org/abs/2303.02813
keywords:
Community detection; clustering; open citations; scientometrics; bibliometrics
published: 2024-02-16
Mohasel Arjomandi, Hossein; Korobskiy, Dmitriy; Chacko, George (2024): Parsed Open Citations and PubMed Data. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-5216575_V1
This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of Open Citations to an edge list using integer ids assigned by us. The integer ids can be mapped to omids, pmids, and dois using the open_citation_jan2024_pubs.csv and open_citations_jan2024_pub_ids.csv files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found at https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations. (ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a pmid and a doi and have a title and an abstract. A value of 1 in a column indicates that the information exists in the metadata, and a value of 0 indicates that it does not. Code for generating these data: https://github.com/illinois-or-research-analytics/pubmed_etl. If you use these data or code in your work, please cite https://doi.org/10.13012/B2IDB-5216575_V1.
keywords:
PubMed
published: 2024-07-29
Caetano Machado Lopes, Lorran; Chacko, George (2024): A Citation Graph from OpenAlex (Works). University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-7362697_V1
This dataset consists of a citation graph. It was constructed by downloading and parsing the Works section of the Open Alex catalog of the global research system. Open Alex (see citation below) contains detailed information about scholarly research, including articles, authors, journals, institutions, and their relationships. The data were downloaded on 2024-07-15. The dataset comprises two compressed (.xz) files. 1) filename: openalexID_integer_id_hasDOI.parquet.xz. The tabular data within contains three columns: openalex_id, integer_id, and hasDOI. Each row represents a record with the following data types: • openalex_id: A unique identifier from the Open Alex catalog. • integer_id: An integer representing the new identifier (assigned by the authors) • hasDOI: An integer (0 or 1) indicating whether the record has a DOI (0 for no, 1 for yes). 2) filename: citation_table.tsv.xz This edgelist of citations has two columns (no header) of integer values that represent citing and cited integer_id, respectively. Summary Features • Total Nodes (Documents): 256,997,006 • Total Edges (citations): 2,148,871,058 • Documents with DOIs: 163,495,446 • Edges between documents with DOIs: 1,936,722,541 The code used to generate these files can be found here: https://github.com/illinois-or-research-analytics/lorran_openalex/
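As a rough illustration of how an edge list laid out like citation_table.tsv.xz might be consumed, the sketch below streams a two-column, headerless, xz-compressed TSV and counts how many citations each cited integer_id receives. Only the file layout (citing id, then cited id, tab-separated, no header) comes from the description above; the function name `count_in_degree` and the three-edge sample file are hypothetical stand-ins. The real file holds roughly 2.1 billion edges, so an in-memory counter like this one is only realistic for samples.

```python
import csv
import lzma
import os
import tempfile
from collections import Counter


def count_in_degree(path):
    """Count how often each integer_id appears in the 'cited' (second) column."""
    counts = Counter()
    # Text-mode lzma.open decompresses the .xz stream lazily while iterating.
    with lzma.open(path, "rt", newline="") as handle:
        for citing, cited in csv.reader(handle, delimiter="\t"):
            counts[int(cited)] += 1
    return counts


# Build a tiny stand-in file (hypothetical data) with the same layout.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_citation_table.tsv.xz")
with lzma.open(sample_path, "wt", newline="") as handle:
    handle.write("1\t2\n3\t2\n3\t4\n")

in_degree = count_in_degree(sample_path)
print(in_degree.most_common(1))
```

For the full graph, a streaming aggregation into a database or an array indexed by integer_id would likely be a better fit than a Python dictionary.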
keywords:
citation networks; Open Alex
published: 2025-08-16
Park, Minhyuk; Lamy, João AC; Rodrigues, Esther CC; Ferreira, Felipe Mariano; Vu-Le, The-Anh; Warnow, Tandy; Chacko, George (2025): Data from development and evaluation of SASCA-s: Scalable Agent-based Simulator for Citation Analysis with simulation. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-3926377_V1
The data within consist of compressed output files in the form of edgelists (*.edgelist.gz) and nodelists (*.aux.parquet) from large citation network simulations using an agent-based model. The code and instructions are available at: https://github.com/illinois-or-research-analytics/SASCA. In addition, we provide a distribution of citation frequencies drawn from a random sample of PubMed journal articles (pooled_50k_pubmed_unique.csv) and a table of recencies, that is, the frequency with which citations are made to the previous year, the year before that, and so on (recency_probs_percent_stahl_filled.csv). A manuscript describing the SASCA-s simulator has been submitted for review and will be referenced in a future version of this data repository if it is accepted. The prefixes sj and er refer, respectively, to the real-world and Erdos-Renyi random graphs that were used to initiate simulations. These 'seed' networks are available from the GitHub site referenced above.
keywords:
benchmark networks; agent-based models; simulation; citation
published: 2025-08-07
Vu-Le, The-Anh; Chacko, George; Warnow, Tandy (2025): EC-SBM Benchmark Networks. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-3284069_V1
Dataset generated using the technique described in "EC-SBM synthetic network generator". This contains multiple synthetic networks with ground-truth community structure, which can be used to evaluate community detection methods. Note: * networks.zip contains the synthetic networks
keywords:
network science; synthetic networks; community detection; tsv
published: 2016-05-19
Donovan, Brian; Work, Dan (2016): New York City Taxi Trip Data (2010-2013). University of Illinois Urbana-Champaign. https://doi.org/10.13012/J8PN93H8
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
keywords:
taxi;transportation;New York City;GPS
published: 2025-08-05
Zhu, Minjiang; Sanders, Derrick M.; Kim, Yun Seong; Shah, Rohan; Hossain, Mohammad Tanver; Ewoldt, Randy H.; Tawfick, Sameh H.; Geubelle, Philippe H. (2025): Supplemental data for curvature effect in frontal polymerization. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4186044_V1
published: 2019-09-01
Jackson, Nicole; Konar, Megan; Debaere, Peter; Estes, Lyndon (2019): Data for: Probabilistic global maps of crop-specific areas from 1961 to 2014. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-7439710_V1
Agriculture has substantial socioeconomic and environmental impacts that vary between crops. However, information on how the spatial distribution of specific crops has changed over time across the globe is relatively sparse. We introduce the Probabilistic Cropland Allocation Model (PCAM), a novel algorithm to estimate where specific crops have likely been grown over time. Specifically, PCAM downscales annual and national-scale data on the crop-specific area harvested of 17 major crops to a global 0.5-degree grid from 1961-2014. The resulting database presented here provides annual global gridded likelihood estimates of crop-specific areas. Both mean and standard deviations of grid cell fractions are available for each of the 17 crops. Each netCDF file contains an individual year of data with an additional variable ("crs") that defines the coordinate reference system used. Our results provide new insights into the likely changes in the spatial distribution of major crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2019) in Environmental Research Letters (https://doi.org/10.1088/1748-9326/ab3b93).
keywords:
global; gridded; probabilistic allocation; crop suitability; agricultural geography; time series
published: 2024-11-13
Tang, Zhichu; Chen, Wenxiang; Yin, Kaijun; Busch, Robert; Hou, Hanyu; Lin, Oliver; Lyu, Zhiheng; Zhang, Cheng; Yang, Hong; Zuo, Jian-Min; Chen, Qian (2024): Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-8188066_V1
These datasets are for the four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments for cathode nanoparticles at different states. The raw 4D-STEM experiment datasets were collected by TEM image & analysis software (FEI) and were saved as SER files. The raw 4D-STEM SER files can be opened and viewed in MATLAB using our analysis software package imToolBox, available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected by DigitalMicrograph software and were saved as DM4 files. The raw EELS datasets can be opened and viewed in DigitalMicrograph software or using our analysis codes available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion" (2024).
The 4D-STEM experiment data include four example datasets for cathode nanoparticles collected at pristine and discharged states. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP.ser"
2. Pristine 200°C heated nanoparticle: "Pristine H200-NP.ser"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP.ser"
4. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP.ser"
The EELS experiment data include six example datasets for cathode nanoparticles collected at different states (in "EELS datasets.zip") as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine untreated nanoparticle: "Pristine U-NP EELS.zip"
2. Pristine 200°C heated nanoparticle: "Prisitne H200-NP EELS.zip"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP EELS.zip"
4. Untreated nanoparticle after first charge in Zn-ion batteries: "Charged U-NP EELS.zip"
5. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP EELS.zip"
6. 200°C heated nanoparticle after first charge in Zn-ion batteries: "Charged H200-NP EELS.zip"
The details of the software package and codes that can be used to analyze the 4D-STEM datasets and EELS datasets are available at: https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update the relationship of these datasets with our paper.
keywords:
4D-STEM; EELS; defects; strain; cathode; nanoparticle; energy storage
published: 2025-02-08
Anne, Lahari; Park, Minhyuk; Warnow, Tandy; Chacko, George (2025): Synthetic Networks For Benchmarking. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9805305_V1
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and clustering (by any algorithm) are used to pass input parameters to a stochastic block model (SBM) generator. The output is then modified to improve fit to the input real-world clusters, after which outlier nodes are added using one of three different options. See Anne et al. (2024), in press, Complex Networks and Applications XIII (preprint: arXiv:2408.13647). The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021), (ii) cit_hepph (https://snap.stanford.edu/), (iii) cit_patents (https://snap.stanford.edu/), and (iv) wiki_topcats (https://snap.stanford.edu/).
Input Networks: The CEN can be downloaded from the Illinois Data Bank: https://databank.illinois.edu/datasets/IDB-0908742 -> cen_pipeline.tar.gz -> S1_cen_cleaned.tsv
The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where
a - name of inspirational network, e.g., cit_hepph
b - the resolution value used when clustering a with the Leiden algorithm optimizing the Constant Potts Model, e.g., 0.01
c - the RECCS option used to approximate edge count and connectivity in the real-world network, e.g., v1
Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph_tool software (P. Peixoto, Tiago 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14). Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz).
The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2), with and without Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment.
Input Network: CEN
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows: cen_<resolution>_<cm_status>_<reccs_version>_sample_<replicate_id>.tsv where:
cen – Indicates the network was modeled on the Curated Exosome Network (CEN).
resolution – The resolution parameter used in clustering the input network with Leiden-CPM (0.01).
cm_status – Either cm (CM-treated input clustering) or no_cm (input clustering without CM treatment).
reccs_version – The RECCS version used to generate the synthetic network (v1 or v2).
replicate_id – The specific replicate (ranging from 0 to 2 for each configuration).
For example:
cen_0.01_cm_v1_sample_0.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, CM-treated input, and generated using RECCSv1 (first replicate).
cen_0.01_no_cm_v2_sample_1.tsv – A synthetic network based on CEN with Leiden-CPM clustering at resolution 0.01, without CM treatment, and generated using RECCSv2 (second replicate).
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz.
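The two naming schemes described for this dataset can be decoded mechanically. The sketch below is one possible approach, not code from the dataset authors: the helper names `parse_synthetic_name` and `parse_replicate_name` are hypothetical, and the regular expression is inferred from the two worked examples (cen_0.01_cm_v1_sample_0.tsv and cen_0.01_no_cm_v2_sample_1.tsv). Because network names such as cit_hepph themselves contain underscores, the a_b_c scheme is split from the right.

```python
import re

# Inferred from the worked examples, e.g. cen_0.01_no_cm_v2_sample_1.tsv
REPL_PATTERN = re.compile(
    r"^cen_(?P<resolution>[0-9.]+)_(?P<cm_status>cm|no_cm)"
    r"_(?P<reccs_version>v[12])_sample_(?P<replicate_id>\d+)\.tsv$"
)


def parse_replicate_name(filename):
    """Decode a replication-experiment file name into its labeled fields."""
    match = REPL_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized replicate file name: {filename}")
    fields = match.groupdict()
    fields["replicate_id"] = int(fields["replicate_id"])
    return fields


def parse_synthetic_name(filename):
    """Split a_b_c.tsv or a_b_c.tsv.gz into (network, resolution, reccs_option)."""
    stem = filename
    for suffix in (".gz", ".tsv"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
    # Split from the right so underscores inside the network name survive.
    network, resolution, option = stem.rsplit("_", 2)
    return network, resolution, option


print(parse_synthetic_name("cit_hepph_0.01_v1.tsv.gz"))
print(parse_replicate_name("cen_0.01_no_cm_v2_sample_1.tsv"))
```

The resolution is kept as a string so that it can be matched against directory or file names without floating-point formatting surprises.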
keywords:
Community Detection; Synthetic Networks; Stochastic Block Model (SBM)
published: 2025-03-28
Brooks, Frank (2025): Realizations from Stochastic Image Models of Some Features Seen in Fluorescence Microscopy. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2642688_V1
8-bit RGB realizations of a stochastic image model (SIM) of the **kinds** of things seen in fluorescence microscopy of biological samples. Note that no attempt was made to model a particular tissue, sample, or microscope. Distinct image features are seen in each color channel. The first public mention of these SIMs is in "Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure" by Frank Brooks and Rucha Deshpande. The manuscript is on arXiv and has been submitted for publication.
keywords:
image models; fluorescence microscopy; training data; image-to-image translation; generative model evaluation
published: 2025-01-27
Shen, Chengze; Wedell, Eleanor; Pop, Mihai; Warnow, Tandy (2025): TIPP3 Benchmark Data and Simulated Reads. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-5467027_V1
The zip file contains the benchmark data used for the TIPP3 simulation study. See the README file for more information.
keywords:
TIPP3;abundance profile;reference database;taxonomic identification;simulation
published: 2025-07-12
Xiang, Jingyi; Dinkel, Holly; Zhao, Harry; Gao, Naixiang; Coltin, Brian; Smith, Trey; Bretl, Timothy (2025): Data for TrackDLO: Tracking Deformable Linear Objects Under Occlusion with Motion Coherence. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2916472_V1
The TrackDLO data release supports the paper, "TrackDLO: Tracking Deformable Linear Objects Under Occlusion with Motion Coherence," published in IEEE Robotics and Automation Letters (RA-L). The TrackDLO data release includes the raw image and depth data for tracking Deformable Linear Objects (DLOs) under tip occlusion, large-scale mid-section occlusion, and self-occlusion. The released data are Robot Operating System (ROS1) bag files containing raw color images and point clouds. The data were collected using a static Intel RealSense D435 RGB-D camera while DLOs in the field of view of the camera were manipulated. The data can be used to benchmark the performance of future vision-only DLO tracking algorithms in several manipulation scenarios relevant to DLOs and to verify existing vision-only DLO tracking algorithms. Please see the RA-L paper, the code repository on GitHub, the conference presentation, and the supplementary demonstration video for more information.
keywords:
rosbag; perception for grasping and manipulation; RGBD perception; visual tracking; deformable linear objects; robotic manipulation
published: 2025-07-11
Xiang, Jingyi; Dinkel, Holly (2025): Data for MultiDLO: Simultaneous Shape Tracking of Multiple Deformable Linear Objects with Global-Local Topology Preservation. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-6432640_V1
The MultiDLO data release supports the paper, "MultiDLO: Simultaneous Shape Tracking of Multiple Deformable Linear Objects with Global-Local Topology Preservation," presented at the IEEE International Conference on Robotics and Automation Workshop on Representing and Manipulating Deformable Objects in May 2023. The data release includes the raw image and depth data for simultaneously tracking multiple Deformable Linear Objects (DLOs). The released data are Robot Operating System (ROS1) bag files containing raw color images and point clouds. The data were collected using a static Intel RealSense D435 RGB-D camera while DLOs in the field of view of the camera were manipulated. The data can be used to benchmark the performance of future DLO tracking or prediction algorithms in two manipulation scenarios relevant to DLOs and to verify existing DLO tracking algorithms. Please see the accompanying extended abstract, the code repository on GitHub, and the conference presentation video referenced in the `multidlo_data_release.pdf` document for more information.
keywords:
rosbag; perception for grasping and manipulation; RGBD perception; visual tracking; deformable linear objects; robotic manipulation
published: 2019-10-27
Snyder, Corey; Do, Minh (2019): Data for STREETS: A Novel Camera Network Dataset for Traffic Flow. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-3671567_V1
This dataset accompanies the paper "STREETS: A Novel Camera Network Dataset for Traffic Flow" at Neural Information Processing Systems (NeurIPS) 2019. Included are:
* Over four million still images from publicly accessible cameras in Lake County, IL. The images were collected across 2.5 months in 2018 and 2019.
* Directed graphs describing the camera network structure in two communities in Lake County.
* Documented non-recurring traffic incidents in Lake County coinciding with the 2018 data.
* Traffic counts for each day of images in the dataset. These counts track the volume of traffic in each community.
* Other annotations and files useful for computer vision systems.
Refer to the accompanying "readme.txt" or "readme.pdf" for further details.
keywords:
camera network; suburban vehicular traffic; roadways; computer vision
published: 2025-04-21
Shen, Chengze; Wedell, Eleanor; Warnow, Tandy (2025): TIPP3 Reference Package for Abundance Profiling. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4931852_V2
# Overview
These are reference packages for the TIPP3 software for abundance profiling and/or species detection from metagenomic reads (e.g., Illumina, PacBio, Nanopore, etc.). Different refpkg versions are listed. TIPP3 software: https://github.com/c5shen/TIPP3
# Changelog
V1.2 (`tipp3-refpkg-1-2.zip`)
>> Fixed old typos in the file mapping text.
>> Added new file `taxonomy/species_to_marker.tsv` for the new function `run_tipp3.py detection [...parameters]`. Please use the latest release of the TIPP3 software for this new function.
V1 (`tipp3-refpkg.zip`)
>> Initial release of the TIPP3 reference package.
# Usage
1. Unzip the file to a local directory (this creates a folder named "tipp3-refpkg").
2. Use with the TIPP3 software: `run_tipp3.py -r [path/to/tipp3-refpkg] [other parameters]`
keywords:
TIPP3; abundance profile; reference database; taxonomic identification
published: 2020-08-22
Qiu, Haoran; Banerjee, Subho S.; Jha, Saurabh; Kalbarczyk, Zbigniew T.; Iyer, Ravishankar K. (2020): Pre-processed Tracing Data for Popular Microservice Benchmarks. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-6738796_V1
We are releasing the tracing dataset of four microservice benchmarks deployed on our dedicated Kubernetes cluster consisting of 15 heterogeneous nodes. The dataset is not sampled and is from selected types of requests in each benchmark, i.e., compose-posts in the social network application, compose-reviews in the media service application, book-rooms in the hotel reservation application, and reserve-tickets in the train ticket booking application. The four microservice applications come from [DeathStarBench](https://github.com/delimitrou/DeathStarBench) and [Train-Ticket](https://github.com/FudanSELab/train-ticket). The performance anomaly injector is from [FIRM](https://gitlab.engr.illinois.edu/DEPEND/firm.git). The dataset was preprocessed from the raw data generated in FIRM's tracing system. The dataset is organized by the microservice component on which the performance anomaly is located (as the file names suggest). Each dataset is in CSV format and fields are separated by commas. Each line consists of the tracing ID and the duration (in 10^(-3) ms) of each component. Execution paths are specified in `execution_paths.txt` in each directory.
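To make the stated unit concrete, the sketch below parses one trace line and converts the per-component durations from units of 10^(-3) ms (microseconds) into milliseconds. It rests on assumptions that the description does not confirm: that the tracing ID is the first field, that every remaining field is a numeric duration, and that there is no header row. The function name `parse_trace_line` and the sample line are hypothetical.

```python
import csv
import io


def parse_trace_line(line):
    """Return (trace_id, durations_ms) for one comma-separated trace record.

    Assumes: first field is the tracing ID, remaining fields are durations
    expressed in units of 10^-3 ms (hypothetical layout, see lead-in).
    """
    fields = next(csv.reader(io.StringIO(line)))
    trace_id, raw_durations = fields[0], fields[1:]
    # 1 unit = 10^-3 ms, so dividing by 1000 yields milliseconds.
    durations_ms = [float(value) / 1000.0 for value in raw_durations]
    return trace_id, durations_ms


trace_id, durations_ms = parse_trace_line("abc123,1500,250")
print(trace_id, durations_ms)
```

Mapping each duration to a component name would additionally require the ordering given in `execution_paths.txt`, which this sketch does not attempt to interpret.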
keywords:
Microservices; Tracing; Performance
published: 2025-03-05
Li, Fu; Villa, Umberto; Park, Seonyeong; Jeong, Gangwon; Anastasio, Mark A. (2025): 2D Acoustic Numerical Breast Phantoms for Ultrasound Computed Tomography. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-5648161_V1
References
- Li, Fu, Umberto Villa, Seonyeong Park, and Mark A. Anastasio. "3-D stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography." IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 69, no. 1 (2021): 135-146. DOI: 10.1109/TUFFC.2021.3112544
- Li, Fu; Villa, Umberto; Park, Seonyeong; Anastasio, Mark, 2021, "2D Acoustic Numerical Breast Phantoms and USCT Measurement Data", https://doi.org/10.7910/DVN/CUFVKE, Harvard Dataverse, V1
Overview
- This dataset includes 1,089 two-dimensional slices extracted from 3D numerical breast phantoms (NBPs) for ultrasound computed tomography (USCT) studies. The anatomical structures of these NBPs were obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project. The methods used to modify and extend the VICTRE NBPs for use in USCT studies are described in the publication cited above.
- The NBPs in this dataset represent the following four ACR BI-RADS breast composition categories:
> Type A - The breast is almost entirely fatty
> Type B - There are scattered areas of fibroglandular density in the breast
> Type C - The breast is heterogeneously dense
> Type D - The breast is extremely dense
- Each 2D slice is taken from a different 3D NBP, ensuring that no more than one slice comes from any single phantom.
File Name Format
- Each data file is stored as an HDF5 .mat file. The filenames follow this format: {type}{subject_id}.mat, where {type} indicates the breast type (A, B, C, or D), and {subject_id} is a unique identifier assigned to each sample. For example, in the filename D510022534.mat, "D" represents the breast type, and "510022534" is the sample ID.
File Contents
- Each file contains the following variables:
> "type": Breast type
> "sos": Speed-of-sound map [mm/μs]
> "den": Ambient density map [kg/mm³]
> "att": Acoustic attenuation (power-law prefactor) map [dB/ MHzʸ mm]
> "y": power-law exponent
> "label": Tissue label map. Tissue types are denoted using the following labels: water (0), fat (1), skin (2), glandular tissue (29), ligament (88), lesion (200).
- All spatial maps ("sos", "den", "att", and "label") have the same spatial dimensions of 2560 x 2560 pixels, with a pixel size of 0.1 mm x 0.1 mm.
- "sos", "den", and "att" are float32 arrays, and "label" is an 8-bit unsigned integer array.
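The {type}{subject_id}.mat naming convention and the tissue label codes lend themselves to a small lookup helper. The following is a minimal sketch rather than anything shipped with the dataset: the names `parse_phantom_filename`, `BREAST_TYPES`, and `TISSUE_LABELS` are hypothetical, and the pattern assumes that subject IDs consist only of digits, as in the D510022534.mat example. The category descriptions and label codes are copied from the dataset description.

```python
import re

# Assumes a single type letter followed by an all-digit subject ID.
PHANTOM_NAME = re.compile(r"^(?P<type>[ABCD])(?P<subject_id>\d+)\.mat$")

# ACR BI-RADS breast composition categories, as listed in the description.
BREAST_TYPES = {
    "A": "almost entirely fatty",
    "B": "scattered areas of fibroglandular density",
    "C": "heterogeneously dense",
    "D": "extremely dense",
}

# Integer codes used in the "label" map, as listed in the description.
TISSUE_LABELS = {
    0: "water",
    1: "fat",
    2: "skin",
    29: "glandular tissue",
    88: "ligament",
    200: "lesion",
}


def parse_phantom_filename(filename):
    """Return (breast_type, subject_id) parsed from a phantom file name."""
    match = PHANTOM_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected phantom file name: {filename}")
    return match["type"], match["subject_id"]


breast_type, subject_id = parse_phantom_filename("D510022534.mat")
print(breast_type, BREAST_TYPES[breast_type], subject_id)
```

The subject ID is returned as a string so that any leading zeros in an identifier would be preserved.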
keywords:
Medical imaging; Ultrasound computed tomography; Numerical phantom
published: 2024-10-31
Liu, Shanshan; Vlachokostas, Alex; Kontou, Eleftheria (2024): Data for Resilience and environmental benefits of electric school buses as backup power for educational functions continuation during outages. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4925630_V1
School buses transport 20 million students annually and are currently undergoing electrification in the US. With Vehicle-to-Building (V2B) technology, electric school buses (ESBs) can supply energy to school buildings during power outages, ensuring continued operation and safety. This study proposes assessing the resilience of secondary schools during outages by leveraging ESB fleets as backup power across various US climate regions. The findings indicate that the current fleet of ESBs in representative cities across different climate regions in the US is insufficient to meet the power demands of an entire school or even its HVAC system. However, we estimated the number of ESBs required to support the school's power needs, and we showed that the use of V2B technology significantly reduces carbon emissions compared to backup diesel generators. While adjusting HVAC setpoints and installing solar panels have limited impacts on enhancing school resilience, gathering students in classrooms during outages significantly improved resilience in our case study in Houston, Texas. Given the ongoing electrification of school buses, it is essential for schools to complement ESBs with stationary batteries and other backup power sources, such as solar and/or diesel generators, to effectively address prolonged outages. Determining the deployment of direct current fast and Level 2 chargers can reduce infrastructure costs while maintaining the resilience benefits of ESBs. This dataset includes the simulation process and results of this study.
keywords:
Electric school bus; Power outages; Vehicle-to-Building technology; Carbon emission reduction; Backup power source
published: 2023-10-22
Davidson, Ruth; Vachaspati, Pranjal; Mirarab, Siavash; Warnow, Tandy (2023): Data from: Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6670066_V1
HGT+ILS datasets from Davidson, R., Vachaspati, P., Mirarab, S., & Warnow, T. (2015). Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC genomics, 16(10), 1-12. Contains model species trees, true and estimated gene trees, and simulated alignments.
keywords:
evolution; computational biology; bioinformatics; phylogenetics
published: 2022-09-29
Levine, Nathaniel (2022): 3DIFICE: A Synthetic Dataset for Training Computer Vision Algorithms to Recognize Earthquake Damage to Reinforced Concrete Structures. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6415287_V1
3DIFICE: 3-dimensional Damage Imposed on Frame structures for Investigating Computer vision-based Evaluation methods. This dataset contains 1,396 synthetic images and label maps with various types of earthquake damage imposed on reinforced concrete frame structures. Damage includes cracking, spalling, exposed transverse rebar, and exposed longitudinal rebar. Each image has an associated label map that can be used for training machine learning algorithms to recognize the various types of damage.
keywords:
computer vision; earthquake engineering; structural health monitoring; civil engineering; structural engineering
published: 2022-04-29
Wedell, Eleanor; Warnow, Tandy (2022): Biological and Simulated datasets for testing the SCAMPP framework for phylogenetic placement methods. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9257957_V1
Thank you for using these datasets! These files contain trees and reference alignments, as well as the selected query sequences for testing phylogenetic placement methods against and within the SCAMPP framework. There are four datasets from three different sources, each containing its source alignment and "true" tree, any estimated trees that may have been generated, and any re-estimated branch lengths that were created to be used with their requisite phylogenetic placement method. Three biological datasets (16S.B.ALL, PEWO/LTP_s128_SSU, and PEWO/green85) and one simulated dataset (nt78) are included. See README.txt in each file for more information.
keywords:
Phylogenetic Placement; Phylogenetics; Maximum Likelihood; pplacer; EPA-ng
published: 2021-11-18
Pan, Chao; Tabatabaei, S Kasra; Tabatabaei Yazdi, S. M. Hossein; Hernandez, Alvaro; Schroeder, Charles; Milenkovic, Olgica (2021): Rewritable Two-Dimensional DNA-Based Data Storage System (2DDNA) Sequencing Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2308557_V1
This dataset contains sequencing data obtained from an Illumina MiSeq device to prove the concept of the proposed 2DDNA framework. Please refer to README.txt for a detailed description of each file.
keywords:
machine learning;image processing;computer vision;rewritable storage system;2D DNA-based data storage
published: 2023-06-01
Pan, Chao; Peng, Jianhao; Chien, Eli; Milenkovic, Olgica (2023): Embedded dataset in Poincare Balls. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6901251_V1
This dataset contains four real-world sub-datasets with data embedded into Poincare ball models, including Olsson's single-cell RNA expression data, CIFAR10, Fashion-MNIST and mini-ImageNet. Each sub-dataset has two corresponding files: one is the data file, and the other contains the pre-computed reference points for each class in the sub-dataset. Please refer to our paper (https://arxiv.org/pdf/2109.03781.pdf) and code (https://github.com/thupchnsky/PoincareLinearClassification) for more details.
keywords:
Hyperbolic space; Machine learning; Poincare ball models; Perceptron algorithm; Support vector machine