Illinois Data Bank Dataset Search Results
published:
2026-02-01
Xu, Xiaotian; Yao, Yu; Liu, Yicen; Curtis, Jeffrey; West, West; Riemer, Nicole
(2026)
This dataset contains simulation results from PartMC-MOSAIC and WRF-PartMC that were used in the journal article "Quantifying the Impact of Surfactants on Cloud Condensation Nuclei Activity Using a Particle-Resolved Model." Two compressed folders are uploaded here: one contains the data used in the article, and the other contains the Python scripts used to process the data. For more details on the uploaded files, please see the README file.
keywords:
Surfactants; CCN; Effective surface tension
published:
2026-03-12
Acharya, Rishi; Gerber, Eli; Bielinski, Nina; Aguirre, Hannah E.; Kim, Younsik; Bernal-Choban, Camille; Tenkila, Gaurav; Sheikh, Suhas; Mahaadev, Pranav; Hoveyda-Marashi, Faren; Roychowdhury, Subhajit; Shekhar, Chandra; Felser, Claudia; Abbamonte, Peter; Wieder, Benjamin; Mahmood, Fahad
(2026)
This repository contains source data for key plots presented in the manuscript "Plasmon-driven exciton formation in a non-equilibrium Fermi liquid."
Experimental data that were analyzed in Igor Pro 8 are presented as the .pxp files used to generate individual sub-plots. Electronic spectral function calculations are provided as .txt files, in which consecutive rows refer to the meshgrid x coordinate, y coordinate, spectral function (and, where relevant, axis-projected local angular momentum). We additionally include the Wannier model and the DFT-obtained bulk band structure on which the Wannier model was based.
Files are named as the number of the figure in the manuscript to which they correspond, with additional details included where necessary.
<b>Details of file names:</b>
2a_DOS_Lxz_Ek_KGM_40layer_xnum_800kpt_tot.txt: Density of states, xz-axis projected local orbital angular momentum, for 800 points along the K-Gamma-M path, for a 40-layer model.
2c_composite_y.pxp: ARPES (angle-resolved photoemission spectroscopy) spectra along the ky axis, including both a scan near the Fermi level and a scan at high kinetic energies.
2d_LCP_RCP_diff_Sect_20K.pxp: difference between ARPES constant energy cuts at T=20 K at E0 + 0.23 eV taken with left- and right-circularly polarized photons. The polarization-integrated intensity at the constant energy cut is also included.
2e_DOS_L45_E11pt79_m0pt25to0pt25_xnum_800kpt_tot.txt: Density of states, xz-projected local orbital angular momentum, and corresponding k-points in two dimensions from ab-initio electronic structure calculations for a constant-energy cut.
3a_[x]_[y]ps: ARPES cut under excitation at a fluence of x uJ/cm2, measured y ps after photoexcitation. Measurements were performed at 9 K.
3b_[x]: Energy distribution curves under excitation at a fluence x uJ/cm2 at selected delay times after photoexcitation.
4a_ImSigma_vs_temperature.pxp: Imaginary self energy (extracted from ARPES linewidths) at different energies above E0 for selected lattice temperatures.
4b_EELS_lowE.pxp: Electron energy loss spectrum over a low energy range.
5b_diff_55m15.pxp: Difference between momentum-integrated Tr-ARPES traces at 55 uJ/cm2 and 15 uJ/cm2 photoexcitation. Time-dependent intensity at each energy level has been normalized to a maximum of 1 for each individual fluence prior to subtraction.
5d_invtau_at_EX_vs_fluence.pxp: decay rate at a specified energy EX for different excitation fluences, from single exponential fits.
<b>NOTE: Analyses based on the Wannier model presented here should cite both the associated Article and this dataset. For all other files in the repository, citing the dataset alone is sufficient.</b>
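The normalize-then-subtract step described for 5b_diff_55m15.pxp can be illustrated with a minimal sketch. It assumes each trace is a 2D array with one row per energy level and one column per delay time; the function names and toy values are hypothetical and are not taken from the dataset or the authors' analysis files.

```python
def normalize_rows(intensity):
    """Scale each energy row so that its maximum over time equals 1."""
    normalized = []
    for row in intensity:
        peak = max(row)
        # Rows with a non-positive peak are left unscaled to avoid dividing by zero.
        normalized.append([v / peak for v in row] if peak > 0 else list(row))
    return normalized


def difference_trace(high_fluence, low_fluence):
    """Normalize both traces row by row, then subtract low from high."""
    high_n = normalize_rows(high_fluence)
    low_n = normalize_rows(low_fluence)
    return [
        [h - l for h, l in zip(h_row, l_row)]
        for h_row, l_row in zip(high_n, low_n)
    ]


# Toy data: 2 energy levels x 3 delay times (values are made up).
trace_55 = [[2.0, 4.0, 1.0], [1.0, 2.0, 2.0]]
trace_15 = [[1.0, 2.0, 1.0], [3.0, 1.5, 0.0]]
diff = difference_trace(trace_55, trace_15)
print(diff)  # [[0.0, 0.0, -0.25], [-0.5, 0.5, 1.0]]
```

Normalizing each fluence separately before subtracting means the difference reflects changes in the shape of the time dependence rather than changes in overall signal level.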
published:
2026-03-04
Arnav, Arushi; Zhang, Rui; Karakoc, Deniz Berfin; Konar, Megan
(2026)
This dataset provides estimates of annual agricultural and food commodity flows (in kg) between all county pairs within the United States from 2018 to 2022. The database provides 343.7 million data points, since pairwise information is provided between 3134 counties, for 7 commodity categories, and 5 time periods. The commodity categories correspond to the Standardized Classification of Transported Goods and are:
- SCTG 1: live animals and fish
- SCTG 2: cereal grains
- SCTG 3: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 4: animal feed, eggs, honey, and other products of animal origin
- SCTG 5: meat, poultry, fish, seafood, and their preparations
- SCTG 6: milled grain products and preparations, and bakery products
- SCTG 7: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Arnav et al. (2026) in Environmental Research: Food Systems. http://iopscience.iop.org/article/10.1088/2976-601X/ae487c.
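The stated size of the database can be checked with a short calculation. The sketch below counts ordered county pairs both with and without within-county flows, since the description does not say which convention applies; that choice is an assumption here, and both totals are close to the quoted 343.7 million.

```python
n_counties = 3134
n_categories = 7   # SCTG 1 through 7
n_years = 5        # 2018 through 2022

# Ordered origin-destination pairs, with and without within-county flows.
pairs_with_self = n_counties ** 2
pairs_without_self = n_counties * (n_counties - 1)

total_with_self = pairs_with_self * n_categories * n_years
total_without_self = pairs_without_self * n_categories * n_years

print(f"{total_with_self:,}")     # 343,768,460
print(f"{total_without_self:,}")  # 343,658,770
```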
keywords:
food flows; high-resolution; county-scale; time-series; United States
published:
2026-01-22
Cao, Yanghui; Dietrich, Christopher H.; Dmitriev, Dmitry A.; Zou, Hongfen; Xue, Qingquan; Zhang, Yalin
(2026)
The following 5 files were used to reconstruct the phylogeny of the Membracoidea.
1. Taxon_sampling.csv: contains the sample IDs (1st column, used in the alignments) and the taxonomic information (2nd to 6th columns) for 269 samples.
2. concatenated_aa_.phy: a concatenated amino acid dataset with 52,987 amino acid positions. This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
3. concatenated_nt.phy: a concatenated nucleotide dataset with all codon positions included (158,961 nucleotide positions). This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
4. concatenated_12nt.phy: a concatenated nucleotide dataset with the third codon positions excluded (105,974 nucleotide positions). This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
5. Individual_gene_alignment.zip: contains 427 FASTA files, each representing the nucleotide alignment for one gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v4.10.5.
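The alignment lengths listed above are consistent with one another: the full nucleotide alignment is three times the amino acid alignment, and the alignment without third codon positions is two times it. The sketch below checks this and shows one way third codon positions could be dropped from an in-frame sequence. The function name is hypothetical, and this is not necessarily how the authors produced concatenated_12nt.phy.

```python
def drop_third_codon_positions(seq):
    """Keep codon positions 1 and 2; drop every third character.

    Assumes the sequence is in frame, starting at codon position 1.
    """
    return "".join(ch for i, ch in enumerate(seq) if i % 3 != 2)


N_AA = 52_987  # amino acid positions in the concatenated alignment

# Length checks against the figures quoted in the file descriptions.
full_nt_length = N_AA * 3
no_third_length = N_AA * 2
print(full_nt_length, no_third_length)  # 158961 105974

# Toy aligned sequence with a gap codon (hyphens represent gaps).
example = drop_third_codon_positions("ATGGCC---")
print(example)  # ATGC--
```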
keywords:
Auchenorrhyncha; evolution; phylogeny; timetree
published:
2026-02-11
Hanley, David; Lee, Jongwon; Choi, Su Yeon; Bretl, Timothy
(2026)
If you use this dataset, please cite both the dataset and the associated data paper (bibtex is below).
@ARTICLE{11386847,
author={Hanley, David and Lee, Jongwon and Choi, Su Yeon and Bretl, Timothy},
journal={IEEE Transactions on Instrumentation and Measurement},
title={The MagPIE2 Dataset for Mapping, Localization, and Simultaneous Localization and Mapping Using Magnetic Fields},
year={2026},
volume={},
number={},
pages={1-1},
keywords={Magnetometers;Magnetic field measurement;Magnetic fields;Pedestrians;Location awareness;Buildings;Simultaneous localization and mapping;Measurement errors;Hardware;Calibration;Localization;mapping;SLAM;dataset;benchmark;magnetometer;magnetic field},
doi={10.1109/TIM.2026.3662919}}
We present a dataset for the evaluation of magnetic field-based robotic and pedestrian localization, mapping, and SLAM methods. This dataset contains magnetometer and inertial measurement unit data collected inside three buildings by both a pedestrian and a ground robot. Data were collected at different heights simultaneously, both with and without changes in the placement of objects that may affect magnetometer measurements. In total, approximately 689 square meters of floor space was covered by this dataset.
This dataset is archivally stored. We provide a GitHub site which is meant to serve as a forum to post issues with the dataset, share code using the dataset, and to resolve problems: <a href="https://github.com/hanley6/MagPIE2Forum">https://github.com/hanley6/MagPIE2Forum</a>
Note that while the dataset is meant to be permanently stored, this forum is not meant to guarantee perennial support and its existence will be dependent on the policies of GitHub.
<b>How is the dataset organized?</b> The data is divided into the following parts at a high level and more detailed information can be found in the Readme:
1. The walking portion of the dataset: CSL_WLK.zip, DCL_WLK.zip, Talbot_WLK.zip, and WLK_Misc.zip.
2. The robot portion of the dataset: Robot_Dataset.zip.
3. Motor interference tests: Motor_Interference_Test.zip.
4. Ground truth evaluation: Ground_Truth_Evaluation.zip.
5. Quick start results: Quick_Start_Results.zip.
<b>How is data recorded and stored?</b> Data is generally collected in the form of ROS bag files. Each ROS bag has Intel RealSense camera images, magnetometer readings, IMU readings, timestamps, and more as applicable for each file in the dataset. Each bag file has an associated metadata file written in YAML. This contains general information about each bag file, including the start and stop time, who collected the bag file (during the pedestrian portion of the dataset), and the approximate location where data was collected. In several cases, additional comma-separated values (CSV) files were included either as a convenient supplement to the ROS bag files (e.g., CSV files of magnetometer calibration data) or because they serve as human-readable quick start results.
<b>How does one set up and run files on the dataset?</b> The files are stored in ROS bags and are, therefore, meant to be run using the Robot Operating System. Information regarding how to use the Robot Operating System as well as installation instructions are available at: <a href="https://ros.org/">https://ros.org/</a>
keywords:
Localization; mapping; SLAM; dataset; benchmark; magnetometer; magnetic field
published:
2025-02-07
Wang, Binghui; Kudeki, Erhan
(2025)
Incoherent scatter radar datasets collected during the September 2016 campaign at Arecibo have been deposited in this databank. The lag products of the ISR data are stored as lag profile matrices with 5 minutes of integration time. The data is organized in a Python dictionary format, with each file containing 12 lag profile matrices representing one hour of observation. A sample Python script is provided to illustrate its usage.
published:
2026-01-19
Note: The GTAP dataset includes a total of 140 regions, some of which are aggregated regions. For all map-related supplementary files (S11, S12, S13), we assign values to each individual country to enhance visualization. Countries within the same aggregated region are assigned the same regional value to maintain consistency across the map.
<b>Data S1 (separate file): S1.csv</b>- CSV file detailing production-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of production activities.
Columns: Each column represents a country-sector pair on the production side.
Values: The values indicate the number of deaths caused by production activities in the country-sector listed in each column and occurring in the country listed in each row.
<b>Data S2 (separate file): S2.csv</b>- CSV file detailing production-related deaths for the EORA dataset.
Structure: The file has the same structure as S1.csv.
<b>Data S3 (separate file): S3.csv</b>- CSV file detailing consumption-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of consumption activities.
Columns: Each column represents a consumption country.
Values: The values indicate the number of deaths caused by consumption activities in the country listed in the column and occurring in the country listed in the row.
<b>Data S4 (separate file): S4.csv</b>- CSV file detailing consumption-related deaths for the EORA dataset.
Structure: The file has the same structure as S3.csv.
<b>Data S5 (folder of files): S5.zip</b>- a folder containing 141 CSV files, each named after a country's 3-letter code (e.g., USA.csv, CHN.csv), representing production-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row corresponds to a grid cell.
Columns: Each column represents an industrial sector. The final column, "geometry," contains the spatial coordinates (latitude and longitude) for each grid cell.
Values: Each value indicates the PM₂.₅ concentration level (in µg/m³) attributable to emissions from the specified sector in the given country, as they occur in each grid cell.
<b>Data S6 (folder of files): S6.zip</b>- a folder containing 188 CSV files, each named after a country's 3-letter code, representing production-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: Each file follows the same format as those in S5.zip, with rows representing grid cells and columns representing industrial sectors, plus a "geometry" column containing spatial coordinates.
<b>Data S7 (separate file): S7.csv</b>- CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row represents a grid cell.
Columns: Apart from the last column ("geometry"), which contains spatial information for each grid cell in latitude-longitude coordinates, each column represents a consumption country.
Values: Each value indicates the PM₂.₅ concentration level caused by each country’s consumption process and occurring in each grid cell, measured in µg/m³.
<b>Data S8 (separate file): S8.csv</b>- CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: The file has the same structure as S7.csv.
<b>Data S9 (separate file): S9.csv</b>- CSV file listing the total net bidirectional export of deaths for all countries in GTAP, displaying only positive values.
Columns:
"from": The country that exports more consumption-related deaths.
"to": The country that imports more consumption-related deaths.
"values": The net export of deaths between these two countries, calculated as the difference between the deaths flowing from "from" to "to" and those from "to" to "from."
<b>Data S10 (separate file): S10.csv</b>- CSV file listing the total net bidirectional export of deaths for all countries in EORA, displaying only positive values.
Structure: The file has the same structure as S9.csv.
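The "from", "to", and "values" columns of S9.csv and S10.csv follow from a simple netting of gross bilateral flows. The sketch below is a minimal illustration that assumes a hypothetical dictionary of gross flows keyed by (exporter, importer); the country codes and numbers are made up, and the function is not part of the dataset.

```python
def net_positive_exports(gross):
    """Return (from, to, value) rows holding only positive net exports.

    `gross[(a, b)]` is the number of deaths flowing from country a to country b.
    """
    rows = []
    seen = set()
    for (a, b) in gross:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue  # skip self-flows and pairs already handled
        seen.add(pair)
        net = gross.get((a, b), 0.0) - gross.get((b, a), 0.0)
        if net > 0:
            rows.append((a, b, net))
        elif net < 0:
            # Flip the direction so the stored value stays positive.
            rows.append((b, a, -net))
    return rows


# Hypothetical gross flows between three placeholder countries.
gross_flows = {
    ("AAA", "BBB"): 120.0,
    ("BBB", "AAA"): 45.0,
    ("AAA", "CCC"): 10.0,
    ("CCC", "AAA"): 30.0,
    ("BBB", "CCC"): 5.0,
}
net_rows = net_positive_exports(gross_flows)
print(net_rows)
# [('AAA', 'BBB', 75.0), ('CCC', 'AAA', 20.0), ('BBB', 'CCC', 5.0)]
```

Each unordered country pair appears at most once in the output, which is why only positive values are listed.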
<b>Data S11 (separate file): S11.csv</b>- CSV file listing the Value of Statistical Lives (VSLs), and consumption-related externalities under three scenarios—Business as Usual (BAU), Global Community (GC), and Fair Trade in Deaths (FTD)—along with externalities per GDP and their differences for GTAP countries.
Columns:
VSL, BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
<b>Data S12 (separate file): S12.csv</b>- Same as S11.csv, but for EORA countries.
Structure: Identical to S11.csv.
<b>Data S13 (separate file): S13.csv</b>- CSV file containing the data used to generate Figures 1, 2, 3, and 5 in the main text.
Columns:
country_code: 3-letter country code
GTAP_region, continent, population, GDP, GDP_capita, VSL
export_of_death, import_of_death, net_export, net_export_capita
allforeign_world, G50foreign_world, G100foreign_world
cause_allforeign_world, cause_L30foreign_world, cause_L50foreign_world
BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
geometry (used for visualization)
<b>Data S14 (separate file): S14.xlsx</b>- this Excel file contains six sheets summarizing cross-model Pearson correlation coefficients between sectoral economic activity fractions and transboundary mortality impact metrics, based on both GTAP and EORA datasets.
Sheets:
Output_fraction_GTAP
Direct_demand_fraction_GTAP
Final_demand_fraction_GTAP
Output_fraction_EORA
Direct_demand_fraction_EORA
Final_demand_fraction_EORA
Rows: Each row represents an economic sector.
Columns:
G50foreign_world: Fraction of deaths attributable to final demand from regions where demand per capita is more than 50% higher than in the current country.
cause_L50foreign_world: Fraction of deaths caused by consumption within the current country but occurring in countries with more than 50% lower demand per capita.
Values: Each value represents the Pearson correlation between the sectoral fraction and the corresponding transboundary mortality metric.
<b>Data S15 (separate file): S15.csv</b>- CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of production-based premature deaths.
Column Producer: The producing country–sector pair responsible for the emissions leading to health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each producer–affected country pair.
<b>Data S16 (separate file): S16.csv</b>- CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of consumption-based premature deaths.
Column Consumer: The consuming country whose final demand drives the global production and associated health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each consumer–affected country combination.
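The Deaths_median, Deaths_low95, and Deaths_high95 columns in S15.csv and S16.csv summarize the spread of the Monte Carlo draws. The sketch below shows one way such a summary could be computed with the Python standard library. The interpolation rule used by the authors is not stated, so the "inclusive" method here is an assumption, and the toy draws are evenly spaced numbers rather than real simulation output.

```python
import statistics


def summarize_draws(draws):
    """Return (median, 2.5th percentile, 97.5th percentile) of the draws."""
    # n=40 splits the data into 40 equal-probability bins, so the first and
    # last cut points fall at 2.5% and 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.median(draws), cuts[0], cuts[-1]


# Toy stand-in for Monte Carlo draws of deaths for one country pair.
toy_draws = list(range(401))
deaths_median, deaths_low95, deaths_high95 = summarize_draws(toy_draws)
print(deaths_median, deaths_low95, deaths_high95)  # 200 10.0 390.0
```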
published:
2026-01-12
Yan, Qiang; Cordell, William; Jindra, Michael; Pfleger, Brian
(2026)
Microbial lipid metabolism is an attractive route for producing oleochemicals. The predominant strategy centers on heterologous thioesterases to synthesize desired chain-length fatty acids. To convert acids to oleochemicals (e.g., fatty alcohols, ketones), the narrowed fatty acid pool needs to be reactivated as coenzyme A thioesters at a cost of one ATP per reactivation – an expense that could be saved if the acyl chain were directly transferred from ACP- to CoA-thioester. Here, we demonstrate such an alternative acyl-transferase strategy by heterologous expression of PhaG, an enzyme first identified in Pseudomonads, that transfers 3-hydroxy acyl-chains between acyl-carrier protein and coenzyme A thioester forms for creating polyhydroxyalkanoate monomers. We use it to create a pool of acyl-CoAs that can be redirected to oleochemical products. Through bioprospecting, mutagenesis, and metabolic engineering, we develop three strains of Escherichia coli capable of producing over 1 g/L of medium-chain free fatty acids, fatty alcohols, and methyl ketones.
keywords:
Bioproducts; Metabolomics
published:
2025-09-08
Si, Luyang; Salami, Malik Oyewale; Schneider, Jodi
(2025)
This work evaluates the consistency and reliability of the title flag, i.e., retraction labeling that appears in the title of retracted publications, using 925 sampled retracted publications indexed in the Crossref only (Lee & Schneider, 2023), that are indexed in three other sources, Retraction Watch, Scopus, and Web of Science as of April 2023. We presume the retraction status of an item based on its title flag. For example, the flag "removal notice" is a retraction notice, and "retracted article" is a retracted paper. We compared the item's likely retraction status from the flag with the item's actual retraction status from the publisher's website.
keywords:
Crossref; Data Quality; Title flag; Retraction flag; Retraction flag assessment; Retraction labeling; Retraction indexing; Retracted papers; Retraction notices; Retraction status; RISRS
published:
2025-11-20
Njuguna, Joyce N.; Clark, Lindsay; Lipka, Alexander; Anzoua, Kossonou; Bagmet, Larisa; Chebukin, Pavel; Dwiyanti, Maria S.; Dzyubenko, Elena; Dzyubenko, Nicolay; Ghimire, Bimal Kumar; Jin, Xiaoli; Johnson, Douglas A.; Nagano, Hironori; Peng, Junhua; Petersen, Karen Koefoed; Sabitov, Andrey; Seong, Eun Soo; Yamada, Toshihiko; Yoo, Ji Hye; Yu, Chang Yeon; Zhao, Hua; Long, Stephen; Sacks, Erik
(2025)
Accelerating biomass improvement is a major goal of miscanthus breeding. The development and implementation of genomic-enabled breeding tools, like marker-assisted selection (MAS) and genomic selection, have the potential to improve the efficiency of miscanthus breeding. The present study conducted genome-wide association (GWA) and genomic prediction of biomass yield and 14 yield-component traits in Miscanthus sacchariflorus. We evaluated a diversity panel with 590 accessions of M. sacchariflorus grown across four years in one subtropical and three temperate locations and genotyped with 268,109 single-nucleotide polymorphisms (SNPs). The GWA study identified a total of 835 significant SNPs and 674 candidate genes across all traits and locations. Of the significant SNPs identified, 280 were localized in mapped quantitative trait loci intervals and proximal to SNPs identified for similar traits in previously reported miscanthus studies, providing additional support for the importance of these genomic regions for biomass yield. Our study gave insights into the genetic basis for yield-component traits in M. sacchariflorus that may facilitate marker-assisted breeding for biomass yield. Genomic prediction accuracy for the yield-related traits ranged from 0.15 to 0.52 across all locations and genetic groups. Prediction accuracies within the six genetic groupings of M. sacchariflorus were limited due to low sample sizes. Nevertheless, the Korea/NE China/Russia (N = 237) genetic group had the highest prediction accuracy of all genetic groups (ranging 0.26–0.71), suggesting that with adequate sample sizes, there is strong potential for genomic selection within the genetic groupings of M. sacchariflorus. This study indicated that MAS and genomic prediction will likely be beneficial for conducting population-improvement of M. sacchariflorus.
keywords:
Feedstock Production; Biomass Analytics; Genomics
published:
2025-11-03
Blake-Bradshaw, Abigail; Bradshaw, Therin; Beilke, Elizabeth; Gilbert, Andrew; Osborn, Joshua; Fournier, Auriel M.V.
(2025)
Data consist of 55 acoustic recordings collected using Autonomous Recording Units (ARUs) from two locations and sampling periods. Specifically, data include 60-minute WAV files (8 folders, each containing 5 WAV files) from a field trial during February 2025 in which we fired shotguns at varying distances from ARUs at Emiquon Reserve, owned by The Nature Conservancy. Data also include 60-minute WAV files (15 WAV files) from one ARU placed at Big Rice Lake State Fish and Wildlife Area on the opening day of waterfowl hunting season, 10-26-2024. Filenames include the ARU ID and the associated date and time, separated by underscores; e.g., MINI10_20241026_060002.wav was recorded by MINI10 on 10/26/24 at 6 AM.
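The filename convention described above can be parsed programmatically. The following is a minimal sketch that assumes every filename follows the pattern of the single example given (ARU ID, then an 8-digit date, then a 6-digit time); the function name is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def parse_aru_filename(filename):
    """Split an ARU recording filename into (ARU ID, start datetime)."""
    stem = Path(filename).stem
    # Split from the right so an ARU ID containing underscores stays intact.
    aru_id, date_part, time_part = stem.rsplit("_", 2)
    recorded_at = datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H%M%S")
    return aru_id, recorded_at


aru_id, recorded_at = parse_aru_filename("MINI10_20241026_060002.wav")
print(aru_id, recorded_at.isoformat())  # MINI10 2024-10-26T06:00:02
```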
keywords:
hunting; shotgun; waterfowl; acoustics
published:
2025-10-31
Lopes, Daiane; Dien, Bruce; Hector, Ronald; Singh, Vijay; Thompson, Stephanie R.; Slininger, Patricia J.; Boundy-Mills, Kyria; Jagtap, Sujit; Rao, Christopher V.
(2025)
Rhodotorula toruloides is being developed for use in industrial biotechnology processes because of its favorable physiology. This includes its ability to produce and store large amounts of lipids in the form of intracellular lipid bodies. Nineteen strains were characterized for mating type, ploidy, robustness for growth, and accumulation of lipids on inhibitory switchgrass hydrolysate (SGH). Mating type was determined using a novel polymerase chain reaction (PCR)-based assay, which was validated using the classical microscopic test. Three of the strains were heterozygous for mating type (A1/A2). Ploidy analysis revealed a complex pattern. Two strains were triploid, eight haploid, and eight either diploid or aneuploid. Two of the A1/A2 strains were compared to their parents for growth on 75%v/v concentrated SGH. The A1/A2 strains were much more robust than the parental strains, which either did not grow or had extended lag times. The entire set was evaluated in 60%v/v SGH batch cultures for growth kinetics and biomass and lipid production. Lipid titers were 2.33–9.40 g/L with a median of 6.12 g/L, excluding the two strains that did not grow. Lipid yields were 0.032–0.131 (g/g) and lipid contents were 13.5–53.7% (g/g). Four strains had significantly higher lipid yields and contents. One of these strains, which had among the highest lipid yields in this study (0.131 ± 0.007 g/g), has not been previously described in the literature.
keywords:
Conversion; Hydrolysate; Lipidomics
published:
2025-09-22
The files in this dataset include the full raw text and illustrations, now in the public domain, for the novel Gentlemen Prefer Blondes (GBP) by Anita Loos, and files comparing the two versions of the novel published in 1925, one in Harper's Bazar magazine and the other in book format by Boni & Liveright. These files comprise the underlying data for the scholarly digital edition of the novel edited by Daniel G. Tracy. The full citation for the publication, including the DOI link for those wishing to access the text, is: Loos, Anita. Gentlemen Prefer Blondes. Edited by Daniel G. Tracy, Critical Edition. Windsor & Downs Press, 2025. https://doi.org/10.21900/wd.13
keywords:
literature; textual collation; digital editions; American Literature
published:
2017-02-28
Leesburg, VA to Indianapolis, Indiana:
Sampling Rate: 0.1 Hz
Total Travel Time: 31100007 ms or 518 minutes or 8.6 hours
Distance Traveled: 570 miles via I-70
Number of Data Points: 3112
Device used: Samsung Galaxy S4
Date Recorded: 2017-01-15
Parameters Recorded:
* ACCELEROMETER X (m/s²)
* ACCELEROMETER Y (m/s²)
* ACCELEROMETER Z (m/s²)
* GRAVITY X (m/s²)
* GRAVITY Y (m/s²)
* GRAVITY Z (m/s²)
* LINEAR ACCELERATION X (m/s²)
* LINEAR ACCELERATION Y (m/s²)
* LINEAR ACCELERATION Z (m/s²)
* GYROSCOPE X (rad/s)
* GYROSCOPE Y (rad/s)
* GYROSCOPE Z (rad/s)
* LIGHT (lux)
* MAGNETIC FIELD X (microT)
* MAGNETIC FIELD Y (microT)
* MAGNETIC FIELD Z (microT)
* ORIENTATION Z (azimuth °)
* ORIENTATION X (pitch °)
* ORIENTATION Y (roll °)
* PROXIMITY (i)
* ATMOSPHERIC PRESSURE (hPa)
* Relative Humidity (%)
* Temperature (F)
* SOUND LEVEL (dB)
* LOCATION Latitude
* LOCATION Longitude
* LOCATION Altitude (m)
* LOCATION Altitude-google (m)
* LOCATION Altitude-atmospheric pressure (m)
* LOCATION Speed (kph)
* LOCATION Accuracy (m)
* LOCATION ORIENTATION (°)
* Satellites in range
* GPS NMEA
* Time since start in ms
* Current time in YYYY-MO-DD HH-MI-SS_SSS format
Quality Notes:
There are some things to note about the quality of this data set that you may want to consider while doing preprocessing. This dataset was taken continuously but had multiple stops to refuel (without the data recording ceasing). These stops can be removed by filtering out all data points that have a speed of 0. The mount for this dataset was fairly stable (as can be seen by the consistent orientation angle throughout the dataset). It was mounted tightly between two seats in the back of the vehicle. Unfortunately, the sampling frequency for this dataset was set fairly low, at one sample every ten seconds.
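Removing the refueling stops amounts to dropping every row whose recorded speed is 0. The sketch below illustrates this on a small in-memory CSV. The column header is assumed to match the "LOCATION Speed (kph)" parameter name listed above, which may differ from the header in the actual file, and the sample rows are made up.

```python
import csv
import io

SPEED_COLUMN = "LOCATION Speed (kph)"  # assumed header; check the real file


def drop_stationary_rows(rows, speed_column=SPEED_COLUMN):
    """Keep only rows whose recorded speed is not zero."""
    return [row for row in rows if float(row[speed_column]) != 0.0]


# Made-up sample with two stationary rows and two moving rows.
sample_csv = io.StringIO(
    "Time since start in ms,LOCATION Speed (kph)\n"
    "0,0\n"
    "10000,42.5\n"
    "20000,0.0\n"
    "30000,88.1\n"
)
rows = list(csv.DictReader(sample_csv))
moving = drop_stationary_rows(rows)
print([row["Time since start in ms"] for row in moving])  # ['10000', '30000']
```

Rows with a blank speed value would raise a ValueError in this sketch, so missing values need to be handled separately if they occur.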
keywords:
smartphone; sensor; driving; accelerometer; gyroscope; magnetometer; gps; nmea; barometer; satellite; temperature; humidity
published:
2017-05-01
Indianapolis Int'l Airport to Urbana:
Sampling Rate: 2 Hz
Total Travel Time: 5901534 ms or 98.4 minutes
Number of Data Points: 11805
Distance Traveled: 124 miles via I-74
Device used: Samsung Galaxy S6
Date Recorded: 2016-11-27
Parameters Recorded:
* ACCELEROMETER X (m/s²)
* ACCELEROMETER Y (m/s²)
* ACCELEROMETER Z (m/s²)
* GRAVITY X (m/s²)
* GRAVITY Y (m/s²)
* GRAVITY Z (m/s²)
* LINEAR ACCELERATION X (m/s²)
* LINEAR ACCELERATION Y (m/s²)
* LINEAR ACCELERATION Z (m/s²)
* GYROSCOPE X (rad/s)
* GYROSCOPE Y (rad/s)
* GYROSCOPE Z (rad/s)
* LIGHT (lux)
* MAGNETIC FIELD X (microT)
* MAGNETIC FIELD Y (microT)
* MAGNETIC FIELD Z (microT)
* ORIENTATION Z (azimuth °)
* ORIENTATION X (pitch °)
* ORIENTATION Y (roll °)
* PROXIMITY (i)
* ATMOSPHERIC PRESSURE (hPa)
* SOUND LEVEL (dB)
* LOCATION Latitude
* LOCATION Longitude
* LOCATION Altitude (m)
* LOCATION Altitude-google (m)
* LOCATION Altitude-atmospheric pressure (m)
* LOCATION Speed (kph)
* LOCATION Accuracy (m)
* LOCATION ORIENTATION (°)
* Satellites in range
* GPS NMEA
* Time since start in ms
* Current time in YYYY-MO-DD HH-MI-SS_SSS format
Quality Notes:
There are some things to note about the quality of this data set that you may want to consider while doing preprocessing. This dataset was taken continuously as a single trip; no stop was made for gas along the way, making this a very long continuous dataset. It starts in the parking lot of the Indianapolis International Airport and continues directly towards a gas station on Lincoln Avenue in Urbana, IL. There are a couple of parts of the trip where the phone's orientation had to be changed because my navigation cut out. These times are easy to account for based on changes in Orientation X/Y/Z. I would also advise cutting out the first couple hundred points or the points leading up to highway speed. The phone was mounted in the cupholder in the front seat of the car.
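Cutting out the points leading up to highway speed can be done by finding the first sample at or above a chosen speed. The sketch below is one possible approach; the 90 kph threshold is an arbitrary assumption rather than a value from the dataset, and the speed series is made up.

```python
def first_index_at_speed(speeds_kph, threshold_kph=90.0):
    """Return the index of the first sample at or above the threshold.

    Returns len(speeds_kph) if the threshold is never reached.
    """
    for i, speed in enumerate(speeds_kph):
        if speed >= threshold_kph:
            return i
    return len(speeds_kph)


# Made-up speed series: parking lot, acceleration, then highway driving.
speeds = [0.0, 12.0, 35.0, 60.0, 95.0, 102.0, 98.0]
start = first_index_at_speed(speeds)
trimmed = speeds[start:]
print(start, trimmed)  # 4 [95.0, 102.0, 98.0]
```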
keywords:
smartphone; sensor; driving; accelerometer; gyroscope; magnetometer; gps; nmea; barometer; satellite
published:
2020-11-18
This is the dataset that accompanies the paper titled "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network", submitted for peer review in August 2020. Please see the GitHub repository for the most up-to-date data after the revision process: https://github.com/dopplerchase/Chase_et_al_2021_NN
Authors: Randy J. Chase, Stephen W. Nesbitt and Greg M. McFarquhar
Corresponding author: Randy J. Chase (randyjc2@illinois.edu)
Here we have the data used in the manuscript. Please email me if you have specific questions about units etc.
1) DDA/GMM database of scattering properties: base_df_DDA.csv
This is the combined dataset from the following papers: Leinonen & Moisseev, 2015; Leinonen & Szyrmer, 2015; Lu et al., 2016; Kuo et al., 2016; Eriksson et al., 2018. The column names are D: Maximum dimension in meters, M: particle mass in grams kg, sigma_ku: backscatter cross-section at ku in m^2, sigma_ka: backscatter cross-section at ka in m^2, sigma_w: backscatter cross-section at w in m^2. The first column is just an index column.
2) Synthetic Data used to train and test the neural network: Unrimed_simulation_wholespecturm_train_V2.nc, Unrimed_simulation_wholespecturm_test_V2.nc
This was the result of combining the PSDs and DDA/GMM particles randomly to build the training and test dataset.
3) Notebook for training the network using the synthetic database and Google Colab (tensorflow): Train_Neural_Network_Chase2020.ipynb
This is the notebook used to train the neural network.
4) Trained tensorflow neural network: NN_6by8.h5
This is the hdf5 tensorflow model that resulted from the training. You will need this to run the retrieval.
5) Scalers needed to apply the neural network: scaler_X_V2.pkl, scaler_y_V2.pkl
These are the sklearn scalers used in training the neural network. You will need these to scale your data if you wish to run the retrieval.
6) <b>New in this version</b> - Example notebook showing how to run the trained neural network on Ku- and Ka-band observations, demonstrated with the 3rd case in the paper: Run_Chase2021_NN.ipynb
7) <b>New in this version</b> - APR data used to show how to run the neural network retrieval: Chase_2021_NN_APR03Dec2015.nc
The data for the analysis on the observations are not provided here because of the size of the radar data. Please see the GHRC website (<a href="https://ghrc.nsstc.nasa.gov/home/">https://ghrc.nsstc.nasa.gov/home/</a>) if you wish to download the radar and in-situ data or contact me. We can coordinate transferring the exact datafiles used.
The GPM-DPR data are available here: <a href="http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05">http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05</a>
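Returning to item 1) in the list above, a minimal sketch of reading the scattering database with the Python standard library might look like this. The column names D, M, sigma_ku, sigma_ka, and sigma_w come from the description; the blank header for the index column and the sample values are assumptions made for illustration, and the function name `read_scattering_table` is hypothetical. The unit of M is not clear from the description, so no unit conversion is attempted.

```python
import csv
import io


def read_scattering_table(handle):
    """Read the named columns into a list of dicts, skipping the index column."""
    wanted = ["D", "M", "sigma_ku", "sigma_ka", "sigma_w"]
    reader = csv.DictReader(handle)
    return [{name: float(record[name]) for name in wanted} for record in reader]


# Made-up stand-in for base_df_DDA.csv; the values are NOT from the dataset.
sample = io.StringIO(
    ",D,M,sigma_ku,sigma_ka,sigma_w\n"
    "0,0.001,1e-6,2e-10,3e-9,4e-9\n"
    "1,0.002,5e-6,8e-10,9e-9,7e-9\n"
)
particles = read_scattering_table(sample)

# With the real file one would instead open it from disk:
#   with open("base_df_DDA.csv", newline="") as handle:
#       particles = read_scattering_table(handle)
```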
published:
2023-08-03
Dalling, James William
(2023)
This file contains the delta 15N values for leaf material collected from Cyathea rojasiana tree ferns before and after fertilization with an ammonium-15N chloride solution to determine whether 15N uptake is possible from senescent leaves.
Details of the experiment are provided in the online supplement to the published paper. Briefly, in February 2022 we selected three mature C. rojasiana individuals 1-1.5 m in height that had leaves rooted in the soil and one new developing (but unexpanded) leaf. For each fern, two plastic pots (10 x 10 x 12 cm) were filled with a 50:50 mixture of washed river sand and soil from the Chorro watershed. For each pot, one senescent leaf that was rooted in the soil was carefully excavated and its roots transplanted into the pot. Pots were then fertilized by adding 30 ml of a 0.02 M 15N solution of ammonium-15N chloride (98% 15N; Sigma-Aldrich 299251; St Louis, MO) to yield a target concentration of 2 µg 15N cm-3 of soil. After fertilization, pots were carefully enclosed within thick plastic bags and sealed around the senescent leaf rachis to prevent leaching of any 15N from the pot to the surrounding soil.
At the time of N fertilization, pinnae of the youngest fully expanded leaf were collected from each fern. One pinna was collected from the base of the leaf and one from the distal end of the leaf. In March 2022, after 28 days, the roots were removed from pots and two additional leaf pinnae were sampled from each fern: one from the base and one from the distal end of the youngest (now fully expanded) leaf. Leaf samples were dried for 72 hours at 60 °C and then leaf lamina tissue was finely ground with a bead beater. The delta 15N for each leaf sample was determined at the University of Illinois, Urbana-Champaign using a Thermo Delta V Advantage IRMS run in combination with a Costech 4010 Elemental Analyzer. Samples were run in continuous flow relative to laboratory standards that were calibrated with USGS 40, 41, and NBS 19 reference materials.
keywords:
15N; Cyathea rojasiana; N fertilization; montane forest
published:
2025-09-15
HamediRad, Mohammad; Weisberg, Scott; Chao, Ran; Lian, Jiazhang; Zhao, Huimin
(2025)
Golden Gate assembly is one of the most widely used DNA assembly methods due to its robustness and modularity. However, despite its popularity, the need for BsaI-free parts, the introduction of scars between junctions, and the lack of a comprehensive study on the linkers hinder its more widespread use. Here, we first developed a novel sequencing scheme to test the efficiency and specificity of 96 linkers of 4-bp length and experimentally verified these linkers and their effects on Golden Gate assembly efficiency and specificity. We then used these sequencing data to generate 200 distinct linker sets that can be used by the community to perform efficient Golden Gate assemblies of different sizes and complexity. We also present a single-pot scarless Golden Gate assembly and BsaI removal scheme and its accompanying assembly design software to perform point mutations and Golden Gate assembly. This assembly scheme enables scarless assembly without compromising efficiency by choosing optimized linkers near assembly junctions.
keywords:
Conversion; Genome Engineering; Genomics
published:
2025-10-15
Blind-Doskocil, Leanne; Trapp, Robert J.; Nesbitt, Stephen W.
(2025)
This is a collection of 31 quasi-linear convective system (QLCS) mesovortices (MVs) that were first manually identified and analyzed using the lowest elevation scan of the nearest relevant Weather Surveillance Radar–1988 Doppler (WSR-88D) during the two years (springs of 2022 and 2023) of the Propagation, Evolution, and Rotation in Linear Storms (PERiLS) field campaign. This analysis was completed using the Gibson Ridge radar-viewing software (GR2Analyst). Throughout the two years of PERiLS, a total of nine intensive observing periods (IOPs) occurred (see https://catalog.eol.ucar.edu/perils_2022/missions and https://catalog.eol.ucar.edu/perils_2023/missions for exact IOP dates/times). However, only six of these IOPs (specifically, IOPs 2, 3, and 4 from both years) are included in this dataset. The inclusion criteria were based on the presence of strictly QLCS MVs that from a cursory analysis were within the C-band On Wheels (COW) domain, one of the research radars deployed in the field for the PERiLS project. The 31 QLCS MVs identified using WSR-88D data were also examined using data from the COW radar (using Solo3 software). The lowest elevation angle was not always useable in the COW data, and sometimes the second lowest elevation angle was used. Further details on how MVs were identified are provided below, and a very detailed methodology is published in Blind-Doskocil et al. (2025).
Each MV had to be produced by a QLCS, defined as a continuous area of 35 dBZ radar reflectivity over at least 100 km when viewed from the lowest elevation scan. The MVs analyzed also had to pass through/near the COW’s domain at some point during their lifetimes to allow for additional analysis using the COW data. Tornadic (TOR), wind-damaging (WD), and non-damaging (ND) MVs were analyzed over their entire lifetime and subsequently during the pretornadic, predamaging (wind damage), and prewarning phase (classified altogether as the prephase) of each MV. The prephase MVs were classified based on the first damage report or lack thereof associated with them. ND MVs were ones that usually had a tornado warning placed on them (all but one case) but did not produce any damage and persisted for five or more radar scans; this was done to target the strongest MVs that forecasters thought could be tornadic.
The QLCS MVs were identified using objective criteria, which included the existence of a circulation with a maximum differential velocity (dV; i.e., the difference between the maximum outbound and minimum inbound velocities at a constant range) of at least 20 kt over a distance ≤ 7 km. The following radar-based characteristics were catalogued for each QLCS MV at the lowest elevation angle of the nearest WSR-88D: latitude and longitude locations of the MV, the genesis to decay time of the MV, the maximum dV across the MV, the maximum rotational velocity (Vrot; i.e., dV divided by two), diameter of the MV, the range from the radar of the MV center, and the height above radar level of the MV center.
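The dV and Vrot definitions and the identification thresholds above can be written out as a short sketch. The function names are hypothetical, the sample velocities are made up, and the sign convention (outbound positive, inbound negative, so the "minimum inbound" value is the most negative one) is an assumption; the actual identification was done manually in GR2Analyst, not with this code.

```python
DV_THRESHOLD_KT = 20.0   # minimum differential velocity, from the criteria above
MAX_DISTANCE_KM = 7.0    # maximum separation of the velocity extrema


def differential_velocity(max_outbound_kt, min_inbound_kt):
    """dV: maximum outbound minus minimum inbound velocity at constant range."""
    return max_outbound_kt - min_inbound_kt


def rotational_velocity(dv_kt):
    """Vrot: dV divided by two."""
    return dv_kt / 2.0


def meets_mv_criteria(dv_kt, distance_km):
    """True if the circulation satisfies both objective thresholds."""
    return dv_kt >= DV_THRESHOLD_KT and distance_km <= MAX_DISTANCE_KM


# Made-up example: 30 kt outbound and 25 kt inbound, extrema 3.5 km apart.
dv = differential_velocity(30.0, -25.0)
vrot = rotational_velocity(dv)
is_candidate = meets_mv_criteria(dv, 3.5)
```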
In the Excel workbook titled “nexrad_analyzed_mvs_perils_illinois_data_bank”, there are a total of 36 sheets. 31 of the 36 sheets are for each MV that was examined. The 31 MV sheets that were used to calculate MV statistics are labeled following the convention 'mv#_iop#_qlcs'. ‘mv#’ is the unique number that was assigned to each MV for clear identification, 'iop#' is the IOP in which the MV occurred, 'qlcs' denotes that the MV was produced by a QLCS, and the 2023 IOPs are denoted by ‘_2023’ after ‘qlcs’ in the sheet name. In these sheets, there are notes on what was visually seen in the radar data, damage associated with each MV (using the National Centers for Environmental Information (NCEI) database), and the characteristics of the MV at each time step of its lifetime. The yellow rows in each of the sheets indicate the last row of data included in the prephase statistics. The orange boxes in the notes column indicate any reports that were in NCEI but not in GR2Analyst. There are also sheets that examine pretornadic and predamaging diameter trends; box and whisker plot statistics of the overall characteristics of the different types of MVs; and the overall characteristics of each MV, with one Excel sheet (‘combined_qlcs_mvs’) examining the characteristics of each MV over its entire lifetime and one Excel sheet (‘combined_qlcs_mvs_before_report’) examining the characteristics of each MV before it first produced damage or had a tornado warning placed on it.
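The sheet-naming convention described above can be parsed with a small regular expression, sketched here. The function name `parse_mv_sheet_name` and the example sheet names are hypothetical, and treating a missing '_2023' suffix as the 2022 season is an inference from the two campaign years rather than something stated for every sheet.

```python
import re

# mv<number>_iop<number>_qlcs, optionally followed by _2023
SHEET_PATTERN = re.compile(r"^mv(\d+)_iop(\d+)_qlcs(_2023)?$")


def parse_mv_sheet_name(name):
    """Return MV number, IOP number, and year, or None for summary sheets."""
    match = SHEET_PATTERN.match(name)
    if match is None:
        return None
    return {
        "mv": int(match.group(1)),
        "iop": int(match.group(2)),
        # Assumption: sheets without the suffix belong to the 2022 season.
        "year": 2023 if match.group(3) else 2022,
    }


first = parse_mv_sheet_name("mv1_iop2_qlcs")
second = parse_mv_sheet_name("mv12_iop3_qlcs_2023")
summary = parse_mv_sheet_name("combined_qlcs_mvs")
```

Returning None for names that do not match makes it easy to separate the 31 per-MV sheets from the summary sheets when looping over the workbook.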
In the Excel workbook titled “cow_analyzed_mvs_perils_illinois_data_bank”, there are a total of 33 sheets. 31 of the 33 sheets are for each MV that was examined, with a similar naming convention to those analyzed using WSR-88D data. The data documented in each sheet is also similar to that in the WSR-88D sheets. Due to the very tedious and time-consuming nature of analyzing radar data manually, we mainly focused on cataloging only the times where the MVs were detectable in the COW data during the prephase. In the WSR-88D data, we examined the MVs over their entire lifetimes and during their prephases. Not all the MVs analyzed in the WSR-88D data ended up being detectable in the COW data, and we focused on comparing the prephase MVs in the COW data and WSR-88D data. Therefore, there are sheets that are missing values and note that the MV was not in the COW’s domain, not detectable during the prephase, only focused on cataloging the prephase, etc. There are also sheets that examine characteristics of each MV during the prephase (‘combined_qlcs_mvs_before_report’) and box and whisker plot statistics of the prephase characteristics of the MVs (‘box_whisker_stats’).
keywords:
quasi-linear convective system; QLCS; tornado; radar; mesovortex; PERiLS; low-level rotation; tornadic; nontornadic; wind-damaging; Propagation, Evolution, and Rotation in Linear Storms; tornado warning; C-band On Wheels
published:
2019-03-19
Fernandez, Roberto; Parker, Gary; Stark, Colin P.
(2019)
This dataset includes images and extracted centerlines from experiments looking at the formation and evolution of meltwater meandering channels on ice. The laboratory data includes centimeter- and millimeter-scale rivulets. Dataset also includes an image and corresponding centerlines from the Peterman Ice Island.
All centerlines were manually digitized in Matlab but no distributable code was developed for the process. Once digitized, centerlines were smoothed and standardized following methods and routines developed by other authors (Zolezzi and Guneralp, 2016; Guneralp and Rhoads, 2008). Details about the preparation of the centerlines and processing with these methods are included in the dissertation by Fernández (2018) linked to this dataset.
"Millimeter scale and Peterman Ice Island centerlines.pdf": This file includes the images of two mm-scale experiments and the Peterman Ice Island image. Seventeen centerlines were digitized from the former and seven were digitized from the latter. Those centerlines are shown above the images themselves.
"Centimeter scale rivulet images.pdf": This file includes images corresponding to all cm-scale centerlines used for the analysis presented in the dissertation by Fernandez (2018). Each image has a short caption indicating the run ID and the time at which it was captured. The images were used to extract centerlines to look at the planform evolution of cm-scale meltwater meandering rivulets on ice. Images include 26 centerlines from four different runs.
"Meltwater meandering channel centerlines.xlsx": This spreadsheet contains the centerline data for all fifty centerlines. The workbook includes 51 sheets. The first 50 are related to each one of the channels. The mm scale and Peterman Ice Island ones are identified using the same IDs shown in "Millimeter scale and Peterman Ice Island centerlines.pdf". The cm-scale centerlines are identified by run ID and a number indicating the time in minutes (with t = 0 min being the time at which water started flowing over the ice block). The naming convention is also associated with the images in "Centimeter scale rivulet images.pdf". The last sheet in the workbook includes a summary of the channel widths measured from every image for each centerline. The 50 sheets with the centerline information have four columns each. The titles of the columns are X, Y, S, and C. X and Y are the dimensionless coordinates of the centerline. S is the dimensionless streamwise coordinate (location along the centerline). C is the dimensionless curvature value. All these values were non-dimensionalized with the channel width. See Fernandez (2018), Zolezzi and Guneralp (2016), and Guneralp and Rhoads (2008) for more details regarding the process of smoothing, standardizing and non-dimensionalization of the centerline coordinates.
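Since the X, Y, S, and C columns are scaled by the channel width, recovering dimensional values is a matter of undoing that scaling, sketched below. The sketch assumes the usual convention that lengths were divided by the width and curvature (units of 1/length) was multiplied by it; the exact scaling should be confirmed in Fernandez (2018). The function name and the sample numbers are hypothetical.

```python
def to_dimensional(row, channel_width):
    """Convert one dimensionless (X, Y, S, C) row using the channel width.

    Assumes X, Y, S = length / width and C = curvature * width.
    """
    X, Y, S, C = row
    return {
        "x": X * channel_width,
        "y": Y * channel_width,
        "s": S * channel_width,
        "curvature": C / channel_width,
    }


# Made-up row and a width of 2.0 (in whatever length unit the summary sheet uses).
point = to_dimensional((1.5, 0.25, 1.75, 0.5), 2.0)
```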
keywords:
Meltwater, Meandering, Ice, Supraglacial, Experiments
published:
2026-01-08
Dibaeinia, Payam; Sinha, Saurabh
(2026)
CoNSEPT is a tool to predict gene expression in various cis and trans contexts. Inputs to CoNSEPT are enhancer sequence, transcription factor levels in one or many trans conditions, TF motifs (PWMs), and any prior knowledge of TF-TF interactions.
keywords:
software; gene expression
published:
2022-04-29
Wedell, Eleanor; Warnow, Tandy
(2022)
Thank you for using these datasets!
These files contain trees and reference alignments, as well as the selected query sequences for testing phylogenetic placement methods against and within the SCAMPP framework.
There are four datasets from three different sources, each containing their source alignment and "true" tree, any estimated trees that may have been generated, and any re-estimated branch lengths that were created to be used with their requisite phylogenetic placement method.
Three biological datasets (16S.B.ALL, PEWO/LTP_s128_SSU, and PEWO/green85) and one simulated dataset (nt78) are included. See README.txt in each file for more information.
keywords:
Phylogenetic Placement; Phylogenetics; Maximum Likelihood; pplacer; EPA-ng
published:
2025-07-09
Kim, Ahyoung; Kim, Chansong; Waltmann, Tommy; Vo, Thi; Kim, Eun Mi; Kim, Junseok; Shao, Yu-Tsun; Michelson, Aaron; Crockett, John R.; Kalutantirige, Falon C.; Yang, Eric; Yao, Lehan; Hwang, Chu-Yun; Zhang, Yugang; Liu, Yu-Shen; An, Hyosung; Gao, Zirui; Kim, Jiyeon; Mandal, Sohini; Muller, David; Fichthorn, Kristen; Glotzer, Sharon; Chen, Qian
(2025)
This dataset contains the raw transmission electron microscopy (TEM) and scanning electron microscopy (SEM) images used to calculate the synthesis yield of patchy nanoparticles (NPs), as described in Supplementary Table 1 of the paper “Patchy Nanoparticles by Atomic ‘Stencilling’” (2025). All the images were taken at the Materials Research Laboratory, University of Illinois at Urbana-Champaign, by the Qian Chen group.
1. We have 21 subfolders, each with a name corresponding to one of the 21 patchy NPs listed in Supplementary Table 1 of the paper “Patchy Nanoparticles by Atomic ‘Stencilling’” (2025).
2. In TEM images, the bright and dark regions indicate the polymer patches and NP cores, respectively.
3. In SEM images, the bright and dark regions indicate the NP cores and polymer patches, respectively.
4. Each subfolder contains a “readme (subfolder name).txt” file with more detailed information about each sample.
keywords:
Patchy nanoparticle; polymer; synthesis; self-assembly
published:
2024-02-21
Hartman, Jordan H; Corush, Joel B; Larson, Eric R; Tiemann, Jeremy S; Willink, Philip; Davis, Mark A
(2024)
Data associated with the manuscript "Niche conservatism and spread explain hybridization and introgression between native and invasive fish" by Jordan H. Hartman, Joel B. Corush, Eric R. Larson, Jeremy S. Tiemann, Philip Willink, and Mark A. Davis. For this project, we combined results of ecological niche models (ENMs) and next-generation restriction site-associated DNA sequencing (RADseq) to test theories of niche conservatism and biotic resistance on the success of invasion, hybridization, and extent of introgression between native Western Banded Killifish and non-native Eastern Banded Killifish. This dataset provides the sampling locations and number of Banded Killifish in each population, accession numbers for RADseq from the National Center for Biotechnology Information Sequence Read Archive and the assignment of each Banded Killifish, the habitat associations of each population from the ENMs, and the occurrence points used to build the ENMs.
keywords:
Banded Killifish; ecological niche model; Fundulus diaphanus; hybrid swarm; invasive species; Laurentian Great Lakes
published:
2018-12-04
Wang, Yang; Dietrich, Christopher; Zhang, Yalin
(2018)
The text file contains the original data used in the phylogenetic analyses of Wang et al. (2017: Scientific Reports 7:45387). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 81 taxa (species) and 2905 characters, indicate that the first 2805 characters are DNA sequence and the last 100 are morphological, that the data may be interleaved (with data for one species on multiple rows), that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 100 morphological characters. The identity and positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
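To make the header description above concrete, the sketch below parses a NEXUS-style header with a few regular expressions. The header string is a reconstruction written from the description (81 taxa, 2905 characters, DNA in positions 1-2805, morphology in the last 100, interleaved, dash for gaps, question mark for missing data) using MrBayes-style mixed-datatype syntax; it is not copied from the dataset file, whose exact wording may differ. The function name `parse_nexus_header` is hypothetical.

```python
import re

# Illustrative header reconstructed from the description, NOT the real file.
EXAMPLE_HEADER = """#NEXUS
begin data;
dimensions ntax=81 nchar=2905;
format datatype=mixed(DNA:1-2805,Standard:2806-2905)
interleave=yes gap=- missing=?;
matrix
"""


def parse_nexus_header(text):
    """Extract taxon/character counts and gap/missing symbols from a header."""
    lowered = text.lower()
    info = {}
    for key in ("ntax", "nchar"):
        match = re.search(key + r"\s*=\s*(\d+)", lowered)
        if match:
            info[key] = int(match.group(1))
    for key in ("gap", "missing"):
        match = re.search(key + r"\s*=\s*(\S)", lowered)
        if match:
            info[key] = match.group(1)
    info["interleave"] = re.search(r"\binterleave\b", lowered) is not None
    return info


header = parse_nexus_header(EXAMPLE_HEADER)
# Consistency check on the partition sizes given in the description.
dna_plus_morphology = 2805 + 100
```

For real analyses, an established NEXUS reader in a phylogenetics package is a safer choice than hand-written regular expressions; the sketch is only meant to show what the header fields encode.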
keywords:
phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; wingless; histone H3; cytochrome oxidase I; bayesian analysis