Illinois Data Bank Dataset Search Results
published:
2025-11-12
Purmessur, Cheeranjeev; Chow, Kaicheung; van Heck, Bernard; Kou, Angela
(2025)
This dataset contains all the raw and processed data used to generate the figures presented in the main text and the supplementary information of the paper "Operation of a high frequency, phase slip qubit." It also includes code for data analysis and code for generating the figures.
Note: V2 includes time domain analysis that also accounts for the thermal dephasing from the f state (see readme in Time domain Device A).
keywords:
phase slip qubit; superconducting qubit; quantum information; disordered superconductors
published:
2021-04-29
Jackson, Nicole ; Konar, Megan ; Debaere, Peter; Sheffield, Justin
(2021)
Global assessments of climate extremes typically do not account for the unique characteristics of individual crops. A consistent definition of the exposure of specific crops to extreme weather would enable agriculturally-relevant hazard quantification. We introduce the Agriculturally-Relevant Exposure to Shocks (ARES) model, a novel database of both the temperature and moisture extremes facing individual crops by explicitly accounting for crop characteristics. Specifically, we estimate crop-specific temperature and moisture shocks during the growing season for a 0.25-degree spatial grid and daily time scale from 1961-2014 globally for 17 crops.
The resulting database presented here provides annual crop- and event-specific exposure rates. Both gridded and country-level exposure rates are provided for each of the 17 crops. Our results provide new insights into the changes in the magnitude as well as spatial and temporal distribution of extreme events that impact crops over the past half-century. For additional information, please see the related paper by Jackson et al. (2021) in Environmental Research Letters.
keywords:
Crop-specific; weather extremes; temperature; moisture; global; gridded; time series
published:
2022-04-15
Kim, Hyunbin; Makhnenko, Roman
(2022)
This dataset is provided to support the statements in Kim, H., and R.Y. Makhnenko. 2022. "Evaluation of CO2 sealing potential of heterogeneous Eau Claire shale". Journal of the Geological Society.
In geologic carbon dioxide (CO2) storage in deep saline aquifers, buoyant CO2 tends to rise in reservoirs overlain by low-permeability formations called caprocks. Caprocks should serve as barriers to potential CO2 leakage, which can occur through diffusion loss and permeation through faults, fractures, or pore spaces. Leakage through intact caprock depends mainly on its permeability and CO2 breakthrough pressure, and is affected by heterogeneities in the material. Here, we study the sealing potential of a caprock from Illinois Basin - Eau Claire shale, with sandy and shaly fractions distinguished via electron microscopy and grain/pore size and surface area characterization. Direct measurements of the permeability of the sandy shale give values of ~10^-15 m^2, while clayey specimens are three orders of magnitude less permeable. The CO2 breakthrough pressure under in-situ stress conditions is 0.1 MPa for the sandy shale and 0.4 MPa for the clayey counterpart; these values are higher than those predicted by the porosimetry methods performed on the unconfined specimens. Sandy Eau Claire shale would allow penetration of large CO2 volumes at low overpressures, while the clayey formation can serve as a caprock in the absence of faults and fractures.
keywords:
Geologic carbon storage; Caprock; Shale; CO2 breakthrough pressure; Porosimetry
published:
2019-03-19
Fernandez, Roberto; Parker, Gary; Stark, Colin P.
(2019)
This dataset includes images and extracted centerlines from experiments looking at the formation and evolution of meltwater meandering channels on ice. The laboratory data includes centimeter- and millimeter-scale rivulets. Dataset also includes an image and corresponding centerlines from the Peterman Ice Island.
All centerlines were manually digitized in Matlab, but no distributable code was developed for the process. Once digitized, centerlines were smoothed and standardized following methods and routines developed by other authors (Zolezzi and Guneralp, 2016; Guneralp and Rhoads, 2008). Details about the preparation of the centerlines and processing with these methods are included in the dissertation by Fernández (2018) linked to this dataset.
"Millimeter scale and Peterman Ice Island centerlines.pdf": This file includes the images of two mm-scale experiments and the Peterman Ice Island image. Seventeen centerlines were digitized from the former and seven were digitized from the latter. Those centerlines are shown above the images themselves.
"Centimeter scale rivulet images.pdf": This file includes images corresponding to all cm-scale centerlines used for the analysis presented in the dissertation by Fernandez (2018). Each image has a short caption indicating the run ID and the time at which it was captured. The images were used to extract centerlines to look at the planform evolution of cm-scale meltwater meandering rivulets on ice. Images include 26 centerlines from four different runs.
"Meltwater meandering channel centerlines.xlsx": This spreadsheet contains the centerline data for all fifty centerlines. The workbook includes 51 sheets. The first 50 are related to each one of the channels. The mm scale and Peterman Ice Island ones are identified using the same IDs shown in "Millimeter scale and Peterman Ice Island centerlines.pdf". The cm-scale centerlines are identified by run ID and a number indicating the time in minutes (with t = 0 min being the time at which water started flowing over the ice block). The naming convention is also associated to the images in "Centimeter scale rivulet images.pdf". The last sheet in the workbook includes a summary of the channel widths measured from every image for each centerline. The 50 sheets with the centerline information have four columns each. The titles of the columns are X, Y, S, and C. X,Y are dimensionless coordinates of the centerline. S is dimensionless streamwise coordinate (location along the centerline). C is dimensionless curvature value. All these values were non-dimensionalized with the channel width. See Fernandez (2018), Zolezzi and Guneralp (2016), and Guneralp and Rhoads (2008) for more details regarding the process of smoothing, standardizing and non-dimensionalization of the centerline coordinates.
keywords:
Meltwater, Meandering, Ice, Supraglacial, Experiments
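As a rough illustration of the S column in the centerline sheets described in this entry, the streamwise coordinate can be approximated as the cumulative chord length along the (X, Y) points. This is a minimal sketch only: it is not the authors' smoothing and standardization routine, and the function name and sample points are hypothetical.

```python
import math


def streamwise_coordinate(x, y):
    """Cumulative chord length along a polyline given by x and y.

    If x and y are already non-dimensionalized by channel width,
    the returned values are dimensionless as well.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    s = [0.0] if x else []
    for i in range(1, len(x)):
        # Add the straight-line distance between consecutive points.
        s.append(s[-1] + math.hypot(x[i] - x[i - 1], y[i] - y[i - 1]))
    return s


# Hypothetical points, not taken from the dataset.
s_values = streamwise_coordinate([0.0, 3.0, 3.0], [0.0, 4.0, 10.0])
```

Values computed this way may differ from the S column in the spreadsheet, since the published centerlines were smoothed and resampled before S and C were derived.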
published:
2024-03-01
Chen, Chu-Chun; Dominguez, Francina
(2024)
This dataset contains model output from the Community Earth System Model, Version 1 (CESM1; Hurrell et al., 2013) and variables from the European Centre for Medium-Range Weather Forecast (ECMWF) Reanalysis v5 (ERA5; Hersbach et al., 2020). These data were used for analysis in “The location of large-scale soil moisture anomalies affects moisture transport and precipitation over southeastern South America”, published in Geophysical Research Letters.
Acknowledgments:
This work was supported by NSF Award AGS-1852709. We acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the NSF. We thank Dr. Haiyan Teng for providing guidance on setting up the CESM experiments and offering valuable advice.
References:
Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc. 2020; 146: 1999–2049. https://doi.org/10.1002/qj.3803
Hurrell, J. W., and Coauthors, 2013: The Community Earth System Model: A Framework for Collaborative Research. Bull. Amer. Meteor. Soc., 94, 1339–1360, https://doi.org/10.1175/BAMS-D-12-00121.1
keywords:
atmospheric sciences; climate modeling; land-atmosphere interactions; soil moisture; regional atmospheric circulation; southeastern South America
published:
2020-06-26
Gasparik, Jessica T.; Ye, Qing; Curtis, Jeffrey H.; Presto, Albert A.; Donahue, Neil M.; Sullivan, Ryan C.; West, Matthew; Riemer, Nicole
(2020)
This dataset contains the PartMC-MOSAIC simulations used in the article "Quantifying Errors in the Aerosol Mixing-State Index Based on Limited Particle Sample Size". The output data from the 1000 simulations are organized into a series of archived folders, each containing 100 scenarios. Within each scenario directory are 25 NetCDF files, which are the hourly output of a PartMC-MOSAIC simulation containing all information regarding the environment, particle and gas state. This dataset was used to investigate the impact of sample size on determining aerosol mixing state. These data may be useful as a data set for applying different types of estimators.
keywords:
Atmospheric aerosols; single-particle measurements; sampling uncertainty; NetCDF
published:
2025-07-09
Kim, Ahyoung; Kim, Chansong; Waltmann, Tommy; Vo, Thi; Kim, Eun Mi; Kim, Junseok; Shao, Yu-Tsun; Michelson, Aaron; Crockett, John R.; Kalutantirige, Falon C.; Yang, Eric; Yao, Lehan; Hwang, Chu-Yun; Zhang, Yugang; Liu, Yu-Shen; An, Hyosung; Gao, Zirui; Kim, Jiyeon; Mandal, Sohini; Muller, David; Fichthorn, Kristen; Glotzer, Sharon; Chen, Qian
(2025)
This dataset contains the raw transmission electron microscopy (TEM) and scanning electron microscopy (SEM) images used to calculate the synthesis yield of patchy nanoparticles (NPs), as described in Supplementary Table 1 of the paper “Patchy Nanoparticles by Atomic “Stencilling” (2025).” All the images were taken at the Materials Research Laboratory, University of Illinois at Urbana-Champaign, by the Qian Chen group.
1. We have 21 subfolders, each with a name corresponding to one of the 21 patchy NPs listed in Supplementary Table 1 of the paper “Patchy Nanoparticles by Atomic “Stencilling” (2025)."
2. In TEM images, the bright and dark regions indicate the polymer patches and NP cores, respectively.
3. In SEM images, the bright and dark regions indicate the NP cores and polymer patches, respectively.
4. Each subfolder contains a “readme (subfolder name).txt” file with more detailed information about each sample.
keywords:
Patchy nanoparticle; polymer; synthesis; self-assembly
published:
2024-03-28
Zhang, Yue; Zhao, Helin; Huang, Siyuan; Hossain, Mohhamad Abir; van der Zande, Arend
(2024)
Read me file for the data repository
*******************************************************************************
This repository has raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". We arrange the data following the figure in which it first appeared. For all electrical transfer measurements, we provide the up-sweep and down-sweep data, with voltage in V and conductance in S. All Raman modes are in units of cm^-1.
*******************************************************************************
How to use this dataset
All data in this dataset are stored in binary NumPy array format as .npy files.
To read a .npy file, use the NumPy module for Python and its np.load() function.
Example: suppose the filename is example_data.npy. To load it, run the following in a Jupyter notebook or a Python program:
import numpy as np
data = np.load("example_data.npy")
The contents of the example file are then available in the data object.
*******************************************************************************
published:
2021-05-10
Zheng, Zhonghua; Zhao, Lei; Oleson, Keith
(2021)
This dataset contains the emulated global multi-model urban daily temperature projections under RCP 8.5 scenario. The dataset is derived from the study "Large model structural uncertainty in global projections of urban heat waves" (XXXX). Details about this dataset and the local urban climate emulator are described in the article. This dataset documents the global urban daily temperatures of 17 CMIP5 Earth system models for 2006-2015 and 2061-2070. This dataset may be useful for multiple communities regarding urban climate change, heat waves, impacts, vulnerability, risks, and adaptation applications.
keywords:
Urban heat waves; CMIP; urban warming; heat stress; urban climate change
published:
2019-03-05
This dataset contains the raw nuclear background radiation data collected on the engineering campus of the University of Illinois at Urbana-Champaign. It contains three columns, x, y, and counts, which correspond to longitude, latitude, and radiation count rate (counts per second). In addition to the original background radiation data, there are several separate files that contain the simulated radioactive sources.
For a more detailed README file, please refer to this documentation: https://www.dropbox.com/s/xjhmeog7fvijml7/README.pdf?dl=0
keywords:
Nuclear Radiation
published:
2021-01-04
Zhao, Lei; Oleson, Keith; Bou-Zeid, Elie; Krayenhoff, Eric Scott; Bray, Andrew; Zhu, Qing; Zheng, Zhonghua; Chen, Chen; Oppenheimer, Michael
(2021)
This dataset contains the emulated global multi-model urban climate projections under RCP 8.5 and RCP 4.5 used in the article "Global multi-model projections of local urban climates" (https://www.nature.com/articles/s41558-020-00958-8). Details about this dataset and the local urban climate emulator are described in the article. This dataset documents the monthly mean projections of urban temperatures and urban relative humidity of 26 CMIP5 Earth system models (ESMs) from 2006 to 2100 across the globe. This dataset may be useful for multiple communities regarding urban climate change, impacts, vulnerability, risks, and adaptation applications.
keywords:
Urban climate; multi-model climate projections; CMIP; urban warming; heat stress
published:
2019-05-01
Balasubramanian, Srinidhi; Koloutsou-Vakakis, Sotiria; Rood, Mark
(2019)
This dataset contains scripts and data developed as a part of the research manuscript titled “Spatial and Temporal Allocation of Ammonia Emissions from Fertilizer Application Important for Air Quality Predictions in U.S. Corn Belt”. This includes (1) Spatial and temporal factors for ammonia emissions from agricultural fertilizer usage developed using the hybrid ISS-DNDC method for the Midwest U.S., (2) CAMx job scripts and outputs of predictions of ambient ammonia and total and speciated PM2.5, (3) Observation data used to statistically evaluate CAMx predictions, and (4) MATLAB programs developed to pair CAMx predictions with ground-based observation data in space and time.
keywords:
Air quality; Ammonia; Emissions; PM2.5; CAMx; DNDC; spatial resolution; Midwest U.S.
published:
2025-05-29
Ruess, P.J.; Hanley, Jackie; Konar, Megan
(2025)
These data support Ruess et al (2025) "Drought impacts to water footprints and virtual water transfers of counties of the United States", Water Resources Research, 61, e2024WR037715, https://doi.org/10.1029/2024WR037715.
The dataset contains estimates for Virtual Water Content (VWC) and Virtual Water Trade (VWT) for nine unique combinations of three crop categories (cereal grains, produce, and animal feed) and three water sources (surface water withdrawals, groundwater withdrawals, and groundwater depletion) for the years 2012 and 2017 within the Continental United States. The VWC is calculated by dividing irrigation withdrawal estimates (m3) by the production (tons) at the county resolution. The VWT is calculated by multiplying the VWC by the estimated county level food flows (tons) from Karakoc et al. (2022). All VWC estimates are provided at the county resolution according to county GEOID and are given in units of m3/ton. All VWT estimates are given in pairs of origin and destination GEOID’s and provided in units of m3.
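The two calculations described above can be sketched as follows. This is a minimal illustration of the stated definitions (VWC = withdrawal / production, VWT = VWC x food flow); the function names and the numbers are hypothetical and are not taken from the dataset.

```python
def virtual_water_content(withdrawal_m3, production_tons):
    """VWC in m3/ton: irrigation withdrawal divided by production."""
    if production_tons <= 0:
        raise ValueError("production must be positive")
    return withdrawal_m3 / production_tons


def virtual_water_trade(vwc_m3_per_ton, flow_tons):
    """VWT in m3: VWC of the origin county times the food flow."""
    return vwc_m3_per_ton * flow_tons


# Hypothetical county: 1,000,000 m3 withdrawn for 5,000 tons produced.
vwc = virtual_water_content(1_000_000.0, 5_000.0)  # 200.0 m3/ton
# Hypothetical flow of 120 tons from that county to a destination county.
vwt = virtual_water_trade(vwc, 120.0)  # 24000.0 m3
```

In the dataset itself, each VWC value is keyed by county GEOID and each VWT value by an origin and destination GEOID pair, so a real workflow would look up the origin county's VWC before multiplying by the flow.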
When using, please cite as:
Ruess, P.J., Hanley, J., and Konar, M. (2025) "Drought impacts to water footprints and virtual water transfers of counties of the United States", Water Resources Research, 61, e2024WR037715, doi: 10.1029/2024WR037715.
keywords:
irrigation; water footprints; supply chains
published:
2023-06-10
Cheng, Xi; Kontou, Eleftheria
(2023)
Data and code supporting the paper titled "Estimating the Electric Vehicle Charging Demand of Multi-Unit Dwelling Residents in the United States" by Xi Cheng and Eleftheria Kontou at the University of Illinois Urbana-Champaign. The data and the code enable analytics and assessment of multi-unit dwelling residents travel patterns and their electric vehicle charging demand.
keywords:
multi-unit residents; electric vehicles; home charging; travel patterns; energy use
published:
2025-04-05
Meem, Tasneem Haq; Rhoads, Bruce; Lewis, Quinn; Umar, Muhammad; Sukhodolov, Alex
(2025)
This data set includes information on mixing metric values and distances to determine the average length scale, rates and variability of mixing downstream of 43 river confluences for 150 mixing events. The file "pmx_all data.csv" contains confluence names, the number of events per confluence site, and Pmx values measured at various actual and dimensionless downstream distances. The file "pmx_binned data.csv" provides mean Pmx values within 0.5-unit dimensionless distance bins.
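The binning step can be sketched as below, assuming that each bin is 0.5 units wide in dimensionless distance and is closed on its left edge. The edge convention, the function name, and the sample values are assumptions for illustration, not details confirmed by the dataset.

```python
import math
from collections import defaultdict


def bin_means(distances, values, width=0.5):
    """Mean of values within fixed-width bins of distance.

    Returns a dict mapping each bin's left edge to the mean value
    of the samples that fall in [edge, edge + width).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for d, v in zip(distances, values):
        index = math.floor(d / width)
        totals[index][0] += v
        totals[index][1] += 1
    return {
        index * width: total / count
        for index, (total, count) in sorted(totals.items())
    }


# Hypothetical Pmx samples at dimensionless downstream distances.
binned = bin_means([0.1, 0.4, 0.6, 1.2], [0.25, 0.75, 0.5, 1.0])
```

Here the first two samples share the bin starting at 0.0, while the other two fall in the bins starting at 0.5 and 1.0.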
keywords:
river; mixing; confluences; remote sensing
published:
2021-10-04
Wang, Justin; Curtis, Jeffrey H; Riemer, Nicole; West, Matthew
(2021)
This dataset contains all the necessary information to recreate the study presented in the paper entitled "Learning coagulation processes with combinatorially-invariant neural networks". This consists of (1) the aggregated output files used for machine learning, (2) the machine learning codes used to learn the presented models, (3) the PartMC model source code that was used to generate the simulation data and (4) the Python scripts used to construct the scenario library for training and testing simulations. These data were used to investigate a method (combinatorially-invariant neural network) for learning the aerosol process of coagulation. These data may be useful for application of other methods.
keywords:
Machine learning; Atmospheric chemistry; Particle-resolved modeling; Coagulation; Atmospheric Science
published:
2025-08-14
Bao, Wencheng; Kontou, Eleftheria
(2025)
Data and code for the paper titled "Electric Vehicle Charging Stations at Risk from Hazardous Events and Power Outages: Analytics and Resilience Implications" published in Renewable and Sustainable Energy Reviews journal (https://doi.org/10.1016/j.rser.2025.116144).
keywords:
electric vehicles; hazardous events; charging infrastructure; power outages; resilience
published:
2025-11-25
Hyunbin, Kim; Kiseok, Kim; Roman, Makhnenko
(2025)
This dataset encompasses experimental results supporting the upcoming journal paper, "Hydro-mechanical-chemical behavior of sedimentary rock during CO2 injection". The dataset includes the measurements and analyses conducted under controlled laboratory conditions, capturing changes in poroviscoelastic properties and pore structure after CO2 treatment.
keywords:
Poroviscoelasticity; Carbonate mineral dissolution; Porosity evolution; Compaction; Shale; Opalinus Clay
published:
2022-04-19
Saleh, Ehsan; Ghaffari, Saba; Forsyth, David; Yu-Xiong, Wang
(2022)
This data repository includes the features and the trained backbone parameters used in the ICLR 2022 Paper "On the Importance of Firth Bias Reduction in Few-Shot Classification".
The code accompanying this data is open-source and available at https://github.com/ehsansaleh/firth_bias_reduction
The code and the data have three modules:
1. The "code_firth" module (10 files) relates to the basic ResNet backbones and logistic classifiers (e.g., Figures 2 and 3 in the main paper).
2. The "code_s2m2rf" module (2 files) relates to the S2M2R feature backbones and cosine classifiers (e.g., Figure 4 in the main paper).
3. The "code_dcf" module (3 files) relates to the few-shot Distribution Calibration (DC) method (e.g., Table 1 in the main paper).
The relevant files for each module have the module name as a prefix in their name.
1. For instance, the "code_dcf_features.tar" file should be placed at the "features" directory of the "code_dcf" module.
2. As another example, "code_firth_features_cifarfs_novel.tar" should be placed in the "features" directory of the "code_firth" module, and it includes the features extracted from the novel split of mini-ImageNet dataset.
Each tar-ball should be extracted in its relevant directory, and the md5 check-sums of the extracted files are also provided in the open-source code repository for verification.
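A checksum comparison of this kind can be sketched with Python's standard hashlib module. The helper names below are hypothetical, and the format in which the code repository lists its expected checksums is not described here, so the expected value is simply passed in as a string.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large feature files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 with an expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, matches_checksum("some_extracted_file", expected) returns True only when the extracted file is byte-identical to the one the checksum was computed from; both arguments here are placeholders.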
Please note that the actual datasets of images are not included here (since we do not own those datasets). However, helper scripts for automatically downloading the original datasets are provided in every module and sub-directory of the GitHub code repository.
keywords:
Computer Vision; Few-Shot Classification; Few-Shot Learning; Firth Bias Reduction
published:
2021-02-24
Bieri, Carolina A.; Dominguez, Francina
(2021)
This dataset contains model output from the Community Earth System Model, Version 2 (CESM2; Danabasoglu et al. 2020). These data were used for analysis in "Impacts of Large-Scale Soil Moisture Anomalies in Southeastern South America", published in the Journal of Hydrometeorology (DOI: 10.1175/JHM-D-20-0116.1). See this publication for details of the model simulations that created these data.
Four NetCDF (.nc) files are included in this dataset. Two files correspond to the control simulation (FHIST_SP_control) and two files correspond to a simulation with a dry soil moisture anomaly imposed in southeastern South America (FHIST_SP_dry; see the publication mentioned in the preceding paragraph for details on the spatial extent of the imposed anomaly). For each simulation, one file corresponds to output from the atmospheric model (file names with "cam") of CESM2 and the other to the land model (file names with "clm2"). These files are raw CESM output concatenated into a single file for each simulation.
All files include data from 1979-01-02 to 2003-12-31 at a daily resolution. The spatial resolution of all files is about 1 degree longitude x 1 degree latitude. Variables included in these files are listed or linked below.
Variables in atmosphere model output:
Vertical velocity (omega)
Convective precipitation
Large-scale precipitation
Surface pressure
Specific humidity
Temperature (atmospheric profile)
Reference temperature (temp. at reference height, 2 meters in this case)
Zonal wind
Meridional wind
Geopotential height
Variables in land model output:
See https://www.cesm.ucar.edu/models/cesm1.2/clm/models/lnd/clm/doc/UsersGuide/history_fields_table_40.xhtml
Note that not all of the variables listed at the above link are included in the land model output files in this dataset.
This material is based upon work supported by the National Science Foundation under Grant No. 1454089.
We acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation. The CESM project is supported primarily by the National Science Foundation. We thank all the scientists, software engineers, and administrators who contributed to the development of CESM2.
References
Danabasoglu, G., and Coauthors, 2020: The Community Earth System Model Version 2 (CESM2). Journal of Advances in Modeling Earth Systems, 12, e2019MS001916, https://doi.org/10.1029/2019MS001916.
keywords:
Climate modeling; atmospheric science; hydrometeorology; hydroclimatology; soil moisture; land-atmosphere interactions
published:
2020-05-12
The data provided herein is accelerometer and strain data taken from free vibration response of pre-tensioned, partially submerged steel beam specimens (modulus of elasticity assumed = 29,000 ksi). The specimens were subjected to various levels of pre-tension, and various levels of submersion in water. The purpose of the testing was to quantify the effects of partial submersion on the vibrating frequencies of pretensioned beams. Three specimens were tested, each with different cross section (but identical cross-sectional area). The different cross sections allow investigation of the effects of specimen width as the specimen vibrates through water.
The testing procedure was as follows:
1) Apply a specified level of tension in the beam. Measure tension via 3 strain gages.
2) Submerge the specimens to a specified depth of water
3) Excite the beams with either a hammer impact or a pull-and-release method (physically pull the middle of the bar and quickly release)
4) Measure the free vibration of the beam with 2 accelerometers.
Schematic drawings of the test setup and the test specimens are provided, as is a picture of the test setup.
keywords:
free vibration; beam; partially-submerged; prestressed
published:
2019-09-25
Wong, Tony; Hughes, A; Tokuda, K; Indebetouw, R; Onishi, T; Bandurski, J. B.; Chen, C. H. R.; Fukui, Y; Glover, S. C. O.; Klessen, R. S.; Pineda, J. L.; Roman-Duval, J.; Sewilo, M.; Wojciechowski, E.; Zahorecz, S.
(2019)
¹²CO and ¹³CO maps for six molecular clouds in the Large Magellanic Cloud, obtained with the Atacama Large Millimeter/submillimeter Array (ALMA). See the associated article in the Astrophysical Journal, and README files within each ZIP archive. Please cite the article if you use these data.
keywords:
Radio astronomy
published:
2025-02-23
Bondarenko, Nikita; Podladchikov, Yury; Williams-Stroud, Sherilyn; Makhnenko, Roman
(2025)
Dataset with numerical routines and laboratory testing data associated with the manuscript: Bondarenko, N., Podladchikov, Y., Williams‐Stroud, S., & Makhnenko, R. (2025). Stratigraphy‐induced localization of microseismicity during CO2 injection in Illinois Basin. Journal of Geophysical Research: Solid Earth, 130, e2024JB029526. https://doi.org/10.1029/2024JB029526
keywords:
Illinois Basin Decatur Project; Induced Seismicity; GPU; Numerical modeling
published:
2023-06-29
Pandit, Akshay; Karakoc, Deniz Berfin; Konar, Megan
(2023)
This database provides estimates of agricultural and food commodity flows [in both tons and $US] between the US and China for the year 2017. Pairwise information is provided between US states and Chinese provinces, and US counties and Chinese provinces for 7 Standardized Classification of Transported Goods (SCTG) commodity categories. Additionally, crosswalks are provided to match Harmonized System (HS) codes and China's Multi-Regional Input Output (MRIO) commodity sectors to their corresponding SCTG commodity codes. The included SCTG commodities are:
- SCTG 01: live animals and fish
- SCTG 02: cereal grains
- SCTG 03: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 04: animal feed, eggs, honey, and other products of animal origin
- SCTG 05: meat, poultry, fish, seafood, and their preparations
- SCTG 06: milled grain products and preparations, and bakery products
- SCTG 07: other prepared foodstuffs, fats and oils
For additional information, please see the related paper by Pandit et al. (2022) in Environmental Research Letters.
keywords:
Food flows; High-resolution; County-scale; Bilateral; United States; China
published:
2024-09-16
Wu, Steven; Smith, Hannah
(2024)
This dataset describes an analysis of research documents about the debate between hydrogen fuel cells and lithium-ion batteries within the context of electric vehicles.
To create this dataset, we first analyzed news articles on the topic of sustainable development. We searched for related science using keywords in Google Scholar. We then identified subtopics and selected one specific subtopic: electric vehicles. We started to identify positions and players about electric vehicles [1].
Within electric vehicles, we started searching in OpenAlex for a topic of reasonable size (about 300 documents) related to a scientific or technical debate. We narrowed to electric vehicles and batteries, then trained a cluster model [2] on OpenAlex’s keywords to develop some possible search queries, and chose one.
Our final search query (May 7, 2024) returned 301 documents in OpenAlex:
Title & abstract includes: Electric Vehicle + Hydrogen + Battery
filter is Lithium-ion Battery Management in Electric Vehicle
We used a Python script and the Scopus API to find missing abstracts and DOIs [3].
To identify relevant documents, we used a combination of Abstrackr [4] and manual screening. As a starting point for Abstrackr [4], one person manually screened 200 documents by checking the abstracts for “hydrogen fuel cells” and “battery comparisons”. Then we used Abstrackr [4] to predict the relevance of the remaining documents based on the title, abstract, and keywords. The settings we used were single screening, ordered by most likely to be relevant, and 0 pilot size. We set a threshold of 0.6 for the predictions. After screening and predictions, 176 documents remained.
keywords:
controversy mapping; sustainable development; evidence synthesis; OpenAlex; Abstrackr; Scopus; meta-analysis; electric vehicle; hydrogen fuel cells; battery