published: 2024-05-24
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as publications citing Willoughby et al. (2014): Willoughby, Patrick H., Jansma, Matthew J., & Hoye, Thomas R. (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042. We added the DOIs of the citing publications to a Zotero collection, which we exported as a .csv file and an .rtf file. Willoughby2014_286citing_publications.csv is a Zotero data export of the citing publications. Willoughby2014_286citing_publications.rtf is a bibliography of the citing publications, formatted in a variation of American Psychological Association (APA) style (7th edition) that uses full names instead of initials. We developed an automation system to analyze how unreliability propagates through the publications citing an unreliable publication, Willoughby et al., 2014 (one of the Python scripts supporting the protocol presented in that publication contains a code glitch). We call a publication "unreliable by propagation" when its main findings have become unreliable because it cites an unreliable source. The system triaged the 284 English-language citing publications according to whether they are at risk because they cite Willoughby et al., 2014. We excluded 2 publications that are not in English; their DOIs are 10.13220/j.cnki.jipr.2015.06.004 and 10.19540/j.cnki.cjcmm.20200604.201. We compared the accuracy of the system's triage against a separate manual analysis of the 284 citing publications conducted by a chemistry domain expert (YF). 284_merged_decision_and_annotation.csv (new in this V2) shows, for each of the 284 citing publications, the system's triage result alongside the result of YF's manual analysis.
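The accuracy comparison between the system's triage and the expert's manual analysis can be sketched as a simple agreement computation that treats the expert's label as ground truth. This is a minimal sketch only: the column names `system_decision` and `expert_decision` and the label values below are hypothetical placeholders, not the actual headers or categories of 284_merged_decision_and_annotation.csv, so check the file and adjust them before use.

```python
import csv
import io
from collections import Counter


def triage_agreement(rows, system_col="system_decision", expert_col="expert_decision"):
    """Return (accuracy, confusion) with the expert's label taken as ground truth.

    system_col and expert_col are hypothetical column names; replace them with
    the real headers of 284_merged_decision_and_annotation.csv.
    """
    confusion = Counter()
    for row in rows:
        # Normalize labels so " At risk" and "at risk" count as the same class.
        predicted = row[system_col].strip().lower()
        actual = row[expert_col].strip().lower()
        confusion[(actual, predicted)] += 1
    total = sum(confusion.values())
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


# Tiny in-memory example with made-up DOIs and labels. With the real file:
#   with open("284_merged_decision_and_annotation.csv", newline="", encoding="utf-8") as f:
#       accuracy, confusion = triage_agreement(csv.DictReader(f))
sample = io.StringIO(
    "doi,system_decision,expert_decision\n"
    "10.1/a,at risk,at risk\n"
    "10.1/b,not at risk,not at risk\n"
    "10.1/c,at risk,not at risk\n"
    "10.1/d,not at risk,not at risk\n"
)
accuracy, confusion = triage_agreement(csv.DictReader(sample))
print(f"accuracy = {accuracy:.2f}")  # 3 of 4 rows agree -> accuracy = 0.75
print(confusion[("not at risk", "at risk")])  # one false alarm -> 1
```

Keeping the full confusion counts, rather than accuracy alone, shows whether disagreements are false alarms or missed at-risk publications.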
keywords: scientific publications; arguments; citation contexts; defeasible reasoning; Zotero; Web of Science; Scopus; unreliable cited sources; automation systems; knowledge maintenance
published: 2024-05-23
This dataset contains the training results (model parameters and outputs), the datasets used for generalization testing, and the 2-D implementation used in the article "Learned 1-D passive scalar advection to accelerate chemical transport modeling: a case study with GEOS-FP horizontal wind fields." The article will be submitted to Artificial Intelligence for Earth Systems. The 1-D time-series data are saved as CSV files, and the 2-D time-series data are saved as netCDF files. Model parameters are saved for every training epoch tested in the study.
keywords: Air quality modeling; Coarse-graining; GEOS-Chem; Numerical advection; Physics-informed machine learning; Transport operator
published: 2024-05-23
This dataset consists of all the figure files in the main text and supplementary material of the manuscript titled "Optical manipulation of the charge density wave state in RbV3Sb5". For detailed information on the individual files, refer to the readme file.
keywords: kagome superconductor; optics; charge density wave
published: 2024-05-04
Data from the manuscript "Atomic-Scale Visualization of a Cascade of Magnetic Orders in the Layered Antiferromagnet GdTe3", to be published in npj Quantum Materials. The PowerPoint file explains how the data can be opened and how the data are labeled.
keywords: Scanning Tunneling Microscopy; Physics; GdTe3; Rare-Earth Tritellurides
published: 2024-03-21
Impact assessment is an evolving area of research that aims to measure and predict the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain, closely intertwined with impact assessment. A recurring obstacle is the absence of an efficient framework to facilitate the analysis of lengthy reports and the labeling of text. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying pertinent sections in project reports that indicate potential impacts. We use a mixed-methods approach, combining manual annotation with supervised machine learning, to extract these passages from project reports. This repository stores the datasets and code related to this project. Please read and cite the following paper if you use the data: Becker M., Han K., Werthmann A., Rezapour R., Lee H., Diesner J., and Witt A. (2024). Detecting Impact Relevant Sections in Scientific Research. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING). This folder contains the following files:
* evaluation_20220927.ods: Annotated German passages (Artificial Intelligence, Linguistics, and Music) - training data
* annotated_data.big_set.corrected.txt: Annotated German passages (Mobility) - training data
* incl_translation_all.csv: Annotated English passages (Artificial Intelligence, Linguistics, and Music) - training data
* incl_translation_mobility.csv: Annotated German passages (Mobility) - training data
* ttparagraph_addmob.txt: German corpus (unannotated passages)
* model_result_extraction.csv: Impact-relevant passages extracted from the German corpus by the model we trained
* rf_model.joblib: The random forest model we trained to extract impact-relevant passages
Data processing code is available at: https://github.com/khan1792/texttransfer
keywords: impact detection; project reports; annotation; mixed-methods; machine learning
published: 2024-04-18
Data for: Variation in pesticide toxicity in the western honey bee (Apis mellifera) associated with consuming phytochemically different monofloral honeys. Includes:
* Identification and quantification of phenolic components of honeys: Raw_data_JOCE.xlsx – sheet “HoneyPhytochemicals”
* Effects of honey phytochemicals on acute pesticide toxicity: Raw_data_JOCE.xlsx – sheets “raw_LD50” and “raw_LD50_hive_based”
keywords: Honey; honey bee; phenolic acid; flavonoids; bifenthrin; LD50
published: 2016-05-19
This dataset contains records of four years of taxi operations in New York City, comprising 697,622,444 trips. Each trip record includes the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also include fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request to the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. That script contains additional documentation and contact information that may help if you have trouble accessing the contents of the zip files.
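Because each trip records pickup and drop-off coordinates and times along with the taximeter distance, a common sanity check is to compare the metered distance with the straight-line (great-circle) distance and to derive an average speed. The sketch below assumes coordinates in decimal degrees and metered distance in miles. The field names (`pickup_datetime`, `pickup_lat`, `trip_distance`, and so on) and the example trip are hypothetical, so check the dataset's documentation for the actual column names and units.

```python
import math
from datetime import datetime

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def trip_summary(trip):
    """Derive duration, straight-line distance, detour ratio, and average speed.

    All dictionary keys are hypothetical field names used for illustration.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    pickup = datetime.strptime(trip["pickup_datetime"], fmt)
    dropoff = datetime.strptime(trip["dropoff_datetime"], fmt)
    hours = (dropoff - pickup).total_seconds() / 3600
    straight = haversine_miles(trip["pickup_lat"], trip["pickup_lon"],
                               trip["dropoff_lat"], trip["dropoff_lon"])
    metered = trip["trip_distance"]
    return {
        "hours": hours,
        "straight_line_miles": straight,
        # Metered distance should not be shorter than the straight-line distance;
        # a ratio well below 1 hints at bad GPS fixes or meter errors.
        "detour_ratio": metered / straight if straight > 0 else float("nan"),
        "avg_mph": metered / hours if hours > 0 else float("nan"),
    }


# Hypothetical trip record with made-up values, for illustration only.
trip = {
    "pickup_datetime": "2013-01-01 15:11:48",
    "dropoff_datetime": "2013-01-01 15:18:10",
    "pickup_lat": 40.7580, "pickup_lon": -73.9855,
    "dropoff_lat": 40.7484, "dropoff_lon": -73.9857,
    "trip_distance": 1.0,
}
print(trip_summary(trip))
```

Records with a zero duration, a zero straight-line distance, or a detour ratio far below 1 are natural candidates for exclusion before any aggregate analysis.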
keywords: taxi;transportation;New York City;GPS
published: 2023-07-14
This dataset includes a total of 300 images of 45 extant species of Podocarpus (Podocarpaceae) and nine images of fossil specimens of the morphogenus Podocarpidites. The goal of this dataset is to capture the morphological diversity within the genus and to create an image database for training machine learning models. The images were taken using Airyscan confocal superresolution microscopy at 630x magnification (63x/NA 1.4 oil DIC). The images are in the CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or open-source microscopy software, such as ImageJ. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/us/products/microscope-software/zen/czi.html#microscope---image-data]
keywords: superresolution microscopy; Zeiss Airyscan; CZI images; conifer; saccate pollen
published: 2019-11-11
This repository includes scripts and datasets for the paper "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Note: The results from estimating species trees with ASTRID-multi (included in this repository) are *not* included in the FastMulRFS paper. We estimated species trees with ASTRID-multi in the fall of 2019, but ASTRID-multi received an important bug fix in January 2020. Therefore, the ASTRID-multi species trees in this repository should be ignored.
keywords: Species tree estimation; gene duplication and loss; statistical consistency; MulRF, FastRFS
published: 2020-09-07
This dataset contains the BEPAM model code and input data needed to replicate the results for "Assessing the Returns to Land and Greenhouse Gas Savings from Producing Energy Crops on Conservation Reserve Program Land." The dataset consists of: (1) the replication code and data for the BEPAM model (BEPAM-CRP model-Sep2020.zip), with code files named output_0213-2020_Complete_daycent-agversion-[rental payment level]%_[biomass price].gms; and (2) simulation results from the BEPAM model (BEPAM_Simulation_Results.csv). Item (1) is in GAMS format. Item (2) is in text format.
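Because the GAMS file names encode the scenario, a short helper can recover the rental payment level and biomass price when looping over the replication files. This is a minimal sketch that assumes both bracketed fields in the naming scheme output_0213-2020_Complete_daycent-agversion-[rental payment level]%_[biomass price].gms are plain numbers; the example file name is hypothetical, and the units of the biomass price are not stated here.

```python
import re

# Follows the naming scheme
#   output_0213-2020_Complete_daycent-agversion-[rental payment level]%_[biomass price].gms
# Assumption: both bracketed fields are plain numbers (optionally with a decimal part).
NAME_PATTERN = re.compile(
    r"^output_0213-2020_Complete_daycent-agversion-"
    r"(?P<rental>\d+(?:\.\d+)?)%_(?P<price>\d+(?:\.\d+)?)\.gms$"
)


def parse_scenario(filename):
    """Return (rental_payment_level_percent, biomass_price), or None if no match."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return float(match.group("rental")), float(match.group("price"))


# Hypothetical file name, for illustration only.
print(parse_scenario("output_0213-2020_Complete_daycent-agversion-75%_60.gms"))  # (75.0, 60.0)
print(parse_scenario("BEPAM_Simulation_Results.csv"))  # None
```

Anchoring the pattern with `^` and `$` keeps unrelated files in the archive from being misread as scenarios.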
keywords: Miscanthus; Switchgrass; soil carbon sequestration; greenhouse gas savings; rental payments; biomass price
published: 2021-03-05
Datasets that accompany the 2021 publication by Beilke, Blakey, and O'Keefe (Title: Bats partition activity in space and time in a large, heterogeneous landscape; Journal: Ecology and Evolution).
keywords: spatiotemporal; chiroptera
published: 2021-04-18
This dataset contains all the code, notebooks, and datasets used in the study conducted for the research publication titled "Multi-scale CyberGIS Analytics for Detecting Spatiotemporal Patterns of COVID-19 Data". Specifically, this package includes the artifacts used to conduct spatiotemporal analysis with space-time kernel density estimation (STKDE) on COVID-19 data, which should help readers reproduce some of the analysis and learn about the methods used in the associated book chapter.
## What’s inside
A quick explanation of the components of the zip file:
* Multi-scale CyberGIS Analytics for Detecting Spatiotemporal Patterns of COVID-19.ipynb is a Jupyter notebook for this project. It contains code for preprocessing, space-time kernel density estimation, postprocessing, and visualization.
* data is a folder containing all data needed for the notebook.
* data/county.txt: US county information and FIPS codes from the Natural Resources Conservation Service.
* data/us-counties.txt: County-level COVID-19 data collected from the New York Times COVID-19 GitHub repository on August 9th, 2020.
* data/covid_death.txt: COVID-19 death information derived in the preprocessing step, which prepares the input data for STKDE. Each record has the following format: (fips, spatial_x, spatial_y, date, number of deaths).
* data/stkdefinal.txt: Results obtained by conducting STKDE.
* wolfram_mathmatica is a folder for 3-D visualization code.
* wolfram_mathmatica/Visualization.nb: Code for visualizing the STKDE results in Wolfram Mathematica.
* img is a folder for figures.
* img/above.png: 3-D visualization result, above view.
* img/side.png: 3-D visualization result, side view.
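The core STKDE computation can be illustrated with a short sketch. It uses a product of Epanechnikov kernels in space and time, with a spatial bandwidth `hs` and a temporal bandwidth `ht`, and weights each record by its death count. The kernel choice, bandwidth values, and normalization are assumptions for illustration and may differ from what the notebook does. Records follow the layout described for data/covid_death.txt, with the assumption that spatial_x and spatial_y are projected coordinates in consistent units and that dates have been converted to numeric day indices.

```python
import math


def epanechnikov_2d(u, v):
    """Bivariate Epanechnikov kernel; integrates to 1 over the unit disk."""
    r2 = u * u + v * v
    return (2.0 / math.pi) * (1.0 - r2) if r2 < 1.0 else 0.0


def epanechnikov_1d(w):
    """Univariate Epanechnikov kernel; integrates to 1 over [-1, 1]."""
    return 0.75 * (1.0 - w * w) if abs(w) < 1.0 else 0.0


def stkde(x, y, t, records, hs, ht):
    """Death-weighted space-time kernel density at location (x, y) and day t.

    records: iterable of (fips, spatial_x, spatial_y, day, deaths) tuples, where
    day is a numeric day index (an assumption; the file stores dates).
    hs: spatial bandwidth (same units as spatial_x/spatial_y); ht: temporal
    bandwidth in days. Both are illustrative, not the notebook's values.
    """
    total_weight = 0.0
    acc = 0.0
    for _fips, xi, yi, ti, deaths in records:
        total_weight += deaths
        ks = epanechnikov_2d((x - xi) / hs, (y - yi) / hs)
        kt = epanechnikov_1d((t - ti) / ht)
        acc += deaths * ks * kt
    if total_weight == 0:
        return 0.0
    # Dividing by total weight and hs^2 * ht makes the estimate a density
    # that integrates to 1 over space and time.
    return acc / (total_weight * hs * hs * ht)


# Made-up records: (fips, spatial_x, spatial_y, day, deaths).
records = [("17019", 0.0, 0.0, 10, 5), ("17031", 50.0, 0.0, 12, 20)]
density = stkde(0.0, 0.0, 10, records, hs=10.0, ht=7.0)
print(f"{density:.3e}")  # only the first record is within both bandwidths -> 1.364e-04
```

Evaluating `stkde` over a regular grid of (x, y, t) points produces the kind of 3-D density volume that the Mathematica code then visualizes.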
keywords: CyberGIS; COVID-19; Space-time kernel density estimation; Spatiotemporal patterns
published: 2021-05-13
Data files and R code to replicate the econometric analysis in the journal article: B Chen, BM Gramig and SD Yun. “Conservation Tillage Mitigates Drought Induced Soybean Yield Losses in the US Corn Belt.” Q Open. https://doi.org/10.1093/qopen/qoab007
keywords: R, Conservation Tillage, Drought, Yield, Corn, Soybeans, Resilience, Climate Change
published: 2022-04-11
This dataset contains all the map data used for "Quantifying transportation energy vulnerability and its spatial patterns in the United States". It includes the multiple dimensions (i.e., exposure, sensitivity, and adaptive capacity) of transportation energy vulnerability (TEV) at the census-tract level in the United States, the changes in TEV with electric vehicle adoption, and detailed data for Chicago, Los Angeles, and New York.
keywords: Transport energy; Vulnerability; Fuel costs; Electric vehicles
published: 2024-05-13
Supplemental data for the paper titled 'Environmental modulators of algae-bacteria interactions at scale'. Each of the Excel workbooks corresponding to datasets 1, 2, and 3 contains a README sheet explaining the reported data. Dataset 4, which comprises microscopy data, contains a README text file describing the image files.
keywords: Algae-bacteria interactions; high-throughput; microfluidic-droplet platform
published: 2021-04-16
This dataset includes five files developed using the procedures described in the article 'Developing County-level Data of Nitrogen Fertilizer and Manure Inputs for Corn Production in the United States' and its Supplemental Information, published in the Journal of Cleaner Production in 2021. Citation: Xia, Yushu, Hoyoung Kwon, and Michelle Wander. "Developing county-level data of nitrogen fertilizer and manure inputs for corn production in the United States." Journal of Cleaner Production 309 (2021): e126957. Brief method: The fertilizer and manure inputs for corn were generated with a top-down approach by assigning county-level total N inputs reported by USGS to different crops using state- and county-level survey data. Corn N needs were estimated using empirical extension-based equations coupled with soil and environmental covariates. The estimates of fertilizer N inputs were further refined for corn grain and silage production at the county level, and gap-filling (using state-level averages) was carried out to generate the final files of U.S. county-level N inputs. The dataset is also provided in an alternative format in Google Earth Engine: https://code.earthengine.google.com/13a0078e7ee727bc001e045ad0e8c6fc
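The top-down step can be pictured as proportional allocation. A county's total N input is split among crops according to crop-specific weights, and counties without a usable value fall back to the state-level average. The sketch below is a simplified illustration under those assumptions, using made-up numbers and hypothetical helper names (`allocate_county_n`, `gap_fill`). The published method also relies on extension-based corn N-need equations with soil and environmental covariates, which are not modeled here.

```python
def allocate_county_n(county_total_n, crop_weights):
    """Split a county's total N input among crops in proportion to crop_weights.

    crop_weights: dict crop -> non-negative weight (e.g., crop area times an
    assumed N need). Returns dict crop -> allocated N, summing to county_total_n.
    """
    total = sum(crop_weights.values())
    if total <= 0:
        raise ValueError("crop weights must have a positive sum")
    return {crop: county_total_n * w / total for crop, w in crop_weights.items()}


def gap_fill(county_values, state_of, state_average):
    """Replace missing (None) county values with the corresponding state average."""
    return {
        county: value if value is not None else state_average[state_of[county]]
        for county, value in county_values.items()
    }


# Made-up numbers in arbitrary units, for illustration only.
print(allocate_county_n(1000.0, {"corn": 3.0, "wheat": 1.0}))  # {'corn': 750.0, 'wheat': 250.0}
print(gap_fill({"county_a": 120.0, "county_b": None},
               {"county_a": "IL", "county_b": "IL"}, {"IL": 150.0}))  # {'county_a': 120.0, 'county_b': 150.0}
```

Allocating proportionally guarantees that the crop-level estimates add back up to the county total reported by USGS, which is the defining property of a top-down approach.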
keywords: Corn; Nitrogen Fertilizer; Manure; Conterminous U.S.
published: 2024-05-13
Survey questions and data collected from Illinois land managers on their practices and knowledge relating to impacts on wildlife. In the response data, 0 indicates that an answer was not selected and 1 indicates that it was selected.
keywords: forestry management; online survey; wildlife
published: 2024-05-10
The data provided in this submission are the gene annotations for the Illinois EBP pilot project samples, as well as the predicted proteins for each sample in FASTA format.
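The predicted proteins are in FASTA format, in which each record starts with a `>` header line followed by one or more sequence lines. A minimal standard-library parser for counting proteins and measuring their lengths could look like the sketch below. The header text in the example is made up, so the identifiers in the actual files may be formatted differently.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream.

    Multi-line sequences are concatenated; blank lines are skipped, and any
    sequence lines that appear before the first '>' header are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records; with a real file, use: with open(path) as f: list(read_fasta(f))
example = io.StringIO(">protein_1 hypothetical\nMKTAYIAK\nQRQISFVK\n>protein_2\nMSEQ*\n")
records = list(read_fasta(example))
print(len(records))  # 2
print([len(seq) for _, seq in records])  # [16, 5]
```

A trailing `*` (stop codon) is kept as part of the sequence here, so strip it first if you need exact amino-acid lengths.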
keywords: Earth Biogenome Project;genome assembly;Insecta;non-model species;sequencing;annotation
published: 2023-11-14
This repository contains the training dataset associated with the 2023 Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics (DGM-Image Challenge), hosted by the American Association of Physicists in Medicine. This dataset contains more than 100,000 8-bit images, each 512 × 512 pixels. These images emulate coronal slices from anthropomorphic breast phantoms adapted from the VICTRE toolchain [1], with assigned X-ray attenuation coefficients relevant for breast computed tomography. Labels indicating the breast type are also included. The challenge has now concluded. More information about the challenge can be found here: <a href="https://www.aapm.org/GrandChallenge/DGM-Image/">https://www.aapm.org/GrandChallenge/DGM-Image/</a>. * New in V3: we added a CSV file containing the image breast type labels and example images (PNG).
keywords: Deep generative models; breast computed tomography
published: 2019-06-13
This lexicon is an expanded and enhanced version of the Moral Foundations Dictionary (MFD) created by Graham and colleagues (Graham et al., 2013). Our Enhanced Morality Lexicon (EML) contains 4,636 morality-related words. This lexicon was used in the following paper; please cite this paper if you use this resource in your work. Rezapour, R., Shah, S., & Diesner, J. (2019). Enhancing the measurement of social effects by capturing morality. Proceedings of the 10th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN. In addition, please consider citing the original MFD paper: <a href="https://doi.org/10.1016/B978-0-12-407236-7.00002-4">Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology (Vol. 47, pp. 55-130)</a>.
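A word lexicon like this one is typically applied by counting how many tokens in a text match lexicon entries. The sketch below assumes the entries are loaded as single words. It also supports trailing-asterisk stems such as `harm*`, a convention used in the original MFD; whether EML entries use it is an assumption to check against the file. The tokenizer and the tiny word list are illustrative stand-ins, not the EML's contents or the paper's measurement procedure.

```python
import re


def build_matcher(entries):
    """Split lexicon entries into exact words and '*'-suffixed stems."""
    exact, stems = set(), []
    for entry in entries:
        entry = entry.strip().lower()
        if entry.endswith("*"):
            if entry[:-1]:  # ignore a bare '*', which would match everything
                stems.append(entry[:-1])
        elif entry:
            exact.add(entry)
    return exact, tuple(stems)


def morality_score(text, exact, stems):
    """Return (number of matched tokens, share of tokens matched)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # str.startswith accepts a tuple, so one call checks every stem.
    hits = sum(1 for tok in tokens if tok in exact or tok.startswith(stems))
    return hits, (hits / len(tokens) if tokens else 0.0)


# Tiny illustrative word list; load the real EML entries from the dataset instead.
exact, stems = build_matcher(["fair", "harm*", "loyal"])
hits, share = morality_score("They harmed no one and were loyal and fair.", exact, stems)
print(hits, round(share, 3))  # 3 0.333
```

Reporting the share of matched tokens, not only the raw count, keeps scores comparable across texts of different lengths.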
keywords: lexicon; morality
planned publication date: 2024-07-31
This dataset contains all data and supplementary materials from "Improving precision and accuracy of genetic mapping with genotyping-by-sequencing data in outcrossing species". "Supplementary file 1" is an Excel file listing all QTLs and linkage group lengths (in cM) obtained with two different SNP-calling methods (Tassel-Uneak and Tassel-GBS), two genetic map-construction methods (linkage-only and reference order-corrected), and four depth filters (12x, 20x, 30x, and 40x) for genetic mapping of 18 biomass yield traits in a biparental Miscanthus sinensis population using RAD-Seq SNPs. “Supplementary file 2” is a Perl script with the code for filtering VCF and HapMap-formatted data files. The phenotype data used for QTL mapping are provided as “Supplementary File 3”. “Supplementary file 4” is a Perl script with the code for the simulation study.
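Depth filters such as 12x, 20x, 30x, and 40x are commonly applied by masking genotype calls whose read depth falls below the threshold. The Perl script in Supplementary file 2 is the actual filtering code. The Python sketch below only illustrates the idea, assuming a VCF whose FORMAT column contains GT and DP and treating low-depth calls as missing (`./.`). Whether the study applied its thresholds per genotype call, as here, or per site is an assumption to verify against that script.

```python
def mask_low_depth(vcf_line, min_depth):
    """Return a VCF data line with genotype calls below min_depth set to missing.

    Assumes FORMAT (column 9) lists both GT and DP (raises ValueError otherwise)
    and diploid calls. Calls whose DP is missing ('.') are also masked.
    Header lines (starting with '#') are returned unchanged.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    gt_i, dp_i = keys.index("GT"), keys.index("DP")
    samples = []
    for sample in fields[9:]:
        values = sample.split(":")
        dp = values[dp_i] if dp_i < len(values) else "."
        if dp == "." or int(dp) < min_depth:
            values[gt_i] = "./."
        samples.append(":".join(values))
    return "\t".join(fields[:9] + samples)


# Made-up VCF data line (tab-separated) with three samples.
line = "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:DP",
                  "0/1:25", "1/1:8", "0/0:."])
print(mask_low_depth(line, 12).split("\t")[9:])  # ['0/1:25', './.:8', './.:.']
```

Running the same input through each threshold in turn (12, 20, 30, 40) yields one filtered genotype set per depth filter.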
keywords: HapMapParser; GenotypingSimulator
published: 2024-05-07
Photographs and video of two Lesser Chameleons (Furcifer minor) nesting together at the same time near Itremo, Madagascar.
keywords: reproductive biology; ecology; Madagascar; lizard; eggs; reptile
published: 2024-05-07
Optical, AFM, and PFM images of α-In2Se3; short-circuit current and open-circuit voltage maps; I-V curves for different intensities; dependence of the short-circuit current density, open-circuit voltage, depolarization field, and efficiency on intensity and thickness; performance benchmarking.
published: 2024-05-07
This dataset builds on an existing dataset that captures the demographics of artists represented by top-tier galleries in the 2016–2017 New York art season (Case-Leal, 2017, https://web.archive.org/web/20170617002654/http://www.havenforthedispossessed.org/), adding a census of reviews and catalogs about those exhibitions to assess the proportionality of media coverage across race and gender. The readme file explains the variables, the data collection, and the relationship between the datasets, and gives an example of how the Case-Leal dataset was transformed. ArticleDataset.csv provides all articles with citation information, as well as the artist, artist identity characteristic, and gallery. ExhibitionCatalog.csv provides exhibition catalog citation information for each identified artist.
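One way to express proportionality of media coverage is a representation ratio: a group's share of articles divided by its share of exhibiting artists. A value of 1 means coverage in proportion to representation, above 1 means over-coverage, and below 1 means under-coverage. The sketch below assumes one identity label per artist and one covered artist per article, and uses made-up artists and labels. The real variables are described in the readme, and the study's own measure may differ.

```python
from collections import Counter


def representation_ratios(artist_groups, article_artists):
    """Map each group to (share of articles) / (share of artists).

    artist_groups: dict artist -> group label (e.g., a gender or race category).
    article_artists: list with the artist covered by each article.
    """
    artist_counts = Counter(artist_groups.values())
    article_counts = Counter(artist_groups[a] for a in article_artists)
    n_artists = len(artist_groups)
    n_articles = len(article_artists)
    ratios = {}
    for group, count in artist_counts.items():
        artist_share = count / n_artists
        # Counter returns 0 for groups that received no coverage at all.
        article_share = article_counts[group] / n_articles if n_articles else 0.0
        ratios[group] = article_share / artist_share
    return ratios


# Made-up artists and articles, for illustration only.
artists = {"A": "group1", "B": "group1", "C": "group2", "D": "group2"}
articles = ["A", "A", "B", "C"]
print(representation_ratios(artists, articles))  # {'group1': 1.5, 'group2': 0.5}
```

Normalizing by each group's share of artists, rather than comparing raw article counts, separates coverage bias from differences in how many artists from each group exhibited.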
keywords: diversity and inclusion; diversity audit; contemporary art; art exhibitions; art exhibition reviews; exhibition catalogs; magazines; newspapers; demographics