Illinois Data Bank

TextTransfer: Datasets for Impact Detection

Impact assessment is an evolving area of research that aims to measure and predict the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain of this field. A recurring obstacle is the absence of an efficient framework for analyzing lengthy reports and labeling text. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying the sections of project reports that indicate potential impacts. We use a mixed-method approach, combining manual annotation with supervised machine learning, to extract these passages from project reports. This repository stores the datasets and code related to this project.

Please read and cite the following paper if you would like to use the data:
Becker M., Han K., Werthmann A., Rezapour R., Lee H., Diesner J., and Witt A. (2024). Detecting Impact Relevant Sections in Scientific Research. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).

This folder contains the following files:
evaluation_20220927.ods: Annotated German passages (Artificial Intelligence, Linguistics, and Music) - training data
annotated_data.big_set.corrected.txt: Annotated German passages (Mobility) - training data
incl_translation_all.csv: Annotated English passages (Artificial Intelligence, Linguistics, and Music) - training data
incl_translation_mobility.csv: Annotated German passages (Mobility) - training data
ttparagraph_addmob.txt: German corpus (unannotated passages)
model_result_extraction.csv: Extracted impact-relevant passages from the German corpus based on the model we trained
rf_model.joblib: The random forest model we trained to extract impact-relevant passages

Data processing code can be found at: https://github.com/khan1792/texttransfer
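
For readers who want to check the downloaded files before running the project code, the following is a minimal sketch and is not taken from the authors' repository. It assumes the files sit in the current directory, that the CSV files are comma-delimited and UTF-8 encoded, and that `rf_model.joblib` can be read with `joblib.load` in an environment with a compatible scikit-learn version. The helper names `preview_csv` and `load_model` are hypothetical, and no column names are assumed.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) without assuming column names.

    Assumption: the file is comma-delimited and UTF-8 encoded.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


def load_model(path):
    """Load the serialized model from disk.

    Assumption: joblib and a scikit-learn version compatible with the one
    used for training are installed. Only unpickle files you trust.
    """
    import joblib  # imported lazily so the CSV preview works without it

    return joblib.load(path)


if __name__ == "__main__":
    data_dir = Path(".")  # assumption: files were downloaded here

    csv_path = data_dir / "incl_translation_all.csv"
    if csv_path.exists():
        header, rows = preview_csv(csv_path)
        print("columns:", header)
        print("rows previewed:", len(rows))

    model_path = data_dir / "rf_model.joblib"
    if model_path.exists():
        model = load_model(model_path)
        print("model type:", type(model).__name__)
```

Printing the header first shows which columns hold the passages and the annotations, which the file list above does not specify. The features the model expects are defined by the processing code in the GitHub repository, so the loaded model should be used together with that code.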

Subject: Social Sciences
Keywords: impact detection; project reports; annotation; mixed-methods; machine learning
License: CC0
Funder: German Federal Ministry of Education and Research (Grant: 01IO1634)
Corresponding creator: Kanyao Han
Version: 1 | DOI: 10.13012/B2IDB-9934303_V1 | Publication date: 2024-03-21


Contact the Research Data Service for help interpreting this log.

RelatedMaterial create: {"material_type"=>"Conference paper", "availability"=>nil, "link"=>"https://aclanthology.org/2024.lrec-main.424", "uri"=>"https://aclanthology.org/2024.lrec-main.424", "uri_type"=>"URL", "citation"=>"Maria Becker, Kanyao Han, Antonina Werthmann, Rezvaneh Rezapour, Haejin Lee, and Jana Diesner. 2024. Detecting Impact Relevant Sections in Scientific Research. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4744–4749, Torino, Italy. ELRA and ICCL.", "dataset_id"=>2664, "selected_type"=>"Other", "datacite_list"=>"IsSupplementTo", "note"=>nil, "feature"=>nil} 2024-05-20T18:12:28Z
RelatedMaterial update: {"uri"=>["", "https://github.com/khan1792/texttransfer"], "uri_type"=>["", "URL"], "citation"=>["", "https://github.com/khan1792/texttransfer"], "datacite_list"=>["", "IsSupplementTo"]} 2024-05-20T18:12:28Z
RelatedMaterial update: {"uri"=>[nil, ""], "uri_type"=>[nil, ""], "datacite_list"=>[nil, ""], "note"=>[nil, ""], "feature"=>[nil, false]} 2024-03-22T14:10:01Z
Dataset update: {"subject"=>["", "Social Sciences"]} 2024-03-22T14:10:01Z
RelatedMaterial create: {"material_type"=>"Code", "availability"=>nil, "link"=>"https://github.com/khan1792/texttransfer", "uri"=>nil, "uri_type"=>nil, "citation"=>"", "dataset_id"=>2664, "selected_type"=>"Code", "datacite_list"=>nil, "note"=>nil, "feature"=>nil} 2024-03-22T00:07:31Z
Funder update: {"name"=>["Leibniz Institute for the German Language", "German Federal Ministry of Education and Research"], "grant"=>["", "01IO1634"]} 2024-03-22T00:07:31Z
Dataset update: {"keywords"=>["impact detection, project reports, annotation, mixed-methods, machine learning", "impact detection; project reports; annotation; mixed-methods; machine learning"], "version_comment"=>[nil, ""], "subject"=>[nil, ""]} 2024-03-21T20:01:32Z