Illinois Data Bank - Dataset

Version | DOI | Comment | Publication Date
3 | 10.13012/B2IDB-0651259_V3 | Include new data | 2025-03-14
2 | 10.13012/B2IDB-0651259_V2 | The dataset was modified due to revision requirements from the journal of submission. | 2025-01-29
1 | 10.13012/B2IDB-0651259_V1 | | 2024-03-09

3.94 KB File
103 KB File
1.28 GB File
64 MB File

Contact the Research Data Service for help interpreting this log.

Dataset update: {"publication_state"=>["version candidate under curator review", "released"], "release_date"=>[nil, Fri, 14 Mar 2025]} 2025-03-14T17:00:43Z
Dataset update: {"description"=>["Hype - PubMed dataset\r\nPrepared by Apratim Mishra\r\n\r\nThis dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection chosen is ‘journal articles’ written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their abstract location. Therefore, each article (PMID) might have multiple instances in the dataset due to the presence of multiple hype words in different abstract sentences.\r\n\r\nThe candidate hype words are 35 in count: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful’.\r\n\r\nThis is version 2 of the dataset. Changes include:\r\n\r\nAdded “Year” variable.\r\nRemoved “Abstract length” variable.\r\nModified variable information due to updated probabilistic model of hype.\r\nNumber of hype words - 35 (updated from 36 based on revised findings).\r\n\r\nFile 1: hype_dataset_final.tsv\r\n\r\nPrimary dataset. It has the following columns:\r\n\r\n1. PMID: represents unique article ID in PubMed\r\n2. Year: Year of publication\r\n3. Hype_word: Candidate hype word, such as ‘novel.’\r\n4. Sentence: Sentence in abstract containing the hype word.\r\n5. Hype_percentile: Abstract relative position of hype word.\r\n6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.\r\n7. Introduction: The ‘I’ component of the hype word based on IMRaD\r\n8. Methods: The ‘M’ component of the hype word based on IMRaD\r\n9. Results: The ‘R’ component of the hype word based on IMRaD\r\n10. Discussion: The ‘D’ component of the hype word based on IMRaD\r\n\r\nFile 2: hype_removed_phrases_final.tsv\r\n\r\nSecondary dataset with same columns as File 1.\r\nHype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately. Removed phrases:\r\n\r\n1. Major: histocompatibility, component, protein, metabolite, complex, surgery\r\n2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid\r\n3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment\r\n4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values\r\n5. Essential: medium, features, properties, opportunities, oil\r\n6. Unique: model, amino\r\n7. Robust: regression\r\n8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information\r\n9. Outstanding: questions, issues, question, questions, challenge, problems, problem, remains\r\n10. Remarkable: properties\r\n11. Definite: radiotherapy, surgery", "Hype - PubMed dataset\r\nPrepared by Apratim Mishra\r\n\r\nThis dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection chosen is ‘journal articles’ written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their abstract location. Therefore, each article (PMID) might have multiple instances in the dataset due to the presence of multiple hype words in different abstract sentences.\r\n\r\nThe candidate hype words are 35 in count: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful’.\r\n\r\nThis is version 3 of the dataset. Added new file - WSD_hype.tsv\r\n\r\nFile 1: hype_dataset_final.tsv\r\n\r\nPrimary dataset. It has the following columns:\r\n\r\n1. PMID: represents unique article ID in PubMed\r\n2. Year: Year of publication\r\n3. Hype_word: Candidate hype word, such as ‘novel.’\r\n4. Sentence: Sentence in abstract containing the hype word.\r\n5. Hype_percentile: Abstract relative position of hype word.\r\n6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.\r\n7. Introduction: The ‘I’ component of the hype word based on IMRaD\r\n8. Methods: The ‘M’ component of the hype word based on IMRaD\r\n9. Results: The ‘R’ component of the hype word based on IMRaD\r\n10. Discussion: The ‘D’ component of the hype word based on IMRaD\r\n\r\nFile 2: hype_removed_phrases_final.tsv\r\n\r\nSecondary dataset with same columns as File 1.\r\nHype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately. Removed phrases:\r\n\r\n1. Major: histocompatibility, component, protein, metabolite, complex, surgery\r\n2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid\r\n3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment\r\n4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values\r\n5. Essential: medium, features, properties, opportunities, oil\r\n6. Unique: model, amino\r\n7. Robust: regression\r\n8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information\r\n9. Outstanding: questions, issues, question, questions, challenge, problems, problem, remains\r\n10. Remarkable: properties\r\n11. Definite: radiotherapy, surgery\r\n\r\nFile 3: WSD_hype.tsv\r\nIncludes hype-based disambiguation for candidate words targeted for WSD (Word sense disambiguation)"]} 2025-03-13T17:20:58Z
Dataset update: {"hold_state"=>["version candidate under curator review", "none"]} 2025-03-13T15:42:56Z
Dataset update: {"version_comment"=>[nil, "Include new data"]} 2025-03-13T04:35:22Z
RelatedMaterial create: {"material_type"=>"Dataset", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-0651259_V2", "uri"=>"10.13012/B2IDB-0651259_V2", "uri_type"=>"DOI", "citation"=>"Mishra, Apratim; Diesner, Jana; Torvik, Vetle I. (2025): Hype - PubMed dataset. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-0651259_V2", "dataset_id"=>2917, "selected_type"=>"Dataset", "datacite_list"=>"IsNewVersionOf", "note"=>nil, "feature"=>nil} 2025-03-13T04:35:06Z
Creator create: {"family_name"=>"Torvik", "given_name"=>"Vetle I.", "identifier"=>"0000-0002-0035-1850", "email"=>"jdiesner@illinois.edu", "is_contact"=>false, "row_position"=>3} 2025-03-13T04:35:06Z
Creator create: {"family_name"=>"Diesner", "given_name"=>"Jana", "identifier"=>"0000-0001-8183-7109", "email"=>"vtorvik@illinois.edu", "is_contact"=>false, "row_position"=>2} 2025-03-13T04:35:05Z
Creator create: {"family_name"=>"Mishra", "given_name"=>"Apratim", "identifier"=>"0000-0002-2946-308X", "email"=>"apratim3@illinois.edu", "is_contact"=>true, "row_position"=>1} 2025-03-13T04:35:04Z
Dataset update: {"corresponding_creator_name"=>[nil, "Apratim Mishra"], "corresponding_creator_email"=>[nil, "apratim3@illinois.edu"]} 2025-03-13T04:35:04Z
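
The description recorded in the log above lists ten columns for hype_dataset_final.tsv and notes that one article (PMID) can appear in several rows, one per hype word and sentence. As a minimal sketch of working with that layout, the Python below reads a tab-separated file with those columns and groups rows by PMID. It assumes the file has a header row using exactly the listed column names and contains no CSV-style quoting; neither is confirmed by this log. The function name group_by_pmid and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in the dataset description. That the file carries
# them as a header row is an assumption.
COLUMNS = [
    "PMID", "Year", "Hype_word", "Sentence", "Hype_percentile",
    "Hype_value", "Introduction", "Methods", "Results", "Discussion",
]


def group_by_pmid(handle):
    """Group rows of a tab-separated hype file by PMID (hypothetical helper)."""
    # QUOTE_NONE treats quote characters in sentences as literal text;
    # this is an assumption about how the file was written.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["PMID"]].append(row)
    return dict(grouped)


# Made-up rows for illustration only; all values are placeholders.
SAMPLE = "\n".join([
    "\t".join(COLUMNS),
    "1\t2000\tnovel\tWe describe a novel approach.\t0.1\t0.5\t0.7\t0.1\t0.1\t0.1",
    "1\t2000\tpromising\tResults are promising.\t0.9\t0.6\t0.1\t0.1\t0.2\t0.6",
    "2\t2010\tmajor\tThis is a major finding.\t0.8\t0.4\t0.1\t0.1\t0.3\t0.5",
]) + "\n"

grouped = group_by_pmid(io.StringIO(SAMPLE))
for pmid, rows in grouped.items():
    # One article may contribute several hype-word rows.
    print(pmid, [r["Hype_word"] for r in rows])
```

To try this on the real file, an open file handle could be passed in place of the io.StringIO sample, for example open("hype_dataset_final.tsv", newline="", encoding="utf-8"); the encoding is an assumption. Because the helper holds every row in memory, a streaming aggregate may be preferable for a large file.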