Dataset
|
update: {"description"=>["Hype - PubMed dataset\r\nPrepared by Apratim Mishra\r\n\r\nThis dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection comprises English-language ‘journal articles’ published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their location within the abstract. Therefore, each article (PMID) may appear multiple times in the dataset if its abstract contains multiple hype words across different sentences.\r\n\r\nThere are 35 candidate hype words: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful'.\r\n\r\nThis is version 2 of the dataset. Changes include:\r\n\r\nAdded the “Year” variable.\r\nRemoved the “Abstract length” variable.\r\nModified variable information due to an updated probabilistic model of hype.\r\nNumber of hype words: 35 (updated from 36 based on revised findings).\r\n\r\nFile 1: hype_dataset_final.tsv\r\n\r\nPrimary dataset. It has the following columns:\r\n\r\n1. PMID: Unique article ID in PubMed\r\n2. Year: Year of publication\r\n3. Hype_word: Candidate hype word, such as ‘novel.’\r\n4. Sentence: Sentence in the abstract containing the hype word.\r\n5. Hype_percentile: Relative position of the hype word within the abstract.\r\n6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.\r\n7. Introduction: The ‘I’ component of the hype word based on IMRaD\r\n8. Methods: The ‘M’ component of the hype word based on IMRaD\r\n9. Results: The ‘R’ component of the hype word based on IMRaD\r\n10. Discussion: The ‘D’ component of the hype word based on IMRaD\r\n\r\nFile 2: hype_removed_phrases_final.tsv\r\n\r\nSecondary dataset with the same columns as File 1.\r\nThe primary dataset excludes certain phrases that are rarely hype. These removed phrases are included in File 2 and modeled separately. Removed phrases:\r\n\r\n1. Major: histocompatibility, component, protein, metabolite, complex, surgery\r\n2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid\r\n3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment\r\n4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values\r\n5. Essential: medium, features, properties, opportunities, oil\r\n6. Unique: model, amino\r\n7. Robust: regression\r\n8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information\r\n9. Outstanding: questions, issues, question, challenge, problems, problem, remains\r\n10. Remarkable: properties\r\n11. Definitive: radiotherapy, surgery",
"Hype - PubMed dataset\r\nPrepared by Apratim Mishra\r\n\r\nThis dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection comprises English-language ‘journal articles’ published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their location within the abstract. Therefore, each article (PMID) may appear multiple times in the dataset if its abstract contains multiple hype words across different sentences.\r\n\r\nThere are 35 candidate hype words: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful'.\r\n\r\nThis is version 3 of the dataset. It adds a new file, WSD_hype.tsv.\r\n\r\nFile 1: hype_dataset_final.tsv\r\n\r\nPrimary dataset. It has the following columns:\r\n\r\n1. PMID: Unique article ID in PubMed\r\n2. Year: Year of publication\r\n3. Hype_word: Candidate hype word, such as ‘novel.’\r\n4. Sentence: Sentence in the abstract containing the hype word.\r\n5. Hype_percentile: Relative position of the hype word within the abstract.\r\n6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.\r\n7. Introduction: The ‘I’ component of the hype word based on IMRaD\r\n8. Methods: The ‘M’ component of the hype word based on IMRaD\r\n9. Results: The ‘R’ component of the hype word based on IMRaD\r\n10. Discussion: The ‘D’ component of the hype word based on IMRaD\r\n\r\nFile 2: hype_removed_phrases_final.tsv\r\n\r\nSecondary dataset with the same columns as File 1.\r\nThe primary dataset excludes certain phrases that are rarely hype. These removed phrases are included in File 2 and modeled separately. Removed phrases:\r\n\r\n1. Major: histocompatibility, component, protein, metabolite, complex, surgery\r\n2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid\r\n3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment\r\n4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values\r\n5. Essential: medium, features, properties, opportunities, oil\r\n6. Unique: model, amino\r\n7. Robust: regression\r\n8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information\r\n9. Outstanding: questions, issues, question, challenge, problems, problem, remains\r\n10. Remarkable: properties\r\n11. Definitive: radiotherapy, surgery\r\n\r\nFile 3: WSD_hype.tsv\r\n\r\nIncludes hype-based disambiguation for the candidate words targeted for word sense disambiguation (WSD)."]}
|
2025-03-13T17:20:58Z
|
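The phrase exclusion that separates the primary file (hype_dataset_final.tsv) from the removed-phrases file (hype_removed_phrases_final.tsv) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's pipeline: the description does not say how phrases were matched, so the sketch assumes a phrase is removed when a listed word sits immediately before or after the hype word in the sentence (this covers both "vital signs" and "remains outstanding"). The helper names `is_removed_phrase` and `split_rows`, the unquoted tab-separated layout, and the sample sentences are all hypothetical. Only the column names `PMID`, `Year`, `Hype_word` and `Sentence` and the removed-phrase lists come from the description, with the radiotherapy/surgery entry assigned to the candidate word 'definitive'.

```python
import csv
import io
import re

# Removed-phrase lists from the File 2 description, keyed by candidate hype word.
# ASSUMPTION: the radiotherapy/surgery entry belongs to the hype word 'definitive'.
REMOVED_PHRASES = {
    "major": {"histocompatibility", "component", "protein", "metabolite", "complex", "surgery"},
    "novel": {"assay", "mutation", "antagonist", "inhibitor", "algorithm", "technique",
              "series", "method", "hybrid"},
    "central": {"catheters", "system", "design", "composite", "catheter", "pressure",
                "thickness", "compartment"},
    "critical": {"compartment", "micelle", "temperature", "incident", "solution", "ischemia",
                 "concentration", "thinking", "nurses", "skills", "analysis", "review",
                 "appraisal", "evaluation", "values"},
    "essential": {"medium", "features", "properties", "opportunities", "oil"},
    "unique": {"model", "amino"},
    "robust": {"regression"},
    "vital": {"capacity", "signs", "organs", "status", "structures", "staining", "rates",
              "cells", "information"},
    "outstanding": {"questions", "issues", "question", "challenge", "problems", "problem",
                    "remains"},
    "remarkable": {"properties"},
    "definitive": {"radiotherapy", "surgery"},
}

WORD = re.compile(r"[a-z]+")


def is_removed_phrase(sentence: str, hype_word: str) -> bool:
    """Hypothetical matching rule (not confirmed by the dataset description):
    True if any occurrence of hype_word is immediately preceded or followed
    by one of the words listed for it."""
    target = hype_word.lower()
    listed = REMOVED_PHRASES.get(target, set())
    tokens = WORD.findall(sentence.lower())
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Neighbouring tokens: the one before (if any) and the one after (if any).
        neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
        if listed.intersection(neighbours):
            return True
    return False


def split_rows(tsv_text: str):
    """Hypothetical helper: partition TSV rows into (primary, removed) lists."""
    # ASSUMPTION: fields are tab-separated and unquoted.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    primary, removed = [], []
    for row in reader:
        bucket = removed if is_removed_phrase(row["Sentence"], row["Hype_word"]) else primary
        bucket.append(row)
    return primary, removed


# Made-up rows for illustration only; they are not taken from the dataset.
sample = (
    "PMID\tYear\tHype_word\tSentence\n"
    "1\t2001\tmajor\tThe major histocompatibility complex was typed.\n"
    "2\t2005\tnovel\tWe report a novel mechanism of resistance.\n"
)
primary, removed = split_rows(sample)
print([r["PMID"] for r in primary], [r["PMID"] for r in removed])  # ['2'] ['1']
```

Keeping the matching rule in a single function makes it easy to swap in a different criterion (for example, following word only) if the actual rule used to build File 2 differs.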