Illinois Data Bank

Comparison of NLM Publication Type Indexing in 2016 and 2025 vs. the Multi-Tagger Model vs. the Transformer Metadata Model

[Title:] Comparison of NLM Publication Type Indexing in 2016 and 2025 vs. the Multi-Tagger Model vs. the Transformer Metadata Model

[Contributors:] Puranjani Das, Jodi Schneider, Evan Mayo-Wilson, Dongin Nam, Kiran Ninan, Jean-Pierre Oberste, Ang Michael Troy, Xiangji Ying, Arthur W. Holt, Neil R. Smalheiser

[Publisher:] University of Illinois Urbana-Champaign Databank

[Publication Year:] 2026

[Funding:] This work was supported by National Institutes of Health (NIH)/National Library of Medicine (NLM) grant number R01LM014292.

[Keywords:] Study designs; evidence-based medicine; databases, bibliographic; Indexing; Study Characteristics; National Library of Medicine (U.S.)

[License:] CC-BY

[Corresponding Creators:] Puranjani Das (puranjanidas02@gmail.com)

[Preferred Citation:] Das, Puranjani; Schneider, Jodi; Mayo-Wilson, Evan; Nam, Dongin; Ninan, Kiran; Oberste, Jean-Pierre; Troy, Ang Michael; Ying, Xiangji; Holt, Arthur W.; Smalheiser, Neil R. (2026): Comparison of NLM Publication Type Indexing in 2016 and 2025 vs. the Multi-Tagger Model vs. the Transformer Metadata Model. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-8307201_V1

[Related Articles:] Puranjani Das, Jodi Schneider, Evan Mayo-Wilson, Halil Kilicoglu, Joe D. Menke, Dongin Nam, Kiran Ninan, Jean-Pierre Oberste, Ang Michael Troy, Xiangji Ying, Arthur W. Holt, Neil R. Smalheiser. Study design indexing in transition: A focused comparison of manual NLM indexing vs. transformer-based automated models.
[For an updated list please check the data description metadata field in the Illinois Databank]

[Data Description:] The raw data from this study was further filtered and analyzed in the accompanying preprint 'When Study Design Indexing Systems Disagree: A Focused Comparison of NLM Indexing vs. a Transformer-based Automated Model?' (mentioned in Related Articles)

The sample consists of articles (title, abstract) taken from PubMed, from 2016 and from 2025, that are MeSH-indexed by the National Library of Medicine (NLM). We look specifically at 5 study designs: case-control study, case report, cross-sectional study, cohort study, and systematic review. NLM indexing is compared with 2 automated indexing models: Multi-Tagger ('MT') and the Transformer Metadata model ('TM'). The value for TM or MT is marked TRUE if the score given by that model is above its F1 threshold [1,2]. For NLM, we marked the value TRUE if NLM indexed the article with the given study design and FALSE if it did not.
In the sample, we divided the articles into 7 categories:
TTT - MT, TM and NLM marked TRUE
TTF - MT and TM marked TRUE and NLM marked FALSE
TFT - MT marked TRUE, TM marked FALSE, NLM marked TRUE
TFF - MT marked TRUE, TM and NLM marked FALSE
FTT - MT marked FALSE, TM and NLM marked TRUE
FTF - MT marked FALSE, TM marked TRUE, NLM marked FALSE
FFT - MT and TM marked FALSE, NLM marked TRUE
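
The category labels follow mechanically from the three binary values. The sketch below is illustrative only: the function name `category_label` and its parameters are hypothetical and are not part of the deposit. The thresholds in the usage lines are the 2016 cohort study values quoted in the Sample summary.csv example row in this README. Articles marked FALSE by all three approaches (FFF) are not among the 7 categories, so the function returns None for them.

```python
from typing import Optional


def category_label(mt_score: float, tm_score: float, nlm_indexed: bool,
                   mt_threshold: float, tm_threshold: float) -> Optional[str]:
    """Return the three-letter category (order: MT, TM, NLM), or None for FFF."""
    # A model marks TRUE when its score is greater than its F1 threshold.
    mt_true = mt_score > mt_threshold
    tm_true = tm_score > tm_threshold
    label = "".join("T" if flag else "F" for flag in (mt_true, tm_true, nlm_indexed))
    # FFF (no approach marked TRUE) is not one of the 7 categories.
    return None if label == "FFF" else label


# Hypothetical scores; thresholds are the 2016 cohort study values (MT 0.2347, TM 0.314).
print(category_label(0.80, 0.95, False, 0.2347, 0.314))  # TTF
print(category_label(0.10, 0.05, True, 0.2347, 0.314))   # FFT
```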

For manual annotation, we selected a subset of the sample: the 100 top-ranked articles in the TTF category and the 100 lowest-ranked articles in the FFT category, ranked by TM predictive score. The manual annotation was performed by Evan Mayo-Wilson, Kiran Ninan, Jean-Pierre Oberste, Xiangji Ying, Ang Michael Troy and Neil R. Smalheiser.
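
One way to read this selection rule is sketched below. It is an assumption-laden illustration, not the authors' actual procedure: the record fields `pmid`, `category` and `tm_score` and the function `select_for_annotation` are hypothetical names, and the rule is interpreted as "highest TM scores for TTF, lowest TM scores for FFT".

```python
import heapq


def select_for_annotation(articles, category, n=100):
    """Pick n articles: highest TM scores for TTF, lowest TM scores for FFT."""
    pool = [a for a in articles if a["category"] == category]
    if category == "TTF":
        return heapq.nlargest(n, pool, key=lambda a: a["tm_score"])
    if category == "FFT":
        return heapq.nsmallest(n, pool, key=lambda a: a["tm_score"])
    raise ValueError("Manual annotation covers only TTF and FFT")


# Made-up records with hypothetical field names; not data from the deposit.
articles = [
    {"pmid": "1", "category": "TTF", "tm_score": 0.99},
    {"pmid": "2", "category": "TTF", "tm_score": 0.40},
    {"pmid": "3", "category": "FFT", "tm_score": 0.01},
    {"pmid": "4", "category": "FFT", "tm_score": 0.20},
]
print([a["pmid"] for a in select_for_annotation(articles, "TTF", n=1)])  # ['1']
print([a["pmid"] for a in select_for_annotation(articles, "FFT", n=1)])  # ['3']
```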

FILE FORMATS: There are 6 types of CSV files in this data deposit. There are also 2 text files: README.txt and Definitions and annotator notes for manual examination.txt

FILE NAMES:
There are 6 types of csv files in this deposit:
a) Sample summary.csv: This file contains the count of each category of articles. It also contains the F1 thresholds for TM and MT on the basis of which the articles were marked as TRUE or FALSE by TM and MT.

b) Sample statistics.csv: This file contains the statistics of the entire sample based on the sample summary.csv file.

c) Bin-wise distribution of manual annotation sheets.csv: This file contains which bin value (A-P) corresponds to which manual annotation spreadsheet.

d) Manual Annotation file(s): There are 16 CSV files of this type. The file names follow the format 'Manual Annotation [STUDY DESIGN] ([YEAR] [CATEGORY]).csv':
Manual Annotation cohort studies (2016 TTF).csv
Manual Annotation cohort studies (2016 FFT).csv
Manual Annotation cohort studies (2025 TTF).csv
Manual Annotation cohort studies (2025 FFT).csv
Manual Annotation case-control studies (2016 TTF).csv
Manual Annotation case-control studies (2016 FFT).csv
Manual Annotation case-control studies (2025 TTF).csv
Manual Annotation case-control studies (2025 FFT).csv
Manual Annotation case reports (2016 TTF).csv
Manual Annotation case reports (2016 FFT).csv
Manual Annotation case reports (2025 TTF).csv
Manual Annotation case reports (2025 FFT).csv
Manual Annotation cross-sectional studies (2016 TTF).csv
Manual Annotation cross-sectional studies (2016 FFT).csv
Manual Annotation cross-sectional studies (2025 TTF).csv
Manual Annotation cross-sectional studies (2025 FFT).csv

e) Inter-annotator agreement-disagreement numbers before reconciliation.csv: This CSV file contains four article counts for each study design and year. Three are counts of articles on which the annotators agreed: that the article is of the study design, that it is not of the study design, or that it is uncertain whether it is of the study design. The fourth is the number of articles on which the annotators disagreed. The counts in each row add up to 100.

f) F1, recall and precision of MT and TM.csv: This CSV file contains the F1, recall and precision scores output by the models TM and MT.

There are 2 text files in this deposit:
- README.txt: This file contains a detailed description of the files in this deposit.
- Definitions and annotator notes for manual examination.txt: This file contains the definitions and annotation notes the annotators used for manually annotating the articles. The 16 manual annotation CSV files were created based on this document.

COLUMN HEADER EXPLANATIONS:
a) Sample summary.csv:
Study Design: The study design for which we are checking the predictions of TM and MT and the indexing from NLM
Year: The year for which we are checking the prediction of TM and MT and the indexing from NLM for the study design mentioned in the column 'Study Design'
F1 threshold for MT: The score greater than which articles are marked as TRUE (for the study design mentioned in column 'Study Design') by MT
F1 threshold for TM: The score greater than which articles are marked as TRUE (for the study design mentioned in column 'Study Design') by TM
MT binary value at F1 threshold: TRUE or FALSE based on the threshold mentioned in column 'F1 threshold for MT'.
TM binary value at F1 threshold: TRUE or FALSE based on the threshold mentioned in column 'F1 threshold for TM'.
NLM binary value: TRUE if the articles are indexed by NLM as of the study design mentioned in column 'Study Design' and FALSE if not.
Category: Values selected from the 7 categories TTT, TTF, TFT, TFF, FTT, FTF, FFT described above in the Data Description section.
Count (Total): Total number of articles in the category described in the column 'Category'
Count (TM with score >= 0.9): Total number of articles in the category described in the column 'Category' for which TM gave a score equal to or above 0.9 for the study design mentioned in the column 'Study Design'
Count (TM with score <= 0.1): Total number of articles in the category described in the column 'Category' for which TM gave a score equal to or below 0.1 for the study design mentioned in the column 'Study Design'

Example Row [The comma (,) is replaced with bar (|)]:
cohort study| 2016| 0.2347| 0.314| TTT| TRUE| TRUE| TRUE| 3398| 20| 0
Interpretation:
- For articles from 2016, we check their TRUE/FALSE marking according to TM, MT and NLM, with respect to the study design cohort study.
- For 2016, MT marks TRUE if the predictive score given is greater than 0.2347.
- For 2016, TM marks TRUE if the predictive score given is greater than 0.314.
- For the category of TTT, we define them as MT, TM and NLM marked as TRUE.
- In 2016 with respect to the study design cohort study, total number of articles in the TTT category was 3398.
- In 2016 with respect to the study design cohort study, the total number of articles in the TTT category to which TM gave a predictive score equal to or greater than 0.9 was 20.
- In 2016 with respect to the study design cohort study, the total number of articles in the TTT category to which TM gave a predictive score equal to or lower than 0.1 was 0.

b) Sample statistics.csv
Study Design: The study design for which we are checking the TRUE predictions from TM and MT and indexing from NLM
Year: The year for which we are checking the TRUE predictions from TM and MT and indexing from NLM for the study design mentioned in the column 'Study Design'
Indexing approach: The indexing approach whose TRUE predictions (TM, MT) or indexing (NLM) is being checked.
Total TRUE: Number of articles in the year (mentioned in the column 'Year') that were marked TRUE by the given indexing approach (mentioned in the column 'Indexing approach')
Correct TRUE: Number of articles in the year (mentioned in the column 'Year') that were marked TRUE both by the given indexing approach (mentioned in the column 'Indexing approach') and by NLM
Correct TRUE (%): (Correct TRUE/Total number of articles marked as TRUE by NLM)*100%
[The columns 'Correct TRUE' and 'Correct TRUE (%)' are empty when the 'Indexing approach' column is NLM, because 'Correct TRUE' and its percentage are calculated relative to NLM. For NLM, 'Correct TRUE' would always equal 'Total TRUE' and 'Correct TRUE (%)' would always be 100%.]

Example Row:
cohort study| 2016| MT| 12389| 3632| 36%
Interpretation:
- For articles from 2016, we are checking their TRUE/FALSE marking from TM, MT and NLM, with respect to the study design cohort study.
- For 2016, 12389 articles were marked as TRUE by MT for cohort study.
- For 2016, 3632 articles marked as TRUE by MT were also marked as TRUE by NLM for cohort study.
- For 2016, 36% of the articles marked as TRUE by NLM for cohort study were also marked as TRUE by MT (the denominator is the NLM total, as defined for 'Correct TRUE (%)').
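
The 'Correct TRUE (%)' formula defined above can be written out as a small function. This is only a sketch of the stated formula: the function name `correct_true_percent` is hypothetical, and the numbers in the usage line are made up to show the arithmetic. Note that the denominator is the number of articles NLM marked TRUE, not the 'Total TRUE' of the approach being checked.

```python
def correct_true_percent(correct_true: int, nlm_total_true: int) -> float:
    """(Correct TRUE / total articles marked TRUE by NLM) * 100."""
    if nlm_total_true <= 0:
        raise ValueError("NLM must have marked at least one article TRUE")
    return 100.0 * correct_true / nlm_total_true


# Made-up counts for illustration only; not values from the deposit.
print(correct_true_percent(50, 200))  # 25.0
```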

c) Bin-wise distribution of manual annotation sheets.csv
Bin: Named from A-P (used for internal organization of the manual annotation spreadsheets)
Study Type: The study design for which the manual examination is done
Year: The year for which the manual examination is done
Category: The category for which the manual examination is done. There are 7 categories described above, but manual annotation is done for only two of them: TTF and FFT. Therefore this field contains either TTF or FFT.

d) Manual Annotation file(s):
There are two groups of manual annotation files, annotated by 2 groups of annotators. The column headers differ depending on which group annotated the files. Blank cells are intentional: the annotators did not find any valuable information to record for those cells.

The first group of annotators (Evan Mayo-Wilson, Kiran Ninan, Jean-Pierre Oberste, Xiangji Ying) annotated the CSV files which have cohort studies and case-control studies mentioned in their names. Their columns are
batch: Bin number A-H (used for internal organization)
pmid: PubMed ID taken from PubMed
title: Title of the article taken from PubMed
abstract: Abstract of the article taken from PubMed
cohort: Answers if the study is a cohort study. Yes if it is, No if it is not and Can't tell if the annotators are uncertain if the study is a cohort study.
prospective: If the cohort column is Yes, then this column records if the study is a prospective study. Yes if it is, No if it is not and Can't tell if the annotators are uncertain if the study is a prospective study.
unit_is_individual_people: If the cohort column is Yes, then this column records if the unit mentioned in the study is individual people. Yes if it is, No if it is not and Can't tell if the annotators are uncertain if the unit mentioned in the study is individual people.
evaluates_the_effect_of_an: If the cohort column is Yes, then this column records if the study evaluates the effect of an exposure observed after the start of the cohort on an outcome. Yes if it does, No if it does not and Can't tell if the annotators are uncertain if the study evaluates the effect of an exposure observed after the start of the cohort on an outcome.
case_control: Answers if the study is a case-control study. Yes if it is, No if it is not and Can't tell if the annotators are uncertain if the study is a case-control study.
other_study_design: If the columns cohort and case_control are both No, then the annotators filled this column with the study design they thought to be appropriate among the following options: Cross-sectional, RCT, SR and Other.
comment: Any additional information regarding the article indexing. Not mandatory field, used only if the annotators deemed it necessary.
used FT: If the annotators used full-text for annotation it has 'Yes' marked, otherwise empty

The second group of annotators (Neil R. Smalheiser and Ang Michael Troy) annotated the CSV files which have case reports and cross-sectional studies mentioned in their names. Their columns are
batch: Bin number I-P (used for internal organization)
pmid: PubMed ID taken from PubMed
title: Title of the article taken from PubMed
abstract: Abstract of the article taken from PubMed
annotation: Judgement by the annotator about the article. There are the following types of judgement:
IS_PT : The article is of the given study design
IS_NOT_PT: The article is not of the given study design
UNCERTAIN: Not conclusive if the article is of or not of the given study design
UNCERTAIN_CS: Not conclusive if the article is of or not of the given study design but could be a case series
UNCERTAIN_nonclinical: Not conclusive if the article is of or not of the given study design but could be a nonclinical study
comment: Any additional information regarding the article indexing. Not mandatory field, used only if the annotators deemed it necessary.
used FT: If the annotators used full-text for annotation it has 'Yes' marked, otherwise empty.
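
As a usage illustration, the annotation labels in a second-group file can be tallied with the standard `csv` module. The sketch below runs on two made-up rows laid out with the column headers listed above; the rows are not data from the deposit and `tally_annotations` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Two made-up rows in the second group's column layout; not data from the deposit.
sample = io.StringIO(
    "batch,pmid,title,abstract,annotation,comment,used FT\n"
    "I,00000001,Hypothetical title 1,Hypothetical abstract 1,IS_PT,,\n"
    "I,00000002,Hypothetical title 2,Hypothetical abstract 2,UNCERTAIN_CS,possible case series,Yes\n"
)


def tally_annotations(handle) -> Counter:
    """Count how often each annotation label occurs."""
    return Counter(row["annotation"].strip() for row in csv.DictReader(handle))


counts = tally_annotations(sample)
print(counts["IS_PT"], counts["UNCERTAIN_CS"])  # 1 1
```

To apply this to one of the deposited files, the same function could be given a handle from `open(path, newline="")`; the file encoding is not stated in this README and would need to be checked.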

e) Inter-annotator agreement-disagreement numbers before reconciliation.csv:
Study Design: For articles in the TOP 100, the study design assigned to the annotated articles by TM. For articles in the LOWEST 100, the study design assigned to the annotated articles by NLM.
Year: Indexing year of the study designs according to NLM
Initial agreement on: Is the study design: Number of articles that the annotators agree is of the study design mentioned in column Study Design
Initial agreement on: Is not the study design: Number of articles that the annotators agree is not of the study design mentioned in column Study Design
Initial agreement on: Uncertain if the study design: Number of articles that the annotators agree is uncertain if it is of the study design mentioned in column Study Design or not
Initial Agreement (%): Sum of the columns 'Initial agreement on: Is the study design', 'Initial agreement on: Is not the study design', and 'Initial agreement on: Uncertain if the study design'. Because each row covers 100 articles, this count is also a percentage.
Initial Disagreement (%): Total number of articles on which the annotators could not reach an agreement. Types of disagreement include: Is of study design/Is not of study design, Is of study design/Uncertain if study design, and Is not of study design/Uncertain if study design.
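
The relationship between the columns of this file can be checked with a few lines of arithmetic. The sketch below is illustrative: `agreement_summary` is a hypothetical helper and the counts in the usage line are made up. It relies on the statement in the FILE NAMES section that the counts in each row add up to 100.

```python
def agreement_summary(is_design: int, is_not_design: int, uncertain: int,
                      disagreed: int) -> tuple:
    """Return (initial agreement, initial disagreement) for one row."""
    total = is_design + is_not_design + uncertain + disagreed
    if total != 100:
        raise ValueError(f"Row should total 100 articles, got {total}")
    # With 100 articles per row, the counts are also percentages.
    return is_design + is_not_design + uncertain, disagreed


# Made-up counts, for illustration only.
print(agreement_summary(60, 20, 5, 15))  # (85, 15)
```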

f) F1, recall and precision of MT and TM.csv:
study design: The study design being scored
year: The year being scored (2016 or 2025)
model: The model TM or MT from which the scores were output
F1 Score: F1 score for the corresponding study design, year and model
recall: recall score for the corresponding study design, year and model
precision: precision score for the corresponding study design, year and model
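
This README does not state how the F1 score was computed. Assuming the standard definition (the harmonic mean of precision and recall), the three columns relate as in the sketch below; `f1_score` is a hypothetical helper and the input values are made up.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up values, for illustration only.
print(f1_score(0.5, 0.5))  # 0.5
```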

[References:]
[1] Cohen, Aaron M., Schneider, Jodi, Fu, Yuanxi, McDonagh, Marian S., Das, Prerna, Holt, Arthur W., & Smalheiser, Neil R. (2021). Fifty ways to tag your PubTypes: Multi-tagger, a set of probabilistic publication type and study design taggers to support biomedical indexing and evidence-based medicine [Preprint]. medRxiv. https://doi.org/10.1101/2021.07.13.21260468 

[2] Menke, Joe D., Kilicoglu, Halil, & Smalheiser, Neil R. (2025). Publication type tagging using transformer models and multi-label classification [Preprint]. medRxiv. https://doi.org/10.1101/2025.03.06.25323516 

Social Sciences
Study designs; evidence-based medicine; manual annotation; indexing; Study Characteristics; National Library of Medicine (U.S.)
[Files:]
README.txt (16.2 KB)
Bin-wise distribution of manual annotation sheets.csv (514 Bytes)
Definitions and annotator notes for manual examination.txt (8 KB)
F1, recall and precision of MT and TM.csv (1011 Bytes)
Inter-annotator agreement-disagreement numbers before reconciliation.csv (831 Bytes)
Manual Annotation case reports (2016 FFT).csv (158 KB)
Manual Annotation case reports (2016 TTF).csv (126 KB)
Manual Annotation case reports (2025 FFT).csv (171 KB)
Manual Annotation case reports (2025 TTF).csv (142 KB)
Manual Annotation case-control studies (2016 FFT).csv (179 KB)
Manual Annotation case-control studies (2016 TTF).csv (169 KB)
Manual Annotation case-control studies (2025 FFT).csv (189 KB)
Manual Annotation case-control studies (2025 TTF).csv (190 KB)
Manual Annotation cohort studies (2016 FFT).csv (192 KB)
Manual Annotation cohort studies (2016 TTF) .csv (177 KB)
Manual Annotation cohort studies (2025 FFT).csv (199 KB)
Manual Annotation cohort studies (2025 TTF).csv (197 KB)
Manual Annotation cross-sectional studies (2016 FFT).csv (175 KB)
Manual Annotation cross-sectional studies (2016 TTF).csv (188 KB)
Manual Annotation cross-sectional studies (2025 FFT).csv (187 KB)
Manual Annotation cross-sectional studies (2025 TTF).csv (196 KB)
Sample statistics.csv (1.8 KB)
Sample summary.csv (4.82 KB)