There are two files in this dataset.
File 1: AffiNorm
AffiNorm contains 1,001 rows, including one header row, randomly sampled from the MapAffil 2018 Dataset ([**https://doi.org/10.13012/B2IDB-2556310_V1**](https://databank.illinois.edu/datasets/IDB-2556310)). Each row in the file corresponds to a particular author on a particular PubMed record, and contains the following 26 columns, comma-delimited. All columns are ASCII, except city which contains Latin-1.
COLUMN DESCRIPTION
1. PMID: the PubMed identifier. int.
2. ORDER: the position of the author. int.
3. YEAR: the year of publication. int(4), eg: 1975.
4. affiliation: the affiliation string of the author. eg: Department of Pathology, University of Chicago, Illinois 60637.
5. annotation_type: the number of institutions annotated, denoted by S, M, O, or Z, where "S" (Single) indicates that one institution was annotated; "M" (Multiple) indicates that more than one institution was annotated; "O" (Out of Vocabulary or None) indicates that no institution was annotated, although an institution was apparently mentioned; "Z" indicates that no institution was mentioned.
6. Institution: the standard name(s) of the annotated institution(s), according to ROR. If "S" (single institution), it is saved as a string, eg: University of Chicago; if "M", it is saved as a string that looks like a Python list, eg: ['Public Health Laboratory Service'; 'Centre for Applied Microbiology and Research']; if "O" or "Z", it is blank.
7. inst_type: the type of institution, according to ROR. the potential values are: education, funder, healthcare, company, archive, nonprofit, government, facility, other. An institution may have more than one type, eg: ['Education', 'Funder']
8. type_edu: TRUE if the inst_type contains "Education"; FALSE otherwise.
9. RORid: ROR identifier(s), eg: https://ror.org/05hs6h993. When multiple, the order corresponds to Institution (column 6).
10. RORid_label: the standard name(s) of the annotated institution(s) according to ROR. Same as Institution (column 6).
11. GRIDid: GRID identifier(s). eg: grid.170205.1
12. GRIDid_label: the standard name(s) of the annotated institution(s) according to GRID. eg: University of Chicago.
13. WikiDataid: WikiData identifier(s). eg: Q131252
14. WikiDataid_label: the standard name(s) of the annotated institution(s) according to WikiData. eg: University of Chicago
15. synonyms: a comma-separated list of variant names from InsVar (file 2), saved as a string. eg: University of Chicago, Chicago University, U of C, UChicago, uchicago.edu, U Chicago, ...
16. MapAffil-grid: GRID from the MapAffil 2018 Dataset.
17. MapAffil-grid_label: The standard name of institution from MapAffil 2018 Dataset.
18. judge_mapA: TRUE if GRIDid (column 11) contains MapAffil-grid (column 16); FALSE otherwise.
19. MapAffiltemporal-grid: GRID from the temporal version of MapAffil, http://abel.ischool.illinois.edu/data/MapAffilTempo2018.tsv.gz
20. MapAffiltemporal-grid_label: The standard name of institution from MapAffilTemporal 2018 Dataset.
21. judge_mapT: TRUE if GRIDid (column 11) contains MapAffiltemporal-grid (column 19); FALSE otherwise.
22. RORapi_query_id: ROR identifier from the ROR API tool (query endpoint).
23. RORapi_query_id_label: the standard name of the institution from the ROR API tool (query endpoint), saved as a string.
24. judge_rorapi_affiliation: TRUE if RORid (column 9) contains RORapi_query_id (column 22); FALSE otherwise.
25. rorapi_affiliation_id: ROR identifier from the ROR API tool (affiliation endpoint).
26. judge_rorapi_affiliation: TRUE if RORid (column 9) contains rorapi_affiliation_id (column 25); FALSE otherwise.
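As a rough illustration of how the multi-valued cells and the judge columns described above could be handled, the following Python sketch parses a cell that is blank, a single string, or a string that looks like a Python list, and recomputes a containment flag. The function names (read_rows, parse_cell, judge) are hypothetical and not part of the dataset. The sketch assumes that "contains" means exact membership in the list of annotated identifiers, that the predicted cell holds a single identifier, and that items inside a list-like cell may be separated by either commas or semicolons (both appear in the examples above). Rows are kept by position rather than by header name, because the description gives the same name to columns 24 and 26.

```python
import ast
import csv
import re

# Matches one quoted item inside a list-like cell, e.g. 'Education' or "X".
_QUOTED = re.compile(r"'([^']*)'|\"([^\"]*)\"")


def read_rows(lines):
    """Split comma-delimited lines into (header, rows); columns stay positional."""
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(reader)


def parse_cell(cell):
    """Return the values of a blank, single-valued, or list-like cell as a list."""
    cell = (cell or "").strip()
    if not cell:
        return []  # blank cell, e.g. annotation_type "O" or "Z"
    if not (cell.startswith("[") and cell.endswith("]")):
        return [cell]  # single value, e.g. annotation_type "S"
    try:
        # Cells that are valid Python list literals, e.g. ['Education', 'Funder'].
        return [str(value) for value in ast.literal_eval(cell)]
    except (ValueError, SyntaxError):
        # Assumed fallback for semicolon-separated items: collect quoted strings.
        return [single or double for single, double in _QUOTED.findall(cell)]


def judge(gold_cell, predicted_cell):
    """True if the predicted identifier is one of the annotated identifiers."""
    predicted = (predicted_cell or "").strip()
    return bool(predicted) and predicted in parse_cell(gold_cell)
```

To apply this to the file itself, one option is to open it with encoding="latin-1" and newline="" and pass the file object to read_rows; column 11 (GRIDid) is then row[10] and column 16 (MapAffil-grid) is row[15], so judge(row[10], row[15]) would correspond to judge_mapA under the assumptions stated above.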
File 2: insVar.json
InsVar is a supplementary dataset for AffiNorm. It includes each institution ID and its redirected aliases from WikiData. The institution ID list is from GRID; the redirected aliases are from the wiki API, for example: https://en.wikipedia.org/wiki/Special:WhatLinksHere?target=University+of+Illinois+Urbana-Champaign&namespace=&hidetrans=1&hidelinks=1&limit=100
In InsVar, the data is saved in a Python dictionary format. The key is the GRID identifier, for example: "grid.1001.0" (Australian National University), and the value is a list of redirected alias strings.
{"grid.1001.0": ["ANU", "ANU College", "ANU College of Arts and Social Sciences", "ANU College of Asia and the Pacific", "ANU Union", "ANUSA", "Asia Pacific Week", "Australia National University", "Australian Forestry School", "the Australian National University", ...], "grid.1002.3": ...}
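Because the structure is a mapping from GRID identifier to a list of aliases, a reverse lookup from alias to identifier can be built in a few lines. The sketch below is only one possible approach: the function name build_alias_index is hypothetical, the case-insensitive matching is an assumption, and the sample is a short excerpt of the example above rather than the full file.

```python
import json
from collections import defaultdict


def build_alias_index(insvar):
    """Map each lowercased alias to the set of GRID identifiers that list it."""
    index = defaultdict(set)
    for grid_id, aliases in insvar.items():
        for alias in aliases:
            index[alias.strip().lower()].add(grid_id)
    return index


# A small excerpt of the example above, used here in place of the full file.
sample = json.loads(
    '{"grid.1001.0": ["ANU", "ANU College", "Australia National University"]}'
)
index = build_alias_index(sample)
print(sorted(index["anu"]))  # prints ['grid.1001.0']
```

For the full file, the dictionary could be loaded with json.load on an open handle to insVar.json; the file encoding is not stated here, so UTF-8 would be an assumption.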