Illinois Data Bank Dataset Search Results
published:
2016-12-19
Files in this dataset represent an investigation into the use of the Library mobile app Minrva from May 2015 through December 2015. During this interval, 45,975 API hits were recorded by the Minrva web server. The dataset included herein is an analysis of the following: 1) a delineation of API hits to mobile app modules used in the Minrva app by month, 2) a general analysis of Minrva app downloads to module use, and 3) the annotated data file providing associations from API hits to specific modules used, organized by month (May 2015 – December 2015).
keywords:
API analysis; log analysis; Minrva Mobile App
published:
2021-06-16
Warnow, Tandy; Wedell, Eleanor
(2021)
Thank you for using these datasets.
These RNAsim aligned fragmentary sequences were generated from the query sequences selected by Balaban et al. (2019) in their variable-size datasets (https://doi.org/10.5061/dryad.78nf7dq). They were created for use for phylogenetic placement with the multiple sequence alignments and backbone trees provided by Balaban et al. (2019).
The file structures included here also correspond with the data Balaban et al. (2020) provided.
This includes:
Directories for five varying backbone tree sizes, shown as 5000, 10000, 50000, 100000, and 200000. These directory names are also used by Balaban et al. (2019), and indicate the size of the backbone tree included in their data.
Subdirectories for each replicate from the backbone tree size labelled 0 through 4. For the smaller four backbone tree sizes there are five replicates, and for the largest there is one replicate.
Each replicate contains 200 text files with one aligned query sequence fragment in fasta format.
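The directory layout described above can be walked programmatically. The following is a minimal sketch, not code shipped with the dataset: the root path is supplied by the caller, the fragment file names are not stated in the description (so every regular file in a replicate directory is read), and each file is assumed to hold exactly one FASTA record.

```python
from pathlib import Path

# Backbone tree sizes used as top-level directory names in the description.
BACKBONE_SIZES = ["5000", "10000", "50000", "100000", "200000"]


def read_single_fasta(path):
    """Return (header, sequence) for a file assumed to hold one FASTA record."""
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError(f"{path} holds more than one record")
            header = line[1:]
        else:
            chunks.append(line)
    if header is None:
        raise ValueError(f"{path} has no FASTA header")
    return header, "".join(chunks)


def iter_fragments(root):
    """Yield (backbone_size, replicate, header, sequence) for each fragment file."""
    root = Path(root)
    for size in BACKBONE_SIZES:
        size_dir = root / size
        if not size_dir.is_dir():
            continue  # tolerate partially downloaded copies
        for rep_dir in sorted(p for p in size_dir.iterdir() if p.is_dir()):
            for frag in sorted(p for p in rep_dir.iterdir() if p.is_file()):
                header, seq = read_single_fasta(frag)
                yield size, rep_dir.name, header, seq
```

Counting the tuples yielded per replicate is a quick way to confirm that all 200 fragment files were extracted.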
keywords:
Fragmentary Sequences; RNAsim
published:
2021-04-18
Lyu, Fangzheng; Kang, Jeon-Young; Wang, Shaohua; Han, Su; Li, Zhiyu; Wang, Shaowen; Padmanabhan, Anand
(2021)
This dataset contains all the code, notebooks, and datasets used in the study conducted for the research publication titled "Multi-scale CyberGIS Analytics for Detecting Spatiotemporal Patterns of COVID-19 Data". Specifically, this package includes the artifacts used to conduct spatial-temporal analysis with space time kernel density estimation (STKDE) using COVID-19 data, which should help readers reproduce some of the analysis and learn about the methods used in the associated book chapter.
## What’s inside - A quick explanation of the components of the zip file
* Multi-scale CyberGIS Analytics for Detecting Spatiotemporal Patterns of COVID-19.ipynb is a Jupyter notebook for this project. It contains code for preprocessing, space time kernel density estimation, postprocessing, and visualization.
* data is a folder containing all data needed for the notebook
* data/county.txt: US county information and FIPS codes from the Natural Resources Conservation Service.
* data/us-counties.txt: County-level COVID-19 data collected from New York Times COVID-19 github repository on August 9th, 2020.
* data/covid_death.txt: COVID-19 death information derived from the preprocessing step, which prepares the input data for STKDE. Each record is of the following format (fips, spatial_x, spatial_y, date, number of deaths).
* data/stkdefinal.txt: result obtained by conducting STKDE.
* wolfram_mathmatica is a folder for 3D visualization code.
* wolfram_mathmatica/Visualization.nb: code for visualization of the STKDE result via Wolfram Mathematica.
* img is a folder for figures.
* img/above.png: 3-D visualization result, above view.
* img/side.png: 3-D visualization result, side view.
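To illustrate how records of the form (fips, spatial_x, spatial_y, date, number of deaths) can feed a space time kernel density estimate, here is a generic textbook sketch. It is not the notebook's implementation. The comma delimiter, the ISO date format, the Epanechnikov kernels, weighting each record by its death count, and the bandwidths `hs` and `ht` are all assumptions made for illustration.

```python
import math
from datetime import date


def parse_record(line, sep=","):
    """Parse one record; the delimiter and YYYY-MM-DD date format are assumed."""
    fips, x, y, day, deaths = (field.strip() for field in line.split(sep))
    return fips, float(x), float(y), date.fromisoformat(day), int(deaths)


def epanechnikov_2d(u, v):
    """Spatial Epanechnikov kernel on the unit disc (integrates to 1)."""
    r2 = u * u + v * v
    return (2.0 / math.pi) * (1.0 - r2) if r2 < 1.0 else 0.0


def epanechnikov_1d(w):
    """Temporal Epanechnikov kernel on [-1, 1] (integrates to 1)."""
    return 0.75 * (1.0 - w * w) if abs(w) < 1.0 else 0.0


def stkde(x, y, t, events, hs, ht):
    """Weighted space-time density at (x, y, t).

    events: iterable of (xi, yi, ti, weight), with ti as a day number.
    hs, ht: spatial and temporal bandwidths (hypothetical tuning values).
    """
    events = list(events)
    total = sum(w for *_, w in events)
    if total == 0:
        return 0.0
    acc = 0.0
    for xi, yi, ti, w in events:
        acc += w * epanechnikov_2d((x - xi) / hs, (y - yi) / hs) \
                 * epanechnikov_1d((t - ti) / ht)
    return acc / (total * hs * hs * ht)
```

A parsed date can be turned into the day number expected by `stkde` with `day.toordinal()`.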
keywords:
CyberGIS; COVID-19; Space-time kernel density estimation; Spatiotemporal patterns
published:
2023-09-13
Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy
(2023)
This upload contains one additional set of datasets (RNASim10k, ten replicates) used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment".
The zipped file has the following structure:
10k
|__R0
    |__unaln.fas
    |__true.fas
    |__true.tre
|__R1
...
# Alignment files:
1. `unaln.fas`: all unaligned sequences.
2. `true.fas`: the reference alignment of all sequences.
3. `true.tre`: the reference tree on all sequences.
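One plausible sanity check on a replicate directory is to confirm that stripping gaps from `true.fas` reproduces the sequences in `unaln.fas`. The sketch below assumes `-` is the gap character, that sequence names match exactly between the two files, and that letter case is identical; none of these details is stated in the description.

```python
from pathlib import Path


def read_fasta(path):
    """Read a multi-record FASTA file into {name: sequence}."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def check_replicate(rep_dir, gap="-"):
    """Compare degapped reference rows against the unaligned sequences."""
    rep_dir = Path(rep_dir)
    unaligned = read_fasta(rep_dir / "unaln.fas")
    aligned = read_fasta(rep_dir / "true.fas")
    widths = {len(seq) for seq in aligned.values()}
    if len(widths) > 1:
        raise ValueError("reference alignment rows differ in length")
    mismatched = sorted(
        name for name, seq in aligned.items()
        if unaligned.get(name) != seq.replace(gap, "")
    )
    return {
        "n_sequences": len(aligned),
        "alignment_width": widths.pop() if widths else 0,
        "mismatched": mismatched,
    }
```

Calling `check_replicate("10k/R0")` on an extracted copy would return the sequence count, the alignment width, and any names whose degapped row differs from the unaligned input.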
For other datasets that uniquely appeared in EMMA, please refer to the related dataset (which is linked below): Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy (2022): Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2567453_V1
keywords:
SALMA; MAFFT; alignment; eHMM; sequence length heterogeneity
published:
2025-03-28
8-bit RGB realizations of a stochastic image model (SIM) of the **kinds** of things seen in fluorescence microscopy of biological samples. Note that no attempt was made to model a particular tissue, sample, or microscope. Distinct image features are seen in each color channel. The first public mention of these SIMs is in "Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure" by Frank Brooks and Rucha Deshpande. Manuscript on ArXiv and submitted for publication.
keywords:
image models; fluorescence microscopy; training data; image-to-image translation; generative model evaluation
published:
2022-03-30
Tiemann, Jeremy S.; Stodola, Alison P.; Douglass, Sarah A.; Vinsel, Rachel M.; Cummings, Kevin S.
(2022)
This dataset is associated with a larger manuscript published in 2022 in the Illinois Natural History Survey Bulletin to summarize all known records for nonindigenous aquatic mollusks in Illinois, and full sources are referenced within the manuscript. We examined museum holdings, literature accounts, and publicly available databases sponsored by the U.S. Geological Survey (USGS) Nonindigenous Aquatic Species program (http://nas.er.usgs.gov/) and InvertEBase (invertebase.org). We also included sporadic field survey data on encounters with nonindigenous aquatic mollusk species from colleagues within the Illinois Natural History Survey, Illinois Department of Natural Resources, U.S. Fish and Wildlife Service, county forest preserve districts, and other natural resource agencies. Lastly, we examined the role and utility of citizen-science data to document occurrences of nonindigenous aquatic mollusk species. We queried iNaturalist (www.inaturalist.org) for all available nonindigenous freshwater mollusk data for Illinois.
Table heading descriptions (if not intuitive) are: “INHS verified” is whether an INHS staff member verified the record by observing vouchered specimen or photograph; “Source” is where a record was accessed or obtained; “individualCount” is number collected or observed in a record; “MuseumCode” is standard museum abbreviation or acronym; “Institution” is source that housed or reported a record, and this also includes the spelled-out museum code; “Collectors” typically indicates who collected the specimen or voucher; “Lat_Long determined by” denotes whether collection coordinates were stated by the collector or by a curator (using inference from data available); “fieldNumber” typically indicates a unique field number that a collector may have used in the field; “identifiedBy” typically explains who identified a specimen or verified a specimen identification.
keywords:
Illinois; Exotic species; Non-native aquatic species; NAS; Aquatic Invasive Species; AIS; Mollusk
published:
2024-12-20
Stuchiner, Emily; Xu, Jiacheng; Eddy, William C.; DeLucia, Evan H.; Yang, Wendy
(2024)
All data presented in the manuscript published in the Journal of Geophysical Research-Biogeosciences by Stuchiner et al. 2025, "Hot or not? An evaluation of methods for identifying hot moments of nitrous oxide emissions from soils." This includes hourly N2O flux measurements from 20 autochambers from May 2022 to April 2023 in a maize field in central Illinois, and various metrics used to assess hot moments that are evaluated in the manuscript. Note that chamber 5 for each sampling node is sampled from a deep soil collar (50 cm depth) that excludes roots for the purpose of measuring heterotrophic respiration rates.
keywords:
nitrous oxide; maize; hot moments; outlier detection; soil emissions
published:
2018-08-03
Kim, Eun Sun; Zaya, David N.; Fant, Jeremie B.; Ashley, Mary V.
(2018)
These data include information on a field experiment on Castilleja coccinea (L.) Spreng., scarlet Indian paintbrush (Orobanchaceae). There is intraspecific variation in scarlet Indian paintbrush in the color of the bracts surrounding the flowers. Two bract color morphs were included in this study, the scarlet and yellow morphs. The experiment was conducted at Illinois Beach State Park in 2012. The aim of the work was to compare the color morphs with regard to 1) self-compatibility, 2) response to pollinator exclusion, 3) cross-compatibility between the color morphs, and 4) relative female fertility and male fitness.
Three files are attached with this record. The raw data are in "fruitSet.csv" and "seedSet.csv", while "readme.txt" has detailed explanations of the raw data files.
keywords:
Castilleja coccinea; Orobanchaceae; floral color polymorphism; bract color polymorphism; breeding system; hand-pollination; self-compatibility; reproductive assurance
published:
2024-06-27
Han, Hee-Sun; Schrader, Alex; Lee, JuYeon
(2024)
U-2 OS MERFISH data set prepared by the Han lab at UIUC based on procedures developed in Moffitt et al. Proc. Natl. Acad. Sci. USA 113 (39), 11046–11051.
The data comprise ~2 million spots from 130 genes with x,y,z location, cell assignment, and correction status.
keywords:
smFISH; single transcript spatial transcriptomics; U-2 OS; Cancer cell line; MERFISH
published:
2023-12-15
Abidi, Syeda Nayab; Hsu, Felicity; Smith-Bolton, Rachel
(2023)
This page contains the data for the publication "Regenerative growth is constrained by brain tumor to ensure proper patterning in Drosophila" published in PLOS Genetics in 2023.
published:
2023-12-18
Johnson, Claire A.; Benson, Thomas J.
(2023)
Data in this publication were used to examine the effects of habitat and landscape-level covariates on occupancy and interannual dynamics and the effects of environmental factors on detection of Black-billed Cuckoos and Yellow-billed Cuckoos. Data were collected in 2019 and 2020 in northern Illinois, USA. Procedures were approved by the Illinois Institutional Animal Care and Use Committee (IACUC), protocol no. 19086.
keywords:
Black-billed Cuckoo; habitat use; multi-scale; occupancy dynamics; turnover; Yellow-billed Cuckoo
published:
2020-06-30
Chakraborty, Sulagna; Cristina Drumond Andrade, Flavia; Lee Smith, Rebecca
(2020)
This file contains 13 unique case studies that were created for the One health: Infectious diseases course offered at the University of Illinois at Urbana-Champaign campus. The case studies are being made available as educational resources for other One health courses. Each case study is focused on a theme/topic which is associated with One health. These case studies were created using publicly available information and references have been provided for each case study.
keywords:
One health education; infectious diseases; case studies
published:
2022-07-22
Johnson, Claire A.; Benson, Thomas J.
(2022)
Data in this publication were used to examine the effects of environmental and temporal covariates on detection probability, and the effects of habitat and landscape-level covariates on occupancy and within-season turnover of Black-billed Cuckoos and Yellow-billed Cuckoos. Data were collected in 2019 and 2020 in northern Illinois, USA. Procedures were approved by the Illinois Institutional Animal Care and Use Committee (IACUC), protocol no. 19086.
keywords:
Black-billed Cuckoo; call broadcast; Coccyzus americanus; Coccyzus erythropthalmus; detection probability; occupancy dynamics; rare and secretive species; Yellow-billed Cuckoo
published:
2023-04-06
Yao, Lehan; Lyu, Zhiheng; Li, Jiahui; Chen, Qian
(2023)
Example data for https://github.com/chenlabUIUC/UsiNet
The data contain computer-simulated and experimental tilting series (or sinograms) of gold nanoparticles.
Two training data examples are provided:
1. simulated_data.zip
2. experimental_data.zip
In each zip folder, we include an image_data.zip and a training_data.zip. The former is for viewing and only the latter is needed for model training. For more details, please refer to our GitHub repository.
keywords:
electron tomography; deep learning
published:
2024-12-17
Nesbitt, Stephen; Niescier, Robert
(2024)
This repository contains precipitation spectra from a Parsivel-2 disdrometer deployed at Lancaster High School, Lancaster, NY, as well as an MRR-2 radar deployed at the same site. The site was located at 42.9299° N, 78.6708° W. Parsivel data were converted to netCDF using the pyDSD Python package. MRR-2 spectra are raw from the manufacturer's software. The Parsivel and MRR-2 data include periods collected during November 2022 as described in the paper.
keywords:
snowfall; disdrometer; spectra; micro rain radar; Doppler
published:
2025-04-01
Getahun, Elias; Zavelle, Atticus; Keefer, Laura
(2025)
ICoastalDB, which was developed using Microsoft structured query language (SQL) Server, consists of water quality and related data in the Illinois coastal zone that were collected by various organizations. The information in the dataset includes, but is not limited to, sample data type, method of data sampling, location, time and date of sampling and data units.
keywords:
Illinois Coastal Zone; Water Quality Data
published:
2021-02-25
Ferin, Kelsie; Chen, Luoye; Zhong, Jia; Heaton, Emily; Khanna, Madhu; VanLoocke, Andy
(2021)
Total nitrogen leaching rates were calculated over the Mississippi Atchafalaya River Basin (MARB) using an integrated economic-biophysical modeling approach. Land allocation for corn production and total nitrogen application rates were calculated for crop reporting districts using the Biofuel and Environmental Policy Analysis Model (BEPAM) for 5 RFS2 policy scenarios. These were used as input in the Integrated BIosphere Simulator-Agricultural Version (Agro-IBIS) and the Terrestrial Hydrologic Model with Biogeochemistry (THMB) to calculate the nitrogen loss.
Land allocation and total nitrogen application were simulated for the period 2016-2030 for 303 crop reporting districts (https://www.nass.usda.gov/Data_and_Statistics/County_Data_Files/Frequently_Asked_Questions/county_list.txt). The final 2030 values are reported here. Both are stored in csv files. Units are million ha for land allocation and million kg for nitrogen application.
The nitrogen leaching rates were modeled with a spatial resolution of 5' x 5' using the North American Datum of 1983 projection and stored in NetCDF files. The 30-year average is calculated over the last 30 years of the 45 years being simulated. Leaching rates are calculated in kg-N/ha.
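The averaging window described above (the last 30 of 45 simulated years) can be written out as a small sketch. It assumes one leaching value per simulated year in chronological order for a single grid cell; the NetCDF variable names are not given in the description, so the function works on a plain list.

```python
def trailing_mean(yearly_values, window=30):
    """Mean of the last `window` yearly values (kg-N/ha for leaching rates)."""
    if len(yearly_values) < window:
        raise ValueError("fewer simulated years than the averaging window")
    tail = yearly_values[-window:]
    return sum(tail) / window


# Toy series for one grid cell: 45 simulated years, values 0..44.
toy_series = [float(year) for year in range(45)]
average = trailing_mean(toy_series)  # averages years 15..44 of the toy series
```

Under this reading, the first 15 simulated years are excluded from the reported average.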
keywords:
nitrogen leaching; bioethanol; bioenergy crops
published:
2024-09-26
Kamara, Shasta; Hay, Allison; Oller, Reagan; Suski, Cory
(2024)
This dataset is from a study of simulated angling tournament livewell holding of Largemouth Bass (Micropterus nigricans) conducted in June 2023 on Clinton Lake, Illinois. Fish were collected via electrofishing, weighed, measured, and assessed for physical injury prior to receiving a commercially available cull tag and being placed in a simulated livewell. After a six-hour livewell holding period, fish were removed from the livewell, assessed for physical injury, and then assessed for reflex action mortality predictors prior to being placed in a net pen for 3 days of observation. This dataset includes weights, total lengths, physical injury scores, and reflex action mortality predictor scores for Largemouth Bass, as well as water quality parameters of livewells and the lake in net pens.
keywords:
sport fish conservation; fisheries management; high-grading; stringer
published:
2023-07-14
Schneider, Jodi; Das, Susmita; Léveillé, Jacqueline; Proescholdt, Randi
(2023)
Data for Post-retraction citation: A review of scholarly research on the spread of retracted science
Schneider, Jodi; Das, Susmita; Léveillé, Jacqueline; Proescholdt, Randi
Contact: Jodi Schneider jodi@illinois.edu & jschneider@pobox.com
**********
OVERVIEW
**********
This dataset provides further analysis for an ongoing literature review about post-retraction citation.
This ongoing work extends a poster presented as:
Jodi Schneider, Jacqueline Léveillé, Randi Proescholdt, Susmita Das, and The RISRS Team. Characterization of Publications on Post-Retraction Citation of Retracted Articles. Presented at the Ninth International Congress on Peer Review and Scientific Publication, September 8-10, 2022 hybrid in Chicago. https://hdl.handle.net/2142/114477 (now also in https://peerreviewcongress.org/abstract/characterization-of-publications-on-post-retraction-citation-of-retracted-articles/ )
Items as of the poster version are listed in the bibliography 92-PRC-items.pdf.
Note that following the poster, we made several changes to the dataset (see changes-since-PRC-poster.txt). For both the poster dataset and the current dataset, 5 items have 2 categories (see 5-items-have-2-categories.txt).
Articles were selected from the Empirical Retraction Lit bibliography (https://infoqualitylab.org/projects/risrs2020/bibliography/ and https://doi.org/10.5281/zenodo.5498474 ). The current dataset includes 92 items; 91 items were selected from the 386 total items in Empirical Retraction Lit bibliography version v.2.15.0 (July 2021); 1 item was added because it is the final form publication of a grouping of 2 items from the bibliography: Yang (2022) Do retraction practices work effectively? Evidence from citations of psychological retracted articles http://doi.org/10.1177/01655515221097623
Items were classified into 7 topics; 2 of the 7 topics have been analyzed to date.
**********************
OVERVIEW OF ANALYSIS
**********************
DATA ANALYZED:
2 of the 7 topics have been analyzed to date:
field-based case studies (n = 20)
author-focused case studies of 1 or several authors with many retracted publications (n = 15)
FUTURE DATA TO BE ANALYZED, NOT YET COVERED:
5 of the 7 topics have not yet been analyzed as of this release:
database-focused analyses (n = 33)
paper-focused case studies of 1 to 125 selected papers (n = 15)
studies of retracted publications cited in review literature (n = 8)
geographic case studies (n = 4)
studies selecting retracted publications by method (n = 2)
**************
FILE LISTING
**************
------------------
BIBLIOGRAPHY
------------------
92-PRC-items.pdf
------------------
TEXT FILES
------------------
README.txt
5-items-have-2-categories.txt
changes-since-PRC-poster.txt
------------------
CODEBOOKS
------------------
Codebook for authors.docx
Codebook for authors.pdf
Codebook for field.docx
Codebook for field.pdf
Codebook for KEY.docx
Codebook for KEY.pdf
------------------
SPREADSHEETS
------------------
field.csv
field.xlsx
multipleauthors.csv
multipleauthors.xlsx
multipleauthors-not-named.csv
multipleauthors-not-named.xlsx
singleauthors.csv
singleauthors.xlsx
***************************
DESCRIPTION OF FILE TYPES
***************************
BIBLIOGRAPHY (92-PRC-items.pdf) presents the items, as of the poster version. This has minor differences from the current data set. Consult changes-since-PRC-poster.txt for details on the differences.
TEXT FILES provide notes for additional context. These files end in .txt.
CODEBOOKS describe the data we collected. The same data is provided in both Word (.docx) and PDF format.
There is one general codebook that is referred to in the other codebooks: Codebook for KEY lists fields assigned (e.g., for a journal or conference). Note that this is distinct from the overall analysis in the Empirical Retraction Lit bibliography of fields analyzed; for that analysis see Proescholdt, Randi (2021): RISRS Retraction Review - Field Variation Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2070560_V1
Other codebooks document specific information we entered on each column of a spreadsheet.
SPREADSHEETS present the data collected. The same data is provided in both Excel (.xlsx) and CSV format.
Each data row describes a publication or item (e.g., thesis, poster, preprint).
For column header explanations, see the associated codebook.
*****************************
DETAILS ON THE SPREADSHEETS
*****************************
field-based case studies
CODEBOOK: Codebook for field
--REFERS TO: Codebook for KEY
DATA SHEET: field
--REFERS TO: Codebook for KEY
--NUMBER OF DATA ROWS: 20 NOTE: Each data row describes a publication/item.
--NUMBER OF PUBLICATION GROUPINGS: 17
--GROUPED PUBLICATIONS: Rubbo (2019) - 2 items, Yang (2022) - 3 items
author-focused case studies of 1 or several authors with many retracted publications
CODEBOOK: Codebook for authors
--REFERS TO: Codebook for KEY
DATA SHEET 1: singleauthors (n = 9)
--NUMBER OF DATA ROWS: 9
--NUMBER OF PUBLICATION GROUPINGS: 9
DATA SHEET 2: multipleauthors (n = 5)
--NUMBER OF DATA ROWS: 5
--NUMBER OF PUBLICATION GROUPINGS: 5
DATA SHEET 3: multipleauthors-not-named (n = 1)
--NUMBER OF DATA ROWS: 1
--NUMBER OF PUBLICATION GROUPINGS: 1
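The counts reported in this README can be cross-checked with a few lines of arithmetic. This is an illustrative sketch only: it assumes each of the 5 two-category items is counted once under each of its categories, and that a grouping of k items contributes k data rows but one publication grouping.

```python
# Item counts per topic, copied from the OVERVIEW OF ANALYSIS section.
TOPIC_COUNTS = {
    "field-based case studies": 20,
    "author-focused case studies": 15,
    "database-focused analyses": 33,
    "paper-focused case studies": 15,
    "retracted publications cited in review literature": 8,
    "geographic case studies": 4,
    "retracted publications selected by method": 2,
}
TOTAL_ITEMS = 92          # items in the current dataset
TWO_CATEGORY_ITEMS = 5    # items assigned to 2 categories


def groupings(rows, group_sizes):
    """Publication groupings = rows minus the extra rows inside each group."""
    return rows - sum(size - 1 for size in group_sizes)


category_assignments = sum(TOPIC_COUNTS.values())   # 97
field_groupings = groupings(20, [2, 3])             # 17: Rubbo (2), Yang (3)
author_rows = 9 + 5 + 1                             # three author data sheets
```

Here `category_assignments` equals `TOTAL_ITEMS + TWO_CATEGORY_ITEMS`, `field_groupings` matches the 17 groupings listed for the field sheet, and `author_rows` matches n = 15 for the author-focused topic.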
*********************************
CRediT <http://credit.niso.org>
*********************************
Susmita Das: Conceptualization, Data curation, Investigation, Methodology
Jacqueline Léveillé: Data curation, Investigation
Randi Proescholdt: Conceptualization, Data curation, Investigation, Methodology
Jodi Schneider: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Supervision
keywords:
retraction; citation of retracted publications; post-retraction citation; data extraction for scoping reviews; data extraction for literature reviews
published:
2022-12-11
The data are original electron micrographs from the lab of the late Dr. Burt Endo of the USDA. These data were digitized from photographic prints and glass plate negatives at 600 DPI as 16-bit TIFF files. This fourth version adds 6 new ZIP files from the Endo data collection. "Endo folder database.xlsx" is updated to reflect the addition. Information in "Readme_FileNameFormatting.docx" remains the same as in V3.
keywords:
Heterodera glycines; Meloidogyne incognita; Burt Endo; nematode
published:
2016-12-12
Zhang, Qian; Li, Chunyan; Braud, Dewitt
(2016)
This dataset contains a topographic LIDAR survey (saved in “waxlake-lidar.img”) that was conducted over the Wax Lake delta, between longitudes −91.5848 and −91.292 degrees and latitudes 29.3647 and 29.6466 degrees. Unlike other elevation data, positive values in the LIDAR data indicate land elevation, while zero values indicate riverbed without specifying water depth.
keywords:
LIDAR; Wax Lake delta
published:
2024-01-30
This data set includes the cochlear implant (CI) electrodograms recorded in 2 different acoustic conditions using the KEMAR acoustic head. It is part of a study intended to explore the effect of interaural asymmetry on interaural coherence after CI processing.
keywords:
cochlear implant; electrodogram; KEMAR; interaural coherence
published:
2016-12-12
Zhang, Qian; Li, Chunyan
(2016)
This dataset contains field measurements of water depth at the Wax Lake delta taken on 2012-12-01.
keywords:
Wax Lake delta; Bathymetry
published:
2023-07-11
Parulian, Nikolaus
(2023)
The dissertation_demo.zip contains the base code and demonstration materials for the dissertation: A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning.
Each chapter has a demo folder for demonstrating provenance queries or tools.
The Airbnb dataset for demonstration and simulation is not included in this demo but is available to access directly from the reference website.
Any updates on demonstration and examples can be found online at: https://github.com/nikolausn/dissertation_demo
published:
2025-06-05
Guan, Yingjun; Fang, Liri
(2025)
There are two files in this dataset.
File 1: AffiNorm
AffiNorm contains 1,001 rows, including one header row, randomly sampled from MapAffil 2018 Dataset ([**https://doi.org/10.13012/B2IDB-2556310_V1**](https://databank.illinois.edu/datasets/IDB-2556310)). Each row in the file corresponds to a particular author on a particular PubMed record, and contains the following 26 columns, comma-delimited. All columns are ASCII, except city which contains Latin-1.
COLUMN DESCRIPTION
1. PMID: the PubMed identifier. int.
2. ORDER: the position of the author. int.
3. YEAR: the year of publication. int(4), eg: 1975.
4. affiliation: affiliation string of the author. eg: Department of Pathology, University of Chicago, Illinois 60637.
5. annotation_type: the number of institutions annotated, denoted by S, M, O, or Z, where "S" (Single) indicates 1 institution was annotated; "M" (Multiple) indicates more than one institution was annotated; "O" (Out of Vocabulary or None) indicates no institution was annotated, but an institution was apparently mentioned; "Z" indicates no institution was mentioned.
6. Institution: the standard name(s) of the annotated institution(s), according to ROR. if "S" (single institution), it is saved as a string, eg: University of Chicago; if "M", it is saved as a string that looks like a python list, eg: ['Public Health Laboratory Service'; 'Centre for Applied Microbiology and Research']; if "O" or "Z", then blank.
7. inst_type: the type of institution, according to ROR. the potential values are: education, funder, healthcare, company, archive, nonprofit, government, facility, other. An institution may have more than one type, eg: ['Education', 'Funder']
8. type_edu: TRUE if the inst_type contains "Education"; FALSE otherwise.
9. RORid: ROR identifier(s), eg: https://ror.org/05hs6h993. when multiple, the order corresponds to institution (column 6)
10. RORid_label: the standard name(s) of the annotated institution(s) according to ROR. Same as Institution (column 6).
11. GRIDid: GRID identifier(s). eg: grid.170205.1
12. GRIDid_label: the standard name(s) of the annotated institution(s) according to GRID. eg: University of Chicago.
13. WikiDataid: WikiData identifier(s). eg: Q131252
14. WikiDataid_label: the standard name(s) of the annotated institution(s) according to WikiData. eg: University of Chicago
15. synonyms: a comma-separated list of variant names from InsVar (file 2), saved as a string. eg: University of Chicago, Chicago University, U of C, UChicago, uchicago.edu, U Chicago, ...
16. MapAffil-grid: GRID from the MapAffil 2018 Dataset.
17. MapAffil-grid_label: The standard name of institution from MapAffil 2018 Dataset.
18. judge_mapA: TRUE if GRIDid (column 11) contains MapAffil-grid (column 16); FALSE otherwise.
19. MapAffiltemporal-grid: GRID from the temporal version of MapAffil, http://abel.ischool.illinois.edu/data/MapAffilTempo2018.tsv.gz
20. MapAffiltemporal-grid_label: The standard name of institution from MapAffilTemporal 2018 Dataset.
21. judge_mapT: TRUE if GRIDid (column 11) contains MapAffiltemporal-grid (column 19); FALSE otherwise.
22. RORapi_query_id: ROR from ROR api tool (query endpoint)
23. RORapi_query_id_label: The standard name of institution from ROR api tool (query endpoint), saved as a string.
24. judge_rorapi_affiliation: TRUE if RORid (column 9) contains RORapi_query_id (column 22); FALSE otherwise.
25. rorapi_affiliation_id: ROR from ROR api tool (affiliation endpoint).
26. judge_rorapi_affiliation: TRUE if RORid (column 9) contains RORapi_affiliation (column 25); FALSE otherwise.
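The multi-valued columns and the "judge" columns described above can be handled with a short helper. This is a sketch under stated assumptions: the file path is supplied by the caller, the header row is assumed to use the column names listed above, multi-valued cells are assumed to look like the bracketed, single-quoted examples in columns 6 and 7, and names containing an apostrophe are not handled.

```python
import csv
import re


def load_rows(path):
    """Read the comma-delimited file; Latin-1 also covers the ASCII columns."""
    with open(path, newline="", encoding="latin-1") as handle:
        return list(csv.DictReader(handle))


def parse_multi(value):
    """Split a cell that is blank, a plain string, or a list-like string."""
    value = (value or "").strip()
    if not value:
        return []
    if value.startswith("[") and value.endswith("]"):
        # Pull out each single-quoted item, whatever separator sits between them.
        return re.findall(r"'([^']*)'", value)
    return [value]


def contains_id(annotated, candidate):
    """Recompute a judge_* style flag: is `candidate` among the annotated ids?"""
    candidate = (candidate or "").strip()
    return bool(candidate) and candidate in parse_multi(annotated)
```

For example, `contains_id(row["GRIDid"], row["MapAffil-grid"])` would recompute column 18 for one row, provided the header names match the list above.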
File 2: insVar.json
InsVar is a supplementary dataset for AffiNorm, which includes the institution ID and its redirected aliases from wikidata. The institution ID list is from GRID, the redirected aliases are from wiki api, for example: https://en.wikipedia.org/wiki/Special:WhatLinksHere?target=University+of+Illinois+Urbana-Champaign&namespace=&hidetrans=1&hidelinks=1&limit=100
In InsVar, the data is saved in a Python dictionary format. The key is the GRID identifier, for example: "grid.1001.0" (Australian National University), and the value is a list of redirected alias strings.
{"grid.1001.0": ["ANU", "ANU College", "ANU College of Arts and Social Sciences", "ANU College of Asia and the Pacific", "ANU Union", "ANUSA", "Asia Pacific Week", "Australia National University", "Australian Forestry School", "the Australian National University", ...], "grid.1002.3": ...}
keywords:
PubMed; MEDLINE; Digital Libraries; Bibliographic Databases; Institution Names; Author Affiliations; Institution Name Ambiguity; Authority files