This dataset is a citation graph constructed by downloading and parsing the Works section of the OpenAlex catalog of the global research system. OpenAlex (see citation below) contains detailed information about scholarly research, including articles, authors, journals, institutions, and their relationships. The data were downloaded on 2024-07-15.
The dataset comprises two compressed (.xz) files.
1) filename: openalexID_integer_id_hasDOI.parquet.xz
This table has three columns: openalex_id, integer_id, and hasDOI. Each row represents one record with the following fields:
• openalex_id: A unique identifier from the OpenAlex catalog.
• integer_id: An integer representing the new identifier (assigned by the authors).
• hasDOI: An integer (0 or 1) indicating whether the record has a DOI (0 for no, 1 for yes).
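Because the Parquet file is wrapped in an additional .xz layer, it generally has to be decompressed before a Parquet reader can open it. The following is a minimal sketch using only the Python standard library; the helper name decompress_xz and the chunk size are illustrative choices, not part of the dataset or its generating code.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dst=None, chunk_size=1024 * 1024):
    """Decompress an .xz file to disk and return the output path.

    Hypothetical helper: if dst is omitted, the output path is src
    with its trailing ".xz" suffix removed.
    """
    src = Path(src)
    if dst is None:
        if src.suffix != ".xz":
            raise ValueError(f"expected a .xz file, got {src.name}")
        dst = src.with_suffix("")  # drops only the final ".xz"
    dst = Path(dst)
    # Stream in chunks so the whole file is never held in memory.
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst
```

Calling decompress_xz("openalexID_integer_id_hasDOI.parquet.xz") would write openalexID_integer_id_hasDOI.parquet next to the archive. That file can then be opened with any Parquet reader, for example pandas.read_parquet (which requires pyarrow or fastparquet to be installed).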
2) filename: citation_table.tsv.xz
This edge list of citations has two columns (no header) of integer values: the integer_id of the citing document and the integer_id of the cited document, respectively.
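With more than two billion rows, the edge list is easier to handle as a stream than as a fully loaded table. The sketch below reads the compressed file line by line with the Python standard library and yields integer pairs; the function name iter_edges is hypothetical, and the code assumes each line holds exactly two tab-separated integers, as described above.

```python
import lzma


def iter_edges(path):
    """Yield (citing, cited) integer_id pairs from the xz-compressed TSV.

    Assumes two tab-separated integer columns and no header row.
    """
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. at end of file
            citing, cited = line.split("\t")
            yield int(citing), int(cited)
```

For a quick look at the data, itertools.islice(iter_edges("citation_table.tsv.xz"), 5) would return the first five edges without decompressing the rest of the file.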
Summary Features
• Total Nodes (Documents): 256,997,006
• Total Edges (Citations): 2,148,871,058
• Documents with DOIs: 163,495,446
• Edges between documents with DOIs: 1,936,722,541
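The summary figures above can be turned into a few derived ratios, and the DOI-restricted edge count can in principle be recomputed from the two files. The sketch below does both. It assumes that "edges between documents with DOIs" means edges whose citing and cited documents both have hasDOI equal to 1; the function name count_doi_edges and its arguments are illustrative, not taken from the authors' code.

```python
# Figures copied from the summary above.
TOTAL_NODES = 256_997_006
TOTAL_EDGES = 2_148_871_058
DOI_NODES = 163_495_446
DOI_EDGES = 1_936_722_541

doi_node_share = DOI_NODES / TOTAL_NODES      # fraction of documents with a DOI
doi_edge_share = DOI_EDGES / TOTAL_EDGES      # fraction of edges kept in the DOI subgraph
mean_edges_per_node = TOTAL_EDGES / TOTAL_NODES

print(f"Documents with a DOI: {doi_node_share:.1%}")
print(f"Edges between DOI documents: {doi_edge_share:.1%}")
print(f"Mean citations per document: {mean_edges_per_node:.2f}")


def count_doi_edges(edges, has_doi):
    """Count edges whose two endpoints both have hasDOI == 1.

    edges: iterable of (citing, cited) integer_id pairs.
    has_doi: mapping from integer_id to 0 or 1 (hypothetical in-memory form
    of the hasDOI column). Assumed definition, see the note above.
    """
    return sum(
        1
        for citing, cited in edges
        if has_doi.get(citing) == 1 and has_doi.get(cited) == 1
    )
```

By these figures, roughly 64% of documents have a DOI, while about 90% of citation edges connect two documents that both have one.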
The code used to generate these files can be found here: https://github.com/illinois-or-research-analytics/lorran_openalex/