Illinois Data Bank

Curated PI3K (Phosphoinositide 3-kinase) Network

In version 2 of this network, we retrieved citing and cited articles around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth-first search (BFS) with API calls to the Dimensions database. This BFS protocol replaces the seed set expansion protocol used in version 1, in which multiple rounds of citing and cited articles were harvested.

The data are useful for structural and simulation studies and are stored as three edgelists in parquet format. Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988). Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Level 3 (L3) similarly includes L2 and nodes that are three hops away. A node that cites both the founder (a direct L1 relationship) and a citing node of the founder (an L2 relationship) is classified at its closest distance, L1: BFS assigns each node the minimum hop distance at which it is first discovered, so the longer L2 path does not affect the node's level. The edges themselves are all captured in the edgelist regardless of which level the endpoint belongs to. The integer ids have been freshly minted and do not correspond to the ids used in version 1 of this data repository. The column headers are source and target.
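
The level assignment described above can be illustrated with a small Python sketch. It is not the authors' extraction code, which queried the Dimensions API. The function names, the toy edges, and the founder id 0 are hypothetical. The sketch also assumes that a level-k edgelist holds every edge whose two endpoints both lie within k hops of the founder, which is one reading of the description rather than a confirmed detail.

```python
from collections import defaultdict, deque


def hop_distances(edges, founder, max_hops=3):
    """Map each reachable node to its minimum hop distance from the founder.

    Edge direction is ignored, because a hop may be a citation or a reference.
    """
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    dist = {founder: 0}
    queue = deque([founder])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the outermost level
        for nxt in neighbors[node]:
            if nxt not in dist:  # first discovery gives the minimum distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def level_edges(edges, dist, level):
    """Keep edges whose endpoints are both within `level` hops (assumed rule)."""
    far = level + 1  # stand-in distance for nodes BFS never reached
    return [
        (s, t)
        for s, t in edges
        if dist.get(s, far) <= level and dist.get(t, far) <= level
    ]


# Hypothetical toy network: node 0 plays the role of the founder.
# Node 2 is linked to both the founder and node 1, so it is placed at level 1.
toy_edges = [(1, 0), (2, 0), (2, 1), (3, 1), (0, 4), (5, 3)]
toy_dist = hop_distances(toy_edges, founder=0)
toy_level_1 = level_edges(toy_edges, toy_dist, 1)
```

In this toy case the edge between nodes 2 and 1 is kept in the level 1 edgelist even though both endpoints are already at level 1, which mirrors the statement that edges are captured regardless of the level of their endpoints.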

In these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. Nodelists can be extracted by taking the union of unique(source) and unique(target). The authors thank Digital Science for providing access to Dimensions.
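
As a minimal sketch of the nodelist extraction mentioned above, the following Python takes the union of the unique source and target ids. The helper names and the example edges are hypothetical. The loader assumes that pandas and a parquet engine such as pyarrow are installed and that the file path is supplied by the user, since file names are not given here.

```python
def load_edges(path):
    """Read the source and target columns of one edgelist file.

    Assumption: pandas with a parquet engine is available. The import is
    kept local so the rest of the sketch runs without pandas.
    """
    import pandas as pd

    frame = pd.read_parquet(path, columns=["source", "target"])
    return list(zip(frame["source"].tolist(), frame["target"].tolist()))


def nodelist(edges):
    """Return the sorted union of unique(source) and unique(target)."""
    sources = {s for s, _ in edges}
    targets = {t for _, t in edges}
    return sorted(sources | targets)


# Hypothetical in-memory edges standing in for a loaded file.
example_edges = [(10, 3), (7, 3), (10, 7)]
example_nodes = nodelist(example_edges)
```

Holding edges as Python tuples is likely to be practical only for the smaller files. For the Level 3 edgelist a columnar approach, for example computing the unique values of each column directly in pandas, would probably be needed.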

Level 1: 962 nodes and 8,214 edges
Level 2: 115,366 nodes and 1,918,815 edges
Level 3: 7,881,135 nodes and 277,338,664 edges

Note that level 1 nodes and edges are a subset of level 2 nodes and edges, which are in turn a subset of level 3 nodes and edges.
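
The nesting of levels can be checked after loading the edgelists. The Python sketch below is one way to do so, assuming each edgelist has been read into a collection of (source, target) pairs. The function name and the toy data are hypothetical.

```python
def is_nested(inner_edges, outer_edges):
    """Return True if the inner edges and nodes are subsets of the outer ones."""
    inner = set(inner_edges)
    outer = set(outer_edges)
    inner_nodes = {node for edge in inner for node in edge}
    outer_nodes = {node for edge in outer for node in edge}
    return inner <= outer and inner_nodes <= outer_nodes


# Hypothetical toy levels: each level extends the previous one.
toy_l1 = [(1, 0), (2, 0)]
toy_l2 = toy_l1 + [(3, 1)]
toy_l3 = toy_l2 + [(4, 3)]
levels_are_nested = is_nested(toy_l1, toy_l2) and is_nested(toy_l2, toy_l3)
```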

Technology and Engineering
pi3k citation network
CC BY
Digital Science
George Chacko
14 times
Version DOI Comment Publication Date
2 10.13012/B2IDB-9184261_V2 Use a breadth-first-search (BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids. 2026-06-22
1 10.13012/B2IDB-9184261_V1 2026-05-07

22.8 KB File
8.38 MB File
1.1 GB File

Contact the Research Data Service for help interpreting this log.

Dataset update: {"all_globus"=>[nil, true]} 2026-06-24T12:03:40Z
Dataset update: {"all_medusa"=>[nil, true]} 2026-06-22T15:45:08Z
Dataset update: {"publication_state"=>["version candidate under curator review", "released"], "release_date"=>[nil, Mon, 22 Jun 2026]} 2026-06-22T15:33:11Z
RelatedMaterial update: {"material_type"=>["Article", "Dataset"], "selected_type"=>["Article", "Dataset"]} 2026-06-15T19:53:30Z
Dataset update: {"keywords"=>["pi3k citation network; ", "pi3k citation network"], "version_comment"=>["This network was created using a seed set expansion protocol that misses some citations. We are now using a breadth-first-search *(BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids. The new version will be of greater utility since it allows users to select from 1-hop, 2-hop, or 3-hop neighbors of the founder node in the network (Whitman et al. 1988). We have no objections to keep the existing file but the new data are better.", "Use a breadth-first-search *(BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids."]} 2026-06-15T19:53:30Z
RelatedMaterial update: {"note"=>[nil, ""]} 2026-06-15T19:49:24Z
Dataset update: {"description"=>["In version 2 of this network, we retrieved citing and cited articles around around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth first search (BFS) with API calls to the Dimensions database. This, instead of the seed seed expansion protocol used in version 1 where multiple rounds of citing and cited are harvested as in the first version. \r\n\r\nThe data are useful for structural and simulation studies and are stored as three edgelists in parquet format. Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988) Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Similarly with Level 3 (L3). A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder and BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to. The integer ids have been freshly minted and there is no correspondence between the ids in version 1 of this data repository. The column headers used are source and target.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. Nodelists can be extracted by taking the union of unique(source) and unique(target). 
The authors thank Digital Science for providing access to Dimensions.\r\n\r\nLevel 1: 962 nodes and 8,214 edges\r\nLevel 2: 115,366 nodes and 1,918,815 edges\r\nLevel 3: 7,881,135 nodes and 277,338,664 edges\r\n\r\nNote that level 1 nodes and edges are a subset of level 2 nodes and edges, in turn a subset of level 3 nodes and edges.\r\n", "In version 2 of this network, we retrieved citing and cited articles around around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth first search (BFS) with API calls to the Dimensions database. This BFS protocol replaces the seed seed expansion protocol used in version 1 where multiple rounds of citing and cited are harvested as in the first version. \r\n\r\nThe data are useful for structural and simulation studies and are stored as three edgelists in parquet format. Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988) Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Similarly with Level 3 (L3). A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder and BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to. The integer ids have been freshly minted and there is no correspondence between the ids in version 1 of this data repository. The column headers used are source and target.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. 
Nodelists can be extracted by taking the union of unique(source) and unique(target). The authors thank Digital Science for providing access to Dimensions.\r\n\r\nLevel 1: 962 nodes and 8,214 edges\r\nLevel 2: 115,366 nodes and 1,918,815 edges\r\nLevel 3: 7,881,135 nodes and 277,338,664 edges\r\n\r\nNote that level 1 nodes and edges are a subset of level 2 nodes and edges, in turn a subset of level 3 nodes and edges.\r\n"]} 2026-06-15T15:20:36Z
Funder create: {"name"=>"Digital Science", "identifier"=>"", "identifier_scheme"=>"", "grant"=>"", "dataset_id"=>3473, "code"=>"other"} 2026-06-15T15:19:11Z
Dataset update: {"description"=>["In version 2 of this network, we retrieved citing and cited articles around around Whitman et al. (1988) Nature, 332(6165):644–646 using a breadth first search (BFS) instead of a seed seed expansion as in the first version.The authors thank Digital Science for supporting this project through access to the Dimensions database. The data are stored in three sets, each with a nodelist and edgelist. Level 1 (L1) consists of all articles retrieved through the Dimensions API that are one hop (one citation or one reference away) from teh Whitman article. Level 2 (L2) includes L1 and nodes that are two hops away from the founder. A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder. The BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science. The edgelist and nodelist provide adequate structural information for topological studies and date stamps on each node. \r\n", "In version 2 of this network, we retrieved citing and cited articles around around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth first search (BFS) with API calls to the Dimensions database. This, instead of the seed seed expansion protocol used in version 1 where multiple rounds of citing and cited are harvested as in the first version. \r\n\r\nThe data are useful for structural and simulation studies and are stored as three edgelists in parquet format. 
Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988) Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Similarly with Level 3 (L3). A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder and BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to. The integer ids have been freshly minted and there is no correspondence between the ids in version 1 of this data repository. The column headers used are source and target.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. Nodelists can be extracted by taking the union of unique(source) and unique(target). The authors thank Digital Science for providing access to Dimensions.\r\n\r\nLevel 1: 962 nodes and 8,214 edges\r\nLevel 2: 115,366 nodes and 1,918,815 edges\r\nLevel 3: 7,881,135 nodes and 277,338,664 edges\r\n\r\nNote that level 1 nodes and edges are a subset of level 2 nodes and edges, in turn a subset of level 3 nodes and edges.\r\n"]} 2026-06-15T15:19:11Z
Dataset update: {"description"=>["This network is a curated version of a network created by harvesting citing and cited articles around Whitman et al. (1988) Nature, 332(6165):644–646. For further details refer to <a href=\"https://databank.illinois.edu/datasets/IDB-4897629\">https://databank.illinois.edu/datasets/IDB-4897629</a>. Curation was performed by removing nodes (articles identified by Dimensions publication ids) whose year or DOI record was missing from the Dimensions database and retaining the largest connected component of the resulting network. This curated network represents the largest connected component. Integer ids were generated by the authors to replace the Dimensions ids. Access to the raw data requires a license from Digital Science. \r\n\r\nThe original pi3k network contains 17,970,340 nodes of which only 17,508,111 (97.42%) them have both year and DOI information. In this curated version, 127,255,020 edges were reduced to 125,118,817 edges (98.32%). The edges are represented with two columns in the file where the \"source_iid\" column represents the citing node and \"target_iid\" column represents the cited node. Restricting the original pi3k network to only those nodes with both year and DOI information results in a graph that has 21,469 connected components where the largest connected component has 17,486,619 nodes (97.31%) . Thus, this network represents 97.31% of the nodes and 98.32% of the edges in the original network. The authors thank Digital Science for supporting this project through access to the Dimensions database.", "In version 2 of this network, we retrieved citing and cited articles around around Whitman et al. (1988) Nature, 332(6165):644–646 using a breadth first search (BFS) instead of a seed seed expansion as in the first version.The authors thank Digital Science for supporting this project through access to the Dimensions database. The data are stored in three sets, each with a nodelist and edgelist. 
Level 1 (L1) consists of all articles retrieved through the Dimensions API that are one hop (one citation or one reference away) from teh Whitman article. Level 2 (L2) includes L1 and nodes that are two hops away from the founder. A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder. The BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science. The edgelist and nodelist provide adequate structural information for topological studies and date stamps on each node. \r\n"]} 2026-06-12T15:36:54Z
Dataset update: {"hold_state"=>["version candidate under curator review", "none"]} 2026-06-11T22:27:14Z
Dataset update: {"version_comment"=>[nil, "This network was created using a seed set expansion protocol that misses some citations. We are now using a breadth-first-search *(BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids. The new version will be of greater utility since it allows users to select from 1-hop, 2-hop, or 3-hop neighbors of the founder node in the network (Whitman et al. 1988). We have no objections to keep the existing file but the new data are better."]} 2026-06-11T21:54:31Z
RelatedMaterial create: {"material_type"=>"Dataset", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-9184261_V1", "uri"=>"10.13012/B2IDB-9184261_V1", "uri_type"=>"DOI", "citation"=>"Park, Minhyuk; Chacko, George (2026): Curated PI3K (Phosphoinositide 3-kinase) Network. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9184261_V1", "dataset_id"=>3473, "selected_type"=>"Dataset", "datacite_list"=>"IsNewVersionOf", "note"=>nil, "feature"=>nil} 2026-06-11T21:51:07Z
RelatedMaterial create: {"material_type"=>"Article", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-4897629_V1", "uri"=>"10.13012/B2IDB-4897629_V1", "uri_type"=>"DOI", "citation"=>"Park, Minhyuk; Chacko, George (2026): PI-3Kinase Citation Network. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4897629_V1", "dataset_id"=>3473, "selected_type"=>"Article", "datacite_list"=>"IsSupplementedBy", "note"=>"", "feature"=>nil} 2026-06-11T21:51:07Z
Creator create: {"family_name"=>"Chacko", "given_name"=>"George", "identifier"=>"0000-0002-2127-1892", "email"=>"chackoge@illinois.edu", "is_contact"=>true, "row_position"=>2} 2026-06-11T21:51:07Z
Dataset update: {"corresponding_creator_name"=>[nil, "George Chacko"], "corresponding_creator_email"=>[nil, "chackoge@illinois.edu"]} 2026-06-11T21:51:07Z
Creator create: {"family_name"=>"Park", "given_name"=>"Minhyuk", "identifier"=>"0000-0002-8676-7565", "email"=>"minhyuk2@illinois.edu", "is_contact"=>false, "row_position"=>1} 2026-06-11T21:51:07Z