Curated PI3K (Phosphoinositide 3-kinase) Network
Dataset Description |
In version 2 of this network, we retrieved citing and cited articles around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth-first search (BFS) with API calls to the Dimensions database. This BFS protocol replaces the seed set expansion protocol used in version 1, in which multiple rounds of citing and cited articles were harvested.

The data are useful for structural and simulation studies and are stored as three edgelists in parquet format. Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988). Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Level 3 (L3) similarly includes L2 and nodes that are three hops away. A node that both cites the founder (a direct L1 relationship) and cites a citing node of the founder (an L2 relationship) is classified at its closest distance, L1: BFS assigns each node the minimum hop distance at which it is first discovered, so the longer L2 path is ignored for classification. The edges themselves are all captured in the edgelist regardless of which level their endpoints belong to. The integer ids have been freshly minted and do not correspond to the ids used in version 1 of this data repository. The column headers are source and target.

In these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. Nodelists can be extracted by taking the union of unique(source) and unique(target). The authors thank Digital Science for providing access to Dimensions.

Level 1: 962 nodes and 8,214 edges
Level 2: 115,366 nodes and 1,918,815 edges
Level 3: 7,881,135 nodes and 277,338,664 edges
Note that level 1 nodes and edges are a subset of level 2 nodes and edges, in turn a subset of level 3 nodes and edges. |
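The nodelist extraction and the minimum-hop classification rule described above can be illustrated with a minimal Python sketch. It is not the authors' extraction code. It uses a toy edgelist with made-up integer ids, and it assumes that each pair is (source, target), that hops follow an edge in either direction (one citation or one reference away), and that the founder appears in the edgelist as node 0. None of these assumptions is confirmed for the released files, and the function names `nodelist` and `hop_levels` are hypothetical.

```python
from collections import defaultdict, deque


def nodelist(edges):
    """Return the sorted union of unique sources and unique targets."""
    nodes = set()
    for source, target in edges:
        nodes.add(source)
        nodes.add(target)
    return sorted(nodes)


def hop_levels(edges, founder, max_hops=3):
    """Map each node within max_hops of the founder to its minimum hop distance.

    Edges are traversed in both directions, so a neighbor may be reached
    through a citation or through a reference (an assumption of this sketch).
    """
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    level = {founder: 0}
    queue = deque([founder])
    while queue:
        node = queue.popleft()
        if level[node] == max_hops:
            continue  # do not expand beyond the outermost level
        for nxt in neighbors[node]:
            if nxt not in level:  # first discovery fixes the minimum distance
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return level


# Toy edgelist with hypothetical ids; node 0 plays the role of the founder.
toy_edges = [
    (1, 0),  # 1 is adjacent to the founder: level 1
    (2, 0),  # 2 is adjacent to the founder: level 1
    (2, 1),  # 2 is also adjacent to 1, but it stays at level 1
    (0, 3),  # 3 is adjacent to the founder: level 1
    (4, 1),  # 4 is adjacent to a level 1 node: level 2
    (5, 4),  # 5 is adjacent to a level 2 node: level 3
    (6, 5),  # 6 is four hops away and is not reached with max_hops=3
]

print(nodelist(toy_edges))
print(hop_levels(toy_edges, founder=0))
```

For the parquet files themselves, the pairs could be read with `pandas.read_parquet` and the `source` and `target` columns passed to the same functions. The file names and the founder's integer id are not given in this record, so they would need to be supplied by the user.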
Subject |
Technology and Engineering |
Keywords |
pi3k citation network |
License |
CC BY |
Funder |
Digital Science |
Corresponding Creator |
George Chacko |
Downloaded |
14 times |
| Version | DOI | Comment | Publication Date |
|---|---|---|---|
| 2 | 10.13012/B2IDB-9184261_V2 | Use a breadth-first search (BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids. | 2026-06-22 |
| 1 | 10.13012/B2IDB-9184261_V1 | | 2026-05-07 |
Contact the Research Data Service for help interpreting this log.
| Dataset | update: {"all_globus"=>[nil, true]} | 2026-06-24T12:03:40Z |
| Dataset | update: {"all_medusa"=>[nil, true]} | 2026-06-22T15:45:08Z |
| Dataset | update: {"publication_state"=>["version candidate under curator review", "released"], "release_date"=>[nil, Mon, 22 Jun 2026]} | 2026-06-22T15:33:11Z |
| RelatedMaterial | update: {"material_type"=>["Article", "Dataset"], "selected_type"=>["Article", "Dataset"]} | 2026-06-15T19:53:30Z |
| Dataset | update: {"keywords"=>["pi3k citation network; ", "pi3k citation network"], "version_comment"=>["This network was created using a seed set expansion protocol that misses some citations. We are now using a breadth-first-search *(BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids. The new version will be of greater utility since it allows users to select from 1-hop, 2-hop, or 3-hop neighbors of the founder node in the network (Whitman et al. 1988). We have no objections to keep the existing file but the new data are better.", "Use a breadth-first-search *(BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids."]} | 2026-06-15T19:53:30Z |
| RelatedMaterial | update: {"note"=>[nil, ""]} | 2026-06-15T19:49:24Z |
| Dataset | update: {"description"=>["In version 2 of this network, we retrieved citing and cited articles around around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth first search (BFS) with API calls to the Dimensions database. This, instead of the seed seed expansion protocol used in version 1 where multiple rounds of citing and cited are harvested as in the first version. \r\n\r\nThe data are useful for structural and simulation studies and are stored as three edgelists in parquet format. Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988) Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Similarly with Level 3 (L3). A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder and BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to. The integer ids have been freshly minted and there is no correspondence between the ids in version 1 of this data repository. The column headers used are source and target.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. Nodelists can be extracted by taking the union of unique(source) and unique(target). 
The authors thank Digital Science for providing access to Dimensions.\r\n\r\nLevel 1: 962 nodes and 8,214 edges\r\nLevel 2: 115,366 nodes and 1,918,815 edges\r\nLevel 3: 7,881,135 nodes and 277,338,664 edges\r\n\r\nNote that level 1 nodes and edges are a subset of level 2 nodes and edges, in turn a subset of level 3 nodes and edges.\r\n", "In version 2 of this network, we retrieved citing and cited articles around around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth first search (BFS) with API calls to the Dimensions database. This BFS protocol replaces the seed seed expansion protocol used in version 1 where multiple rounds of citing and cited are harvested as in the first version. \r\n\r\nThe data are useful for structural and simulation studies and are stored as three edgelists in parquet format. Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988) Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Similarly with Level 3 (L3). A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder and BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to. The integer ids have been freshly minted and there is no correspondence between the ids in version 1 of this data repository. The column headers used are source and target.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. 
Nodelists can be extracted by taking the union of unique(source) and unique(target). The authors thank Digital Science for providing access to Dimensions.\r\n\r\nLevel 1: 962 nodes and 8,214 edges\r\nLevel 2: 115,366 nodes and 1,918,815 edges\r\nLevel 3: 7,881,135 nodes and 277,338,664 edges\r\n\r\nNote that level 1 nodes and edges are a subset of level 2 nodes and edges, in turn a subset of level 3 nodes and edges.\r\n"]} | 2026-06-15T15:20:36Z |
| Funder | create: {"name"=>"Digital Science", "identifier"=>"", "identifier_scheme"=>"", "grant"=>"", "dataset_id"=>3473, "code"=>"other"} | 2026-06-15T15:19:11Z |
| Dataset | update: {"description"=>["In version 2 of this network, we retrieved citing and cited articles around around Whitman et al. (1988) Nature, 332(6165):644–646 using a breadth first search (BFS) instead of a seed seed expansion as in the first version.The authors thank Digital Science for supporting this project through access to the Dimensions database. The data are stored in three sets, each with a nodelist and edgelist. Level 1 (L1) consists of all articles retrieved through the Dimensions API that are one hop (one citation or one reference away) from teh Whitman article. Level 2 (L2) includes L1 and nodes that are two hops away from the founder. A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder. The BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science. The edgelist and nodelist provide adequate structural information for topological studies and date stamps on each node. \r\n", "In version 2 of this network, we retrieved citing and cited articles around around a founder article [Whitman et al. (1988) Nature, 332(6165):644–646] using a breadth first search (BFS) with API calls to the Dimensions database. This, instead of the seed seed expansion protocol used in version 1 where multiple rounds of citing and cited are harvested as in the first version. \r\n\r\nThe data are useful for structural and simulation studies and are stored as three edgelists in parquet format. 
Level 1 (L1) consists of all edges between articles that are one hop (one citation or one reference away) from the founder article (Whitman et al. 1988) Level 2 (L2) includes L1 and nodes that are two hops away from the founder. Similarly with Level 3 (L3). A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder and BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to. The integer ids have been freshly minted and there is no correspondence between the ids in version 1 of this data repository. The column headers used are source and target.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science, which owns the Dimensions database. The edgelists provide structural information for topological studies. Nodelists can be extracted by taking the union of unique(source) and unique(target). The authors thank Digital Science for providing access to Dimensions.\r\n\r\nLevel 1: 962 nodes and 8,214 edges\r\nLevel 2: 115,366 nodes and 1,918,815 edges\r\nLevel 3: 7,881,135 nodes and 277,338,664 edges\r\n\r\nNote that level 1 nodes and edges are a subset of level 2 nodes and edges, in turn a subset of level 3 nodes and edges.\r\n"]} | 2026-06-15T15:19:11Z |
| Dataset | update: {"description"=>["This network is a curated version of a network created by harvesting citing and cited articles around Whitman et al. (1988) Nature, 332(6165):644–646. For further details refer to <a href=\"https://databank.illinois.edu/datasets/IDB-4897629\">https://databank.illinois.edu/datasets/IDB-4897629</a>. Curation was performed by removing nodes (articles identified by Dimensions publication ids) whose year or DOI record was missing from the Dimensions database and retaining the largest connected component of the resulting network. This curated network represents the largest connected component. Integer ids were generated by the authors to replace the Dimensions ids. Access to the raw data requires a license from Digital Science. \r\n\r\nThe original pi3k network contains 17,970,340 nodes of which only 17,508,111 (97.42%) them have both year and DOI information. In this curated version, 127,255,020 edges were reduced to 125,118,817 edges (98.32%). The edges are represented with two columns in the file where the \"source_iid\" column represents the citing node and \"target_iid\" column represents the cited node. Restricting the original pi3k network to only those nodes with both year and DOI information results in a graph that has 21,469 connected components where the largest connected component has 17,486,619 nodes (97.31%) . Thus, this network represents 97.31% of the nodes and 98.32% of the edges in the original network. The authors thank Digital Science for supporting this project through access to the Dimensions database.", "In version 2 of this network, we retrieved citing and cited articles around around Whitman et al. (1988) Nature, 332(6165):644–646 using a breadth first search (BFS) instead of a seed seed expansion as in the first version.The authors thank Digital Science for supporting this project through access to the Dimensions database. The data are stored in three sets, each with a nodelist and edgelist. 
Level 1 (L1) consists of all articles retrieved through the Dimensions API that are one hop (one citation or one reference away) from teh Whitman article. Level 2 (L2) includes L1 and nodes that are two hops away from the founder. A node that cites both the founder (direct L1 relationship) and a citing node of the founder (L2 relationship) is simply classified at its closest distance — L1, since it directly cites the founder. The BFS assigns each node the minimum hop distance at which it's first discovered, so the L2 path is redundant and ignored. The edges themselves are all captured in the edgelist regardless of which layer the endpoint belongs to.\r\n\r\nIn these data, the original Dimensions ids have been replaced with randomly assigned integer ids. Access to the mapping requires a licence from Digital Science. The edgelist and nodelist provide adequate structural information for topological studies and date stamps on each node. \r\n"]} | 2026-06-12T15:36:54Z |
| Dataset | update: {"hold_state"=>["version candidate under curator review", "none"]} | 2026-06-11T22:27:14Z |
| Dataset | update: {"version_comment"=>[nil, "This network was created using a seed set expansion protocol that misses some citations. We are now using a breadth-first-search *(BFS) protocol to extract data from the Dimensions database and replace proprietary identifiers with integer ids. The new version will be of greater utility since it allows users to select from 1-hop, 2-hop, or 3-hop neighbors of the founder node in the network (Whitman et al. 1988). We have no objections to keep the existing file but the new data are better."]} | 2026-06-11T21:54:31Z |
| RelatedMaterial | create: {"material_type"=>"Dataset", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-9184261_V1", "uri"=>"10.13012/B2IDB-9184261_V1", "uri_type"=>"DOI", "citation"=>"Park, Minhyuk; Chacko, George (2026): Curated PI3K (Phosphoinositide 3-kinase) Network. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9184261_V1", "dataset_id"=>3473, "selected_type"=>"Dataset", "datacite_list"=>"IsNewVersionOf", "note"=>nil, "feature"=>nil} | 2026-06-11T21:51:07Z |
| RelatedMaterial | create: {"material_type"=>"Article", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-4897629_V1", "uri"=>"10.13012/B2IDB-4897629_V1", "uri_type"=>"DOI", "citation"=>"Park, Minhyuk; Chacko, George (2026): PI-3Kinase Citation Network. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4897629_V1", "dataset_id"=>3473, "selected_type"=>"Article", "datacite_list"=>"IsSupplementedBy", "note"=>"", "feature"=>nil} | 2026-06-11T21:51:07Z |
| Creator | create: {"family_name"=>"Chacko", "given_name"=>"George", "identifier"=>"0000-0002-2127-1892", "email"=>"chackoge@illinois.edu", "is_contact"=>true, "row_position"=>2} | 2026-06-11T21:51:07Z |
| Dataset | update: {"corresponding_creator_name"=>[nil, "George Chacko"], "corresponding_creator_email"=>[nil, "chackoge@illinois.edu"]} | 2026-06-11T21:51:07Z |
| Creator | create: {"family_name"=>"Park", "given_name"=>"Minhyuk", "identifier"=>"0000-0002-8676-7565", "email"=>"minhyuk2@illinois.edu", "is_contact"=>false, "row_position"=>1} | 2026-06-11T21:51:07Z |