Synthetic Networks For Benchmarking
Dataset Description |
The synthetic networks in this dataset were generated using the RECCS protocol developed by Anne et al. (2024). Briefly, the RECCS process is as follows. An input network and a clustering of it (produced by any algorithm) are used to pass input parameters to a stochastic block model (SBM) generator. The SBM output is then modified to improve its fit to the input real-world clusters, after which outlier nodes are added using one of three options. See Anne et al. (2024), in press, Complex Networks and Applications XIII (preprint: arXiv:2408.13647).
The networks in this dataset were generated using either version 1 or version 2 of the RECCS protocol, followed by outlier strategy S1. The input networks to the process were (i) the Curated Exosome Network (CEN), Wedell et al. (2021); (ii) cit_hepph (https://snap.stanford.edu/); (iii) cit_patents (https://snap.stanford.edu/); and (iv) wiki_topcats (https://snap.stanford.edu/).
The synthetic file naming system should be interpreted as follows: a_b_c.tsv.gz where
Thus, cit_hepph_0.01_v1.tsv indicates that this network was modeled on the cit_hepph network and that RECCSv1 was used to match edge count and connectivity to a Leiden-CPM 0.01 clustering of cit_hepph. For SBM generation, we used the graph_tool software (Peixoto, Tiago P. 2014. The graph-tool python library. figshare. Dataset. https://doi.org/10.6084/m9.figshare.1164194.v14).
Additionally, this dataset contains synthetic networks generated for a replication experiment (repl_exp.tar.gz). The experiment aims to evaluate the consistency of RECCS-generated networks by producing multiple replicates under controlled conditions. These networks were generated using different configurations of RECCS, varying across two versions (v1 and v2) and applying Connectivity Modifier (CM++, Ramavarapu et al. (2024)) pre-processing. Please note that the CM pipeline used for this experiment filters small clusters both before and after the CM treatment. Input network: CEN.
Within repl_exp.tar.gz, the synthetic file naming system should be interpreted as follows:
where:
For example:
The ground truth clustering input to RECCS is contained in repl_exp_groundtruths.tar.gz. |
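The naming example given above for the main dataset (cit_hepph_0.01_v1.tsv) can be unpacked programmatically. The following is a minimal Python sketch, not part of the dataset or of RECCS. It assumes that the three name fields are the input network, the Leiden-CPM resolution, and the RECCS version, as the example suggests, and that each file is a tab-separated edge list with one edge per line and the two endpoint IDs in the first two columns. The names `SyntheticName`, `parse_name`, and `read_edges` are hypothetical helpers introduced for illustration. The sketch does not cover the separate naming scheme used inside repl_exp.tar.gz.

```python
import csv
import gzip
from typing import Iterator, NamedTuple, Tuple


class SyntheticName(NamedTuple):
    """Fields of an a_b_c file name (field meanings inferred from the example)."""
    network: str     # input network, e.g. "cit_hepph"
    resolution: str  # Leiden-CPM resolution, kept as text, e.g. "0.01"
    version: str     # RECCS version, e.g. "v1"


def parse_name(filename: str) -> SyntheticName:
    """Split a name such as 'cit_hepph_0.01_v1.tsv.gz' into its three fields."""
    stem = filename
    # Strip the optional .gz and .tsv extensions, in that order.
    for suffix in (".gz", ".tsv"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
    # Network names may themselves contain underscores, so split from the right.
    network, resolution, version = stem.rsplit("_", 2)
    return SyntheticName(network, resolution, version)


def read_edges(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from a gzipped TSV file.

    Assumption: one edge per line, endpoint IDs in the first two columns.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                yield row[0], row[1]
```

Splitting from the right keeps network names such as cit_hepph and wiki_topcats intact. Node IDs are returned as strings so that no assumption is made about their numeric range.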
Subject |
Technology and Engineering |
Keywords |
Community Detection; Synthetic Networks; Stochastic Block Model (SBM); |
License |
CC BY |
Funder |
Illinois:Insper Partnership-Grant:NA |
Corresponding Creator |
George Chacko |
Downloaded |
596 times |
| Version | DOI | Comment | Publication Date |
|---|---|---|---|
| 1 | 10.13012/B2IDB-9805305_V1 | | 2025-02-08 |