My (Benjamin Dayan's) master's thesis at ETH, under the supervision of Marc Kaufmann, Ulysse Schaller, Prof. Dr. Johannes Lengler and Prof. Dr. Angelika Steger
benji_src
├── do_*.py # scripts / experiments, e.g. do_MCMC_ordered.py fits GIRG node locations for a real graph by iterative improvement
├── notebooks/*.ipynb # many messy jupyter notebooks for testing code and some data analysis
├── tests/*.py # dearth of tests
...
└── benji_girgs # Main python backbone for functions and tools
├── fitting.py # DEPRECATED attempts to fit a GIRG to a real graph
├── generation.py # generating GIRGs (cube/torus, max norm / MCD etc.).
Also Chung-Lu generation/fitting, power law distributions.
generate_GIRG_nk is the main omni-function for generating a GIRG
├── graph_kernels.py # testing out graph kernels to distinguish graphs
├── mcmc.py # iteratively refining GIRG fit point locations: MCMC / likelihood maximisation
├── plotting.py # plotting graph node degree distributions with their power law fit tails
├── points.py # classes for different distance functions: Max, MCD, Boolean mixed, distorted
└── utils.py # Utils for working with graphs. Including diffusion maps for initial node location estimates
thesis/ # thesis report latex
nemo-eva/ # Modified Bläsius et al. framework
girg-sampling/ # Python wrapper of Bläsius et al. C++ GIRG generation code
The first half of the thesis is extending the framework of Bläsius et al. to include a range of GIRG variants. Modified framework source code is in nemo-eva
Thomas Bläsius, Tobias Friedrich, Maximilian Katzmann, Anton Krohmer and Jonathan Striebel
Towards a Systematic Evaluation of Generative Network Models
15th Workshop on Algorithms and Models for the Web Graph (2018)
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10836)
We also make use of a Bläsius linear time GIRG generating code in C++, in combination with our own derived algorithms for sampling a cube GIRG and Boolean / MCD GIRGs
Bläsius, T., Friedrich, T., Katzmann, M., Meyer, U., Penschuck, M., & Weyand, C. Efficiently generating geometric inhomogeneous and hyperbolic random graphs
os.environ[DATA_PATH]
is location of the folder containing socfb-*
graphs
Main framework experimental results run in benji_src/do_feature_extract.py: Fitting different generative models to real graphs, generating a mirrored synthetic graph dataset, and recording features of all real and generated graphs is done by FeatureExtractor
class in nemo-eva/src/feature_extractor.py. We have modified this class to additionally include GIRG fitting and generation.
The extracted features are then cleaned in nemo-eva/src/feature_cleaner.py, and a real/synthetic classifier trained and evaluated on each feature set and generative model combination in nemo-eva/src/main.py.
The experiments were run on the ETH Zurich Euler cluster, which has Intel(R) Xeon(R) CPU E3-1284L v4 @ 2.90GHz
cpus, generally requesting 10-12 CPUs and 16-24 GB of RAM, and taking a couple of hours to a day or two to complete (FeatureExtraction - cleaning and classification is much faster).
Main point fitting experiments run in benji_src/do_MCMC_ordered.py: Using a diffusion map initial estimate of GIRG node locations, and then iteratively improving locations by ordered likelihood maximisation.
The final fit point locations are saved, for later edge accuracy metric analysis.