A Python implementation of the Louvain and Leiden community detection algorithms.
This project is an implementation of the Louvain and Leiden algorithms for community detection in graphs.
The implementation was written and tested using Python version 3.10.12.
In order to try out this implementation, do the following:

- Check out this repository, or download the latest version and unzip it somewhere.
- Install the dependencies, for example using `pip install -r requirements.txt`.
- Run `jupyter notebook` and explore the available notebooks.
To use this implementation, you only need the NetworkX graph library – it was implemented and tested with version 3.0:

```
networkx==3.0
```
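As a minimal usage sketch (the `community_detection` module and `louvain` function below are assumptions for illustration, not this project's confirmed API), running the algorithm on a NetworkX graph could look like this:

```python
# Minimal sketch, assuming a hypothetical `louvain` entry point that takes a
# NetworkX graph and returns a partition of its nodes; actual names may differ.
import networkx as nx

from community_detection import louvain  # hypothetical import path

G = nx.karate_club_graph()   # small example graph bundled with NetworkX
communities = louvain(G)     # detect communities in G
print(communities)
```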
For running the notebooks and working with the example datasets, you will need the following packages in addition to NetworkX:

```
jupyter_core==5.2.0
notebook==6.5.2
pandas==1.5.3
matplotlib==3.7.0
```
Lastly, to run the tests, generate code coverage information and run static analysis, the following packages are required:

```
pytest==7.2.1
pytest-cov==4.0.0
black==23.1.0
mypy==1.0.1
ruff==0.0.290
```
The Python dependencies are specified in the `requirements.txt` file and can be installed (preferably in a virtual environment) using the command `pip install -r requirements.txt`.
Alternatively, if you're running Nix and are so inclined, you can run `nix-shell` to install the dependencies.
To keep the Python code as close as possible to the pseudocode and the notation used, Unicode symbols are used for some identifiers.
The Python language specification allows certain Unicode symbols to be used in identifiers, for example Greek symbols; identifiers that are not permitted, such as `𝓗`, are replaced by their "normal" variants, such as `H` here.
If you cannot use the Unicode version of the code, for example due to rendering issues, you may also use the `ascii` branch instead.
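For illustration, the following standalone snippet (not taken from this repository) shows the kind of Unicode identifiers Python accepts:

```python
# Illustrative only: Python identifiers may use many Unicode letters, so the
# code can mirror the papers' notation for parameters and functions.
γ = 0.05  # resolution parameter, as written in the papers
θ = 0.3   # randomness parameter used by the Leiden refinement phase


def φ(x: float) -> float:
    """A toy function with a Greek-letter name."""
    return γ * x + θ


print(φ(2.0))
```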
For demonstration purposes, the following notebooks are included in this repository:

- `Demo (Karate Club).ipynb`, which demonstrates the usage of this implementation,
- `Parameter Exploration.ipynb`, which examines the influence of parameters on the detected communities on a larger problem instance, and
- `Runtime Comparisons.ipynb`, which applies the implementations to networks of varying size to compare the runtimes.
This project has unit tests for almost all individual utility functions and the quality functions. In addition to that, there are tests which execute the community detection algorithms on known graphs and check the results.
The tests are located in the `tests` directory and can be executed by running the `pytest` tool as follows from a shell in the repository root, generating branch coverage information in the process:

```
pytest --cov --cov-report=html
```
The coverage information will be placed in a directory called `htmlcov`.
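To illustrate the second kind of test (the import path and function name below are hypothetical, not the repository's actual test code), such a test could look like this:

```python
# Hypothetical sketch of a test that runs community detection on a graph with
# known structure and checks the result; the real tests may differ.
import networkx as nx

from community_detection import louvain  # hypothetical import path


def test_louvain_separates_two_cliques() -> None:
    # Two 5-cliques joined by a single edge: a clear two-community structure.
    G = nx.barbell_graph(5, 0)
    communities = louvain(G)
    assert len(communities) == 2
```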
Apart from the tests, this project employs a number of static analysis tools:
The tool `ruff` is used to lint the code and look for potential problems. A variety of checks is configured, among them lints which check that public functions have documentation blocks in the correct format, that all functions are annotated with type information, etc. To check that the code satisfies the linting rules, run the following command:

```
ruff .
```

It will list the problems it has found in the code; if no lint finds an issue, this command produces no output.
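To illustrate the kinds of issues these lints catch (a generic example, not code from this repository): with docstring and annotation checks enabled, the first function below would be flagged, while the second passes.

```python
# Illustrative only: ruff's docstring and annotation lints would flag
# `degree_sum` for its missing docstring and type annotations, while
# `degree_sum_annotated` satisfies both checks.
import networkx as nx


def degree_sum(G):
    return sum(degree for _, degree in G.degree())


def degree_sum_annotated(G: nx.Graph) -> int:
    """Return the sum of the degrees of all nodes in ``G``."""
    return sum(degree for _, degree in G.degree())
```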
The `black` formatter ensures that the Python code is formatted according to its code style. To reformat the code, run the following command:

```
black .
```

You should see the line `15 files left unchanged.` when running it on a fresh checkout.
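For a quick illustration of the style `black` enforces (a generic example, not code from this project):

```python
# Illustrative only: given input such as
#   def pair (x,y):   return { 'sum':x+y }
# black normalizes spacing, quoting and layout to the form below.
def pair(x, y):
    return {"sum": x + y}
```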
Python is a dynamically typed language, meaning that the type of a variable may change over its lifetime and that type checking is done at runtime.
The code uses type hints for all functions in order to add static types, which can be checked using a type checker such as `mypy`. The type hints serve as further documentation of a function's usage and allow a type checker to verify the types of variables and functions in a program, so that it can detect calls to functions with arguments of an incorrect type, for example.
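For example (a generic snippet, not from this codebase), `mypy` would reject the second call below because the argument type does not match the annotation:

```python
# Illustrative only: mypy verifies calls against the declared annotations.
import networkx as nx


def community_count(communities: list[set[int]]) -> int:
    """Return the number of communities in a partition."""
    return len(communities)


community_count([{0, 1}, {2, 3}])  # OK: argument matches the annotation
community_count(nx.Graph())        # mypy error: incompatible argument type
```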
To check this project using `mypy`, run the following command:

```
mypy .
```

You should see the message `Success: no issues found in 15 source files`.
The following papers were used as sources for the implementation:
- *Fast unfolding of communities in large networks* for a description of the Louvain algorithm.
- *From Louvain to Leiden: guaranteeing well-connected communities*, and especially its supplementary material, for pseudocode for the algorithms and some further explanations of the ideas.
The notebooks in this repository use the following datasets:

- The Karate Club dataset (provided by the NetworkX library)
- The Cora dataset (contained in the `datasets` directory)
- The Jazz Musicians dataset (also part of the `datasets` directory)
- The Arxiv GR-QC dataset (also part of the `datasets` directory)