Skip to content

Code for SIGKDD'2021 paper: Deep Clustering based Fair Outlier Detection

Notifications You must be signed in to change notification settings

brandeis-machine-learning/FairOutlierDetection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Fair Outlier Detection

Hanyu Song, Peizhao Li, and Hongfu Liu. "Deep Clustering based Fair Outlier Detection, SIGKDD 2021.

Paper Abstract

In this paper, we focus on the fairness issues regarding unsupervised outlier detection. Traditional algorithms, without specific design for algorithmic fairness, could implicitly encode and propagate statistical bias in data and raise societal concerns. To correct such unfairness and deliver a fair set of potential outlier candidates, we propose Deep Clustering-based Fair Outlier Detection (DCFOD) that learns a good representation for utility maximization while enforcing the learnable representation to be subgroup-invariant on the sensitive attribute. Considering the coupled and reciprocal nature between clustering and outlier detection, we leverage deep clustering to discover the intrinsic cluster structure and out-of-structure instances. Meanwhile, an adversarial training erases the sensitive pattern for instances for fairness adaptation. Technically, we propose an instance-level weighted representation learning strategy to enhance the joint deep clustering and outlier detection, where the dynamic weight module re-emphasizes contributions of likely-inliers while mitigating the negative impact from outliers. Demonstrated by experiments on eight datasets comparing to 17 outlier detection algorithms, our DCFOD method consistently achieves superior performance on both the outlier detection validity and two types of fairness notions in outlier detection.

Software Requirement

numpy==1.7.1
torch==1.7.0
sklearn==0.22.0
pandas==1.0.5
cuda=10.1.243

Model training requires at least one GPU.

Datasets

We compare performance on eight datasets student, asd, obesity, cc, german, drug, adult, kdd.

To prepare datasets before model training, run

python3 getDatasets.py

Theoretically, you don't have to modify any path for datasets or folders throughout model training.

DCFOD

To obtain DCFOD's performance on a speicifc dataset, run

python3 train.py *dataset_name* *GPU_index* *with_weight*

i.e., 

python3 train.py student 0 true

GPU_index indicates the i-1 th GPU you want to train on. If you only have one, simply type 0.

To obtain the results for derivative DCOD, change the kf hyperparameter value to 0 in the train method.

Competitive Method: FairLOF

FairLOF requires the baseline result of LOF, you should first run

python3 LOF.py *dataset_name*

followed by

python3 get_Ws_for_FairLOF.py

which will retrieve all the Ws variable based on LOF results for all datasets, which is requried in FairLOF calculation.

If you don't have the LOF results for all datasets, tweak the datasets list in the get_Ws_for_FairLOF.py file.

then run

python3 FairLOF.py *dataset_name*

the experiment runs on 4 GPUs, you can tweak line 21-25 in FairLOF.py to modify cuda and the associated GPU settings.

Competitive Method: FairOD

We run FairOD with

python3 FairOD.py *dataset_name* *GPU_index* *fair_command*

you should first obtain a baseline result, i.e.,

python3 FairOD.py student 0 f

then train the fair model

python3 FairOD.py student 0 t 

Competitive Method: Conventional Outlier Detectors

To save methods' outlier scores and obtain metrics' values on a specific dataset, run

python3 pyod_results.py *dataset_name*

Or, if you already ran the above command, which has saved outlier scores, and you want to re-obtain the metrics' values, simply run

python3 Retriever.py *dataset_name*

Numerical Evaluation

We use AUC to measure detection validity, and Fgap, Frank to measure two types of fairness degree.

We calculate AUC with roc_auc_score from sklearn.metrics, and define Fgap, Frank in Retriever.py.

During model training, we obtain the fairness metrics with the fetch method in Retriever.py.

Reference

@inproceedings{10.1145/3447548.3467225,
author = {Song, Hanyu and Li, Peizhao and Liu, Hongfu},
title = {Deep Clustering Based Fair Outlier Detection},
year = {2021},
isbn = {9781450383325},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3447548.3467225},
doi = {10.1145/3447548.3467225},
booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining},
pages = {1481–1489},
numpages = {9},
keywords = {fair representation learning, deep clustering on outlier detection, outlier detection},
location = {Virtual Event, Singapore},
series = {KDD '21}
}

About

Code for SIGKDD'2021 paper: Deep Clustering based Fair Outlier Detection

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages