A privacy-aware federated computing scheme to let non-trusted clients perform statistical analysis. Process data scattered across multiple servers, each with its own privacy policy. Merge these remote data with local (or simulated) sources.
Author: Emmanouil (Manios) Krasanakis
Contact: maniospas@hotmail.com
License: Apache 2
Quickstart data server
Install FedMed with:
pip install fedmed
Data servers host your data for clients to use. The available map-reduce operations are specified in a configuration file; a default one to start from is in the example/ folder. You can reuse the same configuration between servers and clients, but this is not mandatory. For example, some servers may remove some operations from the configuration, or add more privacy policies.
import fedmed as fm
server = fm.Server(config="config.yaml")
🚧 Privacy policies may make fedmed.ops.private operations inexact.
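To illustrate why a privacy policy can make an operation inexact, here is a minimal sketch that adds Laplace noise to a sum. It is an assumption for illustration only: `noisy_sum` and its `scale` parameter are hypothetical names, and this is not necessarily how FedMed's own policies distort results.

```python
import random


def noisy_sum(values, scale, rng):
    """Return sum(values) plus Laplace noise (hypothetical privacy policy)."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(values) + noise


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [1, 2, 3]
exact = sum(data)
private = noisy_sum(data, scale=0.5, rng=rng)
print("exact:", exact, "private:", private)
```

A larger `scale` hides individual values better but moves the reported result further from the exact one.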
Each server can contain fragments of several datasets. Load data as pandas dataframes or combinations of lists and dicts, and set them as fragments like so:
data = [1, 2, 3] # or dict of lists, pandas dataframe, etc
server["test array part 1"] = data
Finally, run your server with a Flask-supporting WSGI library, like waitress. This lets clients include it in data operations.
from waitress import serve
if __name__ == "__main__":
    serve(server.app, host="127.0.0.1", port=8000)
🌐 Set up a reverse proxy server to restrict who can perform operations on your system. Even the best privacy policies fail against repeated attacks from malicious actors.
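The risk of repeated queries can be sketched with a simple averaging attack. The example below assumes a hypothetical server that answers every mean query with fresh zero-mean Laplace noise; `noisy_mean` is an illustrative name and does not describe FedMed's actual policies.

```python
import random
import statistics


def noisy_mean(values, scale, rng):
    """Hypothetical private query: true mean plus fresh Laplace noise."""
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return statistics.fmean(values) + noise


rng = random.Random(42)
secret = [0.2, 0.4, 0.9, 0.5]  # data the server wants to protect
true_mean = statistics.fmean(secret)

# One query is heavily distorted by the noise.
single = noisy_mean(secret, scale=1.0, rng=rng)

# An attacker who may repeat the query averages the noise away.
repeated = statistics.fmean(
    noisy_mean(secret, scale=1.0, rng=rng) for _ in range(10_000)
)

print("error of one query:     ", abs(single - true_mean))
print("error after 10000 calls:", abs(repeated - true_mean))
```

Restricting who can query the server, and how often, limits this kind of attack.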
Example client program
Install FedMed with:
pip install fedmed
Set up communication channels with remote data fragments and organize them into one dataset. Fragments may only partially match in terms of structure and operations - they only need to support the particular analysis you are performing.
import fedmed as fm
sources = [
    fm.Remote(ip="http://127.0.0.1:8000", fragment="test array part 1"),
    fm.Remote(ip="http://127.0.0.1:8000", fragment="test array part 2")
]
data = fm.FedData(sources, config="config.yaml")
Call simple operations from those described in the configuration file config.yaml (a default one to start from is in the example/ folder), or define new ones. The client and the servers can share the same configuration, but this is not mandatory. If a server does not support an operation that the client requests, the computations that depend on it will fail.
Operations are performed under a map-reduce scheme. The map step runs on the servers, and the reduce step runs on the client. For example, run the following code after setting up servers at the respective IP addresses:
from fedmed.stats import sum, len
mean = sum(data) / len(data)
print('Mean', mean)
🔒 Server owners remain in control of both how their server performs its namesake map methods and how it distorts outcomes to comply with its privacy policies.
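The map-reduce split described above can be sketched in plain Python, without FedMed. The names `map_sum`, `map_len` and `reduce_add` are hypothetical. They only illustrate that each server returns one aggregate per fragment and the client combines those aggregates without ever seeing the raw values.

```python
def map_sum(fragment):
    """Server side: aggregate one fragment into a single partial sum."""
    return sum(fragment)


def map_len(fragment):
    """Server side: report only how many values the fragment holds."""
    return len(fragment)


def reduce_add(partials):
    """Client side: combine the partial results of all servers."""
    return sum(partials)


# Two fragments that would live on different servers.
fragments = [[1, 2, 3], [4, 5]]

total = reduce_add(map_sum(f) for f in fragments)
count = reduce_add(map_len(f) for f in fragments)
mean = total / count
print("Mean", mean)  # Mean 3.0
```

A privacy policy would sit between the map and the reduce steps, distorting each partial result before it leaves the server.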
Example simulation
Install FedMed with:
pip install fedmed
Set up one or more server instances like above to host some data fragments, but don't actually run them:
import fedmed as fm
from random import random, seed
seed(5)
serverA = fm.Server(config="config.yaml")
serverA["fragment"] = {
    "Treatment1": [random() for _ in range(1000)],
    "Treatment2": [random()**2+0.22 for _ in range(1000)],
}
serverB = fm.Server(config="config.yaml")
serverB["fragment"] = {
    "Treatment1": [random() for _ in range(300)],
    "Treatment2": [random()**2+0.25 for _ in range(300)],
}
Now switch to writing client code. Use the above servers to declare simulation data fragments:
data = fm.FedData([
    fm.Simulation(server=serverA, fragment="fragment"),
    fm.Simulation(server=serverB, fragment="fragment")
])
Perform any fedmed operations you would like. For example, if the privacy policy is lax enough, you can even perform non-parametric tests and reconstruct an estimate of the data distribution (not the actual data):
treat1 = data["Treatment1"]
treat2 = data["Treatment2"]
print(fm.stats.base.wilcoxon(treat1, treat2))
distr = fm.stats.base.reconstruct(treat1 - treat2)  # synthetic estimate based on privacy-aware operations
from matplotlib import pyplot as plt
plt.hist(distr)
plt.show()
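One way to picture how a distribution can be estimated without access to the actual data is to rebuild it from aggregated histogram counts. The sketch below is an assumption for illustration, not the algorithm behind `fm.stats.base.reconstruct`. All function names are hypothetical, and no privacy distortion is applied to the counts.

```python
import random


def map_histogram(values, edges):
    """Server side: count the values falling in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def reduce_histograms(all_counts):
    """Client side: add up the per-server counts bin by bin."""
    return [sum(column) for column in zip(*all_counts)]


def sample_from_histogram(counts, edges, n, rng):
    """Client side: draw synthetic values that follow the merged histogram."""
    bins = rng.choices(range(len(counts)), weights=counts, k=n)
    return [rng.uniform(edges[b], edges[b + 1]) for b in bins]


rng = random.Random(5)
edges = [i / 10 for i in range(11)]  # ten equal-width bins on [0, 1]

# Raw values stay on their servers; only bin counts are shared.
server_a = [rng.random() for _ in range(1000)]
server_b = [rng.random() ** 2 for _ in range(300)]

counts = reduce_histograms(
    [map_histogram(server_a, edges), map_histogram(server_b, edges)]
)
synthetic = sample_from_histogram(counts, edges, n=500, rng=rng)
print(counts)
```

The synthetic values share the overall shape of the pooled data, but none of them corresponds to an original record.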