You do not have to install esfbdata to use it. Simply clone the repository, change to the clone directory, and run:
python -m esfbdata --help
esfbdata has been tested on Python 2.7.11 and should also work on 3.2+. It relies on the following Python modules:
elasticsearch
beautifulsoup4
dateutil
esfbdata is a small command-line program to parse Facebook data archives
and ingest them into an Elasticsearch cluster. Currently, it is capable of
parsing the html/events.htm, html/messages.htm, and html/timeline.htm
files. It requires the Python interpreter, version 2.7 or 3.2+, and it is not
platform specific. It is released under GPLv3 terms. A copy of the GPLv3 is
included in LICENSE.
esfbdata [-h] [--version] -n NODE [NODE ...] [-i INDEX]
[--ignore STATUS_CODE [STATUS_CODE ...]]
[--parser {html.parser,lxml,html5lib}]
[--ingest {events,messenger,timeline} [{events,messenger,timeline} ...]]
[-v] [-d] [--log-format LOG_FORMAT] [-s]
FILE [FILE ...]
positional arguments:
FILE The Facebook archives to ingest
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-n NODE [NODE ...], --nodes NODE [NODE ...]
Elasticsearch nodes to connect to
-i INDEX, --index INDEX
Elasticsearch index to ingest data into (default:
facebook)
--ignore STATUS_CODE [STATUS_CODE ...]
Elasticsearch errors to ignore (default: [400])
--parser {html.parser,lxml,html5lib}
HTML parser to use (default: html.parser)
--ingest {events,messenger,timeline} [{events,messenger,timeline} ...]
Set archives to ingest (default: ['events',
'messenger', 'timeline'])
-v, --verbose Set log level to INFO
-d, --debug Set log level to DEBUG (supercedes --verbose)
--log-format LOG_FORMAT
Set the format of logs (default: %(asctime)s -
%(levelname)s - %(message)s)
-s, --simulate Skip indexing of data
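The help output above has the shape that Python's argparse module produces. As a rough illustration of how the options fit together, here is a minimal reconstruction from the help text alone. It is a sketch, not the program's actual source: the function name build_parser and the attribute name files are assumptions, and the logging options are left out for brevity.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the esfbdata CLI from its help text."""
    parser = argparse.ArgumentParser(prog="esfbdata")
    # One or more Elasticsearch nodes; this option is mandatory.
    parser.add_argument("-n", "--nodes", nargs="+", required=True,
                        metavar="NODE")
    parser.add_argument("-i", "--index", default="facebook")
    # Status codes are assumed to be integers, matching the default [400].
    parser.add_argument("--ignore", nargs="+", type=int, default=[400],
                        metavar="STATUS_CODE")
    parser.add_argument("--parser", default="html.parser",
                        choices=["html.parser", "lxml", "html5lib"])
    parser.add_argument("--ingest", nargs="+",
                        choices=["events", "messenger", "timeline"],
                        default=["events", "messenger", "timeline"])
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("-s", "--simulate", action="store_true")
    # One or more archive paths.
    parser.add_argument("files", nargs="+", metavar="FILE")
    return parser


args = build_parser().parse_args(["-n", "10.0.0.5", "-v", "archive.zip"])
print(args.nodes, args.files, args.index)
```

With a parser built this way, -n accepts every value that follows it until the next option, so it is safest to place another option such as -v between the node list and the archive paths.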
Spin up an instance of Elasticsearch
docker run -d elasticsearch
Get the name of the docker instance
docker ps
Spin up an instance of Kibana and attach it (not required, but you probably will want it)
docker run --link some_elasticsearch:elasticsearch -d kibana
Get your Elasticsearch IP
docker inspect some_elasticsearch_id
Run esfbdata on the ZIP archive downloaded from Facebook
esfbdata -n some_elasticsearch_ip -v /path/to/facebook-username.zip
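The docker inspect step prints a large JSON document, and the IP address is buried inside it. A small sketch of pulling the address out follows. The helper name container_ip is hypothetical, and the JSON keys reflect the usual docker inspect layout, which may differ between Docker versions.

```python
import json


def container_ip(inspect_output):
    """Return the first IP address found in `docker inspect` JSON text."""
    containers = json.loads(inspect_output)
    if not containers:
        raise ValueError("docker inspect returned no containers")
    settings = containers[0].get("NetworkSettings", {})
    # Some Docker versions expose the address directly on NetworkSettings.
    address = settings.get("IPAddress")
    if address:
        return address
    # Otherwise, look through the per-network entries.
    for network in (settings.get("Networks") or {}).values():
        if network.get("IPAddress"):
            return network["IPAddress"]
    raise ValueError("no IP address found in docker inspect output")


sample = '[{"NetworkSettings": {"IPAddress": "172.17.0.2"}}]'
print(container_ip(sample))
```

To feed it real data, the output of subprocess.check_output(["docker", "inspect", "some_elasticsearch_id"]) can be decoded as UTF-8 and passed in.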
With only -n, -v, and the archive path given, esfbdata processes the data and
ingests it into Elasticsearch with the default options. If you have lxml
installed, you will likely want to use it by adding --parser lxml to the
command arguments. Beware of the --debug option: it generates an extreme
amount of output and should only be used for narrowly targeted debugging.
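If you script the invocation, the parser choice can be made automatically. The sketch below assumes you run esfbdata from a clone with python -m esfbdata; the helper names pick_parser and build_command are hypothetical and not part of esfbdata.

```python
import importlib


def pick_parser():
    """Return "lxml" if it can be imported, else the built-in html.parser."""
    try:
        importlib.import_module("lxml")
    except ImportError:
        return "html.parser"
    return "lxml"


def build_command(node, archive):
    """Build an argument list for running esfbdata with the chosen parser."""
    return ["python", "-m", "esfbdata",
            "-n", node,
            "--parser", pick_parser(),
            "-v", archive]


print(build_command("some_elasticsearch_ip", "/path/to/facebook-username.zip"))
```

The resulting list can be passed to subprocess.call as is.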
Developers will likely want to inherit their parser from the FacebookIngester
class or use one of the existing classes (FacebookEventsIngester,
FacebookMessengerIngester, and FacebookTimelineIngester).
esfbdata uses the standard Python logging framework, with a logger named
esfbdata.
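When using esfbdata's classes from your own code, that logger can be configured like any other. A minimal sketch, reusing the default format shown in the help output; the function name configure_esfbdata_logging is hypothetical.

```python
import logging


def configure_esfbdata_logging(level=logging.INFO):
    """Attach a stream handler to the "esfbdata" logger and set its level."""
    logger = logging.getLogger("esfbdata")
    handler = logging.StreamHandler()
    # Same format string as the documented --log-format default.
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


log = configure_esfbdata_logging()
log.info("esfbdata logging configured")
```

Passing logging.DEBUG instead roughly corresponds to the --debug option and will be just as noisy.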