Thisita/esfbdata

Installation

You do not have to install esfbdata to use it. Simply clone the repository, change to the clone directory, and run:

python -m esfbdata --help

esfbdata relies on the following Python modules. It has been tested on Python 2.7.11 and should also work on 3.2+.

elasticsearch
beautifulsoup4
dateutil
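
A quick way to confirm that these modules are importable before running esfbdata is sketched below. It assumes Python 3.4 or later (for importlib.util.find_spec) and that beautifulsoup4 is imported under the name bs4; the helper name missing_modules is hypothetical and not part of esfbdata.

```python
import importlib.util

# Import names for the modules listed above; beautifulsoup4 is imported as "bs4".
REQUIRED = ("elasticsearch", "bs4", "dateutil")


def missing_modules(names=REQUIRED):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```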

Description

esfbdata is a small command-line program that parses Facebook data archives and ingests them into an Elasticsearch cluster. It currently parses the html/events.htm, html/messages.htm, and html/timeline.htm files. It requires the Python interpreter, version 2.7 or 3.2+, and is not platform specific. It is released under the terms of the GPLv3; a copy of the license is included in LICENSE.

Usage

esfbdata [-h] [--version] -n NODE [NODE ...] [-i INDEX]
         [--ignore STATUS_CODE [STATUS_CODE ...]]
         [--parser {html.parser,lxml,html5lib}]
         [--ingest {events,messenger,timeline} [{events,messenger,timeline} ...]]
         [-v] [-d] [--log-format LOG_FORMAT] [-s]
         FILE [FILE ...]

Options

positional arguments:
  FILE                  The Facebook archives to ingest

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -n NODE [NODE ...], --nodes NODE [NODE ...]
                        Elasticsearch nodes to connect to
  -i INDEX, --index INDEX
                        Elasticsearch index to ingest data into (default:
                        facebook)
  --ignore STATUS_CODE [STATUS_CODE ...]
                        Elasticsearch errors to ignore (default: [400])
  --parser {html.parser,lxml,html5lib}
                        HTML parser to use (default: html.parser)
  --ingest {events,messenger,timeline} [{events,messenger,timeline} ...]
                        Set archives to ingest (default: ['events',
                        'messenger', 'timeline'])
  -v, --verbose         Set log level to INFO
  -d, --debug           Set log level to DEBUG (supercedes --verbose)
  --log-format LOG_FORMAT
                        Set the format of logs (default: %(asctime)s -
                        %(levelname)s - %(message)s)
  -s, --simulate        Skip indexing of data
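
Because -n, --ignore, and --ingest each accept one or more values, a FILE placed directly after one of them would be read as another value for that option if the interface is built with argparse, as the help format suggests. The sketch below mirrors a subset of the options above to illustrate this. It is an assumption about how such an interface can be written, not esfbdata's actual source, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring a subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="esfbdata")
    parser.add_argument("-n", "--nodes", metavar="NODE", nargs="+", required=True,
                        help="Elasticsearch nodes to connect to")
    parser.add_argument("-i", "--index", default="facebook",
                        help="Elasticsearch index to ingest data into")
    parser.add_argument("--parser", choices=["html.parser", "lxml", "html5lib"],
                        default="html.parser", help="HTML parser to use")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Set log level to INFO")
    parser.add_argument("files", metavar="FILE", nargs="+",
                        help="The Facebook archives to ingest")
    return parser


parser = build_parser()

# An option such as -v between the node list and FILE ends the node list.
args = parser.parse_args(["-n", "10.0.0.5", "-v", "facebook-username.zip"])
print(args.nodes, args.files)

# "--" also ends option parsing, so everything after it is treated as FILE.
args2 = parser.parse_args(["-n", "10.0.0.5", "--", "facebook-username.zip"])
print(args2.nodes, args2.files)
```

This is why the command in the Example section places -v between the node address and the archive path.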

Example

Spin up an instance of Elasticsearch

docker run -d elasticsearch

Get the name of the Docker container

docker ps

Spin up an instance of Kibana and link it to the Elasticsearch container (optional, but you will probably want it)

docker run --link some_elasticsearch:elasticsearch -d kibana

Get the IP address of the Elasticsearch container

docker inspect some_elasticsearch_id
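
docker inspect prints a JSON array with one object per container, and the address appears under NetworkSettings. The sketch below extracts it from that output. The helper name container_ip and the sample values are hypothetical, and the exact layout can vary between Docker versions, so treat it as a starting point.

```python
import json


def container_ip(inspect_output):
    """Return the first IP address found in `docker inspect` JSON output, or None."""
    containers = json.loads(inspect_output)
    if not containers:
        return None
    settings = containers[0].get("NetworkSettings", {})
    # Prefer per-network addresses; fall back to the top-level field.
    for network in (settings.get("Networks") or {}).values():
        if network.get("IPAddress"):
            return network["IPAddress"]
    return settings.get("IPAddress") or None


# Trimmed, made-up sample of what `docker inspect` might print.
sample = '''
[{"NetworkSettings": {"IPAddress": "",
                      "Networks": {"bridge": {"IPAddress": "172.17.0.2"}}}}]
'''
print(container_ip(sample))
```

The real output could be captured with subprocess.check_output(["docker", "inspect", name]) and passed to the same function.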

Run esfbdata on the ZIP archive downloaded from Facebook

esfbdata -n some_elasticsearch_ip -v /path/to/facebook-username.zip

This processes the data and ingests it into Elasticsearch with the default options. If lxml is installed, you will likely want to add --parser lxml to the command arguments. Beware of the --debug option: it generates an extreme amount of output and should only be used for narrowly targeted debugging.

Developers

Developers adding a parser will likely want to derive it from the FacebookIngester class, or reuse the existing classes (FacebookEventsIngester, FacebookMessengerIngester, and FacebookTimelineIngester).

esfbdata uses Python's standard logging module with a logger named esfbdata.
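
Since the logger is named esfbdata, code that imports the package can adjust its output through the standard logging module. A minimal sketch, reusing the default format shown under Options; the handler setup here is an assumption for illustration, not something esfbdata is known to do on import.

```python
import logging

# Format string taken from the --log-format default listed under Options.
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"

logger = logging.getLogger("esfbdata")
logger.setLevel(logging.INFO)  # the level that --verbose selects

handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(logging.Formatter(LOG_FORMAT))
logger.addHandler(handler)

logger.info("logger configured")
```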
