Several analyses use Spark to implement different algorithms: TF-IDF, n-gram counting, logistic regression, graph analysis to find characteristics such as degree of connectivity and graph diameter, and dimensionality reduction for implementing an MLP network.
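To make the first two algorithms concrete, here is a minimal plain-Python sketch of n-gram counting and TF-IDF. It is not the Spark code from this project: the function names `ngrams` and `tf_idf`, the sample documents, and the choice of `tf = count / document length` with `idf = log(N / df)` are all assumptions made for illustration, and a Spark version would distribute the same counting steps across a cluster.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Score each term in each document (hypothetical helper).

    docs is a list of token lists. The result is one {term: score} dict
    per document, using tf = count / len(doc) and idf = log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)

    # Document frequency: count each term once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Illustrative input only, not data from the project.
docs = [
    "spark makes big data simple".split(),
    "kafka moves big data".split(),
]

bigram_counts = Counter(gram for doc in docs for gram in ngrams(doc, 2))
weights = tf_idf(docs)
```

With this unsmoothed weighting, a term that appears in every document (such as `big` above) gets an IDF of zero, so only terms that distinguish one document from the others receive a positive score.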
Used as a tool to manage the flow of data, making scheduling much easier and more flexible.
A simple yet effective strategy for managing large amounts of incoming data: the data is placed on different channels and processed so that it is usable by further applications.
Used as an endpoint for the Kafka channels and as a tool for answering text-based queries. This tool is also used to preprocess text data.
Uses Presto to query Kafka's data, with added code that lets Presto understand the data and query it.