High Performance Innovative Data Lake Management
Proven and tested Data Lake end-to-end management and processing
What are High Performance Computing Clusters?
A High Performance Cluster Computing platform built for high-speed data engineering.
HPCC Systems' key advantage comes from its lightweight core architecture: better performance, near real-time results and full-spectrum operational scale, with no need for a massive development team, unnecessary add-ons or increased processing costs.
Innovative Features
View the many advantages HPCC Systems brings to the maintenance of your Data Lake or Big Data environment.
Harness the benefits of Kubernetes
Learn how using our containerized, cloud native platform can improve your current cloud deployments. Today’s HPCC Systems combines the usability of our legacy bare metal platform with the automation of Kubernetes to make it easy to set up, manage and scale your implementation.
Runs on Kubernetes: support for Azure Kubernetes Service (AKS) and Amazon Elastic Kubernetes Service (EKS).
New Storage Plane Architecture supports: object stores (AWS Simple Storage Service (S3) and Azure Blob Storage) and disk stores (AWS Elastic Block Store and Azure Files/Azure Disks).
Elasticity: scaling a cluster without moving the data; auto wakeup to enable on-demand processing by compute resources.
Security: end-to-end encryption; service mesh options (Linkerd and Istio); OAuth 2.0 support for authentication, with built-in support for Azure AD JWT.
Learn About our Cloud Native Platform
Visit the Cloud Native Wiki page for access to Helm charts, blog content, videos and other instructional information.
Running HPCC Systems on a Local Machine
A containerized deployment on your local machine, using Docker Desktop or Minikube, is an excellent resource for experimenting with, evaluating and training on the HPCC Systems platform.
Containerized Platform Documentation
Documentation for cloud-based deployments of any size, using Terraform, Helm and other tools, as well as for local testing and development deployments.
Ultra Performance
HPCC Systems Overview
HPCC Systems is an open source platform for big data implementations, whether as a data lake or data warehouse, providing users with a clear path from data discovery to production.
End to End Data Lake Management
Data lakes are helping leading organizations solve the problem of extremely large, unstructured datasets, allowing them to increase responsiveness and scalability while reducing costs.
Spark Comparative Analysis
A comparative analysis of Spark and HPCC Systems, covering their architectures, their feature support for data lake capabilities, and their focus on different parts of the big data pipeline.
Code Less — Accomplish More
ECL is a declarative programming language: it allows a programmer to express the logic of a computation without describing its flow control. Developers tell the system what they need and leave it up to the system to determine the best way to do it.
DataSeers Case Study
With the efficiency of ECL, fewer lines of code mean prototypes can be iterated quickly, speeding both time to market and time to revenue.
Try the ECL Playground
Try our Enterprise Control Language (ECL), the data-oriented programming language specially designed for data processing and analytics.
Access Free Training
From free training courses to rich community resources and a comprehensive wiki, we have resources for every stage, from initial installation all the way to power user.
Machine Learning Library and Causality Analytics
The ML Library provides a wide range of algorithms and is designed to utilize the parallel computing capabilities of HPCC Systems. Build and test ML models, then use those models to predict qualitative or quantitative values.
Machine Learning Demystified
A quick but potent intro to Machine Learning for those who are new to the subject. This article provides enough of the basic theory and terminology to make you dangerous.
Machine Learning Workshop
Follow along with our trainers as they demonstrate our DBSCAN, K-Means, Logistic and Linear Regression, Generalized Neural Networks and Learning Trees bundles.
Machine Learning Library
The HPCC Systems Machine Learning Library provides a wide range of Machine Learning algorithms accessible from ECL, and designed to utilize the parallel computing capabilities of HPCC Systems.
Integrate with Ease
HPCC Systems continues to develop new plugins, connectors and stand-alone applications, freely available to the community, that help you integrate many popular third-party tools with the HPCC Systems platform.
Free Add-On Modules
HPCC Systems continues to develop new stand-alone applications and plug-in modules that extend the capabilities of the base HPCC Systems platform.
ECL Bundles
An ECL Bundle is a self-contained set of ECL files, designed to accomplish specific tasks. They are encapsulated for versioning, distribution and download.
Third Party Integrations
Use embedded languages and external datastores with HPCC Systems to connect your system to your data.
Use your favorite language or data source
ECL is very flexible. You can embed a number of different languages within your ECL code, and you can process data from a variety of sources on an HPCC Systems cluster using the plugins and connectors we provide specifically to help you bridge the gap.
Using Your Favorite Language or Data Source
How flexible is ECL? Read about supported languages, plugins and connectors.
Embedded Languages and Data Stores Wiki
The full list of supported languages, plugins and connectors, including links to other information you might find useful.
Advanced Python Embedding
Learn how ECL makes it easy to transition between declarative and procedural worlds through use of embedding.
Committed to Open Source Innovation
HPCC Systems has been freely available to the open source community for more than 10 years and is licensed under Apache 2.0. We continue to push the boundaries of Big Data with a vibrant development community both online and in academic institutions.
GitHub Repository
HPCC Systems is an open source, massively parallel processing computing platform for Big Data processing and analytics.
Stack Overflow Community Forum
Get peer-to-peer support on our Stack Overflow forums. Ask questions specific to your development, or read and answer questions others have posted.
Academic Research
The HPCC Systems Team collaborates with multiple colleges, universities, high schools and institutions of higher learning around the world to help train and develop the future managers of Big Data projects.
Proven, Stable and Secure
HPCC Systems is a mature platform that has been heavily used in commercial applications for more than two decades, predating even the development of Hadoop. Created by LexisNexis Risk Solutions, an innovative pioneer in big data processing, and open source for more than a decade, HPCC Systems has a vibrant development community that continues to push the boundaries of big data.
Securing your environment & protecting your data
This blog highlights some of the many security features that make HPCC Systems a compelling solution for users that require a robust, configurable, highly secure computing platform.
Data Lake Curation and Governance with Tombolo
Tombolo provides the tools required to implement, document and maintain an organizational data infrastructure, and it can implement safeguards that govern which users and applications have access to your data assets.
Conduct curation and governance operations in an automated fashion to consistently and reliably curate huge amounts of inbound new data and ensure continuous availability.
What You Need to Know About Securing Your Platform
Blog discussing some of the basic security considerations to properly secure a Big Data platform from unauthorized access or data theft.
Get Started
Want to do a little more testing before you install a full cluster? If you’re ready to start building your Data Lake, you can jump straight to learning about how to install your first complete HPCC Systems cluster. Interested in learning just how powerful, flexible, and efficient ECL really is? Take a look at our ECL guide.
Localized Machine
Containerized deployments using Docker Desktop or Minikube are easier to start up locally and provide more flexibility and stability.
Documentation & Training
Tackling big data problems? We’ve got you covered, with documentation and training to support you from initial installation all the way to power user.
Get Up and Running
A high-level overview to help new users get started with HPCC Systems and ECL (Enterprise Control Language).
Test Drive
Test our code in a virtual playground using a sample dataset. Or, create your own high performance computing cluster (Thor) and/or query cluster (Roxie).
HPCC Systems: The End-to-End Data Lake Management Solution
Ready. Set. Go.
Are you ready to get started using HPCC Systems? Use the panels below to get a quick overview of the HPCC Systems platform and to learn how you can ingest, clean and deliver your mixed-schema data so that it is useful and relevant for both you and your customers.
Versatile. Flexible. Refined.
An experienced HPCC Systems user explains the benefits and advantages of using HPCC Systems as your big data management solution.
Ingest data from your Data Lake
Here are some example datasets, for use with example programs, provided by members of the HPCC Systems Community.
Get more from your data with the Machine Learning Library
The HPCC Systems Machine Learning Library provides a wide range of Machine Learning algorithms accessible from ECL, and designed to utilize the parallel computing capabilities of HPCC Systems.
Design and automate your data workflows
Tombolo technology is the central console for developers and operators, providing all of the facilities needed for designing, developing, automating, documenting, and governing data lakes.
A legacy of Innovation and Open Source software for more than 10 years
HPCC Systems has been freely available to the open source community for more than 10 years and is licensed under Apache 2.0. We continue to push the boundaries of big data with a vibrant development community both online and in academic institutions.
Have a Question?
Check out our FAQ page. Browse the topics to discover more about HPCC Systems technology and answers to common questions about HPCC Systems, ECL and more.
Stay Informed
Keep up with the latest HPCC Systems developer news and community information by signing up for our newsletter.
Get the latest news covering platform updates, technical blogs, events and other related announcements. Just enter your email address to subscribe. We will not send you junk mail or sell your email address, only the latest information to keep you up to date.