operon

Modern C++ framework for Symbolic Regression

Operon is a modern C++ framework for symbolic regression that uses genetic programming to explore a hypothesis space of possible mathematical expressions in order to find the best-fitting model for a given regression target. Its main purpose is to help develop accurate and interpretable white-box models in the area of system identification. More in-depth documentation is available at https://operongp.readthedocs.io/.

How does it work?

Broadly speaking, genetic programming (GP) evolves a population of "computer programs" (AST-like structures encoding behavior for a given problem domain) following the principles of natural selection. It repeatedly recombines random program parts and keeps only the best results, the "fittest". Here, the biological concept of fitness is defined as a measure of a program's ability to solve a certain task.

In symbolic regression, the programs represent mathematical expressions, typically encoded as expression trees. Fitness is usually defined as the goodness of fit between the dependent variable and the prediction of a tree-encoded model. Iterative selection of the best-scoring models followed by random recombination leads naturally to a self-improving process that is able to uncover patterns in the data.
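
The loop described above (score each tree by goodness of fit, select the better ones, recombine them at random) can be illustrated with a minimal, self-contained sketch. This is not Operon's implementation or API: every name here (Node, Evaluate, Mse, Crossover, Evolve) is hypothetical, and the single input variable, the three operators, the binary tournament, the elitism, and the 50-node size cap are simplifying assumptions made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <memory>
#include <random>
#include <utility>
#include <vector>

// Hypothetical expression tree node (an assumption of this sketch, not Operon's encoding).
struct Node {
    char op = 'c';                  // 'c' constant, 'x' variable, or '+', '-', '*'
    double value = 0.0;             // used only when op == 'c'
    std::unique_ptr<Node> lhs, rhs; // children, set only for binary operators
};
using Tree = std::unique_ptr<Node>;
struct Individual { Tree tree; double error; };

Tree Clone(const Node& n) {
    auto c = std::make_unique<Node>();
    c->op = n.op;
    c->value = n.value;
    if (n.lhs) c->lhs = Clone(*n.lhs);
    if (n.rhs) c->rhs = Clone(*n.rhs);
    return c;
}

std::size_t Size(const Node& n) {
    return 1 + (n.lhs ? Size(*n.lhs) : 0) + (n.rhs ? Size(*n.rhs) : 0);
}

double Evaluate(const Node& n, double x) {
    switch (n.op) {
        case 'c': return n.value;
        case 'x': return x;
        case '+': return Evaluate(*n.lhs, x) + Evaluate(*n.rhs, x);
        case '-': return Evaluate(*n.lhs, x) - Evaluate(*n.rhs, x);
        default:  return Evaluate(*n.lhs, x) * Evaluate(*n.rhs, x); // '*'
    }
}

// Fitness as goodness of fit: mean squared error against the target (lower is better).
double Mse(const Node& model, const std::vector<double>& xs, const std::vector<double>& ys) {
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        double e = Evaluate(model, xs[i]) - ys[i];
        sum += e * e;
    }
    double mse = sum / static_cast<double>(xs.size());
    return std::isfinite(mse) ? mse : std::numeric_limits<double>::max();
}

Tree RandomTree(std::mt19937& rng, int depth) {
    auto n = std::make_unique<Node>();
    auto kind = rng() % (depth > 0 ? 5u : 2u);  // only leaves once depth is exhausted
    n->op = "cx+-*"[kind];
    if (n->op == 'c') n->value = static_cast<int>(rng() % 5) - 2;  // integer in [-2, 2]
    if (kind >= 2) { n->lhs = RandomTree(rng, depth - 1); n->rhs = RandomTree(rng, depth - 1); }
    return n;
}

// Gathers the owning pointer of every node so that any subtree can be swapped out.
void CollectSlots(Tree& t, std::vector<Tree*>& out) {
    out.push_back(&t);
    if (t->lhs) CollectSlots(t->lhs, out);
    if (t->rhs) CollectSlots(t->rhs, out);
}

// Random recombination: replace a random subtree of a copy of `a` with a random subtree of `b`.
Tree Crossover(const Node& a, const Node& b, std::mt19937& rng) {
    Tree child = Clone(a), donor = Clone(b);
    std::vector<Tree*> cs, ds;
    CollectSlots(child, cs);
    CollectSlots(donor, ds);
    *cs[rng() % cs.size()] = std::move(*ds[rng() % ds.size()]);
    return child;
}

// Runs the evolutionary loop and returns a copy of the best tree found.
Tree Evolve(const std::vector<double>& xs, const std::vector<double>& ys,
            unsigned seed, std::size_t popSize, int generations) {
    assert(popSize > 0);
    std::mt19937 rng(seed);
    std::vector<Individual> pop;
    for (std::size_t i = 0; i < popSize; ++i) {
        Tree t = RandomTree(rng, 3);
        double e = Mse(*t, xs, ys);
        pop.push_back({std::move(t), e});
    }
    auto better = [](const Individual& a, const Individual& b) { return a.error < b.error; };
    auto select = [&]() -> const Individual& {  // binary tournament selection
        const Individual& a = pop[rng() % pop.size()];
        const Individual& b = pop[rng() % pop.size()];
        return better(b, a) ? b : a;
    };
    for (int g = 0; g < generations; ++g) {
        std::vector<Individual> next;
        const Individual& elite = *std::min_element(pop.begin(), pop.end(), better);
        next.push_back({Clone(*elite.tree), elite.error});  // elitism: never lose the best
        while (next.size() < pop.size()) {
            const Node& a = *select().tree;
            const Node& b = *select().tree;
            Tree child = Crossover(a, b, rng);
            if (Size(*child) > 50) child = Clone(a);  // crude size cap against bloat
            double e = Mse(*child, xs, ys);
            next.push_back({std::move(child), e});
        }
        pop = std::move(next);
    }
    return Clone(*std::min_element(pop.begin(), pop.end(), better)->tree);
}
```

To try it, sample a target such as y = x*x + x at a few points in [-1, 1], call Evolve(xs, ys, seed, popSize, generations), and score the returned tree with Mse. Because the best individual is copied unchanged into each new generation, the best error cannot get worse as generations are added; how close it gets to zero depends on the seed and the settings.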

Build instructions

The project requires CMake and a C++17-compliant compiler. The recommended way to build Operon is via either nix or vcpkg.

Required dependencies

Optional dependencies

Build options

The following options can be passed to CMake:

| Option | Description |
|---|---|
| -DCERES_TINY_SOLVER=ON | Use the very small and self-contained tiny solver from the Ceres suite for solving non-linear least squares problems. |
| -DUSE_SINGLE_PRECISION=ON | Perform model evaluation using floats (single precision) instead of doubles. This reduces runtime but might not be appropriate for all purposes. |
| -DUSE_OPENLIBM=ON | Link against Julia's openlibm, a high-performance mathematical library (recommended to improve consistency across compilers and operating systems). |
| -DBUILD_TESTS=ON | Build the unit tests. |
| -DBUILD_PYBIND=ON | Build the Python bindings. |
| -DUSE_JEMALLOC=ON | Link against jemalloc, a general-purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support (mutually exclusive with tcmalloc). |
| -DUSE_TCMALLOC=ON | Link against tcmalloc (thread-caching malloc), a malloc(3) implementation that reduces lock contention for multi-threaded programs (mutually exclusive with jemalloc). |
| -DUSE_MIMALLOC=ON | Link against mimalloc, a compact general-purpose malloc(3) implementation with excellent performance (mutually exclusive with jemalloc or tcmalloc). |

Publications

If you find Operon useful you can cite our work as:

@inproceedings{10.1145/3377929.3398099,
    author = {Burlacu, Bogdan and Kronberger, Gabriel and Kommenda, Michael},
    title = {Operon C++: An Efficient Genetic Programming Framework for Symbolic Regression},
    year = {2020},
    isbn = {9781450371278},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3377929.3398099},
    doi = {10.1145/3377929.3398099},
    booktitle = {Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion},
    pages = {1562--1570},
    numpages = {9},
    keywords = {symbolic regression, genetic programming, C++},
    location = {Canc\'{u}n, Mexico},
    series = {GECCO '20}
}

Operon was also featured in a recent survey of symbolic regression methods, where it showed good results:

@misc{lacava2021contemporary,
    title = {Contemporary Symbolic Regression Methods and their Relative Performance},
    author = {William La Cava and Patryk Orzechowski and Bogdan Burlacu and Fabrício Olivetti de França and Marco Virgolin and Ying Jin and Michael Kommenda and Jason H. Moore},
    year = {2021},
    eprint = {2107.14351},
    archivePrefix = {arXiv},
    primaryClass = {cs.NE}
}