Description
The default BLAS Julia uses is OpenBLAS. Apple's M1 has proprietary dedicated matrix hardware that is only accessible via Apple's Accelerate BLAS implementation. That proprietary interface can provide 2x to 4x speedups for some linear algebra use cases (see https://discourse.julialang.org/t/does-mac-m1-in-multithreads-is-slower-that-in-single-thread/61114/12?u=kristoffer.carlsson for some benchmarks and discussion).
Since Julia 1.7 there's a BLAS multiplexer:
- https://github.com/staticfloat/libblastrampoline
(As far as I understand it, this currently still requires proprietary code for each BLAS, and for M1 there is so far only a minimal shim via AppleAccelerateLinAlgWrapper.jl.)
So in theory, it should be possible to extend this so that depending on a given platform either OpenBLAS or other BLAS solutions are used transparently by default.
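
To make the multiplexer idea concrete, here is a minimal sketch of how the active BLAS backends can be inspected and how a platform-dependent forward might look. `BLAS.get_config` and `BLAS.lbt_forward` are part of the `LinearAlgebra` standard library since Julia 1.7. The helper `try_forward_to_accelerate` and the constant `ACCELERATE_PATH` are hypothetical names introduced for illustration, and the framework path is an assumption about a standard macOS install. Whether Julia's default 64-bit-integer BLAS calls are actually served by Accelerate depends on the interface the library exposes, which this sketch does not check.

```julia
using LinearAlgebra

# List the libraries libblastrampoline currently forwards BLAS/LAPACK calls to.
config = BLAS.get_config()
for lib in config.loaded_libs
    println(lib.libname, " (", lib.interface, ")")
end

# Assumed location of the Accelerate framework on a standard macOS install.
const ACCELERATE_PATH =
    "/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate"

# Hypothetical helper (not part of Julia): ask libblastrampoline to forward
# to Accelerate on macOS, and do nothing on every other platform.
function try_forward_to_accelerate()
    Sys.isapple() || return false
    BLAS.lbt_forward(ACCELERATE_PATH)  # throws if the library cannot be loaded
    return true
end
```

The helper is only defined here, not called. Calling `try_forward_to_accelerate()` on a Mac and then checking `BLAS.get_config()` again would show whether the forward took effect.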
This issue discusses what needs to be done to make M1 hardware acceleration, via Apple's Accelerate, available by default in Julia.