Description
Huge fan of machinethink.net. Sorry to ask this here, but I wasn't sure of the best way to reach you. You recently mentioned that you noticed Core ML inference falls back to the GPU if there's a custom layer in the model; how did you know the model was deployed to the Neural Engine in the first place?
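For context, the only knob I'm aware of is `MLModelConfiguration.computeUnits`, which restricts where Core ML is *allowed* to run a model but never reports where a prediction actually executed. A minimal sketch of what I mean (the model path is a placeholder, and comparing latency under `.all` vs. `.cpuAndGPU` is just one heuristic for guessing whether the Neural Engine kicked in):

```swift
import CoreML

// Placeholder path; substitute your own compiled .mlmodelc.
let modelURL = URL(fileURLWithPath: "/path/to/Model.mlmodelc")

let config = MLModelConfiguration()
// You can restrict where Core ML may run the model...
config.computeUnits = .all        // CPU, GPU, and Neural Engine
// config.computeUnits = .cpuAndGPU  // rules out the Neural Engine
// config.computeUnits = .cpuOnly    // baseline

let model = try MLModel(contentsOf: modelURL, configuration: config)
// ...but nothing here tells you which device a prediction actually
// ran on; timing the same input under .all and .cpuAndGPU is one
// rough way to infer whether the ANE was used.
```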
Also, do you know whether Metal Performance Shaders can access the Neural Engine? I ask because the MPS Graph API, which now supports both training and inference in a very TensorFlow-like fashion, appears to be the most low-level, robust abstraction Apple offers for custom ML, and it's quite bizarre that it seems so strictly bound to the GPU. That makes sense insofar as it lives inside Metal, but you'd really think Apple would offer a computational graph API that sits above BNNS (for the CPU), MPS (for the GPU), and whatever specific instructions the Neural Engine supports, letting you express a model, train it, and run it across all three architectures depending on power and speed targets.

Core ML now appears to do exactly this for inference (which is why I'm wondering how you knew the model was specifically running on the Neural Engine), but I find it bizarre that if you actually decide to express and train a complex model in MPS, which is the only way to do it at the moment (and what Create ML is built on), you're then stuck, in a sense, on the GPU. I feel like I'm missing something here.
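To illustrate what I mean by "strictly bound to the GPU", here's a minimal MPSGraph sketch (just an element-wise add): every tensor and every run is tied to a concrete `MTLDevice`, and there's no equivalent of Core ML's `computeUnits` anywhere in the API.

```swift
import Metal
import MetalPerformanceShadersGraph

let device = MTLCreateSystemDefaultDevice()!   // a concrete GPU
let graph = MPSGraph()

// Build the computation graph: z = x + y.
let x = graph.placeholder(shape: [4], dataType: .float32, name: "x")
let y = graph.placeholder(shape: [4], dataType: .float32, name: "y")
let z = graph.addition(x, y, name: "z")

// Input data must be wrapped for a specific Metal device.
let mpsDevice = MPSGraphDevice(mtlDevice: device)
var xValues: [Float] = [1, 2, 3, 4]
var yValues: [Float] = [10, 20, 30, 40]
let byteCount = 4 * MemoryLayout<Float>.size
let xData = MPSGraphTensorData(device: mpsDevice,
                               data: Data(bytes: &xValues, count: byteCount),
                               shape: [4], dataType: .float32)
let yData = MPSGraphTensorData(device: mpsDevice,
                               data: Data(bytes: &yValues, count: byteCount),
                               shape: [4], dataType: .float32)

// Execution: no option to route this to the Neural Engine.
let results = graph.run(feeds: [x: xData, y: yData],
                        targetTensors: [z],
                        targetOperations: nil)

// Read the result back from the GPU.
var zValues = [Float](repeating: 0, count: 4)
results[z]?.mpsndarray().readBytes(&zValues, strideBytes: nil)
print(zValues)   // [11.0, 22.0, 33.0, 44.0]
```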