-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generation: use the new top-k implementation in Accelerate #7
Comments
Do you have a sense of what range of values for |
hi, I've created a little benchmark repo to compare different implementations https://github.com/jkrukowski/topk Look like |
@jkrukowski Very cool, thanks for that! 🙌 Actually, I was confused when I read the release notes and the implementation I linked in this issue is a nearest-neighbor selection, not From your analysis, it looks like the current implementation is faster for small numbers, but degrades for larger values of For typical values of |
Adjusted https://github.com/jkrukowski/topk please take a look. I've added additional XCTest performance benchmark. Google benchmark is not very conclusive but according to XCTest |
@pcuenca given the above do you think that implementation present in https://github.com/jkrukowski/topk/blob/f2c164e5c360a6a055b785db8830b78c38936e12/Sources/TopK/Accelerate.swift#L7 would be ok? If so I could create a PR |
@jkrukowski Sounds great! I ran it on my system (M1 Max) and observed similar results, but wondered why the Google Benchmark shows results so different. I moved the array creation inside the measurement loop to invalidate any potential caching that might be in place. The differences are smaller (as they are dominated by the overhead of array creation), but the Accelerate implementation is still faster, so it looks safe to go ahead! I'd just recommend to add a few top-k tests to the repo, if possible, to ensure nothing changes before and after – and because it'd be awesome to have them! 😇. Thanks a lot for working on this, it's great to bring performance closer to the greedy case! |
Reference: https://developer.apple.com/documentation/accelerate/bnns#4164142
We should use it when permitted by the underlying OS, and compare its performance with the current implementation: d2917b2
The text was updated successfully, but these errors were encountered: