Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Optimize Cython code. Use scipy blas function pointers. #1524

Merged
merged 4 commits into from
Apr 6, 2014

Conversation

jseabold
Copy link
Member

Didn't have to change any of the build setup which is nice. Tested building on windows for Python 2.7 32- and 64-bit without problems.

The old mailing list example from 9/13 is now 10-30x faster.

Since the Z matrix in the univariate ARMA case is always [[1, 0, ..., 0]]. We now avoid writing general KF code and just use slicing to select what we want.

There's probably a bit more performance that can be squeezed out here. We can us PyArray_DATA rather than memoryviews for the arrays that don't need to be sliced often. There are also still a few cases where we use dgemm when we don't need to. E.g., we're taking the outer product of two r x 1 vectors, but I left these as dgemm. The initial variance estimation is also still pure Python because I didn't want to rewrite it. This might gain us the most, but it's only called once per hit. There might also be some more ways to avoid the temporary arrays, but my thinking in BLAS linear algebra needs some work.

Need to add a note to the release about this.

@josef-pkt
Copy link
Member

just to understand: this is a rewrite of #1069 where you replaced the lapack/blas part by using the scipy pointers ?

Do we have anyone to test on Apple? (given Pauli's comments about "funny" linear algebra libraries)

@jseabold
Copy link
Member Author

Yes, it's a rewrite. We don't use any of the problematic headers from Apple. They're mostly only for single precision and we use double everywhere. I made a note about it at https://github.com/ChadFulton/pykalman_filter/issues/1

@josef-pkt
Copy link
Member

sounds good, one worry less.

However it would still be good if an Apple user could test this before merge.

@jseabold
Copy link
Member Author

We don't use sdot or any of the other problematic functions that Pauli pointed out. AFAIK, it's a non-issue if you're using scipy >= 0.13.x too or a non-accelerate BLAS. If we want to support older scipy and use these problematic functions, then we'll need to do something or tell people not to use the borked Accelerate BLAS.

@jseabold
Copy link
Member Author

We need to bump the Cython version on Travis. Cf. #1523.

@bashtage
Copy link
Member

Ubuntu 14.04 LTS will have SciPy 0.13.1, which makes it a little easier to choose a newer version.

@jseabold
Copy link
Member Author

jseabold commented Apr 3, 2014

The current issue is that CObject does not exist in Python >= 3.2. They switched to PyCapsule. PyCapsule exists in Python >= 2.7, but as long as we target 2.6, then we can't use the easy fix.

We need some kind of conditional header file. It looks like NumPy provides this in core/include/numpy/npy_3kcompat.h, but I can't get it to build on Python 3 for some reason. I have a message out to the cython-user list. If I don't get any help there, then I'll have another look at this when I have a few hours.

@ev-br
Copy link
Contributor

ev-br commented Apr 3, 2014

(Probably irrelevant) might want to use dsymm instead of dgemm here and there

@jseabold
Copy link
Member Author

jseabold commented Apr 3, 2014

Yes, P is a symmetric matrix but it not yet handled as such in the code.

@jseabold
Copy link
Member Author

jseabold commented Apr 3, 2014

Would save some operations. I think I'll have one more round of optimization / profiling before merging this, provided we get the build crap pinned down. We may not need to use memoryviews here all that much either. We might be better off without buffer acquisition here and using the numpy C api more.

@ev-br
Copy link
Contributor

ev-br commented Apr 3, 2014

Ignorance here: what is the reason for not using scipy blas wrappers as is? xSYMM wrappers are available from 0.13 onwards.

@jseabold
Copy link
Member Author

jseabold commented Apr 3, 2014

Well, we are using the scipy blas wrappers. We're using the C function pointer to the fblas function directly from Cython. We don't want to involve any Python function calls in the C code.

Our dependency for scipy right now is 0.11.0 or 0.12.0. I don't remember offhand.

@jseabold
Copy link
Member Author

jseabold commented Apr 5, 2014

Ah, I think I got this. Was trying to be too cute. Just wrote a header with the proper py3k macros.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling afba01a on jseabold:arma-speedup-scipy into bed3499 on statsmodels:master.

@jseabold
Copy link
Member Author

jseabold commented Apr 5, 2014

All green. Debating whether to try to squeeze a little more performance out of this.

@jseabold
Copy link
Member Author

jseabold commented Apr 6, 2014

I'm leaning towards merging this and starting a new issue for further micro-optimizations. This is already a big speed-up even if we can get more. No reason to wait until I have some time to do further profiling.

jseabold added a commit that referenced this pull request Apr 6, 2014
ENH: Optimize Cython code. Use scipy blas function pointers.
@jseabold jseabold merged commit a738b4f into statsmodels:master Apr 6, 2014
@jseabold jseabold deleted the arma-speedup-scipy branch April 6, 2014 15:53
@josef-pkt josef-pkt added the PR label Apr 14, 2014
@piskvorky
Copy link
Contributor

I'll be porting these scipy-blas shenanigans soon too, for dual py2/3 compatibility in gensim.

How has the "capsule" thing panned out since being merged, Skipper? Any issues?

@jseabold
Copy link
Member Author

Works fine AFAICT. No complaints on Travis. See Pauli's reply here [1] for potential issues on Apple's Accelerate with some functions. You can find all the known issues here [2]. There's also some workaround code in Theano that tries to detect problems and conditionally write the headers [3]. I'm fairly certain that's the same sdot issue, but I don't know theano all that well.

[1] http://scipy-user.10969.n7.nabble.com/SciPy-User-scipy-linalg-blas-before-0-12-0-td19274.html
[2] https://github.com/scipy/scipy/tree/master/scipy/_build_utils/src
[3] https://github.com/Theano/Theano/blob/master/theano/tensor/blas_headers.py

@piskvorky
Copy link
Contributor

Yeah I noticed, I reported these Apple problems some month ago: https://groups.google.com/d/msg/cython-users/V_DR1xi5Ang/48OamQSX7TwJ and numpy/numpy#4007

General response was "not our problem", so my solution in gensim was to detect whether the sdot signature returns "float" or "double" at runtime... a nasty hack, but seems to get the work done :)

Will check out the theano code too, thanks for the links and heads up, Skipper.

@jseabold
Copy link
Member Author

Oh right. I remember reading that thread now. Fun stuff and small world.

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014
ENH: Optimize Cython code. Use scipy blas function pointers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants