Profiling performance #86
Comments
Systematic profiling would be great. MD and MC have different hotspots, so I'd recommend profiling the two cases separately.
This might also be interesting (cachegrind).
Adding this one to the list.
FWIW, the comments in the reddit post concerning this article were very much in favor of Edit: Added link to said post on reddit.
I didn't catch that, thanks!
Adding this to the list of good reads.
I started doing some profiling using I also have some weird stuff, for example in the bench |
No AFAIK, there should be no How do I read the graph? When I click a bar with a function, I see above the contributions to the function, i.e. which functions are called? |
Yes, this is the CPU graph. I didn't check the memory; do you know any tool to do that?
This is exactly it. This is explained in more detail here. |
Yes, that makes no sense. Maybe there is an issue with the flamegraph generation?
Do you have numbers to share? jemalloc (the default Rust allocator) is known to have a bigger memory footprint but to be faster than the standard malloc. Also, if the footprint is not too big, I tend not to worry: most classical simulations are CPU bound (this is not the case for quantum calculations), so I happily trade any increase in memory for faster code. We still need to be careful about memory usage to fit as much data as possible in the cache though.
I've heard of massif, which is a heap profiler that comes with valgrind.
@Luthaf in #114 you mentioned that |
I'll check the direct result from |
I remember having a hard time finding expressions for the Ewald forces. I'll check if I can find my sources again (remind me tomorrow if I have not answered here). I've read in a lot of places that one can tune the alpha parameter for Ewald to get O(n^(3/2)) behaviour (k is not really relevant here, as it will never be bigger than a few tens, and is less than 10 in most cases). I'll try to find something about it too!
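For reference, the usual argument behind the O(n^(3/2)) estimate can be sketched as below. It assumes a uniform particle density and a real-space sum truncated at a cutoff proportional to 1/alpha; the constants A and B are placeholders for implementation-dependent prefactors and are not taken from any code in this thread.

```latex
% Assumptions: uniform density \rho = N/V, real-space cutoff r_c \propto 1/\alpha,
% reciprocal-space cutoff k_c \propto \alpha, fixed target accuracy.
% A and B are hypothetical, implementation-dependent constants.
\begin{align}
  T_\text{real}(\alpha)  &\approx A \, N \, \rho \, r_c^3 \;\propto\; A \, \frac{N^2}{\alpha^3 V}, \\
  T_\text{recip}(\alpha) &\approx B \, N \, V \, k_c^3 \;\propto\; B \, N \, \alpha^3 V, \\
  \frac{\mathrm{d}}{\mathrm{d}\alpha}\left(T_\text{real} + T_\text{recip}\right) = 0
    &\;\Rightarrow\; \alpha^6 \propto \frac{A}{B} \, \frac{N}{V^2}, \\
  T_\text{real} + T_\text{recip} &\propto \sqrt{A B} \; N^{3/2}.
\end{align}
```

If the real-space sum runs over all pairs instead of being truncated, the first term stays proportional to N^2 whatever the value of alpha, and the N^(3/2) estimate no longer applies.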
Sure, was just wondering if you also looked at memory.
I currently only have a German source. Fourier space (k-space) is equation 4.5, real space part is 4.4. Eq. 4.6 is the dipole correction. But there are plenty of ways to write the equations, so you will find different formulations depending on where you look. Edit: Here you go! (complete derivation)
"Understanding Molecular Simulation" mentions it in section 12.1.5, but as I understand it this holds under the hypothesis of a uniform distribution of the particles (is that OK?) and when taking advantage of the cutoff in the real-space computation, which we don't really do.
Well 5^3 is already more than 100, which is approximately the ratio of performance between forces and energy in your benchmark.
I'll look at it, thanks! If you find an English version in the meantime I'll take it, my German isn't that good ^^
I edited my comment with a source to the complete derivation in English. Can you open the link? Not sure if one needs special access (I'm on the university network at the moment).
Nope, I get a "403 Forbidden".
You have mail 😄
You beat me on this!
From the looks of it, we'll never get to the O(n^(3/2)) dream for the forces. I even think that if we use the optimized alpha that gives the best complexity for the energy computation, we may end up with an O(n^(5/2)) for the forces, which is not cool. Is the Ewald summation usually used for MD, or is it more of an MC thing?
It is the first and crudest technique for Coulombic interactions, and it is still used in MD (LAMMPS offers both Ewald and PPPM summations for Coulombic interactions). Other techniques for computing Coulombic interactions exist too: Particle-Mesh Ewald (PME), Smooth Particle-Mesh Ewald (SPME) and Particle-Particle Particle-Mesh summation (PPPM) are the most common. There is a current trend toward the Wolf method, but it is only good for homogeneous systems. I think we can still spend some time optimizing Ewald (even if it only improves the constant in front of the n^(5/2)), because our implementation is correct (at least for the NIST tests) but very slow compared to other MD codes.
Do the other codes use Verlet lists and such? Do we plan to use some?
Yes and yes. But the speed difference is already perceptible for small systems (the same size as the cutoff radius), where from my understanding the Verlet lists do not give such a big improvement (every particle in the system is your neighbour!)
From my experience, cell/Verlet lists pay off once systems are beyond a certain size. As Luthaf said, if the box is too small, we will gain nothing, or even lose performance due to overhead. I wrote this in the closed issue #109: I think there is a problem with the forces at the moment. At least the resulting pressure is weird (running NPT MC at 1 bar gives 8e4 bar as internal pressure). As soon as I know how to track down what's going wrong there, I'll open an issue.
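To make the Verlet list discussion above concrete, here is a minimal sketch of such a list for a cubic periodic box. It is not the project's implementation: the function names, the `skin` parameter and the O(n^2) build are assumptions chosen for illustration. It also shows why small boxes gain nothing: when the box is about the size of the cutoff, every pair ends up in the list.

```rust
/// Squared minimum-image distance between `a` and `b` in a cubic
/// periodic box of side `l` (hypothetical helper for this sketch).
fn min_image_dist2(a: [f64; 3], b: [f64; 3], l: f64) -> f64 {
    let mut d2 = 0.0;
    for k in 0..3 {
        let mut d = a[k] - b[k];
        // Wrap the separation back into [-l/2, l/2]
        d -= l * (d / l).round();
        d2 += d * d;
    }
    d2
}

/// Build a half Verlet list: for each particle `i`, store the indices
/// `j > i` closer than `cutoff + skin`. The build itself is O(n^2);
/// a cell list would be needed to make it O(n).
fn build_verlet_list(
    positions: &[[f64; 3]],
    l: f64,
    cutoff: f64,
    skin: f64,
) -> Vec<Vec<usize>> {
    let r2 = (cutoff + skin).powi(2);
    let n = positions.len();
    let mut list = vec![Vec::new(); n];
    for i in 0..n {
        for j in (i + 1)..n {
            if min_image_dist2(positions[i], positions[j], l) < r2 {
                list[i].push(j);
            }
        }
    }
    list
}

/// The list stays valid as long as no particle has moved more than
/// `skin / 2` since the positions used to build it.
fn needs_rebuild(old: &[[f64; 3]], new: &[[f64; 3]], l: f64, skin: f64) -> bool {
    let limit2 = (0.5 * skin).powi(2);
    old.iter()
        .zip(new)
        .any(|(a, b)| min_image_dist2(*a, *b, l) > limit2)
}
```

Between rebuilds, the energy or force loop only visits the pairs stored in the list, so the saving depends on how many pairs fall outside `cutoff + skin`.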
Closing this issue since there is not much to be done. We have #12 for Verlet/cell lists.
Kind of related to #62 but not quite: I intend to do some serious profiling to see where the bottlenecks are, and I'm opening this issue to keep you updated. I think I'll do it on some examples and some of the benchmarks.
I'll be starting from this, this and this. If you have any suggestions on what code would be interesting to profile or how to do this (this is the first time I'm profiling Rust code), don't hesitate. @Luthaf I think you told me you already did some kind of profiling, anything I can reuse?