-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
split AllEquations into pieces for more parallelism #472
Conversation
This might break some of the scripts that load equation data from this file. Ideally they wouldn't be doing that in the first place and should read from a separate python-friendly file. If things don't build for this reason I can try to to fix things. |
Can you point me to the code that does this? It should not be hard to point them at the new files instead. (EDIT: they should all be fixed now, but I'm not sure I know how to run all these scripts correctly so please double check that they are working as intended.) |
There are several I'm aware of:
equational_theories/scripts/find_dual.py Line 169 in c27789e
|
@digama0 : A lot of the automated diagram generating python and ruby scripts assume the existence of AllEquations.lean because this structure was agreed on near the beginning of the project. They read this file as a textfile. This isn't super ideal and maybe we should have written an export script in lean that exports the equations of "AllEquations.lean" into json. But fixing this now will be non-trivial. |
(I'd be happy to help rewrite the scripts to load from an appropriate json data file as appropriate.) |
The python script that generated the equations appears deterministic, so in principle, the same script can be modified to output the json file. However the script writers must then use this json file. |
Actually, regarding speeding up equations generation in #478 , I think lean would also benefit from reading input via JSON instead of having to parse it out of a lean file. I think a large factor in the remaining performance is just the top level command loop, running linters and other things on each command in addition to running it. The drawback is that it would most likely end up less readable... |
There are three main improvements in this implementation: * The command builds terms directly instead of constructing syntax and re-elaborating it through the regular frontend * The proof of the `models_iff` theorem is streamlined and also generalized to arbitrary arity instead of proving every arity separately * The slow persistent env extension is replaced with a TagExtension, on the assumption that we only need to ask for the equations in bulk and do not need to maintain a data structure for fast query. The result is that it now only takes 30 seconds instead of 2.6 minutes to parse the 4600 equations. Including the effect of #472 it only takes 6 seconds.
I can update the ruby code when I'm back at the computer later today. |
Note: the current state is that all scripts have been fixed, although I don't know how to use them all so they are only lightly tested. |
Ah, apologies for the confusion. |
I will merge this. Fixes to the scripts if required can be done from new PRs. This will save CI time |
While more can be done to improve the performance of the
equation
command, one thing that helps a lot is to make an embarrassingly-parallel task more parallel in practice by splitting it into separate files. TheEquation
files have also been moved into a common folder.