Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split AllEquations into pieces for more parallelism #472

Merged
merged 12 commits into from
Oct 10, 2024

Conversation

digama0
Copy link
Contributor

@digama0 digama0 commented Oct 9, 2024

While more can be done to improve the performance of the equation command, one thing that helps a lot is to make an embarrassingly-parallel task more parallel in practice by splitting it into separate files. The Equation files have also been moved into a common folder.

@carlini
Copy link
Contributor

carlini commented Oct 9, 2024

This might break some of the scripts that load equation data from this file. Ideally they wouldn't be doing that in the first place and should read from a separate python-friendly file. If things don't build for this reason I can try to to fix things.

@digama0
Copy link
Contributor Author

digama0 commented Oct 9, 2024

Can you point me to the code that does this? It should not be hard to point them at the new files instead. (EDIT: they should all be fixed now, but I'm not sure I know how to run all these scripts correctly so please double check that they are working as intended.)

@carlini
Copy link
Contributor

carlini commented Oct 9, 2024

There are several I'm aware of:

File.read(File.join(__dir__, '../equational_theories/AllEquations.lean')).split("\n").each { |s|

EQUATIONS_FILENAME = "../equational_theories/AllEquations.lean"

for line in open("../equational_theories/AllEquations.lean"):

for line in open("equational_theories/AllEquations.lean"):

@digama0 digama0 mentioned this pull request Oct 9, 2024
@Shreyas4991
Copy link
Collaborator

@digama0 : A lot of the automated diagram generating python and ruby scripts assume the existence of AllEquations.lean because this structure was agreed on near the beginning of the project. They read this file as a textfile. This isn't super ideal and maybe we should have written an export script in lean that exports the equations of "AllEquations.lean" into json. But fixing this now will be non-trivial.

@teorth
Copy link
Owner

teorth commented Oct 9, 2024

This is what #142 is supposed to do. Perhaps #427 can be adapted to create such a JSON, which all other tools then draw from? Then if we ever refactor the equation lean files again then one just has to change the one export tool.

@carlini
Copy link
Contributor

carlini commented Oct 10, 2024

(I'd be happy to help rewrite the scripts to load from an appropriate json data file as appropriate.)

@Shreyas4991
Copy link
Collaborator

Shreyas4991 commented Oct 10, 2024

The python script that generated the equations appears deterministic, so in principle, the same script can be modified to output the json file. However the script writers must then use this json file.

@digama0
Copy link
Contributor Author

digama0 commented Oct 10, 2024

This is what #142 is supposed to do. Perhaps #427 can be adapted to create such a JSON, which all other tools then draw from? Then if we ever refactor the equation lean files again then one just has to change the one export tool.

Actually, regarding speeding up equations generation in #478 , I think lean would also benefit from reading input via JSON instead of having to parse it out of a lean file. I think a large factor in the remaining performance is just the top level command loop, running linters and other things on each command in addition to running it.

The drawback is that it would most likely end up less readable...

Shreyas4991 pushed a commit that referenced this pull request Oct 10, 2024
There are three main improvements in this implementation:
* The command builds terms directly instead of constructing syntax and
re-elaborating it through the regular frontend
* The proof of the `models_iff` theorem is streamlined and also
generalized to arbitrary arity instead of proving every arity separately
* The slow persistent env extension is replaced with a TagExtension, on
the assumption that we only need to ask for the equations in bulk and do
not need to maintain a data structure for fast query.

The result is that it now only takes 30 seconds instead of 2.6 minutes
to parse the 4600 equations. Including the effect of #472 it only takes
6 seconds.
@vlad902
Copy link
Contributor

vlad902 commented Oct 10, 2024

I can update the ruby code when I'm back at the computer later today.

@digama0
Copy link
Contributor Author

digama0 commented Oct 10, 2024

Note: the current state is that all scripts have been fixed, although I don't know how to use them all so they are only lightly tested.

@vlad902
Copy link
Contributor

vlad902 commented Oct 10, 2024

Note: the current state is that all scripts have been fixed, although I don't know how to use them all so they are only lightly tested.

Ah, apologies for the confusion.

@Shreyas4991
Copy link
Collaborator

Shreyas4991 commented Oct 10, 2024

I will merge this. Fixes to the scripts if required can be done from new PRs. This will save CI time

@Shreyas4991 Shreyas4991 merged commit abf2c24 into teorth:main Oct 10, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants