Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

travis: Remove valgrind #18899

Merged
merged 1 commit into from
May 7, 2020
Merged

travis: Remove valgrind #18899

merged 1 commit into from
May 7, 2020

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented May 6, 2020

When the valgrind run was added, it took 2 hours. Travis kindly raised the timeout limit to the maximum possible of 3 hours.

Today, a full build of Bitcoin Core with all tests takes more than three hours. Thus, it is impossible to run all tests on travis.

Moreover, the feedback loop for developers that create a pull request takes at least 2 hours, but in some cases (when the travis queue is full) until the next day. This is unacceptable.

Fix both issues by removing the build from travis.

Please note that the ci/test/ configurations are not removed. They will stay in the repo and can be executed anywhere (just not on travis).

@jamesob
Copy link
Contributor

jamesob commented May 6, 2020

ACK fa082d0

Sad but true.

@jnewbery
Copy link
Contributor

jnewbery commented May 6, 2020

Concept ACK! I think it's very important for developer experience to be able to iterate quickly on pull requests. If someone suggests a change and the author pushes it, then the ci should run quickly so reviewers don't feel blocked.

I think it would make sense to have a valgrind run with just the unit tests, and perhaps a basic functional test, as long as that could be done in a reasonable time. I also think it'd be worthwhile doing a full valgrind test on nightly builds somewhere.

@practicalswift
Copy link
Contributor

practicalswift commented May 6, 2020

Oh, please don't remove Valgrind until we have an MSAN job in Travis. We really need automated detection of uninitialized memory reads enabled in Travis.

@promag
Copy link
Contributor

promag commented May 6, 2020

Could run the job only in master (or non PR branch) or use travis cron?

@DrahtBot DrahtBot added the Tests label May 6, 2020
@jnewbery
Copy link
Contributor

jnewbery commented May 6, 2020

@practicalswift I think it's a trade-off. valgrind tests are useful, but running all the functional tests in valgrind for every branch push in every PR is terrible for fast iteration and developer experience. I fully support doing valgrind testing, just not for every functional test in every PR.

Things like this:

image

(7 hours since I pushed and travis is still pending) mean that I can't make progress on PRs. That branch is failing because of a valgrind timeout in a test, and because iteration is so slow, that PR hasn't made any progress for two days.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 7, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jnewbery
Copy link
Contributor

jnewbery commented May 7, 2020

Travis completed in 1hr3mins. That's an improvement!

utACK fa082d0

@practicalswift
Copy link
Contributor

practicalswift commented May 7, 2020

Oh, please don't remove Valgrind until we have an MSAN job in Travis. We really need automated detection of uninitialized memory reads enabled in Travis.

@practicalswift I think it's a trade-off. valgrind tests are useful, but running all the functional tests in valgrind for every branch push in every PR is terrible for fast iteration and developer experience. I fully support doing valgrind testing, just not for every functional test in every PR.

I agree: I think we should remove Valgrind testing in Travis as long as MSAN testing is added in the same PR.

MSAN does not have this slowness problem :)

As long as we have some form of automated detection of uninitialized memory reads enabled in Travis (via Valgrind or MSAN) I'm happy :)

In summary:

  • ACK on removing Valgrind and adding MSAN.
  • NACK on removing Valgrind without adding MSAN.

@sipa
Copy link
Member

sipa commented May 7, 2020

We could also just run Valgrind on master but not all PRs, independently of when/how msan testing is enabled.

@DrahtBot DrahtBot mentioned this pull request May 7, 2020
@practicalswift
Copy link
Contributor

practicalswift commented May 7, 2020

We could also just run Valgrind on master but not all PRs, independently of when/how msan testing is enabled.

Yes, that is a very good point.

As long as we continue guarding master on bitcoin/bitcoin from uninitialized memory reads by running the unit tests, functional tests and fuzz tests using Valgrind or MSAN I'm happy :)

@laanwj
Copy link
Member

laanwj commented May 7, 2020

I think it would make sense to have a valgrind run with just the unit tests, and perhaps a basic functional test, as long as that could be done in a reasonable time.

I ike this idea. At least the initialization/shutdown sequence will be run at least once in valgrind in that case, which seems to be an effective use of CI resources.

(but removing it entirely in the mean time is fine with me)

@practicalswift
Copy link
Contributor

practicalswift commented May 7, 2020

I now noticed that this PR removes also the valgrind fuzz job too, is that intentional? Is running the fuzzing targets against our seed corpus under valgrind that slow? :(

FWIW: pulling in 7e14297 from #18288 to this PR would allow us to get both a.) good CI speed and b.) no reduction in uninitialized memory testing coverage. What are the best arguments for not doing that? :)

@maflcko
Copy link
Member Author

maflcko commented May 7, 2020

I am going to merge this to unbreak travis. This has two ACKs and is needed as a first step for anything that is done later. If someone has ideas or implementations of how this can be improved in the future, they are more than welcome to submit pull requests and ask me for review. When those improvements are ready for merge, I am happy to take them.

Just to respond to some points:

  • We could run it on master

If this means run it on master in travis, then no, this is impossible as explained in the OP. The build might happen to pass right now because I reset the failing tests and the cache happens to be intact, but a fresh build without cache takes more than 3 hours. Builds that take more than 3 hours are impossible to run on travis.

If this means to run it on master somewhere else. Then: Hell yes! The ci system has been designed to be as flexible as possible. You can run it on your raspberry pi at home, on a virtual machine at home, or even inside of docker if you know how to (hint: grep for the env var that has DANGER in it), you can also run it on any cloud ci service that supports more builds longer than 3 hours. See for example the cirrus ci config in commit 2a9e983 (just an example, anything is possible)

  • We could run just the unit tests or "basic functional tests"

I am not aware of any memory bug that was caught by just the unit test valgrind run that was not caught by any other build configuration on travis, so I think this is useless. But as I said, improvements are welcome, and if people think this is worthwhile to peruse, a follow-up improvement should be submitted.

I am not sure how to classify "basic" functional tests and this task sounds like bikeshedding to me, where if too little tests are classified as "basic", we will have an equivalent result of this pull request, or where too much tests are classified as "basic", we will have an equivalent result of before this pull request.

  • We could use MSAN instead

Sure, Concept ACK. Though, the pull is currently blocked on build system changes, which I am not capable to review.

@maflcko maflcko merged commit c1cd2b5 into bitcoin:master May 7, 2020
@maflcko maflcko deleted the 2005-RemoveTravis branch May 7, 2020 11:58
@bitcoin bitcoin locked as resolved and limited conversation to collaborators May 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants