-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deflake: mvcc storage ut #15489
deflake: mvcc storage ut #15489
Conversation
41e3667
to
2461e3c
Compare
The original flaky test shows in CI pipeline[1], but gotestsum run into a golang issue[2]. The error message is not clear from summary, like ``` {"Time":"2023-03-02T09:19:38.754394861Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1433 +0x7e4\n"} {"Time":"2023-03-02T09:19:38.754414561Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/runtime/panic.go:476 +0x32\n"} {"Time":"2023-03-02T09:19:38.754430561Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1493 +0x47\n"} {"Time":"2023-03-02T09:19:38.754482561Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:883 +0xc4\n"} {"Time":"2023-03-02T09:19:38.754497661Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:876 +0xa4\n"} {"Time":"2023-03-02T09:19:38.754512161Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:927 +0x6a\n"} {"Time":"2023-03-02T09:19:38.754567661Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.uber.org/zap/zaptest.testingWriter.Write()\n"} {"Time":"2023-03-02T09:19:38.754571261Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/go/pkg/mod/go.uber.org/zap@v1.24.0/zaptest/logger.go:130 +0x12c\n"} {"Time":"2023-03-02T09:19:38.754582861Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.uber.org/zap/zaptest.(*testingWriter).Write()\n"} {"Time":"2023-03-02T09:19:38.754597761Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.uber.org/zap/zapcore.(*ioCore).Write()\n"} {"Time":"2023-03-02T09:19:38.754600961Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/core.go:99 +0x199\n"} {"Time":"2023-03-02T09:19:38.754612761Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.uber.org/zap/zapcore.(*CheckedEntry).Write()\n"} {"Time":"2023-03-02T09:19:38.754618561Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/entry.go:255 +0x2ce\n"} {"Time":"2023-03-02T09:19:38.754630161Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.uber.org/zap.(*Logger).Info()\n"} {"Time":"2023-03-02T09:19:38.754633261Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/go/pkg/mod/go.uber.org/zap@v1.24.0/logger.go:220 +0x6a\n"} {"Time":"2023-03-02T09:19:38.754644861Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/server/v3/storage/mvcc.(*treeIndex).Compact()\n"} {"Time":"2023-03-02T09:19:38.754648461Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/server/storage/mvcc/index.go:194 +0x144\n"} {"Time":"2023-03-02T09:19:38.754664961Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/server/v3/storage/mvcc.(*store).scheduleCompaction()\n"} {"Time":"2023-03-02T09:19:38.754670161Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/server/storage/mvcc/kvstore_compaction.go:29 +0xbb\n"} {"Time":"2023-03-02T09:19:38.754681861Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/server/v3/storage/mvcc.(*store).compact.func1()\n"} {"Time":"2023-03-02T09:19:38.754690561Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/server/storage/mvcc/kvstore.go:235 +0x9e\n"} {"Time":"2023-03-02T09:19:38.754720061Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/pkg/v3/schedule.job.Do()\n"} {"Time":"2023-03-02T09:19:38.754724161Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/pkg/schedule/schedule.go:41 +0x70\n"} {"Time":"2023-03-02T09:19:38.754736161Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/pkg/v3/schedule.(*job).Do()\n"} {"Time":"2023-03-02T09:19:38.754750961Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob()\n"} {"Time":"2023-03-02T09:19:38.754754161Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/pkg/schedule/schedule.go:206 +0x101\n"} {"Time":"2023-03-02T09:19:38.754765861Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/pkg/v3/schedule.(*fifo).run()\n"} {"Time":"2023-03-02T09:19:38.754769061Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/pkg/schedule/schedule.go:187 +0x1a5\n"} {"Time":"2023-03-02T09:19:38.754780461Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/pkg/v3/schedule.NewFIFOScheduler.func1()\n"} {"Time":"2023-03-02T09:19:38.754783661Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/pkg/schedule/schedule.go:101 +0x39\n"} {"Time":"2023-03-02T09:19:38.754824061Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1493 +0x75d\n"} FAIL: (code:1): % (cd server && 'env' 'ETCD_VERIFY=all' 'go' 'test' '-v' '-json' '-short' '-timeout=3m' '--race=true' '--cpu=4' './...' '-p=2') {"Time":"2023-03-02T09:19:38.754838961Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1846 +0x99\n"} {"Time":"2023-03-02T09:19:38.754854961Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1446 +0x216\n"} {"Time":"2023-03-02T09:19:38.754893461Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1844 +0x7ec\n"} {"Time":"2023-03-02T09:19:38.754908961Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1726 +0xa84\n"} {"Time":"2023-03-02T09:19:38.754957861Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/pkg/v3/schedule.NewFIFOScheduler()\n"} {"Time":"2023-03-02T09:19:38.754961061Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/pkg/schedule/schedule.go:101 +0x3b6\n"} {"Time":"2023-03-02T09:19:38.754976161Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/server/v3/storage/mvcc.NewStore()\n"} {"Time":"2023-03-02T09:19:38.754979361Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/server/storage/mvcc/kvstore.go:111 +0x331\n"} {"Time":"2023-03-02T09:19:38.754991061Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" go.etcd.io/etcd/server/v3/storage/mvcc.TestHashByRevValue()\n"} {"Time":"2023-03-02T09:19:38.754994261Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /home/runner/work/etcd/etcd/server/storage/mvcc/hash_test.go:36 +0xa4\n"} {"Time":"2023-03-02T09:19:38.755010061Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1446 +0x216\n"} {"Time":"2023-03-02T09:19:38.755024461Z","Action":"output","Package":"go.etcd.io/etcd/server/v3/storage/mvcc","Test":"TestHashByRevValue","Output":" /opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1493 +0x47\n"} === Failed === FAIL: storage/mvcc (0.00s) === CONT testing.go:1319: race detected during execution of test FAIL FAIL go.etcd.io/etcd/server/v3/storage/mvcc 9.852s ``` After using the following command to reproduce it, we can get the error like: ```bash go test -v -p=2 --cpu=4 -count=1000 -failfast --race=true -short -timeout=30m ./ --- PASS: TestHashByRevValueLastRevision (0.12s) ================== WARNING: DATA RACE Read at 0x00c002024043 by goroutine 65745: testing.(*common).logDepth() /usr/lib/go-1.19/src/testing/testing.go:883 +0xc4 testing.(*common).log() /usr/lib/go-1.19/src/testing/testing.go:876 +0xa4 testing.(*common).Logf() /usr/lib/go-1.19/src/testing/testing.go:927 +0x6a testing.(*T).Logf() <autogenerated>:1 +0x75 go.uber.org/zap/zaptest.testingWriter.Write() /home/fuwei/go/pkg/mod/go.uber.org/zap@v1.24.0/zaptest/logger.go:130 +0x12c go.uber.org/zap/zaptest.(*testingWriter).Write() <autogenerated>:1 +0x7e go.uber.org/zap/zapcore.(*ioCore).Write() /home/fuwei/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/core.go:99 +0x199 go.uber.org/zap/zapcore.(*CheckedEntry).Write() /home/fuwei/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/entry.go:255 +0x2ce go.uber.org/zap.(*Logger).Info() /home/fuwei/go/pkg/mod/go.uber.org/zap@v1.24.0/logger.go:220 +0x6a go.etcd.io/etcd/server/v3/storage/mvcc.(*treeIndex).Compact() /home/fuwei/go/src/go.etcd.io/etcd/server/storage/mvcc/index.go:194 +0x144 go.etcd.io/etcd/server/v3/storage/mvcc.(*store).scheduleCompaction() /home/fuwei/go/src/go.etcd.io/etcd/server/storage/mvcc/kvstore_compaction.go:29 +0xbb go.etcd.io/etcd/server/v3/storage/mvcc.(*store).compact.func1() /home/fuwei/go/src/go.etcd.io/etcd/server/storage/mvcc/kvstore.go:235 +0x9e go.etcd.io/etcd/pkg/v3/schedule.job.Do() /home/fuwei/go/src/go.etcd.io/etcd/pkg/schedule/schedule.go:41 +0x70 go.etcd.io/etcd/pkg/v3/schedule.(*job).Do() <autogenerated>:1 +0x29 go.etcd.io/etcd/pkg/v3/schedule.(*fifo).executeJob() /home/fuwei/go/src/go.etcd.io/etcd/pkg/schedule/schedule.go:206 +0x101 go.etcd.io/etcd/pkg/v3/schedule.(*fifo).run() /home/fuwei/go/src/go.etcd.io/etcd/pkg/schedule/schedule.go:187 +0x1a5 go.etcd.io/etcd/pkg/v3/schedule.NewFIFOScheduler.func1() /home/fuwei/go/src/go.etcd.io/etcd/pkg/schedule/schedule.go:101 +0x39 Previous write at 0x00c002024043 by goroutine 65743: testing.tRunner.func1() /usr/lib/go-1.19/src/testing/testing.go:1433 +0x7e4 runtime.deferreturn() /usr/lib/go-1.19/src/runtime/panic.go:476 +0x32 testing.(*T).Run.func1() /usr/lib/go-1.19/src/testing/testing.go:1493 +0x47 Goroutine 65745 (running) created at: go.etcd.io/etcd/pkg/v3/schedule.NewFIFOScheduler() /home/fuwei/go/src/go.etcd.io/etcd/pkg/schedule/schedule.go:101 +0x3b6 go.etcd.io/etcd/server/v3/storage/mvcc.NewStore() /home/fuwei/go/src/go.etcd.io/etcd/server/storage/mvcc/kvstore.go:111 +0x331 go.etcd.io/etcd/server/v3/storage/mvcc.TestHashByRevValueLastRevision() /home/fuwei/go/src/go.etcd.io/etcd/server/storage/mvcc/hash_test.go:76 +0xa4 testing.tRunner() /usr/lib/go-1.19/src/testing/testing.go:1446 +0x216 testing.(*T).Run.func1() /usr/lib/go-1.19/src/testing/testing.go:1493 +0x47 Goroutine 65743 (running) created at: testing.(*T).Run() /usr/lib/go-1.19/src/testing/testing.go:1493 +0x75d testing.runTests.func1() /usr/lib/go-1.19/src/testing/testing.go:1846 +0x99 testing.tRunner() /usr/lib/go-1.19/src/testing/testing.go:1446 +0x216 testing.runTests() /usr/lib/go-1.19/src/testing/testing.go:1844 +0x7ec testing.(*M).Run() /usr/lib/go-1.19/src/testing/testing.go:1726 +0xa84 main.main() _testmain.go:265 +0x2e9 ================== ``` The schedule for compact is handled asynchronously and it might use `t.Logf` after go-test marks the case is done. And there is a comment from go-test: ```go // https://github.com/golang/go/blob/c69ff3a7d0c8bd2878662034c1cbce8613fa6f13/src/testing/testing.go#LL1580C3-L1582C16 // Do not lock t.done to allow race detector to detect race in case // the user does not appropriately synchronize a goroutine. t.done = true ``` We need to ensure that all the goroutines should be closed before case finish. REF: [1]: https://github.com/etcd-io/etcd/actions/runs/4312405975/jobs/7522924734 [2]: gotestyourself/gotestsum#310 Signed-off-by: Wei Fu <fuweid89@gmail.com>
The original testcase uses `return` statement which skips `restore` case. It's aimed to enable `restore` testcase. Signed-off-by: Wei Fu <fuweid89@gmail.com>
2461e3c
to
830d9e9
Compare
defer os.Remove(tmpPath) | ||
defer func() { | ||
b.Close() | ||
os.Remove(tmpPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the tmpPath
automatically cleaned up?
see https://pkg.go.dev/testing#T.TempDir
TempDir returns a temporary directory for the test to use. The directory is automatically removed
by Cleanup when the test and all its subtests complete. Each subsequent call to t.TempDir returns
a unique directory; if the directory creation fails, TempDir terminates the test by calling Fatal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Let me see how to change this. Update it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Please take a look. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for looking into flakes.
The tmp path is cleanup by go testing so that the `cleanup` doesn't need to call `os.Remove`. Signed-off-by: Wei Fu <fuweid89@gmail.com>
|
||
testKey := []byte("foo") | ||
testValue := []byte("bar") | ||
s.Put(testKey, testValue, lease.NoLease) | ||
|
||
w := s.NewWatchStream() | ||
w.Watch(0, testKey, nil, 0) | ||
defer w.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w.Watch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see line 45
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, Github diffs are so unreadable :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work.
Thank you @fuweid
The original flaky test shows in CI pipeline https://github.com/etcd-io/etcd/actions/runs/4312405975/jobs/7522924734, but gotestsum run into a golang issue gotestyourself/gotestsum#310. The error message is not clear from summary, like
After using the following command to reproduce it, we can get the error
like:
The schedule for compact is handled asynchronously and it might use
t.Logf
after go-test marks the case is done. And there is a commentfrom go-test:
We need to ensure that all the goroutines should be closed before case
finish.
The original testcase uses
return
statement which skipsrestore
case. It's aimed to enable
restore
testcase.@ahrtr @serathius