Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in wait_for_unlock_notify after SQLITE_BUSY #6

Closed
FiloSottile opened this issue Apr 9, 2018 · 8 comments
Closed

Deadlock in wait_for_unlock_notify after SQLITE_BUSY #6

FiloSottile opened this issue Apr 9, 2018 · 8 comments

Comments

@FiloSottile
Copy link
Contributor

I have a program accessing a db with a Pool, and during its operation I lock the database from another process. A couple SQLITE_BUSY errors are returned. Shortly after (but not immediately, like after 3 queries), the library deadlocks in wait_for_unlock_notify. It's very reproducible.

This is from a SIGQUIT:

goroutine 20 [syscall, 2 minutes]:
github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite._Cfunc_wait_for_unlock_notify(0x5406710, 0x5406690, 0x0)
	_cgo_gotypes.go:1440 +0x6b
github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite.(*Stmt).Step.func2(0x5406710, 0x5406690, 0x9)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite/sqlite.go:492 +0x6e
github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite.(*Stmt).Step(0xc42034d270, 0x2, 0xef87, 0xef87)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite/sqlite.go:492 +0x205
github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite/sqliteutil.exec(0xc42034d270, 0x0, 0xc42029d260, 0x2, 0x2, 0x0, 0x1)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite/sqliteutil/exec.go:127 +0x396
github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite/sqliteutil.Exec(0xc4200a4460, 0x4655fb6, 0x72, 0x0, 0xc42029d260, 0x2, 0x2, 0x0, 0x0)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/vendor/crawshaw.io/sqlite/sqliteutil/exec.go:77 +0xe7
github.com/FiloSottile/mostly-harmless/covfefe.(*Covfefe).execSQL.func1(0xc4200a4460, 0x464a967, 0x19)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/db.go:17 +0xa3
github.com/FiloSottile/mostly-harmless/covfefe.Run.func1(0xc420289680, 0x0, 0x0)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/covfefe.go:56 +0x1e3
github.com/FiloSottile/mostly-harmless/covfefe.(*Covfefe).execSQL(0xc420087980, 0x4655fb6, 0x72, 0xc42029d260, 0x2, 0x2, 0xcd2ad7b0084c344a, 0xc420024500)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/db.go:16 +0xf8
github.com/FiloSottile/mostly-harmless/covfefe.(*Covfefe).insertMessage(0xc420087980, 0xc4201be030, 0x0, 0x1)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/db.go:71 +0x613
github.com/FiloSottile/mostly-harmless/covfefe.(*Covfefe).Handle(0xc420087980, 0xc4201be030)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/process.go:154 +0x51d
github.com/FiloSottile/mostly-harmless/covfefe.(*Covfefe).HandleChan(0xc420087980, 0xc420074180)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/process.go:122 +0x57
github.com/FiloSottile/mostly-harmless/covfefe.Run.func2(0xc420087980, 0xc420074180)
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/covfefe.go:74 +0x43
created by github.com/FiloSottile/mostly-harmless/covfefe.Run
	/Users/filippo/src/github.com/FiloSottile/mostly-harmless/covfefe/covfefe.go:73 +0x561

(I think) I am using Pool in a fairly straightforward way, and most of the work is done by a single main goroutine. The code is at https://github.com/FiloSottile/mostly-harmless/tree/4a6ff25/covfefe

@FiloSottile
Copy link
Contributor Author

Here's the sequence of events: a few SQLITE_BUSY, then a few SQLITE_DONE, then a SQLITE_LOCKED that deadlocks on pthread_cond_wait. All this happens on the same goroutine, but with a new Pool.Get for each query (as you can see from the PRAGMA query bound to Get).

2018/04/08 23:23:28 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:23:28 sqlite3_step(INSERT INTO Users (id, handle, name, bio, first_seen) VALUES (?, ?, ?, ?, ?);) = SQLITE_DONE(not an error)
2018/04/08 23:23:28 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:23:28 sqlite3_step(INSERT INTO Tweets (id, created, user, message) VALUES (?, ?, ?, ?)) = SQLITE_CONSTRAINT
2018/04/08 23:23:51 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:23:51 sqlite3_step(INSERT INTO Messages (json, account) VALUES (?, json_array(?))) = SQLITE_BUSY
2018/04/08 23:23:53 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:23:53 sqlite3_step(INSERT INTO Messages (json, account) VALUES (?, json_array(?))) = SQLITE_BUSY
2018/04/08 23:23:55 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:23:55 sqlite3_step(INSERT INTO Messages (json, account) VALUES (?, json_array(?))) = SQLITE_BUSY
2018/04/08 23:24:59 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:24:59 sqlite3_step(INSERT INTO Messages (json, account) VALUES (?, json_array(?))) = SQLITE_DONE(not an error)
2018/04/08 23:24:59 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:24:59 sqlite3_step(INSERT INTO Tweets (id, created, user, message) VALUES (?, ?, ?, ?)) = SQLITE_DONE(not an error)
2018/04/08 23:24:59 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:24:59 sqlite3_step(INSERT INTO Users (id, handle, name, bio, first_seen) VALUES (?, ?, ?, ?, ?);) = SQLITE_DONE(not an error)
2018/04/08 23:24:59 sqlite3_step(PRAGMA foreign_keys = ON;) = SQLITE_DONE(not an error)
2018/04/08 23:24:59 sqlite3_step(UPDATE Messages SET account = json_insert(account, '$[' || json_array_length(account) || ']', ?) WHERE id = ?;) = SQLITE_LOCKED
sqlite3_unlock_notify
pthread_mutex_lock
pthread_cond_wait

I can't reproduce it with a pool size of 1, everything keeps working fine after the SQLITE_BUSY.

FiloSottile added a commit to FiloSottile/mostly-harmless that referenced this issue Apr 13, 2018
@crawshaw
Copy link
Owner

Ouch, thanks. I'll take a look at this tomorrow.

One possibility is that I'm being too imprecise on error matching. Right now the code uses wait_for_unlock_notify if it sees SQLITE_LOCKED. There may be some condition where competing with another process returns that error instead of SQLITE_BUSY, in which case we go into a wait that we don't escape from.

That would be easy to fix, look for the extended error code SQLITE_LOCKED_SHAREDCACHE. But I'd like to reproduce it before checking that in.

@FiloSottile
Copy link
Contributor Author

It might be relevant that this is a WAL database.

@crawshaw
Copy link
Owner

I had some trouble replicating this when I tried last week (but forgot to update the issue). What OS are you using?

@FiloSottile
Copy link
Contributor Author

FiloSottile commented Apr 19, 2018 via email

@ksshannon
Copy link
Contributor

I ran into a similar issue on Ubuntu 16.04, with both wal and delete journal modes.

@fasterthanlime
Copy link
Contributor

fasterthanlime commented Jun 20, 2018

Saw this too with a pool of size 2, also having a hard time coming up with a reproducible test case, but I'll attempt to post one as soon as I figure out the conditions.

edit: here's my stacktrace:

goroutine 25851 [syscall, 1 minutes]:
github.com/itchio/butler/vendor/crawshaw.io/sqlite._Cfunc_wait_for_unlock_notify(0x743a808, 0x7439780, 0x0)
	_cgo_gotypes.go:1458 +0x54
github.com/itchio/butler/vendor/crawshaw.io/sqlite.(*Stmt).Step.func2(0x743a808, 0x7439780, 0x9)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/crawshaw.io/sqlite/sqlite.go:510 +0x96
github.com/itchio/butler/vendor/crawshaw.io/sqlite.(*Stmt).Step(0xc042a0d040, 0x180, 0xc042cef0f7, 0x7)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/crawshaw.io/sqlite/sqlite.go:510 +0x140
github.com/itchio/butler/vendor/github.com/itchio/hades/sqliteutil2.exec(0xc042a0d040, 0x0, 0xc042cd6000, 0x180, 0x180, 0x0, 0x10000c042e6b300)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/github.com/itchio/hades/sqliteutil2/exec.go:131 +0x3d8
github.com/itchio/butler/vendor/github.com/itchio/hades/sqliteutil2.Exec(0xc0420c8550, 0xc042973500, 0x327, 0x0, 0xc042cd6000, 0x180, 0x180, 0xb7a260, 0xc042bb8b20)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/github.com/itchio/hades/sqliteutil2/exec.go:77 +0xc1
github.com/itchio/butler/vendor/github.com/itchio/hades.(*Context).ExecRaw(0xc0420c85f0, 0xc0420c8550, 0xc042973500, 0x327, 0x0, 0xc042cd6000, 0x180, 0x180, 0x1, 0xc0421a2e80)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/github.com/itchio/hades/exec.go:41 +0xc2
github.com/itchio/butler/vendor/github.com/itchio/hades.(*Context).Exec(0xc0420c85f0, 0xc0420c8550, 0xc0421a2e80, 0x0, 0xc042e6b2e0, 0xd812a0)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/github.com/itchio/hades/exec.go:19 +0xcb
github.com/itchio/butler/vendor/github.com/itchio/hades.(*Context).deletePagedByPK(0xc0420c85f0, 0xc0420c8550, 0xc04245a3b0, 0xd, 0xc04245a3d8, 0x2, 0xc042cd4000, 0x180, 0x200, 0xd81260, ...)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/github.com/itchio/hades/paged.go:63 +0x1f9
github.com/itchio/butler/vendor/github.com/itchio/hades.(*Context).SaveNoTransaction.func3(0xbd0340, 0xc04215c5a0, 0x16, 0xc0428fcaf0, 0xb64ca0, 0xc04215c618, 0x197, 0xc0428fcb40, 0xbc3501, 0xd87d40, ...)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/github.com/itchio/hades/save.go:235 +0x1037
github.com/itchio/butler/vendor/github.com/itchio/hades.(*Context).SaveNoTransaction.func2(0xb798e0, 0xc042e420c0, 0x82, 0x0, 0xbd0340, 0xc04215c5a0, 0x16, 0xc0428fcaf0, 0x413400, 0xc042e4f548, ...)
	C:/msys64/home/amos/Dev/go/src/github.com/itchio/butler/vendor/github.com/itchio/hades/save.go:140 +0x98a
github.com/itchio/butler/vendor/github.com/itchio/hades.(*Context).SaveNoTransaction.func3(0xb798e0, 0xc042e420c0, 0x82, 0x0, 0xbd0340, 0xc04215c5a0, 0x16, 0xc0428fcaf0, 0x0, 0x0, ...)

(snip)

This is with a pool of size 2, default options. Every other query (on the other conn presumably?) returns SQLITE_LOCKED directly.

fasterthanlime added a commit to fasterthanlime/sqlite that referenced this issue Jun 30, 2018
fasterthanlime added a commit to fasterthanlime/sqlite that referenced this issue Jun 30, 2018
@fasterthanlime
Copy link
Contributor

Merging #25 closed this issue, but @FiloSottile may want to check if the original issue cannot still be reproduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants