fix: try re-execute statement if backend responds with "prepared statement is no longer valid" #451
Conversation
Thanks @vlsi for submitting this PR. I have reported the original issue on Stack Overflow as a means to document the current behaviour. When googling, I found a few incidents of this type, and none of them was resolved or understood. This all has to do with the JDBC driver defaulting to a "prepare threshold" of 5, which is why the sixth DDL statement seems to cause cursors to expire. I believe that this kind of issue shouldn't be dealt with in the client application: since the client closes each prepared statement, it is no longer the client's responsibility to handle these stale resources.
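For context, pgjdbc's default `prepareThreshold` of 5 means a statement is promoted to a named server-side prepared statement only after several executions, which is why the failure surfaces "suddenly". Below is a minimal sketch of that counting behaviour, not the driver's actual internals; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class PrepareThresholdSketch {
    // Matches pgjdbc's default prepareThreshold; the real driver tracks
    // the count per Query object, not per SQL string.
    static final int PREPARE_THRESHOLD = 5;

    private final Map<String, Integer> useCount = new HashMap<>();

    /** Returns true once the driver would switch this SQL over to a
     *  named server-side prepared statement. */
    boolean usesServerPrepare(String sql) {
        int n = useCount.merge(sql, 1, Integer::sum);
        return n >= PREPARE_THRESHOLD;
    }

    public static void main(String[] args) {
        PrepareThresholdSketch s = new PrepareThresholdSketch();
        String sql = "select id from app_user where id = ?";
        for (int i = 1; i <= 6; i++) {
            System.out.println("execution " + i + " server-prepared: "
                    + s.usesServerPrepare(sql));
        }
    }
}
```

The threshold is configurable via the `prepareThreshold` connection parameter; setting it to 0 is the commonly cited way to disable server-side prepares entirely.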
+1. This is very useful when you have live database migrations and multiple nodes connecting to the same database that cannot all be upgraded/restarted at the same time. Being able to use prepared statements and still avoid downtime is an important feature. I'm not sure that parsing the textual error message is the best way to do it; aren't there specific error codes identifying such errors?
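On error codes: to my knowledge the backend does report distinct SQLSTATEs for these failures — "cached plan must not change result type" arrives as 0A000 (feature_not_supported) and "prepared statement ... does not exist" as 26000 (invalid_sql_statement_name). A hedged sketch of classifying by SQLSTATE instead of message text (the class and method names are made up, this is not the driver's code):

```java
public class StaleStatementCheck {
    /** True for the two SQLSTATEs that, per the backend error-code
     *  appendix, signal a stale server-side prepared statement. */
    static boolean isStaleStatementError(String sqlState) {
        return "0A000".equals(sqlState)   // cached plan must not change result type
            || "26000".equals(sqlState);  // prepared statement does not exist
    }

    public static void main(String[] args) {
        System.out.println(isStaleStatementError("0A000"));
        System.out.println(isStaleStatementError("26000"));
        System.out.println(isStaleStatementError("42601")); // plain syntax error
    }
}
```

In application code the state would come from `SQLException.getSQLState()`, which avoids depending on the server's locale for the message text.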
We are heavily affected by this issue. We try to solve it at the connection pool level (Hikari) by closing the poisoned connection, but fixing it at the driver level seems the more proper way.
@vladimirfx Can you please clarify which error you are running into? "Prepared statement does not exist" or "cached statement cannot change result type"?
Current coverage is 61.87% (diff: 83.97%)

@@             master    #451    diff @@
==========================================
  Files           148     150      +2
  Lines         15075   15151     +76
  Methods           0       0
  Messages        0       0
  Branches       3023    3045     +22
==========================================
+ Hits           9291    9374     +83
+ Misses         4537    4525     -12
- Partials       1247    1252      +5
If there are no objections, I will merge this in.
We are hit by both, but most of the time it is "cached statement cannot change result type". My comment was about the first patch, where only the error text is analyzed. The current implementation seems more robust.
Can you clarify what kind of setup you are using? I'm inclined to log warnings in case the above conditions are met.
PostgreSQL 9.3.13, locale ru_RU.UTF-8, pgjdbc 9.4.1208, JDK 1.8.
Ok, cool. Some more work is required (~ #477 to support |
That is interesting. Do you alter column types? Do you just add/remove columns?
Type cache is a side issue. Please file a case for that. I'm not sure if it is pgjdbc issue, however, let's not discuss it here.
That's a fair point. It might be something to discuss with backend developers though.
Generally, I would avoid altering a column type - complicated rename/column changes would be done over a series of steps (add / migrate / drop). You can see (part of) the upgrade script that was changing the app_user table in this Stack Overflow answer: https://stackoverflow.com/a/48498194/924597 The SQL statement JOOQ is issuing that was causing the PG error is:
Some columns elided, but you get the idea - a wildcard column specifier is definitely not involved. I've hit this error message a few times before, but couldn't reliably reproduce it until stumbling across this issue and starting to get a clue about what was going on.
Yeah, for sure. Not that important anyway, just a minor development-time annoyance.
Where does that kind of issue get raised?
Others have solved the zero downtime deployments but it isn't easy. Mostly
by using schema's and search paths.
Dave Cramer
…On 31 January 2018 at 04:56, Shorn ***@***.***> wrote:
@vlsi <https://github.com/vlsi>
Regarding PR: Sure I'll give it a shot - but I've not contributed to
pgjdbc before so it'll take me a while to learn my way around.
Regarding pgjdbc "traces": That's good to know. My current main usecase is
an AWS RDS postgres instance whose schema gets updated by a Flyway process
while the app-server(s) is running (aiming for zero/minimal downtime with
rapid schema evolution).
I think conservative will work well for me because I use JOOQ and all
metadata stuff is done at compile-time.
I wonder if the ResultSetMetaData traces might be why IntelliJ IDEA seems
to not update its view of my schema, even when I disconnect/reconnect
using the GUI? I'll try putting "always" in the URL to see if it helps.
------------------------------
so you'd better avoid mixing alters and selects.
I am disappointed to hear that. How do we achieve that other than by
having downtime?
If this is true - isn't it really saying that Postgres shouldn't be used
when aiming for "zero downtime" deployments?
It seems a shame to put all that effort into transactional DDL and then
turn around and tell people not to use it.
I wonder if this is something Amazon have already tackled with Aurora
Postgres. Or even their unreleased "serverless" Aurora thing.
Performance appears to be unacceptable for us. We are also trying to make schema changes to a live, medium-throughput system and are running into issues with every solution we look at. We ran basic load tests with autosave=conservative. The average query time remains pretty much the same, but the worst queries get approx 15x worse. We are running at about 100-130 requests per second through our application. It sounds like another solution is to DISCARD ALL after our schema change; I worry that we would be dropping users in the time between making the schema change and running the discard...
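On the DISCARD ALL idea: per this PR the driver also watches for DEALLOCATE ALL / DISCARD ALL and invalidates its client-side prepared-statement cache when it sees them, so issuing one of those on each pooled connection right after the migration should keep both sides consistent. A toy sketch of that post-migration sweep (`SqlRunner` is a made-up stand-in for "execute SQL on one pooled connection", not a pgjdbc or Hikari type):

```java
import java.util.ArrayList;
import java.util.List;

public class PostMigrationReset {
    /** Minimal stand-in for "execute this SQL on one pooled connection". */
    interface SqlRunner { void execute(String sql); }

    /** After a schema migration, drop the backend's named prepared
     *  statements on every pooled connection; pgjdbc parses this
     *  statement and invalidates its own cache as well (per this PR). */
    static void resetPreparedStatements(List<SqlRunner> pool) {
        for (SqlRunner conn : pool) {
            conn.execute("DEALLOCATE ALL");
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Two fake "connections" that just record what was sent.
        List<SqlRunner> pool = List.of(log::add, log::add);
        resetPreparedStatements(pool);
        System.out.println(log);
    }
}
```

Note that `DEALLOCATE ALL` only affects the session it runs in, which is why it has to be executed once per pooled connection.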
What are the approximate numbers here? Are you talking about an increase of 2ms to 30ms? Or an increase of 2sec to 30sec?
Yea, so here are our numbers; I realise how arbitrary they are without the code we used to generate them. Without autosave=conservative: (graph elided). With autosave=conservative: (graph elided). In this one the worst case was 9.5x worse. The graphs of these over time looked similar, but the autosave one was bumpy; i.e. at some points perf was impacted a lot, but otherwise it was quite similar. These weren't our production soak/stress tests, nor were they run for an extended period, but they were enough to turn us off this solution to the cached query error.
Holy cow ... 15 seconds added to worst case... o_O
The more details you provide, the better the odds that we can help. Do you actually make schema changes during the test? If you do, then regular mode should result in an error. Do you know which SQL is impacted? Is it something related to a single SQL statement having multiple execution plans?
We aren’t running schema changes during the test. I’ll try to come up with some code to reproduce the issue that we can share.
When I first asked on this thread (as 'laglace') for troubleshooting related to the autosave feature, I also noticed a significant bump in performance. The application I was working on involved a lot of legacy code and several anti-patterns: one of those was huge transactions (we're talking about something ranging from twenty minutes to 1 hour in the best-case scenario), with several thousands of different queries done throughout the process.

PostgreSQL managed to handle the first 5 minutes of the transaction without an issue, but performance would degrade rapidly after that mark, to the point where simple SELECT queries would start to take 30 seconds. As the transaction continued, these values would increase linearly, and after 10 hours each single query to the database would be taking about 15 minutes.

With autosave disabled, performance was MUCH smoother, but fixing this properly would have involved refactoring large chunks of code, so I never got around to that, as management was not too pleased with the idea of delays. As I don't work for that company any longer, I can't provide any queries or EXPLAINs, but I think this post pretty much covers my findings related to this issue.
Well, there's no question that there will be a performance impact from this feature. The whole reason this feature exists is to try to mimic the behaviour of other databases: PostgreSQL does not normally allow partial commits to persist, while others do. The driver achieves this by inserting a savepoint in between statements. Clearly this has performance ramifications. I don't see any way to fix this.
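To make the overhead concrete, the statement flow that autosave implies looks roughly like this per user query. This is a simulation of the sequence, not pgjdbc internals, and the savepoint name `PGJDBC_AUTOSAVE` is an assumption here:

```java
import java.util.ArrayList;
import java.util.List;

public class AutosaveFlow {
    /** Returns the sequence of statements the server effectively sees
     *  for one user query under autosave, optionally with a failure
     *  followed by rollback-to-savepoint and re-execution. */
    static List<String> statementsSent(String userSql, boolean statementFails) {
        List<String> sent = new ArrayList<>();
        sent.add("SAVEPOINT PGJDBC_AUTOSAVE"); // extra round trip per query
        sent.add(userSql);
        if (statementFails) {
            // Transaction survives the error, so the driver can retry.
            sent.add("ROLLBACK TO SAVEPOINT PGJDBC_AUTOSAVE");
            sent.add(userSql); // re-executed after re-parsing
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(statementsSent("select name from app_user", true));
    }
}
```

The extra SAVEPOINT round trip on every query, plus the subtransaction bookkeeping it creates on the server, is where the measured slowdown comes from.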
Yea, I understand and agree. We will just have to find a different way to overcome this error. Maybe we're missing something; it just seems this should be an easy problem to solve, to allow non-disruptive schema changes to live tables.
this is how Zalando did it
https://github.com/zalando-stups/java-sproc-wrapper
Dave Cramer
Thanks for the tip!
@orionrenegade, do you know which SQL fails?
I tested “select explicit_column_list” which does fail even when only adding a column to the table. |
Hmm that's cool, it should survive the reparsing?
Dave Cramer
I wonder if that can be treated at backend side. I just thought explicit_list should be OK.
You would think, but recall that a schema change does change pg_attribute, so it depends on what they cache.
Dave Cramer
"cached plan must not change result type" does compare tuple descriptors, so I wonder what the error message is for the case when a column is added.
@mj1618 can you get the error message from the backend?
Yes, this is the exact error we get.
Like I said, I don't have access to the queries any more, but both cases were problematic. (In fact, any query would be problematic under those stress conditions, even against tables with fewer than 100 entries.)
@mj1618, I've just tried
and it works with no problems on PostgreSQL 9.6. That is, there should be something else besides "a mere column addition" to trigger the issue.
Ok, I see your point. We were using Flyway to do the schema change; I wonder if it is executing something that causes the issue.
I will try to write a small PoC to demonstrate.
No, you are right, my apologies - only … Here is what I used:
With … it worked fine. Changing to … I must have got it wrong - our primary SQL obfuscation tool, Hibernate, must be doing something different than we assumed.
Bug report: http://stackoverflow.com/questions/34180932/error-cached-plan-must-not-change-result-type-when-mixing-ddl-with-select-via

When using "deallocate all" or "alter table", cached prepared statements might become invalid, thus we need to re-execute the query in those cases.

It might require some logging, however I'm not sure what is the best way to log that.

Summary (technical details):

- pgjdbc parses `DEALLOCATE ALL` and `DISCARD ALL` statements, and invalidates the prepared statement cache if those are detected. That does not mean "deallocate all" should be used often; it just means "deallocate all" might be suitable for maintenance-like procedures. Note: `deallocate all` invalidates only a single connection.
- If `prepared statement XXX does not exist` or `cached statement must not change result type` is observed, pgjdbc invalidates the prepared statement cache (see `deallocateEpoch++;`). In general those errors should not happen; however, if observed, it means something had changed behind the scenes.
- pgjdbc automatically retries `prepared statement XXX does not exist` and `cached statement must not change result type` errors in `autocommit=true` (non-transactional) mode.
- New `autosave` connection property:
  - `conservative` -- rollback to savepoint only in case of "prepared statement does not exist" and "cached plan must not change result type". Then the driver re-executes the statement and it passes through.
  - `never` (default) -- never set an automatic savepoint. Note: in this mode statements might still fail with "cached plan must not change result type" in autoCommit=FALSE mode.
  - `always` -- always rollback to the "before statement execution" state in case of failure. This mode prevents "current transaction aborted" errors. It is similar to psql's ON_ERROR_ROLLBACK.

For `autocommit=true` the savepoint is not used. In `conservative` mode, a savepoint is used only for queries that return something (see `org.postgresql.core.v3.QueryExecutorImpl#sendAutomaticSavepoint`).
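As a usage illustration, the property described above would be set via the connection URL or datasource properties. A hypothetical configuration fragment follows; the host, database name, and key names are placeholders, only the `autosave` values come from this PR:

```properties
# Hypothetical datasource settings; only the autosave values are from this PR.
jdbcUrl=jdbc:postgresql://localhost:5432/app?autosave=conservative
# Alternatives: autosave=never (default), autosave=always
```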