Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: shutdown schema change on two permanent errors #14025

Merged
merged 5 commits into from
Apr 11, 2017

Conversation

vivekmenezes
Copy link
Contributor

@vivekmenezes vivekmenezes commented Mar 9, 2017

sql: shutdown schema change on two permanent errors
fixes #13872
fixes #13646

In the original SQL code we would hit upon a SQL error
on the gateway node and publish pgError on the gateway
node. In distsql, an error can happen on a remote node
during SQL execution which can translate into creating
a pgError on a remote node. This change adds a protobuf
to pgError so that it can be sent along to the gateway
node.


This change is Reviewable

@tamird
Copy link
Contributor

tamird commented Mar 9, 2017

Can this go in along with the change that consumes it? In general, I'm hesitant about plumbing changes which go in without their consumer because that makes it difficult (impossible) to comment on the high-level direction.


Review status: 0 of 5 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


pkg/sql/pgwire/pgerror/errors.proto, line 17 at r1 (raw file):

// Author: Vivek Menezes (vivek@cockroachlabs.com)

syntax = "proto2";

everything here is non-nullable, perhaps use proto3? then you can remove all the options.


Comments from Reviewable

@andreimatei
Copy link
Contributor

Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/sql/pgwire/pgerror/errors.go, line 36 at r1 (raw file):

// pgError also implement the errors.causer interface, meaning that errors.Cause
// will work correctly with it.
type pgError struct {

why are we keeping both pgError and PGWireError? Why not just PGWireError?


pkg/sql/pgwire/pgerror/errors.go, line 37 at r1 (raw file):

// will work correctly with it.
type pgError struct {
	error

what does it mean for a struct to embed an interface?
Does it assert that pgError implements error? If so, let's just do that like everywhere else:

// pgError implements error
var _ error = &pgError{}

Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

Yes i can do that. I'll send out an update


Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Reviewed 1 of 5 files at r1.
Review status: 1 of 5 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/sql/pgwire/pgerror/errors.go, line 36 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

why are we keeping both pgError and PGWireError? Why not just PGWireError?

I think the idea here is to continue to play well with github.com/pkg/errors (although I'm not sure why that part was removed from the comment). If that's not the case, then I'm not sure I see a reason to keep most of this.


pkg/sql/pgwire/pgerror/errors.go, line 37 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what does it mean for a struct to embed an interface?
Does it assert that pgError implements error? If so, let's just do that like everywhere else:

// pgError implements error
var _ error = &pgError{}

Kind of, but only because it follows the normal embedding rules. Embedding an interface is the same thing as adding an interface field to the struct and also promoting that interface's methods on the struct.


pkg/sql/pgwire/pgerror/errors.go, line 81 at r1 (raw file):

	}

	return &pgError{

I don't think this will work properly in all cases. Imagine a *pgError that has been wrapped by another error (using errors.Wrap for instance) and then passed here. We won't properly find the *pgError at the head of the "error cause list", so we'll append a new one.

The original idea was to append new information (code, detail, hint, etc.) to the "error cause list" on each wrap/withXYZ method call, just like errors.Wrap does. This is a rather Lispy approach that means that we never need to scan the error list when adding new information, and gives us the ability to overwrite (shadow) information for free. We can then iterate through this list using the Cause method on each error and pick out the relevant information. Because github.com/pkg/errors already structured wrapped errors in a similar cons-list style, this made the approach here work really well with that package.

Passing around a single PGWireError struct within a pgError struct that we try to mutate goes against this approach, which is why I'm starting to doubt that it makes sense to try to keep it playing well with github.com/pkg/errors. Perhaps an alternative to this is to keep the old pgError interface and map instances of it to and from the PGWireError proto only at network boundaries.


pkg/sql/pgwire/pgerror/errors_test.go, line 123 at r1 (raw file):

		{WithSourceContext(WithHint(WithDetail(WithPGCode(initErr, code), detail), hint), 0), true},
		{WithPGCode(WithDetail(WithHint(WithSourceContext(initErr, 0), hint), detail), code), true},
		{WithSourceContext(WithHint(errors.Wrap(WithDetail(WithPGCode(initErr, code), detail), wrap), hint), 0), true},

What's prompting the changes to these tests?


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

Review status: 1 of 5 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/sql/pgwire/pgerror/errors.go, line 81 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think this will work properly in all cases. Imagine a *pgError that has been wrapped by another error (using errors.Wrap for instance) and then passed here. We won't properly find the *pgError at the head of the "error cause list", so we'll append a new one.

The original idea was to append new information (code, detail, hint, etc.) to the "error cause list" on each wrap/withXYZ method call, just like errors.Wrap does. This is a rather Lispy approach that means that we never need to scan the error list when adding new information, and gives us the ability to overwrite (shadow) information for free. We can then iterate through this list using the Cause method on each error and pick out the relevant information. Because github.com/pkg/errors already structured wrapped errors in a similar cons-list style, this made the approach here work really well with that package.

Passing around a single PGWireError struct within a pgError struct that we try to mutate goes against this approach, which is why I'm starting to doubt that it makes sense to try to keep it playing well with github.com/pkg/errors. Perhaps an alternative to this is to keep the old pgError interface and map instances of it to and from the PGWireError proto only at network boundaries.

When I first tried introducing this change I was very puzzled by what you had done and it took me sometime to figure out how this all worked. After I made my change and saw that very few tests broke, I figured that sticking with playing well with github.com/pkg/errors while nice is a bit of over-engineering. It makes a lot of sense if you need to stack a bunch of pgError on top of each other but that doesn't seem to be a requirement.

I'd prefer that we reduce the complexity of this as done in this change and introduce the compatibility with github.com/pkg/errors if we need it later.

This change needs more work. I'll be sending out an update later today


Comments from Reviewable

@vivekmenezes vivekmenezes changed the title pgerror: change pgError to contain a protobuf message sql: shutdown schema change on two permanent errors Mar 10, 2017
@vivekmenezes
Copy link
Contributor Author

This PR has a different title because it fixes two bugs. The change to pgerror were only needed to fix one of the bugs.
I've also changed the error type returned via distsql. While the PR is small, it does muck with protobufs and is worth more eyes. Thanks


Review status: 0 of 13 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


Comments from Reviewable

@vivekmenezes vivekmenezes assigned tamird and andreimatei and unassigned maddyblue Mar 10, 2017
@nvanbenschoten
Copy link
Member

I'm still not sure I see where the changes to pgerror are used in the other two commits. Isn't the whole point that we can directly pull out the proto from the error?


Review status: 0 of 13 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/sql/pgwire/pgerror/errors.go, line 81 at r1 (raw file):

Previously, vivekmenezes wrote…

When I first tried introducing this change I was very puzzled by what you had done and it took me sometime to figure out how this all worked. After I made my change and saw that very few tests broke, I figured that sticking with playing well with github.com/pkg/errors while nice is a bit of over-engineering. It makes a lot of sense if you need to stack a bunch of pgError on top of each other but that doesn't seem to be a requirement.

I'd prefer that we reduce the complexity of this as done in this change and introduce the compatibility with github.com/pkg/errors if we need it later.

This change needs more work. I'll be sending out an update later today

We could maintain the full compatibility at a slight increase to runtime complexity if we either:

  • walk the entire error list in each wrap function and only add a new *pgError if one does not already exist
  • have a single WithPGError function that takes all params at once to disallow the wrapping issue
  • only use the proto when necessary and keep this file the same (although I do see a possible simplification to some of this that would allow us to get rid of the methods), somewhat like our roachpb.RetryableTxnError vs roachpb.Error distinction

I'd be very concerned with leaving this PR as is, because the addition of an intermediate errors.Wrap call between pgerror.With calls right now would subtly break things by shadowing information. This is exactly what the tests below (TestPGCode, TestDetail, etc.) were making sure did not happen, and why they started breaking after these changes.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

Review status: 0 of 13 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/sql/pgwire/pgerror/errors.go, line 81 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We could maintain the full compatibility at a slight increase to runtime complexity if we either:

  • walk the entire error list in each wrap function and only add a new *pgError if one does not already exist
  • have a single WithPGError function that takes all params at once to disallow the wrapping issue
  • only use the proto when necessary and keep this file the same (although I do see a possible simplification to some of this that would allow us to get rid of the methods), somewhat like our roachpb.RetryableTxnError vs roachpb.Error distinction

I'd be very concerned with leaving this PR as is, because the addition of an intermediate errors.Wrap call between pgerror.With calls right now would subtly break things by shadowing information. This is exactly what the tests below (TestPGCode, TestDetail, etc.) were making sure did not happen, and why they started breaking after these changes.

Good point. Let me redo this change to preserve that compatibility. Sorry I didn't get back to this until now.


Comments from Reviewable

@vivekmenezes vivekmenezes force-pushed the pgwire branch 2 times, most recently from cdf3502 to 7b7f737 Compare March 17, 2017 14:00
@vivekmenezes
Copy link
Contributor Author

This change makes pgerror a lot simpler. It's now a Error protobuf that implements error


Review status: 0 of 16 files reviewed at latest revision, 5 unresolved discussions.


pkg/sql/pgwire/pgerror/errors.go, line 36 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think the idea here is to continue to play well with github.com/pkg/errors (although I'm not sure why that part was removed from the comment). If that's not the case, then I'm not sure I see a reason to keep most of this.

This now is a pgerror.Error that implements the error interface


pkg/sql/pgwire/pgerror/errors.proto, line 17 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

everything here is non-nullable, perhaps use proto3? then you can remove all the options.

Done.


pkg/sql/pgwire/pgerror/errors_test.go, line 123 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's prompting the changes to these tests?

fixed by completely overhauling the tests.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

ignore the change to ConsumerSignal thanks to
#14218

@vivekmenezes vivekmenezes force-pushed the pgwire branch 2 times, most recently from 956ed25 to 5693c13 Compare March 20, 2017 19:56
@tamird
Copy link
Contributor

tamird commented Mar 21, 2017

What's the status of this? Looks like CI failed.

@nvanbenschoten
Copy link
Member

Looks like it's ready for another round of reviews. I'll give it another look by the end of the day.

@vivekmenezes
Copy link
Contributor Author

Do take a look at it although it is failing once in a while so I still need to take a look at it later tonight.

@vivekmenezes
Copy link
Contributor Author

Hey this needs more work, I'm still on it. I had to rebase which messed it up and I need to figure out what to do.

@vivekmenezes vivekmenezes force-pushed the pgwire branch 2 times, most recently from 937d93b to 04bb74d Compare March 23, 2017 10:49
@vivekmenezes
Copy link
Contributor Author

Folks this is ready for review. Thanks!

@vivekmenezes vivekmenezes force-pushed the pgwire branch 2 times, most recently from 4edebab to 6408488 Compare March 31, 2017 00:47
@vivekmenezes
Copy link
Contributor Author

@andreimatei ready for another look.

@vivekmenezes
Copy link
Contributor Author

@andreimatei rebased and ready for review

@andreimatei
Copy link
Contributor

:lgtm:


Review status: 0 of 24 files reviewed at latest revision, 34 unresolved discussions, all commit checks successful.


pkg/sql/schema_changer_test.go, line 774 at r16 (raw file):

	// UNIQUE index we create on it fails.
	//
	// Pick a set of random rows because if we pick a deterministic set

consider using Radu's new TESTING_RELOCATE to control where rows are 😎
Or even better, control the distribution using a FakeSpanResolver.


pkg/sql/schema_changer_test.go, line 775 at r16 (raw file):

	//
	// Pick a set of random rows because if we pick a deterministic set
	// we can't be sure they will end up on a remote node. We want This

s/This/this


pkg/sql/schema_changer_test.go, line 807 at r16 (raw file):

	if _, err := sqlDB.Exec(`
CREATE UNIQUE INDEX vidx ON t.test (v);
`); !testutils.IsError(err, "duplicate key value (v)=(1) violates unique constraint") {

also check the pg error code. so hott.
If distsql.Error implemented error, I think we could also check the OriginNode and see that it's local or remote.


pkg/sql/schema_changer_test.go, line 811 at r16 (raw file):

	}

	if kvs, err := kvDB.Scan(ctx, tablePrefix, tableEnd, 0); err != nil {

I'm not completely sure what we're testing here and it seems a bit fragile. Can we instead test that the new index doesn't exist (with a SHOW)?


pkg/sql/schema_changer_test.go, line 408 at r17 (raw file):

}

func checkTableKeyCount(ctx context.Context, kvDB *client.DB, multiple int, maxValue int) error {

explain that this also includes indexes

As I was saying in the previous commit, perhaps there's better ways to test some of the things that this is used for.


pkg/sql/schema_changer_test.go, line 815 at r18 (raw file):

	if _, err := sqlDB.Exec(`
	   ALTER TABLE t.test ADD COLUMN p DECIMAL NOT NULL DEFAULT (DECIMAL '1-3');

I'm a bit confused about what we're doing here. I think we probably don't want to list every "permanent" schema change here, do we? I think the logic tests are a better place for it.
Perhaps a more useful test is checking that a pg.Error (any pg.Error) is marshalled correctly by DistSQL. This should be unrelated to schema changes. Run a query and inject some error on another node, and check that you get the correct pgcode on the gateway.


pkg/sql/distsqlrun/base.go, line 262 at r11 (raw file):

Previously, vivekmenezes wrote…

I tried changing it to a *Error and then the change started touching a lot of stuff so I gave pretty quickly :-)

leave a TODO please


pkg/sql/distsqlrun/base.go, line 631 at r11 (raw file):

Previously, vivekmenezes wrote…

I'm not sure what to do here. roachpb.Error doesn't implement error, so I've basically followed the same pattern here. Clearly the detail payload implements error. Is there a need for this one to also?

Well roachpb.Error doesn't implement error, as a judgement call, because the KV code at the time was convoluted and there was no clear separation of the layers that want error vs roachpb.Error, so removing that interface forced a separation. The same consideration doesn't necessarily apply here.
Regardless, we don't like roachpb.Error.GoError(), do we? It's abnormal to need to strip the structured character of an error and convert it into mush. That defeats the point of not implementing the error interface to begin with.

You need it here because you haven't gone all the way with this refactoring and didn't convert ProducerMetadata.Err to distsql.Error. If you leave it like this, add a TODO here saying that GoError() needs to go away once it's not needed any more.


pkg/sql/distsqlrun/base.go, line 619 at r14 (raw file):

// NewError creates an Error from an error.
func NewError(err error) *Error {

this should take the NodeId for OriginNode, no?


pkg/sql/distsqlrun/base.go, line 636 at r14 (raw file):

		return t.PGError
	}
	return nil

add a default: panic and remove the return nil


pkg/sql/distsqlrun/data.proto, line 36 at r11 (raw file):

Previously, vivekmenezes wrote…

I'm going to not implement this in this PR. It is not related.

as we discussed, it is related. Add a TODO please.


pkg/sql/distsqlrun/data.proto, line 42 at r11 (raw file):

Previously, vivekmenezes wrote…

Error seems okay since it is confined to the distsqlrun package.

I understand that the package qualifies it, but my preference would still be for the type names themselves to have more information (same for pgError.Error).


pkg/sql/distsqlrun/data.proto, line 37 at r14 (raw file):

  option (gogoproto.goproto_stringer) = false;

  // Node at which the error was generated (zero if does not apply).

"zero if does not apply" - how could it not apply?


pkg/sql/distsqlrun/outbox_test.go, line 178 at r14 (raw file):

	for i, m := range metas {
		expectedStr := fmt.Sprintf("meta %d", i)
		if m.Err.Error() != expectedStr {

can we test the OriginNode field, to replace the check for RemoteDistSQLProducerError?


pkg/sql/distsqlrun/server.go, line 181 at r14 (raw file):

	if err != nil {
		if pgErr, ok := pgerror.GetPGCause(err); ok {
			return &SimpleResponse{Error: &Error{Detail: &Error_PGError{PGError: pgErr}}}, nil

use NewError() and also set OriginNode?

Using NewError also saves the reader from wondering if ScheduleFlow and setupFlow can return pg.Error, which would sound surprising to me.


pkg/sql/distsqlrun/server.go, line 183 at r14 (raw file):

			return &SimpleResponse{Error: &Error{Detail: &Error_PGError{PGError: pgErr}}}, nil
		}
		return nil, err

don't we want to turn all errors into pg.Error with the "internal error" code?


pkg/sql/pgwire/pgerror/errors.go, line 26 at r9 (raw file):

Previously, vivekmenezes wrote…

We use Error() because it implements the error interface

I meant the proto message should be called PGError, not the method.


Comments from Reviewable

@vivekmenezes vivekmenezes force-pushed the pgwire branch 2 times, most recently from d3ce1e2 to 3843838 Compare April 10, 2017 20:02
@vivekmenezes
Copy link
Contributor Author

I tried adding a distsql test for this PR but unfortunately it is hanging. I'm keeping the schema change test in this PR as proof that it works until we can get #14770 working


Review status: 0 of 26 files reviewed at latest revision, 34 unresolved discussions, some commit checks pending.


pkg/sql/schema_changer_test.go, line 775 at r16 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/This/this

Done.


pkg/sql/schema_changer_test.go, line 811 at r16 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm not completely sure what we're testing here and it seems a bit fragile. Can we instead test that the new index doesn't exist (with a SHOW)?

We're testing that the number of keys remains the same.


pkg/sql/schema_changer_test.go, line 815 at r18 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm a bit confused about what we're doing here. I think we probably don't want to list every "permanent" schema change here, do we? I think the logic tests are a better place for it.
Perhaps a more useful test is checking that a pg.Error (any pg.Error) is marshalled correctly by DistSQL. This should be unrelated to schema changes. Run a query and inject some error on another node, and check that you get the correct pgcode on the gateway.

So I'm leaving this test in here because the distsql test I've created in a separate PR is not working. I'll update this PR with a reference to that PR.


pkg/sql/distsqlrun/base.go, line 262 at r11 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

leave a TODO please

Done.


pkg/sql/distsqlrun/base.go, line 631 at r11 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Well roachpb.Error doesn't implement error, as a judgement call, because the KV code at the time was convoluted and there was no clear separation of the layers that want error vs roachpb.Error, so removing that interface forced a separation. The same consideration doesn't necessarily apply here.
Regardless, we don't like roachpb.Error.GoError(), do we? It's abnormal to need to strip the structured character of an error and convert it into mush. That defeats the point of not implementing the error interface to begin with.

You need it here because you haven't gone all the way with this refactoring and didn't convert ProducerMetadata.Err to distsql.Error. If you leave it like this, add a TODO here saying that GoError() needs to go away once it's not needed any more.

I've rename GoError() to ErrorDetail() . For now I think that's sufficient.


pkg/sql/distsqlrun/base.go, line 619 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this should take the NodeId for OriginNode, no?

We decided to not add Origin Node id and other characteristics like processor id in this PR.


pkg/sql/distsqlrun/base.go, line 636 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

add a default: panic and remove the return nil

Done.


pkg/sql/distsqlrun/data.proto, line 36 at r11 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

as we discussed, it is related. Add a TODO please.

Done.


pkg/sql/distsqlrun/data.proto, line 42 at r11 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I understand that the package qualifies it, but my preference would still be for the type names themselves to have more information (same for pgError.Error).

I'm going to ignore this suggestion. Thanks


pkg/sql/distsqlrun/data.proto, line 37 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

"zero if does not apply" - how could it not apply?

removed. that doesn't make any sense


pkg/sql/distsqlrun/server.go, line 181 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

use NewError() and also set OriginNode?

Using NewError also saves the reader from wondering if ScheduleFlow and setupFlow can return pg.Error, which would sound surprising to me.

Done.


pkg/sql/distsqlrun/server.go, line 183 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

don't we want to turn all errors into pg.Error with the "internal error" code?

Done.


pkg/sql/pgwire/pgerror/errors.go, line 26 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I meant the proto message should be called PGError, not the method.

I don't think that's worth it because externally it will look like pgerror.PGError


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

This PR has been lingering around for a while. I'd appreciate if we can merge it before I leave. Thanks!

@andreimatei
Copy link
Contributor

:lgtm:


Review status: 0 of 26 files reviewed at latest revision, 21 unresolved discussions, some commit checks pending.


pkg/sql/distsqlrun/base.go, line 619 at r14 (raw file):

Previously, vivekmenezes wrote…

We decided to not add Origin Node id and other characteristics like processor id in this PR.

but then remove origin_node and authoritative from the proto


pkg/sql/distsqlrun/base.go, line 622 at r23 (raw file):

// NewError creates an Error from an error.
func NewError(err error) *Error {
	pgErr, ok := pgerror.GetPGCause(err)

is this correct? If we wrapped a pgwire.Error in something that's not a pgwire.Error, do we want to ignore the top-level error and use the cause?
I think we shouldn't deal with causes at all. And the same for the other uses of GetPGCause.
Unless there was a particular reason why you did it this way.


pkg/sql/distsqlrun/data.proto, line 52 at r11 (raw file):

Previously, vivekmenezes wrote…

Done.

let's remove this field if we're not using it.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

Review status: 0 of 26 files reviewed at latest revision, 21 unresolved discussions, some commit checks failed.


pkg/sql/distsqlrun/base.go, line 619 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

but then remove origin_node and authoritative from the proto

Done.


pkg/sql/distsqlrun/base.go, line 622 at r23 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

is this correct? If we wrapped a pgwire.Error in something that's not a pgwire.Error, do we want to ignore the top-level error and use the cause?
I think we shouldn't deal with causes at all. And the same for the other uses of GetPGCause.
Unless there was a particular reason why you did it this way.

i think if a pgwire.Error is published on a remote node and its wrapped, it should be unwrapped before being sent over the wire, because on the other end
we do want to know which pgwire.Error was the original error. In other words the distsql logic only acts as a transportation mechanism for the error and all
the error wrappers introduced by distsql can be ignored.


pkg/sql/distsqlrun/data.proto, line 52 at r11 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's remove this field if we're not using it.

Done.


Comments from Reviewable

@andreimatei
Copy link
Contributor

Review status: 0 of 26 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful.


pkg/sql/distsqlrun/base.go, line 622 at r23 (raw file):

Previously, vivekmenezes wrote…

i think if a pgwire.Error is published on a remote node and its wrapped, it should be unwrapped before being sent over the wire, because on the other end
we do want to know which pgwire.Error was the original error. In other words the distsql logic only acts as a transportation mechanism for the error and all
the error wrappers introduced by distsql can be ignored.

If Distsql (or anybody else) wraps the error, then it's probably doing it on purpose to lose the structured character of the cause. E.g. if we get a pgerror as part of executing some internal sql - getting a lease or writing a progress report - and we wrap that, we don't want this code to report that code for the original query.

Generally no code in distsql will wrap pgerros encountered during the execution of the user's query, will it?


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

Review status: 0 of 26 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful.


pkg/sql/distsqlrun/base.go, line 622 at r23 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

If Distsql (or anybody else) wraps the error, then it's probably doing it on purpose to lose the structured character of the cause. E.g. if we get a pgerror as part of executing some internal sql - getting a lease or writing a progress report - and we wrap that, we don't want this code to report that code for the original query.

Generally no code in distsql will wrap pgerros encountered during the execution of the user's query, will it?

I don't know which way to go here. While you do have a point about internal sql creating a pgerror, although I don't believe it does that. isn't pgerror by definition an "external" error that some point in the code wants to report to the outside world? In the particular case of schema changes, the backfill mechanism wants to report the pgerror to the coordinator which reports it back to the user. If distsql starts wrapping this error and we go with what you're suggesting, the schema change mechanism will start seeing the distsql error and that's not what it wants. I think the rationale here is that an "external" error is being reported and that's what should go across the wire.


Comments from Reviewable

In the original SQL code when we hit upon a SQL error
on the gateway node we would create a pgError on the gateway
node. In distsql, an error can happen on a remote node
during SQL execution which can translate into creating
a pgError on a remote node. This change changes pgError
to a protobuf so that it can be sent along to the gateway
node.
Add a pgerror.Error as an ErrorDetail to Error.

fixes cockroachdb#13646
test all the schema change backfill errors that can
occur on a remote node. Ensure that they stop the
schema change from moving forward and the database
is return to its original state.
The test breaks when the backfill runs using distsql because
each distsql node separately computes the default value to use
its for backfill. Converted to a constant to make the test
predictable.
@vivekmenezes vivekmenezes merged commit 81b478f into cockroachdb:master Apr 11, 2017
@vivekmenezes vivekmenezes deleted the pgwire branch April 11, 2017 01:51
@vivekmenezes
Copy link
Contributor Author

@andreimatei I'm happy to continue the discussion and incorporate in another PR. Thanks for the review

@andreimatei
Copy link
Contributor

Review status: 0 of 26 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful.


pkg/sql/distsqlrun/base.go, line 622 at r23 (raw file):

Previously, vivekmenezes wrote…

I don't know which way to go here. While you do have a point about internal sql creating a pgerror, although I don't believe it does that. isn't pgerror by definition an "external" error that some point in the code wants to report to the outside world? In the particular case of schema changes, the backfill mechanism wants to report the pgerror to the coordinator which reports it back to the user. If distsql starts wrapping this error and we go with what you're suggesting, the schema change mechanism will start seeing the distsql error and that's not what it wants. I think the rationale here is that an "external" error is being reported and that's what should go across the wire.

This is the way I see it:
Every error is something that the code that creates it wants to report to the "outside world". Whether that outside world is the sql client is not generally something that the code creating the error can decide; instead every level along the way decides what it reports to its parent.
I don't think we should (and I don't think we do) wrap errors willy-nilly and expect the caller to be oblivious to the wrapping; when you wrap, you're aware that you're obscuring the cause (except for the cause's text message). So DistSQL shouldn't wrap anything (and it doesn't, right?).


Comments from Reviewable

@andreimatei
Copy link
Contributor

Review status: 0 of 26 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful.


pkg/sql/distsqlrun/base.go, line 622 at r23 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

This is the way I see it:
Every error is something that the code that creates it wants to report to the "outside world". Whether that outside world is the sql client is not generally something that the code creating the error can decide; instead every level along the way decides what it reports to its parent.
I don't think we should (and I don't think we do) wrap errors willy-nilly and expect the caller to be oblivious to the wrapping; when you wrap, you're aware that you're obscuring the cause (except for the cause's text message). So DistSQL shouldn't wrap anything (and it doesn't, right?).

Another thing specific to pgerrors: for errors pg codes, we commonly care about the message of the error too, not just the code - cause some client drivers don't give you access to the code so people may need to look at the message. And this can also be broken by wrapping.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

Review status: 0 of 26 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful.


pkg/sql/distsqlrun/base.go, line 622 at r23 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Another thing specific to pgerrors: for errors pg codes, we commonly care about the message of the error too, not just the code - cause some client drivers don't give you access to the code so people may need to look at the message. And this can also be broken by wrapping.

I'm very cool with removing the need to unwrap this error, in particular we have tests at the application layer (schema changes) that will detect any bad wrapping. I'll make this change when I return.


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants