-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add opentelemetry support #2152
Conversation
github.com/hashicorp/go-immutable-radix => github.com/tonistiigi/go-immutable-radix v0.0.0-20170803185627-826af9ccf0fe | ||
github.com/jaguilar/vt100 => github.com/tonistiigi/vt100 v0.0.0-20190402012908-ad4c4a574305 | ||
// genproto: corresponds to containerd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required. If this really breaks containerd worker I'm willing to drop support for it until containerd gets updated. Too many things are blocked on this and this is a too high-priority area to be permanently stuck on a 2-year old library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI seem green, so this is fine, I guess
Nice! Thanks @tonistiigi ! |
needs rebase |
51f7114
to
c27062d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the issues you've created upstream as a result of this implementation experience. I've left some comments after a quick review, but overall this looks like a simple and clean conversion from OpenTracing, which is really good to see. Do note that we're about to release v1.0.0-RC1
which will contain some changes that will impact this implementation. If you're able to join our SIG meetings on Thursdays we'd love to hear from you about your experience using the API/SDK.
func (wt *withTracer) Tracer(instrumentationName string, opts ...trace.TracerOption) trace.Tracer { | ||
return wt.tracer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having a concrete Tracer
here, I'd suggest having a TracerProvider
as this is dropping information coming from the instrumentation that should be recorded on spans created by the returned Tracer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I used Tracer
was because there was no way to get access to the current traceprovider from the context.
@@ -64,7 +64,7 @@ func ResolveClient(c *cli.Context) (*client.Client, error) { | |||
|
|||
ctx := CommandContext(c) | |||
|
|||
if span := opentracing.SpanFromContext(ctx); span != nil { | |||
if span := trace.SpanFromContext(ctx); span != nil { | |||
opts = append(opts, client.WithTracer(span.Tracer())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The span.Tracer()
method has been removed and will not be present starting with v1.0.0-RC1
. It is also less safe with OpenTelemetry to re-use a Tracer
from an incoming Span
as Tracer
can contain information related to different instrumentation. Instead, any instrumentation that needs a Tracer
should accept a TracerProvider
and use that to construct a Tracer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The span.Tracer() method has been removed and will not be present starting with
That is very surprising to me. Having access to the current tracer to create child spans inside the caller's span has been a key feature to me, as it was in opentracing to add flexible instrumentation to the packages. With this removed, I don't see a logical way how tracing would be implemented in a project like BuildKit. I do not want to use global tracer. It is a code smell and makes it so that the packages make assumptions of how tracing is organized in the final binary. If project depending on Buildkit wants to set up a global tracer, everything will still work. BuildKit's packages should be reusable on their own; in many cases, different components of it are included in other tools. The packages do not make assumptions that they only work as singletons or that you can't have two instances of an object using different tracers. I also don't want every constructor/pkg to contain code on how to initialize the tracer and carry that information. That would be a big maintenance cost with no additional value. Sometimes in BuildKit a context goes through 10+ layers of components.
Letting functions to ignore tracing if none is configured and add extra contextualized child spans if the caller code set up a tracer/parent-span seems a very logical way to manage this. Even if a project is very complicated, with different components and tracing points, the root context usually starts from a very specific location(eg. gprc client).
I looked at the issues open-telemetry/opentelemetry-go#1892 open-telemetry/opentelemetry-go#1887 and to me, if this behavior is not expected, it is an issue with the delegation of the global tracer. I saw some delegation in the global package and thought this was what it was for already. Above else, I think it shows how bad pattern using a global tracer is for synchronizing individual components.
Why would you even want to take parent span from context but use a different tracer. Doesn't it automatically mean that your data is messed up and you can't get a proper report back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you even want to take parent span from context but use a different tracer. Doesn't it automatically mean that your data is messed up and you can't get a proper report back?
The Tracer
carries information regarding the instrumenting library. This information includes the name of the instrumenting library and optionally its version and the schema identifier for attributes produced by the instrumentation. In this sense it is different from the OpenTracing tracer which, to my understanding, carried the export pipeline configuration that in OpenTelemetry is carried by the TracerProvider
.
With the Tracer
carrying this additional information it is not safe to take a Tracer
from a parent span and use it directly. It may identify a different instrumenting library and provide incorrect information regarding the attribute schema.
It seems that what you're looking for is the ability to extract a TracerProvider
from a Span
, from which a Tracer
scoped to the currently-instrumenting library could be created. We'd love to talk more about these use cases at our next SIG meeting, though I expect providing that ability would need to go through the OTEP process before being included in a new specification version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a different implementation of Span/Tracer
than the sdk/trace
? It is confusing to me why you are ok with these different implementations to share same spancontext and automatically link to it while they contain the traceid/spanid of something that you claim could be part of the wrong trace. If you want these implementations to use same spancontext key but provide a way to possibly detect conflicts between implementations, add a validation method for it. Also, isn't this the whole point of using these interfaces that I do not target a specific implementation, and my trace-points will always work, whatever tracer is configured by the caller. If my code only works with a specific implementation of tracer I would need to validate it anyway.
To have reusable packages, I need a way for my library to just define the tracing points. The library should not care how the actual tracing infrastructure is configured, if two instances of the same object use the same tracer or not etc. This is all up to the caller. Also, every constructor/function should not take and pass tracer objects around. They can of course choose to optionally take them if they want to provide override points, but the default path should just work. With this change, you are not providing a way to achieve that(without me defining my own wrapper that I can continue to pass with context, meaning packages will be incompatible with non-buildkit callers).
It seems that what you're looking for is the ability to extract a TracerProvider from a Span, from which a Tracer scoped to the currently-instrumenting library could be created
That might technically work, but if it means a traceid per spanid it makes no logical sense. I just need a way to pass tracing context through my components. You do it half-way by providing a spancontext but force me to use a global variable for the other half of what StartSpan
needs. That's not an option if you spent a lot of time designing your components so that they would not use the global state and have such limitations. Another thing I don't get is that getting access to TraceProvider
is actually dangerous. Eg. my library should not have a possibility to add span processors or shut it down. My library should only have the capability to create additional spans in the context that caller provides and that is exactly the only thing enabled by the Tracer
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have reusable packages,
An example of a package that I claim is quite useless atm. https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http#NewTransport . This is supposed to be a super reusable package, for different services, implementations of roundtrip etc. It takes the Tracer(Provider) on NewTransport
that is the context that has nothing to do with tracing. This is a background pool with its own lifecycle, the init code is nowhere near the place users would make requests, somewhere there is a cleanup routine to clean up old connections etc. None of it has anything to do with request tracing. The actual tracing configuration comes in with the request to RoundTrip
and is passed to the next RoundTrip
implementation. This works perfectly fine for maintaining the relationship of the spans that are managed by context passed to RoundTrip
. But because it doesn't read the tracer itself from context and that is taken with a side-channel there is no actual other way to use this component than to use a global tracer, meaning this component is not reusable at all and spans passed to RoundTrip
might be from a completely different tracer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you available tomorrow for a conversation with @MrAlias and I? I fear that we may be talking past each other or not operating from consistent assumptions and understandings and it may be faster to resolve that with synchronous communication. It is very important to me to understand your concerns and whether addressing them may require changes to the public API we are looking to stabilize in the near term.
if span != nil { | ||
ext.Error.Set(span, true) | ||
span.SetStatus(codes.Error, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span
will always be non-nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except in here it will be nil if the Before
callback never gets called.
Thanks for the pointers. The I have some more interesting things in #2163 The issues I hit:
#2163 also has some packages for automatically detecting current trace environment from env and passing it. I think these packages are quite generic and would hope most projects would do something similar. But I didn't find any existing ones. I'd be happy to contribute them into some other project if there is interest.
I'm ok to join SIG to discuss any of this further. |
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This can be done as a separate change when needed. Also should analyze if this would affect the gogo incompatibility issues with newer proto. Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Upstream fixes needed for cleaner solution Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@@ -22,7 +22,7 @@ func jaegerExporter() (sdktrace.SpanExporter, error) { | |||
|
|||
endpoint := envOr("OTEL_EXPORTER_JAEGER_ENDPOINT", "http://localhost:14250") | |||
host := envOr("OTEL_EXPORTER_JAEGER_HOST", "localhost") | |||
port := envOr("OTEL_EXPORTER_JAEGER_PORT", "6832") | |||
port := envOr("OTEL_EXPORTER_JAEGER_PORT", "6831") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. iirc one is for compress and the other uncompress. I've tested that 6831 works by default.
Opentelemetry has superseded opentracing. The API is quite similar but additional collectors endpoints should allow forwarding traces from client or container in the future.
JAEGER_TRACE
env is supported like before. As well as the environment variables defined in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md . OTEL collector is also supported both via grpc and HTTP protos.Initial commit only added opentelemetry exporters and used the opentracing bridge for tracing. https://pkg.go.dev/go.opentelemetry.io/otel/bridge/opentracing . This didn't work for cross-process grpc though as the bridge only supports HTTP propagators. So I switched all tracers directly to opentelemetry and removed opentracing(+helper) dependencies completely. In some cases, the helper methods I had previously written are actually very similar to methods opentelemetry provides. Cross-process tracing will not work when one side uses the older opentracing API.
I have a list of PRs that I want to propose upstream so that some parts of this can be cleaner. Functionality-wise, this PR should be complete.
As before, using a global tracer or propagator is a forbidden pattern in buildkit codebase.