proposal: add tracing support #3057

Open
fuweid opened this issue on Feb 28, 2019 · 17 comments
Member

fuweid commented on Feb 28, 2019
edited

Proposal

containerd is built from many component subsystems, such as content, snapshotter, containers, and tasks. The containerd package provides a smart client with powerful functionality, something like a backend-for-frontend. For instance, Client.Pull looks like a simple function call, but it already combines several calls to different components.

If we add tracing to both the containerd smart client and the daemon, it will be easy to see the whole request call chain. When we run into issues, the tracing data can help us locate the problem quickly.

Document

To make it easier to discuss, I made a Google Doc for this proposal:

https://docs.google.com/document/d/12l0dEa9W8OSZdUYUNZJZ7LengDlJvO5J2ZLategEdl4/edit?usp=sharing

Comments are welcome.

Contributor

I agree tracing would be helpful.
I am concerned about integrating with jaeger on the client. How data is exported should be left up to the consumer of the client.

It would be tremendously helpful to also propagate tracing over GRPC to the daemon.

Member Author

Hi @cpuguy83, thanks for the comment.

I am concerned about integrating with jaeger on the client. How data is exported should be left up to the consumer of the client.

Yes, absolutely, I agree. For the client, it should always be up to the users.
But for the ctr command, to make it easier to use, we could add an option to enable the Jaeger tracer. What do you think?

It would be tremendously helpful to also propagate tracing over GRPC to the daemon.

I was thinking that the containerd components don't call each other, so tracing in the client seemed good enough. But it would be good to add it to the daemon as well.

Contributor

What do you think?
💯

I was thinking that the containerd components don't call each other.

Where this becomes useful is that a client can receive tracing data propagated from the daemon.

Contributor

@kevpar - FYI.

We have been looking into this for distributed tracing as well, because the runtime v2 shim calls happen out of process, and it would be amazing to flow activity IDs across the gRPC boundaries for log correlation.

Member

Random-Liu commented on Mar 1, 2019
edited

/cc @dashpole, who is working on tracing in Kubernetes and has also experimented with tracing down into containerd.

I had an intern last fall who did a PoC of something similar. His containerd code changes are here: master...Monkeyanator:opencensus-tracing.

We ended up using OpenCensus instead of OpenTracing just because then we didn't need to pick a tracing backend in open-source components such as containerd. You can then run the "OC Agent" to handle pushing traces to a tracing backend, and ship the components independently.

Since containerd uses a stripped-down version of gRPC for its shims, we explicitly added the trace context to the API rather than trying to add context propagation back in. But if we actually added it, we should do it properly. Other than that, adding tracing to containerd was pretty simple.

If you are curious how I think this should work in kubernetes, feel free to check out my proposal: kubernetes/enhancements#650.

[Image: pod creation trace]

We used tracing to help us debug a slow runc start issue that our autoscaling team sent us, so I definitely think this has value. Here is the trace from reproducing it in just containerd:

[Screenshot: containerd trace from 2018-11-20]

Contributor

I've spent some time thinking specifically about how to propagate traces over ttrpc.
The gRPC way (through the metadata mechanism) is not great, since it necessarily allocates often-unnecessary objects on every request.
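The allocation cost in question can be demonstrated with just the standard library. The map below mirrors the shape of gRPC's metadata (`map[string][]string`); the struct field is the allocation-free alternative. All names are hypothetical:

```go
package main

import (
	"fmt"
	"testing"
)

var mdSink map[string][]string
var reqSink request

// withMetadata mimics gRPC-style propagation: a freshly allocated
// map (plus a slice) on every request.
func withMetadata(traceID string) map[string][]string {
	return map[string][]string{"trace-id": {traceID}}
}

// request carries the trace context in a plain struct field: no
// per-request heap allocation needed.
type request struct{ traceID string }

func withField(traceID string) request { return request{traceID: traceID} }

func main() {
	// AllocsPerRun reports average heap allocations per call.
	mdAllocs := testing.AllocsPerRun(1000, func() { mdSink = withMetadata("abc") })
	fieldAllocs := testing.AllocsPerRun(1000, func() { reqSink = withField("abc") })
	fmt.Println("metadata allocs/req:", mdAllocs)
	fmt.Println("field allocs/req:", fieldAllocs)
}
```

The metadata version allocates on every call while the struct-field version does not, which is the trade-off behind putting the trace context directly into the ttrpc API.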

Member Author

Thanks @dashpole and @cpuguy83 for the input.

I will update the proposal to include the shim part and the projects involved. :)

Sounds good to me

Contributor

It seems the industry is moving toward OpenCensus. @kevpar did some investigation on our side and thought it was the likely candidate for us to move to as well. The one thing we dislike about it is that it seems to only export a span's begin/end at the end, which means that if you hit a panic you lose all trace data, because there was no entry logging. OpenTracing, however, did show the entry trace down to the point of the panic.

Contributor

OpenCensus does have the momentum. However, the logging support in OpenTracing is better, and the API is much more flexible. OpenCensus is working on a logging API, but they seem to be leaning toward unstructured logging, which is unfortunate.

Contributor

FYI, I created a rough tracing package here: https://github.com/virtual-kubelet/virtual-kubelet/blob/master/trace/trace.go

In VK we use OpenCensus and logrus, with interfaces for tracing and logging. These are primarily for supporting logging through a span, not so much for providing multiple backing implementations of tracing. Here is how it works:

```go
ctx, span := trace.StartSpan(ctx, "operationName")
defer span.End()
log.G(ctx).WithField("foo", "bar").Debug("whatever")
```

In this case, trace.StartSpan has injected its own logger, which wraps the logger already configured in ctx.
This outputs a structured message both to the span and to logrus (which is what we configure as the logger).

Contributor

All this to say: I don't think we should worry about what OpenCensus does or doesn't do wrt logging; use whatever is most convenient and has the most community support.

Contributor

+1, the doc looks good.

OC is very flexible, and it's very nice that you can write your own exporters, or just use the agent exporter and not have to pull in a ton of dependencies.

Contributor

+1 for the doc. Thanks @fuweid

Assignees: kevpar
Labels: none yet
Projects: none yet
Milestone: 1.5
Linked pull requests: none yet
8 participants