Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding edges endpoint to public API #2793

Merged
merged 6 commits into from
May 9, 2019
Merged

Conversation

scottcarol
Copy link
Contributor

@scottcarol scottcarol commented May 3, 2019

This PR is the API portion of #2708.

It adds an endpoint to the public API to allow us to query Prometheus for edge data, in order to display identity information for connections between Linkerd proxies.

This PR ONLY includes changes to the controller and protobuf.

The corresponding CLI code for a linkerd edges query is written, but I want feedback on this portion before making the second PR.

Notes

  • Currently a request only returns data for two-sided edges for pods and deployments.
  • A request does not return data for one-sided edges, i.e. health checks or Prometheus/tap queries. This will come after two-sided edges are correctly displayed.
  • While currently, a request returns some data for other resource types supported by linkerd stat (for example, authority) it's unclear if the current results are complete/useful, I need feedback on this.
  • This PR does not include a test, when I get feedback/agreement on what's returned, I'll add that.

Prometheus querying

For each edge, we are returning:
src name - resource name
dst name - resource name
client_id - service account name for source
server_id - service account name for destination
no_tls_reason - reason for no identity, if none known

To get this info from Prometheus we are combining information from two different queries, an inbound and outbound query.
We use label_replace in Prometheus to create a key that we can match inbound to outbound queries with, so we can populate the whole edge result.

Remember that we are currently filtering out edges that are one-sided (i.e. health probes, Prometheus, tap) and also edges where we cannot definitively match the inbound/outbound queries together. That will be the next step.

To Test

  1. Update protobuf:
    If you make Protobuf changes, run:
bin/dep ensure
bin/protoc-go.sh
  1. Run the controller locally:
bin/go-run controller/cmd/public-api -log-level debug

You should see starting HTTP server on :8085

  1. Make a temporary main.go file that you can test API calls with, in the /linkerd2 directory (suggested code at bottom of this PR)

  2. In a separate terminal window, run

bin/go-run .

See what data is returned.

  1. If you are requesting edges between pods in booksapp, you can compare to the result from:
linkerd stat po --from deploy -n booksapp
linkerd stat po --to deploy -n booksapp

(Note, some of those results may be one-sided (i.e. a probe) so not all will appear)

=============

Suggested main.go file contents, you can switch out different Namespace and Type values:

package main

import (
	"context"
	"fmt"

	"github.com/linkerd/linkerd2/controller/api/public"
	pb "github.com/linkerd/linkerd2/controller/gen/public"
)

func main() {
	client, err := public.NewInternalClient("linkerd", "localhost:8085")
	if err != nil {
		panic(err)
	}

	rsp, err := client.Edges(context.Background(), &pb.StatSummaryRequest{
		Selector: &pb.ResourceSelection{
			Resource: &pb.Resource{
				Namespace: "booksapp",
				Type:      "pod",
			},
		},
		TimeWindow: "1m",
	})
	if err != nil {
		panic(err)
	}

	fmt.Printf("%+v\n", rsp)
}

@scottcarol scottcarol added the priority/P0 Release Blocker label May 3, 2019
@scottcarol scottcarol requested review from klingerf, siggy and rmars May 3, 2019 23:26
@scottcarol scottcarol self-assigned this May 3, 2019
@l5d-bot
Copy link
Collaborator

l5d-bot commented May 4, 2019

Integration test results for 90dc181: success 🎉
Log output: https://gist.github.com/acb3dffb71fb90c30856c38125bccdfe

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! The test instructions and main.go are super useful, thanks!

I hit a snag because the local public-api expects to connect prometheus, which requires a port-forward to linkerd-prometheus in k8s. Instead I just connected the main.go you provided directly to a public-api running in k8s:

$ kubectl -n linkerd port-forward svc/linkerd-controller-api 8085
$ bin/go-run .
github.com/linkerd/linkerd2
ok:<edges:<src:<type:"pod" name:"web-57b7f9db85-nq8hb" > dst:<type:"pod" name:"voting-689f845d98-lbh2x" > client_id:"web" server_id:"voting" > edges:<src:<type:"pod" name:"vote-bot-7466ffc7f7-prf8g" > dst:<type:"pod" name:"web-57b7f9db85-nq8hb" > client_id:"default" server_id:"web" > edges:<src:<type:"pod" name:"web-57b7f9db85-nq8hb" > dst:<type:"pod" name:"emoji-646ddcc5f9-dfv5t" > client_id:"web" server_id:"emoji" > >

proto/public.proto Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
Copy link
Contributor

@dadjeibaah dadjeibaah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! I left a few comments about some unused code but overall this looks good. I am looking forward to one-sided edge stats.

controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
Copy link

@rmars rmars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! Agree with the comments others have posted.

I had some questions about the use of label_replace. Just to confirm my understanding is correct, are we using label_replace so that for example for pod queries, pod becomes labelled destination_pod in the inbound query, and dst_pod becomes destination_pod in the outbound query? And then in processEdgeMetrics we're using that destination_pod key for the matching?

Could one also accomplish this matching by doing the queries without label_replace, and then matching them up by querying the labels directly in processEdgeMetrics? it seems like in that case label_replace isn't saving us that much code. or maybe this replacement is doing more that I'm not considering?

@@ -15,8 +14,8 @@ type edgeResult struct {
 }

 const (
-       inboundIdentityQuery  = "label_replace(sum(increase(response_total%s[1m])) by (%s,client_id), \"destination_%s\", \"$1\", \"%s\", \"(.+)\")"
-       outboundIdentityQuery = "label_replace(sum(increase(response_total%s[1m])) by (%s, dst_%s, server_id, no_tls_reason), \"destination_%s\", \"$1\", \"dst_%s\", \"(.+)\")"
+       inboundIdentityQuery  = "sum(increase(response_total%s[1m])) by (%s,client_id)"
+       outboundIdentityQuery = "sum(increase(response_total%s[1m])) by (%s, dst_%s, server_id, no_tls_reason)"
 )

 func (s *grpcServer) Edges(ctx context.Context, req *pb.StatSummaryRequest) (*pb.EdgesResponse, error) {
@@ -55,8 +54,8 @@ func (s *grpcServer) getEdges(ctx context.Context, req *pb.StatSummaryRequest, t
        labelsOutbound := labels.Merge(promDirectionLabels("outbound"))
        labelsInbound := labels.Merge(promDirectionLabels("inbound"))

-       inboundQuery := fmt.Sprintf(inboundIdentityQuery, labelsInbound, resourceType, resourceType, resourceType)
-       outboundQuery := fmt.Sprintf(outboundIdentityQuery, labelsOutbound, resourceType, resourceType, resourceType, resourceType)
+       inboundQuery := fmt.Sprintf(inboundIdentityQuery, labelsInbound, resourceType)
+       outboundQuery := fmt.Sprintf(outboundIdentityQuery, labelsOutbound, resourceType, resourceType)

        inboundResult, err := s.queryProm(ctx, inboundQuery)
        if err != nil {
@@ -77,7 +76,8 @@ func processEdgeMetrics(req *pb.StatSummaryRequest, inbound, outbound model.Vect
        dstIndex := map[model.LabelValue]model.Metric{}
        srcIndex := map[model.LabelValue][]model.Metric{}
        keys := map[model.LabelValue]struct{}{}
-       resourceLabelReplacement := "destination_" + resourceType
+       resourceReplacementInbound := resourceType
+       resourceReplacementOutbound := "dst_" + resourceType

        formatMsg := map[string]string{
                "disabled":                          "Disabled",
@@ -93,7 +93,7 @@ func processEdgeMetrics(req *pb.StatSummaryRequest, inbound, outbound model.Vect
                // the communication was one-sided (i.e. a probe or another instance where the src/dst are not both known)
                // in future the edges command will support one-sided edges
                if _, ok := sample.Metric[model.LabelName("client_id")]; ok {
-                       key := sample.Metric[model.LabelName(resourceLabelReplacement)]
+                       key := sample.Metric[model.LabelName(resourceReplacementInbound)]
                        dstIndex[key] = sample.Metric
                        keys[key] = struct{}{}
                }
@@ -102,7 +102,7 @@ func processEdgeMetrics(req *pb.StatSummaryRequest, inbound, outbound model.Vect
        for _, sample := range outbound {
                // skip any outbound results that do not have a server_id for same reason as above section
                if _, ok := sample.Metric[model.LabelName("server_id")]; ok {
-                       key := sample.Metric[model.LabelName(resourceLabelReplacement)]
+                       key := sample.Metric[model.LabelName(resourceReplacementOutbound)]
                        if _, ok := srcIndex[key]; !ok {
                                srcIndex[key] = []model.Metric{}
                        }

controller/api/public/edges.go Outdated Show resolved Hide resolved
proto/public.proto Outdated Show resolved Hide resolved
@l5d-bot
Copy link
Collaborator

l5d-bot commented May 7, 2019

Integration test results for 9aaf2ce: success 🎉
Log output: https://gist.github.com/afaa29a22c09fa685c0be24d61ef6cf3

@scottcarol
Copy link
Contributor Author

Thanks @siggy @dadjeibaah @rmars for awesome feedback! I have incorporated your changes except for the outstanding questions (I left those threads unresolved on this page)

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. a couple more comments. 👍

also, works!

$ bin/go-run .
ok:<edges:<src:<type:"pod" name:"linkerd-web-85877d47f7-wb2ts" > dst:<type:"pod" name:"linkerd-controller-64dd44c9d5-fgxk8" > client_id:"linkerd-web" server_id:"linkerd-controller" > edges:<src:<type:"pod" name:"linkerd-controller-64dd44c9d5-fgxk8" > dst:<type:"pod" name:"linkerd-prometheus-58b9c8f7bd-4wlwh" > client_id:"linkerd-controller" server_id:"linkerd-prometheus" > >

controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Show resolved Hide resolved
Copy link
Contributor

@klingerf klingerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I like this implementation a lot, and only had a few minor quibbles. It would be great to add an edges_test.go file eventually, but doesn't have to happen as part of this PR.

proto/public.proto Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
controller/api/public/edges.go Outdated Show resolved Hide resolved
@l5d-bot
Copy link
Collaborator

l5d-bot commented May 8, 2019

Integration test results for 6dda24c: success 🎉
Log output: https://gist.github.com/9b117f49f1b1dc7e5255896719de9e4f

@l5d-bot
Copy link
Collaborator

l5d-bot commented May 8, 2019

Integration test results for 6b73c05: fail 😕
Log output: https://gist.github.com/6467f93eda07065f0f4174183c2a932f

Copy link
Contributor

@klingerf klingerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐️ Great, thanks for updating! Looks good to me, but would be great to have @siggy / @rmars @dadjeibaah give it another look.

Copy link

@rmars rmars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🐑 niceeee!

@scottcarol
Copy link
Contributor Author

Thanks @klingerf @rmars for the 🚢 !

The integration tests are failing, but they are passing for me locally when I run
bin/test-run /path/to/linkerd
@siggy would this be because my branch doesn't have the latest from master?

@dadjeibaah
Copy link
Contributor

@scottcarol what errors are you getting, also, currently reviewing your PR.

@scottcarol
Copy link
Contributor Author

@dadjeibaah the integration tests are failing when they are run by l5d-bot but not when they are run on my local machine. The last line in the gist from l5d-bot is tag does not exist: gcr.io/linkerd-io/grafana:git-6b73c058

@dadjeibaah
Copy link
Contributor

Ahh, that seems unexpected. I would rerun the tests I think the build had a minor hiccup or something.

@scottcarol scottcarol force-pushed the cscott/edges-api-only branch from 6b73c05 to 6b2b576 Compare May 8, 2019 23:31
Copy link
Contributor

@dadjeibaah dadjeibaah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for updating! ⭐️ 📦

@l5d-bot
Copy link
Collaborator

l5d-bot commented May 8, 2019

Integration test results for 6b2b576: success 🎉
Log output: https://gist.github.com/a81ce974a48a0aa40ca44c525ba28168

@scottcarol
Copy link
Contributor Author

Using git commit --amend --no-edit and then force-pushing the branch triggered another run of l5d-bot and the integration tests passed. Thanks @dadjeibaah for the help with this.

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome work! agree with @klingerf re: adding edges_test.go at some point. 👍 🚢

return edge, nil
}

func processEdgeMetrics(inbound, outbound model.Vector, resourceType string) []*pb.Edge {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is super nice, i wonder if we could leverage it for stats down the road. 👍

@scottcarol scottcarol merged commit 87e69bf into master May 9, 2019
@scottcarol scottcarol deleted the cscott/edges-api-only branch May 9, 2019 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/P0 Release Blocker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants