-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
public-api/viz split #5560
public-api/viz split #5560
Conversation
05e07cd
to
ec908e4
Compare
7f6cc2f
to
3a7a13b
Compare
af186b1
to
2ac0956
Compare
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510). - Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients. - The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface. - ~~Moved to viz the `linkerd-data-plane` checks because they rely on Prometheus, but they're currently commented-out pending the wiring up of `linkerd viz check --proxy`.~~ Refactored the data plane checks so they don't rely on calling `ListPods` - ~~The core healthcheck unit tests dealing with the viz api have been removed and will be refactored later into a new set of viz healthcheck unit tests.~~ - The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck. - ~~The `getNamespaceOfExtensions()` function has been refactored into `viz.pkg.GetVizNamespace()` as is consumed from a few other places as well.~~ - `multicluster/cmd/check.go` also now relies on viz' api because it hits the `Gateways()` endpoint that depends on Prometheus. **In a followup, we should move that api into some MC component (the gateway itself?) so multicluster can be installed without viz** - Ditto for `multicluster/cmd/gateways.go`
- Removed the `global.prometheusUrl` config in the core values.yml. - Removed `linkerd-controller` dependencies on Prometheus. - Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
We continue removing the linkerd-controller dependencies on viz stuff, except for things required for the tap server, which can be tackled separately afterwards. - Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server). - Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type. - Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.). - Also simplified some type names to avoid stuttering.
Here we complete the work started in #5554 ("Chart templates for new viz linkerd-metrics-api pod") by providing the Dockerfile, `main.go` command and the `bin/` scripts for building the new `linkerd-metrics-api` pod under the `linkerd-viz` namespace. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits. Finally, updated tests related to linkerd-metrics-api.
`linkerd-web` requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container. We're also updating some dependencies that still were pointing to the public-api and now should point to the viz api.
e5a199e
to
f8371c4
Compare
Changes to command files under `cli/cmd`: - Updated `endpoints.go` according to new API interface name. - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically. Changes to command files under `viz/cmd`: - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz. Other changes to have tests pass: - Added `metrics-api` to list of docker images to build in actions workflows. - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`). - Update Helm readme files.
9ec1ad6
to
8fe5e64
Compare
8fe5e64
to
f379c1e
Compare
…essary 'RawClient()'
d26bd2d
to
0e1d6d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still have some more reviewing testing to do, but leaving first round of comments.
controller/api/public/grpc_server.go
Outdated
} | ||
|
||
func (s *grpcServer) SelfCheck(ctx context.Context, in *healthcheckPb.SelfCheckRequest) (*healthcheckPb.SelfCheckResponse, error) { | ||
// TODO: Reenable this check just for checking the control plane can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to make this work without taking a dependency on linkerd-viz, we'd need to introduce a new API or move SelfCheck to a shared location. To be honest, I'm not sure if it's even worth it. Especially since this process doesn't even need to talk to the k8s API. Perhaps a better check would be to look for errors in the destination or identity controller logs or kubernetes event streams?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think we can get rid of this TODO
I'm getting a panic when trying to use viz commands before installing the viz extension:
|
- Fixed panic when issuing viz commands when the extension is not installed. - Removed unnecessary wrapping functions `NewInternalPublicClient` and `NewExternalPublicClient`. - Created new constant in `viz/cmd/root` holding "linkerd-viz". - Moved `LinkerdControlPlaneVersionChecks` in `healthcheck.go` to its original location in that same file. - Removed commented out `SelfCheck` in the public-api's `grpc_server.go`
Most of the feedback has been addressed (still need to see if I can get rid of the tap functions in the public-api, and to have
|
...that have been superseded by the tap APIServer for a while.
Rolled back `multicluster/cmd/check.go` changes so that we keep on using the public-api client, instead of viz'. If the viz extension is not detected then the "all gateway mirrors are healthy" check will just issue a warning as show below. Otherwise, a viz client is instantiated and `Gateways()` is called on it. ``` $ linkerdf cli mc check kubernetes-api -------------- √ can initialize the client √ can query the Kubernetes API linkerd-existence ----------------- √ 'linkerd-config' config map exists √ heartbeat ServiceAccount exist √ control plane replica sets are ready √ no unschedulable pods √ controller pod is running √ can initialize the client √ can query the control plane API linkerd-multicluster -------------------- √ Link CRD exists √ Link resources are valid * target √ remote cluster access credentials are valid * target √ clusters share trust anchors * target √ service mirror controller has required permissions * target √ service mirror controllers are running * target × all gateway mirrors are healthy failed to fetch gateway metrics: could not find the linkerd-viz extension see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints √ all mirror services have endpoints √ all mirror services are part of a Link Status check results are × ```
In the last push I rolled back
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry there are a few merge conflicts after the tap-injector merge!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! 🏁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the list of images in bin/docker-push
needs to be updated.
…he log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used
In the last push, $ bin/go-run cli mc check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* target
√ remote cluster access credentials are valid
* target
√ clusters share trust anchors
* target
√ service mirror controller has required permissions
* target
√ service mirror controllers are running
* target
√ all mirror services have endpoints
√ all mirror services are part of a Link
Status check results are √
$ bin/go-run cli mc check --verbose
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
DEBU[0000] Starting port forward to https://0.0.0.0:43271/api/v1/namespaces/linkerd/pods/linkerd-controller-85b69dcc46-84pdb/portforward?timeout=30s 34587:8085
DEBU[0000] Port forward initialised
DEBU[0000] Expecting API to be served over [http://localhost:34587/api/v1/]
√ can initialize the client
DEBU[0000] Making gRPC-over-HTTP call to [http://localhost:34587/api/v1/Version] []
DEBU[0000] Response from [http://localhost:34587/api/v1/Version] had headers: map[Content-Length:[56] Content-Type:[application/octet-stream] Date:[Thu, 21 Jan 2021 22:38:10 GMT]]
DEBU[0000] gRPC-over-HTTP call returned status [200 OK] and content length [56]
√ can query the control plane API
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* target
√ remote cluster access credentials are valid
* target
√ clusters share trust anchors
* target
√ service mirror controller has required permissions
* target
√ service mirror controllers are running
* target
DEBU[0000] Skipping check: all gateway mirrors are healthy. Reason: failed to fetch gateway metrics
√ all mirror services have endpoints
√ all mirror services are part of a Link
Status check results are √ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌮🌮🌮
…ings into a new viz component 'linkerd-metrics-api' (linkerd#5560) * Protobuf changes: - Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by linkerd#5510). - Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs. * Added chart templates for new viz linkerd-metrics-api pod * Spin-off viz healthcheck: - Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients. - The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface. - Refactored the data plane checks so they don't rely on calling `ListPods` - The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck. * Removed linkerd-controller dependency on Prometheus: - Removed the `global.prometheusUrl` config in the core values.yml. - Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (linkerd#5352). * Moved observability gRPC from linkerd-controller to viz: - Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server). - Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type. - Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.). - Also simplified some type names to avoid stuttering. * Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits. * linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container. * CLI updates and other minor things: - Changes to command files under `cli/cmd`: - Updated `endpoints.go` according to new API interface name. - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically. - Changes to command files under `viz/cmd`: - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz. - Other changes to have tests pass: - Added `metrics-api` to list of docker images to build in actions workflows. - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`). * Add retry to 'tap API service is running' check * mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
…ings into a new viz component 'linkerd-metrics-api' (linkerd#5560) * Protobuf changes: - Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by linkerd#5510). - Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs. * Added chart templates for new viz linkerd-metrics-api pod * Spin-off viz healthcheck: - Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients. - The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface. - Refactored the data plane checks so they don't rely on calling `ListPods` - The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck. * Removed linkerd-controller dependency on Prometheus: - Removed the `global.prometheusUrl` config in the core values.yml. - Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (linkerd#5352). * Moved observability gRPC from linkerd-controller to viz: - Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server). - Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type. - Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.). - Also simplified some type names to avoid stuttering. * Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits. * linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container. * CLI updates and other minor things: - Changes to command files under `cli/cmd`: - Updated `endpoints.go` according to new API interface name. - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically. - Changes to command files under `viz/cmd`: - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz. - Other changes to have tests pass: - Added `metrics-api` to list of docker images to build in actions workflows. - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`). * Add retry to 'tap API service is running' check * mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
These changes are grouped into 8 commits described and linked to as follows (I'll eventually followup with more commits addressing feedback).
1) A couple more protobuf changes:
healthcheck.proto
back from viz toproto/common
as it remains being used by the mainhealthcheck.go
library (it was moved to viz by Separate observability API #5510).viz.proto
the IP-related types and put them in/controller/gen/common/net
to be used by both the public and the viz APIs.2) Chart templates for new viz linkerd-metrics-api pod
3) Spin-off viz healthcheck:
viz/pkg/healthcheck/healthcheck.go
that wraps the originalpkg/healthcheck/healthcheck.go
while adding thevizNamespace
andvizAPIClient
fields which were removed from the corehealthcheck
. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.healthcheck.Runner
interface.Moved to viz theRefactored the data plane checks so they don't rely on callinglinkerd-data-plane
checks because they rely on Prometheus, but they're currently commented-out pending the wiring up oflinkerd viz check --proxy
.ListPods
The core healthcheck unit tests dealing with the viz api have been removed and will be refactored later into a new set of viz healthcheck unit tests.viz/cmd/check.go
have been moved toviz/pkg/healthcheck/healthcheck.go
as well, socheck.go
's sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.ThegetNamespaceOfExtensions()
function has been refactored intoviz.pkg.GetVizNamespace()
as is consumed from a few other places as well.multicluster/cmd/check.go
also now relies on viz' api because it hits theGateways()
endpoint that depends on Prometheus. In a followup, we should move that api into some MC component (the gateway itself?) so multicluster can be installed without vizmulticluster/cmd/gateways.go
4) Remove linkerd-controller dependency on Prometheus:
global.prometheusUrl
config in the core values.yml.linkerd-controller
dependencies on Prometheus.-prometheus
flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (heartbeat cron should get metrics from linkerd-viz extension when available #5352).5) Move observability gRPC from linkerd-controller to viz:
We continue removing the linkerd-controller dependencies on viz stuff, except for things required for the tap server, which can be tackled separately afterwards.
viz/metrics-api
moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server).PublicAPIClient
(now called justClient
) interface. TheVizAPIClient
interface disappears as it's enough to just rely on the vizApiClient
protobuf type.controller/api/public
toviz/metrics-api
(edge.go
,stat_summary.go
, etc.).6) linkerd-metrics-api bootstrap files:
Here we complete the work started in #5554 ("Chart templates for new viz linkerd-metrics-api pod") by providing the Dockerfile,
main.go
command and thebin/
scripts for building the newlinkerd-metrics-api
pod under thelinkerd-viz
namespace.At the same time, we strip out of the public-api's
main.go
file the prometheus parameters and other no longer relevant bits.Finally, updated tests related to linkerd-metrics-api.
7) linkerd-web updates:
linkerd-web
requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.We're also updating some dependencies that still were pointing to the public-api and now should point to the viz api.
8) CLI updates and other minor things:
Changes to command files under
cli/cmd
:endpoints.go
according to new API interface name.version.go
,dashboard
anduninstall.go
to pull the viz namespace dynamically.Changes to command files under
viz/cmd
:edges.go
,routes.go
,stat.go
andtop.go
: point to dependencies that were moved from public-api to viz.Other changes to have tests pass:
metrics-api
to list of docker images to build in actions workflows.bin/fmt
exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point:viz/metrics-api
).Closes #5328