
Add MVP of AWS AutoDiscovery #1936

Merged
merged 37 commits into from
Oct 20, 2023

Conversation

o1oo11oo
Contributor

@o1oo11oo o1oo11oo commented Sep 7, 2023

Description

This is the MVP of the AWS AutoDiscovery, which implements a starting point for automatically monitoring AWS resources. The MVP only tracks containers running on ECS; everything else will come at a later point. See #1894 for more details.

As mentioned in the linked issue, excluded from the scope of the MVP are:

  • AWS state synchronization and initial sync
  • Local message buffering and reordering (EventBridge does not guarantee order)
  • K8s health check endpoint
  • Scans in a different namespace
  • Proper retry and requeuing for requests that resulted in k8s errors

Closes #1894

Checklist

  • Test your changes as thoroughly as possible before you commit them. Preferably, automate your tests as unit/integration tests.
  • Make sure that all your commits are signed-off and that you are added to the Contributors file.
  • Make sure that all CI checks finish successfully.
  • Optional (but appreciated): Make sure that all commits are Verified.

@o1oo11oo o1oo11oo added the enhancement (New feature or request), auto-discovery, and go (Pull requests that update Go code) labels Sep 7, 2023
@o1oo11oo o1oo11oo self-assigned this Sep 7, 2023
@netlify

netlify bot commented Sep 7, 2023

Deploy Preview for docs-securecodebox canceled.

🔨 Latest commit b6ba7cd
🔍 Latest deploy log https://app.netlify.com/sites/docs-securecodebox/deploys/65325f12088f41000895a52e

@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 3 times, most recently from c3744bf to a45cce0 Compare September 22, 2023 14:11
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from a45cce0 to ebc7491 Compare September 26, 2023 17:36
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 2 times, most recently from 96d190d to db64253 Compare October 12, 2023 17:54
@o1oo11oo o1oo11oo marked this pull request as ready for review October 13, 2023 07:36
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 3 times, most recently from b7807ad to f66452f Compare October 13, 2023 17:19
Review threads (outdated, resolved):
  • auto-discovery/cloud-aws/README.md
  • auto-discovery/cloud-aws/main.go
  • auto-discovery/cloud-aws/templates/service/service.yaml
  • auto-discovery/cloud-aws/suite_test.go
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 2 times, most recently from fc06ada to 7635a9d Compare October 19, 2023 16:56
@o1oo11oo o1oo11oo changed the title Add initial WIP version of AWS autodiscovery Add MVP of AWS AutoDiscovery Oct 19, 2023
@o1oo11oo o1oo11oo requested a review from J12934 October 19, 2023 17:05
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from 7635a9d to ca73a55 Compare October 19, 2023 17:18
Lukas Fischer added 9 commits October 20, 2023 12:57
This is a WIP implementation of the AWS autodiscovery, showing basic
functionality for getting changes from AWS and updating kubernetes
resources. At this point only Scans are created, not ScheduledScans, and
they are never deleted either.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously this package used client-go, but all other parts of the
secureCodeBox use the client from controller-runtime. Using the same
client everywhere simplifies maintenance.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
"Register" applies more to CustomResourceDefinitions rather than the
actual CustomResources themselves.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Decouple the AWS and Kubernetes implementations by using an interface
for the Kubernetes Reconciler. This should simplify testing later.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the AWS monitor would only react to RUNNING state changes; now it also detects STOPPED events and forwards those to the kubernetes Reconciler. This still operates at the task level, which works, but is somewhat coarse: there might be ScheduledScans that are no longer needed because the containers have already stopped.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Ignore errors that come from duplicate scan creation/deletion, as those
might result from the same container being used multiple times. Of
course this means there is still no proper handling of multiple
containers, which might lead to scans getting deleted too early.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Remember how many containers using one image are running and only delete
the scan when the last one has been deleted. This prevents the scan from
getting deleted after the first instance of an image stops and instead
defers it till the last one stops.

Kind of moot at this point because this MVP still uses scans instead of
scheduled scans, but needed for heading in the right direction.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
The point of the AutoDiscovery is to continuously monitor the resources
it tracks. This is done by creating ScheduledScans when they appear and
deleting them when they disappear.

For SBOMs this is not particularly useful, in most cases (excluding
upgrades to trivy-sbom), generating an SBOM for the same image and
digest should result in the same SBOM. Generating only Scans for SBOM
targets instead of ScheduledScans would make everything a lot more
complicated though, so they are also covered by ScheduledScans.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the AutoDiscovery would track the state of containers by
comparing the lastStatus of the whole task. This worked somewhat, but
quickly ran into problems.

It created too many scheduled scans: if some of a task's containers
stopped before the others, the scheduled scans would be kept until the
whole task was finished.

"RUNNING" events could come in multiple times, any time after the
containers had started and the task changed without them stopping. There
was no way to differentiate between a new state change and one seen
before. This led to containers getting counted too often and the scans
never getting deleted.

In this version the state of the individual containers is tracked, which
allows more fine-grained scan creation, but also brings other problems:

- very short-lived containers never have the status RUNNING; therefore
  detect that they ran by storing which containers were pending
- we don't want scheduled scans around for containers that already sent
  a STOPPED, so those get one-off scans, but those should also get
  cleaned up at some point
- although unlikely, containers might go backwards in the lifecycle,
  either because something is acting weird or because we were not
  running or missed events for other reasons, so these also need to be
  handled somewhat sensibly
- one case not really handled right now is a STOPPED for a container
  never seen before; because containers are not stored forever, it is
  most likely just a container we already saw a STOPPED for

In general this version works decently well with most setups, though
there are some rough edges with the one-off scans and image reference
handling. Since containers can be specified with or without a tag, such
references are currently handled as different images, even if both use
latest.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Lukas Fischer added 23 commits October 20, 2023 12:57
Instead of manual print statements, use proper logging. The other
autodiscovery also uses the controller-runtime integrated zap logger, so
to keep things simple it is used here as well.

The loglevels are a bit different though; it is not clear how they work
when used with kubebuilder, but the kubernetes AutoDiscovery uses
loglevels a lot higher than what gets printed using normal options.
Therefore this only uses Info and V(1) aka Debug.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Use envtest to test the kubernetes part of the service against an actual
kubernetes control plane. This is the same way the kubernetes
autodiscovery and the operator are tested, but here it is used without
controller-gen or kustomize, because this is not a kubebuilder project
and has no custom resources.

The current tests are only a small demonstration of connecting to the
envtest control plane. It is not clear yet where the envtest connection
should be made: in the tests for the kubernetes module, or in the tests
for main. To me it seems like the tests for the kubernetes module should
test only that, while the tests for main should test the combination of
the aws and kubernetes parts; then both test suites need to access a
kubernetes control plane. If envtest supports multiple different ones
that could work, otherwise it might need to be shared.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
The initial iteration of the tests already used envtest to test the
behaviour of the implementation against an actual kubernetes API server,
but tested only the kubernetes side of the service.

This restructures the tests to cover the whole service:
- main_test contains the integration tests for the service: a running
  AWSMonitor gets fed fake messages through a mocked sqs connection and
  sends requests to the reconciler, which then sets up ScheduledScans
  in the envtest k8s API server, where the tests check the results
- aws_test contains unit tests for aws-only functions
- kubernetes_test contains unit tests for kubernetes-only functions

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Some functions were only exported to be accessible from the tests, but
it makes more sense to keep them private and use whitebox tests instead.
For that the unit tests now use the non _test package.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Instead of redefining all the types for the contents of the messages
from AWS, use the existing types from the SDK to unmarshal messages.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the ScheduledScans the autodiscovery would create were
hardcoded to trivy-sbom-image. Even for an MVP this is not that helpful.
This makes the scans and some other options configurable in much the
same way as for the kubernetes autodiscovery: by reading from a yaml
config file supplied via a commandline flag.

The kubernetes autodiscovery with its kubebuilder structure uses
[ComponentConfig][1] for this, which is deprecated and creates options to
be consumed by the manager, which this project does not use. To keep the
whole experience similar, the [kubernetes yaml package][2] is used to
provide the same parsing behaviour. It converts the file to json first,
which means structs do not need yaml tags in addition to json tags.

To keep code duplication minimal, the cloud autodiscovery reuses parts
of the kubernetes autodiscovery for the config. The ScanConfigs are part
of configv1 and the whole templating code is imported to provide the
same convenience of templated scan definitions. Some functions that are
privately defined in the controllers are copied over, in the future
these might get relocated to a shared util package.

The auto-discovery-cloud-aws-config.yaml file contains an example config
that can be used for local development. Unlike for the kubernetes
autodiscovery, just using this file is not enough, because the settings
for connecting to the SQS queue on AWS are specific to each user. Those
can also be specified through environment variables, which override the
values specified in the config file.

[1]: https://kubebuilder.io/component-config-tutorial/tutorial
[2]: https://pkg.go.dev/sigs.k8s.io/yaml

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
To properly react to OS signals, use the signal handler provided by
controller-runtime and the WithContext functions from the AWS SDK.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Make sure the scans we create can actually be created, even though the
cloud autodiscovery does not care about this at the moment.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Add the necessary structure for making this a proper auto discovery
service: Makefile, Dockerfile, Helm Chart. The files are adapted from
the Kubernetes AutoDiscovery.

Since the Cloud-AWS AutoDiscovery does not need to monitor resources in
kubernetes and only creates scans, it does not need a ClusterRole; a
Role and RoleBinding for the namespace it creates the scans in are
enough. The current set of permissions is probably too large; it can be
restricted further later.

For connecting to AWS there are no good defaults, so unlike for the
Kubernetes AutoDiscovery, there cannot be a completely ready dev-config.
To ease that, the Cloud-AWS AutoDiscovery allows specifying AWS
connection options in multiple ways, either through the config or as
environment variables. When the helm-deploy Makefile target is used,
values for SQS_QUEUE_URL and AWS_REGION get picked up from the shell's
environment and set as Helm values. AWS credentials need to be provided
as a Kubernetes secret, which can be configured through values under
awsAuthentication. Credential values are optional, the service can also
connect to AWS using only part of the credentials or through IAM roles
if it is running on AWS itself.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Write basic information about the Cloud-AWS AutoDiscovery. This
especially includes how to set up the necessary resources in AWS so that
the AutoDiscovery can be used.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
It feels like there should be integration tests as well, but the
Kubernetes AutoDiscovery only runs them together with the
PullSecretsExtractor. At the moment there are none, so this will have
to do.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Straight copy-paste from the Kubernetes AutoDiscovery with updated
snapshot.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
This simplifies debugging the AutoDiscovery service when running it
independently. Unfortunately it seems to be impossible to get
setup-envtest to install the binaries to a predictable path; the only
argument it accepts is --bin-dir, but it still appends /k8s/<os things>.
Moving the binaries manually is a bit of a pain. VS Code cannot set env
variables from script outputs though, so just running setup-envtest also
does not work.

To solve this, a preLaunchTask is used, which invokes a Makefile target
to create an environment file containing the path setup-envtest comes
up with, which can then be read by VS Code. Kind of unnecessary, but it
works.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Sometimes when containers on ECS exit very quickly, they never enter the
RUNNING status and instead go directly from PENDING to STOPPED.
Previously the AutoDiscovery would track all the containers that were
PENDING and create individual scans for those that transitioned to
STOPPED. This was a bit of a hack though, because then those scans would
never get cleaned up and it would require additional configuration.

This instead removes the detection of short lived containers for the
MVP, it might get added back at a later point.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Expand the envtest integration tests so that they send multiple messages
to simulate the whole lifecycle of an ECS task to verify that every
event gets handled correctly.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
_service_ is already a defined term in kubernetes, and the file
previously named service.yaml contains the deployment. The name was a
straight copy from the kubernetes AutoDiscovery, where it is named
manager.yaml, following the manager -> service rename that this
implementation uses.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
The docs did not make clear enough that the AWS Cloud AutoDiscovery is
in a prerelease state and that, while it monitors AWS/ECS, it runs on
Kubernetes, in or out of AWS. This is added using the [GitHub
NOTE/WARNING syntax][1], since GitHub is the main place where this info
is displayed; it is not used for the docs on the website. On DockerHub
and ArtifactHub it should be ok.

The order/structure of the README was not the best: it started with the
(default and autogenerated) helm install command, which would fail
anyway because the whole AWS setup is missing. The AWS setup was only
described after that. This reverses the order, moves all that setup
into a Prerequisites section, and adds another helm command to the
Deployment section that shows how to actually send the necessary values
to the deployment. Unfortunately it is not possible to get rid of the
default deployment command without altering all the templates.

This also clarifies some details, such as whether all values are
necessary and what the AWS cost calculations are based on.

[1]: https://github.com/orgs/community/discussions/16925

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Technically, the namespace the scans get created in can be configured
by setting config.kubernetes.namespace, but outside of development it
is of low practical value, because the Helm chart only defines a Role
and a RoleBinding for the namespace the AutoDiscovery gets installed
in. To allow it to create scans in a different namespace, more
Roles/RoleBindings or a ClusterRole and a ClusterRoleBinding are
needed. These could be created manually, but that is not the nicest way
to do this, so officially configuring the namespace is not supported
for now.

To achieve this, the namespace is removed from values.yaml, so that it
is hidden in the documentation, but it can still be set from the config
passed with --config, which is needed for local development.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Standard or good golang project layout seems to be quite controversial:
there is the [Standard Go Project Layout][1], which comes with an
[extensive discussion][2] about whether it is encouraged or not. There
are also [blogposts][3] by go team members and the official [Organizing
a Go module][4] docs.

The takeaway I got was that a cmd dir is good for the executables and a
pkg dir is fine if it makes the repository structure better, and with
all the helm and docker files this is certainly the case.

[1]: https://github.com/golang-standards/project-layout
[2]: golang-standards/project-layout#117
[3]: https://eli.thegreenplace.net/2019/simple-go-project-layout-with-modules/
[4]: https://go.dev/doc/modules/layout

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Move docker reference normalization to the kubernetes side, where it
fits better. Several places in the code need different parts of the
docker reference of the running containers. This simplifies the access
to those properties and uses them in a more consistent way. The scans
now only use the actual image name for their name, so that there is
space for information other than docker-io-. The hashmap of running
containers now uses a normalized reference string instead of the
ImageInfo objects, which makes it less error-prone with the included
reference information.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Technically they are independent from the cloud provider, so they should
have generic names.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Give names to the loggers the aws and kubernetes sides receive, to more
clearly separate the messages, and log more information from the
received aws messages. Unfortunately go does not allow simple
iterator/map statements to access the properties that should be logged;
after all, for loops are all everyone could ever need.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the code was a bit sparsely commented; this adds some more
comments.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from ca73a55 to e63bfaa Compare October 20, 2023 10:57
The initial set of kubernetes permissions for the role in the helm
template was adapted from the kubernetes AutoDiscovery, which is based
on kubebuilder. The AWS Cloud AutoDiscovery manually connects to
kubernetes and uses a lot fewer API permissions than the kubernetes
AutoDiscovery. If future features need more permissions they can be
added back again.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from e63bfaa to b6ba7cd Compare October 20, 2023 11:05
@o1oo11oo o1oo11oo enabled auto-merge (rebase) October 20, 2023 11:44
@o1oo11oo o1oo11oo merged commit 0489d6e into secureCodeBox:main Oct 20, 2023
@o1oo11oo o1oo11oo deleted the feat/cloud-autodiscovery-aws branch October 20, 2023 12:06