add AWX role #8226

cben · 2018-05-02T10:49:07Z

Goal: The recently added autoheal role (#7753) needs an AWX (or Ansible Tower) to execute playbooks in response to problems.
Currently you need to bring your own and set many config vars to point to it.
Ideally we'll want a way to launch both together, with less config (though it's unclear whether that should be the default).

This PR is a first step: it adds a role to launch an AWX but does not configure autoheal to use it.

Based on the upstream AWX playbooks.
Those keep changing, so I'm trying not to modify them too much, which should make it easier to keep them in sync in the future.

Open questions:

- Should everything here be called awx / openshift_awx / openshift_autoheal_awx?
  I picked awx because for now it's an optional thing, not part of the normal install, but I'm not sure that makes sense.
  UPDATE: it will most likely be deployed together with autoheal via [WIP] Add openshift_autoheal_deploy_awx var to deploy an awx for autoheal #8549, which overrides the project to deploy into openshift-autoheal.
- Which AWX version do we want? I used 1.0.3 as a starting point, but can move up or down.
  1.0.4 & 1.0.5 will also need rabbitmq and etcd; the latest, 1.0.6, needs rabbitmq 3.7 but no etcd.
- Origin vs enterprise AWX images? No idea; I didn't find any tower or awx image in the RH container catalog.
  (Also a downstream rabbitmq image? Again, no downstream image, except rhosp12/openstack-rabbitmq.)
- Should I set cpu & mem requests/limits? I can take them from later awx templates (they set them quite high, 3 cpu total).
- Security!
  - awx user: admin, password: ~~password~~ randomized, stored in a secret.
  - RABBITMQ_ERLANG_COOKIE, RABBITMQ_DEFAULT_PASS: randomized, stored in a secret. IIUC, rabbitmq is not exposed outside the pod in 1.0.3 but became exposed later.
  - awx-web-svc serves insecure HTTP; TLS is terminated only in the router :-(
    Turns out awx can't yet serve https itself (Ingress not working for AWX Kubernetes installation ansible/awx#1781 (comment), Add SSL Termination to standalone docker deployment ansible/awx#1549 and others). Current practice is to put a separate nginx or other proxy in front (although the awx_web container already includes an nginx!).
    => Asking upstream & deferring this for now. I don't know if we need to add auth-proxy anyway (?); if we do, that could cover TLS as well.

Which of these are blockers for merge? Can I iterate in later PRs?

@jhernand @zgalor @elad661 @ironcladlou please review.

openshift-ci-robot · 2018-05-02T10:49:14Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cben
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: michaelgugino

Assign the PR to them by writing /assign @michaelgugino in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vrutkovs · 2018-05-02T10:52:21Z

roles/awx/tasks/install.yml

+- name: Deploy and Activate Postgres
+  shell: >-
+    {{ openshift_client_binary }} project {{ awx_project }};
+    {{ openshift_client_binary }} process -n openshift postgresql-persistent


Could oc_process module be used here?

I tried initially and to be honest I don't fully remember why I dropped it :-)
Hmm, from oc_process.py:

in conjunction with 'create: True' the process will be piped to | oc create -f

That doesn't sound idempotent. I guess I'd need to wrap it in a condition to create only if it doesn't exist?
But then, if you upgrade openshift and the builtin postgresql-persistent template changes, you won't pick up the changes. So I think | oc apply is better?
Or maybe I'm looking at this wrong, and upgrading a postgres in place is actually undesired?

Perhaps this should be combined with @elad661's suggestion - create a new template and then use oc_apply to create new objects from the template. There should already be a postgresql template

Do you mean the oc_apply role instead of oc apply command?

Yes, oc_apply role would make it cleaner.

Switched from j2 to an openshift template.

Used one oc_configmap because the main payload is a fixed python file (getting param by env var instead of textual templating) and I felt it's much cleaner to just have it as a file.

The added value of the two existing oc_apply wrappers I found (roles/openshift_metrics/tasks/oc_apply.yaml, roles/openshift_provisioners/tasks/oc_apply.yaml) is that they report "changed" correctly. They seem to expect a single object, as they check resourceVersion before/after. I'd have to complicate that approach to iterate over the multiple objects the template creates.
=> Instead, I'm running oc apply directly and formulated a changed_when that looks at the output messages. Tested with/without change and with several errors (both in the template and apply stages).
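
The changed_when idea can be sketched outside Ansible as a small Python function. This is only an illustration of the parsing logic, assuming `oc apply` prints one line per object ending in `created`, `configured` or `unchanged`; the function name `apply_reported_change` is hypothetical and not part of the role.

```python
def apply_reported_change(stdout: str) -> bool:
    """Return True if any object line from `oc apply` reports a change.

    Assumes each line ends with a status word such as
    "created", "configured" or "unchanged".
    """
    changed_words = {"created", "configured"}
    for line in stdout.splitlines():
        words = line.split()
        # Only the last word of each line is treated as the status.
        if words and words[-1] in changed_words:
            return True
    return False


if __name__ == "__main__":
    sample = 'deployment.extensions "awx" configured\nservice "awx-web-svc" unchanged'
    print(apply_reported_change(sample))
```

Checking only the last word of each line avoids false positives when an object name happens to contain "created" or "configured".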

elad661

Few minor comments. Open questions remain open...

elad661 · 2018-05-02T11:03:07Z

roles/awx/tasks/install.yml

+
+- name: Template Openshift AWX Deployment
+  template:
+    src: deployment.yml.j2


I think the current pattern in openshift-ansible is to move away from jinja templates and into OpenShift Templates

The need for templating here is actually very little; most objects are fixed, and a few env vars could be converted to secretKeyRef.

I've also considered replacing the template with a series of oc_service, oc_configmap, ..., oc_obj modules.

I'm happy to go either way, just need someone to tell me...

elad661 · 2018-05-02T11:05:09Z

roles/awx/templates/deployment.yml.j2

+  labels:
+    name: awx-web-svc
+spec:
+  type: "NodePort"


Is there any specific reason for using NodePort here?

no idea :) copied from https://github.com/ansible/awx/blob/1.0.3/installer/openshift/templates/deployment.yml.j2#L76

Thanks for noticing, it does sound undesired, I'll look into it.

removed, works with a regular ClusterIP service just as well.

elad661 · 2018-05-02T11:06:57Z

roles/awx/templates/deployment.yml.j2

+    targetPort: http
+  tls:
+    insecureEdgeTerminationPolicy: Allow
+    termination: edge


I think a lot of components use reencrypt and expect the pod itself to serve https with an automatically generated certificate

ironcladlou · 2018-05-02T14:27:17Z

Haven't had a chance to look very closely at this yet, but one thing that stood out immediately is that you should probably be using serialized OpenShift Templates for all your assets rather than Jinja templates. We recently discussed this in the autoheal role PR. New component roles should be very loosely coupled to Ansible.

ironcladlou

Templating is looking good, added some comments re: security of secrets

ironcladlou · 2018-05-15T20:41:33Z

roles/awx/files/awx-deployment.yml

+              - mountPath: /etc/tower
+                name: awx-application-config
+            env:
+              - name: DATABASE_PASSWORD


Environment is not a secure place for secrets... can awx consume the password from a file on disk?

It probably can — I can mount the secret as volume, and modify settings.py to read the files.
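
A minimal sketch of what reading a mounted secret from settings.py could look like. The mount path `/etc/tower/secrets`, the key name and the helper name `read_secret` are assumptions for illustration, not something the awx image provides.

```python
import os


def read_secret(name, directory="/etc/tower/secrets", default=None):
    """Read one key of a Secret mounted as a volume.

    Kubernetes exposes each key of a mounted Secret as a file named
    after the key; `directory` is a hypothetical mount path.
    """
    path = os.path.join(directory, name)
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read().strip()
    except FileNotFoundError:
        # Secret not mounted (or key absent): fall back to the default.
        return default


# Hypothetical use inside settings.py; yields None when nothing is mounted.
DATABASE_PASSWORD = read_secret("database-password")
```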

But can you explain the threat model?
k8s API returns such env vars with valueFrom: secretKeyRef:, it does not expose the effective value.
Are you thinking of someone with oc exec access into the pod? I think then it's game over anyway.
Does the kubelet leak them in some way?

Many of the builtin templates use such secretKeyRef env vars for keys, passwords, etc.:
https://github.com/openshift/openshift-ansible/search?q=secretKeyRef
including the postgresql-persistent template this role uses:

openshift-ansible/roles/openshift_examples/files/examples/v3.10/db-templates/postgresql-persistent-template.json
Lines 123 to 127 in 5c1207c

    "name": "POSTGRESQL_PASSWORD",
    "valueFrom": {
      "secretKeyRef": {
        "key": "database-password",
        "name": "${DATABASE_SERVICE_NAME}"

A lot of the issues with secrets in env were covered way back in the original design discussion, but I wouldn't be surprised if various enhancements have been made since then that I'm just unaware of to reduce the risks.

In this case I probably wouldn't go too far out of my way to introduce file support where none exists, but I would default to avoiding environment when possible (as it's generally an additional risk with no clear added value outside shim use cases).

No can do, not with current awx images.
Turns out they look at POSTGRESQL_PASSWORD, AWX_ADMIN_USER, AWX_ADMIN_PASSWORD in shell scripts (that are baked into the image) before settings.py (that we control) comes into play.
Can't even attempt equivalent logic in settings.py later, because if we omit AWX_ADMIN_PASSWORD env var, we might later create a superuser with password specified via a file, but an insecure admin/password superuser will be created as well.

I'll start a discussion upstream in awx about various ways (including this) learnt from here to improve their k8s/openshift deployment security, and see if they want PRs for any of those.

ironcladlou · 2018-05-15T20:42:32Z

roles/awx/files/awx-deployment.yml

+                value: postgresql
+              - name: DATABASE_PORT
+                value: "5432"
+              - name: DATABASE_PASSWORD


Environment is not a secure place for secrets... can pgsql consume the password from a file on disk?

ironcladlou · 2018-05-15T20:42:52Z

roles/awx/files/awx-deployment.yml

+                  secretKeyRef:
+                    name: admin-credentials
+                    key: username
+              - name: AWX_ADMIN_PASSWORD


As above re: environment/secrets

ironcladlou · 2018-05-15T20:43:09Z

roles/awx/files/awx-deployment.yml

+                value: rabbitmq
+              - name: RABBITMQ_DEFAULT_USER
+                value: awx
+              - name: RABBITMQ_DEFAULT_PASS


As above re: environment/secrets

for RABBITMQ_ERLANG_COOKIE & RABBITMQ_DEFAULT_PASS it's even worse — I left them hard-wired to fixed known values! (as do upstream awx playbooks)
AFAICT rabbitmq container has no ports exposed outside the pod — does that sound sane or should I randomize those just in case?

Even if you want to use fixed defaults, I'd make the passwords required parameters and move the defaulting logic out into the installer bits

Randomized these 2 as well, stored in secrets. It seems that in later AWX versions rabbitmq is exposed and these will become important (?)
RABBITMQ_ERLANG_COOKIE could be mounted, not sure about RABBITMQ_DEFAULT_PASS (indirectly by putting it in rabbitmq.conf?); not gonna bother with this until I get guidance from awx where to invest effort.
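
For illustration, generating the two rabbitmq values in the installer and passing them in as required parameters might look like the following. This uses Python's standard `secrets` module rather than whatever mechanism the role actually uses; `generate_rabbitmq_params` and `require_params` are hypothetical names.

```python
import secrets


def generate_rabbitmq_params(nbytes: int = 24) -> dict:
    """Generate random values for the two rabbitmq secrets."""
    return {
        "RABBITMQ_ERLANG_COOKIE": secrets.token_hex(nbytes),
        "RABBITMQ_DEFAULT_PASS": secrets.token_urlsafe(nbytes),
    }


def require_params(params: dict, required: tuple) -> None:
    """Fail early if a required template parameter is missing or empty."""
    missing = [name for name in required if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))


if __name__ == "__main__":
    params = generate_rabbitmq_params()
    require_params(params, ("RABBITMQ_ERLANG_COOKIE", "RABBITMQ_DEFAULT_PASS"))
    print(sorted(params))
```

Keeping the defaulting in the installer means the template itself never carries a fixed, known value.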

ironcladlou · 2018-05-15T20:45:26Z

roles/awx/tasks/install.yml

+- name: Deploy and Activate Postgres
+  # Note: NAMESPACE param is where to look for postgresql ImageStream.
+  shell: >-
+    {{ openshift_client_binary }} project {{ awx_project }};


I'd breakout namespace creation into a discrete task

This did not create the project, only switched to it to control where oc apply instantiates (this template omits namespace:).
=> Discovered I can use the cleaner oc apply -n ..., switched to that.
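
The difference can be sketched as a small Python wrapper that pipes `oc process` into `oc apply -n ...` without switching the current project. The function names, the `oc` binary default and the example parameter are placeholders; the sketch assumes the template lives in a namespace such as `openshift` and that the objects it produces omit `namespace:`.

```python
import subprocess


def build_commands(template_name, template_namespace, target_namespace,
                   params, oc="oc"):
    """Build argv lists for `oc process ... | oc apply -n TARGET -f -`."""
    process_cmd = [oc, "process", "-n", template_namespace, template_name]
    for key, value in sorted(params.items()):
        process_cmd += ["-p", f"{key}={value}"]
    # -n on apply decides where the objects are instantiated.
    apply_cmd = [oc, "apply", "-n", target_namespace, "-f", "-"]
    return process_cmd, apply_cmd


def process_and_apply(template_name, template_namespace, target_namespace,
                      params, oc="oc"):
    """Run both commands, feeding the processed objects to `oc apply`."""
    process_cmd, apply_cmd = build_commands(
        template_name, template_namespace, target_namespace, params, oc)
    processed = subprocess.run(process_cmd, check=True, capture_output=True)
    applied = subprocess.run(apply_cmd, input=processed.stdout,
                             check=True, capture_output=True)
    return applied.stdout.decode()
```

Because nothing calls `oc project`, the user's current project is left untouched.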

cben · 2018-05-30T13:15:27Z

/retest

openshift-ci-robot · 2018-05-30T19:28:22Z

@cben: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/openshift-jenkins/system-containers | `d47d171` | link | `/test system-containers` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


Example output when creating:

    "stderr": "",
    "stdout": "deployment.extensions \"awx\" created\nservice \"awx-web-svc\" created\nroute.route.openshift.io \"awx-web-svc\" created",

Example output when re-applying without change:

    "stderr": "",
    "stdout": "deployment.extensions \"awx\" unchanged\nservice \"awx-web-svc\" unchanged\nroute.route.openshift.io \"awx-web-svc\" unchanged",

Example output when re-applying with one change:

    "stderr": "",
    "stdout": "deployment.extensions \"awx\" configured\nservice \"awx-web-svc\" unchanged\nroute.route.openshift.io \"awx-web-svc\" unchanged",

failed_when doesn't seem necessary, already works correctly. Tested:
- invalid yaml syntax (which breaks `oc template`)
- invalid content
- missing name
- trying to create in non-existent namespace (which breaks `oc apply`)

and they all result in "rc": 1 and error in stderr.


- RABBITMQ_ERLANG_COOKIE could be mounted at /var/lib/rabbitmq/.erlang.cookie instead of an env var:
  https://github.com/docker-library/rabbitmq/blob/d7096266dfb047fce1eff89bd759e3ab55d779f3/3.7/docker-entrypoint.sh#L164-L176
  However, the https://hub.docker.com/_/rabbitmq/ doc only mentions the RABBITMQ_ERLANG_COOKIE and RABBITMQ_DEFAULT_PASS env vars.

cben · 2018-07-01T08:24:30Z

@ironcladlou this got stuck, how do we move forward?

I believe I addressed your comments except for env vars — they can't be passed via secrets with existing awx image.
I asked @matburt if they're interested in upstream improvements (=> sure, yes) but then our team moved on to a different project and I haven't had time to contribute anything :( Anyway, of the various ways to improve ansible/awx security, env vars are the least impactful.

We still want to slowly push autoheal forward. What's needed from your perspective to merge this (and then #8549)?

ironcladlou · 2018-07-06T15:26:34Z

I think we need to establish who owns autoheal generally going forward (it's not me). @derekwaynecarr, does monitoring now own this? Some other new team?

cben · 2018-11-04T09:13:18Z

@oourfali Will we or anyone else proceed with autoheal?
For now I'm closing this as abandoned. If it's to be resurrected, would need to rebase and update to recent AWX, a non-negligible effort...

openshift-ci-robot added the do-not-merge/work-in-progress label May 2, 2018

openshift-ci-robot requested review from michaelgugino and vrutkovs May 2, 2018 10:49

openshift-ci-robot added the size/L label May 2, 2018

cben mentioned this pull request May 2, 2018

AWX role ironcladlou/openshift-ansible#34

Closed

5 tasks

vrutkovs reviewed May 2, 2018

elad661 reviewed May 2, 2018

cben force-pushed the awx branch 7 times, most recently from 2fed963 to e2b73c7 May 15, 2018 11:14

ironcladlou suggested changes May 15, 2018

cben mentioned this pull request May 28, 2018

[WIP] Add openshift_autoheal_deploy_awx var to deploy an awx for autoheal #8549

Closed

cben force-pushed the awx branch 2 times, most recently from 2c52d7c to fd2b621 May 30, 2018 13:14

cben changed the title ~~[WIP] add AWX role~~ add AWX role May 30, 2018

openshift-ci-robot removed the do-not-merge/work-in-progress label May 30, 2018

cben force-pushed the awx branch from fd2b621 to c90f9fe May 30, 2018 17:25

cben added 5 commits June 10, 2018 17:25

awx role, using original templates from awx 1.0.3

fc93df6

https://github.com/ansible/awx/tree/1.0.3/installer/openshift

Hardwire pg_database, pg_username, pg_port vars

a4468a6

s/awx_openshift_project/awx_project/

ae1f7ea

Convert deployment.yml.j2 to a Template

54e5314

cben added 11 commits June 10, 2018 17:25

Replace configmap.yml.j2 with oc_configmap module

014b084

Use regular ClusterIP service instead of NodePort

8bc0ad3

randomize AWX_ADMIN_PASSWORD

c1060f2

pylint skip awx's settings.py

9368f08

This file is not part of openshift-ansible code, it's mounted into awx and just happens to be written in Python.

Rename foreign settings.py to avoid pylint, flake8 etc

1b79e5a

This file is not part of openshift-ansible code, it's mounted into awx and just happens to be written in Python.

add meta/main.yml to awx role

5b1f6d5

Make awx admin password secret suitable as credentialsRef for autoheal

a9e06bc

- Renamed secret from 'awx' to 'admin-credentials' for clarity - Added 'username' key - Renamed 'admin_password' key to 'password'

Use oc apply -n ... instead of oc project ...; oc apply

61fdfea

Rename secret admin-credentials -> awx-admin-credentials

84739ff

Friendlier to running awx in same namespace with autoheal.

Document awx-admin-credentials username is unsafe to change

785ed15

cben force-pushed the awx branch from c90f9fe to f9562b1 June 10, 2018 14:25

cben closed this Nov 4, 2018
