Add add_replica plan #166

Merged: 5 commits, Jun 25, 2021

@timidri timidri commented Jun 17, 2021

Add add_replica plan

@timidri timidri force-pushed the SOLARCH-432-peadm-plan-replace-missing-replica branch from 938b35a to d388618 Compare June 22, 2021 13:49
@timidri timidri changed the title Add replace_replica plan (WIP) Add add_replica plan (WIP) Jun 22, 2021
timidri commented Jun 22, 2021

@reidmv The puppet infrastructure provision replica step is failing with:

[root@litmus-2129873944d29150 ~]# puppet infrastructure provision replica litmus-7ecb27cf5ba29302.c.ia-content.internal --verbose
Pinning litmus-7ecb27cf5ba29302.c.ia-content.internal to PE Infrastructure Agent

Running puppet on litmus-7ecb27cf5ba29302.c.ia-content.internal. Run 'puppet job show 15' to monitor detailed progress.
Overall job status: failed...
Error: Error during orchestrated puppet run on litmus-7ecb27cf5ba29302.c.ia-content.internal.
Warning: Could not fetch the run report from the orchestration service.
More information about this Puppet run (job ID 15) might be available in the run report stored on litmus-7ecb27cf5ba29302.c.ia-content.internal at /opt/puppetlabs/puppet/cache/state/last_run_report.yaml.
Error: Puppet failed on one or more nodes.
Status:       failed
Job type:     Puppet run
Start time:   06/22/2021 01:55:53 PM
Finish time:  06/22/2021 01:55:58 PM
Duration:     5 sec
User:         admin
Enforce env:  false
Run mode:     Enforcement
Nodes:        1

Target:
--nodes 'litmus-7ecb27cf5ba29302.c.ia-content.internal'

FAILED RUNS (1/1)
--------------------------------------------------------------------------
litmus-7ecb27cf5ba29302.c.ia-content.internal
    Error running puppet on litmus-7ecb27cf5ba29302.c.ia-content.internal: The Puppet run failed in an unexpected way

On the replica, a puppet run produces this:

[root@litmus-7ecb27cf5ba29302 ~]# puppet agent -t
Error: Unable to connect to server from server_list setting: Request to https://litmus-2129873944d29150.c.ia-content.internal:8140/status/v1/simple/master failed after 0.094 seconds: SSL_connect returned=1 errno=0 state=error: sslv3 alert certificate unknown
Wrapped exception:
SSL_connect returned=1 errno=0 state=error: sslv3 alert certificate unknown
Error: Could not run Puppet configuration client: Could not select a functional puppet server from server_list: 'litmus-2129873944d29150.c.ia-content.internal:8140'

The contents of puppet.conf:

[main]
server = litmus-2129873944d29150.c.ia-content.internal
certname = litmus-7ecb27cf5ba29302.c.ia-content.internal
# This file can be used to override the default puppet settings.
# See the following links for more details on what settings are available:
# - https://puppet.com/docs/puppet/latest/config_important_settings.html
# - https://puppet.com/docs/puppet/latest/config_about_settings.html
# - https://puppet.com/docs/puppet/latest/config_file_main.html
# - https://puppet.com/docs/puppet/latest/configuration.html
user = pe-puppet
group = pe-puppet
environment_timeout = unlimited
module_groups = base+pe_only

[agent]
server_list = litmus-2129873944d29150.c.ia-content.internal:8140

[master]
node_terminus = classifier
storeconfigs = true
storeconfigs_backend = puppetdb
reports = puppetdb
certname = litmus-7ecb27cf5ba29302.c.ia-content.internal
always_retry_plugins = false
disable_i18n = false
versioned_environment_dirs = true

@reidmv reidmv force-pushed the SOLARCH-432-peadm-plan-replace-missing-replica branch 2 times, most recently from e389779 to ee4d45c Compare June 24, 2021 23:50
@reidmv reidmv self-requested a review June 24, 2021 23:54
@reidmv reidmv left a comment

IMO, this is good to merge as-is. Squash on merge – no need for more than one commit for this in history.

  • It supports adding a replica to a Standard deployment, or to a Large deployment
  • It has not been validated for adding a replica to an Extra Large deployment
  • It is missing a classification update step, to ensure PEAdm's four node groups are correctly configured. This is not needed when replacing a failed replica with a new host of the same name, but will be needed for adding a replica to a cluster which did not have one at install time

I think we should park this where it's at, validated for replacing a missing replica on Standard or Large clusters which previously had a working replica. Perhaps add a comment to the top of the file (Puppet Strings format?) indicating these limitations.

We should circle back to validating against Extra Large after we produce a peadm::add_postgresql plan. A ready-to-go PostgreSQL server to pair with is important to being able to reliably test the plan.

Then, we should come back and add a mechanism to ensure classification is up to date after running an add_ plan, whether for replica or for postgresql server. New ticket.

timidri commented Jun 25, 2021

@reidmv I'll run the last tests on Standard and Large; I haven't seen Large succeed yet.
How can we validate that the plan is only run in supported circumstances (not XL, and an existing certname for the replica)?
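
One way to think about the question above is as a pre-flight check that collects every reason a run falls outside the validated cases. The following is only a minimal sketch in Python, not peadm code: `ClusterState`, `unsupported_reasons`, and the way the architecture and the previous replica certname are obtained are all hypothetical, and a real plan would have to discover them from the cluster.

```python
from dataclasses import dataclass
from typing import List, Optional

# Architectures the review above reports as validated for this plan.
VALIDATED_ARCHITECTURES = ("standard", "large")


@dataclass
class ClusterState:
    """Hypothetical snapshot of the cluster; not a real peadm data type."""
    architecture: str                         # "standard", "large", or "extra-large"
    previous_replica_certname: Optional[str]  # None if no replica was ever configured


def unsupported_reasons(state: ClusterState, replica_certname: str) -> List[str]:
    """Return why the run falls outside the validated cases (empty list means OK)."""
    reasons = []
    if state.architecture not in VALIDATED_ARCHITECTURES:
        reasons.append(f"architecture {state.architecture!r} has not been validated")
    if state.previous_replica_certname is None:
        # Assumption: without a prior replica, the classification step is still missing.
        reasons.append("cluster never had a replica, so classification may need updating")
    elif state.previous_replica_certname != replica_certname:
        reasons.append("new replica does not reuse the previous replica's certname")
    return reasons


if __name__ == "__main__":
    state = ClusterState("large", "replica.example.com")
    print(unsupported_reasons(state, "replica.example.com"))
```

Returning a list of reasons rather than failing on the first one lets the caller report every unsupported condition at once before deciding whether to abort.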

timidri commented Jun 25, 2021

@reidmv Also, if we don't support XL for now, should we remove the replica_postgresql_host parameter?

@timidri timidri marked this pull request as ready for review June 25, 2021 12:09
@timidri timidri requested a review from a team as a code owner June 25, 2021 12:09
@timidri timidri changed the title Add add_replica plan (WIP) Add add_replica plan Jun 25, 2021
@reidmv reidmv force-pushed the SOLARCH-432-peadm-plan-replace-missing-replica branch from 0cc5474 to 3e43b14 Compare June 25, 2021 16:25
reidmv and others added 5 commits June 25, 2021 09:31

This lets plans pass [] in addition to undef to peadm target inputs to
indicate no target. This is useful when building argument lists
automatically and passing forward to peadm plans.

Co-authored-by: Dimitri Tischenko <dimitri@puppet.com>

For the pending peadm::add_replica plan
Co-authored-by: Reid Vandewiele <reid@puppet.com>

This helps validate that the plan is syntactically valid, uses data
correctly, etc. Useful for potential refactors.

Co-authored-by: Dimitri Tischenko <dimitri@puppet.com>

Update the GitHub Actions workflow for peadm::add_replica to support
optional ssh-debugging and to use the latest LTS by default

Co-authored-by: Dimitri Tischenko <dimitri@puppet.com>
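
The first commit message above describes accepting `[]` as well as `undef` to mean "no target". As a rough analogy only, the same normalization can be sketched in Python, with `None` standing in for Puppet's `undef`. The helper name `normalize_target` and the single-target restriction are assumptions made for illustration; this is not how peadm implements it.

```python
from typing import Optional, Sequence, Union

# A target input may be absent (None), a single name, or a list of names.
TargetInput = Union[None, str, Sequence[str]]


def normalize_target(value: TargetInput) -> Optional[str]:
    """Treat None and an empty list alike as 'no target'."""
    if value is None:
        return None
    if isinstance(value, str):
        # Check str first: a string is itself a sequence of characters.
        return value
    items = list(value)
    if not items:
        return None
    if len(items) > 1:
        # Assumption for this sketch: each input names at most one host.
        raise ValueError(f"expected at most one target, got {len(items)}")
    return items[0]


if __name__ == "__main__":
    # An automatically built argument list can pass [] without special-casing.
    args = {"replica_postgresql_host": normalize_target([])}
    print(args)
```

With this shape, code that builds arguments programmatically can forward an empty list unchanged instead of branching to substitute an explicit "undefined" value.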
@reidmv reidmv force-pushed the SOLARCH-432-peadm-plan-replace-missing-replica branch from 3e43b14 to 8226dff Compare June 25, 2021 16:31

@reidmv reidmv left a comment

Rebased for clarity, and extracted an unrelated change discovered along the way into a new PR, #171.

LGTM! 👍

@reidmv reidmv merged commit 39c9f72 into main Jun 25, 2021
@reidmv reidmv deleted the SOLARCH-432-peadm-plan-replace-missing-replica branch June 25, 2021 22:10