[New Scheduler] Run scheduler (#5194)
* Add Akka-cluster dependency

* Update configurations to run the new scheduler.

* Add gRPC handlers for activations.

* Update Ansible scripts to run the new scheduler.

* Increase the queue creation request timeout.

* Add the scheduler Ansible role.

* Fix typo.

* Change the loglevel config to logback's one.

* Change the topic name

* Remove unnecessary configs

* Add a guide on how to deploy the new scheduler.

* Make ActorSystem for each test bind to a free port.
style95 authored Feb 11, 2022
1 parent 8be1505 commit 5332e6d
Showing 22 changed files with 698 additions and 33 deletions.
52 changes: 52 additions & 0 deletions ansible/README.md
@@ -148,6 +148,58 @@ ansible-playbook -i environments/$ENVIRONMENT prereq.yml

**Hint:** During playbook execution the `TASK [prereq : check for pip]` can show as failed. This is normal if no pip is installed. The playbook will then move on and install pip on the target machines.

### [Optional] Enable the new scheduler

You can optionally enable OpenWhisk's new scheduler.
Enabling it deploys two additional components: the scheduler itself and etcd.

#### Configure service providers for the scheduler
To use the new scheduler, set the service providers (SPIs) as follows.

**common/scala/src/main/resources**
```
whisk.spi {
  ArtifactStoreProvider = org.apache.openwhisk.core.database.CouchDbStoreProvider
  ActivationStoreProvider = org.apache.openwhisk.core.database.ArtifactActivationStoreProvider
  MessagingProvider = org.apache.openwhisk.connector.kafka.KafkaMessagingProvider
  ContainerFactoryProvider = org.apache.openwhisk.core.containerpool.docker.DockerContainerFactoryProvider
  LogStoreProvider = org.apache.openwhisk.core.containerpool.logging.DockerToActivationLogStoreProvider
  LoadBalancerProvider = org.apache.openwhisk.core.loadBalancer.FPCPoolBalancer
  EntitlementSpiProvider = org.apache.openwhisk.core.entitlement.FPCEntitlementProvider
  AuthenticationDirectiveProvider = org.apache.openwhisk.core.controller.BasicAuthenticationDirective
  InvokerProvider = org.apache.openwhisk.core.invoker.FPCInvokerReactive
  InvokerServerProvider = org.apache.openwhisk.core.invoker.FPCInvokerServer
  DurationCheckerProvider = org.apache.openwhisk.core.scheduler.queue.ElasticSearchDurationCheckerProvider
}
.
.
.
```

#### Enable the scheduler
- Enable the scheduler by setting `scheduler_enable` to `true`.

**ansible/environments/local/group_vars**
```yaml
scheduler_enable: true
```

#### [Optional] Enable ElasticSearch Activation Store
When you use the new scheduler, it is recommended to use ElasticSearch as an activation store.

**ansible/environments/local/group_vars**
```yaml
db_activation_backend: ElasticSearch
elastic_cluster_name: <your elasticsearch cluster name>
elastic_protocol: <your elasticsearch protocol>
elastic_index_pattern: <your elasticsearch index pattern>
elastic_base_volume: <your elasticsearch volume directory>
elastic_username: <your elasticsearch username>
elastic_password: <your elasticsearch password>
```
See also the guide to [deploying OpenWhisk using ElasticSearch](https://github.com/apache/openwhisk/blob/master/ansible/README.md#using-elasticsearch-to-store-activations).

### Deploying Using CouchDB
- Make sure your `db_local.ini` file is [set up for](#setup) CouchDB, then execute:

6 changes: 6 additions & 0 deletions ansible/environments/local/hosts.j2.ini
@@ -27,6 +27,12 @@ invoker0 ansible_host=172.17.0.1 ansible_connection=local
invoker1 ansible_host=172.17.0.1 ansible_connection=local
{% endif %}

[schedulers]
scheduler0 ansible_host=172.17.0.1 ansible_connection=local
{% if mode is defined and 'HA' in mode %}
scheduler1 ansible_host=172.17.0.1 ansible_connection=local
{% endif %}

; db group is only used if db.provider is CouchDB
[db]
172.17.0.1 ansible_host=172.17.0.1 ansible_connection=local
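
The `[schedulers]` group added above follows a simple rule: `scheduler0` is always present, and `scheduler1` is added only when `mode` is defined and contains `HA`. A minimal Python sketch of that rule is shown below. The function name `scheduler_hosts` is hypothetical and only mirrors the template logic; it is not part of the commit.

```python
def scheduler_hosts(mode=None, host="172.17.0.1"):
    """Return the inventory lines the [schedulers] group would contain.

    Mirrors the template condition: {% if mode is defined and 'HA' in mode %}.
    """
    names = ["scheduler0"]
    if mode is not None and "HA" in mode:
        names.append("scheduler1")
    return [f"{name} ansible_host={host} ansible_connection=local" for name in names]


if __name__ == "__main__":
    for line in scheduler_hosts(mode="HA"):
        print(line)
```

Both entries point at the same address in the local environment, so HA mode here means two scheduler instances on one host.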
62 changes: 60 additions & 2 deletions ansible/group_vars/all
@@ -127,6 +127,8 @@ jmx:
rmiBasePortController: 16000
basePortInvoker: 17000
rmiBasePortInvoker: 18000
basePortScheduler: 21000
rmiBasePortScheduler: 22000
user: "{{ jmxuser | default('jmxuser') }}"
pass: "{{ jmxuser | default('jmxpass') }}"
jvmCommonArgs: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/home/owuser/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/home/owuser/jmxremote.access"
@@ -221,6 +223,8 @@ invoker:
keystore:
password: "{{ invoker_keystore_password | default('openwhisk') }}"
name: "{{ __invoker_ssl_keyPrefix }}openwhisk-keystore.p12"
container:
creationMaxPeek: "{{ container_creation_max_peek | default(500) }}"
reactiveSpi: "{{ invokerReactive_spi | default('') }}"
serverSpi: "{{ invokerServer_spi | default('') }}"

@@ -278,6 +282,9 @@ db:
invoker:
user: "{{ db_invoker_user | default(lookup('ini', 'db_username section=invoker file={{ playbook_dir }}/db_local.ini')) }}"
pass: "{{ db_invoker_pass | default(lookup('ini', 'db_password section=invoker file={{ playbook_dir }}/db_local.ini')) }}"
scheduler:
user: "{{ db_scheduler_user | default(lookup('ini', 'db_username section=scheduler file={{ playbook_dir }}/db_local.ini')) }}"
pass: "{{ db_scheduler_pass | default(lookup('ini', 'db_password section=scheduler file={{ playbook_dir }}/db_local.ini')) }}"
artifact_store:
backend: "{{ db_artifact_backend | default('CouchDB') }}"
activation_store:
@@ -435,8 +442,9 @@ metrics:

user_events: "{{ user_events_enabled | default(false) | lower }}"

durationChecker:
  timeWindow: "{{ duration_checker_time_window | default('1 d') }}"
zeroDowntimeDeployment:
  enabled: "{{ zerodowntime_deployment_switch | default(true) }}"
  solution: "{{ zerodowntime_deployment_solution | default('apicall') }}"

etcd:
version: "{{ etcd_version | default('v3.4.0') }}"
@@ -463,13 +471,63 @@ etcd_connect_string: "{% set ret = [] %}\
{% endfor %}\
{{ ret | join(',') }}"


__scheduler_blackbox_fraction: 0.10
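
The `managedFraction` and `blackboxFraction` templates further down in this file both fall back to `__scheduler_blackbox_fraction`. The following Python sketch reproduces that fallback chain so the resulting defaults are easy to check. The function `resolve_fractions` is a hypothetical helper written for illustration, not code from the commit.

```python
DEFAULT_BLACKBOX_FRACTION = 0.10  # mirrors __scheduler_blackbox_fraction


def resolve_fractions(scheduler_managed_fraction=None, scheduler_blackbox_fraction=None):
    """Return (managedFraction, blackboxFraction) following the Jinja defaults."""
    blackbox = (
        scheduler_blackbox_fraction
        if scheduler_blackbox_fraction is not None
        else DEFAULT_BLACKBOX_FRACTION
    )
    # managedFraction defaults to 1.0 minus the effective blackbox fraction.
    managed = (
        scheduler_managed_fraction
        if scheduler_managed_fraction is not None
        else 1.0 - blackbox
    )
    return managed, blackbox


if __name__ == "__main__":
    print(resolve_fractions())
    print(resolve_fractions(scheduler_blackbox_fraction=0.25))
```

Note that setting `scheduler_managed_fraction` explicitly bypasses the subtraction, so the templates themselves do not force the two fractions to sum to 1.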

watcher:
  eventNotificationDelayMs: "{{ watcher_notification_delay | default('5000 ms') }}"

durationChecker:
  timeWindow: "{{ duration_checker_time_window | default('1 d') }}"

enable_scheduler: "{{ scheduler_enable | default(false) }}"

scheduler:
  protocol: "{{ scheduler_protocol | default('http') }}"
  dir:
    become: "{{ scheduler_dir_become | default(false) }}"
  confdir: "{{ config_root_dir }}/scheduler"
  basePort: 14001
  grpc:
    basePort: 13001
    tls: "{{ scheduler_grpc_tls | default(false) }}"
  maxPeek: "{{ scheduler_max_peek | default(128) }}"
  heap: "{{ scheduler_heap | default('2g') }}"
  arguments: "{{ scheduler_arguments | default('') }}"
  instances: "{{ groups['schedulers'] | length }}"
  username: "{{ scheduler_username | default('scheduler.user') }}"
  password: "{{ scheduler_password | default('scheduler.pass') }}"
  akka:
    provider: cluster
    cluster:
      basePort: 25520
      host: "{{ groups['schedulers'] | map('extract', hostvars, 'ansible_host') | list }}"
      bindPort: 3551
      # at this moment all schedulers are seed nodes
      seedNodes: "{{ groups['schedulers'] | map('extract', hostvars, 'ansible_host') | list }}"
  loglevel: "{{ scheduler_loglevel | default(whisk_loglevel) | default('INFO') }}"
  extraEnv: "{{ scheduler_extraEnv | default({}) }}"
  dataManagementService:
    retryInterval: "{{ scheduler_dataManagementService_retryInterval | default('1 second') }}"
  inProgressJobRetentionSecond: "{{ scheduler_inProgressJobRetentionSecond | default('20 seconds') }}"
  managedFraction: "{{ scheduler_managed_fraction | default(1.0 - (scheduler_blackbox_fraction | default(__scheduler_blackbox_fraction))) }}"
  blackboxFraction: "{{ scheduler_blackbox_fraction | default(__scheduler_blackbox_fraction) }}"
  queueManager:
    maxSchedulingTime: "{{ scheduler_maxSchedulingTime | default('20 second') }}"
    maxRetriesToGetQueue: "{{ scheduler_maxRetriesToGetQueue | default(13) }}"
  queue:
    # Running-state timeout: if no activation arrives while the queue is Running, it moves from Running to Idle and its decision algorithm actor is deleted
    idleGrace: "{{ scheduler_queue_idleGrace | default('20 seconds') }}"
    # Idle-state timeout: if no activation arrives while the queue is Idle, it moves from Idle to Removed
    stopGrace: "{{ scheduler_queue_stopGrace | default('20 seconds') }}"
    # Paused-state timeout: if no activation arrives while the queue is Paused, it moves from Paused to Removed
    flushGrace: "{{ scheduler_queue_flushGrace | default('60 seconds') }}"
    gracefulShutdownTimeout: "{{ scheduler_queue_gracefulShutdownTimeout | default('5 seconds') }}"
    maxRetentionSize: "{{ scheduler_queue_maxRetentionSize | default(10000) }}"
    maxRetentionMs: "{{ scheduler_queue_maxRetentionMs | default(60000) }}"
    maxBlackboxRetentionMs: "{{ scheduler_queue_maxBlackboxRetentionMs | default(300000) }}"
    throttlingFraction: "{{ scheduler_queue_throttlingFraction | default(0.9) }}"
    durationBufferSize: "{{ scheduler_queue_durationBufferSize | default(10) }}"
  deployment_ignore_error: "{{ scheduler_deployment_ignore_error | default('False') }}"
  dataManagementService:
    retryInterval: "{{ scheduler_dataManagementService_retryInterval | default('1 second') }}"
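
The comments on `idleGrace`, `stopGrace`, and `flushGrace` above describe three timeout-driven queue transitions. The sketch below models only what those comments state, using the default values from this file. It is a simplified illustration, not the scheduler's actual state machine; the names `QueueState`, `TIMEOUTS`, and `next_state` are hypothetical, and firing the transition exactly when the idle time reaches the timeout is an assumption.

```python
from enum import Enum


class QueueState(Enum):
    RUNNING = "Running"
    IDLE = "Idle"
    PAUSED = "Paused"
    REMOVED = "Removed"


# state -> (config key, default timeout in seconds, state after the timeout)
TIMEOUTS = {
    QueueState.RUNNING: ("idleGrace", 20, QueueState.IDLE),
    QueueState.IDLE: ("stopGrace", 20, QueueState.REMOVED),
    QueueState.PAUSED: ("flushGrace", 60, QueueState.REMOVED),
}


def next_state(state, seconds_without_activation):
    """Return the state a queue ends up in after receiving no activations."""
    if state not in TIMEOUTS:
        return state  # Removed has no outgoing timeout transition here
    _key, timeout, target = TIMEOUTS[state]
    # Assumption: the transition fires once the idle time reaches the timeout.
    return target if seconds_without_activation >= timeout else state


if __name__ == "__main__":
    print(next_state(QueueState.RUNNING, 25))
```

With the defaults, a queue that stops receiving activations therefore goes from Running to Idle after 20 seconds and is removed after a further 20 seconds.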
5 changes: 5 additions & 0 deletions ansible/openwhisk.yml
@@ -20,12 +20,17 @@
# playbook (currently cloudant.yml or couchdb.yml).
# It assumes that wipe.yml have being deployed at least once.

- import_playbook: etcd.yml
  when: enable_scheduler

- import_playbook: kafka.yml
  when: not lean

- import_playbook: controller.yml

- import_playbook: scheduler.yml
  when: enable_scheduler

- import_playbook: invoker.yml
  when: not lean
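
The conditions in this hunk decide which playbooks run and in what order. The Python sketch below covers only the five imports visible here; the function name `playbooks_to_run` is hypothetical and exists purely to make the effect of `enable_scheduler` and `lean` explicit.

```python
def playbooks_to_run(enable_scheduler=False, lean=False):
    """List the playbooks from this hunk that would be imported, in order."""
    steps = [
        ("etcd.yml", enable_scheduler),       # when: enable_scheduler
        ("kafka.yml", not lean),              # when: not lean
        ("controller.yml", True),             # unconditional
        ("scheduler.yml", enable_scheduler),  # when: enable_scheduler
        ("invoker.yml", not lean),            # when: not lean
    ]
    return [name for name, condition in steps if condition]


if __name__ == "__main__":
    print(playbooks_to_run(enable_scheduler=True))
```

With the scheduler enabled, etcd is deployed before everything else and the scheduler is deployed between the controller and the invokers.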

15 changes: 15 additions & 0 deletions ansible/roles/controller/tasks/deploy.yml
@@ -309,6 +309,21 @@
env: "{{ env | combine(mongodb_env) }}"
when: db.artifact_store.backend == "MongoDB"

- name: setup scheduler env
  set_fact:
    scheduler_env:
      "CONFIG_whisk_etcd_hosts": "{{ etcd_connect_string }}"
      "CONFIG_whisk_etcd_lease_timeout": "{{ etcd.lease.timeout }}"
      "CONFIG_whisk_etcd_pool_threads": "{{ etcd.pool_threads }}"
      "CONFIG_whisk_scheduler_grpc_tls": "{{ scheduler.grpc.tls | default('false') | lower }}"
      "CONFIG_whisk_scheduler_maxPeek": "{{ scheduler.maxPeek }}"
  when: enable_scheduler

- name: merge scheduler env
  set_fact:
    env: "{{ env | combine(scheduler_env) }}"
  when: enable_scheduler

- name: populate volumes for controller
set_fact:
controller_volumes:
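
The two scheduler tasks added to the controller role above build a dictionary and merge it into `env` only when `enable_scheduler` is true. Ansible's `combine` filter merges dictionaries with later values overriding earlier ones; the sketch below imitates that for flat dictionaries. The helper names and all sample values are hypothetical.

```python
def combine(base, *others):
    """Imitate Ansible's `combine` filter for flat dicts: later dicts win."""
    merged = dict(base)
    for other in others:
        merged.update(other)
    return merged


def build_env(env, scheduler_env, enable_scheduler):
    """Merge the scheduler variables only when the scheduler is enabled."""
    return combine(env, scheduler_env) if enable_scheduler else dict(env)


if __name__ == "__main__":
    env = {"PORT": "8080"}  # hypothetical pre-existing controller env
    scheduler_env = {
        "CONFIG_whisk_etcd_hosts": "172.17.0.1:2379",  # hypothetical value
        "CONFIG_whisk_scheduler_maxPeek": "128",
    }
    print(build_env(env, scheduler_env, enable_scheduler=True))
```

Because the merge is conditional, a deployment without the scheduler receives none of the etcd or scheduler settings.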
15 changes: 15 additions & 0 deletions ansible/roles/invoker/tasks/deploy.yml
@@ -328,6 +328,21 @@
env: "{{ env | combine(mongodb_env) }}"
when: db.artifact_store.backend == "MongoDB"

- name: setup scheduler env
  set_fact:
    scheduler_env:
      "CONFIG_whisk_etcd_hosts": "{{ etcd_connect_string }}"
      "CONFIG_whisk_etcd_lease_timeout": "{{ etcd.lease.timeout }}"
      "CONFIG_whisk_etcd_pool_threads": "{{ etcd.pool_threads }}"
      "CONFIG_whisk_scheduler_dataManagementService_retryInterval": "{{ scheduler.dataManagementService.retryInterval }}"
      "CONFIG_whisk_invoker_containerCreation_maxPeek": "{{ invoker.container.creationMaxPeek }}"
  when: enable_scheduler

- name: merge scheduler env
  set_fact:
    env: "{{ env | combine(scheduler_env) }}"
  when: enable_scheduler

- name: include plugins
include_tasks: "{{ inv_item }}.yml"
with_items: "{{ invoker_plugins | default([]) }}"
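
The `CONFIG_...` variable names used in the scheduler env above encode a configuration path. The sketch below assumes the naming convention OpenWhisk's container start scripts are understood to apply, which is not shown in this commit: the `CONFIG_` prefix is dropped, `_` becomes `.`, camelCase segments become kebab-case, and PascalCase segments are left untouched. Treat it as an illustration of that assumed convention and verify it against your OpenWhisk version; `config_env_to_property` is a hypothetical name.

```python
import re


def config_env_to_property(name, prefix="CONFIG_"):
    """Map an env var such as CONFIG_whisk_etcd_hosts to a config path.

    Assumed convention (not taken from this commit): "_" -> ".",
    camelCase -> kebab-case, PascalCase segments unchanged.
    """
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    converted = []
    for part in name[len(prefix):].split("_"):
        if part[:1].isupper():
            converted.append(part)  # e.g. class-like names stay as they are
        else:
            converted.append(re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), part))
    return ".".join(converted)


if __name__ == "__main__":
    print(config_env_to_property("CONFIG_whisk_invoker_containerCreation_maxPeek"))
```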
24 changes: 24 additions & 0 deletions ansible/roles/schedulers/tasks/clean.yml
@@ -0,0 +1,24 @@
---
# Remove scheduler containers.

- name: get scheduler name
  set_fact:
    scheduler_name: "{{ name_prefix ~ host_group.index(inventory_hostname) }}"

- name: remove scheduler
  docker_container:
    name: "{{ scheduler_name }}"
    state: absent
  ignore_errors: "True"

- name: remove scheduler log directory
  file:
    path: "{{ whisk_logs_dir }}/{{ scheduler_name }}"
    state: absent
  become: "{{ logs.dir.become }}"

- name: remove scheduler conf directory
  file:
    path: "{{ scheduler.confdir }}/{{ scheduler_name }}"
    state: absent
  become: "{{ scheduler.dir.become }}"
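
The `get scheduler name` task above concatenates `name_prefix` with the position of the current host in `host_group`, and the later tasks derive the log and conf directories from that name. A small Python sketch of the same derivation follows. The sample prefix, host list, and directory values are hypothetical and are not defined in this file.

```python
def scheduler_name(name_prefix, host_group, inventory_hostname):
    """Equivalent of: name_prefix ~ host_group.index(inventory_hostname)."""
    return f"{name_prefix}{host_group.index(inventory_hostname)}"


def cleanup_paths(whisk_logs_dir, confdir, name):
    """Directories removed by the log and conf cleanup tasks."""
    return [f"{whisk_logs_dir}/{name}", f"{confdir}/{name}"]


if __name__ == "__main__":
    hosts = ["scheduler0", "scheduler1"]  # hypothetical inventory group
    name = scheduler_name("scheduler", hosts, "scheduler1")
    print(name)
    print(cleanup_paths("/var/tmp/wsklogs", "/tmp/wskconf/scheduler", name))
```

Since the index comes from the host's position in the group, reordering the inventory changes which container and directories a given host cleans up.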