Docs update (#3147)
* Minor fixes of existing docs (Billing, Notifications)

* (Issue #3059) 'Data access audit' docs and RN were added

* (Issue #3122) Docs 'Runs shifting in case of Insufficient capacity' were added

* (Issue #3131) System Jobs docs and RN were added

* 'Cluster run usage' docs and RN were added

* (Issue #3098) 'Runs cost layers in Billing' docs and RN were added

* (Issue #3098) 'Runs cost layers in Billing' docs minor update
NShaforostov authored Apr 24, 2023
1 parent 3940cba commit 865f3da
Showing 90 changed files with 456 additions and 32 deletions.
42 changes: 40 additions & 2 deletions docs/md/manual/11_Manage_Runs/11._Manage_Runs.md
@@ -12,6 +12,9 @@
- [Completed cluster runs](#completed-cluster-runs)
- [Run information page](#run-information-page)
- [General information](#general-information)
- [Nested runs](#nested-runs)
- [Cluster run usage](#cluster-run-usage)
- [Maintenance](#maintenance)
- [Instance](#instance)
- [Parameters](#parameters)
- [Tasks](#tasks)
@@ -169,14 +172,49 @@ This section displays general information about a run:

The **Nested runs** list is displayed only for master runs.
It is a list with short information about cluster child-runs:
![CP_ManageRuns](attachments/ManageRuns_45.png)

Each child-run record contains:

- State icons with help tooltips when hovering over them
- Pipeline name and version or docker image and version
- Run time duration

Similar to the parent-run state, the states of nested runs are automatically updated without page refreshing. To open any child-run's logs page, click its name in the list.

If there are several nested runs, only the first couple are displayed on the parent-run logs page.
To view all nested runs, click the corresponding hyperlink:
![CP_ManageRuns](attachments/ManageRuns_46.png)
The full list of nested runs for the selected parent-run will be opened, e.g.:
![CP_ManageRuns](attachments/ManageRuns_47.png)

##### Cluster run usage

The user can view the cluster usage on the parent-run logs page - near the **Nested runs** label, the number of nested runs that are active at the moment is displayed:
![CP_ManageRuns](attachments/ManageRuns_48.png)

> If the cluster run has completed, the total number of nested runs launched during the cluster run is displayed here instead, e.g.:
> ![CP_ManageRuns](attachments/ManageRuns_53.png)

Also, the user can view how the cluster usage has been changing during the run of this cluster.
This is especially useful for auto-scaled clusters, as the number of worker nodes in such clusters can vary greatly over time.

To view the cluster usage, click the corresponding hyperlink near the number of active nested runs:
![CP_ManageRuns](attachments/ManageRuns_49.png)
The chart pop-up will be opened, e.g.:
![CP_ManageRuns](attachments/ManageRuns_50.png)

The chart shows the cluster usage - the number of all active instances (including the master node) of the current cluster over time.

To view details, **hover** over a chart point, e.g.:
![CP_ManageRuns](attachments/ManageRuns_51.png)
In this case, summary info about the number of active cluster instances at that specific moment will be displayed.

To view exactly which runs were active in the cluster (including the master node) at a specific moment, **click** the chart point, e.g.:
![CP_ManageRuns](attachments/ManageRuns_52.png)
You can click any run ID in such a tooltip to open the corresponding run's logs page.

> Please note that the cluster usage chart is available for completed cluster runs as well.

#### Maintenance

@@ -276,6 +276,7 @@ Settings in this tab contain default Launch parameters:
| **`launch.container.cpu.resource`** | |
| **`launch.container.memory.resource.policy`** | |
| **`launch.container.memory.resource.request`** | |
| **`launch.insufficient.capacity.message`** | Defines the text displayed in the run logs when an instance requested by the user cannot be provided due to the `InsufficientInstanceCapacity` error (i.e. insufficient capacity in the selected Cloud Region) |
| **`launch.run.visibility`** | Allows viewing foreign runs based on pipeline permissions (value `INHERIT`) or restricting visibility of all non-owner runs (value `OWNER`) |
| **`launch.dind.enable`** | Enables Docker in Docker functionality |
| **`launch.dind.container.vars`** | Allows specifying the variables that will be passed to the DIND container (if they are set for the host environment) |
@@ -349,6 +350,16 @@ The settings in this tab contain parameters and actions that are performed depend
| **`system.external.services.endpoints`** | |
| **`system.log.line.limit`** | |

### System Jobs

Here settings for [System jobs](12.15._System_jobs.md) can be found:

| Setting name | Description |
|---|---|
| **`system.jobs.pipeline.id`** | The ID of the prepared system pipeline that contains system job scripts |
| **`system.jobs.scripts.location`** | The path to the system scripts directory inside the pipeline code. Default value is `src/system-jobs` |
| **`system.jobs.output.pipeline.task`** | The name of the task on the **Run logs** page of the system pipeline that is launched for the system job. This task contains the system job output results. Default value is `SystemJob` |

### User Interface

Here different user interface settings can be found:
43 changes: 43 additions & 0 deletions docs/md/manual/12_Manage_Settings/12.11._Advanced_features.md
@@ -3,6 +3,7 @@
- [Setup swap files for the Cloud VMs](#setup-swap-files-for-the-cloud-vms)
- [Home storage for each user](#home-storage-for-each-user)
- [Seamless authentication in the Cloud Provider](#seamless-authentication-in-cloud-provider)
- [Switching of Cloud Regions for launched jobs in case of insufficient capacity](#switching-of-cloud-regions-for-launched-jobs-in-case-of-insufficient-capacity)

> User shall have **ROLE\_ADMIN** to configure system-level settings.
@@ -332,3 +333,45 @@ In the example below, we will create the Profile with Read/Write access to `AWS`
![CP_AdvancedFeatures](attachments/AdvancedFeatures_37.png)
22. But if you try to access an existing object to which the policy (specified at step 7) doesn't allow access - you will be rejected, e.g.:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_38.png)

***

## Switching of Cloud Regions for launched jobs in case of insufficient capacity

> Please note that this functionality is currently available only for `AWS`.

If there are not enough instances of the specified type to launch a run in one region, Cloud Pipeline can automatically try to launch an identical instance in other region(s) of the same Cloud Provider.

This behaviour is defined by the special Cloud Region setting - "**Run shift policy**":
![CP_AdvancedFeatures](attachments/AdvancedFeatures_39.png)

The region switching procedure looks like this:

1. A user launches a job.
If during the run initialization an instance requested by the user cannot be provided due to the `InsufficientInstanceCapacity` error (which means the run failed with insufficient capacity in the selected region), the steps below will be performed:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_40.png)
**_Note_**: the displayed text for this error can be configured by the admin via the **`launch.insufficient.capacity.message`** System preference:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_47.png)
2. The possibility to switch from the current region is checked - the "**Run shift policy**" option shall have been enabled beforehand:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_41.png)
3. The possibility to switch to any vacant region of the same Cloud Provider is checked - the "**Run shift policy**" option shall have been enabled beforehand for the vacant region, e.g.:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_42.png)
4. The current run is automatically stopped. The `InsufficientInstanceCapacity` error is displayed on the **Run logs** page as the failure reason:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_43.png)
5. A new run is automatically launched in the vacant Cloud Region. You can view info about that new run in the tile on the **Run logs** page of the original run.
Also, the `RestartPipelineRun` task appears for the original run - its logs display the information about the run shifting as well:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_44.png)
6. On the **Run logs** page of the switched (new) run, there is also a link to the original run:
![CP_AdvancedFeatures](attachments/AdvancedFeatures_45.png)
7. If a new instance is not available in the new region either, steps 1-5 will be repeated in yet another region, as long as there are regions of the same Cloud Provider with the "**Run shift policy**" option enabled.

Restrictions of this feature:

- available only for on-demand runs
- available only for runs that do not have any Cloud-dependent parameters (a parameter is Cloud-dependent if, for example, it contains a storage path)
- not supported for worker or cluster runs

> For a run that does not meet these restrictions, in case of the `InsufficientInstanceCapacity` error, the original run will just be terminated during the initialization of the region shifting process.
> A new run in any other region will not be launched. The reason for the failure will be shown in the `RestartPipelineRun` task logs, e.g. for a cluster run shifting attempt:
> ![CP_AdvancedFeatures](attachments/AdvancedFeatures_46.png)
20 changes: 18 additions & 2 deletions docs/md/manual/12_Manage_Settings/12.12._System_logs.md
@@ -78,8 +78,24 @@ Multi-select is supported.

#### Type filter

To restrict the list of logs to a certain log message type(s), use the **Type** control.
You may select the desired type from the dropdown list. Multi-select is supported.

The following types can be found here:
![CP_SystemLogs](attachments/SystemLogs_10.png)

- `security` - logs related to security events, e.g. authentication in the Platform/services or access to objects (granting permissions):
![CP_SystemLogs](attachments/SystemLogs_9.png)
- `audit` - logs related to any access to the data stored in the object storages. All operations (_READ_/_WRITE_/_DELETE_) except listing are logged (see the example commands after this list).
The following sources are logged:
    - data access operations from the Platform GUI. These logs are accumulated from the `api-srv` or `gui` services. Example operations: via the GUI in an Object storage, a user opened a file preview, created a new file, deleted a file, etc.:
    ![CP_SystemLogs](attachments/SystemLogs_11.png)
    - data access operations from the `pipe` CLI. These logs are accumulated from the `pipe-cli` service. Example operations: via the `pipe` CLI in a console or web SSH terminal, a user opened the content of a file from an Object storage, copied/moved a file to an Object storage, etc.:
    ![CP_SystemLogs](attachments/SystemLogs_12.png)
    - data access operations performed in mounted Object storages - all Object storages that are mounted via the Platform for use in runs (the `~/cloud-data` folder in each run) or storages that were mounted manually by the user via the [`pipe storage mount`](../14_CLI/14.3._Manage_Storage_via_CLI.md#mount-a-storage) command. These logs are accumulated from the `pipe-mount` service. Example operations: a user mounted an Object storage as a folder and uploaded a file into the mounted folder, a user opened the web SSH terminal and read the content of a file from one of the Object storages mounted into the `~/cloud-data` folder, etc.:
    ![CP_SystemLogs](attachments/SystemLogs_13.png)
- `storage lifecycle` - logs related to management events with [storage lifecycle rules](../08_Manage_Data_Storage/8.10._Storage_lifecycle.md#create-transition-rule) (_CREATE_/_EDIT_/_DELETE_ operations for rules), e.g.:
![CP_SystemLogs](attachments/SystemLogs_14.png)
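
For illustration, here are a few hypothetical `pipe` CLI data operations of the kind that produce `audit` records (the storage name `my-bucket` and the file name are made up; the commands themselves are standard `pipe storage` operations):

```bash
# Hypothetical examples: each data access operation below would produce
# an "audit" log record; listing (e.g. via "pipe storage ls") would not.
pipe storage cp ./results.csv s3://my-bucket/results.csv   # WRITE - upload a file
pipe storage cp s3://my-bucket/results.csv ./results.csv   # READ - download a file
pipe storage rm s3://my-bucket/results.csv                 # DELETE - remove a file
```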

#### Show service account events

129 changes: 129 additions & 0 deletions docs/md/manual/12_Manage_Settings/12.15._System_jobs.md
@@ -0,0 +1,129 @@
# 12.15. System jobs

> User shall have **ROLE\_ADMIN** for any access to the System jobs.

- [Overview](#overview)
- [Configuration](#configuration)
- [Create system job](#create-a-new-system-job)
- [System jobs panel](#system-jobs-panel)
- [Run system job and view results](#run-a-script-and-view-results)

System jobs allow admins to create and easily launch system scripts for different needs:

- to get system statistics or system information about the current Platform state (for example, collect information about all storages that have a specific size, list all unattached EBS volumes, set some s3 bucket policy for all storages, etc.) - see the sketch after this list
- to create automation scripts - with the help of `kubectl`, the `pipe` CLI, the Cloud Pipeline API, or Cloud CLIs
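
As an example of the first bullet's "list all unattached EBS volumes" case, such a script could boil down to a single AWS CLI call - a minimal sketch, assuming the AWS Cloud CLI is available in the job's container:

```bash
# Hypothetical one-liner for a system job that lists all unattached
# (i.e. "available") EBS volumes in the current AWS region
aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].{ID:VolumeId,Size:Size,Created:CreateTime}' \
    --output table
```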

## Overview

The System jobs solution uses the existing Cloud Pipeline infrastructure to reduce the number of preparation steps needed to get the desired output.
In a nutshell, the approach to performing System jobs is the following:

1. There are:
    - a prepared system pipeline that contains system job scripts. The admin can add new scripts or edit/delete existing ones. Also, the pipeline config contains:
        - a `Kubernetes` service account to perform `kubectl` commands from such a pipeline during the system job run
        - a special assign policy that allows assigning the pipeline to one of the running system nodes (the `MASTER` node, for example). This is convenient as no additional instances (waiting or initializing ones) are required to perform a job
    - a prepared special docker image that includes pre-installed packages such as system packages (`curl`, `nano`, `git`, etc.), `kubectl`, the `pipe` CLI, Cloud CLIs (`AWS`/`Azure`/`GCP`), and the `LustreFS` client
2. When the admin launches a system job, a system instance for performing the system job pipeline is found according to the specified assign policy (the `MASTER` instance, by default)
3. On the selected system instance, a docker container is launched from the special docker image for system jobs
4. In the launched docker container, the system job script is performed

### Configuration

The following System Preferences are currently used to configure the System jobs behaviour:

- **`system.jobs.pipeline.id`** - the ID of the prepared system pipeline that contains system job scripts
- **`system.jobs.scripts.location`** - the path to the system scripts directory inside the pipeline repo. Default value is `src/system-jobs`
- **`system.jobs.output.pipeline.task`** - the name of the task on the **Run logs** page of the system pipeline that is launched for the system job. This task contains the system job output results. Default value is `SystemJob`

For example:
![CP_SystemJobs](attachments/SystemJobs_16.png)
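
For reference, a hypothetical set of values for these preferences (the pipeline ID `12345` is made up; the other two are the defaults named above) could look like:

```
system.jobs.pipeline.id = 12345
system.jobs.scripts.location = src/system-jobs
system.jobs.output.pipeline.task = SystemJob
```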

### Create a new System job

To create a new System job, admin shall:

1. Open the pipeline defined in the [**`system.jobs.pipeline.id`**](#configuration) System Preference, e.g.:
![CP_SystemJobs](attachments/SystemJobs_17.png)
2. Open the **CODE** tab inside the pipeline:
![CP_SystemJobs](attachments/SystemJobs_18.png)
Navigate to the folder defined as the folder for system job scripts (specified via the [**`system.jobs.scripts.location`**](#configuration) System Preference)
3. In the opened folder, you can view previously created scripts:
![CP_SystemJobs](attachments/SystemJobs_19.png)
4. Add a new system script - you can create it manually or upload an existing one from the local workstation.
We will create a new script manually.
Click the "**+ NEW FILE**" button:
![CP_SystemJobs](attachments/SystemJobs_20.png)
5. In the appeared pop-up, specify a new script name and, _optionally_, a commit message for the pipeline changes.
For our example, we will create a simple bash script that lists `s3` object storages using the AWS Cloud CLI, or outputs the storage content if a storage name was specified as a parameter (see a sketch of such a script after this list). So, the script will be called `storages_listing`:
![CP_SystemJobs](attachments/SystemJobs_21.png)
Click the **OK** button to confirm.
6. Click the just-created file to edit it:
![CP_SystemJobs](attachments/SystemJobs_22.png)
7. In the appeared pop-up, specify the script itself and save changes:
![CP_SystemJobs](attachments/SystemJobs_23.png)
Specify a commit message, e.g.:
![CP_SystemJobs](attachments/SystemJobs_24.png)
8. The new system script is created:
![CP_SystemJobs](attachments/SystemJobs_25.png)
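
A minimal sketch of what such a `storages_listing` script might look like (an illustration only - the actual script content is whatever the admin commits; it assumes the AWS Cloud CLI pre-installed in the system jobs docker image):

```bash
#!/bin/bash
# Hypothetical sketch of the "storages_listing" system job script.
# Without arguments, it lists all s3 object storages;
# with a storage name as the first argument, it lists that storage's content.
if [ -z "$1" ]; then
    aws s3 ls             # list all buckets available to the credentials
else
    aws s3 ls "s3://$1"   # list the content of the given bucket
fi
```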

## System jobs panel

The entry point for using the existing System job scripts is the **System jobs** subtab of the **SYSTEM MANAGEMENT** tab in the System settings:
![CP_SystemJobs](attachments/SystemJobs_01.png)

Here, the admin can view the whole list of stored system scripts, select any script and launch it, or observe the script's runs history.

This panel contains:
![CP_SystemJobs](attachments/SystemJobs_02.png)

- **a** - the list of all stored scripts. Click any to select it
- **b** - the selected script's name
- **c** - the runs history of the selected script
- **d** - the button to refresh the runs history
- **e** - the button to launch the script:
    - click the button itself to launch the script as is
    - click the **v** button near it to launch the script with specified parameters:
    ![CP_SystemJobs](attachments/SystemJobs_03.png)
- **f** - the button to view the output logs of a specific script launch
- **g** - the button to view the details of the run that was used for the script launch

### Run a script and view results

> For our example, we will use the simple bash script created in the section [above](#create-a-new-system-job).

To run a script from the **System jobs** panel:

1. Click the script in the list.
2. Click the **LAUNCH** button to perform the script as is (without parameters), e.g.:
![CP_SystemJobs](attachments/SystemJobs_04.png)
3. The just-launched script run will appear in the runs history:
![CP_SystemJobs](attachments/SystemJobs_05.png)
Job states are similar to [pipeline states](../06_Manage_Pipeline/6._Manage_Pipeline.md#pipeline-runs-states).
4. Once the script is performed, the state will change to **Success**:
![CP_SystemJobs](attachments/SystemJobs_06.png)
5. Click the script run's row or the **LOG** button to view the script output:
![CP_SystemJobs](attachments/SystemJobs_07.png)
6. The script logs will appear:
![CP_SystemJobs](attachments/SystemJobs_08.png)
7. If needed, you may download these logs as a text file by clicking the corresponding button - **DOWNLOAD**:
![CP_SystemJobs](attachments/SystemJobs_09.png)

To run a script with parameters:

1. Click the script in the list.
2. Click the **v** button near **LAUNCH** and select the **Launch with parameters** item, e.g.:
![CP_SystemJobs](attachments/SystemJobs_10.png)
3. In the appeared form, specify parameters for the script separated by spaces (in the format `<parameter_1> <parameter_2> ...`, see the note on parameter passing after this list) and click the **LAUNCH** button:
![CP_SystemJobs](attachments/SystemJobs_11.png)
4. The just-launched script run will appear in the runs history:
![CP_SystemJobs](attachments/SystemJobs_12.png)
5. The script logs can be viewed in the same way as described in the example above.
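
> **_Note_**: parameters specified in the launch form are passed to the script as ordinary positional arguments. For the hypothetical `storages_listing` sketch above, launching it with a single parameter is equivalent to running:

```bash
# "my-bucket" is a made-up storage name passed via "Launch with parameters";
# inside the script it is available as "$1"
bash storages_listing my-bucket
```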

To view the details of the run that was used for the script launch:

1. Click the **DETAILS** button in the script run's row:
![CP_SystemJobs](attachments/SystemJobs_13.png)
2. The **Run logs** page will be opened:
![CP_SystemJobs](attachments/SystemJobs_14.png)
3. Here, you can also view the system job results - click the `SystemJob` task (the default name; it can be changed via the [**`system.jobs.output.pipeline.task`**](#configuration) System Preference):
![CP_SystemJobs](attachments/SystemJobs_15.png)