Docs: Improve capitalization, style, and formatting in metrics (arangodb#18385)

* Docs: Improve capitalization, style, and formatting in metrics

* Spelling in DocuBlocks

* Apply suggestions from code review

* Update Documentation/Metrics/arangodb_scheduler_queue_time_violations_total.yaml

Co-authored-by: Jan <jsteemann@users.noreply.github.com>

---------

Co-authored-by: Paula Mihu <97217318+nerpaula@users.noreply.github.com>
Co-authored-by: Jan <jsteemann@users.noreply.github.com>
Co-authored-by: Vadim <vadim@arangodb.com>
4 people authored Mar 27, 2023
1 parent 35e3037 commit e932e80
Showing 116 changed files with 199 additions and 189 deletions.
@@ -15,11 +15,11 @@ used only in the context of server monitoring.
@RESTRETURNCODE{200}
This API will return HTTP 200 in case the server is up and running and usable for
arbitrary operations, is not set to read-only mode and is currently not a follower
in case of an active failover setup.
in case of an Active Failover deployment setup.

@RESTRETURNCODE{503}
HTTP 503 will be returned in case the server is during startup or during shutdown,
is set to read-only mode or is currently a follower in an active failover setup.
is set to read-only mode or is currently a follower in an Active Failover deployment setup.
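The 200/503 rule above can be sketched as follows. This is a minimal illustration; the function name and the boolean flags are assumptions for the sake of the example, not the server's actual implementation:

```python
# Hypothetical sketch of the availability decision described above.
def availability_status(starting_up, shutting_down, read_only, is_failover_follower):
    """Return the HTTP status code the availability API would report."""
    if starting_up or shutting_down or read_only or is_failover_follower:
        return 503  # temporarily not usable for arbitrary operations
    return 200
```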

In addition, HTTP 503 will be returned in case the fill grade of the scheduler
queue exceeds the configured high-water mark (adjustable via startup option
@@ -62,8 +62,20 @@ method other than `POST`, then an *HTTP 405 METHOD NOT ALLOWED* is returned.
};
@END_EXAMPLE_ARANGOSH_RUN

The result consists of a `list` object of hot backups by their `id`, where `id` uniquely identifies a specific hot backup, `version` depicts the version of ArangoDB, which was used to create any individual hot backup and `datetime` displays the time of creation of the hot backup. Further parameters are the size of the backup in bytes as `sizeInBytes`, the number of individual data files as `nrFiles`, the number of db servers at time of creation as `nrDBServers`, the number of backup parts, which are found on the currently reachable db servers as `nrPiecesPresent`. If the backup was created allowing inconsistences, it is so denoted as `potentiallyInconsistent`. The `available` boolean parameter is tightly connected to the backup to be present and ready to be restored on all db servers. It is `true` except, when the number of db servers currently reachable does not match to the number of db servers listed in the backup.
Should the backup be encrypted the sha256 hashes of the user secrets are published here. This will allow you to use the correct
user secret for the encryption-at-rest feature to be able to restore the backup.
The result consists of a `list` object of hot backups by their `id`, where `id`
uniquely identifies a specific hot backup, `version` depicts the version of
ArangoDB, which was used to create any individual hot backup and `datetime`
displays the time of creation of the hot backup. Further parameters are the size
of the backup in bytes as `sizeInBytes`, the number of individual data files as
`nrFiles`, the number of DB-Servers at time of creation as `nrDBServers`, the
number of backup parts, which are found on the currently reachable DB-Servers as
`nrPiecesPresent`. If the backup was created allowing inconsistencies, it is so
denoted as `potentiallyInconsistent`. The `available` boolean parameter is
tightly connected to the backup to be present and ready to be restored on all
DB-Servers. It is `true` except when the number of DB-Servers currently
reachable does not match the number of DB-Servers listed in the backup.
Should the backup be encrypted, the SHA-256 hashes of the user secrets are
published here. This allows you to use the correct user secret for the
encryption at rest feature to be able to restore the backup.
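Because the published values are plain SHA-256 digests, selecting the right user secret for a restore can be sketched like this. The function and argument names are illustrative, not part of the API:

```python
import hashlib

# Hypothetical helper: given the SHA-256 hex digests published for an
# encrypted hot backup, find which locally known user secret was used.
def find_matching_secret(published_hashes, candidate_secrets):
    """Return the first secret whose SHA-256 digest is in published_hashes."""
    for secret in candidate_secrets:
        digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
        if digest in published_hashes:
            return secret
    return None
```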

@endDocuBlock
@@ -17,8 +17,8 @@ Queries the statistics of the given DB-Server
is returned when everything went well.

@RESTRETURNCODE{400}
the parameter DBserver was not given or is not the ID of a DB-Server
The `DBserver` parameter was not specified or is not the ID of a DB-Server.

@RESTRETURNCODE{403}
server is not a DB-Server.
The specified server is not a DB-Server.

@endDocuBlock
@@ -96,8 +96,8 @@ Please note that keys are only guaranteed to be truly ascending in single
server deployments and for collections that only have a single shard (that includes
collections in a OneShard database).
The reason is that for collections with more than a single shard, document keys
are generated on coordinator(s). For collections with a single shard, the document
keys are generated on the leader DB server, which has full control over the key
are generated on Coordinator(s). For collections with a single shard, the document
keys are generated on the leader DB-Server, which has full control over the key
sequence.
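The reason can be illustrated with a toy interleaving (all numbers invented): each Coordinator hands out keys from its own ascending sequence, so the merged order in which documents reach the collection need not be globally ascending.

```python
# Toy illustration (invented values): each Coordinator's keys ascend,
# but one possible arrival order of the merged stream does not.
coordinator_a = [100, 105, 110]   # keys handed out by Coordinator A
coordinator_b = [102, 103, 120]   # keys handed out by Coordinator B
merged_arrival = [100, 102, 105, 103, 110, 120]  # one possible interleaving

is_globally_ascending = all(
    earlier < later
    for earlier, later in zip(merged_arrival, merged_arrival[1:])
)
# False here: key 105 arrives before the smaller key 103.
```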

@RESTSTRUCT{allowUserKeys,post_api_collection_opts,boolean,required,}
@@ -14,8 +14,8 @@ The id of the batch.
@RESTDESCRIPTION
Deletes the existing dump batch, allowing compaction and cleanup to resume.

**Note**: on a Coordinator, this request must have the query parameter
*DBserver* which must be an ID of a DB-Server.
**Note**: on a Coordinator, this request must have a `DBserver`
query parameter which must be an ID of a DB-Server.
The very same request is forwarded synchronously to that DB-Server.
It is an error if this attribute is not bound in the Coordinator case.

@@ -27,8 +27,8 @@ The response is a JSON object with the following attributes:
- *state*: additional leader state information (only present if the
`state` URL parameter was set to `true` in the request)

**Note**: on a Coordinator, this request must have the query parameter
*DBserver* which must be an ID of a DB-Server.
**Note**: on a Coordinator, this request must have a `DBserver`
query parameter which must be an ID of a DB-Server.
The very same request is forwarded synchronously to that DB-Server.
It is an error if this attribute is not bound in the Coordinator case.

@@ -38,7 +38,7 @@ It is an error if this attribute is not bound in the Coordinator case.
is returned if the batch was created successfully.

@RESTRETURNCODE{400}
is returned if the ttl value is invalid or if *DBserver* attribute
is returned if the TTL value is invalid or if the `DBserver` attribute
is not specified or illegal on a Coordinator.

@RESTRETURNCODE{405}
@@ -89,8 +89,8 @@ server, the following additional steps need to be carried out:
response will be empty and clients can go to sleep for a while and try again
later.

**Note**: on a Coordinator, this request must have the query parameter
*DBserver* which must be an ID of a DB-Server.
**Note**: on a Coordinator, this request must have a `DBserver`
query parameter which must be an ID of a DB-Server.
The very same request is forwarded synchronously to that DB-Server.
It is an error if this attribute is not bound in the Coordinator case.

@@ -20,8 +20,8 @@ the provided ttl value.

If the batch's ttl can be extended successfully, the response is empty.

**Note**: on a Coordinator, this request must have the query parameter
*DBserver* which must be an ID of a DB-Server.
**Note**: on a Coordinator, this request must have a `DBserver`
query parameter which must be an ID of a DB-Server.
The very same request is forwarded synchronously to that DB-Server.
It is an error if this attribute is not bound in the Coordinator case.

@@ -16,10 +16,10 @@ description: |
has registered.
This metric was named `arangodb_agency_cache_callback_count` in
previous versions of ArangoDB.
Note that on single servers this metrics will only have a non-zero value
in "active failover" deployment mode.
Note that on single servers this metric only has a non-zero value
in the Active Failover deployment mode.
threshold: |
This number will usually be very low, something like 2 or 3.
This number is usually very low, something like `2` or `3`.
troubleshoot: |
If this number is considerably higher, this should be investigated.
Please contact support.
6 changes: 3 additions & 3 deletions Documentation/Metrics/arangodb_agency_callback_number.yaml
@@ -16,10 +16,10 @@ description: |
registered, including Agency cache callbacks.
This metric was named `arangodb_agency_callback_count` in previous versions
of ArangoDB.
Note that on single servers this metrics will only have a non-zero value
in "active failover" deployment mode.
Note that on single servers this metric only has a non-zero value
in the Active Failover deployment mode.
threshold: |
This number will usually be very low, something like 2 or 3.
This number is usually very low, something like `2` or `3`.
troubleshoot: |
If this number is considerably higher, this should be investigated.
Please contact support.
@@ -14,5 +14,5 @@ exposedBy:
description: |
This metric was named `arangodb_agency_callback_registered` in previous versions
of ArangoDB.
Note that on single servers this metrics will only have a non-zero value
in "active failover" deployment mode.
Note that on single servers this metric only has a non-zero value
in the Active Failover deployment mode.
6 changes: 3 additions & 3 deletions Documentation/Metrics/arangodb_agency_log_size_bytes.yaml
@@ -10,8 +10,8 @@ exposedBy:
- agent
description: |
Size of the Agency's in-memory part of replicated log in bytes.
The replicated log will grow in memory until a certain number of
log entries have been accumulated. Then the in-memory log will
be compacted. The number of in-memory log entries to keep before
The replicated log grows in memory until a certain number of
log entries have been accumulated. Then the in-memory log is
compacted. The number of in-memory log entries to keep before
log compaction kicks in can be controlled via the startup option
`--agency.compaction-keep-size`.
2 changes: 1 addition & 1 deletion Documentation/Metrics/arangodb_agency_read_ok_total.yaml
@@ -13,7 +13,7 @@ description: |
Number of Agency read operations which were successful (i.e. completed
without any error). Successful reads can only be executed on the leader, so
this metric is supposed to increase only on Agency leaders, but not on
followers. Read requests that are executed on followers will be rejected
followers. Read requests that are executed on followers are rejected
and can be tracked via the metric `arangodb_agency_read_no_leader_total`.
This metric was named `arangodb_agency_read_ok` in previous
versions of ArangoDB.
@@ -10,6 +10,6 @@ exposedBy:
- agent
description: |
Agency supervision replication time histogram. Whenever the Agency
supervision carries out changes, it will write them to the leader's log
and replicate the changes to followers. This metric provides a histogram
supervision carries out changes, it writes them to the leader's log
and replicates the changes to followers. This metric provides a histogram
of the time it took to replicate the supervision changes to followers.
2 changes: 1 addition & 1 deletion Documentation/Metrics/arangodb_agency_write_ok_total.yaml
@@ -13,7 +13,7 @@ description: |
Number of Agency write operations which were successful (i.e. completed
without any error). Successful writes can only be executed on the leader, so
this metric is supposed to increase only on Agency leaders, but not on
followers. Write requests that are executed on followers will be rejected
followers. Write requests that are executed on followers are rejected
and can be tracked via the metric `arangodb_agency_write_no_leader_total`.
This metric was named `arangodb_agency_write_ok` in previous
versions of ArangoDB.
6 changes: 3 additions & 3 deletions Documentation/Metrics/arangodb_aql_global_memory_usage.yaml
@@ -14,9 +14,9 @@ exposedBy:
description: |
Total memory usage of all AQL queries currently executing.
The granularity of this metric is steps of 32768 bytes. The current
memory usage of all AQL queries will be compared against the configured
memory usage of all AQL queries is compared against the configured
limit in the `--query.global-memory-limit` startup option.
If the startup option has a value of `0`, then no global memory limit
will be enforced. If the startup option has a non-zero value, queries
will be aborted once the total query memory usage goes above the configured
is enforced. If the startup option has a non-zero value, queries
are aborted once the total query memory usage goes above the configured
limit.
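The enforcement rule in this description can be sketched as follows (names are assumed; the real accounting lives inside the server):

```python
# Sketch of the global AQL memory limit check described above: the summed
# memory usage of all running queries is compared against the configured
# limit, and a limit of 0 disables the check entirely.
def exceeds_global_limit(total_usage_bytes, global_limit_bytes):
    """True if queries must start aborting with 'resource limit exceeded'."""
    return global_limit_bytes != 0 and total_usage_bytes > global_limit_bytes
```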
@@ -15,5 +15,5 @@ description: |
Total number of times the global query memory limit threshold was reached.
This can happen if all running AQL queries in total try to use more memory than
configured via the `--query.global-memory-limit` startup option.
Every time this counter will increase, an AQL query will have aborted with a
Every time this counter increases, an AQL query has aborted with a
"resource limit exceeded" error.
@@ -16,5 +16,5 @@ description: |
a single query tried to allocate more memory than configured in the query's
`memoryLimit` attribute or the value configured via the startup option
`--query.memory-limit`.
Every time this counter will increase, an AQL query will have aborted with a
Every time this counter increases, an AQL query has aborted with a
"resource limit exceeded" error.
2 changes: 1 addition & 1 deletion Documentation/Metrics/arangodb_aql_slow_query_time.yaml
@@ -15,4 +15,4 @@ description: |
Execution time histogram for slow AQL queries, in seconds.
Queries are considered "slow" if their execution time is above the
threshold configured in the startup options `--query.slow-threshold`
or `--query.slow-streaming-threshold`, resp.
or `--query.slow-streaming-threshold`, respectively.
@@ -13,11 +13,11 @@ exposedBy:
- single
description: |
Total amount of time it took to acquire collection/shard locks for
write operations, summed up for all collections/shards. Will not be increased
write operations, summed up for all collections/shards. Does not increase
for any read operations.
The value is measured in microseconds.
troubleshoot: |
In case this value is considered too high, check if there are AQL queries
or transactions that use exclusive locks on collections, and try to reduce them.
Operations using exclusive locks may lock out other queries/transactions temporarily,
which will lead to an increase in lock acquisition time.
which leads to an increase in lock acquisition time.
@@ -11,11 +11,11 @@ exposedBy:
- agent
- single
description: |
Histogram of the collection/shard lock acquisition times. Locks will be acquired for
Histogram of the collection/shard lock acquisition times. Locks are acquired for
all write operations, but not for read operations.
The values here are measured in seconds.
troubleshoot: |
In case these values are considered too high, check if there are AQL queries
or transactions that use exclusive locks on collections, and try to reduce them.
Operations using exclusive locks may lock out other queries/transactions temporarily,
which will lead to an increase in lock acquisition times.
which leads to an increase in lock acquisition times.
@@ -11,14 +11,14 @@ exposedBy:
- coordinator
description: |
Number of transactions using sequential locking of collections to avoid deadlocking.
By default, a Coordinator will try to lock all shards of a collection in parallel.
By default, a Coordinator tries to lock all shards of a collection in parallel.
This approach is normally fast but can cause deadlocks with other transactions that
lock the same shards in a different order. In case such a deadlock is detected, the
Coordinator will abort the lock round and start a new one that locks all shards in
sequential order. This will avoid deadlocks, but has a higher setup overhead.
Coordinator aborts the lock round and starts a new one that locks all shards in
sequential order. This avoids deadlocks, but has a higher setup overhead.
troubleshoot: |
In case this value is increasing, check if there are AQL queries or transactions that
use exclusive locks on collections, and try to reduce them.
Operations using exclusive locks may lock out other queries/transactions temporarily,
which will lead can lead to (temporary) deadlocks in case the queries/transactions
which can lead to (temporary) deadlocks in case the queries/transactions
are run on multiple shards on different servers.
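The ordered-locking idea behind this metric can be illustrated as follows. Sorted order stands in for whatever deterministic global order the Coordinator actually uses; the point is that if every transaction acquires shard locks in the same order, no cycle of transactions waiting on each other's locks can form:

```python
# Sketch of deadlock avoidance via sequential (ordered) locking.
def sequential_lock_order(shard_ids):
    """Return the shards in the fixed order they should be locked in."""
    return sorted(shard_ids)
```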
@@ -13,7 +13,7 @@ exposedBy:
- single
description: |
Number of timeouts when trying to acquire collection exclusive locks.
This counter will be increased whenever an exclusive collection lock
This counter increases whenever an exclusive collection lock
cannot be acquired within the configured lock timeout.
troubleshoot: |
In case this value is considered too high, check if there are AQL queries
@@ -13,7 +13,7 @@ exposedBy:
- single
description: |
Number of timeouts when trying to acquire collection write locks.
This counter will be increased whenever a collection write lock
This counter increases whenever a collection write lock
cannot be acquired within the configured lock timeout.
This can only happen if writes on a collection are locked out by
other operations on the collection that use an exclusive lock. Writes
@@ -17,6 +17,6 @@ description: |
`ClusterComm`.
threshold: |
Because of idle timeouts, the total number of connections ever created
will grow. However, under high load, most connections should usually
grows. However, under high load, most connections should usually
be reused and a fast growth of this number can indicate underlying
connectivity issues.
@@ -13,7 +13,6 @@ exposedBy:
- single
description: |
Total number of read-only transactions, which allow for dirty reads
(read from followers). This metric will only be collected for
(read from followers). This metric is only collected for
transactions on Coordinators in a cluster. Other instances may expose
the value as 0.
the value as `0`.
2 changes: 1 addition & 1 deletion Documentation/Metrics/arangodb_flush_subscriptions.yaml
@@ -12,5 +12,5 @@ exposedBy:
- single
description: |
This metric exposes the number of currently active flush subscriptions.
Flush subscriptions can be created by arangosearch links and by background
Flush subscriptions can be created by `arangosearch` View links and by background
index creation.
@@ -16,7 +16,7 @@ description: |
Servers in a cluster periodically send their heartbeats to
the Agency to report their own liveliness. This counter gets
increased whenever sending such a heartbeat fails. In the single
server, this counter is only used in active failover mode.
server, this counter is only used in the Active Failover deployment mode.
threshold: |
It is a bad sign for health if heartbeat transmissions fail. This can
lead to failover actions which are ultimately bad for the service.
3 changes: 2 additions & 1 deletion Documentation/Metrics/arangodb_heartbeat_send_time_msec.yaml
@@ -13,7 +13,8 @@ exposedBy:
description: |
Histogram of times required to send heartbeats. For every heartbeat
sent the time is measured and an event is put into the histogram.
In the single server, this counter is only used in active failover mode.
In the single server, this counter is only used in the Active Failover
deployment mode.
threshold: |
It is a bad sign for health if heartbeat transmissions are not fast.
If there are heartbeats which frequently take longer than a few hundred
@@ -10,8 +10,8 @@ complexity: simple
exposedBy:
- dbserver
description: |
Database servers execute reconciliation actions to let the cluster converge
DB-Servers execute reconciliation actions to let the cluster converge
to the desired state. Actions are created, registered, queued and executed.
Once they are done they will eventually be removed.
Once they are done, they are eventually removed.
This metric counts the number of actions that are done and have been removed.
@@ -10,9 +10,9 @@ complexity: advanced
exposedBy:
- dbserver
description: |
Database servers execute reconciliation actions to let the cluster converge
DB-Servers execute reconciliation actions to let the cluster converge
to the desired state. Actions are created, registered, queued and executed.
Once they are done they will eventually be removed.
Once they are done, they are eventually removed.
This metric counts the number of actions that have been created but found to
be a duplicate of an already queued action.
@@ -10,9 +10,9 @@ complexity: simple
exposedBy:
- dbserver
description: |
Database servers execute reconciliation actions to let the cluster converge
DB-Servers execute reconciliation actions to let the cluster converge
to the desired state. Actions are created, registered, queued and executed.
Once they are done they will eventually be removed.
Once they are done, they are eventually removed.
Those actions can fail for different reasons. This metric counts the failed
actions and can thus provide hints to investigate a malfunction.