Small fixes for metrics #13807

Merged: 7 commits, Mar 25, 2021
2 changes: 2 additions & 0 deletions CHANGELOG
@@ -1,6 +1,8 @@
devel
-----

+* Fix shortName labels in metrics, in particular for agents.
+
* Fix a race in LogAppender::haveAppenders.
`haveAppenders` is called as part of audit logging. It accesses internal maps
but previously did not hold a lock while doing so.
6 changes: 3 additions & 3 deletions Documentation/Metrics/allMetrics.yaml
@@ -655,7 +655,7 @@
temporarily, \nwhich will lead to an increase in lock acquisition times.\n"
type: histogram
unit: s
-- category: Transaction
+- category: Transactions
complexity: advanced
description: 'Number of transactions using sequential locking of collections to
avoid deadlocking.
@@ -687,7 +687,7 @@
run on multiple shards on different servers.\n"
type: counter
unit: number
-- category: Transaction
+- category: Transactions
complexity: medium
description: 'Number of timeouts when trying to acquire collection exclusive locks.

@@ -711,7 +711,7 @@
for the same locks.\n"
type: counter
unit: number
-- category: Transaction
+- category: Transactions
complexity: medium
description: "Number of timeouts when trying to acquire collection write locks.\nThis
counter will be increased whenever a collection write lock\ncannot be acquired
@@ -3,7 +3,7 @@ help: |
Number of transactions using sequential locking of collections to avoid deadlocking.
unit: number
type: counter
-category: Transaction
+category: Transactions
complexity: advanced
exposedBy:
- coordinator
@@ -3,7 +3,7 @@ help: |
Number of timeouts when trying to acquire collection exclusive locks.
unit: number
type: counter
-category: Transaction
+category: Transactions
complexity: medium
exposedBy:
- dbserver
@@ -3,7 +3,7 @@ help: |
Number of timeouts when trying to acquire collection write locks.
unit: number
type: counter
-category: Transaction
+category: Transactions
complexity: medium
exposedBy:
- dbserver
1 change: 1 addition & 0 deletions arangod/Agency/State.cpp
@@ -1049,6 +1049,7 @@ bool State::loadOrPersistConfiguration() {
}
}
_agent->id(uuid);
+ServerState::instance()->setId(uuid);

auto ctx = std::make_shared<transaction::StandaloneContext>(*_vocbase);
SingleCollectionTransaction trx(ctx, "configuration", AccessMode::Type::WRITE);
10 changes: 8 additions & 2 deletions arangod/Cluster/ServerState.cpp
@@ -789,10 +789,16 @@ bool ServerState::registerAtAgencyPhase1(AgencyComm& comm, ServerState::RoleEnum
}

 std::string ServerState::getShortName() const {
+  if (_role == ROLE_AGENT) {
+    return getId().substr(0, 13);
+  }
   std::stringstream ss;  // ShortName
   auto num = getShortId();
-  size_t width = std::max(std::to_string(num + 1).size(), static_cast<size_t>(4));
-  ss << roleToAgencyKey(getRole()) << std::setw(width) << std::setfill('0') << num + 1;
+  if (num == 0) {
+    return std::string{};  // not yet known
+  }
+  size_t width = std::max(std::to_string(num).size(), static_cast<size_t>(4));
+  ss << roleToAgencyKey(getRole()) << std::setw(width) << std::setfill('0') << num;
   return ss.str();
}
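
For readers following the patched `getShortName` logic, here is a rough Python rendering of it. This is a sketch for illustration, not the ArangoDB implementation: the function `short_name`, its parameters, and the example role key and id values are hypothetical stand-ins for `roleToAgencyKey(getRole())`, `getShortId()` and `getId()`.

```python
def short_name(role_key: str, short_id: int,
               server_id: str = "", is_agent: bool = False) -> str:
    """Sketch of the patched logic (all names here are illustrative)."""
    if is_agent:
        # Agents take the first 13 characters of their id.
        return server_id[:13]
    if short_id == 0:
        # 0 means "not yet known"; an empty string tells the caller to retry later.
        return ""
    # Pad to at least 4 digits, but never truncate a longer number.
    width = max(len(str(short_id)), 4)
    return f"{role_key}{short_id:0{width}d}"


print(short_name("Coordinator", 1))      # Coordinator0001
print(short_name("Coordinator", 12345))  # Coordinator12345
print(repr(short_name("Coordinator", 0)))  # ''
```

The empty-string return for an unknown short id is what lets the metrics code distinguish "not registered yet" from a real name.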

11 changes: 9 additions & 2 deletions arangod/RestServer/MetricsFeature.cpp
@@ -456,8 +456,15 @@ void MetricsFeature::toPrometheus(std::string& result, bool v2) const {

   std::lock_guard<std::recursive_mutex> guard(_lock);
   if (_globalLabels.find("shortname") == _globalLabels.end()) {
-    _globalLabels.try_emplace("shortname", ServerState::instance()->getShortName());
-    changed = true;
+    std::string shortName = ServerState::instance()->getShortName();
+    // Very early after a server start it is possible that the
+    // short name is not yet known. This check here is to prevent
+    // that the label is permanently empty if metrics are requested
+    // too early.
+    if (!shortName.empty()) {
+      _globalLabels.try_emplace("shortname", shortName);
+      changed = true;
+    }
   }
if (_globalLabels.find("role") == _globalLabels.end() &&
ServerState::instance() != nullptr &&
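
The `shortname` guard in this hunk follows a common pattern: cache a value only once it is actually available, so an early request cannot pin an empty label. A minimal Python sketch of that pattern follows. The class `GlobalLabels`, its method, and the callback are hypothetical and only mimic the roles of `_globalLabels`, `_lock` and `getShortName()`.

```python
import threading


class GlobalLabels:
    """Hypothetical stand-in for the label map guarded by a recursive lock."""

    def __init__(self, get_short_name):
        self._lock = threading.RLock()
        self._labels = {}
        self._get_short_name = get_short_name

    def ensure_shortname(self) -> bool:
        """Return True only if this call added the label."""
        with self._lock:
            if "shortname" in self._labels:
                return False
            name = self._get_short_name()
            if not name:
                # Not known yet: leave the label unset so a later call retries.
                return False
            self._labels["shortname"] = name
            return True
```

With the pre-patch behaviour, the first call would have stored the empty string and every later call would have skipped the lookup.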
10 changes: 10 additions & 0 deletions utils/generateAllMetricsDocumentation.py
@@ -8,6 +8,12 @@

import os, re, sys

+# Some data:
+categoryNames = ["Health", "AQL", "Transactions", "Foxx", "Pregel", \
+                 "Statistics", "Replication", "Disk", "Errors", \
+                 "RocksDB", "Hotbackup", "k8s", "Connectivity", "Network",\
+                 "V8", "Agency", "Scheduler", "Maintenance", "kubearangodb"]
+
# Check that we are in the right place:
lshere = os.listdir(".")
if not("arangod" in lshere and "arangosh" in lshere and \
@@ -118,6 +124,10 @@
if not isinstance(y["exposedBy"], list):
print("YAML file '" + filename + "' has an attribute 'exposedBy' whose value must be a list but isn't.")
bad = True
+if not bad:
+if not y["category"] in categoryNames:
+print("YAML file '" + filename + "' has an unknown category '" + y["category"] + "', please fix.")
+bad = True

if bad:
missing = True
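
The category check added above can be tried in isolation. The following is a small standalone sketch, assuming the parsed YAML is a plain dict. The helper `check_category` and the file names are hypothetical; the list of names is copied from the diff.

```python
CATEGORY_NAMES = ["Health", "AQL", "Transactions", "Foxx", "Pregel",
                  "Statistics", "Replication", "Disk", "Errors",
                  "RocksDB", "Hotbackup", "k8s", "Connectivity", "Network",
                  "V8", "Agency", "Scheduler", "Maintenance", "kubearangodb"]


def check_category(filename: str, y: dict) -> bool:
    """Return True if y["category"] is a known category, else print a hint."""
    category = y.get("category")
    if category not in CATEGORY_NAMES:
        print(f"YAML file '{filename}' has an unknown category "
              f"'{category}', please fix.")
        return False
    return True


# The singular spelling fixed in this PR is rejected, the plural is accepted.
check_category("example_old.yaml", {"category": "Transaction"})
check_category("example_new.yaml", {"category": "Transactions"})
```

A check like this is what would have caught the `Transaction` / `Transactions` mismatch that the YAML changes in this PR correct.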