Apply global AQL query memory limit by default (arangodb#13800)
* Apply global AQL query memory limit by default

* Introduce metrics for AQL query memory limit violations:
  - `arangodb_aql_global_query_memory_limit_reached`: Total number of times the
    global query memory limit was violated.
  - `arangodb_aql_local_query_memory_limit_reached`: Total number of times a
    local query memory limit was violated.

* Set the default value for `--query.global-memory-limit` to around 90% of RAM,
  so that a global memory limit is now effective by default.

  The default global memory limit value is calculated by a formula depending on
  the amount of available RAM and will result in the following values for
  common RAM sizes:

  RAM:            0      (0MiB)  Limit:            0   unlimited, %mem:  n/a
  RAM:    134217728    (128MiB)  Limit:     33554432     (32MiB), %mem: 25.0
  RAM:    268435456    (256MiB)  Limit:     67108864     (64MiB), %mem: 25.0
  RAM:    536870912    (512MiB)  Limit:    255013683    (243MiB), %mem: 47.5
  RAM:    805306368    (768MiB)  Limit:    510027366    (486MiB), %mem: 63.3
  RAM:   1073741824   (1024MiB)  Limit:    765041049    (729MiB), %mem: 71.2
  RAM:   2147483648   (2048MiB)  Limit:   1785095782   (1702MiB), %mem: 83.1
  RAM:   4294967296   (4096MiB)  Limit:   3825205248   (3648MiB), %mem: 89.0
  RAM:   8589934592   (8192MiB)  Limit:   7752415969   (7393MiB), %mem: 90.2
  RAM:  17179869184  (16384MiB)  Limit:  15504831938  (14786MiB), %mem: 90.2
  RAM:  25769803776  (24576MiB)  Limit:  23257247908  (22179MiB), %mem: 90.2
  RAM:  34359738368  (32768MiB)  Limit:  31009663877  (29573MiB), %mem: 90.2
  RAM:  42949672960  (40960MiB)  Limit:  38762079846  (36966MiB), %mem: 90.2
  RAM:  68719476736  (65536MiB)  Limit:  62019327755  (59146MiB), %mem: 90.2
  RAM: 103079215104  (98304MiB)  Limit:  93028991631  (88719MiB), %mem: 90.2
  RAM: 137438953472 (131072MiB)  Limit: 124038655509 (118292MiB), %mem: 90.2
  RAM: 274877906944 (262144MiB)  Limit: 248077311017 (236584MiB), %mem: 90.2
  RAM: 549755813888 (524288MiB)  Limit: 496154622034 (473169MiB), %mem: 90.2

* added "introducedIn" attribute

* added unit test for counters
jsteemann authored and elfringham committed Apr 20, 2021
1 parent 6b1d8da commit 727a7b3
Showing 12 changed files with 502 additions and 10 deletions.
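
The default-limit formula described in the commit message can be sketched as a standalone function. This is a minimal sketch, not the code from the commit: the name `defaultMemoryLimitSketch` is hypothetical, and the final "at least 25% of RAM" step is an assumption inferred from the 128MiB and 256MiB rows of the table, because that part of the function is not visible in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical re-statement of the formula in the diff (not the original function).
std::uint64_t defaultMemoryLimitSketch(std::uint64_t available,
                                       double reserveFraction,
                                       double percentage) {
  assert(reserveFraction >= 0.0 && reserveFraction <= 1.0);
  assert(percentage >= 0.0 && percentage <= 1.0);

  if (available == 0) {
    // amount of RAM unknown: no sensible limit can be derived
    return 0;
  }

  // a fraction of RAM is set aside as a reserve, but never less than 256MiB
  std::uint64_t reserve = static_cast<std::uint64_t>(available * reserveFraction);
  reserve = std::max<std::uint64_t>(reserve, static_cast<std::uint64_t>(256) << 20);

  // of the memory left after the reserve, only `percentage` is handed out
  double f = 1.0 - (double(reserve) / double(available));
  double dyn = double(available) * f * percentage;
  if (dyn < 0.0) {
    dyn = 0.0;
  }

  // ASSUMPTION: lower bound of 25% of RAM, inferred from the table rows for
  // 128MiB and 256MiB; this step is not shown in the diff.
  return std::max(static_cast<std::uint64_t>(dyn),
                  static_cast<std::uint64_t>(0.25 * available));
}
```

With a reserve fraction of 0.05 and a percentage of 0.95, this sketch reproduces the table rows above, for example 255013683 bytes for 512MiB of RAM. The constructor hunk further down passes 0.1 and 0.90 for the global limit; plugged into this sketch, those values would give roughly 81% of RAM on large machines rather than 90.2%.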
32 changes: 32 additions & 0 deletions CHANGELOG
@@ -1,6 +1,38 @@
devel
-----

* Introduce metrics for AQL query memory limit violations:
  - `arangodb_aql_global_query_memory_limit_reached`: Total number of times the
    global query memory limit was violated.
  - `arangodb_aql_local_query_memory_limit_reached`: Total number of times a
    local query memory limit was violated.

* Set the default value for `--query.global-memory-limit` to around 90% of RAM,
  so that a global memory limit is now effective by default.

  The default global memory limit value is calculated by a formula depending on
  the amount of available RAM and will result in the following values for
  common RAM sizes:

  RAM:            0      (0MiB)  Limit:            0   unlimited, %mem:  n/a
  RAM:    134217728    (128MiB)  Limit:     33554432     (32MiB), %mem: 25.0
  RAM:    268435456    (256MiB)  Limit:     67108864     (64MiB), %mem: 25.0
  RAM:    536870912    (512MiB)  Limit:    255013683    (243MiB), %mem: 47.5
  RAM:    805306368    (768MiB)  Limit:    510027366    (486MiB), %mem: 63.3
  RAM:   1073741824   (1024MiB)  Limit:    765041049    (729MiB), %mem: 71.2
  RAM:   2147483648   (2048MiB)  Limit:   1785095782   (1702MiB), %mem: 83.1
  RAM:   4294967296   (4096MiB)  Limit:   3825205248   (3648MiB), %mem: 89.0
  RAM:   8589934592   (8192MiB)  Limit:   7752415969   (7393MiB), %mem: 90.2
  RAM:  17179869184  (16384MiB)  Limit:  15504831938  (14786MiB), %mem: 90.2
  RAM:  25769803776  (24576MiB)  Limit:  23257247908  (22179MiB), %mem: 90.2
  RAM:  34359738368  (32768MiB)  Limit:  31009663877  (29573MiB), %mem: 90.2
  RAM:  42949672960  (40960MiB)  Limit:  38762079846  (36966MiB), %mem: 90.2
  RAM:  68719476736  (65536MiB)  Limit:  62019327755  (59146MiB), %mem: 90.2
  RAM: 103079215104  (98304MiB)  Limit:  93028991631  (88719MiB), %mem: 90.2
  RAM: 137438953472 (131072MiB)  Limit: 124038655509 (118292MiB), %mem: 90.2
  RAM: 274877906944 (262144MiB)  Limit: 248077311017 (236584MiB), %mem: 90.2
  RAM: 549755813888 (524288MiB)  Limit: 496154622034 (473169MiB), %mem: 90.2

* The old metrics API contains the following gauges which should actually be
counters:
* arangodb_scheduler_jobs_dequeued
38 changes: 38 additions & 0 deletions Documentation/Metrics/allMetrics.yaml
@@ -461,6 +461,44 @@
name: arangodb_aql_global_memory_usage
type: gauge
unit: bytes
- category: AQL
complexity: simple
description: "Total number of times the global query memory limit threshold was
reached.\nThis can happen if all running AQL queries in total try to use more
memory than\nconfigured via the `--query.global-memory-limit` startup option.\nEvery
time this counter will increase, an AQL query will have aborted with a \n\"resource
limit exceeded\" error.\n"
exposedBy:
- coordinator
- dbserver
- agent
- single
help: 'Number of times the global query memory limit threshold was reached.
'
introducedIn: '3.8'
name: arangodb_aql_global_query_memory_limit_reached
type: counter
unit: number
- category: AQL
complexity: simple
description: "Total number of times a local query memory limit threshold was reached,
i.e.\na single query tried to allocate more memory than configured in the query's\n`memoryLimit`
attribute or the value configured via the startup option\n`--query.memory-limit`.\nEvery
time this counter will increase, an AQL query will have aborted with a \n\"resource
limit exceeded\" error.\n"
exposedBy:
- coordinator
- dbserver
- agent
- single
help: 'Number of times a local query memory limit threshold was reached.
'
introducedIn: '3.8'
name: arangodb_aql_local_query_memory_limit_reached
type: counter
unit: number
- category: AQL
complexity: simple
description: 'Execution time histogram for all AQL queries, in seconds.
@@ -0,0 +1,19 @@
name: arangodb_aql_global_query_memory_limit_reached
introducedIn: "3.8"
help: |
  Number of times the global query memory limit threshold was reached.
unit: number
type: counter
category: AQL
complexity: simple
exposedBy:
  - coordinator
  - dbserver
  - agent
  - single
description: |
  Total number of times the global query memory limit threshold was reached.
  This can happen if all running AQL queries in total try to use more memory than
  configured via the `--query.global-memory-limit` startup option.
  Every time this counter will increase, an AQL query will have aborted with a
  "resource limit exceeded" error.
@@ -0,0 +1,20 @@
name: arangodb_aql_local_query_memory_limit_reached
introducedIn: "3.8"
help: |
  Number of times a local query memory limit threshold was reached.
unit: number
type: counter
category: AQL
complexity: simple
exposedBy:
  - coordinator
  - dbserver
  - agent
  - single
description: |
  Total number of times a local query memory limit threshold was reached, i.e.
  a single query tried to allocate more memory than configured in the query's
  `memoryLimit` attribute or the value configured via the startup option
  `--query.memory-limit`.
  Every time this counter will increase, an AQL query will have aborted with a
  "resource limit exceeded" error.
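
Because both metrics are counters, a monitoring script would typically compare their values between two scrapes. The following is a minimal sketch of extracting one counter from a scrape, assuming the metrics are delivered in the Prometheus text exposition format. The helper name `counterValue` is hypothetical, and the parser is deliberately simplified: it does not handle `}` characters inside quoted label values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Returns the value of the first sample whose metric name equals `name`,
// or std::nullopt if the scrape contains no such sample.
std::optional<std::uint64_t> counterValue(std::string const& text,
                                          std::string const& name) {
  assert(!name.empty());
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    // lines starting with '#' are HELP/TYPE comments
    if (line.empty() || line[0] == '#') {
      continue;
    }
    // the metric name ends at the first '{' (labels) or ' ' (value)
    std::size_t pos = line.find_first_of("{ ");
    if (pos == std::string::npos || line.compare(0, pos, name) != 0) {
      continue;
    }
    // skip an optional label set (simplified: first '}' closes it)
    if (line[pos] == '{') {
      pos = line.find('}', pos);
      if (pos == std::string::npos) {
        continue;
      }
      ++pos;
    }
    std::istringstream rest(line.substr(pos));
    std::uint64_t value = 0;
    if (rest >> value) {
      return value;
    }
  }
  return std::nullopt;
}
```

A caller could scrape twice and alert when the difference of the two values is greater than zero, since every increase corresponds to one query aborted with a "resource limit exceeded" error.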
51 changes: 42 additions & 9 deletions arangod/RestServer/QueryRegistryFeature.cpp
@@ -49,14 +49,14 @@ using namespace arangodb::options;

namespace {

uint64_t defaultMemoryLimit(uint64_t available) {
uint64_t defaultMemoryLimit(uint64_t available, double reserveFraction, double percentage) {
if (available == 0) {
// we don't know how much memory is available, so we cannot do any sensible calculation
return 0;
}

// this function will produce the following results for some
// common available memory values:
// this function will produce the following results for a reserveFraction of 0.2 and a
// percentage of 0.75 for some common available memory values:
//
// Available memory: 0 (0MiB) Limit: 0 unlimited, %mem: n/a
// Available memory: 134217728 (128MiB) Limit: 33554432 (32MiB), %mem: 25.0
@@ -77,14 +77,35 @@ uint64_t defaultMemoryLimit(uint64_t available) {
// Available memory: 274877906944 (262144MiB) Limit: 164926744167 (157286MiB), %mem: 60.0
// Available memory: 549755813888 (524288MiB) Limit: 329853488333 (314572MiB), %mem: 60.0

// 20% of RAM will be considered as a reserve
uint64_t reserve = static_cast<uint64_t>(available * 0.2);
// for a reserveFraction of 0.05 and a percentage of 0.95 it will produce:
//
// Available memory:            0      (0MiB)  Limit:            0   unlimited, %mem:  n/a
// Available memory:    134217728    (128MiB)  Limit:     33554432     (32MiB), %mem: 25.0
// Available memory:    268435456    (256MiB)  Limit:     67108864     (64MiB), %mem: 25.0
// Available memory:    536870912    (512MiB)  Limit:    255013683    (243MiB), %mem: 47.5
// Available memory:    805306368    (768MiB)  Limit:    510027366    (486MiB), %mem: 63.3
// Available memory:   1073741824   (1024MiB)  Limit:    765041049    (729MiB), %mem: 71.2
// Available memory:   2147483648   (2048MiB)  Limit:   1785095782   (1702MiB), %mem: 83.1
// Available memory:   4294967296   (4096MiB)  Limit:   3825205248   (3648MiB), %mem: 89.0
// Available memory:   8589934592   (8192MiB)  Limit:   7752415969   (7393MiB), %mem: 90.2
// Available memory:  17179869184  (16384MiB)  Limit:  15504831938  (14786MiB), %mem: 90.2
// Available memory:  25769803776  (24576MiB)  Limit:  23257247908  (22179MiB), %mem: 90.2
// Available memory:  34359738368  (32768MiB)  Limit:  31009663877  (29573MiB), %mem: 90.2
// Available memory:  42949672960  (40960MiB)  Limit:  38762079846  (36966MiB), %mem: 90.2
// Available memory:  68719476736  (65536MiB)  Limit:  62019327755  (59146MiB), %mem: 90.2
// Available memory: 103079215104  (98304MiB)  Limit:  93028991631  (88719MiB), %mem: 90.2
// Available memory: 137438953472 (131072MiB)  Limit: 124038655509 (118292MiB), %mem: 90.2
// Available memory: 274877906944 (262144MiB)  Limit: 248077311017 (236584MiB), %mem: 90.2
// Available memory: 549755813888 (524288MiB)  Limit: 496154622034 (473169MiB), %mem: 90.2

// reserveFraction% of RAM will be considered as a reserve
uint64_t reserve = static_cast<uint64_t>(available * reserveFraction);

// minimum reserve memory is 256MB
reserve = std::max<uint64_t>(reserve, static_cast<uint64_t>(256) << 20);

double f = double(1.0) - (double(reserve) / double(available));
double dyn = (double(available) * f * 0.75);
double dyn = (double(available) * f * percentage);
if (dyn < 0.0) {
dyn = 0.0;
}
@@ -123,6 +144,10 @@ DECLARE_GAUGE(
std::to_string(ResourceMonitor::chunkSize) + " bytes steps");
DECLARE_GAUGE(arangodb_aql_global_memory_limit, uint64_t,
"Total memory limit for all AQL queries combined [bytes]");
DECLARE_COUNTER(arangodb_aql_global_query_memory_limit_reached,
"Number of global AQL query memory limit violations");
DECLARE_COUNTER(arangodb_aql_local_query_memory_limit_reached,
"Number of local AQL query memory limit violations");

QueryRegistryFeature::QueryRegistryFeature(application_features::ApplicationServer& server)
: ApplicationFeature(server, "QueryRegistry"),
@@ -138,8 +163,8 @@ QueryRegistryFeature::QueryRegistryFeature(application_features::ApplicationServ
_smartJoins(true),
_parallelizeTraversals(true),
#endif
_queryGlobalMemoryLimit(0),
_queryMemoryLimit(defaultMemoryLimit(PhysicalMemory::getValue())),
_queryGlobalMemoryLimit(defaultMemoryLimit(PhysicalMemory::getValue(), 0.1, 0.90)),
_queryMemoryLimit(defaultMemoryLimit(PhysicalMemory::getValue(), 0.2, 0.75)),
_queryMaxRuntime(aql::QueryOptions::defaultMaxRuntime),
_maxQueryPlans(aql::QueryOptions::defaultMaxNumberOfPlans),
_queryCacheMaxResultsCount(0),
@@ -165,7 +190,11 @@ QueryRegistryFeature::QueryRegistryFeature(application_features::ApplicationServ
_globalQueryMemoryUsage(
server.getFeature<arangodb::MetricsFeature>().add(arangodb_aql_global_memory_usage{})),
_globalQueryMemoryLimit(
server.getFeature<arangodb::MetricsFeature>().add(arangodb_aql_global_memory_limit{})) {
server.getFeature<arangodb::MetricsFeature>().add(arangodb_aql_global_memory_limit{})),
_globalQueryMemoryLimitReached(
server.getFeature<arangodb::MetricsFeature>().add(arangodb_aql_global_query_memory_limit_reached{})),
_localQueryMemoryLimitReached(
server.getFeature<arangodb::MetricsFeature>().add(arangodb_aql_local_query_memory_limit_reached{})) {
setOptional(false);
startsAfter<V8FeaturePhase>();

@@ -402,6 +431,10 @@ void QueryRegistryFeature::updateMetrics() {
GlobalResourceMonitor const& global = GlobalResourceMonitor::instance();
_globalQueryMemoryUsage = global.current();
_globalQueryMemoryLimit = global.memoryLimit();

auto stats = global.stats();
_globalQueryMemoryLimitReached = stats.globalLimitReached;
_localQueryMemoryLimitReached = stats.localLimitReached;
}

void QueryRegistryFeature::trackQueryStart() noexcept {
2 changes: 2 additions & 0 deletions arangod/RestServer/QueryRegistryFeature.h
@@ -115,6 +115,8 @@ class QueryRegistryFeature final : public application_features::ApplicationFeatu
Gauge<uint64_t>& _runningQueries;
Gauge<uint64_t>& _globalQueryMemoryUsage;
Gauge<uint64_t>& _globalQueryMemoryLimit;
Counter& _globalQueryMemoryLimitReached;
Counter& _localQueryMemoryLimitReached;
};

} // namespace arangodb
18 changes: 18 additions & 0 deletions lib/Basics/GlobalResourceMonitor.cpp
@@ -48,6 +48,24 @@ std::int64_t GlobalResourceMonitor::memoryLimit() const noexcept {
std::int64_t GlobalResourceMonitor::current() const noexcept {
return _current.load(std::memory_order_relaxed);
}

/// @brief number of times the global and any local limits were reached
GlobalResourceMonitor::Stats GlobalResourceMonitor::stats() const noexcept {
Stats stats;
stats.globalLimitReached = _globalLimitReachedCounter.load(std::memory_order_relaxed);
stats.localLimitReached = _localLimitReachedCounter.load(std::memory_order_relaxed);
return stats;
}

/// @brief increase the counter for global memory limit violations
void GlobalResourceMonitor::trackGlobalViolation() noexcept {
_globalLimitReachedCounter.fetch_add(1, std::memory_order_relaxed);
}

/// @brief increase the counter for local memory limit violations
void GlobalResourceMonitor::trackLocalViolation() noexcept {
_localLimitReachedCounter.fetch_add(1, std::memory_order_relaxed);
}

/// @brief increase global memory usage by <value> bytes. if increasing exceeds the
/// memory limit, does not perform the increase and returns false. if increasing
24 changes: 23 additions & 1 deletion lib/Basics/GlobalResourceMonitor.h
@@ -38,7 +38,14 @@ class alignas(64) GlobalResourceMonitor final {
public:
constexpr GlobalResourceMonitor()
: _current(0),
_limit(0) {}
_limit(0),
_globalLimitReachedCounter(0),
_localLimitReachedCounter(0) {}

struct Stats {
std::uint64_t globalLimitReached;
std::uint64_t localLimitReached;
};

/// @brief set the global memory limit
void memoryLimit(std::int64_t value) noexcept;
@@ -48,6 +55,15 @@ class alignas(64) GlobalResourceMonitor final {

/// @brief return the current global memory usage
std::int64_t current() const noexcept;

/// @brief number of times the global and any local limits were reached
Stats stats() const noexcept;

/// @brief increase the counter for global memory limit violations
void trackGlobalViolation() noexcept;

/// @brief increase the counter for local memory limit violations
void trackLocalViolation() noexcept;

/// @brief increase global memory usage by <value> bytes. if increasing exceeds the
/// memory limit, does not perform the increase and returns false. if increasing
@@ -79,6 +95,12 @@ class alignas(64) GlobalResourceMonitor final {
/// @brief maximum allowed global memory limit for all tracked operations combined.
/// a value of 0 means that there will be no global limit enforced.
std::int64_t _limit;

/// @brief number of times the global memory limit was reached
std::atomic<std::uint64_t> _globalLimitReachedCounter;

/// @brief number of times a local memory limit was reached
std::atomic<std::uint64_t> _localLimitReachedCounter;
};

} // namespace arangodb
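
The counter handling added to `GlobalResourceMonitor` can be illustrated in isolation. This is a minimal sketch that mirrors the member and method names from the diff; the class name `ViolationCounters` and the `delta` helper are hypothetical and not part of the commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the counter part of GlobalResourceMonitor.
class ViolationCounters {
 public:
  struct Stats {
    std::uint64_t globalLimitReached;
    std::uint64_t localLimitReached;
  };

  // Relaxed ordering suffices: the counters are pure statistics and do not
  // guard any other data. fetch_add is safe to call from multiple threads.
  void trackGlobalViolation() noexcept {
    _globalLimitReachedCounter.fetch_add(1, std::memory_order_relaxed);
  }

  void trackLocalViolation() noexcept {
    _localLimitReachedCounter.fetch_add(1, std::memory_order_relaxed);
  }

  // The two loads are independent, so the pair is not one atomic snapshot;
  // each value on its own only ever grows.
  Stats stats() const noexcept {
    return Stats{_globalLimitReachedCounter.load(std::memory_order_relaxed),
                 _localLimitReachedCounter.load(std::memory_order_relaxed)};
  }

 private:
  std::atomic<std::uint64_t> _globalLimitReachedCounter{0};
  std::atomic<std::uint64_t> _localLimitReachedCounter{0};
};

// Hypothetical helper: number of violations recorded between two snapshots.
ViolationCounters::Stats delta(ViolationCounters::Stats const& before,
                               ViolationCounters::Stats const& after) {
  assert(after.globalLimitReached >= before.globalLimitReached);
  assert(after.localLimitReached >= before.localLimitReached);
  return {after.globalLimitReached - before.globalLimitReached,
          after.localLimitReached - before.localLimitReached};
}
```

Reading the counters through a single `stats()` call, as `updateMetrics()` does in the diff, keeps the metrics code free of any knowledge about the atomics behind them.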
4 changes: 4 additions & 0 deletions lib/Basics/ResourceUsage.cpp
@@ -130,6 +130,8 @@ void ResourceMonitor::increaseMemoryUsage(std::uint64_t value) {
// revert the change that we already made to the instance's own counter.
rollback();

// track local limit violation
_global.trackLocalViolation();
// now we can safely signal an exception
THROW_ARANGO_EXCEPTION(TRI_ERROR_RESOURCE_LIMIT);
}
@@ -141,6 +143,8 @@ void ResourceMonitor::increaseMemoryUsage(std::uint64_t value) {
// the allocation would exceed the global maximum value, so we need to roll back.
rollback();

// track global limit violation
_global.trackGlobalViolation();
// now we can safely signal an exception
THROW_ARANGO_EXCEPTION_MESSAGE(TRI_ERROR_RESOURCE_LIMIT, "global memory limit exceeded");
}
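
The order of operations in these two hunks (roll back first, then count the violation, then throw) can be sketched with simplified types. This is only an illustration under stated assumptions: `GlobalMonitor` and `LocalMonitor` are hypothetical stand-ins, `std::runtime_error` replaces the ArangoDB exception macros, and the chunked accounting that the real `ResourceMonitor` appears to use is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified stand-in for GlobalResourceMonitor.
struct GlobalMonitor {
  std::int64_t limit = 0;  // 0 means that no global limit is enforced
  std::atomic<std::int64_t> current{0};
  std::atomic<std::uint64_t> globalLimitReached{0};
  std::atomic<std::uint64_t> localLimitReached{0};

  // Tries to add `value` bytes. Returns false and leaves the usage
  // unchanged if the global limit would be exceeded.
  bool increase(std::int64_t value) noexcept {
    std::int64_t now = current.fetch_add(value, std::memory_order_relaxed) + value;
    if (limit > 0 && now > limit) {
      current.fetch_sub(value, std::memory_order_relaxed);
      return false;
    }
    return true;
  }
};

// Hypothetical, simplified stand-in for a per-query ResourceMonitor.
class LocalMonitor {
 public:
  LocalMonitor(GlobalMonitor& global, std::int64_t limit)
      : _global(global), _limit(limit) {}

  void increase(std::int64_t value) {
    _current += value;
    if (_limit > 0 && _current > _limit) {
      rollback(value);  // undo the local change before signalling the error
      _global.localLimitReached.fetch_add(1, std::memory_order_relaxed);
      throw std::runtime_error("resource limit exceeded");
    }
    if (!_global.increase(value)) {
      rollback(value);
      _global.globalLimitReached.fetch_add(1, std::memory_order_relaxed);
      throw std::runtime_error("global memory limit exceeded");
    }
  }

  std::int64_t current() const noexcept { return _current; }

 private:
  void rollback(std::int64_t value) noexcept {
    assert(_current >= value);
    _current -= value;
  }

  GlobalMonitor& _global;
  std::int64_t _limit;
  std::int64_t _current = 0;
};
```

Rolling back before throwing keeps the accounting consistent for whoever catches the exception, and counting the violation before the throw means a failed allocation is always reflected in the metrics.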