
RocksDB transactions memory tracking #19313

Merged: 131 commits merged into devel from feature/fuse-664-666 on Sep 5, 2023

Conversation

@cpjulia (Contributor) commented Jun 26, 2023

Scope & Purpose

Enterprise companion PR: https://github.com/arangodb/enterprise/pull/1311

  • Track memory usage for write operations in AQL queries.
    The memory usage is reported in the query's peakMemoryUsage stats
    value and is also tracked against the global metric
    arangodb_aql_global_memory_usage. Previously, the memory used by write
    operations was neither included in an AQL query's own memory usage nor
    counted against per-query or global memory limits. It is now properly
    tracked, including the in-memory write batches and the locked keys in
    RocksDB (used for conflict checking).
    As a consequence, the memory usage reported for AQL queries that perform
    write operations may be significantly higher than before, but the
    reported values reflect reality much more closely. Previously working
    AQL write queries can therefore now fail because their actual memory
    usage hits the configured per-query or global memory limit. In these
    cases it is useful to revisit the per-query and/or global memory limit
    settings, turn on intermediate commits for the query, or adjust the
    query so that it uses less memory (see the first sketch after this list).

  • Track memory usage for write transactions that are not AQL queries.
    The memory usage of non-AQL write operations is tracked in the new
    metrics arangodb_transactions_rest_memory_usage and
    arangodb_transactions_internal_memory_usage.
    The former metric contains the memory usage of user-initiated write
    operations, e.g. single or multi-document writes via the document API,
    JavaScript transactions, streaming transactions, or index creation. The
    latter metric contains the memory usage of internal operations, such as
    background statistics gathering, TTL index maintenance, replication, etc.
    Memory usage updates are reported to these metrics whenever a write
    operation increases or decreases its memory usage by 8 kb or more. This
    buffering avoids a large number of small updates to the metric values,
    which would otherwise be heavily contended (see the buffering sketch
    after this list).

  • Track memory usage of index creation operations. Writes to indexes are
    performed in in-memory write batches until they are written to storage. The
    memory usage of these in-memory write batches was previously untracked.

  • To make the changes more testable, each write operation in the code is
    tagged with a transaction::OperationOrigin object, which classifies a
    write operation as either AQL, REST, or internal. The OperationOrigin
    object is created when a transaction::Context is first created. The
    context is responsible for returning the original OperationOrigin even
    for nested operations: for example, if an AQL query (OperationOrigin:
    AQL) is started inside a streaming transaction (OperationOrigin: REST),
    the origin returned for the nested AQL query is still REST (see the
    origin sketch after this list).
    There is an internal API (neither publicly announced nor supported) for
    fetching the metadata of the most recent write operations, e.g. their
    origins and their memory usage. This API is used for testing only; it is
    only available in maintainer mode and deactivated in release mode. There
    is no plan to turn this API into a supported feature.
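
As a quick illustration of the first bullet point, here is a minimal arangosh sketch (collection name, document count, and the limit values are invented for the example): it runs a write query under a per-query memory limit and reads the peakMemoryUsage stat, which with this change also covers write-operation memory:

var db = require("@arangodb").db;
db._create("c");

var res = db._query(
  "FOR i IN 1..100000 INSERT { value: i } INTO c",
  {},                                // no bind variables
  {
    memoryLimit: 64 * 1024 * 1024,   // abort the query if it uses more than 64 MiB
    intermediateCommitCount: 10000   // optionally cap memory via intermediate commits
  }
);
print(res.getExtra().stats.peakMemoryUsage);  // now includes write batches and locked keys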
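
The 8 kb buffering described in the second bullet point can be sketched as follows (a hedged JavaScript illustration only; the real implementation lives in the C++ transaction code, and the metric.add interface is hypothetical). Each write operation accumulates its memory delta locally and publishes it to the shared gauge only once the accumulated delta reaches the threshold:

function MemoryUsageBuffer(metric, threshold) {
  this.metric = metric;        // shared gauge, e.g. arangodb_transactions_rest_memory_usage
  this.threshold = threshold;  // e.g. 8 * 1024 bytes
  this.pending = 0;            // locally buffered delta, not yet published
}

MemoryUsageBuffer.prototype.adjust = function (delta) {
  this.pending += delta;
  if (Math.abs(this.pending) >= this.threshold) {
    this.metric.add(this.pending);  // the single, potentially contended update
    this.pending = 0;
  }
};

// presumably any remainder is flushed when the transaction ends
MemoryUsageBuffer.prototype.flush = function () {
  this.metric.add(this.pending);
  this.pending = 0;
};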
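
The nested-origin rule from the last bullet point boils down to: the origin recorded when the outermost transaction::Context is created wins. A tiny hedged sketch (this Context constructor is illustrative, not the real C++ class):

function Context(origin, parent) {
  // a nested context inherits the outermost context's origin
  this.origin = parent !== null ? parent.origin : origin;
}

var streamingTrx = new Context("REST", null);          // streaming transaction started via REST
var nestedAqlQuery = new Context("AQL", streamingTrx); // AQL query inside that transaction
print(nestedAqlQuery.origin);                          // prints "REST", not "AQL"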

  • 💩 Bugfix
  • 🍕 New feature
  • 🔥 Performance improvement
  • 🔨 Refactoring/simplification

Checklist

  • Tests
    • Regression tests
    • C++ Unit tests
    • integration tests
    • resilience tests
  • 📖 CHANGELOG entry made
  • 📚 documentation written (release notes, API changes, ...)
  • Backports
    • Backport for 3.11: -
    • Backport for 3.10: -
    • Backport for 3.9: -

Related Information

@cla-bot cla-bot bot added the cla-signed label Jun 26, 2023
@cpjulia cpjulia added this to the devel milestone Jun 26, 2023
@cla-bot (bot) commented Aug 28, 2023

Fresh avocado detected! Welcome and thank you for your contribution @cpjulia. My avocado-loving overlords have decreed a signed CLA is required for PRs. Please see https://github.com/arangodb/arangodb/blob/devel/CONTRIBUTING.md file to learn more or ask @cw00dw0rd if you have issues.

@neunhoef (Member) left a comment

LGTM so far. See one more comment.
I will now do another experiment and compare the new metrics with the memory profiling.

@neunhoef (Member) commented Sep 4, 2023

I have done a quick check using the following tortureTrx.js script:

var db = require("@arangodb").db;   // db is implicit in arangosh; spelled out here
var internal = require("internal");

var nrTrx = 100;
var c = db._create("c");
// build a batch of 1000 documents with ~100 bytes of random payload each
var l = [];
for (let i = 0; i < 1000; ++i) {
  l.push({Hallo: i, s: internal.genRandomAlphaNumbers(100)});
}
// open nrTrx streaming transactions that all declare write access to "c"
var t = [];
var cc = [];
for (let i = 0; i < nrTrx; ++i) {
  t.push(db._createTransaction({collections: {write: ["c"]}}));
  cc.push(t[i].collection(c));
  print(i, "generated", new Date());
}
// insert 15 batches into every open transaction; the data stays in the
// transactions' in-memory write batches until commit
for (let j = 0; j < 15; ++j) {
  for (let i = 0; i < nrTrx; ++i) {
    cc[i].insert(l);
    print("j=", j, "i=", i, new Date());
  }
}
print("Preparing to commit...10s...");
internal.wait(10);
print("Committing...");
// committing releases the write batches and locked keys
for (let i = 0; i < nrTrx; ++i) {
  t[i].commit();
  print(i, new Date());
}

This opens a large number of transactions and keeps them busy for a while. I then checked the output of the new metrics and compared it with a heap difference profile taken before and after. So far, there still seems to be a discrepancy. We will investigate this tomorrow.
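
For reference, one way to eyeball the new metrics from arangosh while the script above is holding its transactions open (a sketch assuming a build that contains this PR; arango.GET_RAW returns the raw HTTP response of the metrics endpoint):

var res = arango.GET_RAW("/_admin/metrics");
res.body.split("\n")
   .filter(function (line) { return line.indexOf("arangodb_transactions_") === 0; })
   .forEach(function (line) { print(line); });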

@neunhoef (Member) left a comment

LGTM.
I ran examples with large and with many concurrent transactions and found the following:

When I compared the actual memory usage (heap profiling) to the usage reported by the new transaction memory metric, there were a few discrepancies:

  1. The exact memory accounting is still a bit off (not surprising, because we rely on estimates of the exact behaviour of the RocksDB-internal data structures). In particular, the memory used by the PointLockManager is still underestimated.

  2. Furthermore, I found that the PointLockManager does not return all of its memory after a huge number of points were locked (for some large transactions). This is because an absl::flat_hash_map is used, which does not automatically shrink, even when all locks are released. We should fix this behaviour, since it can lead to a large overuse of RAM.

However, I suggest that we fine-tune this in a separate PR to get this one out of the way.
The rest is only fine-tuning.

@MBkkt (Contributor) commented Sep 5, 2023

@neunhoef

This is because an absl::flat_hash_map is used, which does not automatically shrink

std::unordered_map does not shrink automatically either.
If we need the memory for nodes to be released automatically (as with std::unordered_map), we could use absl::node_hash_map.

* Added startup option `--transaction.streaming-max-transaction-size` to make
  the maximum size of streaming transactions configurable.
  Previously the maximum size of streaming transactions was hard-coded to
  128 MiB. This value is also used as default value for the option now.

@neunhoef (Member) commented Sep 5, 2023

Latest changes: LGTM.

@neunhoef neunhoef merged commit a0c9ade into devel Sep 5, 2023
@neunhoef neunhoef deleted the feature/fuse-664-666 branch September 5, 2023 13:52
CryptoNinjaGeek added a commit that referenced this pull request Sep 12, 2023
* Added transaction memory tracking with rocksdb memory tracking

* Intermediate commit (not compiling) for transaction classification in all places

* Added distinction to transaction types everywhere

* Removed git markers

* Removed DECLARE_GAUGE unused

* Updated transaction classification for arangod

* Intermediate commit for tests and documentation

* Added extension to files

* Intermediate commit for separating transaction type hints and classification in tests

* Intermediate commit for transaction classification in tests

* Intermediate commit for transaction classification in tests

* Addressed suggestions and intermediate commit for tests

* wire memory tracking to AQL queries

* write CHANGELOG entry

* Added INTERNAL classification for transactions in tests

* make arangodbtests compile

* refactor

* make PR surface smaller

* reduce PR surface

* fix metrics documentation

* update branch and track memory usage

* added metrics descriptions

* slightly simplify test

* fix compilation errors

* fix transaction memory usage stats

* refactor code

* implement buffering for AQL transaction metrics updates

* simplify PR

* clean up metrics

* add memory accounting for copied write batches

* larger refactoring

* small refactoring

* bigger refactoring

* remove unknowns

* remove unknown origin

* expose memory usage for testing

* refactoring and test

* fix compilation of tests

* lint

* refactoring

* fix test

* refactoring

* fix tests

* remove LOG_DEVELs

* updated CHANGELOG

* added code comments

* add some tests

* add test

* added tests for memory usage tracking

* track memory usage during index creation

* fix linting

* track indexing memory usage for unique indexes

* fix compile error

* Added startup option `--transaction.streaming-max-transaction-size`

* Added startup option `--transaction.streaming-max-transaction-size` to make
  the maximum size of streaming transactions configurable.
  Previously the maximum size of streaming transactions was hard-coded to
  128 MiB. This value is also used as default value for the option now.

* fix CHANGELOG discrepancy

---------

Co-authored-by: jsteemann <jsteemann@users.noreply.github.com>
Co-authored-by: CryptoNinjaGeek <86222482+CryptoNinjaGeek@users.noreply.github.com>