Add blogpost on "monitoring FerretDB performance using Coroot" #4279

Merged on Jul 9, 2024 (22 commits)

Conversation

Fashander
Member

Description

Closes FerretDB/engineering#168.

Readiness checklist

  • I added/updated unit tests (and they pass).
  • I added/updated integration/compatibility tests (and they pass).
  • I added/updated comments and checked rendering.
  • I made spot refactorings.
  • I updated user documentation.
  • I ran task all, and it passed.
  • I ensured that PR title is good enough for the changelog.
  • (for maintainers only) I set Reviewers (@FerretDB/core), Milestone (Next), Labels, Project and project's Sprint fields.
  • I marked all done items in this checklist.

Fashander added the blog/marketing label (Marketing (and releases) blog posts) on May 9, 2024
Fashander added this to the v1.22.0 milestone on May 9, 2024
Fashander requested a review from a team on May 9, 2024 05:04
Fashander self-assigned this on May 9, 2024
Fashander requested review from AlekSi and ptrfarkas as code owners on May 9, 2024 05:04
Fashander enabled auto-merge (squash) on May 9, 2024 05:04
Contributor

mergify bot commented May 9, 2024

Marketing blog posts should be reviewed by @ptrfarkas and @AlekSi.

codecov bot commented May 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.48%. Comparing base (ef7f275) to head (08872af).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4279      +/-   ##
==========================================
- Coverage   74.30%   71.48%   -2.82%     
==========================================
  Files         327      327              
  Lines       22674    22674              
==========================================
- Hits        16847    16209     -638     
- Misses       4600     5246     +646     
+ Partials     1227     1219       -8     

see 41 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| filter-true | `64.02% <ø> (-3.24%)` ⬇️ |
| hana-1 | `0.00% <ø> (-3.62%)` ⬇️ |
| integration | `64.02% <ø> (-3.24%)` ⬇️ |
| mongodb-1 | `5.30% <ø> (ø)` |
| postgresql-1 | `42.40% <ø> (ø)` |
| postgresql-2 | `?` |
| postgresql-3 | `42.13% <ø> (+0.01%)` ⬆️ |
| postgresql-4 | `43.76% <ø> (+0.07%)` ⬆️ |
| postgresql-5 | `45.12% <ø> (ø)` |
| sqlite-1 | `41.55% <ø> (ø)` |
| sqlite-2 | `?` |
| sqlite-3 | `41.32% <ø> (-0.02%)` ⬇️ |
| sqlite-4 | `42.87% <ø> (ø)` |
| sqlite-5 | `44.36% <ø> (ø)` |
| unit | `33.72% <ø> (ø)` |

Flags with carried forward coverage won't be shown.

Fashander added the trust label (PRs that can access Actions secrets) on May 9, 2024
Comment on lines 13 to 19
Effective real-time monitoring is a critical aspect of any infrastructure.
[Coroot](https://coroot.com/) is an open source observability platform that can provide real-time monitoring and visibility into a [FerretDB](https://www.ferretdb.com/) setup.

<!--truncate-->

Effective real-time monitoring is a critical aspect of any infrastructure.
[Coroot](https://coroot.com/) is an open source observability platform that can provide real-time monitoring and visibility into a [FerretDB](https://www.ferretdb.com/) setup.
Member

Why do we repeat it twice?

[Screenshot attached: CleanShot 2024-05-14 at 21 07 02@2x]

Member

This one wasn't solved

## Setting up Coroot for FerretDB monitoring

Since Coroot uses eBPF, you need the right environment before setting it up.
The most recent versions of the Linux kernel (v 4.16 and above) should be compatible since they offer at least minimal eBPF support.
Member

Suggested change
- The most recent versions of the Linux kernel (v 4.16 and above) should be compatible since they offer at least minimal eBPF support.
+ The most recent versions of the Linux kernel (v4.16 and above) should be compatible since they offer at least minimal eBPF support.

Comment on lines 137 to 140
The Coroot dashboard provides the full details on all components.

At first glance, we can see a memory leak on the `ferretdb` and `postgres` databases.
That suggests that allocated memory is not being efficiently reused or deallocated, causing the total memory usage to grow progressively as the services operate.
Member

… so we want to publish a blog post that casually mentions that FerretDB and PostgreSQL have memory leaks just like that?

Comment on lines 165 to 167
Using distributed tracing, Coroot provides a heat map showing operation requests, their status, durations, and details.

![Latency](/img/blog/ferretdb-coroot/07-latency.png)
Member

Why do we write something that has nothing to do with FerretDB?

Comment on lines 169 to 171
The above image shows how response time for the `ferretdb` increased progressively over time.
It shows that the system takes a long time to handle queries.
That should prompt us to take additional measures to improve performance.
Member

wat

Fashander had a problem deploying to cloudflare-dev-blog with GitHub Actions (Failure), June 3, 2024 09:59
Fashander requested a review from AlekSi, June 3, 2024 09:59
AlekSi had a problem deploying to cloudflare-dev-blog with GitHub Actions (Failure), June 3, 2024 14:38
Member

AlekSi left a comment

Build fails. Not much I could review.

website/static/img/blog/.DS_Store (outdated, resolved)
You also get CPU, memory, storage, network, and log management metrics.

To get started with FerretDB, [see our documentation](https://docs.ferretdb.io/).
And if you want to contact the team for help or have any questions, [contact us on Slack](https://join.slack.com/t/ferretdb/shared_invite/zt-zqe9hj8g-ZcMG3~5Cs5u9uuOPnZB8~A).
Member

As we talked about many times before, we should not use that link because it may change. We should link to the community section in our docs or in README.

Member

Use `kebab-case-with-dashes` instead of `snake_case_with_underscores` or spaces

Alex, it is your responsibility to enforce that guide. And you are not even following it yourself.

Member

AlekSi left a comment

  1. There is a disconnect between images and words.
  2. It is not clear what part comes from eBPF.


Since Coroot is deployed locally, you can access it at http://localhost:8080/.

Depending on your setup, you may need to modify the Docker compose `yaml` file and configure Prometheus to pick up FerretDB metrics.
Member

What changes would be needed in the Docker compose (sic) yaml file?


![memory usage](/img/blog/ferretdb-coroot/memory-metrics.png)

Looking at the memory usage metrics.
Member

Suggested change
- Looking at the memory usage metrics.
+ Looking at the memory usage metrics,

But then, I'm not sure why we say that at all. It looks like we edited that part (that previously described a memory leak) without stopping and thinking if we need that part at all.

Comment on lines 102 to 104
![CPU dashboard 1](/img/blog/ferretdb-coroot/cpu-metrics-1.png)

![CPU dashboard 1](/img/blog/ferretdb-coroot/cpu-metrics-2.png)
Member

Those graphs show Prometheus requests to FerretDB to gather metrics, not FerretDB metrics


![CPU dashboard 1](/img/blog/ferretdb-coroot/cpu-metrics-2.png)

In the images, the FerretDB instance indicates a peak Requests Per Second (RPS) of 0.07 with a consistent 2ms latency.
Member

And those requests are coming from?..

AlekSi changed the milestone from v1.22.0 to v1.23.0 on Jun 25, 2024

The image below shows a typical dashboard design using Grafana to display some of the metrics, including total client requests and responses, client connection durations, memory usage, CPU usage, and overall instance health.

![Grafana dashboard for FerretDB Prometheus metrics 1](/img/blog/ferretdb-coroot/grafana-prometheus.png)
Member

chilagrow, Jul 4, 2024

Just a quick comment before going through the entire doc. These images are too small to tell what's going on. Is there a way to capture just the part with the interesting metrics, or to allow zooming or something? 🤔

Member Author

The image for Grafana is just to show the dashboard setup rather than specific metrics. Others are more specific. But yes, this is something to look at actually. We may enable images to be clickable/zoomable via some plugin addition to Docusaurus.

ptrfarkas previously approved these changes on Jul 9, 2024
AlekSi disabled auto-merge on July 9, 2024 18:22
AlekSi merged commit 82c2445 into FerretDB:main on Jul 9, 2024
24 of 27 checks passed
Labels: blog/marketing (Marketing (and releases) blog posts), trust (PRs that can access Actions secrets)
5 participants