fix: Correct worker metrics to provide metrics across all dimensions #136

Open
wants to merge 2 commits into
base: master

jackatbancast

We currently have datetime and status as dimensions for the CloudFlare Worker invocation metrics GraphQL query.

This causes CloudFlare workers to output a number of datapoints for each unique tuple of dimensions, i.e. across several datetimes and statuses.

For worker requests and errors this is fine, as we .Add(...) to those counters. For quantiles, however, we use .Set(...), so only the last datapoint's quantiles are exported, which may not be representative of the entire dataset.
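
To illustrate the difference, here is a minimal sketch of the Add versus Set behaviour described above. The `datapoint`, `counter`, `gauge`, and `export` names are hypothetical stand-ins written for this example, not the exporter's code or the Prometheus client types, and the status strings are only illustrative.

```go
package main

// datapoint is a hypothetical stand-in for one row of the query response:
// one (datetime, status) tuple with a request count and a CPU-time quantile.
type datapoint struct {
	status   string
	requests float64
	cpuP50   float64
}

// counter and gauge mimic only the Add/Set semantics discussed above;
// they are not the Prometheus client types.
type counter struct{ value float64 }

func (c *counter) Add(v float64) { c.value += v }

type gauge struct{ value float64 }

func (g *gauge) Set(v float64) { g.value = v }

// export feeds every datapoint into a single counter and a single gauge,
// which is what happens when the datapoints all map to the same series.
func export(points []datapoint) (requests, cpuP50 float64) {
	var c counter
	var g gauge
	for _, p := range points {
		c.Add(p.requests) // accumulates across datapoints
		g.Set(p.cpuP50)   // overwrites: only the last datapoint survives
	}
	return c.value, g.value
}
```

With several datapoints, the request total reflects all of them, while the quantile reflects whichever datapoint happened to be processed last.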

Removing datetime from the query provides pre-aggregated datapoints for us to process.

Further to this, datetime is not currently being captured in the response data structure.


status is a useful dimension for distinguishing internal errors and for seeing how performance differs across statuses. This change adds status to all of the exported worker metrics to reflect that, and to ensure we are not exporting only a single datapoint.
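
As a rough sketch of why keying the quantiles by status helps: once each status has its own series, a Set for one status no longer overwrites the value for another. The `sample`, `gaugeVec`, and `exportByStatus` names are hypothetical and exist only for this illustration; the real exporter uses the Prometheus client, which is not reproduced here.

```go
package main

// sample is a hypothetical pre-aggregated row: one per status once
// datetime is no longer a query dimension.
type sample struct {
	status string
	cpuP50 float64
}

// gaugeVec mimics a labelled gauge: one value per label value.
// It is an illustration, not the Prometheus client type.
type gaugeVec map[string]float64

func (g gaugeVec) Set(status string, v float64) { g[status] = v }

// exportByStatus writes each row to the series for its own status,
// so rows with different statuses no longer overwrite each other.
func exportByStatus(rows []sample) gaugeVec {
	g := gaugeVec{}
	for _, r := range rows {
		g.Set(r.status, r.cpuP50)
	}
	return g
}
```

Each distinct status ends up with its own exported value, rather than a single value holding whichever row came last.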

We currently have `datetime` and `status` as dimensions for the
CloudFlare Worker invocation metrics GraphQL query.

This causes CloudFlare workers to output a number of datapoints for
each unique tuple of dimensions, across several `datetime`s.

For worker requests and errors this is fine as we `.Add(...)` to those
counters, but for quantiles this causes us to only export the last
datapoint's quantiles, which may be an outlier.

---

Removing `datetime` from the query provides pre-aggregated
datapoints for us to process.

Further to this, `datetime` is not currently being captured in the
response data structure.

This change adds `status` to the CloudFlare worker metrics being
exported.

This fixes an issue where, similar to `datetime` previously, we are
getting multiple metric series from the combinations of unique
dimensions.

This resulted in only the last datapoint series being exported from
the `cloudflare-exporter` for the CloudFlare workers.

Additionally this provides more information as the `status` can be
used to differentiate the internal errors seen in CloudFlare workers.

@jackatbancast changed the title from "Fixup worker metrics" to "fix: Correct worker metrics to provide metrics across all dimensions" on Oct 25, 2024

@jackatbancast
Author

cc @haad if you have a chance to look at this, it'd be appreciated, as the worker metrics are currently a little incomplete/misleading.

@jackatbancast
Author

cc @martinhaus if you also have some time to look at this it'd be appreciated.
