Merge pull request #641 from grafana/resync-upstream
Resync upstream prometheus
pracucci authored Jun 7, 2024
2 parents ef8f745 + 40c9c28 commit fce8e33
Showing 48 changed files with 1,802 additions and 806 deletions.
13 changes: 9 additions & 4 deletions .github/workflows/container_description.yml
@@ -4,6 +4,7 @@ on:
push:
paths:
- "README.md"
- "README-containers.md"
- ".github/workflows/container_description.yml"
branches: [ main, master ]

@@ -17,7 +18,7 @@ jobs:
if: github.repository_owner == 'prometheus' || github.repository_owner == 'prometheus-community' # Don't run this workflow on forks.
steps:
- name: git checkout
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
- name: Set docker hub repo name
run: echo "DOCKER_REPO_NAME=$(make docker-repo-name)" >> $GITHUB_ENV
- name: Push README to Dockerhub
@@ -29,15 +30,17 @@ jobs:
destination_container_repo: ${{ env.DOCKER_REPO_NAME }}
provider: dockerhub
short_description: ${{ env.DOCKER_REPO_NAME }}
readme_file: 'README.md'
# Empty string results in README-containers.md being pushed if it
# exists. Otherwise, README.md is pushed.
readme_file: ''

PushQuayIoReadme:
runs-on: ubuntu-latest
name: Push README to quay.io
if: github.repository_owner == 'prometheus' || github.repository_owner == 'prometheus-community' # Don't run this workflow on forks.
steps:
- name: git checkout
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
- name: Set quay.io org name
run: echo "DOCKER_REPO=$(echo quay.io/${GITHUB_REPOSITORY_OWNER} | tr -d '-')" >> $GITHUB_ENV
- name: Set quay.io repo name
@@ -49,4 +52,6 @@ jobs:
with:
destination_container_repo: ${{ env.DOCKER_REPO_NAME }}
provider: quay
readme_file: 'README.md'
# Empty string results in README-containers.md being pushed if it
# exists. Otherwise, README.md is pushed.
readme_file: ''
9 changes: 6 additions & 3 deletions CHANGELOG.md
@@ -2,19 +2,22 @@

## unreleased

This release changes the default for GOGC, the Go runtime control for the trade-off between excess memory use and CPU usage. We have found that lowering it from the upstream Go default of 100 to 50 greatly reduces Prometheus's memory usage at the cost of minimal additional CPU usage.

* [CHANGE] Rules: Execute 1 query instead of N (where N is the number of alerts within alert rule) when restoring alerts. #13980
* [CHANGE] Runtime: Change GOGC threshold from 100 to 50 #14176
* [FEATURE] Rules: Add new option `query_offset` for each rule group via rule group configuration file and `rule_query_offset` as part of the global configuration to have more resilience for remote write delays. #14061
* [ENHANCEMENT] Rules: Add `rule_group_last_restore_duration_seconds` to measure the time it takes to restore a rule group. #13974
* [ENHANCEMENT] OTLP: Improve remote write format translation performance by using label set hashes for metric identifiers instead of string based ones. #14006 #13991
* [ENHANCEMENT] TSDB: Optimize querying with regexp matchers. #13620
* [BUGFIX] OTLP: Don't generate target_info unless at least one identifying label is defined. #13991
* [BUGFIX] OTLP: Don't generate target_info unless there are metrics. #13991

## 2.52.0-rc.1 / 2024-05-03
## 2.52.1 / 2024-05-29

* [BUGFIX] API: Fix missing comma during JSON encoding of API results. #14047
* [BUGFIX] Linode SD: Fix partial fetch when discovery would return more than 500 elements. #14141

## 2.52.0-rc.0 / 2024-04-22
## 2.52.0 / 2024-05-07

* [CHANGE] TSDB: Fix the predicate checking for blocks which are beyond the retention period to include the ones right at the retention boundary. #9633
* [FEATURE] Kubernetes SD: Add a new metric `prometheus_sd_kubernetes_failures_total` to track failed requests to Kubernetes API. #13554
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
2.52.0-rc.1
2.52.1
13 changes: 13 additions & 0 deletions cmd/prometheus/main.go
@@ -28,6 +28,8 @@ import (
"os/signal"
"path/filepath"
"runtime"
"runtime/debug"
"strconv"
"strings"
"sync"
"syscall"
@@ -1384,6 +1386,17 @@ func reloadConfig(filename string, expandExternalLabels, enableExemplarStorage b
return fmt.Errorf("one or more errors occurred while applying the new configuration (--config.file=%q)", filename)
}

oldGoGC := debug.SetGCPercent(conf.Runtime.GoGC)
if oldGoGC != conf.Runtime.GoGC {
level.Info(logger).Log("msg", "updated GOGC", "old", oldGoGC, "new", conf.Runtime.GoGC)
}
// Write the new setting out to the ENV var for runtime API output.
if conf.Runtime.GoGC >= 0 {
os.Setenv("GOGC", strconv.Itoa(conf.Runtime.GoGC))
} else {
os.Setenv("GOGC", "off")
}

noStepSuqueryInterval.Set(conf.GlobalConfig.EvaluationInterval)
l := []interface{}{"msg", "Completed loading of configuration file", "filename", filename, "totalDuration", time.Since(start)}
level.Info(logger).Log(append(l, timings...)...)
44 changes: 43 additions & 1 deletion config/config.go
@@ -20,6 +20,7 @@ import (
"os"
"path/filepath"
"sort"
"strconv"
"strings"
"time"

@@ -151,6 +152,11 @@ var (
ScrapeProtocols: DefaultScrapeProtocols,
}

DefaultRuntimeConfig = RuntimeConfig{
// Go runtime tuning.
GoGC: 50,
}

// DefaultScrapeConfig is the default scrape configuration.
DefaultScrapeConfig = ScrapeConfig{
// ScrapeTimeout, ScrapeInterval and ScrapeProtocols default to the configured globals.
@@ -225,6 +231,7 @@ var (
// Config is the top-level configuration for Prometheus's config files.
type Config struct {
GlobalConfig GlobalConfig `yaml:"global"`
Runtime RuntimeConfig `yaml:"runtime,omitempty"`
AlertingConfig AlertingConfig `yaml:"alerting,omitempty"`
RuleFiles []string `yaml:"rule_files,omitempty"`
ScrapeConfigFiles []string `yaml:"scrape_config_files,omitempty"`
@@ -335,6 +342,14 @@ func (c *Config) UnmarshalYAML(unmarshal func(interface{}) error) error {
c.GlobalConfig = DefaultGlobalConfig
}

// If a runtime block was open but empty the default runtime config is overwritten.
// We have to restore it here.
if c.Runtime.isZero() {
c.Runtime = DefaultRuntimeConfig
// Use the GOGC env var value if the runtime section is empty.
c.Runtime.GoGC = getGoGCEnv()
}

for _, rf := range c.RuleFiles {
if !patRulePath.MatchString(rf) {
return fmt.Errorf("invalid rule file path %q", rf)
@@ -399,7 +414,7 @@ type GlobalConfig struct {
// How frequently to evaluate rules by default.
EvaluationInterval model.Duration `yaml:"evaluation_interval,omitempty"`
// Offset the rule evaluation timestamp of this particular group by the specified duration into the past to ensure the underlying metrics have been received.
RuleQueryOffset model.Duration `yaml:"rule_query_offset"`
RuleQueryOffset model.Duration `yaml:"rule_query_offset,omitempty"`
// File to which PromQL queries are logged.
QueryLogFile string `yaml:"query_log_file,omitempty"`
// The labels to add to any timeseries that this Prometheus instance scrapes.
@@ -564,6 +579,17 @@ func (c *GlobalConfig) isZero() bool {
c.ScrapeProtocols == nil
}

// RuntimeConfig configures the values for the process behavior.
type RuntimeConfig struct {
// The Go garbage collection target percentage.
GoGC int `yaml:"gogc,omitempty"`
}

// isZero returns true iff the runtime config is the zero value.
func (c *RuntimeConfig) isZero() bool {
return c.GoGC == 0
}

type ScrapeConfigs struct {
ScrapeConfigs []*ScrapeConfig `yaml:"scrape_configs,omitempty"`
}
@@ -1211,3 +1237,19 @@ func filePath(filename string) string {
func fileErr(filename string, err error) error {
return fmt.Errorf("%q: %w", filePath(filename), err)
}

func getGoGCEnv() int {
goGCEnv := os.Getenv("GOGC")
// If the GOGC env var is set, use the same logic as upstream Go.
if goGCEnv != "" {
// Special case for GOGC=off.
if strings.ToLower(goGCEnv) == "off" {
return -1
}
i, err := strconv.Atoi(goGCEnv)
if err == nil {
return i
}
}
return DefaultRuntimeConfig.GoGC
}
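The fallback order implemented by `getGoGCEnv` can be exercised in isolation. The sketch below is a side-effect-free variant that takes the environment value as an argument; `resolveGoGC` and `defaultGoGC` are hypothetical names, with `defaultGoGC` assumed to mirror `DefaultRuntimeConfig.GoGC` from the diff.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultGoGC is assumed to mirror DefaultRuntimeConfig.GoGC in the diff.
const defaultGoGC = 50

// resolveGoGC is a hypothetical pure variant of getGoGCEnv: it maps the
// raw GOGC string to a GC percentage without reading the environment.
func resolveGoGC(env string) int {
	if env != "" {
		// "off" (in any letter case) disables the collector.
		if strings.ToLower(env) == "off" {
			return -1
		}
		if i, err := strconv.Atoi(env); err == nil {
			return i
		}
	}
	// Unset or unparsable values fall back to the default.
	return defaultGoGC
}

func main() {
	for _, v := range []string{"", "off", "OFF", "200", "not-a-number"} {
		fmt.Printf("GOGC=%q -> %d\n", v, resolveGoGC(v))
	}
}
```

Because `isZero` in the diff treats `GoGC == 0` as "unset", a configuration file that sets `gogc: 0` appears to be indistinguishable from an empty `runtime` block and would take this same fallback path.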
1 change: 1 addition & 0 deletions config/config_default_test.go
@@ -19,6 +19,7 @@ const ruleFilesConfigFile = "testdata/rules_abs_path.good.yml"

var ruleFilesExpectedConf = &Config{
GlobalConfig: DefaultGlobalConfig,
Runtime: DefaultRuntimeConfig,
RuleFiles: []string{
"testdata/first.rules",
"testdata/rules/second.rules",
6 changes: 6 additions & 0 deletions config/config_test.go
@@ -76,6 +76,7 @@ const (
globLabelLimit = 30
globLabelNameLengthLimit = 200
globLabelValueLengthLimit = 200
globalGoGC = 42
)

var expectedConf = &Config{
@@ -96,6 +97,10 @@ var expectedConf = &Config{
ScrapeProtocols: DefaultGlobalConfig.ScrapeProtocols,
},

Runtime: RuntimeConfig{
GoGC: globalGoGC,
},

RuleFiles: []string{
filepath.FromSlash("testdata/first.rules"),
filepath.FromSlash("testdata/my/*.rules"),
@@ -2081,6 +2086,7 @@ func TestEmptyGlobalBlock(t *testing.T) {
c, err := Load("global:\n", false, log.NewNopLogger())
require.NoError(t, err)
exp := DefaultConfig
exp.Runtime = DefaultRuntimeConfig
require.Equal(t, exp, *c)
}

1 change: 1 addition & 0 deletions config/config_windows_test.go
@@ -17,6 +17,7 @@ const ruleFilesConfigFile = "testdata/rules_abs_path_windows.good.yml"

var ruleFilesExpectedConf = &Config{
GlobalConfig: DefaultGlobalConfig,
Runtime: DefaultRuntimeConfig,
RuleFiles: []string{
"testdata\\first.rules",
"testdata\\rules\\second.rules",
3 changes: 3 additions & 0 deletions config/testdata/conf.good.yml
@@ -14,6 +14,9 @@ global:
monitor: codelab
foo: bar

runtime:
gogc: 42

rule_files:
- "first.rules"
- "my/*.rules"
23 changes: 16 additions & 7 deletions discovery/linode/linode.go
@@ -186,12 +186,12 @@ func (d *Discovery) refresh(ctx context.Context) ([]*targetgroup.Group, error) {

if d.lastResults != nil && d.eventPollingEnabled {
// Check to see if there have been any events. If so, refresh our data.
opts := linodego.ListOptions{
eventsOpts := linodego.ListOptions{
PageOptions: &linodego.PageOptions{Page: 1},
PageSize: 25,
Filter: fmt.Sprintf(filterTemplate, d.lastRefreshTimestamp.Format("2006-01-02T15:04:05")),
}
events, err := d.client.ListEvents(ctx, &opts)
events, err := d.client.ListEvents(ctx, &eventsOpts)
if err != nil {
var e *linodego.Error
if errors.As(err, &e) && e.Code == http.StatusUnauthorized {
@@ -232,31 +232,40 @@ func (d *Discovery) refreshData(ctx context.Context) ([]*targetgroup.Group, erro
tg := &targetgroup.Group{
Source: "Linode",
}
opts := linodego.ListOptions{
// We need 3 of these because Linodego writes into the structure during pagination
listInstancesOpts := linodego.ListOptions{
PageSize: 500,
}
listIPAddressesOpts := linodego.ListOptions{
PageSize: 500,
}
listIPv6RangesOpts := linodego.ListOptions{
PageSize: 500,
}

// If region filter provided, use it to constrain results.
if d.region != "" {
opts.Filter = fmt.Sprintf(regionFilterTemplate, d.region)
listInstancesOpts.Filter = fmt.Sprintf(regionFilterTemplate, d.region)
listIPAddressesOpts.Filter = fmt.Sprintf(regionFilterTemplate, d.region)
listIPv6RangesOpts.Filter = fmt.Sprintf(regionFilterTemplate, d.region)
}

// Gather all linode instances.
instances, err := d.client.ListInstances(ctx, &opts)
instances, err := d.client.ListInstances(ctx, &listInstancesOpts)
if err != nil {
d.metrics.failuresCount.Inc()
return nil, err
}

// Gather detailed IP address info for all IPs on all linode instances.
detailedIPs, err := d.client.ListIPAddresses(ctx, &opts)
detailedIPs, err := d.client.ListIPAddresses(ctx, &listIPAddressesOpts)
if err != nil {
d.metrics.failuresCount.Inc()
return nil, err
}

// Gather detailed IPv6 Range info for all linode instances.
ipv6RangeList, err := d.client.ListIPv6Ranges(ctx, &opts)
ipv6RangeList, err := d.client.ListIPv6Ranges(ctx, &listIPv6RangesOpts)
if err != nil {
d.metrics.failuresCount.Inc()
return nil, err
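The comment in the hunk says the client library writes into the options structure during pagination, which is why each call now gets its own `ListOptions`. The sketch below illustrates how sharing one struct can lead to a partial fetch. It does not use the linodego API: `listOptions` and `fakeList` are hypothetical stand-ins, and the assumption that the last page number is left behind in the struct is a simplified model of the behaviour described in the comment.

```go
package main

import "fmt"

// listOptions is a hypothetical stand-in for a paginated-request options struct.
type listOptions struct {
	Page     int
	PageSize int
}

// fakeList simulates a client call that pages through `total` items and
// advances opts.Page as it goes, leaving the last page number in the
// caller's struct. It returns how many items were fetched.
func fakeList(total int, opts *listOptions) int {
	if opts.Page == 0 {
		opts.Page = 1
	}
	fetched := 0
	for {
		start := (opts.Page - 1) * opts.PageSize
		if start >= total {
			return fetched
		}
		n := total - start
		if n > opts.PageSize {
			n = opts.PageSize
		}
		fetched += n
		if start+n >= total {
			return fetched
		}
		opts.Page++
	}
}

func main() {
	// Reusing one struct: the second call starts at the page where the
	// first call stopped and therefore misses items.
	shared := listOptions{PageSize: 500}
	a := fakeList(1200, &shared)
	b := fakeList(700, &shared)

	// A fresh struct per call, as in the fixed code, fetches everything.
	fresh := listOptions{PageSize: 500}
	c := fakeList(700, &fresh)

	fmt.Println(a, b, c) // 1200 0 700
}
```

With 500 or fewer items the first call never advances past page 1, which is consistent with the changelog entry describing the bug as appearing only when discovery returns more than 500 elements.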
2 changes: 1 addition & 1 deletion discovery/ovhcloud/ovhcloud_test.go
@@ -66,7 +66,7 @@ endpoint: %s

_, err := createClient(&conf)

require.ErrorContains(t, err, "missing application key")
require.ErrorContains(t, err, "missing authentication information")
}

func TestParseIPs(t *testing.T) {
12 changes: 6 additions & 6 deletions discovery/scaleway/instance.go
@@ -175,14 +175,14 @@ func (d *instanceDiscovery) refresh(ctx context.Context) ([]*targetgroup.Group,
}

addr := ""
if server.IPv6 != nil {
labels[instancePublicIPv6Label] = model.LabelValue(server.IPv6.Address.String())
addr = server.IPv6.Address.String()
if server.IPv6 != nil { //nolint:staticcheck
labels[instancePublicIPv6Label] = model.LabelValue(server.IPv6.Address.String()) //nolint:staticcheck
addr = server.IPv6.Address.String() //nolint:staticcheck
}

if server.PublicIP != nil {
labels[instancePublicIPv4Label] = model.LabelValue(server.PublicIP.Address.String())
addr = server.PublicIP.Address.String()
if server.PublicIP != nil { //nolint:staticcheck
labels[instancePublicIPv4Label] = model.LabelValue(server.PublicIP.Address.String()) //nolint:staticcheck
addr = server.PublicIP.Address.String() //nolint:staticcheck
}

if server.PrivateIP != nil {
6 changes: 6 additions & 0 deletions docs/configuration/configuration.md
@@ -121,6 +121,12 @@ global:
# that will be kept in memory. 0 means no limit.
[ keep_dropped_targets: <int> | default = 0 ]

runtime:
# Configure the Go garbage collector GOGC parameter
# See: https://tip.golang.org/doc/gc-guide#GOGC
# Lowering this number increases CPU usage.
[ gogc: <int> | default = 50 ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
7 changes: 7 additions & 0 deletions docs/querying/api.md
@@ -473,6 +473,9 @@ Range vectors are returned as result type `matrix`. The corresponding
Each series could have the `"values"` key, or the `"histograms"` key, or both.
For a given timestamp, there will only be one sample of either float or histogram type.

Series are returned sorted by `metric`. Functions such as [`sort`](functions.md#sort)
and [`sort_by_label`](functions.md#sort_by_label) have no effect for range vectors.

### Instant vectors

Instant vectors are returned as result type `vector`. The corresponding
@@ -491,6 +494,10 @@ Instant vectors are returned as result type `vector`. The corresponding

Each series could have the `"value"` key, or the `"histogram"` key, but not both.

Series are not guaranteed to be returned in any particular order unless a function
such as [`sort`](functions.md#sort) or [`sort_by_label`](functions.md#sort_by_label)
is used.

### Scalars

Scalar results are returned as result type `scalar`. The corresponding
4 changes: 4 additions & 0 deletions docs/querying/functions.md
@@ -596,10 +596,14 @@ have exactly one element, `scalar` will return `NaN`.
`sort(v instant-vector)` returns vector elements sorted by their sample values,
in ascending order. Native histograms are sorted by their sum of observations.

Please note that `sort` only affects the results of instant queries, as range query results always have a fixed output ordering.

## `sort_desc()`

Same as `sort`, but sorts in descending order.

Like `sort`, `sort_desc` only affects the results of instant queries, as range query results always have a fixed output ordering.

## `sort_by_label()`

**This function has to be enabled via the [feature flag](../feature_flags/) `--enable-feature=promql-experimental-functions`.**