-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proxy release 2024-10-10 #9341
Merged
Merged
Proxy release 2024-10-10 #9341
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The apt install stage before this commit: 0 upgraded, 391 newly installed, 0 to remove and 9 not upgraded. Need to get 261 MB of archives. after: 0 upgraded, 367 newly installed, 0 to remove and 9 not upgraded. Need to get 220 MB of archives.
Address minor technical debt in Layer inspired by #9224: - layer usage as arg same as in spans - avoid one Weak::upgrade
Because: - it's nice to be up-to-date, - we already had axum 0.7 in our dependency tree, so this avoids having to compile two versions, and - removes one of the remaining dpendencies to hyper version 0 Also bumps the 'tokio-tungstenite' dependency, to avoid having two versions in the dependency tree.
## Problem `Oversized vectored read [...]` logs are spewing in prod because we have a few keys that are unexpectedly large: * reldir/relblock - these are unbounded, so it's known technical debt * slru block - they can be a bit bigger than 128KiB due to storage format overhead ## Summary of changes * Bump threshold to 130KiB * Don't warn on oversized reldir and dbdir keys Closes #8967
Panic was triggered only when dump selected no timelines. sentry report: https://neondatabase.sentry.io/issues/5832368589/
…#9253) See [this comment](#8888 (comment)).
* I had to install `m4` in order to be able to run locally * The docs/docker.md was missing a pointer to where the compute node code is (Was originally on #8888 but I am pulling this out)
Add wrappers for a few commands that didn't have them before. Move the logic to generate tenant and timeline IDs from NeonCli to the callers, so that NeonCli is more purely just a type-safe wrapper around 'neon_local'.
In the passing, rename it to NeonLocalCli, to reflect that the binary is called 'neon_local'. Add wrapper for the 'timeline_import' command, eliminating the last raw call to the raw_cli() function from tests, except for a few in test_neon_cli.py which are about testing the 'neon_local' iteself. All the other calls are now made through the strongly-typed wrapper functions
…ds (#9195) This makes it more clear that the functions in NeonLocalCli are just typed wrappers around the corresponding 'neon_local' commands.
## Problem The S3 tests couldn't use SSO authentication for local tests against S3. ## Summary of changes Enable the `sso` feature of `aws-config`. Also run `cargo hakari generate` which made some updates to `workspace_hack`.
## Problem Secondary tenant heatmaps were always downloaded, even when they hadn't changed. This can be avoided by using a conditional GET request passing the `ETag` of the previous heatmap. ## Summary of changes The `ETag` was already plumbed down into the heatmap downloader, and just needed further plumbing into the remote storage backends. * Add a `DownloadOpts` struct and pass it to `RemoteStorage::download()`. * Add an optional `DownloadOpts::etag` field, which uses a conditional GET and returns `DownloadError::Unmodified` on match.
## Problem Creation of a timelines during a reconciliation can lead to unavailability if the user attempts to start a compute before the storage controller has notified cplane of the cut-over. ## Summary of changes Create timelines on all currently attached locations. For the latest location, we still look at the database (this is a previously). With this change we also look into the observed state to find *other* attached locations. Related #9144
NeonWALReader needs to know LSN before which WAL is not available locally, that is, basebackup LSN. Previously it was taken from WalpropShmemState, but that's racy, as walproposer sets its there only after successfull election. Get it directly with GetRedoStartLsn. Should fix flakiness of test_ondemand_wal_download_in_replication_slot_funcs etc. ref #9201
1. Adds local-proxy to compute image and vm spec 2. Updates local-proxy config processing, writing PID to a file eagerly 3. Updates compute-ctl to understand local proxy compute spec and to send SIGHUP to local-proxy over that pid. closes neondatabase/cloud#16867
Fixes (#9020) - Use the compute::COULD_NOT_CONNECT for connection error message; - Eliminate logging for one connection attempt; - Typo fix.
If peer safekeeper needs garbage collected segment it will be fetched now from s3 using on-demand WAL download. Reduces danger of running out of disk space when safekeeper fails.
## Problem See #9199 ## Summary of changes Fix update of hits/misses for LFC and prefetch introduced in 78938d1 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
#9266) rename console -> control_plane rename web -> console_redirect I think these names are a little more representative.
The prefetch-queue hash table uses a BufferTag struct as the hash key, and it's hashed using hash_bytes(). It's important that all the padding bytes in the key are cleared, because hash_bytes() will include them. I was getting compiler warnings like this on v14 and v15, when compiling with -Warray-bounds: In function ‘prfh_lookup_hash_internal’, inlined from ‘prfh_lookup’ at pg_install/v14/include/postgresql/server/lib/simplehash.h:821:9, inlined from ‘neon_read_at_lsnv’ at pgxn/neon/pagestore_smgr.c:2789:11, inlined from ‘neon_read_at_lsn’ at pgxn/neon/pagestore_smgr.c:2904:2: pg_install/v14/include/postgresql/server/storage/relfilenode.h:90:43: warning: array subscript ‘PrefetchRequest[0]’ is partly outside array bounds of ‘BufferTag[1]’ {aka ‘struct buftag[1]’} [-Warray-bounds] 89 | ((node1).relNode == (node2).relNode && \ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 90 | (node1).dbNode == (node2).dbNode && \ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~ 91 | (node1).spcNode == (node2).spcNode) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pg_install/v14/include/postgresql/server/storage/buf_internals.h:116:9: note: in expansion of macro ‘RelFileNodeEquals’ 116 | RelFileNodeEquals((a).rnode, (b).rnode) && \ | ^~~~~~~~~~~~~~~~~ pgxn/neon/neon_pgversioncompat.h:25:31: note: in expansion of macro ‘BUFFERTAGS_EQUAL’ 25 | #define BufferTagsEqual(a, b) BUFFERTAGS_EQUAL(*(a), *(b)) | ^~~~~~~~~~~~~~~~ pgxn/neon/pagestore_smgr.c:220:34: note: in expansion of macro ‘BufferTagsEqual’ 220 | #define SH_EQUAL(tb, a, b) (BufferTagsEqual(&(a)->buftag, &(b)->buftag)) | ^~~~~~~~~~~~~~~ pg_install/v14/include/postgresql/server/lib/simplehash.h:280:77: note: in expansion of macro ‘SH_EQUAL’ 280 | #define SH_COMPARE_KEYS(tb, ahash, akey, b) (ahash == SH_GET_HASH(tb, b) && SH_EQUAL(tb, b->SH_KEY, akey)) | ^~~~~~~~ pg_install/v14/include/postgresql/server/lib/simplehash.h:799:21: note: in expansion of macro ‘SH_COMPARE_KEYS’ 799 | if (SH_COMPARE_KEYS(tb, hash, key, entry)) | ^~~~~~~~~~~~~~~ pgxn/neon/pagestore_smgr.c: In function ‘neon_read_at_lsn’: pgxn/neon/pagestore_smgr.c:2742:25: note: object ‘buftag’ of size 20 2742 | BufferTag buftag = {0}; | ^~~~~~ This commit silences those warnings, although it's not clear to me why the compiler complained like that in the first place. I found the issue with padding bytes while looking into those warnings, but that was coincidental, I don't think the padding bytes explain the warnings as such. In v16, the BUFFERTAGS_EQUAL macro was replaced with a static inline function, and that also silences the compiler warning. Not clear to me why.
…9031) Requires #9086 first to have `local_proxy_config`. This logic can still be reviewed implementation wise. Create JWT Auth functionality related roles without attributes and `neon_superuser` group. Read the JWT related roles from `local_proxy_config` `JWKS` settings and handle them differently than other console created roles.
…9294) In PostgreSQL v16, BUFFERTAGS_EQUAL was replaced with a static inline macro, BufferTagsEqual. Let's use the new name going forward, and have backwards-compatibility glue to allow using the new name on v14 and v15, rather than the other way round. This also makes BufferTagsEquals consistent with InitBufferTag, for which we were already using the new name.
I'm trying to debug a situation with the LR benchmark publisher not being in the correct state. This should aid in debugging, while just being generally useful. PR: #9265 Signed-off-by: Tristan Partin <tristan@neon.tech>
In short: Currently we reserve 75% of memory to the LFC, meaning that if we scale up to keep postgres using less than 25% of the compute's memory. This means that for certain memory-heavy workloads, we end up scaling much higher than is actually needed — in the worst case, up to 4x, although in practice it tends not to be quite so bad. Part of neondatabase/autoscaling#1030.
The neon_cli functions print the command that gets executed, which contains the same information. Before: 2024-10-07 22:32:28.884 INFO [neon_fixtures.py:3927] Stopping safekeeper 1 2024-10-07 22:32:28.884 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 1" 2024-10-07 22:32:28.989 INFO [neon_fixtures.py:3927] Stopping safekeeper 2 2024-10-07 22:32:28.989 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 2" 2024-10-07 22:32:29.93 INFO [neon_fixtures.py:3927] Stopping safekeeper 3 2024-10-07 22:32:29.94 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 3" 2024-10-07 22:32:29.251 INFO [neon_cli.py:450] Stopping pageserver with ['pageserver', 'stop', '--id=1'] 2024-10-07 22:32:29.251 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local pageserver stop --id=1" After: 2024-10-07 22:32:28.884 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 1" 2024-10-07 22:32:28.989 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 2" 2024-10-07 22:32:29.94 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 3" 2024-10-07 22:32:29.251 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local pageserver stop --id=1"
## Summary of changes CI: Collect stats for Github Workflows Runs
- PostGIS 3.5.0 - pgrouting 3.6.2 - h3 4.1.3 - unit 7.9 - pgjwt version (f3d82fd) - pg_hashids 1.2.1 - ip4r 2.4.2 - prefix 1.2.10 - postgresql-hll 2.18 - pg_roaringbitmap 0.5.4 - pg-semver 0.40.0 update support of extensions for v14-v16: - unit 7.7 -> 7.9 - pgjwt 9742dab -> f3d82fd --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
This seems to paper over a behavioral difference in Python 3.9 and Python 3.12 with how dataclasses work with mutable variables. On Python 3.12, I get the following error: ValueError: mutable default <class 'dict'> for field EXTRACTORS is not allowed: use default_factory This obviously doesn't occur in our testing environment. When I do what the error tells me, EXTRACTORS doesn't seem to exist as an attribute on the class in at least Python 3.9. The solution provided in this commit seems like the least amount of friction to keep the wheels turning. Signed-off-by: Tristan Partin <tristan@neon.tech>
Fixes some types, adds some types, and adds some override annotations. Signed-off-by: Tristan Partin <tristan@neon.tech>
… port (#9298) neondatabase/cloud#18349 Use the `-local-proxy` suffix to make sure we get the 10432 local_proxy port back from cplane.
vipvap
requested review from
jcsp,
conradludgate,
knizhnik,
petuhovskiy and
clipperhouse
and removed request for
a team
October 10, 2024 06:02
5094 tests run: 4887 passed, 0 failed, 207 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
306094a at 2024-10-10T06:51:29.975Z :recycle: |
cloneable
requested review from
cloneable
and removed request for
conradludgate
October 10, 2024 07:13
cloneable
approved these changes
Oct 10, 2024
Reviewed for changelog |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Proxy release 2024-10-10
Please merge this Pull Request using 'Create a merge commit' button