-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[perf] cache nix.searchSystem #1546
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
378c267
to
a0d9a8f
Compare
a0d9a8f
to
d5cdee7
Compare
d5cdee7
to
7e75d9f
Compare
internal/devpkg/package.go
Outdated
@@ -284,8 +284,25 @@ func (p *Package) NormalizedPackageAttributePath() (string, error) { | |||
return p.normalizedPackageAttributePathCache, nil | |||
} | |||
|
|||
// search attempts to find the package in the local nixpkgs. It may be an expensive | |||
// call, if it is has not been cached. | |||
func (p *Package) search(query string) (map[string]*nix.Info, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is really weird. Having the package as a receiver and also requiring a query. I would suggest a different API, but I'm not sure we need this function at all. See comments below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow why its that's weird?
- We use the receiver's properties to determine how to execute the search.
- The query is needed to do the search.
Maybe it can be renamed to searchNixPackageLocally
or something more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird because you are searching for the query (which is determined by the package), but tangentially using the package information to determine how the search will happen as well.
It's a bit like
mysql := devpkg.New("mysql")
mysql.Search("mysql")
- Search should not have a receiver OR should not have a query param.
I'm not sure renaming it addresses the concern.
An alternative:
func (p *Package) lookupNixInfo() (map[string]*nix.Info, error) {
var query string
if p.IsDevboxPackage() {
if p.isVersioned() {
entry, err := p.lockfile.Resolve(p.Raw)
if err != nil {
return "", err
}
query = entry.Resolved
} else {
query = p.lockfile.LegacyNixpkgsPath(p.String())
}
} else {
query = p.String()
}
return nix.Search(query)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand the concern. Your analogy is not accurate.
I'll just inline it.
We can't use a function like yours above because we need the query
later in normalizePackageAttributePath
internal/devpkg/package.go
Outdated
return nix.SearchNixpkgsAttribute(query) | ||
} | ||
|
||
// fallback to the slow but generalized nix.Search | ||
return nix.Search(query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the distinction between devbox packages and flakes? Can't you cache the flakes as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm being conservative in this PR. I'll see if it makes sense for flakes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot use nix.Search
caching for flakes, local flakes at least. For now, I'd lean towards being conservative and just support queries of the form nixpkgs/commit#attribute
To illustrate the problem, consider the examples/flakes/php
. This has a package "path:my-php-flake#php"
The query
for this package is of the form: <absolute path on laptop>/devbox/examples/flakes/php/my-php-flake#php
Originally, that flake exported:
php = pkgs.php.withExtensions ({ enabled, all }: enabled ++ (with all; [ ds ]));
and the infos
returned from nix.Search
is map[packages.x86_64-darwin.php:php-with-extensions-8.1.19]
However, inside the package's flake.nix, I can make this edit:
php = pkgs.cowsay;
and now the infos
is map[packages.x86_64-darwin.php:cowsay-3.7.0]
for the same query
.
internal/nix/search.go
Outdated
if strings.HasPrefix(url, "runx:") { | ||
// TODO implement runx search | ||
return map[string]*Info{}, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is wrong, but I'm not sure we should fix it in this PR?
// queries of the form `nixpkgs/<commit-hash>#attribute`, we can know for sure that | ||
// once `nix search` returns a valid result, it will always be the very same result. | ||
// Hence we can cache it locally and answer future queries fast, by not calling `nix search`. | ||
func SearchNixpkgsAttribute(query string) (map[string]*Info, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just CachedSearch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the reasons explained in the comment.
I wanted to keep it specific to search queries I have high confidence should always resolve to the very same result. The nix search
api accepts installable regex
queries which this doesn't. Hence I chose a very specific name to indicate what scenarios this works with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above about flake-reference queries not being suitable to be cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we throw in a regex to verify that the input query is ^github:NixOS/nixpkgs/[0-9a-f]{40}#
? Just to make sure we don't end up caching a ref like github:NixOS/nixpkgs/nixpkgs-unstable#go_1_19
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, done
I had a few ideas I wanted to throw out there in case they help with the performance:
|
@savil I just realized the reason this is super slow for you it's because it's downloading I still think caching is a good idea, but now I think how we do profile comparisons for packages with store path is broken. We should not be running Just to give you an example, if our lockfile includes store paths, then simply comparing the store path on the nix profile and the store path in the lockfile is sufficient. @gcurtis, for more context: we use search to find a normalized way to compare packages in devbox.json with packages in the nix profile. Nix has a set of rules of how it determines the default output. To avoid re-implementing those rules we rely on search. If two flake references with outputs are searched we can compare the results and that way we know we are using same rules as nix. For example:
Are both the same. Similarly, nixpkgs can be referred to using Using
|
Ah, no, this is not accurate. In my testing, I had disabled remove-nixpkgs, removed For remove-nixpkgs, @ipince and I added various higher-level functions (like |
Directly-installed packages slowing things down makes sense. Even if it's not downloading a new nixpkgs commit, just evaluating the package for the first time can take a while. Taking a step back, why are we comparing packages in the profile to the ones in devbox.json? I'm a little unsure about adding caching, because that's what Nix is supposed to do. We should just have the devbox-generated flake refer to the packages and save them to a file. Here's a small example of a flake that could be generated to track Devbox dependencies: {
description = "A Devbox environment";
outputs = { self, nixpkgs }: {
packages.aarch64-darwin.default = derivation {
name = "devbox-env";
system = "aarch64-darwin";
builder = "/bin/bash";
args = let
script = builtins.toFile "builder.sh" ''
echo "$hello
$go" > $out
'';
in [ "${script}" ];
# Note the packages don't need to be nixpkgs attributes. They could also
# be direct store paths or `fetchClosure` calls.
hello = nixpkgs.legacyPackages.aarch64-darwin.hello;
go = nixpkgs.legacyPackages.aarch64-darwin.go;
};
};
} If you run // $ nix path-info --json | jq
[
{
"deriver": "/nix/store/kpl2nb2b3xwkyw0ma4d4qnxvsz3zllgh-devbox-env.drv",
"narHash": "sha256-KzYFyXkgG8kWpdHNABX/zu6qchMP7TWylosl8/zirOk=",
"narSize": 224,
"path": "/nix/store/c9046f6p7dg0v248cvrfhlqbj5bsy544-devbox-env",
"references": [
"/nix/store/c4n3i1nfhvcim30s4ij502cw84yl4vdn-hello-2.12.1",
"/nix/store/dpnvxm37r1z9vk77mdc3h68b95kxvsji-go-1.20.8"
],
"registrationTime": 1697048631,
"ultimate": true,
"valid": true
}
] |
c0bc050
to
81a06d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikeland73 switched to using filecache
. I've left the API as nix.SearchNixpkgsAttribute
- This is a specialized api for queries of the form
nixpkgs/<commit>#attribute
- I want it distinct from
nix.Search
which is the more general API that supports all queries (nixpkgs/<commit>#attribute
, all other flake references, etc.) - open to alternate names, but not to having "cache" in it since that doesn't tell users when to use it, and users shouldn't need to worry about it being cached or not.
@gcurtis I hear your concerns. However, I'm a bit reluctant to change the machinery of how we're currently implementing all of with devpkg
and nix-profile. That would be a more invasive change.
To your concerns:
I'm a little unsure about adding caching, because that's what Nix is supposed to do.
nix search
is a very general API. This makes it hard for nix to internally optimize it. In our case, we are adding caching for a strict subset of it that we know is always guaranteed to return the same result: nixpkgs/<commit>#attribute
. We can do this because we know our app's specific query patterns and can exploit it. Seems reasonable to me.
We should just have the devbox-generated flake refer to the packages and save them to a file.
The problem with this is that we can't query it while incrementally adding or removing packages quickly (EDIT: actually, I have low confidence in what I said, perhaps there's a way. My bigger concern is limiting scope for now)
ping |
internal/nix/search.go
Outdated
// cacheKey sanitizes the search query to be a valid unix filename. | ||
// This cache key is used as the filename to store the cache value, and having a | ||
// representation of the query is important for debuggability. | ||
func cacheKey(input string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we handle collisions for queries that have sanitized characters? Using the infamous emacsPackages.@
package as an example, the input would get cached under the key emacsPackages.
. Does that mean devbox add emacsPackages.
would install that package if it was in the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh good catch. I added that @
to the "replace disallowed chars with underscores"
I tried searching the other non-alphanumeric-characters on my standard American keyboard, and no packages showed up so I think its okay (practically speaking).
Theoretically speaking this scheme is not collision-resistant since two packages with foo-<disallowed-char-number-one>
and foo-<disallowed-char-number-two>
would have a collision in the sanitized name. Same for the name-length-truncation we do.
// queries of the form `nixpkgs/<commit-hash>#attribute`, we can know for sure that | ||
// once `nix search` returns a valid result, it will always be the very same result. | ||
// Hence we can cache it locally and answer future queries fast, by not calling `nix search`. | ||
func SearchNixpkgsAttribute(query string) (map[string]*Info, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we throw in a regex to verify that the input query is ^github:NixOS/nixpkgs/[0-9a-f]{40}#
? Just to make sure we don't end up caching a ref like github:NixOS/nixpkgs/nixpkgs-unstable#go_1_19
.
81a06d1
to
0dbfe03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! no blocking comments.
} | ||
|
||
// read as: filecache.NeedsUpdate(err) | ||
// TODO savil: this should be implemented in the filecache package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree!
internal/nix/search.go
Outdated
|
||
// read as: filecache.NeedsUpdate(err) | ||
// TODO savil: this should be implemented in the filecache package | ||
func filecacheNeedsUpdate(err error) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I would just name this isNotFoundOrExpired
Maybe in filecache, using new errors.Join
we can return an error that is both NotFound and also something like CacheMiss
so return errors.Join(NotFound, CacheMiss)
and then here we just check if it is CacheMiss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
named it as: filecacheIsCacheMiss
// nix.Search will guarantee that the package exists for the current system. | ||
var infos map[string]*nix.Info | ||
var err error | ||
if p.IsDevboxPackage() && !p.IsRunX() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! this feels really understandable to me.
Summary
From observing times for adding and removing packages, I notice that a lot of time is spent in
syncPackagesToProfile
and most of that is fromnix.Search
, which is a fairly slow operation. However, from observing the results, it seems that these should be constant.For example, here are some search results keyed by the queries:
https://gist.github.com/savil/584c4d9bd6c468aaa8c289265287a23a
Correct me if wrong: all these seem to be constants that won't change, because the queries we run have the nixpkgs-commit-hash in them.
The general
nix search
command seems to be slower since the query it accepts is aninstallable regex
, which can match a large number of concrete queries. Furthermore, it needs to accept queries likenixpkgs#vim
and figure out which nixpkgs commit hash this would resolve to. In our case, our queries are much more specific being of the formnixpkgs/<commit>#attribute
How was it tested?
This effect is more pronounced for packages that are not using store-paths. This will affect all users who are still on nix < 2.17, which is a fairly large percentage for now.
I have a devbox global of about 43 packages. For the times below, I ran
devbox global add vim
(where vim is already in my /nix/store), and didexport DEVBOX_FEATURE_REMOVE_NIXPKGS=0
BEFORE:
syncPackagesToProfile takes about 1 min 30+ seconds
AFTER:
syncPackagesToProfile takes 500ms