Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] cache nix.searchSystem #1546

Merged
merged 6 commits into from
Oct 27, 2023
Merged

[perf] cache nix.searchSystem #1546

merged 6 commits into from
Oct 27, 2023

Conversation

savil
Copy link
Collaborator

@savil savil commented Oct 9, 2023

Summary

From observing times for adding and removing packages, I notice that a lot of time is spent in syncPackagesToProfile and most of that is from nix.Search, which is a fairly slow operation. However, from observing the results, it seems that these should be constant.

For example, here are some search results keyed by the queries:
https://gist.github.com/savil/584c4d9bd6c468aaa8c289265287a23a
Correct me if wrong: all these seem to be constants that won't change, because the queries we run have the nixpkgs-commit-hash in them.

The general nix search command seems to be slower since the query it accepts is an installable regex, which can match a large number of concrete queries. Furthermore, it needs to accept queries likenixpkgs#vim and figure out which nixpkgs commit hash this would resolve to. In our case, our queries are much more specific being of the form nixpkgs/<commit>#attribute

How was it tested?

This effect is more pronounced for packages that are not using store-paths. This will affect all users who are still on nix < 2.17, which is a fairly large percentage for now.

I have a devbox global of about 43 packages. For the times below, I ran devbox global add vim (where vim is already in my /nix/store), and did export DEVBOX_FEATURE_REMOVE_NIXPKGS=0

BEFORE:
syncPackagesToProfile takes about 1 min 30+ seconds

AFTER:
syncPackagesToProfile takes 500ms

Copy link
Collaborator Author

savil commented Oct 9, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@savil savil force-pushed the savil/cache-nix-search branch 2 times, most recently from 378c267 to a0d9a8f Compare October 9, 2023 23:36
@savil savil force-pushed the savil/cache-nix-search branch from a0d9a8f to d5cdee7 Compare October 9, 2023 23:40
@savil savil marked this pull request as ready for review October 9, 2023 23:41
internal/nix/search.go Outdated Show resolved Hide resolved
@savil savil marked this pull request as draft October 10, 2023 13:04
@savil savil force-pushed the savil/cache-nix-search branch from d5cdee7 to 7e75d9f Compare October 10, 2023 18:51
@savil savil marked this pull request as ready for review October 10, 2023 18:52
@savil savil removed the request for review from LucilleH October 10, 2023 20:51
@@ -284,8 +284,25 @@ func (p *Package) NormalizedPackageAttributePath() (string, error) {
return p.normalizedPackageAttributePathCache, nil
}

// search attempts to find the package in the local nixpkgs. It may be an expensive
// call, if it is has not been cached.
func (p *Package) search(query string) (map[string]*nix.Info, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is really weird. Having the package as a receiver and also requiring a query. I would suggest a different API, but I'm not sure we need this function at all. See comments below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow why its that's weird?

  • We use the receiver's properties to determine how to execute the search.
  • The query is needed to do the search.

Maybe it can be renamed to searchNixPackageLocally or something more specific.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird because you are searching for the query (which is determined by the package), but tangentially using the package information to determine how the search will happen as well.

It's a bit like

mysql := devpkg.New("mysql")
mysql.Search("mysql")
  • Search should not have a receiver OR should not have a query param.

I'm not sure renaming it addresses the concern.

An alternative:

func (p *Package) lookupNixInfo() (map[string]*nix.Info, error) {
        var query string
	if p.IsDevboxPackage() {
		if p.isVersioned() {
			entry, err := p.lockfile.Resolve(p.Raw)
			if err != nil {
				return "", err
			}
			query = entry.Resolved
		} else {
			query = p.lockfile.LegacyNixpkgsPath(p.String())
		}
	} else {
		query = p.String()
	}
        return nix.Search(query)
}

Copy link
Collaborator Author

@savil savil Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the concern. Your analogy is not accurate.

I'll just inline it.

We can't use a function like yours above because we need the query later in normalizePackageAttributePath

internal/devpkg/package.go Outdated Show resolved Hide resolved
Comment on lines 297 to 301
return nix.SearchNixpkgsAttribute(query)
}

// fallback to the slow but generalized nix.Search
return nix.Search(query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the distinction between devbox packages and flakes? Can't you cache the flakes as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm being conservative in this PR. I'll see if it makes sense for flakes...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot use nix.Search caching for flakes, local flakes at least. For now, I'd lean towards being conservative and just support queries of the form nixpkgs/commit#attribute

To illustrate the problem, consider the examples/flakes/php. This has a package "path:my-php-flake#php"

The query for this package is of the form: <absolute path on laptop>/devbox/examples/flakes/php/my-php-flake#php

Originally, that flake exported:

php = pkgs.php.withExtensions ({ enabled, all }: enabled ++ (with all; [ ds ]));

and the infos returned from nix.Search is map[packages.x86_64-darwin.php:php-with-extensions-8.1.19]

However, inside the package's flake.nix, I can make this edit:

 php = pkgs.cowsay;

and now the infos is map[packages.x86_64-darwin.php:cowsay-3.7.0] for the same query.

Comment on lines 33 to 36
if strings.HasPrefix(url, "runx:") {
// TODO implement runx search
return map[string]*Info{}, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is wrong, but I'm not sure we should fix it in this PR?

// queries of the form `nixpkgs/<commit-hash>#attribute`, we can know for sure that
// once `nix search` returns a valid result, it will always be the very same result.
// Hence we can cache it locally and answer future queries fast, by not calling `nix search`.
func SearchNixpkgsAttribute(query string) (map[string]*Info, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just CachedSearch

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the reasons explained in the comment.

I wanted to keep it specific to search queries I have high confidence should always resolve to the very same result. The nix search api accepts installable regex queries which this doesn't. Hence I chose a very specific name to indicate what scenarios this works with.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above about flake-reference queries not being suitable to be cached.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we throw in a regex to verify that the input query is ^github:NixOS/nixpkgs/[0-9a-f]{40}#? Just to make sure we don't end up caching a ref like github:NixOS/nixpkgs/nixpkgs-unstable#go_1_19.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, done

internal/nix/search.go Outdated Show resolved Hide resolved
@gcurtis
Copy link
Collaborator

gcurtis commented Oct 11, 2023

I had a few ideas I wanted to throw out there in case they help with the performance:

  • Do we use nix search just to determine if a flakeref valid? If so, would running nix eval --raw nixpkgs#one nixpkgs#two ... once with all the packages speed things up?
  • Are we evaluating the flakerefs in pure mode? That would let Nix leverage its own cache.
  • Nix 2.18 has a new built-in for parsing flake refs. Not sure if that would help at all here.

@mikeland73
Copy link
Contributor

@savil I just realized the reason this is super slow for you it's because it's downloading nixpkgs. We are searching for the "resolved" packages. Previously nixpkgs always existed because you could not install the package otherwise. But now that we allow installing packages without nixpkgs, this function will needlessly download nixpkgs.

I still think caching is a good idea, but now I think how we do profile comparisons for packages with store path is broken. We should not be running search on a reference that includes nixpkgs if we don't intend on installing the package using nixpkgs

Just to give you an example, if our lockfile includes store paths, then simply comparing the store path on the nix profile and the store path in the lockfile is sufficient.

@gcurtis, for more context:

we use search to find a normalized way to compare packages in devbox.json with packages in the nix profile.

Nix has a set of rules of how it determines the default output. To avoid re-implementing those rules we rely on search. If two flake references with outputs are searched we can compare the results and that way we know we are using same rules as nix.

For example:

github:F1bonacc1/process-compose#process-compose
github:F1bonacc1/process-compose

Are both the same.

Similarly, nixpkgs can be referred to using github, nixpkgs or as a devbox package. This function allows us to compare all of these correctly.

Using eval could work, but in practice can be slower (~600ms) than search. I think the solution is:

  • We need to ensure we never download nixpkgs for a package that was installed via store path
  • Add caching to avoid slowish calls to nix search

@savil
Copy link
Collaborator Author

savil commented Oct 11, 2023

@savil I just realized the reason this is super slow for you it's because it's downloading nixpkgs. We are searching for the "resolved" packages. Previously nixpkgs always existed because you could not install the package otherwise. But now that we allow installing packages without nixpkgs, this function will needlessly download nixpkgs.

I still think caching is a good idea, but now I think how we do profile comparisons for packages with store path is broken. We should not be running search on a reference that includes nixpkgs if we don't intend on installing the package using nixpkgs

Just to give you an example, if our lockfile includes store paths, then simply comparing the store path on the nix profile and the store path in the lockfile is sufficient.

Ah, no, this is not accurate. In my testing, I had disabled remove-nixpkgs, removed .devbox, and installed all packages in the devbox.json. All timings are after doing that setup and then doing devbox add.

For remove-nixpkgs, @ipince and I added various higher-level functions (like NixProfileItem.Matches) so packages with store-paths don't call this function. Otherwise, you'd be witnessing "Downloading nixpkgs" a lot more!

@gcurtis
Copy link
Collaborator

gcurtis commented Oct 11, 2023

Directly-installed packages slowing things down makes sense. Even if it's not downloading a new nixpkgs commit, just evaluating the package for the first time can take a while.

Taking a step back, why are we comparing packages in the profile to the ones in devbox.json? I'm a little unsure about adding caching, because that's what Nix is supposed to do. We should just have the devbox-generated flake refer to the packages and save them to a file.

Here's a small example of a flake that could be generated to track Devbox dependencies:

{
  description = "A Devbox environment";

  outputs = { self, nixpkgs }: {
    packages.aarch64-darwin.default = derivation {
      name = "devbox-env";
      system = "aarch64-darwin";
      builder = "/bin/bash";
      args = let
        script = builtins.toFile "builder.sh" ''
          echo "$hello
          $go" > $out
        '';
        in [ "${script}" ];

      # Note the packages don't need to be nixpkgs attributes. They could also
      # be direct store paths or `fetchClosure` calls.
      hello = nixpkgs.legacyPackages.aarch64-darwin.hello;
      go = nixpkgs.legacyPackages.aarch64-darwin.go;
    };
  };
}

If you run nix path-info it spits out all the references:

// $ nix path-info --json | jq
[
  {
    "deriver": "/nix/store/kpl2nb2b3xwkyw0ma4d4qnxvsz3zllgh-devbox-env.drv",
    "narHash": "sha256-KzYFyXkgG8kWpdHNABX/zu6qchMP7TWylosl8/zirOk=",
    "narSize": 224,
    "path": "/nix/store/c9046f6p7dg0v248cvrfhlqbj5bsy544-devbox-env",
    "references": [
      "/nix/store/c4n3i1nfhvcim30s4ij502cw84yl4vdn-hello-2.12.1",
      "/nix/store/dpnvxm37r1z9vk77mdc3h68b95kxvsji-go-1.20.8"
    ],
    "registrationTime": 1697048631,
    "ultimate": true,
    "valid": true
  }
]

@savil savil force-pushed the savil/cache-nix-search branch 2 times, most recently from c0bc050 to 81a06d1 Compare October 24, 2023 13:00
@savil savil requested a review from mikeland73 October 24, 2023 13:03
Copy link
Collaborator Author

@savil savil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikeland73 switched to using filecache. I've left the API as nix.SearchNixpkgsAttribute

  • This is a specialized api for queries of the form nixpkgs/<commit>#attribute
  • I want it distinct from nix.Search which is the more general API that supports all queries (nixpkgs/<commit>#attribute, all other flake references, etc.)
  • open to alternate names, but not to having "cache" in it since that doesn't tell users when to use it, and users shouldn't need to worry about it being cached or not.

@gcurtis I hear your concerns. However, I'm a bit reluctant to change the machinery of how we're currently implementing all of with devpkg and nix-profile. That would be a more invasive change.

To your concerns:

I'm a little unsure about adding caching, because that's what Nix is supposed to do.

nix search is a very general API. This makes it hard for nix to internally optimize it. In our case, we are adding caching for a strict subset of it that we know is always guaranteed to return the same result: nixpkgs/<commit>#attribute. We can do this because we know our app's specific query patterns and can exploit it. Seems reasonable to me.

We should just have the devbox-generated flake refer to the packages and save them to a file.

The problem with this is that we can't query it while incrementally adding or removing packages quickly (EDIT: actually, I have low confidence in what I said, perhaps there's a way. My bigger concern is limiting scope for now)

@savil
Copy link
Collaborator Author

savil commented Oct 25, 2023

ping

// cacheKey sanitizes the search query to be a valid unix filename.
// This cache key is used as the filename to store the cache value, and having a
// representation of the query is important for debuggability.
func cacheKey(input string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we handle collisions for queries that have sanitized characters? Using the infamous emacsPackages.@ package as an example, the input would get cached under the key emacsPackages.. Does that mean devbox add emacsPackages. would install that package if it was in the cache?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh good catch. I added that @ to the "replace disallowed chars with underscores"

I tried searching the other non-alphanumeric-characters on my standard American keyboard, and no packages showed up so I think its okay (practically speaking).

Theoretically speaking this scheme is not collision-resistant since two packages with foo-<disallowed-char-number-one> and foo-<disallowed-char-number-two> would have a collision in the sanitized name. Same for the name-length-truncation we do.

// queries of the form `nixpkgs/<commit-hash>#attribute`, we can know for sure that
// once `nix search` returns a valid result, it will always be the very same result.
// Hence we can cache it locally and answer future queries fast, by not calling `nix search`.
func SearchNixpkgsAttribute(query string) (map[string]*Info, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we throw in a regex to verify that the input query is ^github:NixOS/nixpkgs/[0-9a-f]{40}#? Just to make sure we don't end up caching a ref like github:NixOS/nixpkgs/nixpkgs-unstable#go_1_19.

@savil savil force-pushed the savil/cache-nix-search branch from 81a06d1 to 0dbfe03 Compare October 26, 2023 21:25
Copy link
Contributor

@mikeland73 mikeland73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! no blocking comments.

internal/nix/search.go Outdated Show resolved Hide resolved
internal/nix/search.go Outdated Show resolved Hide resolved
}

// read as: filecache.NeedsUpdate(err)
// TODO savil: this should be implemented in the filecache package
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree!


// read as: filecache.NeedsUpdate(err)
// TODO savil: this should be implemented in the filecache package
func filecacheNeedsUpdate(err error) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I would just name this isNotFoundOrExpired

Maybe in filecache, using new errors.Join we can return an error that is both NotFound and also something like CacheMiss

so return errors.Join(NotFound, CacheMiss)

and then here we just check if it is CacheMiss

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that sgtm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

named it as: filecacheIsCacheMiss

// nix.Search will guarantee that the package exists for the current system.
var infos map[string]*nix.Info
var err error
if p.IsDevboxPackage() && !p.IsRunX() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! this feels really understandable to me.

@savil savil merged commit 00ee8fb into main Oct 27, 2023
@savil savil deleted the savil/cache-nix-search branch October 27, 2023 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants