-
-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf improvements #104
Perf improvements #104
Conversation
✅ Deploy Preview for valibot canceled.
|
Thank you very much for your contribution! At the moment I have to focus on the text of my bachelor thesis. As soon as I have more time, I will check and merge your changes. |
Just a quick notice about path if we're really trying to optimize the performance. Usually path is only needed in case of failures so that the path can be added to the issues. If this is the case, it can be allocated lazily only if the validation (or sub-validation) returns any issues instead of the current implementation where the path is allocated eagerly. This should yield some small performance improvements due to decreased number of allocations and gc overhead. If the failure types are something like type NestedIssue = {
pathItem: PathItem
issues: Issues
}
type LeafIssueWithErrorDetailsButNOPath = { ... }
type Issue = NestedIssue | LeafIssueWithErrorDetailsButNOPath
type Issues = Issue[] Then the // object validation pseudo
let issues: Issues | undefined
for each key in objectSchema {
const result = parseKeySchema(object[key])
if (result.issues) {
(issues = issues ?? []).push({ pathItem: new PathItem(key), issues: result.issues })
}
}
return issues ? { valid: false, issues } : { valid: true, value } Of course the issues structure (and paths along with it) must be flattened at the top level parse before returning the issues to the library user, but that should be very straightforward task to do (similar to the current That said, I'm not familiar with this library or its features so take this with grain of salt... 😄 EDIT: fixed the issue type, original one was incorrect |
Yes, that is correct. Thank you very much for pointing it out. I will take it into account for the final implementation. |
Yep that's a very good point, interested to see how much this speeds things up further |
the objects created using getPipeInfo() are only necessary when the parser actually has a non-empty pipe. Since it is very often the case that the pipe is empty, this common path can be optimized by calling getPipeInfo() only in cases where executePipe() is called with a non-empty pipe.
refactors 'path' to use 'class LazyPath' instead of PathItem[]. Issue paths are only needed when flattening errors, so eagerly constructing them for the ParseInfo objects passed to every parser results in a lot of unnecessary allocations. Skipping most of these allocations and evaluating the paths only when they are needed improves performance.
60cf3c5
to
0619381
Compare
Thank you for your contribution! Due to my bachelor thesis I have a lot to do at the moment. I'll try to check the changes next week at the latest. |
Sure, no pressure nor hurry, it's your library dude :) I just rebased and fixed the conflicts, nothing else. If I have extra time I can implement @milankinen 's suggested refactor as well, but it's kind of 50/50 whether I can find the time |
Found an hour of idle time and started shoveling, looking pretty promising... :D
@milankinen you absolute GOAT! 🐐 Apart from the performance gains I think this also makes the code a tad cleaner. I'm not quite finished with the refactor yet - must make sure all the tests pass and none of the public API is broken. |
A suggestion by @milankinen considerably improves the happy-path performance of valibot: Never allocate any issue paths at all until an issue actually occurs, and let the parent (e.g. object) handle the path, since it knows what child property (==key) the issue propagates from. The Issue type is refactored to contain three variants: - LeafIssue (e.g. regular issue details but no path) - NestedIssue (just the path and a list of child issues) - UnionIssue (special case of NestedIssue where each union variant propagates its potential issues)
Can you explain why it is necessary to make a distinction between |
Sure! We can obviously iterate on the data model, but the idea in a nutshell is to accurately model the different kinds of objects according to the information that is actually relevant to each ("make illegal states unrepresentable"):
I'm also thinking there might potentially be better names for these different types of |
Wow nicely done @jussisaurio 😮 👌!
Sorry my fault, different issue types was just my initial thought because of separation of concerns. 😅 It's not absolutely necessary to have different types of issues if the issues are flattened "in-place" in schemas. E.g. something like: const createIssue = (msg: string) => ({msg, path: []})
const appendIssues = (issues: Issue[], childIssues: Issue[], pathItem: PathItem) => {
issues.push(...childIssues.map(issue => issue.path.unshift(pathItem) && issue))
return issues
}
// string.ts
if (typeof value !== 'string') {
return createIssue('not a string')
}
// object.ts
if (result.issues) {
issues = appendIssues(issues, result.issues, key)
} In my opinion it's a bit cleaner and more robust to keep the issue structure immutable during the validation phase and construct the path afterwards instead of mutating (because of perf) the path on each validation level. But both should work. Perfwise probably no difference. |
I don't have time to think about it in detail at the moment. However, I want to keep the library as simple as possible, as this also reduces the bundle size, and splitting it into different issues seems a bit complicated. It is already possible to extract all important information with the current implementation, isn't it? A path is still needed for a The only thing that would make sense to me would be:
With the current implementation, the path is only constructed once an issue occurs, which is great. Does this lead to bugs with deeply nested issues? We may construct the path retroactively e.g. with |
Tomorrow or Sunday, I will try to release a new version with all performance improvements. For time reasons (because of my bachelor thesis), I will then weigh whether I build on this issue or implement the ideas independently for the time being. I appreciate your contribution and will mention you in the release notes. 🤝 |
@fabian-hiller you are free to do what you like ofc :) the important point is the lazy construction of paths. as you can see in the most recent commit i removed the LazyPath class entirely because its not needed at all. ParseInfo no longer contains path because it's only constructed when an issue actually arises. that is the key detail. wish you the best with your thesis, those are a lot of work! |
And yes in the most recent commit the path is constructed retroactively in the flatten() function. In non flattened issues the path is only the key or the index, because nested issues have an issues array, which may have their own issues array, etc. So you can always build the full path string from those as necessary. Again, this was just my shot at the implementation, but it does avoid doing any extra work on the paths before an issue actually happens. If you want you can play around with it in a branch to see how it works. But anyway, do whatever you think is best! |
Thank you both for the research and ideas. Since I wanted to do a few things differently, I implemented it on my own and added you guys as co-authors. Feel free to check the result. You may see something we can do better. The paths are now created lazy, and the pipeline is only executed when required. Also, I changed the object check to However, I kept the previous implementation of The changes helped to improve performance by about 30% in my test environment. That's great! Thanks a lot! |
Hey @fabian-hiller , nicely done! 🥳 I made a simpler version of the flat cachedEntries here, trying to be a bit more sensitive towards eventual bundle size and readability. It still gets a tad confusing in the Perf-wise: I'm still consistently getting a 6% perf boost using this strategy on typescript-runtime-type-benchmarks compared to current |
Thanks for the hint. For |
I guess many of (similar) changes in this PR are already incorpolated in the latest And I ran the benchmark suite for the following commits on my MacBook Air (M2 chip):
Benchmark results
Interestingly, in the above result, we see that this "flat cache" strategy consistently improves the runtime performance for both Bun and Node.js and the performance improvement is more significant for Bun (up to 18.9% for "wide, valid" case). We might want to consider merging this strategy as long as the maintainer thinks the code is readable & maintainable enough. |
Thank you @naruaway. I will think about it. |
Hey,
Recently popped into this issue thread to discuss a fairly significant perf improvement idea, but found out @fabian-hiller had already done almost exactly the same thing!
However, I just finished a 4 hour plane ride and I spent the whole flight looking for more perf improvements and found a few!
Sizeable optimizations
Both of these generally just involved doing less whenever possible:
pipe.length === 0
I'd say these contributed 90% to the total performance improvement in this PR.
Micro-optimizations
cachedEntries
for theobject
schema a flat array instead of an array of 2-tuples -- mainly just less indirection which is more friendly to the CPU.input?.constructor === Object
Here are the results on my machine using typescript-runtime-type-benchmarks:
This run amounts to +47% on
parseSafe
, +35% onparseStrict
.