# refactor(record): prefer arrays over sets for lookups of blocked keys #68

Merged: 5 commits merged into fabian-hiller:main from BastiDood:prefer-array on Aug 7, 2023
## Conversation

**@BastiDood** (Contributor) commented Aug 6, 2023

### Going from Set to Array

Following #67, this PR swaps the container of `BLOCKED_KEYS` from a `Set` to an `Array`. Although `Set#has` is $O(1)$, its constant factor is quite large because of the underlying hashing. By contrast, the $O(n)$ `Array#includes` should be faster for small arrays for the following reasons:

  1. `Array#includes` is a built-in method that performs no hashing.
  2. Because their elements are contiguous in memory, arrays are typically friendlier to the CPU cache. This is not the case for sets, whose elements are scattered across the backing buffer.

In the case of `BLOCKED_KEYS`, three elements are certainly small enough to justify the linear search. See the discussion in #67 (comment) for more details and some benchmarks.
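As a rough sketch of the change (the helper names are hypothetical, and the three keys shown mirror typical prototype-pollution keys rather than necessarily being valibot's exact `BLOCKED_KEYS`):

```js
// Hypothetical sketch of the Set-to-Array swap; `isBlockedSet` and
// `isBlockedArray` are illustrative helpers, not library code.

// Before: Set#has is O(1), but pays a hashing cost on every call.
const BLOCKED_KEYS_SET = new Set(['__proto__', 'prototype', 'constructor']);
const isBlockedSet = (key) => BLOCKED_KEYS_SET.has(key);

// After: Array#includes is O(n), yet a linear scan over three
// contiguous elements is typically cheaper than hashing the key.
const BLOCKED_KEYS = ['__proto__', 'prototype', 'constructor'];
const isBlockedArray = (key) => BLOCKED_KEYS.includes(key);
```

Both helpers answer the same question; only the lookup strategy (and its constant factor) differs.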

### Short-circuiting

Along the way, I also inverted the logic in the `forEach` loops to allow short-circuiting and reduce indentation. Moreover, I converted a `forEach` loop in `record.ts` to a `for...of` loop because the latter is generally considered to be faster.
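A minimal sketch of what such a conversion can look like (hypothetical function names; not the actual `record.ts` code):

```js
const BLOCKED_KEYS = ['__proto__', 'prototype', 'constructor']; // illustrative

// Before: forEach allocates a closure and cannot break or continue,
// so the blocked-key check adds a level of nesting.
function collectSafeEntriesForEach(input) {
  const output = {};
  Object.entries(input).forEach(([key, value]) => {
    if (!BLOCKED_KEYS.includes(key)) {
      output[key] = value;
    }
  });
  return output;
}

// After: for...of is plain control flow; `continue` short-circuits
// blocked keys with one less level of indentation.
function collectSafeEntriesForOf(input) {
  const output = {};
  for (const [key, value] of Object.entries(input)) {
    if (BLOCKED_KEYS.includes(key)) continue;
    output[key] = value;
  }
  return output;
}
```

Both versions produce the same result; the `for...of` form simply avoids the callback allocation and reads more like ordinary imperative code.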

**netlify bot** commented Aug 6, 2023

Deploy Preview for *valibot* canceled.

🔨 Latest commit: a184a77
🔍 Latest deploy log: https://app.netlify.com/sites/valibot/deploys/64d0074dc73a02000856f5a4

**@fabian-hiller** (Owner) commented

Thank you for your contribution. Yes, we can change the `Set` to an array.

> Along the way, I also inverted the logic in the `forEach` loops to accommodate for short-circuiting and fewer indentations.

I prefer the way it was before. If something is inside a condition, it is quite clear in which case it is executed.

> Moreover, I converted a `forEach` loop in `record.ts` to a `for...of` loop because the latter is generally considered to be faster.

Do you have proof of this? I suspect it doesn't matter for `Object.entries()`. And if it is faster, I would change it not only here but throughout the library, so that the code stays consistent.

@fabian-hiller self-assigned this and added the `enhancement` (New feature or request) label on Aug 6, 2023.
**@BastiDood** (Contributor, Author) commented Aug 6, 2023

> I prefer the way it was before. If something is in a condition, it is quite clear in which case it is executed.

Fair enough. This will be reverted ASAP.

> Do you have proof of this? I suspect it doesn't matter for `Object.entries()`. And if it is faster, I would change it not only here, but in the library in general, so that the code is consistent.

I'll cite one of many articles in particular, but this is apparently a well-known performance pitfall. I would also cite ThePrimeagen's streams, but of course I can't scroll through all the Twitch VODs right now.

Would you like to proceed with this refactor anyway?

**@fabian-hiller** (Owner) commented

> Would you like to proceed with this refactor anyway?

I am not sure yet. I understand the performance advantage. However, I wonder if this approach is faster across the board or only after a certain number of object/array entries.

**@BastiDood** (Contributor, Author) commented

Performance aside for a moment, I must say that I prefer the `for...of` syntax over `forEach` simply because it semantically feels more "side effect-y". Normally, when we invoke methods on an `Array`, we chain together `map` and `filter`, which feels a lot like functional programming. `forEach` is a little different in that it goes against that expectation of purity.

So at least on semantics and aesthetics, I can argue for `for...of`.

Performance-wise, however, I can't say I'm surprised by the results in many benchmarks. The `forEach` method necessitates creating a brand-new closure on top of what would be a `for...of` loop anyway. A `for...of` loop, by contrast, is a primitive control-flow construct in and of itself, which may be easier to optimize. Of course, this is only an educated guess and speculation. I do hope I got my point across, though.
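To make that speculation concrete, here is a rough micro-benchmark sketch. The absolute numbers vary wildly by engine, JIT warm-up, and data shape, so treat them as illustrative only rather than as evidence either way:

```js
// Rough micro-benchmark sketch comparing forEach vs for...of over
// Object.entries output. Illustrative only; not a rigorous benchmark.
const entries = Object.entries(
  Object.fromEntries(Array.from({ length: 1000 }, (_, i) => [`k${i}`, i]))
);

function sumForEach() {
  let total = 0;
  entries.forEach(([, value]) => { total += value; }); // callback closure
  return total;
}

function sumForOf() {
  let total = 0;
  for (const [, value] of entries) total += value; // no callback allocation
  return total;
}

const t0 = performance.now();
for (let i = 0; i < 1000; ++i) sumForEach();
const t1 = performance.now();
for (let i = 0; i < 1000; ++i) sumForOf();
const t2 = performance.now();

console.log(`forEach: ${(t1 - t0).toFixed(2)} ms, for...of: ${(t2 - t1).toFixed(2)} ms`);
```

Both functions compute the same sum; any timing gap would come purely from the iteration mechanism.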

**@BastiDood** (Contributor, Author) commented

Hello there, I went ahead and implemented the refactor in a184a77. Let me know if you want me to `git revert` the change. 👍

Also, now that #71 has been merged, I rebased this branch on top of the latest `main` so that we can run CI here.

**@vijayksingh** commented

> > Would you like to proceed with this refactor anyway?
>
> I am not sure yet. I understand the performance advantage. However, I wonder if this approach is faster across the board or only after a certain number of object/array entries.

There is some related discussion here: https://twitter.com/tmikov/status/1688271863087280128

**@fabian-hiller** (Owner) commented Aug 6, 2023

Before we revise the loops, I would prefer a benchmark that demonstrates a performance difference for each schema. For example, I'm not sure if `for...of` with `.entries()` is really faster.

**@BastiDood** (Contributor, Author) commented

Hello! I am glad to report that, after some benchmarking, I have observed considerable improvements in average execution time.

### Methodology

I created an untracked file named `library/bench.mjs`, which serves as the benchmark. Essentially, it pre-allocates 16K randomly generated objects, each containing a number, a string, a date, and a dictionary of strings to numbers. Here, I used 16 key-value pairs per record, which is admittedly an arbitrary decision.

```js
import { getRandomValues } from 'node:crypto';
import { array, record, object, number, string, date } from './dist/index.js';

function randString() {
    const bytes = getRandomValues(new Uint8Array(21));
    return Buffer.from(bytes).toString('base64');
}

function randDict(n) {
    const dict = {};
    for (let i = 0; i < n; ++i)
        dict[randString()] = Math.random();
    return dict;
}

function randObj(dict) {
    return {
        test: Math.random(),
        name: randString(),
        expires: new Date(),
        dict,
    };
}

function randArr(arrLen, dictLen) {
    const objs = Array(arrLen);
    for (let i = 0; i < arrLen; ++i)
        objs[i] = randObj(randDict(dictLen));
    return objs;
}

const ArrayOfObjects = array(object({
    test: number(),
    name: string(),
    expires: date(),
    dict: record(string(), number()),
}));

// Pre-allocate 16K randomly generated objects so that
// we don't incur the overhead during the benchmarking.
const objs = randArr(2 ** 14, 16);

// Warm up the optimization
ArrayOfObjects.parse(objs);

// Do the actual benchmark
const start = performance.now();
ArrayOfObjects.parse(objs);
const end = performance.now();

// Print the results
console.log(end - start);
```

I then ran the following PowerShell script to run the benchmark ten times.

```powershell
cd library

# Before PR
git switch main
pnpm build
for ($i = 0; $i -lt 10; ++$i) { node bench.mjs }

# After PR
git switch prefer-array
pnpm build
for ($i = 0; $i -lt 10; ++$i) { node bench.mjs }
```

### Results

> **Important**
> Here are the relevant hardware specifications of my setup: Intel® Core™ i7-10750 CPU @ 2.60GHz (12 CPUs) with 8192 MB of RAM, on Node.js v20.5.0 for Windows 10 Build 19045.3271.

All of the results below are in milliseconds.

| Run | Before PR | After PR |
| --- | --- | --- |
| 1 | 1903.3630999922752 | 1895.031199991703 |
| 2 | 1900.3222999572754 | 1894.6241000294685 |
| 3 | 1902.9138000011444 | 1898.1823000311852 |
| 4 | 1938.1334999799728 | 1885.9580000042915 |
| 5 | 1922.8105999827385 | 1897.9362999796867 |
| 6 | 1920.4319999814034 | 1892.5608999729156 |
| 7 | 1895.236999988556 | 1885.3299000263214 |
| 8 | 1928.294000029564 | 1914.0356999635696 |
| 9 | 1901.0822999477386 | 1901.5369000434875 |
| 10 | 1928.0683000087738 | 1913.57800000906 |

We thus have the following average execution times in milliseconds:

| Before PR | After PR |
| --- | --- |
| 1914.065689986944 | 1897.877330005169 |

As the results show, we have indeed gotten faster by about 16 milliseconds for 16K objects! That's effectively one frame in a 60 FPS render loop. Of course, in the real world we wouldn't really spawn 16K objects in an array all at once, but this nevertheless gives us an idea of the performance improvement.

**@fabian-hiller** (Owner) commented

Thank you for your research. I plan to merge the PR. Purely out of interest, I would like to see whether there is also a small difference when only 100 or 1,000 objects are used instead of 16K.

In a comparison between Valibot and Zod, Valibot was faster up to about 10K iterations of a loop, and Zod was faster beyond that. Therefore, I would be interested to see how this benchmark changes with fewer objects.

@fabian-hiller added the `performance` label on Aug 7, 2023.
**@BastiDood** (Contributor, Author) commented

> I would be interested to see if there is also a small difference if only 100 or 1,000 objects are used instead of 16K.

I'll be completely honest with you. I did not expect the benchmarks to be this much better across the board. 😅

| Elements | Before PR | After PR |
| --- | --- | --- |
| 128 | 6.1228900015354 | 6.0720500051975 |
| 1024 | 39.009030002356 | 36.450479996204 |
| 16384 | 1914.065689986944 | 1897.877330005169 |

I find the 1024 case the most fascinating, because the 128–1024 range sounds like a realistic figure in production (unlike the rather inflated 16K case). It seems that we get the biggest wins here. Or at least I think this is the point where the optimizing compiler really shows its true colors.

@fabian-hiller fabian-hiller merged commit b352de9 into fabian-hiller:main Aug 7, 2023
@BastiDood BastiDood deleted the prefer-array branch August 7, 2023 10:06