Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: support buffers greater than 2^32 bytes in length in Buffer.concat and Buffer.copy #55492

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

duncpro
Copy link
Contributor

@duncpro duncpro commented Oct 22, 2024

Closes #55422

This pull request...

  1. Adds support for Buffer.copy with lengths greater than or equal to 2^32 bytes.
  2. Adds support for Buffer.concat where the result buffer has length greater than or equal to 2^32 bytes.
  3. Improves the docs for Buffer.copy.
    • They did not mention the behavior when the source slice cannot fit into the target slice.
    • They did not mention the behavior when sourceEnd exceeds the total number of bytes in the source buffer.
  4. Refactors implementation of Buffer.copy and Buffer.concat.
    • Adds asserts verifying its impossible to copy bytes from outside the source buffer.
    • Adds asserts verifying it's impossible to copy bytes anywhere except the target buffer.

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Oct 22, 2024
@avivkeller
Copy link
Member

avivkeller commented Oct 22, 2024

Hi! Can you please amend your commit message to follow the guidelines of this repository? It needs to be under 80 characters

@avivkeller
Copy link
Member

Additionally, can you add a test to validate this behavior

@duncpro
Copy link
Contributor Author

duncpro commented Oct 22, 2024

I noticed that parallel/test-stream-readable-unpipe-resume is failing on Ubuntu CI, which is strange because it's not failing on my Mac. I'll investigate.

@avivkeller
Copy link
Member

avivkeller commented Oct 22, 2024

I noticed that parallel/test-stream-readable-unpipe-resume is failing on Ubuntu CI, which is strange because it's not failing on my Mac. I'll investigate.

No need, it's a flaky test. It's a known process deadlock that's being investigated.

@avivkeller avivkeller added the commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. label Oct 22, 2024
@avivkeller avivkeller requested a review from ronag October 22, 2024 17:54
@avivkeller
Copy link
Member

@ronag I've requested your review because you were involved in the issue this closes, and you know a lot about buffers that I don't

@duncpro
Copy link
Contributor Author

duncpro commented Oct 22, 2024

Don't merge this yet. It's possible to read bytes from outside the source buffer into the target buffer.
The current implementation _copyActual does not check for negatives in source and target.

For example...

function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
  if (sourceEnd - sourceStart > target.byteLength - targetStart)
    sourceEnd = sourceStart + target.byteLength - targetStart;

  let nb = sourceEnd - sourceStart;
  const sourceLen = source.byteLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;

  if (nb <= 0)
    return 0;

  // _copy(source, target, targetStart, sourceStart, nb);

  // return nb;

   return [source, target, targetStart, sourceStart, nb];
}

let src = Buffer.alloc(50);
let dest = Buffer.alloc(100);
const [_, __, ___, sourceStart, copy_len] = _copyActual(src, dest, 0, -10, 0); 
console.log([sourceStart, copy_len]); // [ -10, 10 ]

Its probably better here not to trust the caller of _copyActual and to guard the call to _copy with explicit checks.
Our choice is to either throw an error, or to silently clamp the bounds. I prefer the former.

Resolved by: 3c18a13

@avivkeller avivkeller added the wip Issues and PRs that are still a work in progress. label Oct 22, 2024
@avivkeller
Copy link
Member

@nodejs/buffer PTAL ☝️

@duncpro
Copy link
Contributor Author

duncpro commented Oct 22, 2024

I've updated _copyActual to guard _copy (native binding here) against negative numbers, so there is no risk of potentially allowing out of bounds access in the future.

The expected behavior for an out of bounds copy is actually specified by test/parallel/test-buffer-copy.js.

At the latest commit 3c18a13, test/parallel/test-buffer-copy.js is passing (along with everything else).

Copy link

codecov bot commented Oct 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.41%. Comparing base (7bc3e16) to head (5ca703f).
Report is 56 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main   #55492    +/-   ##
========================================
  Coverage   88.41%   88.41%            
========================================
  Files         653      654     +1     
  Lines      187435   187575   +140     
  Branches    36077    36081     +4     
========================================
+ Hits       165714   165851   +137     
+ Misses      14960    14954     -6     
- Partials     6761     6770     +9     
Files with missing lines Coverage Δ
lib/buffer.js 100.00% <100.00%> (ø)
src/node_buffer.cc 70.36% <100.00%> (+0.10%) ⬆️

... and 33 files with indirect coverage changes

@duncpro
Copy link
Contributor Author

duncpro commented Oct 23, 2024

Not sure what do about code coverage issue. It seems to me that the range check should stay in as preventative measure.

@avivkeller
Copy link
Member

No need to worry about codecov, it's not 100% accurate

lib/buffer.js Outdated

if (targetStart >= target.byteLength || sourceStart >= sourceEnd)
return 0;

// Guaranteed 0 <= sourceStart < sourceEnd (NO GUARANTEE <= source.byteLnegth)
// Guaranteed 0 <= targetStart < target.byteLength here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just make these asserts. The perf impact is negligable...

lib/buffer.js Outdated
return _copyActual(source, target, targetStart, sourceStart, sourceEnd);
}

// Assumes `source`, `target` are real buffers.
// Assumes 0 <= sourceStart <= sourceEnd
// Assumes 0 <= targetStart <= target.byteLength
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just make these asserts. The perf impact is negligable...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node/lib/buffer.js

Lines 271 to 275 in 209289c

assert(isUint8Array(source) && isUint8Array(target));
// Enforce: 0 <= sourceStart <= sourceEnd <= source.byteLength
assert(0 <= sourceStart && sourceStart <= sourceEnd && sourceEnd <= source.byteLength);
// Enforce: 0 <= targetStart<= target.byteLength
assert(0 <= targetStart && targetStart <= target.byteLength);

@rotemdan
Copy link

Turns out the issue also occurs in Buffer.copy. See my comment on the issue thread.

@duncpro duncpro changed the title fix: support concatenating buffers larger than 4GB on 64-bit systems fix: support buffers greater than 2^32 bytes in length in Buffer.concat and Buffer.copy Oct 24, 2024
@duncpro duncpro changed the title fix: support buffers greater than 2^32 bytes in length in Buffer.concat and Buffer.copy fix: support buffers greater than $2^32$ bytes in length in Buffer.concat and Buffer.copy Oct 24, 2024
@duncpro duncpro changed the title fix: support buffers greater than $2^32$ bytes in length in Buffer.concat and Buffer.copy fix: support buffers greater than 2^32 bytes in length in Buffer.concat and Buffer.copy Oct 24, 2024
lib/buffer.js Outdated
@@ -218,6 +221,7 @@ function copyImpl(source, target, targetStart, sourceStart, sourceEnd) {
if (targetStart < 0)
throw new ERR_OUT_OF_RANGE('targetStart', '>= 0', targetStart);
}
// Guaranteed 0 <= targetStart
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Shouldn't an assert be used instead?

If not, IMO a comment isn't needed at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is redundant. This comment has been removed as of commit 5ca703f.

Comment on lines 103 to 104
// Test its possible to concat buffers where the total length of the result in bytes
// is greater than 2^32.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Test its possible to concat buffers where the total length of the result in bytes
// is greater than 2^32.
// Ref: ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is replaced with a reference to the issue as of commit 5ca703f.

@@ -234,3 +235,12 @@ assert.deepStrictEqual(c, b.slice(0, c.length));
// No copying took place:
assert.deepStrictEqual(c.toString(), 'C'.repeat(c.length));
}


// Test its possible to copy buffer with length > 2^32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is replaced with a reference to the issue as of commit 5ca703f.

@duncpro
Copy link
Contributor Author

duncpro commented Oct 30, 2024

@redyetidev This is no longer WIP unless you have anymore concerns or suggestions.

@BridgeAR BridgeAR added request-ci Add this label to start a Jenkins CI on a PR. and removed wip Issues and PRs that are still a work in progress. labels Oct 30, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 30, 2024
@nodejs-github-bot
Copy link
Collaborator

@duncpro
Copy link
Contributor Author

duncpro commented Nov 5, 2024

I believe CI is failing because the runner is exhausting its memory.

We do some pretty large allocations in the unit tests.

// Ref: https://github.com/nodejs/node/issues/55422#issue-2594486812
if (2 ** 32 + 1 <= kMaxLength) {
const src = Buffer.alloc(2 ** 32 + 1, 1);
const dst = Buffer.alloc(2 ** 32 + 1, 2);
src.copy(dst);
assert.deepStrictEqual(src, dst);
}

// Ref: https://github.com/nodejs/node/issues/55422#issue-2594486812
if (2 ** 32 + 1 <= kMaxLength) {
const a = Buffer.alloc(2 ** 31, 0);
const b = Buffer.alloc(2 ** 31, 1);
const c = Buffer.alloc(1, 2);
const destin = Buffer.concat([a, b, c]);
assert.deepStrictEqual(destin.subarray(0, 2 ** 31), a);
assert.deepStrictEqual(destin.subarray(2 ** 31, 2 ** 32), b);
assert.deepStrictEqual(destin.subarray(2 ** 32), c);
}

Perhaps the tests should be skipped?

// - The source slice is not clamped to fit into the target slice. If it won't fit, this throws.
// - If either the source or target slice are out of bounds, this throws.
function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
assert(isUint8Array(source) && isUint8Array(target));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert here is a bit redundant given the checks that are made in the copyImpl call site.

Copy link
Contributor Author

@duncpro duncpro Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion this assert is useful since it makes _copyActual's invariant that source and target need to be Uint8Array explicit.

It indicates to any future/new maintainers needing to call _copyActual what the invariants are. They don't have to guess at the invariants, or deduce the preconditions by exploring all the call-sites of _copyActual.

@@ -98,3 +99,15 @@ assert.deepStrictEqual(
assert.deepStrictEqual(Buffer.concat([new Uint8Array([0x41, 0x42]),
new Uint8Array([0x43, 0x44])]),
Buffer.from('ABCD'));

// Ref: https://github.com/nodejs/node/issues/55422#issue-2594486812
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please consider expanding the comment a bit here with a short summary of what is being tested for so folks don't have to necessarily follow the link to get the gist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is better to have the description of the test inlined.

However I was under the impression that the policy was to use refs?

See #55492 (comment) where I actually removed the explanation at the request of another maintainer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. needs-ci PRs that need a full CI run.
Projects
None yet
7 participants