-
-
Notifications
You must be signed in to change notification settings - Fork 851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite ChunkedMemoryStream #2828
Conversation
IMO the memory benefits were significant. With the current decoder & encoder design, non seekable streams are always fully buffered into memory. A switch to Recommendations:
|
This reverts commit 1e58db2.
Thanks for the review @antonfirsov I've instead chosen to completely rewrite the |
public override void Flush() | ||
{ | ||
_ = this.Read(this.singleByteBuffer, 0, 1); | ||
return MemoryMarshal.GetReference<byte>(this.singleByteBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the byte[]
path goes via AsSpan(), why not use the span approach directly?
Span<byte> buffer = stackalloc byte[1];
return Read(buffer) == 1 ? buffer[0] : -1
ideally with SkipLocalsInit
enabled
alt that elides a range check:
byte b = 0;
return Read(MemoryMarshal.CreateSpan(ref b, 1)) == 1 ? b : -1;
(you can't do this for the async path, though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... I'd completely forgotten about that method!
Rather than decorating a public mthod with [SkipLocalsInit]
i've opted for the following:
/// <inheritdoc/>
public override int ReadByte()
{
Unsafe.SkipInit(out byte b);
return this.Read(MemoryMarshal.CreateSpan(ref b, 1)) == 1 ? b : -1;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better!
int offset = 0; | ||
int count = buffer.Length; | ||
while (count > 0) | ||
while (bytesToRead != 0 && this.currentChunk != this.memoryChunkBuffer.Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about the underlying implementation here; if this is doing additional downstream reads, you might prefer to exit after the first read (you're only required to read "some" data - you don't need to fill the supplied data, just return at least 1 byte or 0 for EOF); if all the data is already loaded, I wonder whether your memoryChunkBuffer
is duplicating the innards of ReadOnlySequence<byte>
- that already has all the Slice
, CopyTo
etc you might want; just a suggestion, though (I can help you grok ReadOnlySequence<T>
if you're not already familiar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looking at MemoryChunkBuffer
there is definitely some overlap. Ideally, I should be tracking the buffer and chunk indexes internally within that class.
However, this type needs to be expandable on-demand which AFAIK is not possible with ReadonlySequence<T>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying buffer-chain is as mutable as you want it to be (it is your chain, ultimately); if you want to resize it, that is usually as simple as simply creating a new ROS (which is a lightweight struct just tracking the start and end), specifying the new bounds. The chain bits aren't trivial, but not too complex. I guess if what you already have works well, it might be overkill to touch it, though.
} | ||
|
||
return chunkBuffer.GetSpan()[this.readOffset++]; | ||
MemoryMarshal.Write(this.singleByteBuffer, ref value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto stackalloc
; or possibly even the more exotic:
byte b = 0;
var span = MemoryMarshal.CreateSpan(ref b, 1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <inheritdoc/>
public override void WriteByte(byte value)
=> this.Write(MemoryMarshal.CreateSpan(ref value, 1));
/// </summary> | ||
/// <returns>The <see cref="T:byte[]"/>.</returns> | ||
/// <returns>A new <see cref="T:byte[]"/>.</returns> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is very inefficient; I would suggest trying to deprecate this kind of API - especially if we can use ROS, for example;
[Obsolete("prefer " + nameof(AsReadOnlySequence)] public byte[] ToArray() => AsReadOnlySequence().ToArray();
public ReadOnlySequence<byte> AsReadOnlySequence() => /* magic happens */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we need this to when reading XMP data for the V3 build however I wish to rewrite the XMPProfile
type for V4 to avoid passing arrays around. Once this is merged to the V3 branch I'll upstream and make additional changes.
@JimBobSquarePants I believe I will have some time to also review this in the weekend. |
@@ -12,44 +14,23 @@ namespace SixLabors.ImageSharp.IO; | |||
/// Chunks are allocated by the <see cref="MemoryAllocator"/> assigned via the constructor | |||
/// and is designed to take advantage of buffer pooling when available. | |||
/// </summary> | |||
internal sealed class ChunkedMemoryStream : Stream | |||
public class ChunkedMemoryStream : Stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider reverting this back to being sealed. I.e. is this class designed to be sub-classed? If not then sealing can allow the JITter to make certain optimisations around the method calling of the virtual/override methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's supposed to be internal sealed
actually! I forgot to change it back after rewriting (public makes the IDE tell me to add method docs).
private readonly int allocatorCapacity; | ||
|
||
// Has the stream been disposed. | ||
private long length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sidenote: IMO it's fairly conventional in C# land to prefix private field names with an underscore, to allow easy distinction from local variables, and to avoid excessive use of this.
. I assume this is your personal preference, but thought I'd mention it as IMO it is somewhat non-idiomatic C#.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but I'd rather stick to using the language as designed rather than using conventions carried over from C.
The this
keyword provides important context IMO and encouraged consistancy throughout a codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colgreen the project's preferred coding style is based on Framework design guidelines, and on StyleCop recommendations. The guidelines explicitly prohibit prefixing variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guidelines state Internal and private fields are not covered by guidelines,
(I believe those guidelines are primarily related to public API surface, rather than private/internal naming). The underscore prefix for private fields is very common in my experience, e.g. it's used widely in Microsoft .NET repos.
However, this topic is probably not relevant in the context of this PR :)
// Has the stream been disposed. | ||
private long length; | ||
private long position; | ||
private int currentChunk; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names could be considered a little misleading/confusing.
E.g. currentChuck
is an index into memoryChunkBuffer
, so I think maybe call it currentChunkIdx
.
Whereas currentChunkIdx
is an offset/index /within/ the current chunk, so maybe call it intraChunkByteIdx
, chunkByteIdx
, or checkByteOffset
? etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.. good point. I've opted for bufferIndex
and chunkIndex
as an improvement.
chunk.Dispose(); | ||
chunk = chunk.Next; | ||
this.Dispose(true); | ||
GC.SuppressFinalize(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not necessary to call GC.SuppressFinalize(this)
in a sealed class with no finalizer. This would be to cover sub-types that have a finalizer (in scenarios where there is no finalizer defined directly on the type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force of habit. Well spotted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Previous reviewers had good points.
- Added some nitpicking.
- Improving funcitonal coverage would be valuable.
Looks good otherwise.
return i < 16 ? b128K * (1 << (int)((uint)i / 4)) : b4M; | ||
} | ||
|
||
private void Dispose(bool disposing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given https://github.com/SixLabors/ImageSharp/pull/2828/files#r1823357071, the Dispose(bool disposing)
is not even needed.
Same for MemoryChunk
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an override of the base Stream
method though. I can't implement Dispose
directly. All other implementations have been simplified now though.
if (remaining > count) | ||
{ | ||
remaining = count; | ||
} | ||
|
||
Span<byte> chunkBuffer = this.writeChunk.Buffer.GetSpan(); | ||
int chunkSize = this.writeChunk.Length; | ||
int count = buffer.Length; | ||
int offset = 0; | ||
while (count > 0) | ||
int bytesToWrite = (int)remaining; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remaining
is not being used after this line.
- if (remaining > count)
- {
- remaining = count;
- }
- int bytesToWrite = (int)remaining;
+ int bytesToWrite = count;
Same for Read
.
public IEnumerator<MemoryChunk> GetEnumerator() | ||
=> ((IEnumerable<MemoryChunk>)this.memoryChunks).GetEnumerator(); | ||
|
||
IEnumerator IEnumerable.GetEnumerator() | ||
=> ((IEnumerable)this.memoryChunks).GetEnumerator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any code enumerating this with foreach
, so IEnumerable
implementation can be deleted.
private readonly int allocatorCapacity; | ||
|
||
// Has the stream been disposed. | ||
private long length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colgreen the project's preferred coding style is based on Framework design guidelines, and on StyleCop recommendations. The guidelines explicitly prohibit prefixing variables.
@@ -30,7 +30,7 @@ public class ChunkedMemoryStreamTests | |||
[Fact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to extend these tests to stress the corner(?) cases which were buggy in the previous implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Tests have been massivly exapanded. We now test reading larger buffers and test encoding to webp for all test images.
[InlineData(DefaultSmallChunkSize * 16)] | ||
public void MemoryStream_ReadByteBufferSpanTest(int length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also include lengths over DefaultSmallChunkSize * 16
and make buffer.Length
a parameter. Would test cases, when buffer.Length > DefaultSmallChunkSize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length
is already determined by the parameter but I've expanded to double the previous maximum length
@@ -167,18 +167,20 @@ public void MemoryStream_WriteToTests() | |||
[Fact] | |||
public void MemoryStream_WriteToSpanTests() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write tests could stress more cases with different sizes. See my comment on the read tests.
int offset = 0; | ||
int count = buffer.Length; | ||
while (count > 0) | ||
while (bytesToRead != 0 && this.currentChunk != this.memoryChunkBuffer.Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though current code looks good and Slice
would throw for a negative number, I find this safer to maintain. Same for Write
.
while (bytesToRead != 0 && this.currentChunk != this.memoryChunkBuffer.Length) | |
while (bytesToRead > 0 && this.currentChunk != this.memoryChunkBuffer.Length) |
@antonfirsov Are you happy with me pushing on with this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One small suggestion for the test code.
Co-authored-by: Anton Firszov <antonfir@gmail.com>
Prerequisites
Description
Fixes #2806
ChunkedMemoryStream
contained multiple writing bugs and was too costly to fix/maintain relative to performance benefits so I'm just ditching it.ChunkedMemoryStream
to simplify it and fix numerous bugs.