
Add PyProxyBufferMethods #1215

Merged (57 commits, Mar 26, 2021)

Conversation

@hoodmane (Member) commented Feb 8, 2021

This resolves #1202. I added an API getRawBuffer(type) to all PyProxies of Python objects that support the Python buffer protocol. The return value of getRawBuffer is an object with fields buffer, shape, strides, offset, and release. The shape and strides are as in the Python buffer protocol, except that strides has been divided by an appropriate alignment factor. The equivalent of x[idx0, idx1, ..., idxn] in Python is buffer[offset + strides[0] * idx0 + ... + strides[n] * idxn].
The arguments buffer, shape, strides, and offset are suitable to use as the arguments to the constructor for ndarray:
https://scijs.net/packages/#scijs/ndarray
e.g.,

let {buffer, shape, strides, offset, release} = proxy.getRawBuffer("i32");
let mat = new ndarray(buffer, shape, strides, offset);

release is a function to call to free the buffer memory (you'll have to destroy the proxy you got it from too, though).
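As a sketch of the indexing formula above, here is a hand-built stand-in for the `{buffer, shape, strides, offset}` fields (a 2×3 row-major Int32Array rather than a real PyProxy buffer):

```javascript
// Hand-built stand-in for the object returned by getRawBuffer("i32"):
// a 2x3 row-major matrix [[0, 1, 2], [3, 4, 5]] stored in an Int32Array.
const buffer = Int32Array.from([0, 1, 2, 3, 4, 5]);
const shape = [2, 3];
const strides = [3, 1]; // already divided by the element size, as described above
const offset = 0;

// Equivalent of Python's x[i, j].
function at(i, j) {
  return buffer[offset + strides[0] * i + strides[1] * j];
}

console.log(at(1, 2)); // → 5, the last element of the second row
```

The same four values can be passed straight to the ndarray constructor, as in the snippet above.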

I think I might also add a readonly field to indicate whether the buffer had the readonly flag set. Hopefully this will fail cleanly on any buffer with suboffsets (unless the exporter decides to copy itself in bf_getbuffer...).

@daoxian please let me know what you think.

@hoodmane marked this pull request as draft on February 8, 2021 06:52
@daoxian (Contributor) commented Feb 8, 2021

Great work! The ability to construct a common JS ndarray from Python's raw buffer without copying data covers most of my cases. One question, though: how do I release such a variable and the proxy itself in JS? Should I simply set it to null, or call some method like destroy()? A release function example would be appreciated.

@daoxian (Contributor) commented Feb 8, 2021

I have also run into some potential memory leaks. For example, in Python:

js.g_tracker.resize_img(img_bytes, ...)

In the JS resize_img(img_bytes, ...), img_bytes is a reference to the Python bytes memory. So should I release the memory in JS by setting img_bytes = null? If so, what is release needed for?

@hoodmane (Member, Author) commented Feb 8, 2021

@daoxian You have to destroy the PyProxy and release the memory.

> So should I release the memory in JS by setting img_bytes = null?

Currently you need to say img_bytes.destroy(); unfortunately, setting img_bytes = null leaks the memory. See issues #1006 and #1049. I have a fairly good idea of what the garbage collection code should look like, but in cases where you have allocated large amounts of memory it'll be best to be safe and explicitly destroy the proxy even after I implement automated garbage collection (and some browsers, like Safari, can't support it).

Note that Pyodide is currently rather full of memory leaks; you have to take special care to avoid them.

> If so, what is release needed for?

getRawBuffer makes a call to PyObject_GetBuffer. This has to be paired with a call to PyBuffer_Release when you are done with the memory.

Note that, no matter what, getRawBuffer must have some sort of release method. Well, I suppose one could tie the lifetime of the rawBuffer to the lifetime of the PyProxy, but currently a PyProxy can be unexpectedly destroyed (#1049), and then you could trigger use-after-free errors. I think it's good to make the caller of getRawBuffer responsible for releasing the buffer.
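One way to make that acquire/release pairing hard to forget is a wrapper that releases in a finally block. withRawBuffer below is a hypothetical helper, not part of the proposed API:

```javascript
// Hypothetical helper: hands the raw buffer to a callback and guarantees
// release() runs even if the callback throws, mirroring how a
// PyObject_GetBuffer call must be paired with PyBuffer_Release.
function withRawBuffer(proxy, type, callback) {
  const rawBuffer = proxy.getRawBuffer(type);
  try {
    return callback(rawBuffer);
  } finally {
    rawBuffer.release();
  }
}
```

A caller would then write `withRawBuffer(proxy, "i32", (buf) => ...)` and never touch release directly.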

@hoodmane (Member, Author) commented Feb 8, 2021

Here's an example:

function resize_img(src_data, dst_data) {
  let src_buffer = src_data.getRawBuffer();
  let dst_buffer = dst_data.getRawBuffer();
  // The proxies can be destroyed right away; the raw buffers stay valid
  // until release() is called.
  src_data.destroy();
  dst_data.destroy();
  const src_img = cv.matFromArray(src_buffer.shape[0], src_buffer.shape[1], cv.CV_8UC3, src_buffer.buffer);
  const dst_img = cv.matFromArray(dst_buffer.shape[0], dst_buffer.shape[1], cv.CV_8UC3, dst_buffer.buffer);
  // not really sure what this call should look like
  cv.resize(src_img, dst_img);
  src_buffer.release();
  dst_buffer.release();
}

@hoodmane force-pushed the pyproxy-buffer-procs branch from efeccb3 to 8e4369b on February 8, 2021 20:33
@daoxian (Contributor) commented Feb 9, 2021

I see. Are both APIs (destroy & release) suitable for ver 0.16.x?
For example, bytes in Python will be converted to a Uint8ClampedArray. Then the array is used in JS, so should I release the array after using it?

Btw, I think an automatic garbage-collection mechanism is necessary for pyodide. Manually releasing memory is not friendly to beginners.

@hoodmane (Member, Author) commented Feb 9, 2021

> I see. Are both APIs (destroy & release) suitable for ver 0.16.x?

destroy is present in ver 0.16.1 (and much older versions as well). release is not.

> I think an automatic garbage-collection mechanism is necessary for pyodide. Manually releasing memory is not friendly to beginners.

I agree that it is a high priority to improve the GC situation. Unfortunately, browsers have really poor APIs for destructors. Until July 2020 it was technically impossible to do anything better. With FinalizationRegistry it is possible to do automatic reference-counted garbage collection, but it is still technically impossible to automatically collect reference cycles between Python and JavaScript.

The debugging infrastructure is also very limited: there is no way to request that the browser perform a collection, which is generally important for debugging destructors.
https://caniuse.com/?search=finalizationregistry

Using FinalizationRegistry to implement partial automatic GC is high on my priority list. See #1006, though.
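For illustration, the FinalizationRegistry pattern looks roughly like this. trackProxy and the handle object are hypothetical wiring, not Pyodide's actual implementation:

```javascript
// Sketch of reference-counted cleanup via FinalizationRegistry
// (shipped in browsers since mid-2020, and in Node.js 14.6+).
const registry = new FinalizationRegistry((handle) => {
  // Runs at some unspecified point after the registered proxy becomes
  // unreachable; collection timing is entirely up to the engine.
  handle.destroy();
});

function trackProxy(proxy, handle) {
  // The handle must not strongly reference the proxy itself,
  // or the proxy can never be collected.
  registry.register(proxy, handle);
}
```

Because collection timing is not observable or forceable, this can only ever be a best-effort backstop for an explicit destroy().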

@daoxian (Contributor) commented Feb 9, 2021


Sorry for the modification. What about this question: "bytes in Python will be converted to a Uint8ClampedArray. Then the array is used in JS, so should I release the array after using it?"

@hoodmane (Member, Author) commented Feb 9, 2021

> bytes in Python will be converted to a Uint8ClampedArray. Then the array is used in JS, so should I release the array after using it?

No, that uses the previous implicit conversion method. Its lifetime is tied to the Python object that it came from. In particular, make sure you keep a Python reference to the bytes for the entire period of time for which you use that array, or else you may get use-after-free errors.

@daoxian (Contributor) commented Feb 22, 2021

From issue #1221 (comment)

I found a bug in your pyproxy-buffer-procs branch, src/core/pyproxy.c, line 667:

if(window.BigInt64Array)

breaks the web worker rules (window cannot be accessed in a web worker) and throws an exception when run in a web worker. You could fix it before the PR is merged.

if(globalThis.BigInt64Array)

runs just fine.
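The reason globalThis works everywhere is that it is the standard name for the global object in every JS environment: window on the main browser thread, self in a web worker, and global in Node.js. A portable feature check can be sketched like this (the function name is just for illustration):

```javascript
// Portable feature detection: globalThis resolves to the environment's
// global object, so this works on the main thread, in a web worker,
// and in Node.js alike, whereas `window` only exists on the main thread.
function hasBigInt64Array() {
  return typeof globalThis.BigInt64Array !== "undefined";
}

console.log(hasBigInt64Array());
```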

@hoodmane (Member, Author):

Oops, one minute let me fix that.

@hoodmane (Member, Author):

I think it should be better now?

@hoodmane (Member, Author):

Nope still not working...

@hoodmane (Member, Author):

Okay, well getBuffer succeeds without throwing any errors now, but it still seems a bit buggy.

@rth (Member) commented Mar 25, 2021

Thanks! At least the numpy test passes. I'll try to add a few more.

@hoodmane (Member, Author):

> it still seems a bit buggy

Nevermind, that specific problem was just me misunderstanding the numpy API.

@rth (Member) left a comment:

I added a couple more numpy tests:

    "np.arange(6).reshape((2, -1))",
    pytest.param(
        "np.arange(12).reshape((3, -1))[::2, ::2]", marks=pytest.mark.xfail
    ),
@rth (Member) commented Mar 25, 2021

The buffer size (and contents) are different in this case. It's probably expected, given this implementation where the start and end offsets are recomputed, but I wonder if it's something that we want.

@hoodmane (Member, Author) commented Mar 26, 2021

Yeah, this is the expected behavior: if the buffer isn't contiguous, the data field will contain extra entries that are not part of the array. There's no other way to export noncontiguous arrays without copying them.
What does the -1 in the shape mean?

Member:

-1 in reshape means the remaining shape: it is replaced by whatever size makes the total number of elements match. So, for instance, np.arange(6).reshape((2, -1)) is identical to np.arange(6).reshape((2, 3)). It will error if the remaining size isn't an integer.

Member:

OK, maybe we could keep this line with a comment but remove the xfail

Member Author:

I will try to modify the test to assert the right thing for noncontiguous buffers.
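The contiguity distinction discussed above can be sketched as a check on shape and strides (strides in element units, as getRawBuffer exposes them; the function name is just for illustration):

```javascript
// A buffer is C-contiguous when each stride equals the product of the
// sizes of all faster-varying dimensions. Noncontiguous buffers (like a
// NumPy slice with a step) export extra entries between array elements.
function isCContiguous(shape, strides) {
  let expected = 1;
  for (let i = shape.length - 1; i >= 0; i--) {
    if (shape[i] !== 1 && strides[i] !== expected) {
      return false;
    }
    expected *= shape[i];
  }
  return true;
}

console.log(isCContiguous([2, 3], [3, 1])); // np.arange(6).reshape((2, -1))
console.log(isCContiguous([2, 2], [8, 2])); // the strided [::2, ::2] slice above
```

A test for noncontiguous buffers would assert on the strided view of the data rather than on the raw buffer contents.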

Other review threads (resolved):
packages/numpy/test_numpy.py
docs/usage/type-conversions.md
src/core/pyproxy.js
Makefile
@hoodmane merged commit fd88a18 into pyodide:master on Mar 26, 2021
@hoodmane deleted the pyproxy-buffer-procs branch on March 26, 2021 22:59
hamlet4401 pushed a commit to tytgatlieven/pyodide that referenced this pull request Jul 3, 2021
Successfully merging this pull request may close these issues:

Performance issues with buffer conversions from python to javascript