Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SWIG][mmlspark] allow allocating more than int max array #2859

Merged
merged 1 commit into from
Apr 19, 2020

Conversation

imatiach-msft
Copy link
Contributor

@imatiach-msft imatiach-msft commented Mar 3, 2020

resolves issue #2615 by introducing long versions of SWIG-generated array
ping @chris-smith-zocdoc @guolinke
validated jar in mmlspark

@StrikerRUS
Copy link
Collaborator

@imatiach-msft Sorry for the inconvenience, but can you please rebase to the latest master to fix CI checks?

@imatiach-msft imatiach-msft force-pushed the ilmat/long-max-array branch from d480025 to 9a83dcb Compare March 5, 2020 04:27
@imatiach-msft
Copy link
Contributor Author

@StrikerRUS done!

@StrikerRUS
Copy link
Collaborator

Hey @imatiach-msft,

Can you please resolve conflicts that appeared after merging #2850?

@StrikerRUS StrikerRUS requested review from chivee and guolinke March 20, 2020 01:08
@imatiach-msft
Copy link
Contributor Author

@StrikerRUS done! Resolved conflicts, updated PR.

@imatiach-msft
Copy link
Contributor Author

ping @guolinke @StrikerRUS thanks!

@StrikerRUS
Copy link
Collaborator

@imatiach-msft
The diff looks nice to me, but unfortunately I have no idea what is going on here, sorry! 😃
static TYPE *new_##NAME
%{}
...

@imatiach-msft
Copy link
Contributor Author

@StrikerRUS this is just a slightly modified version of the SWIG file here which we were already using in the previous %array_functions call:
https://github.com/swig/swig/blob/master/Lib/carrays.i
note how the values with "int" type are now just replaced to have "int64_t" type.
Hope this helps clear up any confusion!

@AlbertoEAF
Copy link
Contributor

AlbertoEAF commented Mar 25, 2020

Hello @imatiach-msft :),

If we're going to change that, then can we use an unsigned type like size_t? It's also the underlying type being used in the operator new allocation and negative values shouldn't be allowed.

@AlbertoEAF
Copy link
Contributor

AlbertoEAF commented Mar 25, 2020

Also,

SWIG's carrays.i, is NOT memory-allocation safe, which means your whole application can crash just because the OS kills your underlying C process.
Before talking with you, I posted an issue at SWIG's repo to see if they wanted to fix it but they never answered back: swig/swig#1755.

So my suggestion is since we're changing the Java API anyways, just fix it ourselves. We could throw an exception or even easier, use new (nothrow). Anyone who wishes to handle that case just has to test in Java for the null pointer, all other users who are not concerned with such issues (non-production) can keep working like before.

What do you say?

Copy link
Contributor

@AlbertoEAF AlbertoEAF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this issue swig/swig#1755 for users of our SWIG API to avoid irrecoverable crashes in production systems.

You can use new nothrow or wrap the std::bad_alloc exception through SWIG. I guess the second is more elegant but requires more SWIG code:
https://valelab4.ucsf.edu/svn/3rdpartypublic/swig/Doc/Manual/Java.html#Java_exception_handling

swig/lightgbmlib.i Show resolved Hide resolved
@imatiach-msft
Copy link
Contributor Author

imatiach-msft commented Mar 25, 2020

@AlbertoEAF "then can we use an unsigned type like size_t"
I think long is way more than enough in terms of size for the use-case. I'm not sure how I would call this from the java/scala code whereas I've already tested this with int64_t.

@imatiach-msft
Copy link
Contributor Author

imatiach-msft commented Mar 25, 2020

@AlbertoEAF "So my suggestion is since we're changing the Java API anyways, just fix it ourselves"

Agree that it should be fixed, but can this be part of a separate PR? This issue is much more pressing and I really want to fix it so I can publish a new lightgbm jar for customers with very large data. The additional change would require more integration testing on my side.

Copy link
Contributor

@AlbertoEAF AlbertoEAF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imatiach-msft asked to have such issues fixed in another PR.

Ignoring that, it will change the Java SWIG API, but I think the code looks good.

@StrikerRUS
Copy link
Collaborator

gently ping @guolinke for the review

@StrikerRUS StrikerRUS merged commit 0d3e204 into microsoft:master Apr 19, 2020
@StrikerRUS StrikerRUS changed the title [mmlspark] allow allocating more than int max array [SWIG][mmlspark] allow allocating more than int max array Apr 19, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants