-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python-package] Allow to pass Arrow array as labels #6163
Conversation
Yes @jameslamb! Will do some time today (I'm currently on vacation, not spending too much time with my laptop 😄). |
Oh don't worry about it, enjoy your vacation! I'll be here and ready whenever you get back. This is not urgent at all... it's up to you how fast we move through these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes on the Python side look great to me... thanks for working within this project's existing patterns for type hints, optional-dependency imports, etc. And the tests look excellent to me!
Changes on the C/C++ side look good to me at a glance, but I'm not fully qualified to approve them, so I'm just leaving a "Comment" review.
@guolinke or @shiyu1994 could you please help with a review?
@@ -67,6 +67,10 @@ def dummy_dataset_params() -> Dict[str, Any]: | |||
} | |||
|
|||
|
|||
def assert_arrays_equal(lhs: np.ndarray, rhs: np.ndarray): | |||
assert lhs.dtype == rhs.dtype and np.array_equal(lhs, rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this approach for testing if two numpy
arrays are identical, I'd prefer that we use np.testing.assert_array_equal(..., strict=True)
. That functionality is also so generically useful in lightgbm
's tests that I think it belongs in tests/python_package_test/utils.py
.
I've introduced such a function in https://github.com/microsoft/LightGBM/pull/6108/files#diff-da8bd23ee10ae6337467b4d88500a4a695b154d23a17742a4cef514cb1d6cf85R206.
@jmoralez I'm waiting to merge that PR until you have a chance to review #6108 again, specifically #6108 (comment). Will you have time to do that this week?
I know it's kind of involved, so if not I'll pull that new testing function out of that PR into a separate one that'd hopefully require less reviewing effort, so we could re-use it here and in other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb any chance #6108 is merged soon so I can take advantage of the new function? Or should I just copy the function into this branch so that we can proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't just copy that function into this branch... I want it to be reviewed by someone other than myself.
We can just merge this today and I'll clean up and consolidate the testing functions later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has but @jmoralez and I have been collaborating on it (as can be seen in the comment thread I linked to from #6163 (comment)), so I don't want to merge it until he has the time to review.
Pull request reviews here aren't just quality gates like a building or restaurant inspection... they're a chance for maintainers to agree/disagree about the state of this thing we're all maintaining together. "at least one approval" is the minimum standard we enforce but that doesn't mean that every PR should be merged as soon as one person approves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey. Sorry for the delay, I moved last week and missed the update in #6108. I'm reviewing it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No rush at all, don't let me add stress to your life! Seriously, it can happen any time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Let's do query groups next (#6166), as I think that'll be easier than init_score
or weight
since group
can only be a 1-dimensional array.
@jameslamb: can't |
Ah yeah you're right LightGBM/python-package/lightgbm/basic.py Lines 1638 to 1639 in aeafccf
Sorry, forgot that for multi-class classification we don't support specifying different row weights by class. Going to #6164 next is fine. I didn't realize they relied on each other sequentially. |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Motivation
This is part of a series of PRs towards #6022. This is built on #6034 but, unfortunately, I cannot properly specify the PR base here as I can't use my fork's branch as base branch.
For a legible diff, see borchero/LightGBM@arrow-support-training-data...arrow-support-labels.