Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] allow to register any custom logger (fixes #4783) #4880

Merged
merged 11 commits into from
Mar 28, 2022
Merged

[python] allow to register any custom logger (fixes #4783) #4880

merged 11 commits into from
Mar 28, 2022

Conversation

RustingSword
Copy link
Contributor

Allow to register any custom logger, as long as it provides info, warning and error method. Refer to #4783.

@ghost
Copy link

ghost commented Dec 11, 2021

CLA assistant check
All CLA requirements met.

@jameslamb jameslamb changed the title [python] allow to register any custom logger [python] allow to register any custom logger (fixes #4783) Dec 12, 2021
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for this!

I've added (fixes #4783) to the pull request title here so GitHub will automatically close #4783 if this is merged, and so the connection between this PR and that issue is clear to anyone reading the release notes (PR titles become release note items in this project, see here for example).

Can you please add a unit test covering this new behavior, to be sure it isn't accidentally broken by future refactorings? See the diff from https://github.com/microsoft/LightGBM/pull/3820/files#diff-57a0d4d593bc1ebac04a99d2e477a178c209de8c50e6cffdb5f9195b26dd6e68 for an example.

python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
@RustingSword
Copy link
Contributor Author

Thanks for the thorough guidance! I considered updating the unit test, however not sure what to test. How about add test to make sure the method check is satisfied? Maybe something like this:

def test_register_invalid_logger():
    class LoggerWithoutInfoMethod:
        def warning(self, msg: str) -> None:
            print(msg)

        def error(self, msg: str) -> None:
            print(msg)

    class LoggerWithoutWarningMethod:
        def info(self, msg: str) -> None:
            print(msg)

        def error(self, msg: str) -> None:
            print(msg)

    class LoggerWithoutErrorMethod:
        def info(self, msg: str) -> None:
            print(msg)

        def warning(self, msg: str) -> None:
            print(msg)

    def _test_raise_type_error(invalid_logger) -> None:
        try:
            lgb.register_logger(invalid_logger)
        except Exception as err:
            assert isinstance(err, TypeError)
            assert str(err) == "Logger must provide 'info', 'warning' and 'error' method"

    _test_raise_type_error(LoggerWithoutInfoMethod())
    _test_raise_type_error(LoggerWithoutWarningMethod())
    _test_raise_type_error(LoggerWithoutErrorMethod())

@jameslamb
Copy link
Collaborator

not sure what to test

I think tests covering the following would give us confidence that this feature is working:

  • given a logger class that implements info(), warning(), and error() but does not inherit from logging.Logger, that class's logging methods are used by correctly and successfully by lightgbm
  • given a class that has properties info, warning, and error set to something non-callable (like integers), lightgbm throws the error introduced in this PR
  • given a class that is missing info(), warning(), or error(), lightgbm throws the error introduced in this PR

For tests that check for errors, please use pytest.raises() for that instead of implementing _test_raise_type_error(), see the following for an example.

with pytest.raises(ValueError, match="Could not parse host name from worker address '0.0.0.1:80'"):

@RustingSword
Copy link
Contributor Author

  • given a logger class that implements info(), warning(), and error() but does not inherit from logging.Logger, that class's logging methods are used by correctly and successfully by lightgbm

Is there any way to directly test the logging utility in lightgbm, or do I need to test it like https://github.com/microsoft/LightGBM/pull/3820/files#diff-57a0d4d593bc1ebac04a99d2e477a178c209de8c50e6cffdb5f9195b26dd6e68 to go through the training process? If so, I'm afraid there will be a large portion of duplicate code, which requires further refactoring.

For tests that check for errors, please use pytest.raises() for that instead of implementing _test_raise_type_error()

OK, this is much better than my clumsy implementation.

Comment on lines 44 to 45
def error(self, msg: str) -> None:
warnings.warn(msg, stacklevel=3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was a typo, right?

Suggested change
def error(self, msg: str) -> None:
warnings.warn(msg, stacklevel=3)
def error(self, msg: str) -> None:
warnings.error(msg, stacklevel=3)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there's no error() in warnings? I seldom use this module, so correct me if I'm wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed! I confused this with error() function from logging module. Sorry.
https://docs.python.org/3/library/logging.html#logging.Logger.error

For our dummy logger I think we can just raise an exception then, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the default logger, in the future when error() is actually needed, we still need to implement it. So I'm inclined to leave a dummy yet usable implementation as default.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a .error() method being added at all? Per your description in #4783

Since only info and warning are used to log messages, any logger class providing these two methods would suffice.

And I can confirm that such a method isn't used anywhere.

git grep -E "\.error"

I have a strong preference for not having any "well we might need this in the future" code in the package, as it increases the size of the package and amount of code to be maintained without affecting the user experience. I think this PR should not add a .error() implementation at all. @StrikerRUS what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's remove requirements to have .error() method in custom logger class in this PR. But please note that there will be a breaking change in case we'll want to require it in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed .error() related code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

please note that there will be a breaking change in case we'll want to require it in the future

Yes, I understand. I think the case where the lightgbm library would want to emit an ERROR-level log but not raise an exception is unlikely to come up.

Even the logger on the C++ side does not have a method for ERROR-level messages: https://github.com/microsoft/LightGBM/blob/17d4e007852b4f0b34b211a674cb26e6751c98e2/include/LightGBM/utils/log.h.

python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
@RustingSword
Copy link
Contributor Author

Is there any way to directly test the logging utility in lightgbm

I find that maybe I can directly use lgb._log_info(msg) to test logging functions?

Also I'm wondering whether we should export _log_info, _log_warning, _log_error as public methods, say as info, warning and error. So once we registered our own logger, we can use lgb.info(), lgb.warning(), lgb.error() in user code, to make the usage and logging result consistent.

@jameslamb
Copy link
Collaborator

@RustingSword are you still interested in pursuing this PR? If you can update this to the latest master, I'd be happy to provide another review.


I'm sorry, I realized today that you were waiting for feedback from us on #4880 (comment).

Also I'm wondering whether we should export _log_info, _log_warning, _log_error as public methods, say as info, warning and error

I don't believe those methods should be exported or that calling LightGBM's logger like lgb.info() should be encouraged. I think the ability to pass in your own logger means that if you want to have other code that calls the same logger as LightGBM, you can instantiate that logger in your code and pass it into register_logger().

@RustingSword
Copy link
Contributor Author

@jameslamb I rebased on master. If we don't expose logging methods, then there's no direct way to test logger. Should I duplicate the tests used in test_register_logger ? Personally I think the test is coupled with other functionalities, such as training, which have been tested in other module, e.g. test_engine.py.

@jameslamb
Copy link
Collaborator

I rebased on master

Since this project uses squash merging, just using git merge master and pushing a merge commit is sufficient. This is also typically easier...something about the way you rebased has made GitHub think this PR is changing 100 files, which is going to make it really difficult to review.

Screen Shot 2022-03-19 at 10 08 23 PM

Can you please remove those commits and instead merge master, so the diff only shows what this PR is changing? If you're not sure how to, please let me know and I'd be happy to update this branch for you.

If we don't expose logging methods, then there's no direct way to test logger

Since you are injecting a logger, I think it's possible to design one for a test which modifies a global variable, and then to test the content of those global variables.

pseudo-code showing what I mean INFO-level logs (which could be extended for WARN and FATAL).

import lightgbm as lgb

info_logs = []

class _TestLogger:
    def info(self, msg: str) -> None:
        global info_logs
        info_logs.append(msg)

lgb.register_logger(_TestLogger())

# directly invoke `lgb.basic._log_info()`
lgb.basic._log_info("hello")

assert info_logs == ["hello"]

Besides that, you could have one similar test that performs training with lgb.train(), and just check that info_logs is non-empty. That way, the test wouldn't be as tightly coupled to LightGBM's training logic, but would still check "lgb.train() actually uses the logger registered with register_logger()".

@RustingSword
Copy link
Contributor Author

Sorry, I will do a force update on this branch to merge master instead of rebase.

The suggested test method is great, will implement later.

@jameslamb
Copy link
Collaborator

Sounds good, thanks! We are VERY sorry it took so long to respond to your questions. Thanks very much for still being willing to contribute!

@StrikerRUS
Copy link
Collaborator

I don't believe those methods should be exported or that calling LightGBM's logger like lgb.info() should be encouraged. I think the ability to pass in your own logger means that if you want to have other code that calls the same logger as LightGBM, you can instantiate that logger in your code and pass it into register_logger().

I'm totally agree with this statement.

@StrikerRUS
Copy link
Collaborator

@RustingSword
Could you please fix one linting error?

./tests/python_package_test/test_utilities.py:147:1: E303 too many blank lines (3)

@RustingSword
Copy link
Contributor Author

@RustingSword Could you please fix one linting error?

./tests/python_package_test/test_utilities.py:147:1: E303 too many blank lines (3)

Sorry, missed this one, fixed.

@RustingSword
Copy link
Contributor Author

When I'm running pytest locally, the expected log messages in test_register_logger miss two lines at the beginning, which are

WARNING | categorical_feature in Dataset is overridden.
New categorical_feature is [1]

I'm not sure whether this should be updated, since there seems no related test error in the ci pipeline.

@StrikerRUS
Copy link
Collaborator

@RustingSword

When I'm running pytest locally ...

Are you using the latest version of LightGBM?
Refer to #4768.

@RustingSword
Copy link
Contributor Author

@RustingSword

When I'm running pytest locally ...

Are you using the latest version of LightGBM? Refer to #4768.

I see, I'm using 3.3.2, and #4768 is not included per #4930.

@jameslamb jameslamb requested review from jameslamb and removed request for chivee March 26, 2022 04:07
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look great, thank you very much! Please just see my one remaining comment about the .error() methood.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great to me, thanks very much for the idea and the contribution!

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RustingSword Thank you so much for your work!

Just one nit about docstring style:

Comment on lines 58 to 61
info_method_name: str, optional (default="info")
Method used to log info messages.
warning_method_name: str, optional (default="warning")
Method used to log warning messages.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info_method_name: str, optional (default="info")
Method used to log info messages.
warning_method_name: str, optional (default="warning")
Method used to log warning messages.
info_method_name : str, optional (default="info")
Method used to log info messages.
warning_method_name : str, optional (default="warning")
Method used to log warning messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Maybe we can add a corresponding linter rule? Or even better, use a unified formatter.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea! Do you know any one?
pydocstyle linter we use here doesn't catch such inconsistency.

pydocstyle --convention=numpy --add-ignore=D105 --match-dir="^(?!^external_libs|test|example).*" --match="(?!^test_|setup).*\.py" . || exit -1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this is a known issue with pydocstyle. I tried numpydoc as mentioned in the comment, however it is not satisfying either. First, numpydoc is an extension to sphinx, and the validation rules cannot be set in command line options when used as a standalone tool. Second, numpydoc is not intelligent enough. We can see this in above issue, where a single missing whitespace will result in multiple false positives.

smalltest.no:PR01:Parameters {'a'} not documented
smalltest.no:PR02:Unknown parameters {'a: int'}

These two errors are due to parsing error caused by the missing whitespace before colon.

So, I don't have suggested tool suited for this 😞

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no problem, thanks very much for investigating!

@RustingSword
Copy link
Contributor Author

@jameslamb @StrikerRUS Thank you both for your excellent guidance!

@StrikerRUS StrikerRUS merged commit 60e72d5 into microsoft:master Mar 28, 2022
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants