Comparing changes

base repository: skorch-dev/skorch
base: v0.13.0
...
head repository: skorch-dev/skorch
compare: v0.14.0
  • 10 commits
  • 30 files changed
  • 3 contributors

Commits on May 17, 2023

  1. 80cacbe

Commits on May 22, 2023

  1. f69be0f

Commits on Jun 2, 2023

  1. 5b222a5

Commits on Jun 15, 2023

  1. Remove support for python 3.7 (#983)

    Since it is EOL this month anyway, we should drop support and
    encourage users to upgrade to a more stable Python branch.
    
    I removed the TODOs related to python 3.7 to the best of
    my knowledge.
    ottonemo authored Jun 15, 2023
    de718ef

Commits on Jun 19, 2023

  1. 5f1ec03

Commits on Jun 23, 2023

  1. Zero-shot and few-shot classification (#979)

    Adding zero-shot and few-shot classification to skorch by using Large Language Models (LLMs).
    
    To see this in action, check out the accompanying [notebook](https://github.com/skorch-dev/skorch/blob/llm-classifier/notebooks/LLM_Classifier.ipynb).
    
    ## Description
    
    This PR adds two new classes, `ZeroShotClassifier` and `FewShotClassifier`. The classes allow users to perform zero-shot and few-shot classification. At the same time, the classes are sklearn estimators/classifiers, so they can be used as normal sklearn models.
    
    This opens up a few opportunities, such as grid searching for the best prompt or the optimal number of examples to use for few-shot learning. Even if a user doesn't plan on using these classes in the final application, they may still use them to figure out the best prompt and LLM to use.
    
    By integrating with Hugging Face transformers, this feature allows running the LLMs completely locally; no API is called with user data. This also allows us to do some nice tricks that are not easily achieved -- or may even be impossible -- with APIs, such as forcing the LLM to predict only the labels we provide, or returning the probabilities associated with each label. We also added some caching on top, which is useful if the labels share a long common prefix (say, `intent.support.email` and `intent.support.phone`).
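
To illustrate why caching helps when labels share a long prefix, here is a minimal sketch. It is not the skorch implementation: the `PrefixCache` class, its `_fake_forward` method, and the dot-based "tokenization" are hypothetical stand-ins, and the "model" only counts how often it is called.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Memoize the (fake) forward pass for every token prefix seen so far."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, ...], str] = {}
        self.n_forward_calls = 0  # how often the expensive step actually ran

    def _fake_forward(self, prefix: Tuple[str, ...]) -> str:
        # Hypothetical stand-in for an expensive LLM forward pass.
        self.n_forward_calls += 1
        return "state(" + ".".join(prefix) + ")"

    def forward(self, prefix: Tuple[str, ...]) -> str:
        # Only run the expensive step if this exact prefix is new.
        if prefix not in self._cache:
            self._cache[prefix] = self._fake_forward(prefix)
        return self._cache[prefix]

    def score_label(self, label: str) -> List[str]:
        # Assumption: splitting on "." plays the role of tokenization.
        tokens = tuple(label.split("."))
        # One forward pass per prefix: (t1,), (t1, t2), ...
        return [self.forward(tokens[:i]) for i in range(1, len(tokens) + 1)]


cache = PrefixCache()
for label in ["intent.support.email", "intent.support.phone"]:
    cache.score_label(label)

# The shared prefixes are only computed once: 4 calls instead of 6.
print(cache.n_forward_calls)
```

In this toy setup, the second label reuses the cached results for `intent` and `intent.support`, so only its last token triggers a new call.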
    
    ## Similar libraries
    
    This feature was inspired by https://github.com/iryna-kondr/scikit-llm, which has a similar approach, but is not based on HF transformers. At the time of writing, it uses the OpenAI API or GPT4All. Apart from those differences, AFAICT, that package does not provide a `predict_proba`, doesn't allow performing a grid search on the prompt etc., doesn't prevent the LLM from making predictions that differ from the allowed labels, and doesn't do any caching. It is a good choice, though, if using OpenAI is desired or if users are interested in summarization, text embeddings, translation, or multi-label prediction.
    
    ## Implementation
    
    @ottonemo and I paired on this PR.
    
    Why add this feature to skorch, as it doesn't really use anything from skorch? This is true and it could also be a standalone package, for instance. The argument for adding it here is that this feature is spiritually similar to what skorch already does, namely marrying (PyTorch) neural networks with scikit-learn. Since this feature has dependencies on PyTorch and sklearn anyway, it's not like users would have to install a lot of useless skorch dependencies only to use this feature. And in the end, adding this feature to skorch should give it some visibility, which would otherwise not be easy to achieve.
    
    Why have a new submodule `llm`? We decided on this because we may very well have more LLM-related features to add in the future. Otherwise, the code could probably also move to `hf.py`.
    
    Why base this on top of HF transformers? This library provides the widest range of language models and thus immediately gives the user many options. Thanks to the ability to intercept logits after each forward call, we can implement some of the features described above, which wouldn't be possible with other options. Users will probably already have some familiarity with HF transformers.
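
The idea of restricting predictions to the given labels and deriving a probability for each label can be sketched without a real model. In the following toy example, `TOY_MODEL` is a hypothetical lookup table that plays the role of the intercepted next-token probabilities, and the token splits are made up. Normalizing the label probabilities so that they sum to 1 is an assumption of this sketch, not a statement about what skorch does internally.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"pos": 0.5, "neg": 0.25, "maybe": 0.25},
    ("pos",): {"itive": 0.8, "sible": 0.2},
    ("neg",): {"ative": 1.0},
}


def label_probability(tokens: Sequence[str]) -> float:
    """Probability that the toy model generates exactly this token sequence."""
    prob = 1.0
    for i, token in enumerate(tokens):
        next_token_probs = TOY_MODEL.get(tuple(tokens[:i]), {})
        prob *= next_token_probs.get(token, 0.0)
    return prob


def predict_proba(labels: Dict[str, Sequence[str]]) -> Dict[str, float]:
    """Score only the allowed labels and normalize over them."""
    raw = {name: label_probability(tokens) for name, tokens in labels.items()}
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("All labels have zero probability.")
    return {name: prob / total for name, prob in raw.items()}


labels = {"positive": ["pos", "itive"], "negative": ["neg", "ative"]}
proba = predict_proba(labels)
prediction = max(proba, key=proba.get)
print(prediction, proba)
```

Because only the token sequences of the allowed labels are scored, outputs such as `maybe` or `possible` can never be predicted, no matter how likely the model considers them.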
    
    We think if this feature finds a lot of adoption and there is a demand for it, we can still later add the option to use OpenAI API or other such services. This may prevent us from offering certain features like `predict_proba`, but perhaps not all users are interested in that.
    
    ## Open tasks
    
    This PR provides a first batch of features, but there are certainly ideas to add more features if this idea gets traction. The most obvious ones are:
    
    1. Multi-label prediction: It should be rather straightforward to add multi-label predictions, i.e. for a single sample, allowing multiple labels to be predicted. I would have to check how sklearn expects the data to be for multi-label prediction.
    2. More flexibility when it comes to formatting the prompt. Right now, the prompt is rather rigid. For instance, users cannot easily change how exactly the few-shot samples are added to the prompt. Perhaps we can do better there, e.g. by using jinja. Another idea is to allow the `prompt` argument to be a function that returns a string.
    3. Providing the same functionality as an sklearn transformer. This would allow users to put this in an sklearn `Pipeline` as part of a feature engineering step. As an example, let's say we deal with customer reviews in an e-commerce setting. This transformer could generate a feature that indicates whether a customer review mentions the size of an item of clothing, which may not be of interest in itself, but may be a useful feature for an overarching supervised learning task.
    
    If this PR is merged as is, these open tasks should be added as issues to be worked on in the future.
    
    ## Coincidental changes
    
    The base class of skorch errors was changed to `Exception` from `BaseException`. This way, a user can do `try ... except Exception`. Previously, they would have had to do `try ... except BaseException`, which is pretty uncommon. Probably not many (or any) users should be negatively affected by this change, since `except BaseException` still works.
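
The difference between the two base classes can be shown with plain Python. The two error classes below are hypothetical and only mimic the old and the new inheritance; they are not the actual skorch exception classes.

```python
class OldStyleError(BaseException):
    """Hypothetical error that derives directly from BaseException."""


class NewStyleError(Exception):
    """Hypothetical error that derives from Exception."""


def caught_by_except_exception(error_cls: type) -> bool:
    """Return True if `except Exception` catches an instance of error_cls."""
    try:
        try:
            raise error_cls("something went wrong")
        except Exception:
            return True
    except BaseException:
        # Only reached if the inner handler did not match.
        return False


print(caught_by_except_exception(NewStyleError))
print(caught_by_except_exception(OldStyleError))
```

Since `Exception` is itself a subclass of `BaseException`, code that already uses `except BaseException` keeps catching both kinds of errors.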
    
    
    # Changes in this squashed commit
    
    * [WIP] Initial working version
    
    * Add a bunch of more tests
    
    Most notably, for FewShotClassifier.
    
    Add functionality to check prompt placeholders.
    
    Use check_random_state in FewShotClassifier.
    
    * Remove need for padding label ids
    
    This is preferred because not all pretrained tokenizers define a pad
    token, so using padding would put an extra burden on users.
    
    * Loading model&tokenizer just from model name
    
    Just pass the model name. Passing model and tokenizer explicitly is
    still possible.
    
    * Move prompts to their own module
    
    * Use torch.tensor to initialize tensor
    
    Not torch.LongTensor.
    
    * Some fixes to _load_model_and_tokenizer
    
    * Add check for all probs being 0
    
    * Remove option to cache seq2seq model
    
    There is a fundamental problem with caching seq2seq/encoder-decoder,
    because the outcomes are not necessarily the same. Therefore, caching is
    disabled for seq2seq and an error is raised.
    
    A few more checks on the model/tokenizer are introduced to ensure that
    valid arguments are passed.
    
    * Some changes to the notebook
    
    - Use fewer samples: 100 instead of 500, 500 was just a bit slow. The
      results are not affected a lot by this
    - Add installation instructions for extra dependencies
    - Disable caching for flan-t5
    
    * Add checking for low probabilities
    
    * Add test for wrong architecture raising
    
    * Add docstrings, more checking
    
    * More checking, testing of prompts, extend notebook
    
    * Add more code and tests for verifying X and y
    
    Also, fixed a bug that prevented usage of y if ys are not strings. It's
    not something we recommend to do (we warn about it), but it should be
    possible.
    
    * Fix the repr of the estimators
    
    * Update notebook, adding a lot more explanations
    
    * Rename notebook to conform with existing naming
    
    * Add documentation
    
    * Add entry to CHANGES.md
    
    * Clean up notebook
    
    * Fix a docstring
    
    * Support BFloat
    
    The `prod_cpu` op is not implemented for bfloat16,
    which is why we need to convert to a supported float type.
    Since the sample will probably be quite small, we don't need to
    specify a more concrete / memory-friendly type.
    
    * Add method to clear cache
    
    * Fix a few tests after prompt was changed
    
    Also: Add comment that changing the prompt may require adjusting the
    tests.
    
    * Add docs section about debugging, fix some typos
    
    * Add section prompts for few-shot classification
    
    * Fix a bug in get_examples for few-shot learning
    
    It was possible that the same sample is returned twice. This should now
    be fixed. Tests were added.
    
    Also added a docstring for random_state, which I forgot to add earlier.
    
    * Add one more debugging tip to the docs
    
    * Document when not to use this new feature
    
    * Fix issue with rendering some docstrings
    
    * Apply some linting
    
    - long lines
    - useless f-string
    
    * Small amendment to docs about dealing with issues
    
    * Allow prompts not to have all placeholders
    
    Will only warn, not raise an error. This is because it may sometimes be
    desired not to fill all the keys (especially the labels may not be
    needed, depending on the prompt).
    
    * Add Bonus section testing bloomz-1b1 on PIQA
    
    In addition to the already comprehensive list of examples it might
    be a good idea to show-case how to handle tasks that are not simply
    classification tasks and how to deal with the problems that arise
    when doing so.
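
One item in the list above, warning instead of raising when a prompt does not contain all placeholders, could be sketched roughly as follows. The function names `placeholders` and `check_prompt` and the example prompt are hypothetical; this only illustrates the idea with the standard library and is not the code from this commit.

```python
import string
import warnings
from typing import Iterable, Set


def placeholders(prompt: str) -> Set[str]:
    """Collect the names of all {placeholders} used in a format string."""
    parsed = string.Formatter().parse(prompt)
    return {field_name for _, field_name, _, _ in parsed if field_name}


def check_prompt(prompt: str, expected: Iterable[str]) -> Set[str]:
    """Warn, but don't raise, if expected placeholders are missing."""
    missing = set(expected) - placeholders(prompt)
    if missing:
        warnings.warn(
            f"The prompt is missing these placeholders: {sorted(missing)}",
            UserWarning,
        )
    return missing


# This prompt never lists the labels, which may be intended by the user.
prompt = "Is the following review positive or negative?\n{text}\nAnswer:"
missing = check_prompt(prompt, expected=["text", "labels"])
print(missing)
```

Returning the set of missing keys instead of raising leaves the decision to the caller, which matches the reasoning that some prompts legitimately don't need every key.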
    
    ---------
    
    Co-authored-by: Marian Tietz <marian.tietz@ottogroup.com>
    BenjaminBossan and ottonemo authored Jun 23, 2023
    7991e5c
  2. cf28846
  3. Update changelog [no ci]

    ottonemo committed Jun 23, 2023
    fe8286a

Commits on Jun 26, 2023

  1. Allow regression with 1d targets (#974)

    * Allow regression with 1d targets
    
    This change makes it possible to pass a 1-dimensional y to
    `NeuralNetRegressor`.
    
    Problem description
    
    Right now, skorch requires the `y` passed to `NeuralNetRegressor.fit` to
    be 2-dimensional, even if there is only one target, as is the most
    common case. This problem has come up a few times in the past, but
    mostly it's just an annoyance - just do `y.reshape(-1, 1)` and you're
    good (the error message says as much).
    
    There are, however, also cases where it's not so easy to solve. For
    instance, in #972, a user reports that they cannot use skorch with
    sklearn's `BaggingRegressor`. The problem is that even if `y` is
    reshaped, once it is passed to the net from `BaggingRegressor`, it is 1d
    again. I assume that `BaggingRegressor` internally squeezes `y` at some
    point.
    
    This PR lifts the 2d restriction check.
    
    Initial motivation
    
    Why does skorch require `y` to be 2d? I couldn't remember the initial
    reasoning and did some archeology.
    
    I found this comment:
    
    (2f00e25#diff-66ed08bca4d171889565d0285a36b9b47e0e91e3b33d85c51352d8eb00faefac):
    
    >         # The problem with 1-dim float y is that the pytorch DataLoader will
    >         # somehow upcast it to DoubleTensor
    
    This strange behavior should not be an issue anymore, so if that was the
    only problem, we should be able to just remove the constraint, right?
    
    Problems with removing the constraint
    
    Unfortunately, it's not that easy. The issue comes down to the
    following: When we remove the constraint and allow the target `y` to be
    1d, but the prediction `y_pred` is still 2d, the criterion `nn.MSELoss`
    will probably do the wrong thing. What exactly is wrong? Instead of
    calculating the squared error between each target and its own
    prediction, the criterion will broadcast the two tensors and calculate
    the squared errors between _every_ target and _every_ prediction, then
    return the mean of that. To demonstrate, let's remove
    the reduction step and look at the shape:
    
    ```python
    >>> import torch
    >>> criterion = torch.nn.MSELoss(reduction='none')
    >>> y = torch.rand(100)
    >>> y_pred = torch.rand((100, 1))
    >>> y.shape, y_pred.shape
    (torch.Size([100]), torch.Size([100, 1]))
    >>> se = criterion(y_pred, y)
    /home/vinh/anaconda3/envs/skorch/lib/python3.10/site-packages/torch/nn/modules/loss.py:536: UserWarning: Using a target size (torch.Size([100])) that is different to the input size (torch.Size([100, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
      return F.mse_loss(input, target, reduction=self.reduction)
    >>> se.shape
    torch.Size([100, 100])
    ```
    
    As can be seen, PyTorch broadcasts the two tensors, leading to 100x100
    errors being calculated. Thankfully, PyTorch warns about potential
    issues with that.
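
To make the consequence for the loss value concrete, here is a small sketch with perfect predictions. It uses NumPy instead of PyTorch, on the assumption that the elementwise squared error follows the same broadcasting rules in both libraries; the variable names are only for illustration.

```python
import numpy as np

y = np.array([0.0, 1.0, 2.0])             # 1d target, shape (3,)
y_pred = np.array([[0.0], [1.0], [2.0]])  # 2d prediction, shape (3, 1)

# Mismatched shapes: broadcasting compares every prediction with every target.
broadcast_se = (y_pred - y) ** 2
# Matching shapes: each prediction is compared only with its own target.
matched_se = (y_pred - y.reshape(-1, 1)) ** 2

broadcast_mse = broadcast_se.mean()
matched_mse = matched_se.mean()

print(broadcast_se.shape, broadcast_mse)  # 3x3 errors, non-zero mean
print(matched_se.shape, matched_mse)      # 3x1 errors, mean of 0
```

Even though every prediction is exactly right, the broadcast version reports a non-zero loss, which is why a model trained this way is unlikely to converge. Making the shapes agree, either via `y.reshape(-1, 1)` or by letting the module return a 1d output, gives one error per sample.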
    
    The current solution is to accept this behavior and hope that the users
    will indeed see the warning. If they don't see it or ignore it, it could
    be a huge issue, because they still get a loss scalar and might even see
    a small improvement in the loss during training. But the model will not
    converge and it's going to be a huge pain to debug the bug, if it's even
    identified as such.
    
    Just to be clear, existing code, which uses 2d targets, will not be
    affected by the change introduced in this PR and is still the preferred
    way (IMO) to use regression in skorch.
    
    Rejected solutions
    
    I did consider the following solutions but rejected them.
    
    Raising an error when shapes mismatch
    
    This would remove the risk of users missing the warning. The problem
    with this is that mismatching shapes can be okay in certain
    circumstances. Some criteria don't expect target and prediction to have
    the same shape, so we would need to check based on criterion. Moreover,
    theoretically, users may indeed want to broadcast. Raising an error
    would prevent that and users may have to resort to subclassing to
    circumvent the error.
    
    Automatic reshaping
    
    We could automatically add/remove dimensions if we see that they
    mismatch. This has the same problems as the previous solution regarding
    the dependence on the type of criterion. Furthermore, automatic
    adjustment of the user's output is prone to run into issues in some edge
    cases (e.g. when the broadcasting is actually desired).
    
    * Fix error when initializing BaggingRegressor
    
    For Python 3.7, CI got:
    
    TypeError: __init__() got an unexpected keyword argument 'estimator'
    
    for BaggingRegressor. Probably it installs an older version of sklearn,
    which uses a different argument name. Passing as positional arg should
    fix it.
    
    * Reviewer comment: typo
    
    Co-authored-by: ottonemo <marian.tietz@ottogroup.com>
    
    * Reviewer comment: typo
    
    Co-authored-by: ottonemo <marian.tietz@ottogroup.com>
    
    ---------
    
    Co-authored-by: ottonemo <marian.tietz@ottogroup.com>
    BenjaminBossan and ottonemo authored Jun 26, 2023
    df92d4d
  2. Prepare release 0.14.0 (#985)

    Release text:
    
    This release offers a new interface for scikit-learn to do zero-shot and
    few-shot classification using open source large language models (Jump right into
    the example notebook).
    
    skorch.llm.ZeroShotClassifier and skorch.llm.FewShotClassifier allow the user to
    do classification using open-source language models that are compatible with the
    Hugging Face generation interface. This allows you to do all sorts of interesting
    things in your pipelines, from simply plugging an LLM into your classification
    pipeline to get preliminary results quickly, to using these classifiers to
    generate training data candidates for downstream models. This is a first draft
    of the interface, therefore it is not unlikely that the interface will change a
    bit in the future, so please, let us know about any potential issues you have.
    
    Other items of this release are
    
    - the drop of Python 3.7 support - this version of Python has reached EOL and
      will not be supported anymore
    - the NeptuneLogger now logs the skorch version thanks to AleksanderWWW
    - NeuralNetRegressor can now be fitted with 1-dimensional y, which is necessary
      in some specific circumstances (e.g. in conjunction with sklearn's
      BaggingRegressor, see #972); for this to
      work correctly, the output of the PyTorch module should also be
      1-dimensional; the existing default, i.e. having y and y_pred be
      2-dimensional, remains the recommended way of using NeuralNetRegressor
    ottonemo authored Jun 26, 2023
    4c5cfda