Comparing changes

Since Python 3.7 reaches EOL this month anyway, we should drop support for it and encourage users to upgrade to a newer, supported Python version. I removed the TODOs related to Python 3.7 to the best of my knowledge.

@ottonemo

Adding zero-shot and few-shot classification to skorch by using Large Language Models (LLMs)

To see this in action, check out the accompanying [notebook](https://github.com/skorch-dev/skorch/blob/llm-classifier/notebooks/LLM_Classifier.ipynb).

## Description

This PR adds two new classes, `ZeroShotClassifier` and `FewShotClassifier`. The classes allow users to perform zero-shot and few-shot classification. At the same time, the classes are sklearn estimators/classifiers, so they can be used as normal sklearn models. This opens up a few opportunities, such as grid searching for the best prompt or the optimal number of examples to use for few-shot learning. Even if a user doesn't plan on using these classes in the final application, they may still use them to figure out the best prompt and LLM to use.

By integrating with Hugging Face transformers, this feature allows running the LLMs completely locally; no API is called with user data. This also allows us to do some nice tricks that are not easily achieved (or may even be impossible) with APIs, such as forcing the LLM to predict only the labels we provide, or returning the probabilities associated with each label. We also added some caching on top, which is useful if the labels share a long common prefix (say, `intent.support.email` and `intent.support.phone`).

## Similar libraries

This feature was inspired by https://github.com/iryna-kondr/scikit-llm, which has a similar approach but is not based on HF transformers. At the time of writing, it uses the OpenAI API or GPT4All. Apart from those differences, AFAICT, that package does not provide a `predict_proba`, doesn't allow grid searching over the prompt etc., doesn't prevent the LLM from making predictions that differ from the allowed labels, and doesn't do any caching. It is a good choice, though, if using OpenAI is desired or if users are interested in summarization, text embeddings, translation, or multi-label prediction.
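The two tricks mentioned in the description (restricting predictions to the provided labels, and caching work for labels that share a prefix) can be illustrated with a small, self-contained sketch. This is not skorch's implementation, which works on logits from Hugging Face models and is not shown here. `toy_logprob`, `TOY_LOGPROBS`, and the split on `.` as a stand-in for tokenization are all assumptions made for illustration.

```python
import math
from functools import lru_cache

# Hypothetical per-token log-probabilities; a real LLM would produce these
# from a forward pass. Unknown tokens get a default score.
TOY_LOGPROBS = {"email": -0.5, "phone": -1.5}
CALLS = []  # records every (prefix, token) pair the "model" is asked about


def toy_logprob(prefix, token):
    """Stand-in for the LLM: log-probability of `token` given `prefix`."""
    CALLS.append((prefix, token))
    return TOY_LOGPROBS.get(token, -0.1)


@lru_cache(maxsize=None)
def sequence_logprob(tokens):
    """Log-probability of a token tuple; results for shared prefixes are cached."""
    if not tokens:
        return 0.0
    return sequence_logprob(tokens[:-1]) + toy_logprob(tokens[:-1], tokens[-1])


def label_probabilities(labels):
    """Score only the allowed labels and normalize, so probabilities sum to 1."""
    scores = {label: sequence_logprob(tuple(label.split("."))) for label in labels}
    max_score = max(scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - max_score) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


labels = ["intent.support.email", "intent.support.phone"]
probs = label_probabilities(labels)
prediction = max(probs, key=probs.get)
```

Because only the allowed labels are scored, the prediction can never be a string outside the label set, and the shared `intent.support` prefix is evaluated once instead of once per label.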
## Implementation

@ottonemo and I paired on this PR.

Why add this feature to skorch, as it doesn't really use anything from skorch? This is true and it could also be a standalone package, for instance. The argument for adding it here is that this feature is spiritually similar to what skorch already does, namely marrying (PyTorch) neural networks with scikit-learn. Since this feature has dependencies on PyTorch and sklearn anyway, it's not like users would have to install a lot of useless skorch dependencies only to use this feature. And in the end, adding this feature to skorch should give it some visibility, which would otherwise not be easy to achieve.

Why have a new submodule `llm`? We decided on this because we may very well have more LLM-related features to add in the future. Otherwise, the code could probably also move to `hf.py`.

Why base this on top of HF transformers? This library provides the widest range of language models and thus immediately gives the user many options. Thanks to the ability to intercept logits after each forward call, we can implement some of the features described above, which wouldn't be possible with other options. Users will probably already have some familiarity with HF transformers.

We think that if this feature finds a lot of adoption and there is a demand for it, we can still later add the option to use the OpenAI API or other such services. This may prevent us from offering certain features like `predict_proba`, but perhaps not all users are interested in that.

## Open tasks

This PR provides a first batch of features, but there are certainly ideas to add more features if this idea gets traction. The most obvious ones are:

1. Multi-label prediction: It should be rather straightforward to add multi-label predictions, i.e. for a single sample, allowing multiple labels to be predicted. I would have to check how sklearn expects the data to be for multi-label prediction.
2. More flexibility when it comes to formatting the prompt.
   Right now, the prompt is rather rigid. For instance, users cannot easily change how exactly the few-shot samples are added to the prompt. Perhaps we can do better there, e.g. by using jinja. Another idea is to allow the `prompt` argument to be a function that returns a string.
3. Providing the same functionality as an sklearn transformer. This would allow users to put this in an sklearn `Pipeline` as part of a feature engineering step. As an example, let's say we deal with customer reviews in an e-commerce setting. This transformer could generate a feature that indicates whether a customer review mentions the size of an item of clothing, which may not be of interest in itself, but may be a useful feature for an overarching supervised learning task.

If this PR is merged as is, these open tasks should be added as issues to be worked on in the future.

## Coincidental changes

The base class of skorch errors was changed to `Exception` from `BaseException`. This way, a user can do `try ... except Exception`. Previously, they would have had to do `try ... except BaseException`, which is pretty uncommon. Probably not many (or any) users should be negatively affected by this change, since `except BaseException` still works.

# Changes in this squashed commit

* [WIP] Initial working version
* Add a bunch of more tests
  Most notably, for FewShotClassifier. Add functionality to check prompt placeholders. Use check_random_state in FewShotClassifier.
* Remove need for padding label ids
  This is preferred because not all pretrained tokenizers define a pad token, so using padding would put an extra burden on users.
* Loading model&tokenizer just from model name
  Just pass the model name. Passing model and tokenizer explicitly is still possible.
* Move prompts to their own module
* Use torch.tensor to initialize tensor
  Not torch.LongTensor.
* Some fixes to _load_model_and_tokenizer
* Add check for all probs being 0
* Remove option to cache seq2seq model
  There is a fundamental problem with caching seq2seq/encoder-decoder, because the outcomes are not necessarily the same. Therefore, caching is disabled for seq2seq and an error is raised. A few more checks on the model/tokenizer are introduced to ensure that valid arguments are passed.
* Some changes to the notebook
  - Use fewer samples: 100 instead of 500, 500 was just a bit slow. The results are not affected a lot by this
  - Add installation instructions for extra dependencies
  - Disable caching for flan-t5
* Add checking for low probabilities
* Add test for wrong architecture raising
* Add docstrings, more checking
* More checking, testing of prompts, extend notebook
* Add more code and tests for verifying X and y
  Also, fixed a bug that prevented usage of y if ys are not strings. It's not something we recommend to do (we warn about it), but it should be possible.
* Fix the repr of the estimators
* Update notebook, adding a lot more explanations
* Rename notebook to conform with existing naming
* Add documentation
* Add entry to CHANGES.md
* Clean up notebook
* Fix a docstring
* Support BFloat
  The `prod_cpu` op is not implemented for bfloat16, which is why we need to convert to a supported float type. Since the sample will probably be quite small, we don't need to specify a more concrete / memory-friendly type.
* Add method to clear cache
* Fix a few tests after prompt was changed
  Also: Add comment that changing the prompt may require adjusting the tests.
* Add docs section about debugging, fix some typos
* Add section prompts for few-shot classification
* Fix a bug in get_examples for few-shot learning
  It was possible that the same sample is returned twice. This should now be fixed. Tests were added. Also added a docstring for random_state, which I forgot to add earlier.
* Add one more debugging tip to the docs
* Document when not to use this new feature
* Fix issue with rendering some docstrings
* Apply some linting
  - long lines
  - useless f-string
* Small amendment to docs about dealing with issues
* Allow prompts not to have all placeholders
  Will only warn, not raise an error. This is because it may sometimes be desired not to fill all the keys (especially the labels may not be needed, depending on the prompt).
* Add Bonus section testing bloomz-1b1 on PIQA
  In addition to the already comprehensive list of examples, it might be a good idea to showcase how to handle tasks that are not simply classification tasks and how to deal with the problems that arise when doing so.

---------

Co-authored-by: Marian Tietz <marian.tietz@ottogroup.com>
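The effect of the base class change described under "Coincidental changes" can be shown with a minimal sketch. `OldStyleError` and `NewStyleError` are hypothetical stand-ins, not actual skorch classes; they only mirror the two base classes named in the commit message.

```python
class OldStyleError(BaseException):
    """Hypothetical error deriving directly from BaseException (old behavior)."""


class NewStyleError(Exception):
    """Hypothetical error deriving from Exception (new behavior)."""


def caught_by_except_exception(error_cls):
    """Return True if `except Exception` catches an instance of error_cls."""
    try:
        try:
            raise error_cls("something went wrong")
        except Exception:
            return True
    except BaseException:
        # Only reached when the inner `except Exception` did not match.
        return False


old_caught = caught_by_except_exception(OldStyleError)
new_caught = caught_by_except_exception(NewStyleError)
```

Since `Exception` is itself a subclass of `BaseException`, code that already used `except BaseException` keeps working after the change.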

* Allow regression with 1d targets

This change makes it possible to pass a 1-dimensional y to `NeuralNetRegressor`.

Problem description

Right now, skorch requires the `y` passed to `NeuralNetRegressor.fit` to be 2-dimensional, even if there is only one target, as is the most common case. This problem has come up a few times in the past, but mostly it's just an annoyance: just do `y.reshape(-1, 1)` and you're good (the error message says as much).

There are, however, also cases where it's not so easy to solve. For instance, in #972, a user reports that they cannot use skorch with sklearn's `BaggingRegressor`. The problem is that even if `y` is reshaped, once it is passed to the net from `BaggingRegressor`, it is 1d again. I assume that `BaggingRegressor` internally squeezes `y` at some point. This PR lifts the 2d restriction check.

Initial motivation

Why does skorch require `y` to be 2d? I couldn't remember the initial reasoning and did some archeology. I found this comment (2f00e25#diff-66ed08bca4d171889565d0285a36b9b47e0e91e3b33d85c51352d8eb00faefac):

> # The problem with 1-dim float y is that the pytorch DataLoader will
> # somehow upcast it to DoubleTensor

This strange behavior should not be an issue anymore, so if that was the only problem, we should be able to just remove the constraint, right?

Problems with removing the constraint

Unfortunately, it's not that easy. The issue comes down to the following: When we remove the constraint and allow the target `y` to be 1d, but the prediction `y_pred` is still 2d, the criterion `nn.MSELoss` will probably do the wrong thing. What exactly is wrong? Instead of calculating the squared error for each sample pair, the criterion will broadcast the vector and calculate _all squared errors_ between each sample, then return the mean of that.
To demonstrate, let's remove the reduction step and look at the shape:

```python
>>> import torch
>>> criterion = torch.nn.MSELoss(reduction='none')
>>> y = torch.rand(100)
>>> y_pred = torch.rand((100, 1))
>>> y.shape, y_pred.shape
(torch.Size([100]), torch.Size([100, 1]))
>>> se = criterion(y_pred, y)
/home/vinh/anaconda3/envs/skorch/lib/python3.10/site-packages/torch/nn/modules/loss.py:536: UserWarning: Using a target size (torch.Size([100])) that is different to the input size (torch.Size([100, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
>>> se.shape
torch.Size([100, 100])
```

As can be seen, PyTorch broadcasts the two tensors, leading to 100x100 errors being calculated. Thankfully, PyTorch warns about potential issues with that.

The current solution is to accept this behavior and hope that the users will indeed see the warning. If they don't see it or ignore it, it could be a huge issue, because they still get a loss scalar and might even see a small improvement in the loss during training. But the model will not converge and it's going to be a huge pain to track down the bug, if it's even identified as such.

Just to be clear, existing code that uses 2d targets will not be affected by the change introduced in this PR, and 2d targets are still the preferred way (IMO) to do regression in skorch.

Rejected solutions

I did consider the following solutions but rejected them.

Raising an error when shapes mismatch

This would remove the risk of users missing the warning. The problem with this is that mismatching shapes can be okay in certain circumstances. Some criteria don't expect target and prediction to have the same shape, so we would need to check based on criterion. Moreover, theoretically, users may indeed want to broadcast. Raising an error would prevent that and users may have to resort to subclassing to circumvent the error.
Automatic reshaping

We could automatically add/remove dimensions if we see that they mismatch. This has the same problems as the previous solution regarding the dependence on the type of criterion. Furthermore, automatic adjustment of the user's output is prone to run into issues in some edge cases (e.g. when the broadcasting is actually desired).

* Fix error when initializing BaggingRegressor

For Python 3.7, CI got: TypeError: __init__() got an unexpected keyword argument 'estimator' for BaggingRegressor. Probably it installs an older version of sklearn, which uses a different argument name. Passing as positional arg should fix it.

* Reviewer comment: typo

Co-authored-by: ottonemo <marian.tietz@ottogroup.com>

* Reviewer comment: typo

Co-authored-by: ottonemo <marian.tietz@ottogroup.com>

---------

Co-authored-by: ottonemo <marian.tietz@ottogroup.com>
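To make the consequence of the shape mismatch concrete, here is a plain-Python sketch that mimics what the commit message describes: a paired mean squared error versus the mean over all n x n broadcast pairs. It deliberately avoids PyTorch and is not the code path `nn.MSELoss` takes; the helper names `mse_paired` and `mse_broadcast` are made up for illustration.

```python
def mse_paired(y_pred, y):
    """Mean of squared errors over matching sample pairs (the intended loss)."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y)) / len(y)


def mse_broadcast(y_pred, y):
    """Mean over all n*n pairs, mimicking (n, 1) predictions against (n,) targets."""
    n = len(y)
    return sum((p - t) ** 2 for p in y_pred for t in y) / (n * n)


y = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.0, 1.0, 2.0, 3.0]  # perfect predictions

paired = mse_paired(y_pred, y)        # 0.0: every prediction matches its target
broadcast = mse_broadcast(y_pred, y)  # 2.5: each prediction is compared to all targets
```

Even with perfect predictions, the broadcast variant reports a non-zero loss, which is why training with mismatched shapes yields a loss scalar that looks plausible but does not measure what was intended.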

Release text:

This release offers a new interface for scikit-learn to do zero-shot and few-shot classification using open source large language models (Jump right into the example notebook).

skorch.llm.ZeroShotClassifier and skorch.llm.FewShotClassifier allow the user to do classification using open-source language models that are compatible with the huggingface generation interface. This allows you to do all sorts of interesting things in your pipelines, from simply plugging an LLM into your classification pipeline to get preliminary results quickly, to using these classifiers to generate training data candidates for downstream models. This is a first draft of the interface, so it is not unlikely that the interface will change a bit in the future. Please let us know about any potential issues you have.

Other items of this release are

- the drop of Python 3.7 support: this version of Python has reached EOL and will not be supported anymore
- the NeptuneLogger now logs the skorch version thanks to AleksanderWWW
- NeuralNetRegressor can now be fitted with 1-dimensional y, which is necessary in some specific circumstances (e.g. in conjunction with sklearn's BaggingRegressor, see sklearn.ensemble.BaggingRegressor() #972); for this to work correctly, the output of the PyTorch module should also be 1-dimensional; the existing default, i.e. having y and y_pred be 2-dimensional, remains the recommended way of using NeuralNetRegressor


Commits on May 17, 2023

Commits on May 22, 2023

Commits on Jun 2, 2023

Commits on Jun 15, 2023

Commits on Jun 19, 2023

Commits on Jun 23, 2023

Commits on Jun 26, 2023
