-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to replace listeners for sourced components #6852
Conversation
@@ -738,6 +738,24 @@ def get_package_path(name: str) -> Path: | |||
return Path(pkg.__file__).parent | |||
|
|||
|
|||
def replace_model_node(model: Model, target: Model, replacement: Model) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably we should have this in Thinc. If we do put a version of this in Thinc I guess we can remove this utility later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There is a spaCy-specific issue here though. The TransformerListener.v1
specifically sets a ref "listener"
on its model to be able to get the actual TransformerListener
layer which is chained into the model. This is used in the get_tok2vec_width
utility function that is needed for shape inference when we combine a transformer with textcat
.
Soooo ... perhaps this method needs a list of "ref" names that shouldn't be transferred but deleted instead. So in this case we'd call it with replace_model_node(model, target, replacement, del_refs=["listener"]
or something such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really convinced that problem needs solving. Sure, we'll have a ref there named listener that returns a non-listener result, due to this quirk of the model's history. But it does return the thing the listener pointed to, so...Trying to clean it up would potentially cause just as many, but different, problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a huge problem and it won't lead to any bugs right now, but I can just feel us coming back to this in 6 months going "oh right, that listener ref is not actually a listener and now this new thing here broke down". I'm OK with leaving this as is though, because my proposed solution definitely isn't clean either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!
It won't be a very easy/straightforward utility to use, with the listener components already being a bit complex/unintuitive. But hopefully we'll be able to guide people through this when their use-case requires it.
Having to define the replace_listeners
list makes things more difficult though, I wish there was a way to avoid it (and still be generic enough). Couldn't we add this information to the internal listener_map
of the Tok2Vec
/ Transformer
component? Because when you're calling find_listeners
, you're actually collecting all the relevant paths. So basically the Tok2Vec
would know which sublayers of Tagger
are listening to it, and then in the config you could just say "make the tagger
component fully independent of the tok2vec
component"?
@@ -738,6 +738,24 @@ def get_package_path(name: str) -> Path: | |||
return Path(pkg.__file__).parent | |||
|
|||
|
|||
def replace_model_node(model: Model, target: Model, replacement: Model) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There is a spaCy-specific issue here though. The TransformerListener.v1
specifically sets a ref "listener"
on its model to be able to get the actual TransformerListener
layer which is chained into the model. This is used in the get_tok2vec_width
utility function that is needed for shape inference when we combine a transformer with textcat
.
Soooo ... perhaps this method needs a list of "ref" names that shouldn't be transferred but deleted instead. So in this case we'd call it with replace_model_node(model, target, replacement, del_refs=["listener"]
or something such.
That's an interesting idea, but I'm not sure it would work? 🤔 (It's all pretty abstract, though, so if you have an idea, feel free to submit a quick PR, might be easier to discuss if there's code 😅) The main reason I ended up adding this list is that we can't know where in the config the listeners are. In theory, they could be anywhere and passed in at any point as an argument. Once an object (model) is created, we can find all listeners, but we're not able to reconstruct where they were defined in the config 😞 |
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Whether listener was removed
…ion/spaCy into feature/replace-listeners
Yea I think you're right 😭 |
Description
Related discussion: #6649
Sourcing components from an existing model and only training some of them is a very common workflow. For example, a user may want to source the tagger and parser from the
en_core_web_sm
pipeline, freeze those two components and only train an NER component on top of it. By default, all component models "listen" to the same token-to-vector component (e.g.Tok2Vec
orTransformer
), so even though the tagger/parser are frozen and not updated, the tok2vec component will be updated together with the NER, which can lead to significantly degraded performance for the frozen component. This is logical, but pretty unintuitive.A solution for this is to replace the tok2vec listeners with standalone tok2vec layers so that components can "own" their tok2vec weights instead of sharing them. This may result in slightly larger package sizes and copied weights, but allows components to be more modular.
This PR adds support for this process via a new
Language.replace_listeners
method that takes care of updating the component model in place and adjusting the config accordingly. In the config, a sourced component can now optionally define a list ofreplace_listeners
, the paths to the listener layers to replace (either none or all of them).Example 1: Freeze the tagger, replace its listener with a copy of the tok2vec layer and train the NER and tok2vec components
Example 2: Freeze the tok2vec and tagger component, replace the NER's tok2vec listener with a copy of the tok2vec layer and train only the NER component
Todo
Types of change
enhancement
Checklist