Currently we assume that what we get as input is also what we will predict as output (just shifted). However, with other research areas in mind, it might make sense to rework this more generally:
model
input: BxIxT
output: BxQxT
where I might match Q but does not have to. In the training step we would then have code like the following:
def training_step(self, batch):
    inputs = batch['x']
    if 't' in batch:
        targets = batch['t']  # allows us to provide alternative targets
    elif I == Q:
        # default: next-step prediction, targets are the inputs shifted by one
        targets = inputs[..., 1:]
        inputs = inputs[..., :-1]
    else:
        raise ValueError(...)
    logits = self.forward(inputs)
    loss = ce(logits, targets)
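For the default (I == Q) branch, here is a quick shape check of how the shift and the loss line up. This is only a sketch assuming PyTorch as the backend and one-hot-style inputs; it relies on F.cross_entropy treating dim 1 as the class dimension and (since PyTorch 1.10) accepting class-probability targets of the same shape as the logits:

import torch
import torch.nn.functional as F

B, I, Q, T = 4, 32, 32, 128          # here I == Q, the shifted-input case
inputs = F.one_hot(torch.randint(I, (B, T)), I).transpose(1, 2).float()  # (B, I, T)

targets = inputs[..., 1:]            # (B, I, T-1), one-hot "probabilities"
model_in = inputs[..., :-1]          # (B, I, T-1), what the model would see
logits = torch.randn(B, Q, T - 1)    # stand-in for self.forward(model_in)

# dim 1 is the class dimension; the trailing time dimension is handled as-is
loss = F.cross_entropy(logits, targets)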
What's more, we need to think about input transforms. Currently, one-hot encoding is hardwired into the model. We might instead consider a differentiable input_transform that is passed to the model at initialization. This would allow us to use differentiable embedding strategies.
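A sketch of what that could look like, assuming a PyTorch model; the names OneHotTransform, EmbeddingTransform, and backbone are made up for illustration. The point is only that the transform is a differentiable module injected at construction instead of one-hot being hardwired:

import torch
import torch.nn as nn
import torch.nn.functional as F

class OneHotTransform(nn.Module):
    """Reproduces the current hardwired behaviour: token ids -> one-hot vectors."""
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes

    def forward(self, x):                       # x: (B, T) integer ids
        return F.one_hot(x, self.num_classes).transpose(1, 2).float()  # (B, I, T)

class EmbeddingTransform(nn.Module):
    """A differentiable alternative: learned embeddings instead of one-hot."""
    def __init__(self, num_classes, dim):
        super().__init__()
        self.embed = nn.Embedding(num_classes, dim)

    def forward(self, x):                       # x: (B, T) integer ids
        return self.embed(x).transpose(1, 2)    # (B, I, T)

class Model(nn.Module):
    def __init__(self, backbone, input_transform):
        super().__init__()
        self.backbone = backbone                # maps (B, I, T) -> (B, Q, T)
        self.input_transform = input_transform  # any differentiable nn.Module

    def forward(self, x):
        return self.backbone(self.input_transform(x))

# usage: Model(backbone, OneHotTransform(vocab_size))
#        Model(backbone, EmbeddingTransform(vocab_size, dim=64))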