Questions about the loss used for optimizing the proxy model

@sangmichaelxie It seems that the loss used for optimizing the proxy model in the code is different from the one described in the paper.
```py
loss = (pertoken_loss * curr_domain_weights.detach()).sum() / normalizer
```

The code optimizes the proxy model directly on its own loss, weighted by the detached domain weights. In the paper, however, the objective appears to be the minimax loss, which uses the excess loss. Which one should I follow? Or is there something wrong with my understanding? Thanks.

<img width="807" alt="image" src="https://github.com/sangmichaelxie/doremi/assets/33905626/83158ca9-eef3-4982-8219-a0c7f8181051">
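To make the comparison concrete, here is a toy sketch of the two objectives. It is not the repository's code: the scalar parameter `theta`, the quadratic `pertoken_loss`, and all numbers are made up for illustration, and it assumes the reference loss is a constant with respect to the proxy parameters. Under that assumption, subtracting the reference loss changes the loss value but not the gradient for the proxy update, which may be relevant to the difference noted above.

```py
# Toy illustration only: a scalar "proxy model" theta and made-up per-domain numbers.

def pertoken_loss(theta, targets):
    # Hypothetical stand-in for the proxy model's per-domain loss.
    return [(theta - t) ** 2 for t in targets]

def weighted_loss(theta, targets, weights, normalizer):
    # Same form as the snippet above: domain weights are treated as constants.
    losses = pertoken_loss(theta, targets)
    return sum(w * l for w, l in zip(weights, losses)) / normalizer

def weighted_excess_loss(theta, targets, ref_losses, weights, normalizer):
    # Excess-loss form: proxy loss minus a reference loss that is
    # assumed not to depend on theta.
    losses = pertoken_loss(theta, targets)
    return sum(w * (l - r) for w, l, r in zip(weights, losses, ref_losses)) / normalizer

def grad(f, theta, eps=1e-6):
    # Central finite difference with respect to theta.
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

targets = [0.5, -1.0, 2.0]      # made-up per-domain targets
ref_losses = [0.3, 0.1, 0.7]    # made-up reference losses (constants)
weights = [0.2, 0.5, 0.3]       # made-up domain weights
normalizer = 1.0
theta = 0.25

g_plain = grad(lambda t: weighted_loss(t, targets, weights, normalizer), theta)
g_excess = grad(
    lambda t: weighted_excess_loss(t, targets, ref_losses, weights, normalizer), theta
)

# The loss values differ by a constant, but the gradients agree.
print(abs(g_plain - g_excess) < 1e-6)
```

This only covers the proxy-model update with the domain weights held fixed; it says nothing about how the domain weights themselves are updated, where the excess loss would still matter.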

