Hi, thank you for sharing your code.
I have a question about the implementation of the alpha regularization term:
self.alpha_reg = tf.reduce_mean(self.choice_qs * -tf.log(self.alphas))
When I read the paper, this term looks like a binary cross-entropy loss, but the code only includes the positive half. For example, if choice_qs is [0, 0, 0, 1, 1, 0] and alphas is [0.9999, 0.9999, 0.9999, 0.9999, 0.9999, 0.9999], the model is badly wrong on the negative examples, yet those examples contribute zero loss and therefore zero gradient. Am I misunderstanding something, or is there a special trick in the implementation of this term?
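To make the concern concrete, here is a small plain-Python sketch (numbers taken from the example above; the variable names mirror the TF code, but this is just my illustration, not your implementation) comparing the term as written with a full binary cross-entropy:

```python
import math

choice_qs = [0, 0, 0, 1, 1, 0]
alphas = [0.9999] * 6

# Term as implemented: mean(q * -log(alpha)).
# Negative examples (q == 0) contribute exactly 0,
# so alpha ~ 1 on them is never penalized and gives no gradient.
alpha_reg = sum(q * -math.log(a)
                for q, a in zip(choice_qs, alphas)) / len(alphas)

# Full binary cross-entropy would add the negative half:
# mean(q * -log(alpha) + (1 - q) * -log(1 - alpha)),
# which strongly penalizes alpha ~ 1 on negative examples.
bce = sum(q * -math.log(a) + (1 - q) * -math.log(1 - a)
          for q, a in zip(choice_qs, alphas)) / len(alphas)

print(alpha_reg)  # tiny: only the two positives contribute
print(bce)        # large: dominated by the four negatives
```

So with the implemented term the loss is nearly zero here, while a full BCE would be large — which is why I am unsure whether dropping the negative half is intentional.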