I am implementing a discrete Bayesian network using a few `Nonlinear` functions and cannot get the `messagePassingAlgorithm()` method to run without error. The resulting graph has all its edges terminated either by a `Clamp` or by a `placeholder`. Here is the code, which is rather simple:
```julia
using ForneyLab
using Plots

gA = 3
gB = 5
gC = 8

# Three deterministic functions
function grade(i, d)
    if i == 0 && d == 0
        g1, g2, g3 = [0.3, 0.4, 0.3]
    elseif i == 0 && d == 1
        g1, g2, g3 = [0.05, 0.25, 0.7]
    elseif i == 1 && d == 0
        g1, g2, g3 = [0.9, 0.08, 0.02]
    else
        g1, g2, g3 = [0.5, 0.3, 0.2]
    end
    return [g1, g2, g3]
end

function letter(g)
    if g == gA
        l = 0.9
    elseif g == gB
        l = 0.6
    else
        l = 0.01
    end
    return l
end

function score(i)
    if i == 0
        s = 0.05
    else
        s = 0.8
    end
    return s
end

# The model
gr = FactorGraph()

# @RV b1 ~ Clamp(0.4)
# @RV b2 ~ Clamp(0.3)
@RV d ~ Bernoulli(0.4)
@RV i ~ Bernoulli(0.3)
@RV grade_ ~ Nonlinear{Sampling}(i, d, g=grade, n_samples=50)
@RV g ~ Categorical(grade_)
@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50)
@RV l ~ Bernoulli(letter_)
@RV score_ ~ Nonlinear{Sampling}(i, g=score, n_samples=50)
@RV s ~ Bernoulli(score_)

# placeholder(d, :d)
placeholder(i, :i)
placeholder(s, :s)
placeholder(l, :l)
placeholder(g, :g)

# The line that produces the error:
algo = messagePassingAlgorithm([d]) # <<<<< ERROR
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code));
```
The error produced is:
```
No applicable SumProductRule{Categorical} update for Categorical node with inbound types: Message{SampleList}, Nothing
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] inferUpdateRule!(entry::ScheduleEntry, rule_type::Type{SumProductRule{Categorical}}, inferred_outbound_types::Dict{Interface, Type})
    @ ForneyLab ~/src/2022/ForneyLab.jl/src/algorithms/sum_product.jl:75
  [3] inferUpdateRules!(schedule::Vector{ScheduleEntry}; inferred_outbound_types::Dict{Interface, Type})
    @ ForneyLab ~/src/2022/ForneyLab.jl/src/message_passing.jl:203
  [4] messagePassingSchedule(pf::PosteriorFactor)
    @ ForneyLab ~/src/2022/ForneyLab.jl/src/algorithms/posterior_factor.jl:75
  [5] messagePassingAlgorithm(target_variables::Vector{Variable}, pfz::PosteriorFactorization; ep_sites::Vector{Tuple}, id::Symbol, free_energy::Bool)
    @ ForneyLab ~/src/2022/ForneyLab.jl/src/algorithms/inference_algorithm.jl:82
  [6] messagePassingAlgorithm (repeats 2 times)
    @ ~/src/2022/ForneyLab.jl/src/algorithms/inference_algorithm.jl:58 [inlined]
  [7] top-level scope
    @ In[31]:1
  [8] eval
    @ ./boot.jl:373 [inlined]
  [9] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196
```
Clearly, this means that there is no sum-product rule to handle a Categorical node with a `SampleList` message input. So my question is: how is this case treated? Would a straightforward variational technique be appropriate? Thanks.
albertpod commented on May 29, 2022

Hi @erlebach!

I am not sure I understand your model. Can you provide the equations for your generative model? Or can you try to explain what you are trying to model with this graph? It seems strange to me that you have a shared placeholder for the output of a `Bernoulli` distribution and the input of a `Nonlinear` node (e.g. `placeholder(i, :i)`).

First of all, there is indeed no closed-form solution (a.k.a. sum-product message) in `ForneyLab.jl` for these incoming messages, i.e. `Message{SampleList}, Nothing`.

To circumvent the problem, you need to introduce a recognition distribution (a "variational technique") around the Categorical node. In the code below I've added one more line before algorithm construction:

However, this code won't work either, because one of your recognition factors, `g`, is clamped (connected to `placeholder(g, :g)`).

What you can do is remove `placeholder(g, :g)` to build the algorithm. THIS WORKS!

However, you might encounter other issues later. Hence, I would first suggest we discuss your model and then think about the inference.
erlebach commented on May 29, 2022

Hi @albertpod:

Thank you for your investigation and suggestion, which I will try out. Let me provide some context. This problem was proposed as a class demonstration of probabilistic graphical models. The objective is to understand the relationships between intelligence (I), SAT score (S), problem difficulty (D), test grade (G), and whether the student receives a letter of recommendation (L). Here is the proposed model:

P(I) and P(D) are modeled as Bernoulli distributions. P(G|D,I) is modeled as a categorical distribution (we assume three different grades). The categorical probabilities depend on the inputs (D,I), so there are four different categorical triplets (whose elements sum to 1). P(L|G) and P(S|I) are Bernoulli distributions whose probability parameter depends on their conditioning variables, so each has two probabilities. These last three sets of probabilities are captured through the functions `letter`, `score`, and `grade`. The former two can be interpreted as mixtures of Bernoullis, and `grade` is a mixture of categorical distributions (I believe the `TransitionMixture` factor could be of help, but I have not looked into it).

I was hoping initially to be able to compute marginals on any unobserved variable without appealing to variational inference. I wished to experiment with choosing different subsets of observed variables and examine their effect. Furthermore, I want to be able to specify which variables are observed via a dictionary, passed to some function, and have the appropriate inference executed. Note that I am only providing a single observation at this stage, and the distributions of the nodes are fully specified, so an analytical solution is possible. By analytical, I mean the following: the marginal of I is a Bernoulli with parameter p = 0.3. If I specify that s = 1, for example, then I becomes a Bernoulli with a different (increased) parameter. This parameter can be computed exactly. I can also perform a Monte Carlo simulation to compute it, which is what I did in PyMC. I thought that ForneyLab would be able to apply expectation propagation to do this, since it can be done manually (with pencil and paper). I do realize that if I place a prior on the probability of intelligence, then its posterior can be evaluated via variational inference, and clarity ensues. But that was not my initial intent.

Note that I might want to observe `g`. What does one do in that case? In reality, we can consider the current model to have a plate with N measurements, and I am considering the case of N = 1, i.e., the simplest case. That should be possible, no?

I have implemented the model successfully in PyMC, if that is of interest to you. These implementations serve as exercises to develop expertise in a variety of frameworks in Julia and Python.
For reference, I have never heard the term "recognition distribution" before.

erlebach commented on May 29, 2022
How about we move this to the discussions?
albertpod commented on May 30, 2022

Sorry, I noticed the new topic on Discussions before I read your answer, @erlebach.

As for the recognition distribution `q`: this term is also referred to as an instrumental distribution. It arises when we approximate our posterior distribution `p` with some reference probability distribution `q` (the recognition distribution). In this case we try to minimize the KL divergence between `p` and `q`, since the log evidence cannot be computed.

We had a short discussion with @ThijsvdLaar about your issue. It seems like you can be just fine with `TransitionMixture` for this problem. @ThijsvdLaar will get back to you regarding this.

I understand your wish to observe `g`, but then (to my understanding) your graph will change as well, i.e., you won't need P(G|D, I).

In the meantime, I will try to come up with an inference solution for your model using a `Nonlinear` node.

erlebach commented on May 30, 2022
Thank you! Quick question on something I never understood. If a variable is observed, its probability distribution is quite irrelevant, is that not true? However, when estimating a likelihood, its probability distribution still impacts the posterior through the remaining non-observed variables. If true, P(G|D, I) is still needed. In any case, ideally, I would like, if possible, the flexibility of observing or not observing any subset of random variables without having to write different methods for each combination of observables. Couldn't an observation be implemented in a general manner through multiplication by a delta function? This is possible with MCMC, but perhaps not with message passing.
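For discrete models, the delta-function idea in this question can indeed be phrased generically: multiply a tabular joint by an indicator on the observed values and renormalize. A minimal Python sketch (the helper `condition` is hypothetical, not a ForneyLab API), reusing the I/S numbers from this thread:

```python
def condition(joint, evidence):
    """Multiply a tabular joint {assignment_tuple: prob} by an indicator
    (a discrete delta) on the observed tuple positions, then renormalize."""
    kept = {a: p for a, p in joint.items()
            if all(a[k] == v for k, v in evidence.items())}
    z = sum(kept.values())
    return {a: p / z for a, p in kept.items()}

# Toy joint over (i, s): P(i) * P(s | i), with P(S=1 | I) from the score() function
p_i = {0: 0.7, 1: 0.3}
p_s1 = {0: 0.05, 1: 0.8}
joint = {(i, s): p_i[i] * (p_s1[i] if s == 1 else 1 - p_s1[i])
         for i in (0, 1) for s in (0, 1)}

post = condition(joint, {1: 1})  # observe s = 1 (position 1 of the tuple)
print(post[(1, 1)])              # P(I=1 | S=1), about 0.8727
```

Because the observed set is just a dictionary argument, any subset of variables can be clamped without writing a separate method per combination, which is exactly the flexibility asked about; message-passing frameworks achieve the same effect structurally, by terminating edges with clamps.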
albertpod commented on May 30, 2022

I am not sure I've understood your question. What do you mean by an irrelevant distribution? (Let's continue this part in the Discussion; it is always helpful to draw a graph and inspect how the messages flow depending on your graph structure.)

As for the solution with a `Nonlinear` node: it seems we have found a bug that doesn't allow us to proceed with the mix of Bernoulli and Categorical distributions. However, a Bernoulli can always be converted to a Categorical distribution, e.g. `Bernoulli(0.3) <-> Categorical([0.3, 0.7])`.
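The Bernoulli-to-Categorical conversion is easy to sanity-check outside ForneyLab. A short Python sketch (the success/category-index convention below is an assumption chosen to match the `Bernoulli(0.3) <-> Categorical([0.3, 0.7])` mapping in this comment):

```python
def bernoulli_pmf(p, x):
    """P(X = x) for X ~ Bernoulli(p), with x in {0, 1} and x = 1 as 'success'."""
    return p if x == 1 else 1 - p

def categorical_pmf(probs, k):
    """P(X = k) for X ~ Categorical(probs), k a 0-based category index."""
    return probs[k]

# Bernoulli(0.3) <-> Categorical([0.3, 0.7]), with category 0 playing
# the role of 'success' as in the mapping quoted above
p = 0.3
assert bernoulli_pmf(p, 1) == categorical_pmf([p, 1 - p], 0)
assert bernoulli_pmf(p, 0) == categorical_pmf([p, 1 - p], 1)
```

The two parameterizations carry identical information; only the event labels change, which is why the workaround loses nothing.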
Given that, I rewrote your model in the following way:

As a side note, at the moment our lab focuses on `ReactiveMP.jl`. Only three people can tackle `ForneyLab` issues (two of those are writing theses, and one will be unavailable for a few days). As a consequence, our replies will be slow, but they will come sooner or later :) Thanks for trying FL in this setting; it helps to improve the framework.

erlebach commented on May 30, 2022
Thanks! One question: you wrote:

```julia
@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50, dims=[(2,),(2,),(2,)])
```

Shouldn't this be written as

```julia
@RV letter_ ~ Nonlinear{Sampling}(g, g=letter, n_samples=50, dims=[(2,),(2,)])
```

I am surprised this would have worked for you.
albertpod commented on May 30, 2022

@erlebach Ah, right, I made a typo; your variant is correct. FL just ignored the last dimensionality specification. We should throw an error or warning when a user does this.
erlebach commented on May 30, 2022

Great. I ran the code, and it does not work. But I have a question. You wrote:

Why is that? When sampled, a Categorical returns a single number: (0, 1) or (1, 2)? Why are you specifying a vector of length 2? Or are you specifying two observed values? I just want to make sure, since in my original problem I only assigned one observed value.

When executing `stepGRADE`, I get the error:

Looking more closely, the error is in ProgressMeter. I removed ProgressMeter and still get the same error. More specifically:
albertpod commented on May 30, 2022

I have a feeling that you haven't specified the dimensionality of your placeholders as I did, i.e. `placeholder(s, :s, dims=(2,))`. Are you sure you are using exactly the same code?

As for the data: the Categorical node in FL defines a one-dimensional probability distribution over the normal basis vectors of dimension `d`. The output is one-hot encoded, meaning that for 1 we use (1,0), for 2, (0,1), etc.

erlebach commented on May 30, 2022
You are correct, @albertpod; I had not specified the dimensionality of the placeholders. I did not copy/paste your code, but instead modified mine; that is how I learn faster.

I did not realize that a Categorical was one-hot encoded. Makes sense (different from PyMC).
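The one-hot convention described above is worth pinning down, since it explains why the placeholders need `dims=(2,)`. A small illustrative Python helper (not FL's internal representation):

```python
def one_hot(k, d):
    """Encode category k (1-based, as in this thread) as a length-d 0/1 tuple."""
    return tuple(1 if j == k - 1 else 0 for j in range(d))

assert one_hot(1, 2) == (1, 0)     # category 1 -> (1, 0)
assert one_hot(2, 2) == (0, 1)     # category 2 -> (0, 1)
assert one_hot(2, 3) == (0, 1, 0)  # three grades would use length-3 vectors
```

So a single binary observation is fed to FL as a length-2 vector, not a scalar, and a single observed grade in the three-category case would be a length-3 vector.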
The code now runs. Whether it runs correctly is another matter. I am surprised that 100 iterations of a variational method run slower than PyMC running 5,000 iterations of a Monte Carlo sampler. Of course, the variational approach provides a parametrized distribution, and I have not been careful to make precise measurements. I guess the answer is that I am using the nonlinear sampler three times in my model. :-)
Why do you use `unsafeMean` rather than `mean()`? I cannot find any documentation on `unsafeMean`. What does it do?

albertpod commented on May 30, 2022
I understand, no worries.
Well, actually you also use sampling. ForneyLab knows how to combine different message-passing techniques; here you have used hybrid message passing, i.e. SP, VMP, and sampling-based (around the nonlinear nodes). It's hard to compare iterations from one package to another, but I am also surprised by your observation.

I would also be suspicious about the correctness of the result. Unfortunately, for this model we cannot inspect the free energy (our minimization target), so we don't know when the algorithm converges: within 1, 2, 3, or 100 iterations. Maybe you can check how your marginals change over iterations.
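One way to operationalize "check how your marginals change over iterations" is to record the marginal of interest after each iteration and stop when successive estimates barely move. A hedged Python sketch with made-up trace values (the real trace would come from your inference loop):

```python
def tv_distance(p, q):
    """Total variation distance between two discrete marginals."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical marginals of I recorded after successive iterations
trace = [[0.5, 0.5], [0.8, 0.2], [0.87, 0.13], [0.8727, 0.1273]]
deltas = [tv_distance(a, b) for a, b in zip(trace, trace[1:])]
converged = deltas[-1] < 1e-2  # successive marginals have nearly stopped moving
```

This is only a proxy: stalled marginals do not prove convergence to the right answer, which is why inspecting the free energy (when available) is preferable.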
We still need to check a few things on our side. Perhaps, we'll come up with a better solution.
As for `unsafeMean`, that's just a habit (old-school FL); you can safely use `mean` now.