<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>HTML5 Simple Gradient Descent</title>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script type="text/javascript"
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<style type="text/css">
<!--
body { background-color:#ededed; font:normal 12px/18px Arial, Helvetica, sans-serif; }
h1 { display:block; width:600px; margin:20px auto; padding-bottom:20px; font:normal 24px/30px Georgia, "Times New Roman", Times, serif; color:#333; text-shadow: 1px 2px 3px #ccc; border-bottom:1px solid #cbcbcb; }
#container { width:600px; margin:0 auto; }
#myCanvas { background:#fff; border:1px solid #cbcbcb; }
#nav { display:block; width:100%; text-align:center; }
#nav li { display:block; font-weight:bold; line-height:21px; text-shadow:1px 1px 1px #fff; width:100px; height:21px; padding:5px; margin:0 10px; background:#e0e0e0; border:1px solid #ccc; -moz-border-radius:4px;-webkit-border-radius:4px; border-radius:4px; float:left; }
#nav li a { color:#000; display:block; text-decoration:none; width:100%; height:100%; }
-->
</style>
<script>
var zoom = 40;
// The "mystery" function m(x,y) from the write-up below.
function f(x,y)
{
return 4*x+2*y+6;
}
// Model parameters for g(x,y)=ax+by+c, updated by gradient descent.
var a=0;
var b=0;
var c=0;
function DrawAxis(context)
{
context.beginPath();
context.strokeStyle="#000000";
context.moveTo((-100*zoom+300),(0*zoom+300));
context.lineTo((100*zoom+300),(0*zoom+300));
context.moveTo((0*zoom+300),(-100*zoom+300));
context.lineTo((0*zoom+300),(100*zoom+300));
context.closePath();
context.stroke();
for(var i=-30;i<30;i++)
{
var x = (i*zoom+300);
var y = (-i*zoom+300);
context.font="15px ti92pluspc";
context.fillText(i,x,300);
context.fillText(i,300,y);
}
}
function DrawLine(context, x1,y1,x2,y2)
{
context.beginPath();
context.moveTo((x1*zoom+300),(-y1*zoom+300));
context.lineTo((x2*zoom+300),(-y2*zoom+300));
context.closePath();
context.stroke();
}
function g(a,b, c, x,y)
{
return a*x+b*y+c;
}
// g(a,b,c,x,y) = ax+by+c; its partial derivatives w.r.t. a, b and c:
function dgda(x,y) { return x; }
function dgdb(x,y) { return y; }
function dgdc(x,y) { return 1; }
function DrawSpace(context)
{
var canvasData = context.getImageData(0, 0, 600, 600);
for(var y=0;y<600;y++)
{
for(var x=0;x<600;x++)
{
var index = (x + y * 600) * 4;
var xx = (x-300)/zoom;
var yy = (300-y)/zoom;
// map the model output to a 0-255 red/green gradient
var v = g(a,b,c, xx,yy)*255;
v = Math.max(0, Math.min(255, v));
canvasData.data[index + 0] = v;
canvasData.data[index + 1] = 255-v;
canvasData.data[index + 2] = 1;
canvasData.data[index + 3] = 128;
}
}
context.putImageData(canvasData, 0, 0);
}
function iterate()
{
context.beginPath();
context.strokeStyle="#ff0000";
var l=.01; // learning rate (the lambda in the write-up below)
for(var i=0;i<5;i++)
{
// sample a random point in [-10,10]^2 where we can query the system
var x = (2*Math.random()-1)*10;
var y = (2*Math.random()-1)*10;
// compute the true value from the mystery function
var t = f(x, y);
// error: e = (g(a,b,c,x,y) - t)^2
var o = g(a,b,c, x,y);
// de/do = 2*(o-t); multiplied by (dg/da, dg/db, dg/dc) it gives the gradient
var de = 2*(o-t);
// if g is already very close to t here, restart from a random guess
if (Math.abs(de)<.0001)
{
a= Math.random()*40-20;
b= Math.random()*40-20;
c= Math.random()*40-20;
break;
}
// step each parameter against its gradient, tracing (a,b) on the canvas
context.moveTo((a*zoom+300),(-b*zoom+300));
a+=-l*de*dgda(x,y);
b+=-l*de*dgdb(x,y);
c+=-l*de*dgdc(x,y);
context.lineTo((a*zoom+300),(-b*zoom+300));
}
context.closePath();
context.stroke();
DrawSpace(context);
DrawAxis(context);
context.strokeStyle="#ff0000";
// draw the target line m(x,y)=0, i.e. y=(-4x-6)/2, between x=-10 and x=10
var x1 = -10, y1 = (40-6)/2;
var x2 = 10, y2 = (-40-6)/2;
DrawLine(context, x1,y1,x2,y2);
document.getElementById("text").innerHTML = "<br>e: " + de + "<br>a: " + a + "<br>b: " + b + "<br>c: " + c + "<br>";
}
function init()
{
var myCanvas = document.getElementById("myCanvas");
context = myCanvas.getContext('2d');
context.clearRect(0,0,600,600);
setInterval(iterate,10);
}
</script>
</head>
<body onload="init()">
<h1>Simple Linear Regression</h1>
<div id="container">
<canvas id="myCanvas" width="600" height="600"></canvas>
<div id="text"></div>
<h2>Intro</h2>
The best way to learn something is to teach it, so here I go!<br><br>
Let's say we have a system with 2 inputs, $x$ and $y$, that outputs a value according to the following mystery formula:
$$m(x,y)=4x+2y+6$$
Let's say we don't know this formula; we can only observe its output for a given $(x,y)$.<br>
<br>
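In code, the mystery system is just a black box we can query (a minimal sketch; in the page's own script above, this same function is called <code>f</code>):
<pre>
// The black-box system: we may call it, but we pretend
// not to know what it computes internally.
function m(x, y) {
  return 4*x + 2*y + 6;
}
console.log(m(4, 3)); // 28
</pre>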
Using 'trial and error', let's try to figure out what that formula looks like. For the sake of simplicity, let's assume we know the formula has the following form:
$$f(x,y)=ax+by+c$$
How would we go about it?
Using the mystery formula we can compute a valid result, for example: $m(4,3)=28$.
Since we are using trial and error, we'd start by choosing some random values for $a$, $b$ and $c$, apply our formula $f(x,y)$, and then see how close we are to the output of $m$.
<br><br>
The 'how close' is measured by computing the distance between the correct result and ours. In this case this distance is given by the squared error:
$$E(a,b,c) = (f(x,y,a,b,c)-m(x,y))^2$$
<br><br>
Notice we don't know the internals of $m$.
<br><br>
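For instance, one 'trial' in code: pick random parameters and measure the squared error at a sample point (a sketch; <code>fGuess</code> is a name introduced here, and <code>m</code> is the black box from the sketch above):
<pre>
// A random starting guess for the parameters.
var a = Math.random()*40 - 20;
var b = Math.random()*40 - 20;
var c = Math.random()*40 - 20;

function fGuess(x, y) { return a*x + b*y + c; }

// Squared error of our guess against the black box at (4,3).
var err = Math.pow(fGuess(4, 3) - m(4, 3), 2);
</pre>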
The question is: how does $E(a,b,c)$ change when we change $a$, $b$ and $c$?
In maths, this is given by the differential:
$$dE(a,b,c) = \frac{\partial{E}}{\partial{a}}da + \frac{\partial{E}}{\partial{b}}db + \frac{\partial{E}}{\partial{c}}dc$$
In our case, using the chain rule, the gradient of the error is:
$$\nabla E(a,b,c) = 2\,(f(x,y,a,b,c)-m(x,y)) \left(\frac{\partial{f}}{\partial{a}}, \frac{\partial{f}}{\partial{b}}, \frac{\partial{f}}{\partial{c}}\right)$$
Since $E$ is a quadratic function, we know it has a minimum. Moreover, the gradient points in the direction of steepest ascent, so its negative points toward this minimum.
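For our linear $f$, the partial derivatives are simply $\frac{\partial{f}}{\partial{a}}=x$, $\frac{\partial{f}}{\partial{b}}=y$ and $\frac{\partial{f}}{\partial{c}}=1$ (exactly the <code>dgda</code>, <code>dgdb</code> and <code>dgdc</code> functions in the script above), so the gradient is cheap to compute. A sketch (<code>gradE</code> is a name introduced here):
<pre>
// Gradient of E at (a,b,c) for one sample point (x,y):
// dE/da = 2*(f-m)*x, dE/db = 2*(f-m)*y, dE/dc = 2*(f-m)*1
function gradE(a, b, c, x, y) {
  var diff = 2 * ((a*x + b*y + c) - m(x, y));
  return [diff*x, diff*y, diff];
}
</pre>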
<br><br>
So the algorithm is: we start at a random position for $a$, $b$ and $c$, and then we move in small steps ($\lambda$) in the direction opposite the gradient:
$$ (a',b',c') = (a,b,c) - \lambda \nabla E(a,b,c)$$
By repeating this over and over we'll end up reaching the minimum. This method is called 'gradient descent'.
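Putting it together, a minimal sketch of the descent loop, reusing the hypothetical <code>gradE</code> from above (the demo's <code>iterate()</code> does the same thing while drawing the trajectory):
<pre>
var lambda = 0.01; // step size
var a = Math.random()*40 - 20;
var b = Math.random()*40 - 20;
var c = Math.random()*40 - 20;

for (var i = 0; i < 10000; i++) {
  // query the black box at a random point
  var x = (2*Math.random()-1)*10;
  var y = (2*Math.random()-1)*10;
  var g = gradE(a, b, c, x, y);
  a -= lambda * g[0];
  b -= lambda * g[1];
  c -= lambda * g[2];
}
console.log(a, b, c); // should approach 4, 2, 6
</pre>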
<br><br>
Oh and BTW, our function $f(x,y,a,b,c)$ can also be expressed, renaming the variables:
$$ f(x_1,x_2,w_1,w_2,b) = x_1w_1 + x_2w_2 + b$$
hence:
$$ f(x_i,w_i) = \sum{x_iw_i} + b$$
and this is starting to shape up as a neuron. All we'd need to add is the sigmoid function to smoothly clamp the outputs into values from 0 to 1:
<br><br>
$$ f(x_i,w_i)=sig\left(\sum{w_ix_i} + b\right)$$
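In code, such a neuron is just a weighted sum pushed through the sigmoid (a sketch; <code>sigmoid</code> and <code>neuron</code> are names introduced here):
<pre>
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}
// A neuron: weighted sum of the inputs plus a bias, squashed by the sigmoid.
function neuron(xs, ws, bias) {
  var sum = bias;
  for (var i = 0; i < xs.length; i++) sum += ws[i]*xs[i];
  return sigmoid(sum);
}
</pre>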
In a neural network, the outputs of neurons are connected to the inputs of other neurons; mathematically this is expressed as:
$$f(g(h(x_i, w_{hi}), w_{gi}), w_{fi})$$
So, once again, the error is computed as before:
$$E = (f(...)-m(x,y))^2$$
and the derivative of the error is computed as before, but now our $a$ variables are called $w_i$, so for a $w_{hi}$ we'd have:
$$\frac{\partial{f(g(h(x_i, w_{hi}), w_{gi}), w_{fi})}}{\partial{w_{hi}}}= \frac{\partial{f}}{\partial{g}}\frac{\partial{g}}{\partial{h}}\frac{\partial{h(x_i, w_{hi})}}{\partial{w_{hi}}}$$
That is, thanks to the chain rule, the derivative of the full network can be expressed as the product of the derivatives of the individual neurons.
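As a tiny numeric illustration of the chain rule at work (a sketch with made-up one-input functions, reusing the <code>sigmoid</code> helper from the previous sketch, not the full network):
<pre>
// Composition f(g(h(w))) with h(w) = w*x for a fixed input x.
var x = 2.0, w = 0.5;
var u = w * x;       // h(w), dh/dw = x
var v = sigmoid(u);  // g(u), dg/du = v*(1-v)
var out = 3 * v;     // f(v), df/dv = 3

// Backward: multiply the local derivatives along the chain.
var dout_dw = 3 * (v * (1 - v)) * x;
</pre>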
<br><br>
And the $w_{hi}$ would be updated as before, yielding:
$$ w'_{hi} = w_{hi} - \lambda \frac{\partial{E}}{\partial{w_{hi}}}$$
Congratulations, you are one step closer to understanding the full monty.
<br>
<br>
<h2>Contact/Questions:</h2>
<my_github_account_username>@gmail.com.
<br>
<br>
</div>
</body>
</html>