<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>HTML5 Simple Gradient Descent</title>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script type="text/javascript"
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<style type="text/css">
<!--
body { background-color:#ededed; font:normal 12px/18px Arial, Helvetica, sans-serif; }
h1 { display:block; width:600px; margin:20px auto; padding-bottom:20px; font:normal 24px/30px Georgia, "Times New Roman", Times, serif; color:#333; text-shadow: 1px 2px 3px #ccc; border-bottom:1px solid #cbcbcb; }
#container { width:600px; margin:0 auto; }
#myCanvas { background:#fff; border:1px solid #cbcbcb; }
#nav { display:block; width:100%; text-align:center; }
#nav li { display:block; font-weight:bold; line-height:21px; text-shadow:1px 1px 1px #fff; width:100px; height:21px; padding:5px; margin:0 10px; background:#e0e0e0; border:1px solid #ccc; -moz-border-radius:4px;-webkit-border-radius:4px; border-radius:4px; float:left; }
#nav li a { color:#000; display:block; text-decoration:none; width:100%; height:100%; }
-->
</style>
<script>
var zoom = 40;
// The "mystery" function m(x,y) from the write-up below.
function f(x,y)
{
return 4*x+2*y+6;
}
// Model parameters for g(x,y)=ax+by+c, updated by gradient descent.
var a=0;
var b=0;
var c=0;
function DrawAxis(context)
{
context.beginPath();
context.strokeStyle="#000000";
context.moveTo((-100*zoom+300),(0*zoom+300));
context.lineTo((100*zoom+300),(0*zoom+300));
context.moveTo((0*zoom+300),(-100*zoom+300));
context.lineTo((0*zoom+300),(100*zoom+300));
context.closePath();
context.stroke();
for(var i=-30;i<30;i++)
{
var x = (i*zoom+300);
var y = (-i*zoom+300);
context.font="15px ti92pluspc";
context.fillText(i,x,300);
context.fillText(i,300,y);
}
}
function DrawLine(context, x1,y1,x2,y2)
{
context.beginPath();
context.moveTo((x1*zoom+300),(-y1*zoom+300));
context.lineTo((x2*zoom+300),(-y2*zoom+300));
context.closePath();
context.stroke();
}
function g(a,b, c, x,y)
{
return a*x+b*y+c;
}
// g(a,b,c,x,y) = ax+by+c; its partial derivatives w.r.t. a, b and c:
function dgda(x,y) { return x; }
function dgdb(x,y) { return y; }
function dgdc(x,y) { return 1; }
function DrawSpace(context)
{
var canvasData = context.getImageData(0, 0, 600, 600);
for(var y=0;y<600;y++)
{
for(var x=0;x<600;x++)
{
var index = (x + y * 600) * 4;
var xx = (x-300)/zoom;
var yy = (300-y)/zoom;
// map the model output to a 0-255 red/green gradient
var v = g(a,b,c, xx,yy)*255;
v = Math.max(0, Math.min(255, v));
canvasData.data[index + 0] = v;
canvasData.data[index + 1] = 255-v;
canvasData.data[index + 2] = 1;
canvasData.data[index + 3] = 128;
}
}
context.putImageData(canvasData, 0, 0);
}
function iterate()
{
context.beginPath();
context.strokeStyle="#ff0000";
var l=.01; // learning rate (the lambda in the write-up below)
for(var i=0;i<5;i++)
{
// sample a random point in [-10,10]^2 where we can query the system
var x = (2*Math.random()-1)*10;
var y = (2*Math.random()-1)*10;
// compute the true value from the mystery function
var t = f(x, y);
// error: e = (g(a,b,c,x,y) - t)^2
var o = g(a,b,c, x,y);
// de/do = 2*(o-t); multiplied by (dg/da, dg/db, dg/dc) it gives the gradient
var de = 2*(o-t);
// if g is already very close to t here, restart from a random guess
if (Math.abs(de)<.0001)
{
a= Math.random()*40-20;
b= Math.random()*40-20;
c= Math.random()*40-20;
break;
}
// step each parameter against its gradient, tracing (a,b) on the canvas
context.moveTo((a*zoom+300),(-b*zoom+300));
a+=-l*de*dgda(x,y);
b+=-l*de*dgdb(x,y);
c+=-l*de*dgdc(x,y);
context.lineTo((a*zoom+300),(-b*zoom+300));
}
context.closePath();
context.stroke();
DrawSpace(context);
DrawAxis(context);
context.strokeStyle="#ff0000";
// draw the target line m(x,y)=0, i.e. y=(-4x-6)/2, between x=-10 and x=10
var x1 = -10, y1 = (40-6)/2;
var x2 = 10, y2 = (-40-6)/2;
DrawLine(context, x1,y1,x2,y2);
document.getElementById("text").innerHTML = "<br>e: " + de + "<br>a: " + a + "<br>b: " + b + "<br>c: " + c + "<br>";
}
function init()
{
var myCanvas = document.getElementById("myCanvas");
context = myCanvas.getContext('2d');
context.clearRect(0,0,600,600);
setInterval(iterate,10);
}
</script>
</head>
<body onload="init()">
<h1>Simple Linear Regression</h1>
<div id="container">
<canvas id="myCanvas" width="600" height="600"></canvas>
<div id="text"></div>
<h2>Intro</h2>
The best way to learn something is to teach it, so here I go!<br><br>
Let's say we have a system with 2 inputs, $x$ and $y$, that outputs a value according to the following mystery formula:
$$m(x,y)=4x+2y+6$$
Let's say we don't know this formula; we can only observe its output for a given $(x,y)$.<br>
<br>
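In code, the mystery system is just a black box we can query (a minimal sketch; in the page's own script above, this same function is called <code>f</code>):
<pre>
// The black-box system: we may call it, but we pretend
// not to know what it computes internally.
function m(x, y) {
  return 4*x + 2*y + 6;
}
console.log(m(4, 3)); // 28
</pre>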
Using 'trial and error', let's try to figure out what that formula looks like. For the sake of simplicity, let's assume we know the formula has the following form:
$$f(x,y)=ax+by+c$$
How would we go about it?
Using the mystery formula we can compute a valid result, for example: $m(4,3)=28$.
Since we are using trial and error, we'd start by choosing some random values for $a$, $b$ and $c$, apply our formula $f(x,y)$, and then see how close we are to the output of $m$.
<br><br>
The 'how close' is measured by computing the distance between the correct result and ours. In this case this distance is given by the squared error:
$$E(a,b,c) = (f(x,y,a,b,c)-m(x,y))^2$$
<br><br>
Notice we don't know the internals of $m$.
<br><br>
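For instance, one 'trial' in code: pick random parameters and measure the squared error at a sample point (a sketch; <code>fGuess</code> is a name introduced here, and <code>m</code> is the black box from the sketch above):
<pre>
// A random starting guess for the parameters.
var a = Math.random()*40 - 20;
var b = Math.random()*40 - 20;
var c = Math.random()*40 - 20;

function fGuess(x, y) { return a*x + b*y + c; }

// Squared error of our guess against the black box at (4,3).
var err = Math.pow(fGuess(4, 3) - m(4, 3), 2);
</pre>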
The question is: how does $E(a,b,c)$ change when we change $a$, $b$ and $c$?
In maths, this is given by the differential:
$$dE(a,b,c) = \frac{\partial{E}}{\partial{a}}da + \frac{\partial{E}}{\partial{b}}db + \frac{\partial{E}}{\partial{c}}dc$$
In our case, using the chain rule, the gradient of the error is:
$$\nabla E(a,b,c) = 2\,(f(x,y,a,b,c)-m(x,y)) \left(\frac{\partial{f}}{\partial{a}}, \frac{\partial{f}}{\partial{b}}, \frac{\partial{f}}{\partial{c}}\right)$$
Since $E$ is a quadratic function, we know it has a minimum. Moreover, the gradient points in the direction of steepest ascent, so its negative points toward this minimum.
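For our linear $f$, the partial derivatives are simply $\frac{\partial{f}}{\partial{a}}=x$, $\frac{\partial{f}}{\partial{b}}=y$ and $\frac{\partial{f}}{\partial{c}}=1$ (exactly the <code>dgda</code>, <code>dgdb</code> and <code>dgdc</code> functions in the script above), so the gradient is cheap to compute. A sketch (<code>gradE</code> is a name introduced here):
<pre>
// Gradient of E at (a,b,c) for one sample point (x,y):
// dE/da = 2*(f-m)*x, dE/db = 2*(f-m)*y, dE/dc = 2*(f-m)*1
function gradE(a, b, c, x, y) {
  var diff = 2 * ((a*x + b*y + c) - m(x, y));
  return [diff*x, diff*y, diff];
}
</pre>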
<br><br>
So the algorithm is: we start at a random position for $a$, $b$ and $c$, and then we move in small steps ($\lambda$) in the direction opposite the gradient:
$$ (a',b',c') = (a,b,c) - \lambda \nabla E(a,b,c)$$
By repeating this over and over we'll end up reaching the minimum. This method is called 'gradient descent'.
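Putting it together, a minimal sketch of the descent loop, reusing the hypothetical <code>gradE</code> from above (the demo's <code>iterate()</code> does the same thing while drawing the trajectory):
<pre>
var lambda = 0.01; // step size
var a = Math.random()*40 - 20;
var b = Math.random()*40 - 20;
var c = Math.random()*40 - 20;

for (var i = 0; i < 10000; i++) {
  // query the black box at a random point
  var x = (2*Math.random()-1)*10;
  var y = (2*Math.random()-1)*10;
  var g = gradE(a, b, c, x, y);
  a -= lambda * g[0];
  b -= lambda * g[1];
  c -= lambda * g[2];
}
console.log(a, b, c); // should approach 4, 2, 6
</pre>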
<br><br>
Oh and BTW, our function $f(x,y,a,b,c)$ can also be expressed, renaming the variables:
$$ f(x_1,x_2,w_1,w_2,b) = x_1w_1 + x_2w_2 + b$$
hence:
$$ f(x_i,w_i) = \sum{x_iw_i} + b$$
and this is starting to shape up as a neuron. All we'd need to add is the sigmoid function to smoothly clamp the outputs into values from 0 to 1:
<br><br>
$$ f(x_i,w_i)=sig\left(\sum{w_ix_i} + b\right)$$
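In code, such a neuron is just a weighted sum pushed through the sigmoid (a sketch; <code>sigmoid</code> and <code>neuron</code> are names introduced here):
<pre>
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}
// A neuron: weighted sum of the inputs plus a bias, squashed by the sigmoid.
function neuron(xs, ws, bias) {
  var sum = bias;
  for (var i = 0; i < xs.length; i++) sum += ws[i]*xs[i];
  return sigmoid(sum);
}
</pre>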
In a neural network, the outputs of neurons are connected to the inputs of other neurons; mathematically this is expressed as:
$$f(g(h(x_i, w_{hi}), w_{gi}), w_{fi})$$
So, once again, the error is computed as before:
$$E = (f(...)-m(x,y))^2$$
and the derivative of the error is computed as before, but now our $a$ variables are called $w_i$, so for a $w_{hi}$ we'd have:
$$\frac{\partial{f(g(h(x_i, w_{hi}), w_{gi}), w_{fi})}}{\partial{w_{hi}}}= \frac{\partial{f}}{\partial{g}}\frac{\partial{g}}{\partial{h}}\frac{\partial{h(x_i, w_{hi})}}{\partial{w_{hi}}}$$
That is, thanks to the chain rule, the derivative of the full network can be expressed as the product of the derivatives of the individual neurons.
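As a tiny numeric illustration of the chain rule at work (a sketch with made-up one-input functions, reusing the <code>sigmoid</code> helper from the previous sketch, not the full network):
<pre>
// Composition f(g(h(w))) with h(w) = w*x for a fixed input x.
var x = 2.0, w = 0.5;
var u = w * x;       // h(w), dh/dw = x
var v = sigmoid(u);  // g(u), dg/du = v*(1-v)
var out = 3 * v;     // f(v), df/dv = 3

// Backward: multiply the local derivatives along the chain.
var dout_dw = 3 * (v * (1 - v)) * x;
</pre>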
<br><br>
And the $w_{hi}$ would be updated as before, yielding:
$$ w'_{hi} = w_{hi} - \lambda \frac{\partial{E}}{\partial{w_{hi}}}$$
Congratulations, you are one step closer to understanding the full monty.
<br>
<br>
<h2>Contact/Questions:</h2>
<my_github_account_username>@gmail.com.
<br>
<br>
</div>
</body>
</html>