forked from erlyaws/yaws
-
Notifications
You must be signed in to change notification settings - Fork 0
/
internals.yaws
319 lines (253 loc) · 11.5 KB
/
internals.yaws
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
<erl>
out(A) ->
{ssi, "TAB.inc", "%%",[{"internals", "choosen"}]}.
</erl>
<div id="entry">
<h1>Internals</h1>
<h2>Introduction</h2>
<p>I'll try to describe some of the internal workings of Yaws in this page.
The page is thus mostly interesting for people interested in either hacking Yaws
or simply wanting to get a better understanding.
</p>
<p>I'll describe how Yaws pages get compiled, the process structure
and other things which can make it easier to understand the code. This page
is ment to be read by programmers that wish to either work on Yaws or
just get a better understanding.
</p>
<h2> JIT Compiling a .yaws page</h2>
<p>
When the client GETs a a page that has a .yaws suffix. The Yaws server
will read that page from the hard disk and divide it in parts
that consist of HTML code and Erlang code. Each chunk of Erlang code
will be compiled into a module. The chunk of Erlang code must contain
a function <tt>out/1</tt> If it doesn't the Yaws server will insert a
proper error message into the generated HTML output.
</p>
<p>When the Yaws server ships a .yaws page it will process it chunk by chunk
through the .yaws file. If it is HTML code, the server will ship that
as is, whereas if it is Erlang code, the Yaws server will invoke the
<tt>out/1</tt> function in that code and insert the output of that <tt>out/1</tt>
function into the stream
of HTML that is being shipped to the client.
</p>
<p>Yaws will cache the result of the compilation
and the next time a client requests the same .yaws page Yaws will
be able to invoke the already compiled modules directly.
</p>
<p>This is best explained by an example:</p>
<p>Say that a file consists of 400 bytes, we have "foo.yaws"
and it looks like:</p>
<p>
<img src="compile_layout.png" />
</p>
<p>When a client request the file "foo.yaws", the webserver will
look in its cache for the file, (more on that later). For the sake of
argument, we assume the file is not in the cache.
</p>
<p>The file will be processes by the code in <tt>yaws_compile.erl</tt>
and the result will be a structure that looks like:</p>
<div class="box">
<verbatim>
[CodeSpec]
CodeSpec = Data | Code | Error
Data = {data, NumChars}
Code = {mod, LineNo, YawsFile, NumSkipChars, Mod, Func}
Err = {error, NumSkipChars, E}
</verbatim>
</div>
<p>In the particular case of our "foo.yaws" file above, the JIT
compiler will return:
</p>
<div class="box">
<verbatim>
[{mod, 1, "/foo.yaws", 100, m1, out},
{data, 200},
{mod, 30, "/foo.yaws", 100, m2, out}
]
</verbatim>
</div>
<p>
This structure gets stored in the cache and will continue
to be associated to the file "foo.yaws".
</p>
<p>When the server "ships" a .yaws page, it needs the <tt>CodeSpec</tt>
structure to do it. If the structure is not in the cache, the page
gets JIT compiled and inserted into the cache.
</p>
<p>To ship the above <tt>CodeSpec</tt> structure, the server
performs the following steps:</p>
<ol>
<li>Create the Arg structure which is a #arg{} record, this
structure is wellknown to all yaws programmers since it's the
main mechanism to pass data from the server to the .yaws
page.</li>
<li>Item (1) Invoke <tt>m1:out(Arg)</tt></li>
<li>Look at the return value from <tt>m1:out(Arg)</tt> and
perform whatever is requested. This typically involves generating
some dynamic ehtml code, generate headers or whatever.
</li>
<li>Finally jump ahead 100 bytes in the file as a result of
processing the first <tt>CodeSpec</tt> item.</li>
<li>Item (2) Next <tt>CodeSpec</tt> is just plain data from the file,
thus we read 200 bytes from the file (or rather from the cache
since the data will be there) and ship to the client.</li>
<li>Item (3) Yet another {mod structure which is handled
the same way as Item (1) above except that the erlang module
is <tt>m2</tt> instead of <tt>m1</tt></li>
</ol>
<p>Another thing that is worth mentioning is that yaws will
not ship (write on the socket) data until all content is generated.
This is questionable
and different from what i.e. PHP does. This makes it possible to
generate headers after content has been generated.
</p>
<h2>Process structure</h2>
<p>Before describing the process structure, I need to describe
the two most important datastructures in Yaws. The <tt>#gconf{}</tt>
and the <tt>#sconf{}</tt> records.
</p>
<p><b>Note:</b> To retrieve information from these records, yaws:gconf_*/1
and yaws:sconf_*/1 (e.g. yaws:gconf_id/1 or yaws:sconf_docroot/1) should
be used in preference to a direct access to reduce the dependence of your
code on it.
</p>
<h3>The <tt>#gconf{}</tt> record</h3>
<p>This record is used to hold all global state, i.e. state and configuration
data which is valid for all Virtual servers.
The record looks like:
</p>
<div class="box">
<verbatim>
%% global conf
-record(gconf,{
yaws_dir, % topdir of Yaws installation
trace, % false | {true,http} | {true,traffic}
flags = ?GC_DEF, % boolean flags
logdir,
ebin_dir = [],
src_dir = [],
runmods = [], % runmods for entire server
keepalive_timeout = 30000,
keepalive_maxuses = nolimit, % nolimit or non negative integer
max_num_cached_files = 400,
max_num_cached_bytes = 1000000, % 1 MEG
max_size_cached_file = 8000,
max_connections = nolimit, % max number of TCP connections
%% Override default connection handler processes spawn options for
%% performance/memory tuning.
%% [] | [{fullsweep_after,Number}, {min_heap_size, Size}]
%% other options such as monitor, link are ignored.
process_options = [],
large_file_chunk_size = 10240,
mnesia_dir = [],
log_wrap_size = 1000000, % wrap logs after 1M
cache_refresh_secs = 30, % seconds (auto zero when debug)
include_dir = [], % list of inc dirs for .yaws files
phpexe = "/usr/bin/php-cgi", % cgi capable php executable
yaws, % server string
id = "default", % string identifying this instance of yaws
enable_soap = false, % start yaws_soap_srv iff true
%% a list of
%% {{Mod, Func}, WsdlFile, Prefix} | {{Mod, Func}, WsdlFile}
%% automatically setup in yaws_soap_srv init.
soap_srv_mods = [],
acceptor_pool_size = 8, % size of acceptor proc pool
mime_types_info, % undefined | #mime_types_info{}
nslookup_pref = [inet], % [inet | inet6]
ysession_mod = yaws_session_server, % storage module for ysession
ysession_cookiegen, % ysession cookie generation module
ysession_idle_timeout = 2*60*1000, % default 2 minutes
ysession_long_timeout = 60*60*1000, % default 1 hour
sni = disable % disable | enable | strict
}).
</verbatim>
</div>
<p>The structure is derived from the /etc/yaws/yaws.conf file and is passed
around all through the functions in the server.
</p>
<h3> The <tt>#sconf{}</tt> record</h3>
<p>The next important datastructure is the <tt>#sconf{}</tt> record. It
is used to describe a single virtual server.
<p>Each:
</p>
<p>
<verbatim>
<server>
.....
</server>
</verbatim>
</p>
<p>In the /etc/yaws/yaws.conf file corresponds to one <tt>#sconf{}</tt>
record. We have: </p>
<div class="box">
<verbatim>
%% server conf
-record(sconf, {
port = 8000, % which port is this server listening to
flags = ?SC_DEF,
redirect_map=[], % a list of
% {Prefix, #url{}, append|noappend}
% #url{} can be partially populated
rhost, % forced redirect host (+ optional port)
rmethod, % forced redirect method
docroot, % path to the docs
xtra_docroots = [], % if we have additional pseudo docroots
listen = [{127,0,0,1}], % bind to this IP, {0,0,0,0} is possible
servername = "localhost", % servername is what Host: header is
serveralias = [], % Alternate names for this vhost
yaws, % server string for this vhost
ets, % local store for this server
ssl, % undefined | #ssl{}
authdirs = [], % [{docroot, [#auth{}]}]
partial_post_size = 10240,
%% An item in the appmods list can be either of the
%% following, this is all due to backwards compat issues.
%% 1. an atom - this is the equivalent to {atom, atom}
%% 2 . A two tuple {Path, Mod}
%% 3 A three tuple {Path, Mod, [ExcludeDir ....]}
appmods = [],
expires = [],
errormod_401 = yaws_outmod, % the default 401 error module
errormod_404 = yaws_outmod, % the default 404 error module
errormod_crash = yaws_outmod, % use the same module for crashes
arg_rewrite_mod = yaws,
logger_mod = yaws_log, % access/auth logging module
opaque = [], % useful in embedded mode
start_mod, % user provided module to be started
allowed_scripts = [yaws,php,cgi,fcgi],
tilde_allowed_scripts = [],
index_files = ["index.yaws", "index.html", "index.php"],
revproxy = [],
soptions = [{listen_opts, [{backlog, 1024}]}],
extra_cgi_vars = [],
stats, % raw traffic statistics
fcgi_app_server, % FastCGI application server {host,port}
php_handler = {cgi, "/usr/bin/php-cgi"},
shaper,
deflate_options, % undefined | #deflate{}
mime_types_info, % undefined | #mime_types_info{}
% if undefined, global config is used
dispatch_mod % custom dispatch module
}).
</verbatim>
</div>
<p>Both of these two structures are defined in "yaws.hrl"</p>
<p>Now we're ready to describe the process structure. We have:</p>
<img src="process_tree.png" />
<p>Thus, all the different "servers" defined in the configuration
file are clumped together in groups. For HTTP (i.e. not HTTPS) servers
there can be multiple virtual servers per IP address. Each group is
defined by the pair <tt>{IpAddr, Port}</tt> and they all need to
have different server names.</p>
<p>The client will send the server name in the "Host:" header and that
header is used to pick a <tt>#sconf{}</tt> record out of the list
of virtual servers for a specific <tt>{Ip,Port}</tt> pair.
</p>
<p>SSL servers are different, we cannot read the headers before we
decide which virtual server to choose because the certificate is connected
to a server name. Thus, there can only be one HTTPS server per
<tt>{Ip,Port}</tt> pair.
</div>
<erl>
out(A) -> {ssi, "END2",[],[]}.
</erl>