
Internals

Introduction

On this page I'll try to describe some of the internal workings of Yaws. The page is mostly of interest to people who want to hack on Yaws or simply get a better understanding of it.

I'll describe how Yaws pages get compiled, the process structure, and other things that can make the code easier to understand.

JIT Compiling a .yaws page

When the client GETs a page that has a .yaws suffix, the Yaws server reads that page from the hard disk and divides it into parts that consist of HTML code and Erlang code. Each chunk of Erlang code is compiled into a module. The chunk of Erlang code must contain a function out/1. If it doesn't, the Yaws server inserts a proper error message into the generated HTML output.
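The splitting step can be pictured with a minimal sketch. It assumes the Erlang chunks are delimited by <erl> and </erl> tags, as in ordinary .yaws pages. The module name yaws_split_sketch and the {erl, Skip, Code} tuple are made up for illustration; the real work is done in yaws_compile.erl, which also compiles each chunk and tracks line numbers.

```erlang
-module(yaws_split_sketch).
-export([split/1]).

%% Split a page into {data, NumBytes} and {erl, NumSkipBytes, Code} items.
%% Hypothetical helper: it does not compile anything, and an unterminated
%% <erl> tag simply crashes with a badmatch.
split(<<>>) ->
    [];
split(Bin) ->
    case binary:split(Bin, <<"<erl>">>) of
        [Data] ->
            %% No more Erlang chunks: the rest is plain data.
            [{data, byte_size(Data)}];
        [Data, AfterOpen] ->
            [Code, Rest] = binary:split(AfterOpen, <<"</erl>">>),
            %% The skip count covers the two tags as well as the code.
            Skip = byte_size(Code) + byte_size(<<"<erl></erl>">>),
            data_spec(Data) ++ [{erl, Skip, Code} | split(Rest)]
    end.

%% Leave out empty data runs (e.g. when the page starts with <erl>).
data_spec(<<>>) -> [];
data_spec(Data) -> [{data, byte_size(Data)}].
```

The byte counts are what later let the server skip over the Erlang source when it ships the page.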

When the Yaws server ships a .yaws page, it processes the .yaws file chunk by chunk. If a chunk is HTML code, the server ships it as is, whereas if it is Erlang code, the Yaws server invokes the out/1 function in that code and inserts the output of that out/1 function into the stream of HTML that is being shipped to the client.

Yaws will cache the result of the compilation and the next time a client requests the same .yaws page Yaws will be able to invoke the already compiled modules directly.

This is best explained by an example:

Say that we have a file "foo.yaws" that consists of 400 bytes: 100 bytes of Erlang code, followed by 200 bytes of HTML, followed by another 100 bytes of Erlang code.

When a client requests the file "foo.yaws", the web server will look in its cache for the file (more on that later). For the sake of argument, we assume the file is not in the cache.

The file will be processed by the code in yaws_compile.erl and the result will be a structure that looks like:

    [CodeSpec]
    CodeSpec = Data | Code | Error
    Data     = {data, NumChars}
    Code     = {mod, LineNo, YawsFile, NumSkipChars, Mod, Func}
    Error    = {error, NumSkipChars, E}

In the particular case of our "foo.yaws" file above, the JIT compiler will return:

    [{mod, 1, "/foo.yaws", 100, m1, out},
     {data, 200},
     {mod, 30, "/foo.yaws", 100, m2, out}
    ]

This structure gets stored in the cache and will continue to be associated with the file "foo.yaws".

When the server "ships" a .yaws page, it needs the CodeSpec structure to do it. If the structure is not in the cache, the page gets JIT compiled and inserted into the cache.
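The lookup-or-compile behaviour can be sketched as follows. This is only an illustration: the cache is modelled as a plain map, and both the module name codespec_cache_sketch and the CompileFun argument (standing in for the JIT compiler) are assumptions, not the real Yaws cache interface.

```erlang
-module(codespec_cache_sketch).
-export([get_spec/3]).

%% Hypothetical helper, not the real Yaws cache API.
%% Cache is a map from file name to CodeSpec list.
%% CompileFun stands in for the JIT compiler in yaws_compile.erl.
%% Returns {CodeSpecList, UpdatedCache}.
get_spec(File, Cache, CompileFun) ->
    case maps:find(File, Cache) of
        {ok, Spec} ->
            %% Cache hit: reuse the already compiled result.
            {Spec, Cache};
        error ->
            %% Cache miss: compile once and remember the result.
            Spec = CompileFun(File),
            {Spec, maps:put(File, Spec, Cache)}
    end.
```

A second request for the same file then finds the CodeSpec list in the map and never calls CompileFun.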

To ship the above CodeSpec structure, the server performs the following steps:

  1. Create the Arg structure, which is a #arg{} record. This structure is well known to all Yaws programmers since it's the main mechanism for passing data from the server to the .yaws page.
  2. Item (1): Invoke m1:out(Arg).
  3. Look at the return value from m1:out(Arg) and perform whatever is requested. This typically involves generating some dynamic ehtml code, generating headers, and so on.
  4. Finally, jump ahead 100 bytes in the file as a result of processing the first CodeSpec item.
  5. Item (2): The next CodeSpec is just plain data from the file, so we read 200 bytes from the file (or rather from the cache, since the data will be there) and ship them to the client.
  6. Item (3): Yet another {mod, ...} structure, which is handled the same way as Item (1) above, except that the Erlang module is m2 instead of m1.
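The steps above can be sketched as a small loop over the CodeSpec list. This is a simplification under stated assumptions: the module name ship_sketch is made up, the file is held as one binary, and each out/1 is assumed to return plain iodata, whereas real out/1 functions return richer values (ehtml, headers, and so on) that the server interprets.

```erlang
-module(ship_sketch).
-export([ship/3, demo/0, out/1]).

%% Walk the CodeSpec list, consuming the file binary as we go.
%% Output is accumulated and only turned into one binary at the end.
ship(Specs, FileBin, Arg) ->
    ship(Specs, FileBin, Arg, []).

ship([], _Rest, _Arg, Acc) ->
    iolist_to_binary(lists:reverse(Acc));
ship([{data, N} | T], Bin, Arg, Acc) ->
    %% Plain data: copy N bytes from the file to the output.
    <<Data:N/binary, Rest/binary>> = Bin,
    ship(T, Rest, Arg, [Data | Acc]);
ship([{mod, _Line, _File, Skip, Mod, Func} | T], Bin, Arg, Acc) ->
    %% Erlang chunk: call Mod:Func(Arg), then skip the source bytes.
    Out = Mod:Func(Arg),
    <<_:Skip/binary, Rest/binary>> = Bin,
    ship(T, Rest, Arg, [Out | Acc]);
ship([{error, Skip, E} | T], Bin, Arg, Acc) ->
    %% Compile error: emit a message in place of the chunk.
    <<_:Skip/binary, Rest/binary>> = Bin,
    ship(T, Rest, Arg, [io_lib:format("~p", [E]) | Acc]).

%% Stand-in for a compiled chunk's out/1 (hypothetical).
out(_Arg) ->
    <<"[dyn]">>.

%% A tiny file: 5 bytes of "code", 9 bytes of HTML, 5 bytes of "code".
demo() ->
    File = <<"AAAAA", "<p>hi</p>", "BBBBB">>,
    Specs = [{mod, 1, "/foo.yaws", 5, ?MODULE, out},
             {data, 9},
             {mod, 3, "/foo.yaws", 5, ?MODULE, out}],
    ship(Specs, File, undefined).
```

Calling ship_sketch:demo() replaces both "code" regions with the output of out/1 and passes the HTML through untouched.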

Another thing worth mentioning is that Yaws will not ship (write on the socket) data until all content is generated. This is questionable and different from what, e.g., PHP does, but it makes it possible to generate headers after content has been generated.
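A minimal sketch of why deferring the write helps: if chunk results are collected first, a header produced by a late chunk can still be placed in front of content produced by an early one. The module name deferred_write_sketch and the {content, IoData} and {header, {Name, Value}} tuples are hypothetical and are not the actual out/1 return values.

```erlang
-module(deferred_write_sketch).
-export([collect/1, render/1]).

%% Results is a list of hypothetical chunk results, in page order:
%% {content, IoData} or {header, {Name, Value}}.
%% Nothing is written while collecting, so order of arrival is free.
collect(Results) ->
    lists:foldl(fun({content, C}, {Hs, Body}) -> {Hs, [Body, C]};
                   ({header, H}, {Hs, Body}) -> {[H | Hs], Body}
                end, {[], []}, Results).

%% Only now build the bytes that would go on the socket:
%% all headers first, then a blank line, then the body.
render(Results) ->
    {Hs, Body} = collect(Results),
    HeaderLines = [[N, ": ", V, "\r\n"] || {N, V} <- lists:reverse(Hs)],
    iolist_to_binary([HeaderLines, "\r\n", Body]).
```

Here a header that arrives after the content still ends up before it in the rendered response.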

Process structure

Before describing the process structure, I need to describe the two most important data structures in Yaws: the #gconf{} and the #sconf{} records.

Note: To retrieve information from these records, prefer the accessor functions yaws:gconf_*/1 and yaws:sconf_*/1 (e.g. yaws:gconf_id/1 or yaws:sconf_docroot/1) over direct access. This reduces the dependence of your code on the record definitions.

The #gconf{} record

This record is used to hold all global state, i.e. state and configuration data which is valid for all virtual servers. The record looks like:

    %% global conf
    -record(gconf,{
              yaws_dir,                 % topdir of Yaws installation
              trace,                    % false | {true,http} | {true,traffic}
              flags = ?GC_DEF,          % boolean flags
              logdir,
              ebin_dir = [],
              src_dir = [],
              runmods = [],             % runmods for entire server
              keepalive_timeout = 30000,
              keepalive_maxuses = nolimit,     % nolimit or non negative integer
              max_num_cached_files = 400,
              max_num_cached_bytes = 1000000,  % 1 MEG
              max_size_cached_file = 8000,
              max_connections = nolimit,       % max number of TCP connections

              %% Override default connection handler processes spawn options for
              %% performance/memory tuning.
              %% [] | [{fullsweep_after,Number}, {min_heap_size, Size}]
              %% other options such as monitor, link are ignored.
              process_options = [],

              large_file_chunk_size = 10240,
              mnesia_dir = [],
              log_wrap_size = 1000000,         % wrap logs after 1M
              cache_refresh_secs = 30,         % seconds (auto zero when debug)
              include_dir = [],                % list of inc dirs for .yaws files
              phpexe = "/usr/bin/php-cgi",     % cgi capable php executable

              yaws,                            % server string
              id = "default",                  % string identifying this instance of yaws

              enable_soap = false,             % start yaws_soap_srv iff true

              %% a list of
              %% {{Mod, Func}, WsdlFile, Prefix} | {{Mod, Func}, WsdlFile}
              %% automatically setup in yaws_soap_srv init.
              soap_srv_mods = [],

              acceptor_pool_size = 8,          % size of acceptor proc pool

              mime_types_info,                 % undefined | #mime_types_info{}

              nslookup_pref = [inet],          % [inet | inet6]

              ysession_mod = yaws_session_server, % storage module for ysession
              ysession_cookiegen,                 % ysession cookie generation module
              ysession_idle_timeout = 2*60*1000,  % default 2 minutes
              ysession_long_timeout = 60*60*1000, % default 1 hour

              sni = disable                    % disable | enable | strict
             }).

The structure is derived from the /etc/yaws/yaws.conf file and is passed around through all the functions in the server.

The #sconf{} record

The next important data structure is the #sconf{} record. It is used to describe a single virtual server.

Each:

    <server ServerName>
        .....
    </server>

block in the /etc/yaws/yaws.conf file corresponds to one #sconf{} record. We have:

    %% server conf
    -record(sconf, {
              port = 8000,                  % which port is this server listening to
              flags = ?SC_DEF,
              redirect_map=[],              % a list of
                                            % {Prefix, #url{}, append|noappend}
                                            % #url{} can be partially populated

              rhost,                        % forced redirect host (+ optional port)
              rmethod,                      % forced redirect method
              docroot,                      % path to the docs
              xtra_docroots = [],           % if we have additional pseudo docroots
              listen = [{127,0,0,1}],       % bind to this IP, {0,0,0,0} is possible
              servername = "localhost",     % servername is what Host: header is
              serveralias = [],             % Alternate names for this vhost
              yaws,                         % server string for this vhost
              ets,                          % local store for this server
              ssl,                          % undefined | #ssl{}
              authdirs = [],                % [{docroot, [#auth{}]}]
              partial_post_size = 10240,

              %% An item in the appmods list can be either of the
              %% following, this is all due to backwards compat issues.
              %% 1. an atom - this is the equivalent to {atom, atom}
              %% 2. A two tuple {Path, Mod}
              %% 3. A three tuple {Path, Mod, [ExcludeDir ....]}
              appmods = [],

              expires = [],
              errormod_401 = yaws_outmod,   % the default 401 error module
              errormod_404 = yaws_outmod,   % the default 404 error module
              errormod_crash = yaws_outmod, % use the same module for crashes
              arg_rewrite_mod = yaws,
              logger_mod = yaws_log,        % access/auth logging module
              opaque = [],                  % useful in embedded mode
              start_mod,                    % user provided module to be started
              allowed_scripts = [yaws,php,cgi,fcgi],
              tilde_allowed_scripts = [],
              index_files = ["index.yaws", "index.html", "index.php"],
              revproxy = [],
              soptions = [{listen_opts, [{backlog, 1024}]}],
              extra_cgi_vars = [],
              stats,                        % raw traffic statistics
              fcgi_app_server,              % FastCGI application server {host,port}
              php_handler = {cgi, "/usr/bin/php-cgi"},
              shaper,
              deflate_options,              % undefined | #deflate{}
              mime_types_info,              % undefined | #mime_types_info{}
                                            % if undefined, global config is used
              dispatch_mod                  % custom dispatch module
             }).

Both of these structures are defined in "yaws.hrl".

Now we're ready to describe the process structure. We have:

Thus, all the different "servers" defined in the configuration file are clumped together in groups. For HTTP (i.e. not HTTPS) servers there can be multiple virtual servers per IP address. Each group is defined by the pair {IpAddr, Port} and they all need to have different server names.

The client will send the server name in the "Host:" header and that header is used to pick a #sconf{} record out of the list of virtual servers for a specific {Ip,Port} pair.
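The selection step can be sketched like this. The #sc{} record is a cut-down stand-in for #sconf{}, the module name vhost_pick_sketch is made up, and falling back to the first server in the group when no name matches is an assumption of this sketch. Port stripping is naive and would not handle IPv6 literals.

```erlang
-module(vhost_pick_sketch).
-export([pick/2]).

%% Simplified stand-in for #sconf{}; the real record lives in yaws.hrl.
-record(sc, {servername, docroot}).

%% Group is the list of virtual servers sharing one {Ip, Port} pair.
%% Host is the value of the "Host:" header as a string.
pick(Host, [First | _] = Group) ->
    Name = strip_port(string:lowercase(Host)),
    case lists:keyfind(Name, #sc.servername, Group) of
        false -> First;   % no match: fall back (assumption of this sketch)
        SC -> SC
    end.

%% "www.example.org:8000" -> "www.example.org"
strip_port(Host) ->
    hd(string:split(Host, ":")).
```

Because every server in a group must have a different server name, at most one record can match a given header.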

SSL servers are different. We cannot read the headers before we decide which virtual server to choose, because the certificate is connected to a server name. Thus, there can only be one HTTPS server per {Ip,Port} pair.
