<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>The Twisted Network Framework</title>
</head>
<body>
<p><em><strong>Note:</strong> This document is relevant for the
versions of Twisted that were current prior to <a
href="http://www.python10.com">IPC10</a>. Even at the time of
its release, <a href="ipc10errata.html">there were errata
issued</a> to make it current. It remains unaltered for
historical purposes but it is no longer accurate.</em></p>
<h1>The Twisted Network Framework</h1>
<h6>Moshe Zadka <a
href="mailto:m@moshez.org">m@moshez.org</a></h6>
<h6>Glyph Lefkowitz <a
href="mailto:glyph@twistedmatrix.com">glyph@twistedmatrix.com</a></h6>
<h3>Abstract</h3>
<p>Twisted is a framework for writing asynchronous,
event-driven networked programs in Python -- both clients and
servers. In addition to abstractions for low-level system calls
like <code>select(2)</code> and <code>socket(2)</code>, it also
includes a large number of utility functions and classes, which
make writing new servers easy. Twisted includes support for
popular network protocols like HTTP and SMTP, support for GUI
frameworks like <code>GTK+</code>/<code>GNOME</code> and
<code>Tk</code> and many other classes designed to make network
programs easy. Whenever possible, Twisted uses Python's
introspection facilities to save the client programmer as much
work as possible. Even though Twisted is still a work in
progress, it is already usable for production systems -- it can
be used to bring up a Web server, a mail server or an IRC
server in a matter of minutes, and requires almost no
configuration.</p>
<p><strong>Keywords:</strong> internet, network, framework,
event-based, asynchronous</p>
<h3>Introduction</h3>
<p>Python lends itself to writing frameworks. Python has a
simple class model, which facilitates inheritance. It has
dynamic typing, which means code needs to assume less. Python
also has built-in memory management, which means application
code does not need to track ownership. Thus, when writing a new
application, a programmer often finds himself writing a
framework to make writing this kind of application easier.
Twisted evolved from the need to write high-performance
interoperable servers in Python, and to make them easy to use
(and difficult to use incorrectly).</p>
<p>There are three ways to write network programs:</p>
<ol>
<li>Handle each connection in a separate process</li>
<li>Handle each connection in a separate thread</li>
<li>Use non-blocking system calls to handle all connections
in one thread.</li>
</ol>
<p>When dealing with many connections in one thread, the
scheduling is the responsibility of the application, not the
operating system, and is usually implemented by calling a
registered function when each connection is ready for
reading or writing -- commonly known as event-driven, or
callback-based, programming.</p>
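<p>The callback style described above can be sketched with
nothing but the standard library. This is a minimal
illustration in modern Python 3, not Twisted's own loop: the
names <code>on_readable</code> and <code>run_once</code> are
hypothetical, and a connected socket pair stands in for a real
network connection.</p>

```python
import selectors
import socket

# Hypothetical names throughout; this is not Twisted's API.
selector = selectors.DefaultSelector()
received = []


def on_readable(conn):
    """Callback run by the loop when `conn` has data to read."""
    data = conn.recv(4096)
    if data:
        received.append(data)
    else:
        # An empty read means the peer closed the connection.
        selector.unregister(conn)
        conn.close()


def run_once(timeout=1.0):
    """Wait for ready connections and call each registered callback."""
    for key, _events in selector.select(timeout):
        callback = key.data
        callback(key.fileobj)


# Demonstration with a connected socket pair instead of a real network.
left, right = socket.socketpair()
right.setblocking(False)
selector.register(right, selectors.EVENT_READ, on_readable)

left.sendall(b"hello")
run_once()          # delivers b"hello" to on_readable
left.close()
run_once()          # delivers the end-of-file, which closes `right`
```

<p>The application, not the operating system, decides which
callback runs next; each callback must return quickly so the
loop can service the other connections.</p>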
<p>Since multi-threaded programming is often tricky, even with
high level abstractions, and since forking Python processes has
many disadvantages, like Python's reference counting not
playing well with copy-on-write and problems with shared state,
it was felt the best option was an event-driven framework. A
benefit of such an approach is that by letting other event-driven
frameworks take over the main loop, server and client code are
essentially the same -- making peer-to-peer a reality. While
Twisted includes its own event loop, Twisted can already
interoperate with <code>GTK+</code>'s and <code>Tk</code>'s
mainloops, as well as provide an emulation of event-based I/O
for Jython (specific support for the Swing toolkit is planned).
Client code is never aware of the loop it is running under, as
long as it is using Twisted's interface for registering for
interesting events.</p>
<p>Some examples of programs which were written using the
Twisted framework are <code>twisted.web</code> (a web server),
<code>twisted.mail</code> (a mail server, supporting both SMTP
and POP3, as well as relaying), <code>twisted.words</code> (a
chat application supporting integration between a variety of IM
protocols, like IRC, AOL Instant Messenger's TOC and
Perspective Broker, a remote-object protocol native to
Twisted), <code>im</code> (an instant messenger which connects
to twisted.words) and <code>faucet</code> (a GUI client for the
<code>twisted.reality</code> interactive-fiction framework).
Twisted can be useful for any network or GUI application
written in Python.</p>
<p>However, event-driven programming still contains some tricky
aspects. As each callback must be finished as soon as possible,
it is not possible to keep persistent state in function-local
variables. In addition, some programming techniques, such as
recursion, are impossible to use. Event-driven programming has
a reputation of being hard to use due to the frequent need to
write state machines. Twisted was built with the assumption
that with the right library, event-driven programming is easier
than multi-threaded programming. Twisted aims to be that
library.</p>
<p>Twisted includes both high-level and low-level support for
protocols. Most protocol implementations in Twisted are in a
package which tries to implement "mechanisms, not policy". On
top of those implementations, Twisted includes usable
implementations of those protocols: for example, connecting the
abstract HTTP protocol handler to a concrete resource-tree, or
connecting the abstract mail protocol handler to deliver mail
to maildirs according to domains. Twisted tries to come with as
much functionality as possible out of the box, while not
constraining a programmer to a choice between using a
possibly-inappropriate class and rewriting the non-interesting
parts himself.</p>
<p>Twisted also includes Perspective Broker, a simple
remote-object framework, which allows Twisted servers to be
divided into separate processes as the end deployer (rather
than the original programmer) finds most convenient. This
allows, for example, Twisted web servers to pass requests for
specific URLs to co-operating servers so permissions are
granted according to the need of the specific application,
instead of being forced into giving all the applications all
permissions. The co-operation is truly symmetrical, although
typical deployments (such as the one which the Twisted web site
itself uses) use a master/slave relationship.</p>
<p>Twisted is not alone in the niche of a Python network
framework. One of the better known frameworks is Medusa. Medusa
is used, among other things, as Zope's native server serving
HTTP, FTP and other protocols. However, Medusa is no longer
under active development, and the Twisted development team had
a number of goals which would necessitate a rewrite of large
portions of Medusa. Twisted separates protocols from the
underlying transport layer. This separation has the advantages
of reusability (for example, using the same clients and servers
over SSL) and testability (because it is easy to test the
protocol with a much lighter test harness) among others.
Twisted also has a very flexible main-loop which can
interoperate with third-party main-loops, making it usable in
GUI programs too.</p>
<h3>Complementing Python</h3>
<p>Python comes out of the box with "batteries included".
However, it seems that many Python projects rewrite some basic
parts: logging to files, parsing options and high level
interfaces to reflection. When the Twisted project found itself
rewriting those, it moved them into a separate subpackage,
which does not depend on the rest of the Twisted framework.
Hopefully, people will use <code>twisted.python</code> more and
solve interesting problems instead. Indeed, it is one of
Twisted's goals to serve as a repository for useful Python
code.</p>
<p>One useful module is <code>twisted.python.reflect</code>,
which has methods like <code>prefixedMethods</code>, which
returns all methods with a specific prefix. Even though some
modules in Python itself implement such functionality (notably,
<code>urllib2</code>), they do not expose it as a function
usable by outside code. Another useful module is
<code>twisted.python.hook</code>, which can add pre-hooks and
post-hooks to methods in classes.</p>
<blockquote>
<pre class="python">
# Add all method names beginning with opt_ to the given
# dictionary. This cannot be done with dir(), since
# it does not search in superclasses
dct = {}
reflect.addMethodNamesToDict(self.__class__, dct, "opt_")
# Sum up all lists, in the given class and superclasses,
# which have a given name. This gives us "different class
# semantics": attributes do not override, but rather append
flags = []
reflect.accumulateClassList(self.__class__, 'optFlags', flags)
# Add lock-acquire and lock-release to all methods which
# are not multi-thread safe
for methodName in klass.synchronized:
    hook.addPre(klass, methodName, _synchPre)
    hook.addPost(klass, methodName, _synchPost)
</pre>
<h6>Listing 1: Using <code>twisted.python.reflect</code> and
<code>twisted.python.hook</code></h6>
</blockquote>
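<p>To make the two reflection helpers concrete, here is a
rough standalone sketch of the same ideas in modern Python 3.
The function names <code>prefixed_method_names</code> and
<code>accumulate_class_list</code> are hypothetical and are not
the <code>twisted.python.reflect</code> implementations; they
only show how a class hierarchy can be walked to collect
prefixed methods and to append, rather than override, list
attributes.</p>

```python
def prefixed_method_names(cls, prefix):
    """Return the suffixes of all methods named prefix + suffix,
    searching the class and every one of its base classes."""
    names = set()
    for klass in cls.__mro__:
        for name, value in vars(klass).items():
            if name.startswith(prefix) and callable(value):
                names.add(name[len(prefix):])
    return sorted(names)


def accumulate_class_list(cls, attr):
    """Concatenate the list called `attr` from every class in the
    hierarchy, base classes first, so subclasses append to it."""
    result = []
    for klass in reversed(cls.__mro__):
        result.extend(vars(klass).get(attr, []))
    return result


class Base:
    optFlags = [["quiet", "q"]]

    def opt_verbose(self):
        pass


class Derived(Base):
    optFlags = [["profile", "p"]]

    def opt_plugin(self, name):
        pass


method_names = prefixed_method_names(Derived, "opt_")
all_flags = accumulate_class_list(Derived, "optFlags")
```

<p>Here <code>Derived.optFlags</code> does not hide
<code>Base.optFlags</code>: both lists contribute to the
result, which is the "different class semantics" mentioned in
the listing.</p>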
<p>The <code>twisted.python</code> subpackage also contains a
high-level interface to getopt which supplies as much power as
plain getopt while avoiding long
<code>if</code>/<code>elif</code> chains and making many common
cases easier to use. It uses the reflection interfaces in
<code>twisted.python.reflect</code> to find which options the
class is interested in, and constructs the argument to
<code>getopt</code>. Since in the common case options' values
are just saved in instance attributes, it is very easy to
indicate interest in such options. However, for the cases where
custom code needs to be run for an option (for example,
counting how many <code>-v</code> options were given to
indicate verbosity level), it will call a method whose name is
<code>opt_</code> followed by the option name.</p>
<blockquote>
<pre class="python">
import os, sys
from twisted.python import usage

class ServerOptions(usage.Options):
    # These are (short and long) options which
    # have no argument. The corresponding attribute
    # will be true iff this option was given
    optFlags = [['nodaemon','n'],
                ['profile','p'],
                ['threaded','t'],
                ['quiet','q'],
                ['no_save','o']]
    # These are options which require an argument.
    # The default is used if no such option was given
    # Note: since options can only have string arguments,
    # putting a non-string here is a reliable way to detect
    # whether the option was given
    optStrings = [['logfile','l',None],
                  ['file','f','twistd.tap'],
                  ['python','y',''],
                  ['pidfile','','twistd.pid'],
                  ['rundir','d','.']]
    # For options which can be given multiple times
    # or have other unusual semantics, a method will be called.
    # Twisted assumes that the option needs an argument if and only if
    # the method is defined to accept an argument.
    def opt_plugin(self, pkgname):
        pkg = __import__(pkgname)
        self.python = os.path.join(os.path.dirname(
                         os.path.abspath(pkg.__file__)), 'config.tac')
    # Most long options based on methods are aliased to short
    # options. If there is only one letter, Twisted knows it is a short
    # option, so it is "-g", not "--g"
    opt_g = opt_plugin

try:
    config = ServerOptions()
    config.parseOptions()
except usage.error, ue:
    print "%s: %s" % (sys.argv[0], ue)
    sys.exit(1)
</pre>
<h6>Listing 2: <code>twistd</code>'s Usage Code</h6>
</blockquote>
<p>Unlike <code>getopt</code>, Twisted has a useful abstraction
for the non-option arguments: they are passed as arguments to
the <code>parsedArgs</code> method. This means too many
arguments, or too few, will cause a usage error, which will be
flagged. If an unknown number of arguments is desired,
explicitly using a tuple catch-all argument will work.</p>
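<p>The mechanism can be approximated in a few lines on top of
the standard <code>getopt</code> module. The sketch below is in
modern Python 3 and is only an illustration of the idea; the
class <code>Options</code>, its <code>parse</code> method and
the <code>parse_args</code> hook are hypothetical names, not
the <code>twisted.python.usage</code> implementation, and
method-based <code>opt_</code> options are left out for
brevity.</p>

```python
import getopt
import inspect


class Options:
    """Hypothetical stand-in for a declarative options class."""
    optFlags = []      # entries of [long_name, short_letter]
    optStrings = []    # entries of [long_name, short_letter, default]

    def parse(self, argv):
        short, long_opts, lookup = "", [], {}
        specs = [(n, s, False, False) for n, s in self.optFlags]
        specs += [(n, s, True, d) for n, s, d in self.optStrings]
        # Build the arguments getopt expects from the declarations.
        for name, letter, takes_value, default in specs:
            setattr(self, name, default)
            long_opts.append(name + ("=" if takes_value else ""))
            lookup["--" + name] = (name, takes_value)
            if letter:
                short += letter + (":" if takes_value else "")
                lookup["-" + letter] = (name, takes_value)
        opts, args = getopt.getopt(argv, short, long_opts)
        for opt, value in opts:
            name, takes_value = lookup[opt]
            setattr(self, name, value if takes_value else True)
        # The signature of parse_args decides how many positional
        # arguments are acceptable; a mismatch becomes a usage error.
        try:
            inspect.signature(self.parse_args).bind(*args)
        except TypeError:
            raise getopt.GetoptError("wrong number of arguments") from None
        self.parse_args(*args)

    def parse_args(self):
        """Default: accept no positional arguments."""


class CopyOptions(Options):
    optFlags = [["verbose", "v"]]
    optStrings = [["mode", "m", "text"]]

    def parse_args(self, source, dest):
        self.source, self.dest = source, dest


config = CopyOptions()
config.parse(["-v", "--mode", "binary", "a.txt", "b.txt"])
```

<p>Declaring <code>def parse_args(self, *files)</code> instead
would accept any number of positional arguments, which is the
tuple catch-all case.</p>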
<h3>Configuration</h3>
<p>The formats of configuration files have shown two visible
trends over the years. On the one hand, more and more
programmability has been added, until sometimes they become a
new language. The extreme end of this trend is using a regular
programming language, such as Python, as the configuration
language. On the other hand, some configuration files became
more and more machine editable, until they became miniature
database formats. The extreme end of that trend is using a
generic database tool.</p>
<p>Both trends stem from the same rationale -- the need to use
a powerful general purpose tool instead of hacking domain
specific languages. Domain specific languages are usually
ad-hoc and not well designed, having neither the power of
general purpose languages nor the predictable machine editable
format of generic databases.</p>
<p>Twisted combines these two trends. It can read the
configuration either from a Python file, or from a pickled
file. To some degree, it integrates the approaches by
auto-pickling state on shutdown, so the configuration files can
migrate from Python into pickles. Currently, there is no way to
go back from pickles to equivalent Python source, although it
is planned for the future. As a proof of concept, the RPG
framework Twisted Reality already has facilities for creating
Python source which evaluates into a given Python object.</p>
<blockquote>
<pre class="python">
from twisted.internet import main
from twisted.web import proxy, server
site = server.Site(proxy.ReverseProxyResource('www.yahoo.com', 80, '/'))
application = main.Application('web-proxy')
application.listenOn(8080, site)
</pre>
<h6>Listing 3: The configuration file for a reverse web
proxy</h6>
</blockquote>
<p>Twisted's main program, <code>twistd</code>, can receive
either a pickled <code>twisted.internet.main.Application</code>
or a Python file which defines a variable called
<code>application</code>. The application can be saved at any
time by calling its <code>save</code> method, which can take an
optional argument to save to a different file name. It would be
fairly easy, for example, to have a Twisted server which saves
the application every few seconds to a file whose name depends
on the time. Usually, however, one settles for the default
behavior which saves to a <code>shutdown</code> file. Then, if
the shutdown configuration proves suitable, the regular pickle
is replaced by the shutdown file. Hence, on the fly
configuration changes, regardless of complexity, can always
persist.</p>
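<p>The save-and-reload cycle can be sketched as follows. This
is a simplified illustration in modern Python 3 with
hypothetical names (<code>SketchApplication</code>,
<code>listen_on</code>, the default file name); for
simplicity it pickles a plain dictionary of state, whereas the
text above describes pickling the application object
itself.</p>

```python
import os
import pickle
import tempfile


class SketchApplication:
    """Hypothetical stand-in for a persistable application object."""

    def __init__(self, name):
        self.name = name
        self.ports = []          # (port number, description) pairs

    def listen_on(self, port, description):
        self.ports.append((port, description))

    def save(self, filename=None):
        """Pickle the current state; the default name is an assumption."""
        filename = filename or self.name + "-shutdown.tap"
        state = {"name": self.name, "ports": self.ports}
        # Write to a temporary file first so that a crash cannot
        # leave a half-written pickle under the final name.
        tmp = filename + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, filename)
        return filename

    @classmethod
    def load(cls, filename):
        with open(filename, "rb") as f:
            state = pickle.load(f)
        app = cls(state["name"])
        app.ports = list(state["ports"])
        return app


with tempfile.TemporaryDirectory() as tmpdir:
    app = SketchApplication("web-proxy")
    app.listen_on(8080, "reverse proxy")
    path = app.save(os.path.join(tmpdir, "web-proxy-shutdown.tap"))
    restored = SketchApplication.load(path)
```

<p>Because the state written at shutdown includes every change
made while running, replacing the regular pickle with the
shutdown file carries on-the-fly reconfiguration over to the
next start.</p>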
<p>There are several client/server protocols which let a
suitably privileged user access the application variable and
change it on the fly. The first, and least common denominator,
is telnet. The administrator can telnet into Twisted, and issue
Python statements to her heart's content. For example, one can
add ports to listen on to the application, reconfigure the web
servers and make various other changes simply by accessing
<code>__main__.application</code>. Some proofs of concept for a
simple suite of command-line utilities to control a Twisted
application were written, including commands which allow an
administrator to shut down the server or save the current state
to a tap file. These are especially useful on Microsoft
Windows(tm) platforms, where the normal UNIX way of
communicating shutdown requests via signals is less
reliable.</p>
<p>If reconfiguration on the fly is not necessary, Python
itself can be used as the configuration editor. Loading the
application is as simple as unpickling it, and saving it is
done by calling its <code>save</code> method. It is quite easy
to add more services or change existing ones from the Python
interactive mode.</p>
<p>A more sophisticated way to reconfigure the application on
the fly is via the manhole service. Manhole is a client/server
protocol based on top of Perspective Broker, Twisted's
translucent remote-object protocol which will be covered later.
Manhole has a graphical client called <code>gtkmanhole</code>
which can access the server and change its state. Since Twisted
is modular, it is possible to write more services for user
friendly configuration. For example, through-the-web
configuration is planned for several services, notably
mail.</p>
<p>For cases where a third party wants to distribute both the
code for a server and a ready to run configuration file, there
is the plugin configuration. Philosophically similar to the
<code>--python</code> option to <code>twistd</code>, it
simplifies the distribution process. A plugin is an archive
which is ready to be unpacked into the Python module path. In
order to keep a clean tree, <code>twistd</code> extends the
module path with some Twisted-specific paths, like the
directory <code>TwistedPlugins</code> in the user's home
directory. When a plugin is unpacked, it should be a Python
package which includes, alongside <code>__init__.py</code> a
file named <code>config.tac</code>. This file should define a
variable named <code>application</code>, in a similar way to
files loaded with <code>--python</code>. The plugin way of
distributing configurations is meant to reduce the temptation
to put large amounts of code inside the configuration file
itself.</p>
<p>Putting class and function definitions inside the
configuration files would make the persistent servers which are
auto-generated on shutdown useless, since they would not have
access to the classes and functions defined inside the
configuration file. Thus, the plugin method is intended so
classes and functions can still be in regular, importable,
Python modules, but still allow third parties to distribute
powerful configurations. Plugins are used by some of the
Twisted Reality virtual worlds.</p>
<h3>Ports, Protocols and Protocol Factories</h3>
<p><code>Port</code> is the Twisted class which represents a
socket listening on a port. Currently, Twisted supports both
internet and unix-domain sockets, and there are SSL classes
with an identical interface. A <code>Port</code> is only
responsible for handling the transport layer. It calls
<code>accept</code> on the socket, checks that it actually
wants to deal with the connection and asks its factory for a
protocol. The factory is usually a subclass of
<code>twisted.protocols.protocol.Factory</code>, and its most
important method is <code>buildProtocol</code>. This should
return something that adheres to the protocol interface, and is
usually a subclass of
<code>twisted.protocols.protocol.Protocol</code>.</p>
<blockquote>
<pre class="python">
from twisted.protocols import protocol
from twisted.internet import main, tcp

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        self.transport.write(data)

factory = protocol.Factory()
factory.protocol = Echo
port = tcp.Port(8000, factory)
app = main.Application("echo")
app.addPort(port)
app.run()
</pre>
<h6>Listing 4: A Simple Twisted Application</h6>
</blockquote>
<p>The factory is responsible for two tasks: creating new
protocols, and keeping global configuration and state. Since
the factory builds the new protocols, it usually makes sure the
protocols have a reference to it. This allows protocols to
access, and change, the configuration. Keeping state
information in the factory is the primary reason for keeping an
abstraction layer between ports and protocols. Examples of
configuration information are the root directory of a web server
or the user database of a telnet server. Note that it is
possible to use the same factory in two different Ports. This
can be used to run the same server bound to several different
addresses but not to all of them, or to run the same server on
a TCP socket and a UNIX domain socket.</p>
<p>A protocol begins and ends its life with
<code>connectionMade</code> and <code>connectionLost</code>;
both are called with no arguments. <code>connectionMade</code>
is called when a connection is first established. By then, the
protocol has a <code>transport</code> attribute. The
<code>transport</code> attribute is a <code>Transport</code> -
it supports <code>write</code> and <code>loseConnection</code>.
Both these methods never block: <code>write</code> actually
buffers data which will be written only when the transport is
signalled ready for writing, and <code>loseConnection</code>
marks the transport for closing as soon as there is no buffered
data. Note that transports do <em>not</em> have a
<code>read</code> method: data arrives when it arrives, and the
protocol must be ready for its <code>dataReceived</code>
method, or its <code>connectionLost</code> method, to be
called. The transport also supports a <code>getPeer</code>
method, which returns parameters about the other side of the
transport. For TCP sockets, this includes the remote IP and
port.</p>
<blockquote>
<pre class="python">
# A tcp port-forwarder
# A StupidProtocol sends all data it gets to its peer.
# A StupidProtocolServer connects to the host/port,
# and initializes the client connection to be its peer
# and itself to be the client's peer
from twisted.protocols import protocol
from twisted.internet import tcp

class StupidProtocol(protocol.Protocol):
    def connectionLost(self):
        self.peer.loseConnection()
        del self.peer
    def dataReceived(self, data):
        self.peer.write(data)

class StupidProtocolServer(StupidProtocol):
    def connectionMade(self):
        clientProtocol = StupidProtocol()
        clientProtocol.peer = self.transport
        self.peer = tcp.Client(self.factory.host, self.factory.port,
                               clientProtocol)

# Create a factory which creates StupidProtocolServers, and
# has the configuration information they assume
def makeStupidFactory(host, port):
    factory = protocol.Factory()
    factory.host, factory.port = host, port
    factory.protocol = StupidProtocolServer
    return factory
</pre>
<h6>Listing 5: TCP forwarder code</h6>
</blockquote>
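<p>The non-blocking behaviour of <code>write</code> and
<code>loseConnection</code> described earlier can be
illustrated with a small standalone sketch in modern Python 3.
The class <code>BufferedTransport</code> and its method names
are hypothetical and are not Twisted's transport classes; the
point is only that writing appends to a buffer, and the socket
is touched when the event loop reports it writable.</p>

```python
import socket


class BufferedTransport:
    """Hypothetical transport: write() only buffers data; the event
    loop calls do_write() when the socket is ready for writing."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.buffer = b""
        self.closing = False
        self.closed = False

    def write(self, data):
        # Never touches the socket, so it cannot block.
        self.buffer += data

    def lose_connection(self):
        # Close at once if nothing is pending, otherwise only
        # after the buffer has been drained by do_write().
        self.closing = True
        if not self.buffer:
            self._close()

    def do_write(self):
        """Send as much buffered data as the socket accepts."""
        try:
            sent = self.sock.send(self.buffer)
        except BlockingIOError:
            return              # not writable after all; try again later
        self.buffer = self.buffer[sent:]
        if self.closing and not self.buffer:
            self._close()

    def _close(self):
        if not self.closed:
            self.closed = True
            self.sock.close()


# Demonstration over a connected socket pair.
a, b = socket.socketpair()
transport = BufferedTransport(a)
transport.write(b"hello ")
transport.write(b"world")
transport.lose_connection()
open_before_flush = not transport.closed   # data is still buffered
transport.do_write()                       # what the loop would call
echoed = b.recv(1024)
b.close()
```

<p>A forwarder such as the one in Listing 5 relies on exactly
this property: it can call <code>write</code> on its peer from
inside <code>dataReceived</code> without ever stalling the
loop.</p>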
<h3>The Event Loop</h3>
<p>While Twisted has the ability to let other event loops take
over for integration with GUI toolkits, it usually uses its own
event loop. The event loop code uses global variables to
maintain interested readers and writers, and uses Python's
<code>select()</code> function, which can accept any object
which has a <code>fileno()</code> method, not only raw file
descriptors. Objects can use the event loop interface to
indicate interest in either reading from or writing to a given
file descriptor. In addition, for those cases where time-based
events are needed (for example, queue flushing or periodic POP3
downloads), Twisted has a mechanism for repeating events at
known delays. While far from being real-time, this is enough
for most programs' needs.</p>
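<p>One common way to combine timed events with a
<code>select()</code> loop is to keep the pending events in a
heap and use the time until the earliest one as the
<code>select()</code> timeout. The sketch below, in modern
Python 3, shows that technique under stated assumptions: the
class <code>Scheduler</code> and its methods are hypothetical
names, and the paper does not say that Twisted's loop is
implemented this way.</p>

```python
import heapq
import itertools
import time


class Scheduler:
    """Hypothetical delayed-call queue for an event loop."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._queue = []
        # Tie-breaker so that two events due at the same time are
        # ordered by insertion and callbacks are never compared.
        self._counter = itertools.count()

    def call_later(self, delay, func, *args):
        when = self.clock() + delay
        heapq.heappush(self._queue, (when, next(self._counter), func, args))

    def call_every(self, interval, func, *args):
        def repeat():
            func(*args)
            self.call_later(interval, repeat)   # re-arm after each run
        self.call_later(interval, repeat)

    def timeout(self):
        """Seconds until the next event; usable as a select() timeout."""
        if not self._queue:
            return None
        return max(0.0, self._queue[0][0] - self.clock())

    def run_due(self):
        """Run every event whose time has come."""
        now = self.clock()
        while self._queue and self._queue[0][0] <= now:
            _when, _n, func, args = heapq.heappop(self._queue)
            func(*args)


# Demonstration with a fake clock so the result is deterministic.
now = [0.0]
sched = Scheduler(clock=lambda: now[0])
log = []
sched.call_later(5, log.append, "flush queue")
sched.call_every(2, log.append, "poll POP3")
for tick in range(1, 7):        # advance fake time one second at a time
    now[0] = float(tick)
    sched.run_due()
```

<p>Events only run when the loop gets around to calling
<code>run_due</code>, so a slow callback delays everything
behind it; this is why such a mechanism is far from
real-time.</p>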
<h3>Going Higher Level</h3>
<p>Unfortunately, handling arbitrary data chunks is a hard way
to code a server. This is why Twisted has many classes sitting
in submodules of the <code>twisted.protocols</code> package which give
a higher-level interface to the data. For line oriented
protocols, <code>LineReceiver</code> translates the low-level
<code>dataReceived</code> events into <code>lineReceived</code>
events. However, the first naive implementation of
<code>LineReceiver</code> proved to be too simple. Protocols
like HTTP/1.1 or Freenet have packets which begin with header
lines that include length information, and then byte streams.
<code>LineReceiver</code> was rewritten to have a simple
interface for switching at the protocol layer between
line-oriented parts and byte-stream parts.</p>
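<p>A rough sketch of the idea, in modern Python 3 and with
hypothetical names (<code>SketchLineReceiver</code>,
<code>set_raw_mode</code>, <code>set_line_mode</code>), is
given below. It is not Twisted's <code>LineReceiver</code>; it
only shows how arbitrary chunks can be turned into line events,
and how a protocol can switch to a byte-stream mode and back,
handing over any bytes that were read too far.</p>

```python
class SketchLineReceiver:
    """Hypothetical receiver that turns chunks into line events."""
    delimiter = b"\r\n"

    def __init__(self):
        self._buffer = b""
        self._raw = False

    def data_received(self, data):
        self._buffer += data
        while self._buffer:
            if self._raw:
                chunk, self._buffer = self._buffer, b""
                self.raw_data_received(chunk)
            else:
                line, sep, rest = self._buffer.partition(self.delimiter)
                if not sep:
                    break           # incomplete line; wait for more data
                self._buffer = rest
                self.line_received(line)

    def set_raw_mode(self):
        self._raw = True

    def set_line_mode(self, extra=b""):
        # `extra` is data received in raw mode that belongs to the
        # line-oriented part which follows.
        self._raw = False
        if extra:
            self.data_received(extra)


class LengthPrefixed(SketchLineReceiver):
    """Header lines carrying a length, a blank line, then a body."""

    def __init__(self):
        super().__init__()
        self.length = 0
        self.remaining = 0
        self.body = b""
        self.messages = []

    def line_received(self, line):
        if line == b"":
            # A blank line ends the headers; the body follows as bytes.
            self.remaining = self.length
            self.set_raw_mode()
        elif line.lower().startswith(b"content-length:"):
            self.length = int(line.split(b":", 1)[1])

    def raw_data_received(self, data):
        self.body += data[:self.remaining]
        extra = data[self.remaining:]
        self.remaining -= min(self.remaining, len(data))
        if self.remaining == 0:
            self.messages.append(self.body)
            self.body = b""
            self.set_line_mode(extra)


proto = LengthPrefixed()
chunks = [b"Content-Le", b"ngth: 5\r\n\r\nhel",
          b"loContent-Length: 2\r\n\r\nok"]
for chunk in chunks:
    proto.data_received(chunk)
```

<p>The chunk boundaries in the example deliberately fall in the
middle of a header and in the middle of a body, since that is
what a real connection will do.</p>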
<p>Another format which is gathering popularity is Dan J.
Bernstein's netstring format. This format keeps ASCII text as
ASCII, but allows arbitrary bytes (including nulls and
newlines) to be passed freely. However, netstrings were never
designed to be used in event-based protocols where over-reading
is unavoidable. Twisted makes sure no user will have to deal
with the subtle problems of handling netstrings in event-driven
programs by providing <code>NetstringReceiver</code>.</p>
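<p>A netstring is encoded as the decimal length of the payload,
a colon, the payload bytes and a trailing comma, for example
<code>5:hello,</code>. The following sketch in modern Python 3
shows one way to parse such a stream incrementally; the class
<code>SketchNetstringReceiver</code> is a hypothetical
illustration, not Twisted's <code>NetstringReceiver</code>.</p>

```python
class SketchNetstringReceiver:
    """Hypothetical incremental parser for a stream of netstrings."""

    def __init__(self):
        self._buffer = b""
        self.strings = []

    def data_received(self, data):
        self._buffer += data
        while True:
            head, sep, rest = self._buffer.partition(b":")
            if not sep:
                # The length prefix is not complete yet.
                if head and not head.isdigit():
                    raise ValueError("malformed netstring length")
                return
            if not head.isdigit():
                raise ValueError("malformed netstring length")
            length = int(head)
            if len(rest) < length + 1:
                return          # payload or trailing comma still missing
            if rest[length:length + 1] != b",":
                raise ValueError("missing netstring terminator")
            self.strings.append(rest[:length])
            # Anything read past the comma stays buffered for the
            # next netstring, which is how over-reading is handled.
            self._buffer = rest[length + 1:]


receiver = SketchNetstringReceiver()
for chunk in [b"5:hel", b"lo,0:,3:a\n", b"\x00,"]:
    receiver.data_received(chunk)
```

<p>Because the length comes first, the payload may contain
colons, commas, newlines or null bytes without any
escaping.</p>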
<p>For even higher levels, there are the protocol-specific
protocol classes. These translate low-level chunks into
high-level events such as "HTTP request received" (for web
servers), "approve destination address" (for mail servers) or
"get user information" (for finger servers). Many RFCs have
been thus implemented for Twisted (at latest count, more than
12 RFCs have been implemented). One of Twisted's goals is to be
a repository of event-driven implementations for various
protocols in Python.</p>
<blockquote>
<pre class="python">
class DomainSMTP(SMTP):
    def validateTo(self, helo, destination):
        try:
            user, domain = string.split(destination, '@', 1)
        except ValueError:
            return 0
        if not self.factory.domains.has_key(domain):
            return 0
        if not self.factory.domains[domain].exists(user, domain, self):
            return 0
        return 1
    def handleMessage(self, helo, origin, recipients, message):
        # No need to check for existence -- only recipients which
        # we approved at the validateTo stage are passed here
        for recipient in recipients:
            user, domain = string.split(recipient, '@', 1)
            self.factory.domains[domain].saveMessage(origin, user, message,
                                                     domain)
</pre>
<h6>Listing 6: Implementation of virtual domains using the
SMTP protocol class</h6>
</blockquote>
<p>Copious documentation on writing new protocol abstractions
exists, since this is the largest amount of code written --
much like most operating system code is device drivers. Since
many different protocols have already been implemented, there
are also plenty of examples to draw on. Usually implementing
the client-side of a protocol is particularly challenging,
since protocol designers tend to assume much more state kept on
the client side of a connection than on the server side.</p>
<h3>The <code>twisted.tap</code> Package and
<code>mktap</code></h3>
<p>Since one of Twisted's configuration formats is the pickle,
which is tricky to edit by hand, Twisted evolved a framework
for creating such pickles. This framework is contained in the
<code>twisted.tap</code> package and the <code>mktap</code>
script. New servers, or new ways to configure existing servers,
can easily participate in the twisted.tap framework by creating
a <code>twisted.tap</code> submodule.</p>
<p>All <code>twisted.tap</code> submodules must conform to a
rigid interface. The interface defines functions to accept the
command line parameters, and functions to take the processed
command line parameters and add servers to
<code>twisted.internet.main.Application</code>. Existing
<code>twisted.tap</code> submodules use
<code>twisted.python.usage</code>, so the command line format
is consistent between different modules.</p>
<p>The <code>mktap</code> utility gets some generic options,
and then the name of the server to build. It imports a
same-named <code>twisted.tap</code> submodule, and lets it
process the rest of the options and parameters. This makes sure
that the process configuring the <code>main.Application</code>
is agnostic as to where it is used. This allowed
<code>mktap</code> to grow the <code>--append</code> option,
which appends to an existing pickle rather than creating a new
one. This option is frequently used to post-add a telnet server
to an application, for net-based on the fly configuration
later.</p>
<p>When running <code>mktap</code> under UNIX, it saves the
user id and group id inside the tap. Then, when feeding this
tap into <code>twistd</code>, it changes to this user/group id
after binding the ports. Such a feature is necessary in any
production-grade server, since ports below 1024 require root
privileges to use on UNIX -- but applications should not run as
root. In case changing to the specified user causes difficulty
in the build environment, it is also possible to give those
arguments to <code>mktap</code> explicitly.</p>
<blockquote>
<pre class="python">
from twisted.internet import tcp, stupidproxy
from twisted.python import usage
usage_message = """
usage: mktap stupid [OPTIONS]
Options are as follows:
--port <#>, -p: set the port number to <#>.
--host <host>, -h: set the host to <host>
--dest_port <#>, -d: set the destination port to <#>
"""
class Options(usage.Options):
    optStrings = [["port", "p", 6666],
                  ["host", "h", "localhost"],
                  ["dest_port", "d", 6665]]

def getPorts(app, config):
    s = stupidproxy.makeStupidFactory(config.host, int(config.dest_port))
    return [(int(config.port), s)]
</pre>
<h6>Listing 7: <code>twisted.tap.stupid</code></h6>
</blockquote>
<p>The <code>twisted.tap</code> framework is one of the reasons
servers can be set up with little knowledge and time. Simply
running <code>mktap</code> with arguments can bring up a web
server, a mail server or an integrated chat server -- with
hardly any need for maintenance. As a working
proof-of-concept, the <code>tap2deb</code> utility exists to
wrap up tap files in Debian packages, which include scripts for
starting and stopping the server and interact with
<code>init(8)</code> to make sure servers are automatically run
on start-up. Similar programs could be written to interface
with the Red Hat Package Manager or the FreeBSD package
management system.</p>
<blockquote>
<pre class="shell">
% mktap --uid 33 --gid 33 web --static /var/www --port 80
% tap2deb -t web.tap -m 'Moshe Zadka <moshez@debian.org>'
% su
password:
# dpkg -i .build/twisted-web_1.0_all.deb
</pre>
<h6>Listing 8: Bringing up a web server on a Debian
system</h6>
</blockquote>
<h3>Multi-thread Support</h3>
<p>Sometimes, threads are unavoidable or hard to avoid. Many
legacy programs which use threads want to use Twisted, and some
vendor APIs have no non-blocking version -- for example, most
database systems' APIs. Twisted can work with threads, although
it supports only one thread in which the main select loop is
running. It can use other threads to simulate a non-blocking API
over a blocking API -- it spawns a thread to call the blocking
API, and when the call returns, the thread schedules a callback
to run in the main thread. Threads can schedule such callbacks
safely by adding them to a list of pending events.
When the main thread is between select calls, it goes
through the list of pending events and executes them. This is
used in the <code>twisted.enterprise</code> package to supply
an event-driven interface to databases, built on Python's DB
API.</p>
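<p>The pending-events mechanism can be illustrated with a minimal
sketch in present-day Python. The names <code>call_in_thread</code>
and <code>run_pending</code> are hypothetical and are not Twisted's
API; the sketch also ignores exceptions raised by the blocking
call.</p>

```python
import queue
import threading

# Thread-safe list of (callback, result) pairs waiting for the main thread.
pending_events = queue.Queue()


def call_in_thread(blocking_func, callback, *args):
    """Run blocking_func(*args) in a worker thread.

    The callback is not run by the worker; it is queued so that the
    main thread can run it later.
    """
    def worker():
        result = blocking_func(*args)
        pending_events.put((callback, result))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run_pending():
    """Called by the main loop between select calls.

    Runs every queued callback in the calling thread and returns the
    number of callbacks executed.
    """
    count = 0
    while True:
        try:
            callback, result = pending_events.get_nowait()
        except queue.Empty:
            return count
        callback(result)
        count += 1
```

<p>Because callbacks only ever execute inside
<code>run_pending</code>, code running in the main thread never has
to guard against being interrupted by a result arriving from a
worker.</p>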
<p>Twisted tries to optimize for the common case -- no threads.
If there is need for threads, a special call must be made to
inform the <code>twisted.python.threadable</code> module that
threads will be used. This module is implemented differently
depending on whether threads will be used or not. The decision
must be made before importing any modules which use threadable,
and so is usually done in the main application. For example,
<code>twistd</code> has a command line option to initialize
threads.</p>
<p>Twisted also supplies a module which supports a threadpool,
so the common task of implementing non-blocking APIs above
blocking APIs will be both easy and efficient. Threads are kept
in a pool, and dispatch requests are handled by idle threads.
The pool supports a maximum number of threads, and
will throw exceptions when there are more requests than
allowable threads.</p>
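<p>A pool with these properties might look roughly like the
following sketch, written in present-day Python. The class and
exception names are hypothetical, and the behaviour (threads created
lazily, kept for reuse, an exception when all are busy) is modelled
on the description above rather than on Twisted's module.</p>

```python
import queue
import threading


class PoolFull(Exception):
    """Raised when every thread in the pool is already busy."""


class TinyThreadPool:
    def __init__(self, max_threads):
        self.max_threads = max_threads
        self._jobs = queue.Queue()
        self._lock = threading.Lock()
        self._outstanding = 0   # jobs accepted but not yet finished
        self._workers = []

    def dispatch(self, func, *args):
        """Hand func(*args) to an idle thread, or raise PoolFull."""
        with self._lock:
            if self._outstanding == self.max_threads:
                raise PoolFull("all %d threads are busy" % self.max_threads)
            self._outstanding += 1
            # Threads are created lazily and then kept for reuse.
            if self._outstanding > len(self._workers):
                worker = threading.Thread(target=self._work, daemon=True)
                self._workers.append(worker)
                worker.start()
        self._jobs.put((func, args))

    def _work(self):
        while True:
            job = self._jobs.get()
            if job is None:          # sentinel queued by stop()
                return
            func, args = job
            try:
                func(*args)
            except Exception:
                pass                 # a real pool would report the error
            finally:
                with self._lock:
                    self._outstanding -= 1

    def stop(self):
        """Let queued jobs finish, then shut the worker threads down."""
        for _ in self._workers:
            self._jobs.put(None)
        for worker in self._workers:
            worker.join()
```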
<p>One of the difficulties of multi-threaded systems is
using locks to avoid race conditions. Twisted uses a mechanism
similar to Java's synchronized methods. A class can declare a
list of methods which cannot safely be called at the same time
from two different threads. A function in threadable then uses
<code>twisted.python.hook</code> to transparently add
lock/unlock around these methods. This allows Twisted classes
to be written without thought about threading, except for one
localized declaration which does not entail any performance
penalty for the single-threaded case.</p>
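<p>The idea of a localized declaration plus a wrapping function can
be sketched as follows in present-day Python. This does not use
<code>twisted.python.hook</code>; the <code>synchronize</code>
function, the <code>threads_enabled</code> flag and the
<code>Counter</code> class are all hypothetical stand-ins.</p>

```python
import threading


def synchronize(cls):
    """Wrap each method named in cls.synchronized with a per-instance lock."""
    original_init = cls.__init__

    def init_with_lock(self, *args, **kwargs):
        # A re-entrant lock lets one synchronized method call another.
        self._sync_lock = threading.RLock()
        original_init(self, *args, **kwargs)

    def add_lock(method):
        def locked(self, *args, **kwargs):
            with self._sync_lock:
                return method(self, *args, **kwargs)
        locked.__name__ = method.__name__
        return locked

    cls.__init__ = init_with_lock
    for name in cls.synchronized:
        setattr(cls, name, add_lock(getattr(cls, name)))
    return cls


class Counter:
    synchronized = ["increment"]   # the one localized declaration

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value
        self.value = current + 1


# Hypothetical flag: when threads are not in use, the class is left
# untouched and pays nothing for locking.
threads_enabled = True
if threads_enabled:
    synchronize(Counter)
```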
<h3>Twisted Mail Server</h3>
<p>Mail servers have a history of security flaws. Sendmail is
by now the poster boy of security holes, but no mail servers,
bar maybe qmail, are free of them. As Dan Bernstein of qmail
fame said, mail cannot be simply turned off -- even the
simplest organization needs a mail server. Since Twisted is
written in a high-level language, many problems which plague
other mail servers, notably buffer overflows, simply do not
exist. Other holes are avoidable with correct design. Twisted
Mail is a project trying to see if it is possible to write a
high-quality, high-performance mail server entirely in
Python.</p>
<p>Twisted Mail is built on the SMTP server and client protocol
classes. While these present a level of abstraction from the
specific SMTP line semantics, they do not contain any message
storage code. The SMTP server class does know how to divide
responsibility between domains. When a message arrives, it
analyzes the recipient's address, tries matching it with one of
the registered domains, and then delegates validating the
address and saving the message to the matching domain, or
refuses to handle the message if it cannot handle the domain.
It is possible to specify a catch-all domain, which will
usually be responsible for relaying mail outwards.</p>
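<p>The division of responsibility between domains can be pictured
with the sketch below, in present-day Python. It is not the Twisted
SMTP class: <code>DomainRegistry</code> and the handler signature
<code>(user, domain, message)</code> are assumptions made for
illustration.</p>

```python
class DomainRegistry:
    """Route each message to the handler registered for its domain."""

    def __init__(self, catch_all=None):
        self.domains = {}
        self.catch_all = catch_all   # e.g. a handler that relays outwards

    def register(self, name, handler):
        self.domains[name.lower()] = handler

    def deliver(self, recipient, message):
        # Split "user@domain" at the last "@".
        user, _, domain = recipient.rpartition("@")
        handler = self.domains.get(domain.lower(), self.catch_all)
        if handler is None:
            # No matching domain and no catch-all: refuse the message.
            raise LookupError("cannot handle domain %r" % domain)
        return handler(user, domain, message)
```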
<p>While correct relaying is planned for the future, at the
moment we have only so-called "smarthost" relaying. All e-mail
not recognized by a local domain is relayed to a single outside
upstream server, which is supposed to relay the mail further.
This is the configuration for most home machines, which are
Twisted Mail's current target audience.</p>
<p>Since the people involved in Twisted's development were
reluctant to run code as a super user, or with any
special privileges, delivering mail to users required some
thought. The solution decided upon was to
have Twisted deliver to its own directory, which should have
very strict permissions, and have users pull the mail using
some remote mail access protocol like POP3. This means only a
user would write to his own mailbox, so no security hole in
Twisted would be able to adversely affect a user.</p>
<p>Future plans are to use a Perspective Broker-based service
to hand each user's mail to a personal server over a UNIX domain
socket, as well as to add some more conventional delivery
methods, as scary as they may be.</p>
<p>Because the default configuration of Twisted Mail is
an integrated POP3/SMTP server, it is ideally suited for the
so-called POP toaster configuration, where there are a
multitude of virtual users and domains, all using the same IP
address and computer to send and receive mails. It is fairly
easy to configure Twisted as a POP toaster. There are a number
of deployment choices: one can append a telnet server to the
tap for remote configuration, or simple scripts can add and
remove users from the user database. The user database is saved
as a directory, where file names are keys and file contents are
values, so concurrency is not usually a problem.</p>
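<p>A directory-backed database of this kind can be sketched in a few
lines of present-day Python. The <code>DirectoryDB</code> class is
hypothetical, and writing through a temporary file followed by an
atomic rename is an assumption of this sketch, not a description of
Twisted Mail's implementation.</p>

```python
import os
import tempfile


class DirectoryDB:
    """A tiny key/value store: one file per key, file contents as value."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, key):
        # Assumption: keys are plain, non-hidden file names, never paths.
        if not key or key != os.path.basename(key) or key.startswith("."):
            raise ValueError("unsuitable key: %r" % key)
        return os.path.join(self.path, key)

    def __setitem__(self, key, value):
        target = self._file(key)
        fd, temp_name = tempfile.mkstemp(dir=self.path, prefix=".tmp-")
        with os.fdopen(fd, "w") as temp_file:
            temp_file.write(value)
        # Readers see either the old value or the new one, never a mix.
        os.replace(temp_name, target)

    def __getitem__(self, key):
        try:
            with open(self._file(key)) as value_file:
                return value_file.read()
        except FileNotFoundError:
            raise KeyError(key)

    def __delitem__(self, key):
        try:
            os.remove(self._file(key))
        except FileNotFoundError:
            raise KeyError(key)

    def keys(self):
        # Hidden names are temporary files still being written.
        return sorted(name for name in os.listdir(self.path)
                      if not name.startswith("."))
```

<p>A script that adds a user then amounts to
<code>DirectoryDB(path)["postmaster"] = "secret"</code>, and two such
scripts running at once touch different files unless they edit the
same user.</p>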
<blockquote>
<pre class="shell">
% mktap mail -d foobar.com=$HOME/Maildir/ -u postmaster=secret -b \
-p 110 -s 25
% twistd -f mail.tap
</pre>
<h6>Bringing up a simple mail-server</h6>
</blockquote>
<p>Twisted's native mail storage format is Maildir, a format
that requires no locking and is safe and atomic. Twisted
supports a number of standardized extensions to Maildir,
commonly known as Maildir++. Most importantly, it implements
deletion by simply moving the message to a subfolder named
<code>Trash</code>, so mail is recoverable if accessed through
a protocol which allows multiple folders, like IMAP. However,
Twisted itself does not yet support any such
protocol.</p>
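<p>Why Maildir needs no locking can be seen from a simplified sketch
of delivery, in present-day Python. The function names are
hypothetical, and a random identifier stands in for the unique file
names (built from time, process id and host name) that real Maildir
writers use.</p>

```python
import os
import uuid


def make_maildir(path):
    """Create the three standard Maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)


def deliver(maildir, data):
    """Write the message under tmp/, then rename it into new/.

    The rename is atomic, so a reader scanning new/ never sees a
    half-written message.
    """
    name = uuid.uuid4().hex
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "wb") as message_file:
        message_file.write(data)
    new_path = os.path.join(maildir, "new", name)
    os.rename(tmp_path, new_path)
    return new_path


def move_to_trash(maildir, message_path):
    """'Delete' a message by moving it into the Maildir++ Trash folder."""
    trash = os.path.join(maildir, ".Trash")   # Maildir++ folders start with "."
    make_maildir(trash)
    target = os.path.join(trash, "cur", os.path.basename(message_path))
    os.rename(message_path, target)
    return target
```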
<h3>Introducing Perspective Broker</h3>
<h4>All the World's a Game</h4>
<p>Twisted was originally designed to support multi-player
games; a simulated "real world" environment. Experience with
game systems of that type is enlightening as to the nature of
computing on the whole. Almost all services on a computer are
modeled after some simulated real-world activity, such as
e-"mail" or "document publishing" on the web. Even
"object-oriented" programming is based around the notion that
data structures in a computer simulate some analogous
real-world objects.</p>
<p>All such networked simulations have a few things in common.
They each represent a service provided by software, and there
is usually some object where "global" state is kept. Such a
service must provide an authentication mechanism. Often, there
is a representation of the authenticated user within the
context of the simulation, and there are also objects aside
from the user and the simulation itself that can be
accessed.</p>
<p>For most existing protocols, Twisted provides these
abstractions through <code>twisted.internet.passport</code>.
This is so named because the most important common
functionality it provides is authentication. A simulation
"world" as described above -- such as an e-mail system,
document publishing archive, or online video game -- is
represented by a subclass of <code>Service</code>, the
authentication mechanism by an <code>Authorizer</code> (which
is a set of <code>Identities</code>), and the user of the
simulation by a <code>Perspective</code>. Other objects in the
simulation may be represented by arbitrary Python objects,
depending upon the implementation of the given protocol.</p>
<p>New problem domains, however, often require new protocols,
and re-implementing these abstractions each time can be
tedious, especially when it's not necessary. Many efforts have
been made in recent years to create generic "remote object" or
"remote procedure call" protocols, but in developing Twisted,
these protocols were found to require too much overhead in
development, be too inefficient at runtime, or both.</p>
<p>Perspective Broker is a new remote-object protocol designed
to be lightweight, to impose minimal constraints upon the
development process, and to use Python's dynamic nature to good
effect, while remaining relatively efficient in terms of bandwidth
and CPU utilization. <code>twisted.spread.pb</code> serves as a
reference implementation of the protocol, but implementation of
Perspective Broker in other languages is already underway.
<code>spread</code> is the <code>twisted</code> subpackage
dealing with remote calls and objects, and has nothing to do
with the <code>spread</code> toolkit.</p>
<p>Perspective Broker extends
<code>twisted.internet.passport</code>'s abstractions to be
concrete objects rather than design patterns. Rather than
having a <code>Protocol</code> implementation translate between
sequences of bytes and specifically named methods (as in the
other Twisted <code>Protocols</code>), Perspective Broker
defines a direct mapping between network messages and
quasi-arbitrary method calls.</p>
<h3>Translucent, not Transparent</h3>
<p>In a server application where a large number of clients may
be interacting at once, it is not feasible to have an
arbitrarily large number of OS threads blocking and waiting for
remote method calls to return. Additionally, the ability for
any client to call any method of an object would present a
significant security risk. Therefore, rather than attempting to
provide a transparent interface to remote objects,
<code>twisted.spread.pb</code> is "translucent", meaning that
while remote method calls have different semantics than local
ones, the similarities in semantics are mirrored by
similarities in the syntax. Remote method calls impose as
little overhead as possible in terms of volume of code, but "as
little as possible" is unfortunately not "nothing".</p>
<p><code>twisted.spread.pb</code> defines a method naming
standard for each type of remotely accessible object. For
example, if a client requests a method call with an expression
such as <code>myPerspective.doThisAction()</code>, the remote
version of <code>myPerspective</code> would be sent the message
<code>perspective_doThisAction</code>. Depending on the manner
in which an object is accessed, other method prefixes may be
<code>observe_</code>, <code>view_</code>, or
<code>remote_</code>. Any appropriately named method present on
a remotely accessible object is considered to be published.
Since the lookup is done with <code>getattr</code>, "present"
is not limited to methods defined on the class: instances may
have arbitrary callable objects associated with them, as long
as the name is correct -- just as with normal Python
objects.</p>
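<p>The naming convention amounts to a prefixed <code>getattr</code>
lookup, which the following present-day Python sketch illustrates.
<code>Publisher</code>, <code>dispatch</code> and <code>Door</code>
are hypothetical names; the real <code>twisted.spread.pb</code>
classes are not reproduced here.</p>

```python
class Publisher:
    """Expose only the methods whose names carry the expected prefix."""

    prefix = "remote_"

    def dispatch(self, message, *args, **kwargs):
        # Only "prefix + message" is ever looked up, so unprefixed
        # attributes can never be reached from the network.
        method = getattr(self, self.prefix + message, None)
        if not callable(method):
            raise AttributeError("%r is not published" % message)
        return method(*args, **kwargs)


class Door(Publisher):
    def remote_open(self):
        return "opened"

    def secret(self):
        return "not reachable through dispatch"
```

<p>Because the lookup goes through <code>getattr</code>, assigning a
callable to an instance, for example
<code>door.remote_knock = some_function</code>, publishes it just as
a method defined on the class would be.</p>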
<p>Remote method calls are made on remote reference objects
(instances of <code>pb.RemoteReference</code>) by calling a
method with an appropriate name. However, that call will not
block -- if you need the result from a remote method call, you
pass in one of the two special keyword arguments to that method
-- <code>pbcallback</code> or <code>pberrback</code>.
<code>pbcallback</code> is a callable object which will be
called when the result is available, and <code>pberrback</code>
is a callable object which will be called if there was an
exception thrown either in transmission of the call or on the
remote side.</p>
<p>In the case that neither <code>pberrback</code> nor
<code>pbcallback</code> is provided,
<code>twisted.spread.pb</code> will optimize network usage by
not sending confirmations of messages.</p>
<blockquote>
<pre class="python">
# Server Side
from twisted.spread import pb

class MyObject(pb.Referenceable):
    def remote_doIt(self):
        return "did it"

# Client Side
...
def myCallback(result):
    print(result)  # result will be 'did it'

def myErrback(stacktrace):
    print('oh no, mr. bill!')
    print(stacktrace)

# The call returns immediately; one of the two callbacks runs later.
myRemoteReference.doIt(pbcallback=myCallback,
                       pberrback=myErrback)
</pre>
<h6>Listing 9: A remotely accessible object and accompanying
call</h6>
</blockquote>
<h3>Different Behavior for Different Perspectives</h3>
<p>Considering the problem of remote object access in terms of
a simulation shows the need to associate an
actor with certain actions or requests. Often, when processing
a message, it is useful to know who sent it, since different
results may be required depending on the permissions or state
of the caller.</p>
<p>A simple example is a game where a certain object is
invisible, but players with the "Heightened Perception"
enchantment can see it. When answering the question "What
objects are here?" it is important for the room to know who is
asking, to determine which objects they can see. Parallels to
the differences between "administrators" and "users" on an
average multi-user system are obvious.</p>
<p>Perspective Broker is named for the fact that it does not
broker only objects, but views of objects. As a user of the
<code>twisted.spread.pb</code> module, it is quite easy to
determine the caller of a method. All you have to do is
subclass <code>Viewable</code>.</p>
<blockquote>
<pre class="python">
# Server Side
from twisted.spread import pb

class Greeter(pb.Viewable):
    def view_greet(self, actor):
        # actor is the Perspective of whoever made the call
        return "Hello %s!\n" % actor.perspectiveName

# Client Side
import sys
...
remoteGreeter.greet(pbcallback=sys.stdout.write)
...
</pre>
<h6>Listing 10: An object responding to its calling
perspective</h6>
</blockquote>
<p>Before any arguments sent by the client, the actor
(specifically, the Perspective instance through which this
object was retrieved) will be passed as the first argument to
any <code>view_xxx</code> method.</p>
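<p>The effect of passing the actor first can be shown with a small
stand-alone sketch in present-day Python, using the invisible-object
game example. None of these names come from
<code>twisted.spread.pb</code>: <code>call_view</code>,
<code>Room</code> and the <code>heightened_perception</code>
attribute are assumptions made for illustration.</p>

```python
class Perspective:
    def __init__(self, name, heightened_perception=False):
        self.perspectiveName = name
        self.heightened_perception = heightened_perception


class Room:
    def __init__(self):
        # (object name, is it invisible?)
        self.objects = [("lamp", False), ("ghost", True)]

    def view_look(self, actor):
        # The answer depends on who is asking.
        return [name for name, invisible in self.objects
                if actor.heightened_perception or not invisible]


def call_view(actor, target, message, *args):
    """Dispatch a message to target, inserting the actor before the
    arguments supplied by the client."""
    method = getattr(target, "view_" + message)
    return method(actor, *args)
```

<p>The client never supplies the actor itself; the server side
inserts it, so a caller cannot claim to be someone else.</p>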
<h3>Mechanisms for Sharing State</h3>
<p>In a simulation of any decent complexity, client and server
will wish to share structured data. Perspective Broker provides
a mechanism for both transferring (copying) and sharing
(caching) that state.</p>
<p>Whenever an object is passed as an argument to or returned
from a remote method call, that object is serialized using
<code>twisted.spread.jelly</code>, a serializer similar in some
ways to Python's native <code>pickle</code>. Originally,
<code>pickle</code> itself was going to be used, but there were
several security issues with the <code>pickle</code> code as it
stands. It is on these issues of security that