Leading items
Welcome to the LWN.net Weekly Edition for December 7, 2017
This edition contains the following feature content:
- Who should see Python deprecation warnings?: the Python community reconsiders a 2009 decision to hide deprecation warnings.
- Trying Tryton: the accounting quest continues with a look at Tryton. It didn't go all that well.
- Container IDs for the audit subsystem: a proposal to make container auditing possible.
- Restricting automatic kernel-module loading: an attempt to reduce the kernel's attack surface by restricting automatic module loading.
- A thorough introduction to eBPF: guest author Matt Fleming gives an overview of BPF in the kernel.
- Mozilla releases tools and data for speech recognition: a ground-breaking software and data release from Mozilla.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Who should see Python deprecation warnings?
As all Python developers discover sooner or later, Python is a rapidly evolving language whose community occasionally makes changes that can break existing programs. The switch to Python 3 is the most prominent example, but minor releases can include significant changes as well. The CPython interpreter can emit warnings for upcoming incompatible changes, giving developers time to prepare their code, but those warnings are suppressed and invisible by default. Work is afoot to make them visible, but doing so is not as straightforward as it might seem.
In early November, one sub-thread of a big discussion on preparing for the Python 3.7 release focused on the await and async identifiers. They will become keywords in 3.7, meaning that any code using those names for any other purpose will break. Nick Coghlan observed that Python 3.6 does not warn about the use of those names, calling it "a fairly major oversight/bug". In truth, though, Python 3.6 does emit warnings in that case — but users rarely see them.
The reason for that comes down to the configuration of a relatively obscure module called warnings. The Python interpreter can generate quite a few warnings in various categories, many of which are likely to be seen as noise by users. The warnings module is used to emit warnings, but it also gives developers a way to bring back some silence by establishing a filter controlling which warnings will actually be printed out to the error stream. The default filter is:
ignore::DeprecationWarning
ignore::PendingDeprecationWarning
ignore::ImportWarning
ignore::BytesWarning
ignore::ResourceWarning
The first line filters out DeprecationWarning events, such as the warnings regarding await and async in the 3.6 release. Those warnings were also present in 3.5 as longer-term PendingDeprecationWarnings, which are also invisible by default.
As it happens, things were not always this way. While PendingDeprecationWarning has always been filtered, DeprecationWarning was visible by default until the Python 2.7 and 3.2 releases. In a 2009 thread discussing the change, Python benevolent dictator for life Guido van Rossum argued that the deprecation warnings, while being useful to some developers, were more often just "irrelevant noise", especially for anybody who does not actually work on the code in question.
The idea was fairly intensely debated, but silencing those warnings by default won out in the end.
In 2017, it has become evident that this decision has kept some important warnings out of the sight of people who should see them, with the result that many people may face an unpleasant surprise when an upgrade to 3.7 abruptly breaks previously working programs. That was the cue for a new intensely debated thread over whether deprecation warnings should be enabled again.
Neil Schemenauer started things off with a suggestion that the warnings should be re-enabled by default; Coghlan subsequently proposed reverting to the way things were. He went on to say that, if application developers don't want their users to see deprecation warnings, they should disable those warnings explicitly. The invisibility of deprecation warnings has hurt users, he said, and some classes of users in particular: casual users who run single-file scripts or work directly at the interactive prompt.
Application developers are, one hopes, using testing frameworks for their modules, and those frameworks typically turn the warnings back on. But the above-mentioned users will not be performing such testing and will be unnecessarily surprised if Python 3.7 breaks their scripts.
The proposal led to some familiar complaints, though. Van Rossum worried that it would inflict a bunch of warning noise on users of scripts who are in no position to fix them. Antoine Pitrou suggested that small-script developers would be deluged by warnings originating in modules that they import — warnings that, once again, they cannot fix. Over time, the thread seemed to coalesce on the idea that the warnings should not be re-enabled unconditionally; they should, instead, remain disabled for "third-party" code that the current user is unlikely to have control over.
That is a fine idea, with only one little problem: how does one define "third-party code" in this setting? There were a few ideas raised, such as emitting warnings for all code located under the directory containing the initial script, but the search for heuristics threatened to devolve into a set of complex special cases that nobody would be able to remember. So the solution that was written up by Coghlan as PEP 565 was rather simpler: enable DeprecationWarning in the __main__ module, while leaving it suppressed elsewhere. In essence, any code run directly by a user would have warnings enabled, while anything imported from another module would not.
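Concretely, the PEP does this by adding a single entry at the front of the default filter list shown earlier; as described in the PEP, the new entry is roughly:
default::DeprecationWarning:__main__
The "default" action prints the first occurrence of each matching warning, but only when the warning is raised from code running in __main__; the existing ignore entries continue to apply everywhere else.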
This change will almost certainly not bring deprecation warnings to the attention of everybody who needs to see them. But it will cause them to be emitted for users who are running single-file scripts or typing commands directly at the Python interpreter. That solves what Coghlan sees as the biggest problem: casual Python scripters who will otherwise be unpleasantly surprised when a distribution upgrade causes their scripts to fail. It is a partial solution that appears to be better than the status quo.
Van Rossum agreed with that assessment. He acknowledged that it's "not going to make everyone happy", but said that it's an improvement and that he intends to approve it in the near future in the absence of more objections. Naturally, such a pronouncement brought out some objections, but none of them would appear to have the strength to keep PEP 565 from being a part of the Python 3.7 release. The 3.7 interpreter's usurpation of await and async may be an unpleasant surprise to some users, but hopefully future changes will be less surprising.
Trying Tryton
The quest to find a free-software replacement for the QuickBooks accounting tool continues. In this episode, your editor does his best to put Tryton through its paces. Running Tryton proved to be a trying experience, though; this would not appear to be the accounting tool we are searching for.
Tryton is a Python 3 application distributed under the GPLv3 license. Its home page mentions that it is based on PostgreSQL, but there is support for MySQL and SQLite as well. Tryton, it is said, is "a three-tier high-level general purpose application platform" that is "the core base of a complete business solution providing modularity, scalability and security". The "core base" part of that claim is relevant: Tryton may well be a solid base for the creation of a small-business accounting system, but it is not, out of the box, such a system itself.
Running Tryton
The Tryton documentation is not especially friendly to the first-time user. The installation instructions suggest going with what one's distribution provides. One can see why; following the links for a source installation leads to a lengthy directory listing with dozens of independent tarballs. There is a Mercurial repository out there, but one has to search for it and there is no documentation on how to build or install from a copy of the repository.
Your editor opted for the Fedora Tryton packages — of which there are 54 to choose from. Tryton is broken up into a lot of modules, so there is naturally a package for each. Nobody has documented this, but getting the Fedora packages running requires creating a PostgreSQL database and user, editing the trytond.conf configuration file to point there, adding the tryton group to one's account, and running the trytond-admin application to initialize the database. Once that is done, the tryton application will consent to run and put up a simple window.
In the process of figuring this out, it became clear that the Tryton developers have not put a huge amount of effort into error handling. The usual response when something goes wrong is a Python traceback, which tends to not be particularly helpful.
Getting the tryton application to use one's local database requires messing around with "profiles", even though the configuration file specifying the database setup was passed on the application's command line. Things have to be just right or access simply fails to work. Once that obstacle was passed, the result was a general interface describing "records" of various types. The accounting module was installed and provided its own record types. A basic chart of accounts was set up. But there was nothing resembling an interface to do even basic things like creating a bank account or entering a transaction. Your editor tried installing more modules (all of them, actually) to get more functionality, but the result was an application that wouldn't run at all — it died with a traceback due to apparently missing Tryton module dependencies. Some of the Fedora module packages simply don't work, in other words.
As it happens, the version of Tryton packaged with Fedora 27 is 4.0, which is somewhat behind the current release (4.6). It seems reasonable to believe that a more recent release might yield better results. The openSUSE Tumbleweed distribution packages 4.2 instead, but it never proved possible to get past the profile screen with those packages installed. One might plausibly claim that support for Tryton is not the highest-priority objective for some distributors, at least.
It is also alleged to be possible to install Tryton directly from the Python Package Index using pip. Your newly hopeful editor duly created a virtualenv and populated it with a set of packages, but the 4.6 tryton application would not even start. It is, it would seem, still tied to the GTK+ 2 toolkit, which is not all that well supported on current distributions, especially for a Python 3 application.
Moving on
There is little doubt that somebody with greater skills and patience could find a way to make a current Tryton release work on a current Linux distribution. The result would likely be gratifying in a number of ways; Tryton appears to have a well-designed and well-documented base that one could build a good accounting application on top of. Integration with a business's other processes (one of the key criteria in this search) would seem to be relatively straightforward. But even an easily installed, perfectly working Tryton would fall far short of what is needed here.
The point is this: it's a rare small-business owner who feels the urge to build an accounting system on top of anything. Accounting is a task that needs to be done, not an objective in its own right. Any accounting system that arrives as a box of small parts with "some assembly required" written on the outside does not meet the needs of this kind of user. For all its faults, Intuit understood that when it created QuickBooks. The developers behind the other systems reviewed so far (GnuCash and Odoo) also understand that.
Chances are, the Tryton developers understand that too, but creating an easily usable small-business accounting system would appear not to be at the top of their to-do list. There are a number of free-software business-management systems available. Many of these, your editor has long believed, are developed primarily as platforms for consultants. A system that is highly capable, but which is complex, minimally documented, and in need of a lot of setup work suits that business model well.
Such a system is also, of course, simply easier to implement and maintain. Creating an interactive accounting system that is usable by people with no inherent interest in accounting systems is a difficult task. It is, seemingly, not an itch that many developers feel the need to scratch without some sort of additional incentive. Nobody has the right to criticize developers for this, but the result is predictable: like many types of free software, free accounting systems tend to lack the user-level work needed to make them truly competitive with proprietary alternatives.
Still, it is not yet time to give up on this search; the list of candidate systems is not yet empty. Stay tuned as the quest to find a free accounting system that can displace the proprietary alternatives continues.
Container IDs for the audit subsystem
Linux containers are something of an amorphous beast, at least with respect to the kernel. There are lots of facilities that the kernel provides (namespaces, control groups, seccomp, and so on) that can be composed by user-space tools into containers of various shapes and colors; the kernel is blissfully unaware of how user space views that composition. But there is interest in having the kernel be more aware of containers and for it to be able to distinguish what user space considers to be a single container. One particular use case for the kernel managing container identifiers is the audit subsystem, which needs unforgeable IDs for containers that can be associated with audit trails.
Back in early October, Richard Guy Briggs posted the second version of his RFC for kernel container IDs that can be used by the audit subsystem. The first version was posted in mid-September, but is not the only proposal out there. David Howells proposed turning containers into full-fledged kernel objects back in May, but seemingly ran aground on objections that the proposal "muddies the waters and makes things more brittle", in the words of namespaces maintainer Eric W. Biederman.
Briggs's proposal is focused on the needs of the audit subsystem, rather than trying to solve any larger problem, however. He described some of the problems for the audit subsystem in a 2016 Linux Security Summit talk. In addition, he laid out some of the requirements for container tracking in response to a query from Carlos O'Donell about the first RFC:
- ability to filter unwanted, irrelevant or unimportant messages before they fill queue so important messages don't get lost. This is a certification requirement.
- ability to make security claims about containers, require tracking of actions within those containers to ensure compliance with established security policies.
- ability to route messages from events to relevant audit daemon instance or host audit daemon instance or both, as required or determined by user-initiated rules
As proposed, audit container IDs would be handled as follows. A container orchestration system would register the ID of a container (a 16-byte UUID) by writing to a special file in the /proc directory for the container's initial process. Briggs proposes a new capability (CAP_CONTAINER_ADMIN) that would be required for a process to be able to register a container ID, but no process would be able to change its own container ID even with the capability.
Registering the container ID would associate the process ID (PID) of the first process (in the initial PID namespace) and all of that process's namespaces (using the namespace filesystem device and inode numbers) with the ID in an AUDIT_CONTAINER record that gets logged. The container IDs would then be used in various audit log messages to associate auditable events with the container that performed them. Any child processes would inherit the container ID of their parent so that all of the processes and threads in a container would be associated with its ID. If the first process has already forked or created threads, the registration would either fail or all of the child processes/threads would be associated with the ID; the right course will be determined as part of the RFC and implementation process.
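A rough sketch of what registration might look like from an orchestrator's point of view follows; note that the "containerid" file name, and the idea of writing the UUID as text, are assumptions based on the discussion rather than a settled interface.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /*
     * Hypothetical sketch: register an audit container ID by writing a
     * 16-byte UUID to a per-process proc file, as described in the RFC.
     * The "containerid" file name is an assumption.
     */
    int register_container_id(pid_t init_pid, const char *uuid)
    {
        char path[64];
        int fd;

        snprintf(path, sizeof(path), "/proc/%d/containerid", init_pid);
        fd = open(path, O_WRONLY);      /* would require the proposed capability */
        if (fd < 0)
            return -1;
        if (write(fd, uuid, strlen(uuid)) < 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }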
Audit events would be generated for all namespace creation and destruction operations. Creation events would be associated with the container ID of the process performing the action; destruction events occur when there are no more references to a namespace, so only the device and inode of the destroyed namespace would be logged. Changes to a process's namespaces would also generate an audit event that records the new and old namespace information.
The new capability for container IDs was one of the first things questioned about the proposal. Casey Schaufler asked how there could be a kernel container capability when the RFC clearly states that the kernel knows nothing about containers. Briggs likened container IDs to login user IDs and session IDs "that the kernel tracks for the convenience of userspace". He suggested that if the CAP_CONTAINER_ADMIN name was the problem, he would be fine with something like CAP_AUDIT_CONTAINERID, but that was not the core of Schaufler's complaint:
If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's more than audit behavior you have to define what system security policy you're dealing with in order to pick the right capability.
We get this request pretty regularly. "I need my own capability because I have a niche thing that isn't part of the system security policy but that is important!" Fit the containerID into the system security policy, and if that results in using CAP_SYS_ADMIN, oh well.
There already are two capabilities for the audit subsystem (CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE) but, as Paul Moore explained, neither is quite right to govern the ability to register container IDs.
James Bottomley suggested sidestepping the capability question by making the container ID a write-once attribute; once set, nothing could change it. The idea of nested containers came up several times, though, which would require some way to change these container IDs. Bottomley suggested simply allowing appends to the container ID, so that the hierarchy is inherent in the chain of IDs. Moore agreed that write-once would work for the non-nested case.
But Aleksa Sarai pointed out that nested containers are a fairly common use case, for LXC system containers in particular (which will often have other container runtimes running inside them). Biederman noted that there is not, as yet, a solution for running the audit daemon in containers, so it may be premature to worry about nested container IDs at this point.
Schaufler is concerned that adding an ID for auditing containers is heading down the wrong path. He suggested the ptags Linux Security Module as a way forward; it would allow arbitrary tags with values to be set for a process.
Moore stressed that the effort was not aimed at a more general mechanism, but simply to address the needs of the audit subsystem at this point. He said that the ID is meant to be an "audit container ID" and not a more general "container ID". Using the audit ID for other purposes risks opening up problems in other areas (such as container migration), so he and Briggs are attempting to restrict the use cases.
At this point, there is no code on the table; it is purely a discussion of where things should go. Adding a new capability for registering these IDs seems to be a non-starter; the write-once scheme governed by one of the existing audit capabilities seems like it might plausibly pass muster. As Moore said, there seems to be a bigger need here, but more general solutions have so far been hard to come by. Adding IDs willy-nilly may be suboptimal but, until something more general comes along, might just be the right way forward.
Restricting automatic kernel-module loading
The kernel's module mechanism allows the building of a kernel with a wide range of hardware and software support without requiring that all of that code actually be loaded into any given running system. The availability of all of those modules in a typical distributor kernel means that a lot of features are available — but also, potentially, a lot of exploitable bugs. There have been numerous cases where the kernel's automatic module loader has been used to bring buggy code into a running system. An attempt to reduce the kernel's exposure to buggy modules shows how difficult some kinds of hardening work can be.
Module autoloading
There are two ways in which a module can be loaded into the kernel without explicit action on the administrator's part. On most contemporary systems, it happens when hardware is discovered, either by a bus driver (on buses that support discovery) or from an external description like a device tree. Discovery causes an event to be sent to user space, where a daemon like udev applies whatever policies have been configured and loads the appropriate modules. This mechanism is driven by the available hardware and is relatively hard for an attacker to influence.
Within the kernel, though, lurks an older mechanism, in the form of the request_module() function. When a kernel function determines that a needed module is missing, it can call request_module() to send a request to user space to load the module in question. For example, if an application opens a char device with a given major and minor number and no driver exists for those numbers, the char device code will attempt to locate a driver by calling:
request_module("char-major-%d-%d", MAJOR(dev), MINOR(dev));
If a driver module has declared an alias with matching numbers, it will be automatically loaded into the kernel to handle the open request.
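The driver side of that handshake is a module alias declaration. A minimal sketch (the major number here is arbitrary, chosen only for illustration):

    #include <linux/module.h>

    /*
     * MODULE_ALIAS_CHARDEV_MAJOR() expands to
     * MODULE_ALIAS("char-major-<major>-*"), which is the string that the
     * request_module() call shown above searches for.
     */
    MODULE_ALIAS_CHARDEV_MAJOR(199);    /* 199 is an arbitrary example major */
    MODULE_LICENSE("GPL");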
There are hundreds of request_module() calls in the kernel. Some are quite specific; one will load the ide-tape module should the user be unfortunate enough to have such a device. Others are more general; there are many calls in the networking subsystem, for example, to locate modules implementing specific network protocols or packet-filtering mechanisms. While the device-specific calls have been mostly supplanted by the udev mechanism, modules for features like network protocols still rely on request_module() for user-transparent automatic loading.
Autoloading makes for convenient system administration, but it can also make for convenient system exploitation. The DCCP protocol vulnerability disclosed in February, for example, is not exploitable if the DCCP module is not loaded in the kernel — which is normally the case, since DCCP has few users. But the autoloading mechanism allows any user to force that module to be loaded simply by creating a DCCP socket. Autoloading thus widens the kernel's attack surface to include anything in a module that unprivileged users can cause to be loaded — and there are a lot of modules in a typical distributor kernel.
Tightening the system
Djalal Harouni has been working on a patch set aimed at reducing the exposure from autoloading; the most recent version was posted on November 27. Harouni's work takes inspiration from the hardening found in the grsecurity patch set, but takes no code from there. In this incarnation (it has changed somewhat over time), it adds a new sysctl knob (/proc/sys/kernel/modules_autoload_mode) that can be used to restrict the kernel's autoloading mechanism. If this knob is set to zero (the default), autoloading works as it does in current kernels. Setting it to one restricts autoloading to processes with specific capabilities: processes with CAP_SYS_MODULE can cause any module to be loaded, while those with CAP_NET_ADMIN can autoload any module whose alias starts with netdev-. Setting this knob to two disables autoloading entirely. Once this value has been raised above zero, it cannot be lowered during the lifetime of the system.
The patch set also implements a per-process flag that could be set with the prctl() system call. This flag (which takes the same values as the global flag) could restrict autoloading for a specific process and all of its descendants without changing module-loading behavior in the system overall.
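In rough terms, a container manager might use that flag like this before launching the container's workload; the option name and value below are placeholders standing in for whatever constants the patch set actually defines:

    #include <stdio.h>
    #include <sys/prctl.h>

    /* Placeholder constant; the real name and value come from the patch set. */
    #ifndef PR_SET_MODULES_AUTOLOAD_MODE
    #define PR_SET_MODULES_AUTOLOAD_MODE 0x59454c44
    #endif

    int main(void)
    {
        /*
         * Mode 1: only capability-holding processes may trigger autoloading
         * for this process and its descendants; mode 2 would disable
         * autoloading entirely.
         */
        if (prctl(PR_SET_MODULES_AUTOLOAD_MODE, 1, 0, 0, 0) != 0)
            perror("prctl");

        /* ... drop privileges and exec the container payload here ... */
        return 0;
    }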
It is safe to say that this patch set will not be merged in its current form for a simple reason: Linus Torvalds strongly disliked it. Disabling autoloading is likely to break a lot of systems, meaning that distributors will be unwilling to enable this option and it will not see much use. "A security option that people can't use without breaking their system is pointless", he said. The discussion got heated at times, but Torvalds is not opposed to the idea of reducing the kernel's exposure to autoloaded vulnerabilities. It was just a matter of finding the right solution.
The per-process flag looks like it could be a part of that solution. It
could be used, for example, to restrict autoloading for code running within
a container while leaving the system as a whole unchanged. It is not
uncommon to create a process within a container with the
CAP_NET_ADMIN capability to configure that container's networking
while wanting most of the code running in the container to be unable to
force module loading.
But, Torvalds said, a single flag will
never be able to properly control all of the situations where autoloading
comes into play. Some modules should perhaps always be loadable, while
others may need a specific capability. So he suggested retaining the
request_module_cap() function added by Harouni's patch set (which
performs the load only if a specific capability is present) and using it
more widely. But he did have a couple of changes to request.
The first is that request_module_cap() shouldn't actually block
module loading if the needed capability is absent — at least not
initially. Instead, it should log a message. That will allow a study of
where module autoloading is actually needed that would, with luck,
point out the places where autoloading could be restricted without breaking
existing systems. He also suggested that
the capability check is too simplistic. For example, the
"char-major-" autoload described above only happens if a process
is able to open a device node with the given major and minor numbers. In
such cases, a permission test (the ability to open that special file) has
already been passed and the module should load unconditionally. So there
may need to be other variants of request_module() to describe
settings where capabilities do not apply.
Finally, Torvalds had another idea, based on the observation that the worst bugs tend to lurk in modules that are poorly
maintained at best. The DCCP module mentioned above, for example, is known
to be little used and nearly unmaintained. If the modules that are
well maintained were marked with a special flag, it might be possible to
restrict unprivileged autoloading to those modules only. That would
prevent the autoloading of some of the cruftier modules while not breaking
autoloading
in general. This idea does raise one question that nobody asked, though:
when a module ceases being maintained, who will maintain it well enough to
remove the "well maintained" flag?
In any case, that flag will probably not be added right away, if this proposed plan from Kees Cook holds. He
suggested starting with the request_module_cap() approach with
warnings enabled. The per-process flag would be added for those who can
use it, but the global knob to restrict autoloading would not. Eventually
it might be possible to get rid of unprivileged module loading, but that
will be a goal for the future. The short-term benefit would be better
information about how autoloading is actually used and the per-process
option for administrators who want to tighten things down now.
This conversation highlights one of the fundamental tensions that can be
found around kernel hardening work. Few people are opposed to a more
secure kernel, but things get much more difficult as soon as the hardening
work can break existing systems — and that is often the case.
Security-oriented developers often get frustrated with the kernel
community's resistance to hardening changes with user-visible impacts,
while kernel
developers have little sympathy for changes that will lead to bug reports
and unhappy users. Some of those frustrations surfaced in this discussion,
but most of the developers involved were interested in converging on a solution that works for everybody.
A thorough introduction to eBPF
In his linux.conf.au 2017 talk [YouTube] on the eBPF in-kernel virtual machine, Brendan Gregg proclaimed that "super powers have finally come to Linux". Getting eBPF to that point has been a long road of evolution and design. While eBPF was originally used for network packet filtering, it turns out that running user-space code inside a sanity-checking virtual machine is a powerful tool for kernel developers and production engineers. Over time, new eBPF users have appeared to take advantage of its performance and convenience. This article explains how eBPF evolved, how it works, and how it is used in the kernel.
The evolution of eBPF
The original Berkeley Packet Filter (BPF) [PDF] was designed for capturing and filtering network packets that matched specific rules. Filters are implemented as programs to be run on a register-based virtual machine.
The ability to run user-supplied programs inside of the kernel proved
to be a useful design decision but other aspects of the original BPF
design didn't hold up so well. For one, the design of the virtual
machine and its instruction set architecture (ISA) were left behind as
modern processors moved to 64-bit registers and invented new
instructions required for multiprocessor systems, like the atomic
exchange-and-add instruction (XADD). BPF's focus on providing a small
number of RISC instructions no longer matched the realities of modern
processors.
So, Alexei Starovoitov introduced the extended
BPF (eBPF)
design to take advantage of
advances in modern hardware. The eBPF virtual machine more closely
resembles contemporary processors, allowing eBPF instructions to be
mapped more closely to the hardware ISA for improved performance.
One of the most notable changes was a move to 64-bit registers and an
increase in the number of registers from two to ten. Since modern
architectures have far more than two registers, this allows parameters
to be passed to functions in eBPF virtual machine registers, just like
on native hardware. Plus, a new BPF_CALL instruction made it possible
to call in-kernel functions cheaply.
The ease of mapping eBPF to native instructions lends itself to
just-in-time compilation, yielding improved performance. The
original
patch that added support for
eBPF in the 3.15 kernel showed that eBPF was up to four times faster on
x86-64 than
the old classic BPF (cBPF) implementation for some network filter
microbenchmarks, and most were 1.5 times faster.
Many architectures support the just-in-time (JIT) compiler (x86-64, SPARC,
PowerPC, ARM, arm64, MIPS, and s390).
Originally, eBPF was only used internally by the kernel and cBPF programs
were translated seamlessly under the hood. But with commit
daedfb22451d in 2014, the eBPF virtual machine was exposed directly to user space.
What can you do with eBPF?
An eBPF program is "attached" to a designated code path in the kernel.
When the code path is traversed, any attached eBPF programs are
executed. Given its origin, eBPF is especially suited to writing
network programs and it's possible to write programs that attach to a
network socket to filter traffic, to classify
traffic, and to run network classifier actions. It's even possible to modify the settings of an
established network socket with an eBPF program.
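As a small illustration of the socket-filter case: once a BPF_PROG_TYPE_SOCKET_FILTER program has been loaded with bpf() and its file descriptor is in hand (prog_fd below), attaching it is a single setsockopt() call. This is a sketch with error handling trimmed, not the only way to do it:

    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef SO_ATTACH_BPF
    #define SO_ATTACH_BPF 50    /* in case the C library headers predate it */
    #endif

    /* Attach an already-loaded eBPF socket-filter program to a raw socket. */
    int attach_bpf_filter(int prog_fd)
    {
        int sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        if (sock < 0)
            return -1;
        /* SO_ATTACH_BPF takes the program's file descriptor. */
        if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF,
                       &prog_fd, sizeof(prog_fd)) < 0) {
            perror("SO_ATTACH_BPF");
            return -1;
        }
        return sock;
    }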
The XDP project, in
particular, uses eBPF to do high-performance packet processing by
running eBPF programs at the lowest level of the network stack,
immediately after a packet is received.
Another type of filtering performed by the kernel is restricting which
system calls a process can use. This is done with seccomp BPF.
eBPF is also useful for debugging the kernel and carrying out
performance analysis; programs can be attached to tracepoints,
kprobes, and perf events. Because eBPF programs can access kernel data
structures, developers can write and test new debugging code without
having to recompile the kernel. The implications are obvious for busy
engineers debugging issues on live, running systems. It's even
possible to use eBPF to debug user-space programs by using Userland
Statically Defined Tracepoints.
The power of eBPF flows from two advantages: it's fast and it's safe.
To fully appreciate it, you need to understand how it works.
The eBPF in-kernel verifier
There are inherent security and stability risks with allowing user-space
code to run inside the kernel. So, a number of checks are
performed on every eBPF program before it is loaded.
The first test ensures that the eBPF program terminates and does not
contain any loops that could cause the kernel to lock up.
This is checked by doing a depth-first search of the program's control
flow graph (CFG). Unreachable instructions are strictly prohibited;
any program that contains unreachable instructions will fail to load.
The second stage is more involved and requires the verifier to
simulate the execution of the eBPF program one instruction at a time.
The virtual machine state is checked before and after the execution of
every instruction to ensure that register and stack state are valid.
Out of bounds jumps are prohibited, as is accessing out-of-range data.
The verifier doesn't need to walk every path in the program; it's smart enough to know when the current state of the program is a subset
of one it's already checked. Since all previous paths must be valid
(otherwise the program would already have failed to load), the current
path must also be valid. This allows the verifier to "prune" the
current branch and skip its simulation.
The verifier also has a "secure mode" that prohibits pointer
arithmetic. Secure mode is enabled whenever a user without the
CAP_SYS_ADMIN privilege loads an eBPF program. The idea is to make
sure that kernel addresses do not leak to unprivileged users and that
pointers cannot be written to memory.
If secure mode is not enabled, then pointer arithmetic is allowed but only
after additional
checks are performed. For example, all pointer accesses are checked
for type, alignment, and bounds violations.
Registers with uninitialized contents (those that have never been written to) cannot be read; doing so causes the program load to fail. The contents of registers R0-R5 are marked as unreadable across function calls by storing a special value to catch
any reads of an uninitialized register. Similar checks are done for
reading variables on the stack and to make sure that no instructions write to
the read-only frame-pointer register.
Lastly, the verifier uses the eBPF program type (covered later) to
restrict which kernel functions can be called from eBPF programs and which
data structures can be accessed. Some program types are allowed to
directly access network packet data, for example.
The bpf() system call
Programs are loaded using the bpf() system call with the BPF_PROG_LOAD command. The prototype of the system call is:
    int bpf(int cmd, union bpf_attr *attr, unsigned int size);
The bpf_attr union allows data to be passed between the kernel and
user space; the exact format depends on the cmd argument. The
size
argument gives the size of the bpf_attr union object in bytes.
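As a minimal, hedged example of what user space passes in: the following loads a trivial "return 0" socket filter by filling in the BPF_PROG_LOAD fields of bpf_attr and invoking the system call directly (there is no C library wrapper for bpf()):

    #include <linux/bpf.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Load a trivial "return 0" socket filter; returns a program fd or -1. */
    static int load_trivial_filter(void)
    {
        /* Two instructions: mov r0, 0 ; exit. */
        struct bpf_insn prog[] = {
            { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
            { .code = BPF_JMP | BPF_EXIT },
        };
        static char log[4096];
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
        attr.insns     = (uintptr_t)prog;
        attr.insn_cnt  = sizeof(prog) / sizeof(prog[0]);
        attr.license   = (uintptr_t)"GPL";
        attr.log_buf   = (uintptr_t)log;
        attr.log_size  = sizeof(log);
        attr.log_level = 1;    /* ask the verifier for a log on failure */

        return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    }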
Commands are available for creating and modifying eBPF maps; maps are the
generic key/value data structure used for
communicating between eBPF programs and the kernel or user space. Additional
commands allow attaching eBPF programs to a control-group directory or socket
file descriptor, iterating over all maps and programs, and pinning eBPF
objects to files
so that they're not destroyed when the process that loaded
them terminates (the latter is used by the tc classifier/action code
so that eBPF programs persist without requiring the loading process to
stay alive).
The full list of commands can be found in the bpf() man
page.
Though there appear to be many different commands, they can be
broken down into three categories: commands for working with eBPF
programs, working with eBPF maps, or commands for working with both
programs and maps (collectively known as objects).
eBPF program types
The type of program loaded with BPF_PROG_LOAD dictates four things:
where the program can be attached,
which in-kernel helper functions the verifier will allow to be called,
whether network packet data can be accessed directly, and the type of
object passed as the first argument to the program. In fact, the
program type essentially defines an API.
New program types have even been created purely to distinguish between
different lists of allowed callable functions
(BPF_PROG_TYPE_CGROUP_SKB versus
BPF_PROG_TYPE_SOCKET_FILTER, for example).
The kernel supports a growing set of eBPF program types; the full list can be found in the bpf() man page.
As new program types were added, kernel developers discovered a need
to add new data structures too.
eBPF data structures
The main data structure used by eBPF programs is the eBPF map,
a generic data structure that allows data to be passed back
and forth within the kernel or between the kernel and user space. As
the name "map" implies, data is stored and retrieved using a key.
Maps are created and manipulated using the bpf() system call. When a
map is successfully created, a file descriptor associated with that
map is returned. Maps are normally destroyed by closing the
associated file descriptor.
Each map is defined by four values: a type, a maximum number of elements, a
value size in bytes, and a key size in bytes. There are different map
types, each providing a different behavior and set of tradeoffs.
All maps can be accessed from eBPF or user-space programs using the
bpf_map_lookup_elem() and
bpf_map_update_elem() functions. Some map types, such as socket maps,
work with additional eBPF helper functions that perform special tasks.
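From user space, the same operations are also reachable through the raw bpf() commands; here is a short sketch of creating a hash map and inserting one element that way (the field names are those of union bpf_attr in <linux/bpf.h>):

    #include <linux/bpf.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Create a small hash map and store one key/value pair in it. */
    static int map_example(void)
    {
        union bpf_attr attr;
        uint32_t key = 1, value = 42;
        int map_fd;

        /* BPF_MAP_CREATE: type, key/value sizes, and capacity. */
        memset(&attr, 0, sizeof(attr));
        attr.map_type    = BPF_MAP_TYPE_HASH;
        attr.key_size    = sizeof(key);
        attr.value_size  = sizeof(value);
        attr.max_entries = 16;
        map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
        if (map_fd < 0)
            return -1;

        /* BPF_MAP_UPDATE_ELEM: insert (or overwrite) key -> value. */
        memset(&attr, 0, sizeof(attr));
        attr.map_fd = map_fd;
        attr.key    = (uintptr_t)&key;
        attr.value  = (uintptr_t)&value;
        attr.flags  = BPF_ANY;
        return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
    }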
How to write an eBPF program
Historically, it was necessary to write eBPF assembly by hand and use
the kernel's bpf_asm assembler to generate BPF bytecode.
Fortunately, the LLVM Clang compiler has grown support for an eBPF backend that
compiles C into bytecode. Object files containing this bytecode can
then be directly loaded with the bpf() system call and
BPF_PROG_LOAD command.
You can write your own eBPF program in C by compiling with Clang
using the -march=bpf parameter. There are plenty of eBPF program
examples in the kernel's samples/bpf/
directory; the majority have a "_kern.c" suffix in their file name.
The object file (eBPF bytecode) emitted by Clang needs to be
loaded by a program that runs natively on your machine (these samples
usually have "_user.c" in their filename). To make it easier to write
eBPF programs, the kernel provides the libbpf library, which includes
helper functions for loading programs and creating and manipulating
eBPF objects.
For example, the high-level flow of an eBPF program and user program
using libbpf
might go something like:
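Here is a sketch of that flow, using the load_bpf_file() helper that the in-tree samples rely on (the helper name and the example object-file name are assumptions drawn from samples/bpf, not a stable API):

    #include <stdio.h>
    #include "bpf_load.h"    /* samples/bpf helper: loads maps and programs */

    int main(void)
    {
        /*
         * 1. Load the clang-compiled "_kern.o" object; this creates the
         *    maps it declares and loads its programs via BPF_PROG_LOAD.
         */
        if (load_bpf_file("example_kern.o")) {
            fprintf(stderr, "failed to load example_kern.o\n");
            return 1;
        }

        /*
         * 2. Attach the resulting program file descriptor(s) to the
         *    relevant hook: a socket, tracepoint, kprobe, cgroup, etc.
         *
         * 3. Exchange data with the kernel side through the maps, using
         *    the BPF_MAP_*_ELEM commands described above.
         *
         * 4. On exit, close the file descriptors (or pin the objects
         *    first if they should outlive this process).
         */
        return 0;
    }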
However, all of the sample code suffers from one major drawback: you
need to compile your eBPF program from within the kernel source tree.
Luckily, the BCC project was created to solve this problem. It includes
a complete toolchain for writing eBPF programs and loading them
without linking against the kernel source tree.
BCC is covered in the next article in this series.
Mozilla releases tools and data for speech recognition
Voice computing has long been a staple of science fiction, but it has
only relatively recently made its way into fairly common mainstream use.
Gadgets like mobile
phones and "smart" home assistant devices (e.g. Amazon Echo, Google Home)
have brought voice-based user interfaces to the masses. The voice
processing for those gadgets relies on various proprietary services "in the
cloud", which generally leaves the free-software world out in the cold.
There have
been FOSS speech-recognition efforts over
the years, but Mozilla's recent
announcement of the release of its voice-recognition code and voice
data set should help further the goal of FOSS voice interfaces.
There are two parts to the release, DeepSpeech, which is a
speech-to-text (STT) engine and model, and Common
Voice, which is a set of voice data that can be used to train
voice-recognition systems. While DeepSpeech is available for those who
simply want to do some kind of STT task, Common Voice is meant for those
who want to create their own voice-recognition system—potentially one that
does even better (or better for certain types of applications) than DeepSpeech.
DeepSpeech
The DeepSpeech project is based on two papers from Chinese
web-services company Baidu; it uses a neural
network implemented using Google's TensorFlow. As detailed in a blog
post by Reuben Morais, who works in the Machine Learning
Group at Mozilla Research, several data sets were used to train
DeepSpeech, including transcriptions
of TED talks, LibriVox audio books from the LibriSpeech corpus, and data from
Common Voice; two proprietary
data sets were also mentioned, but it is not clear how much of that was
used in the final DeepSpeech model. The goal was to have a word error
rate of less than 10%, which was met: "Our word error rate on LibriSpeech's test-clean set is 6.5%, which not only achieves our initial goal, but gets us close to human level performance."
A "human level" word error rate is 5.83%, according to the Baidu papers, Morais said, so 6.5% is fairly impressive. Running the model has reasonable performance as well, though getting it to the point where it can run on a Raspberry Pi or mobile device is desired.
The blog post goes into a fair amount of detail that will be of interest to
those who are curious about machine learning. It is clear that doing this
kind of
training is not for the faint of heart (or those with small wallets). It
is a computationally intensive task that takes a fairly sizable amount of
time even using specialized hardware:
We started with a single machine running four Titan X Pascal GPUs, and then
bought another two servers with 8 Titan XPs each. We run the two 8 GPU
machines as a cluster, and the older 4 GPU machine is left independent to
run smaller experiments and test code changes that require more compute
power than our development machines have. This setup is fairly efficient,
and for our larger training runs we can go from zero to a good model in
about a week.
Common Voice
Because the machine-learning group had trouble finding quality data sets for training DeepSpeech, Mozilla started the Common Voice project to help create one. The first release of data from the project is the subject of a blog post from Michael Henretty. The data, which was collected from volunteers and has been released into the public domain, is quite expansive: "This collection contains nearly 400,000 recordings from 20,000 different people, resulting in around 500 hours of speech." In fact, it is the second largest publicly available data set; it is also growing daily as people add and validate new speech samples.
The initial release is only for the English language, but there are plans to support adding speech in other languages. The announcement noted that a diversity of voices is important for Common Voice:
To this end, while we've started with English, we are working hard to ensure that Common Voice will support voice donations in multiple languages beginning in the first half of 2018.
The Common Voice site has links to other voice data sets (also all in English, so far). There is also a validation application on the home page, which allows visitors to listen to a sentence to determine if the speaker accurately pronounced the words. There are no real guidelines for how forgiving one should be (and just simple "Yes" and "No" buttons), but crowdsourcing the validation should help lead to a better data set. In addition, those interested can record their own samples on the web site.
A blog post announcing the Common Voice project (but not the data set, yet) back in July outlines some of the barriers to entry for those wanting to create STT applications. Each of the major browsers has its own API for supporting STT applications; as might be guessed, Mozilla is hoping that browser makers will instead rally around the W3C Web Speech API. That post also envisions a wide array of uses for STT technology:
It's fun to think about where this work might lead. For instance, how might we use silent speech interfaces to keep conversations private? If your phone could read your lips, you could share personal information without the person sitting next to you at a café or on the bus overhearing. Now that's a perk for speakers and listeners alike.
While applications for voice interfaces abound (even if only rarely used by ever-increasing Luddites such as myself), there are, of course, other problems to be solved before we can throw away our keyboard and mouse. Turning speech into text is useful, but there is still a need to derive meaning from the words. Certain applications will be better suited than others to absorb voice input, and Mozilla's projects will help them do so. Text to speech has been around for some time, and there are free-software options for that, but full-on, general purpose voice interfaces will probably need a boost from artificial intelligence—that is likely still a ways out.
Page editor: Jonathan Corbet