Privacy and the Google/Apple Exposure Notification System

by Steve Marx on September 3, 2020

Earlier this year, Google and Apple teamed up to create the Exposure Notification System (ENS)¹, a digital contact tracing system for the COVID-19 pandemic. When someone who uses ENS is diagnosed with COVID-19, the system notifies other users who have had recent contact with the sick person.

Amazingly, this is done while preserving participants’ privacy.

In this post, I’d like to reinvent the ENS protocol step by step. This is a good way to appreciate the system’s design and understand its privacy-preserving properties.

Tracking contact without location or identity

One easy way to find out who had contact with whom is to log everybody’s location on a server. If Alice and Bob were both at the same location at the same time, then when Alice is diagnosed with COVID-19, Bob should be notified. This type of system is what a lot of people imagine when they first hear about digital contact tracing.

Fortunately for our privacy, location and identity are both unnecessary. For Bob to be notified that he was exposed, the system just needs to know that he came in contact somewhere with someone who had the disease.

The idea behind ENS is to use cellphones as proxies for humans. If two cellphones are close together, that’s a good indication that their owners are also close together. The phones themselves can notice their proximity using Bluetooth Low Energy (BLE), a wireless technology for local communication. If two phones can communicate via BLE, they’re close together. The signal strength can be used to guess how close.

As long as the phones broadcast a randomly chosen device ID and not the real identity of the user, this also eliminates the need to tie identities to the contact events.

At this point, I can propose a bad solution for exposure notification. To be clear, this is not how ENS works, but it will serve as a starting point for discussion:

Every phone broadcasts a device ID.
When a phone observes another phone’s broadcast, it sends a timestamped event to the server noting the two device IDs in contact.
When a user is diagnosed with COVID-19, they send their device ID to the server.
The server sends exposure notifications to all devices that had contact with the infected user’s device ID (during, say, the past 14 days).
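To make the straw man concrete, here’s a minimal Python sketch of what that server might do. The class and method names are hypothetical, and this is the design ENS deliberately avoids. Notice that the server ends up holding the entire contact graph.

```python
from datetime import datetime, timedelta


class StrawManServer:
    """Hypothetical centralized server from the straw-man design (not ENS)."""

    def __init__(self, window_days=14):
        self.window = timedelta(days=window_days)
        self.contacts = []  # every (timestamp, device_a, device_b) ever reported

    def report_contact(self, timestamp, device_a, device_b):
        # A phone reports that it observed another phone's broadcast.
        self.contacts.append((timestamp, device_a, device_b))

    def report_diagnosis(self, device_id, now):
        # Return every device that had contact with device_id inside the window.
        exposed = set()
        for ts, a, b in self.contacts:
            if now - ts > self.window:
                continue
            if a == device_id:
                exposed.add(b)
            elif b == device_id:
                exposed.add(a)
        return exposed


server = StrawManServer()
now = datetime(2020, 9, 3)
server.report_contact(now - timedelta(days=2), "alice-phone", "bob-phone")
server.report_contact(now - timedelta(days=20), "alice-phone", "carol-phone")
print(server.report_diagnosis("alice-phone", now))  # {'bob-phone'}
```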

Pseudonymity isn’t anonymity

The above proposal uses pseudonyms in the form of the device IDs. Pseudonyms are helpful in terms of privacy, but they still have major drawbacks:

Pseudonyms still allow tracking. For example, a grocery store might use a BLE beacon to keep track of device movement throughout the store.² An app on your phone might use this data to offer you targeted ads, even if it doesn’t know your true identity.
Pseudonyms can be stripped away. A BLE beacon at the cash register could be used to associate your pseudonymous device ID with the name on your credit card. Now all the collected data for your device ID can be associated with your real identity.

This is actually already a problem for cellphones, so they use rotating MAC addresses when they broadcast advertising packets. These addresses are only around for about ten minutes at a time, so they can’t be used to correlate a user’s activity over time.³

Ephemeral identities

I can improve on my straw man design by adopting the same solution OS developers already use. Instead of a persistent device ID, I’ll use a random ID that changes every ten minutes. This gives users something a lot closer to true anonymity. If a user of such a system walked through a grocery store, they’d look to BLE beacons like a completely different person by the time they reached the cash register.

Unfortunately, this change breaks the earlier design. When someone is diagnosed with COVID-19, the server needs to notify everyone who had contact with that person. But given the new ephemeral identities, there’s no way for the server to know who to notify.

This actually presents an opportunity to do even better in terms of privacy. It’s possible to make this all work without telling the server about contact events at all. Here’s a new proposal, which is pretty close to how ENS actually works:

Every phone broadcasts an ephemeral ID, which is randomly chosen and changes every 10 minutes. Phones remember their past ephemeral IDs for 14 days.
When a phone sees another phone’s ID, it records that ID locally, also for 14 days.
When someone is diagnosed with COVID-19, they upload their recent ephemeral IDs to a server, which makes these publicly available.
Every phone downloads those IDs each day and compares them with the IDs it remembers seeing. If there’s a match, the user is notified of potential exposure.
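Here’s a minimal Python sketch of this decentralized proposal, assuming a hypothetical `Phone` class and plain integer seconds for time. It illustrates the idea rather than ENS’s actual code. The key point is that contact events never leave the phone; only a diagnosed user’s own IDs are published.

```python
import secrets

ROTATION_SECONDS = 10 * 60             # new ephemeral ID every 10 minutes
RETENTION_SECONDS = 14 * 24 * 60 * 60  # remember IDs for 14 days


class Phone:
    """Hypothetical phone in the decentralized design; times are plain seconds."""

    def __init__(self):
        self.my_ids = {}    # IDs this phone broadcast -> when first used
        self.seen_ids = {}  # IDs heard from nearby phones -> when heard
        self.current_id = None
        self.current_since = None

    def broadcast_id(self, now):
        # Pick a fresh random 16-byte ID once the current one is 10 minutes old.
        if self.current_id is None or now - self.current_since >= ROTATION_SECONDS:
            self.current_id = secrets.token_bytes(16)
            self.current_since = now
            self.my_ids[self.current_id] = now
        return self.current_id

    def hear(self, ephemeral_id, now):
        # Record a nearby phone's ID locally; nothing is sent to a server.
        self.seen_ids[ephemeral_id] = now

    def _forget_old(self, now):
        # Drop anything older than the 14-day retention window.
        self.my_ids = {i: t for i, t in self.my_ids.items() if now - t < RETENTION_SECONDS}
        self.seen_ids = {i: t for i, t in self.seen_ids.items() if now - t < RETENTION_SECONDS}

    def ids_to_upload(self, now):
        # Called after a diagnosis: these IDs are published for everyone.
        self._forget_old(now)
        return list(self.my_ids)

    def check_exposure(self, published_ids, now):
        # Daily check: did we hear any ID that an infected user published?
        self._forget_old(now)
        return any(i in self.seen_ids for i in published_ids)


alice, bob, carol = Phone(), Phone(), Phone()
bob.hear(alice.broadcast_id(now=0), now=0)        # Alice and Bob are near each other
carol.hear(bob.broadcast_id(now=3600), now=3600)  # Carol is only ever near Bob
published = alice.ids_to_upload(now=7200)         # Alice is diagnosed and uploads
print(bob.check_exposure(published, now=7200))    # True
print(carol.check_exposure(published, now=7200))  # False
```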

A pragmatic privacy tradeoff

Until this point, I’ve focused exclusively on the system’s privacy properties, in an effort to motivate the overall design. But the actual ENS protocol differs slightly from what I’ve described, and that difference has a material effect on privacy. To understand this final detail, it’s necessary to do some practical analysis.

Let’s do some arithmetic

The last step of the current design is that every phone downloads every ephemeral ID for infected people. Ephemeral IDs change every 10 minutes, so there are 6 IDs per hour. Assuming someone gets a diagnosis within 10 days of their earliest infection date, each infected user uploads 10 days’ worth of ephemeral IDs. That’s 6 IDs per hour × 24 hours per day × 10 days = 1,440 IDs.

So how much data does each phone need to download per day? At the time I’m writing this, in the US, we’ve seen an average of about 40,000 cases per day over the past week.⁴ It’s easy to break this up by region or state, though, so I’ll arbitrarily say that each person needs to care about 1,000 cases per day.

1,000 cases per day × 1,440 IDs per case = 1.44 million IDs to download each day. ENS uses 16-byte IDs, a size that balances keeping IDs small against keeping the probability of collisions low. 1.44 million IDs × 16 bytes per ID = 23MB of data per day.
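The same arithmetic as a few lines of Python, using the assumptions above (10 days of uploads per case, 1,000 relevant cases per day, 16-byte IDs):

```python
IDS_PER_HOUR = 6       # one ephemeral ID per 10-minute interval
HOURS_PER_DAY = 24
DAYS_UPLOADED = 10     # assumed days between earliest infection and diagnosis
CASES_PER_DAY = 1_000  # arbitrary regional figure
ID_BYTES = 16

ids_per_case = IDS_PER_HOUR * HOURS_PER_DAY * DAYS_UPLOADED
ids_per_day = CASES_PER_DAY * ids_per_case
bytes_per_day = ids_per_day * ID_BYTES

print(ids_per_case)               # 1440
print(ids_per_day)                # 1440000
print(bytes_per_day / 1_000_000)  # 23.04 (megabytes)
```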

The tradeoff

That’s a lot of data, especially for people without consistent access to WiFi. There’s not much I can do about the number of new cases per day or the number of hours in a day⁵, but I can change the number of ephemeral IDs that have to be uploaded.

This is where the privacy tradeoff comes into play. What if ephemeral IDs only changed once per day? Then infected individuals would upload just a single 16-byte ID for each day, or 10 IDs instead of 1,440. That would cut the data by a factor of 144, bringing us down to a much more manageable 160KB of data per day.
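A quick check of that factor of 144, under the same assumptions as before (1,000 cases per day, 10 days uploaded per case, 16-byte IDs):

```python
ID_BYTES = 16
CASES_PER_DAY = 1_000
DAYS_UPLOADED = 10


def daily_download_bytes(ids_per_day_per_case):
    # Total bytes every phone must fetch each day.
    return CASES_PER_DAY * DAYS_UPLOADED * ids_per_day_per_case * ID_BYTES


ten_minute_ids = daily_download_bytes(144)  # 144 ten-minute IDs per day
one_day_ids = daily_download_bytes(1)       # a single ID per day
print(ten_minute_ids, one_day_ids, ten_minute_ids // one_day_ids)  # 23040000 160000 144
```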

10-minute ephemeral IDs are better for privacy, but 1-day ephemeral IDs are good for data transfer. Fortunately, there’s a way to get (almost) the best of both worlds.

Having our cake and eating it too

In ENS, each device picks a new random number every day, called a Temporary Exposure Key (TEK). It uses that key to generate 144 Rolling Proximity Identifiers (RPIs) by simply encrypting the timestamp at each 10-minute interval throughout the day.⁶
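To make the derivation concrete, here’s a rough Python sketch. As I understand the published Exposure Notification cryptography specification, the phone first derives a Rolling Proximity Identifier Key from the TEK with HKDF-SHA256 (info string "EN-RPIK"), then encrypts a padded 10-minute interval number ("EN-RPI", six zero bytes, and the interval number as a 32-bit little-endian integer) with AES-128. Python’s standard library has no AES, so this sketch substitutes truncated HMAC-SHA256 for that encryption step. It shows the structure of the derivation, but it won’t produce the RPIs a real phone would.

```python
import hashlib
import hmac
import secrets

INTERVAL_SECONDS = 600  # one interval number per 10 minutes
INTERVALS_PER_DAY = 144


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 16) -> bytes:
    """HKDF (RFC 5869) with no salt, built from the standard library."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                     # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_rpis(tek: bytes, day_start_interval: int) -> list:
    """Derive one 16-byte ID for each 10-minute interval of the TEK's day."""
    rpik = hkdf_sha256(tek, b"EN-RPIK")
    rpis = []
    for i in range(INTERVALS_PER_DAY):
        padded = b"EN-RPI" + bytes(6) + (day_start_interval + i).to_bytes(4, "little")
        # Stand-in for AES-128(rpik, padded): truncated HMAC-SHA256.
        rpis.append(hmac.new(rpik, padded, hashlib.sha256).digest()[:16])
    return rpis


tek = secrets.token_bytes(16)                  # fresh random TEK for the day
day_start = 1_599_091_200 // INTERVAL_SECONDS  # 2020-09-03 00:00 UTC
rpis = derive_rpis(tek, day_start)
print(len(rpis), len(set(rpis)))               # 144 144
print(derive_rpis(tek, day_start) == rpis)     # True: the TEK alone recreates the day's RPIs
```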

Importantly, if you knew the TEK from a given day, you could recompute all the RPIs. This is how we get to have our cake and eat it too. The RPIs are 10-minute ephemeral IDs, but the TEKs are one per day. The grocery store trying to track you as you walk around the store sees a new ID every 10 minutes, but if someone gets COVID-19, they only need to upload one key per day.

So far, it’s a win-win, but note that this means the infected person is retroactively giving up some of their privacy. Once an infected user uploads their TEK for a given day, anyone can correlate all of that user’s RPIs for that day.

This is an altruistic action by the infected user, who is giving up some of their privacy in order to help protect others.
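Here’s what that correlation looks like in practice, as a rough Python sketch. The `derive_rpis` helper is a hypothetical, simplified stand-in (HMAC-SHA256 in place of ENS’s HKDF and AES-128), and the beacon log is made up for illustration. Once a TEK is published, anyone who logged RPIs, like the grocery store’s beacon, can link every sighting from that day back to the same person.

```python
import hashlib
import hmac
import secrets

INTERVALS_PER_DAY = 144


def derive_rpis(tek: bytes, day_start_interval: int) -> list:
    # Hypothetical simplified stand-in for ENS's HKDF + AES-128 derivation.
    rpik = hmac.new(tek, b"EN-RPIK", hashlib.sha256).digest()[:16]
    return [
        hmac.new(rpik, b"EN-RPI" + bytes(6) + (day_start_interval + i).to_bytes(4, "little"),
                 hashlib.sha256).digest()[:16]
        for i in range(INTERVALS_PER_DAY)
    ]


def linked_sightings(published_tek: bytes, day_start_interval: int, observed: list) -> list:
    """Return the (rpi, interval) sightings that trace back to one published TEK."""
    rpis = set(derive_rpis(published_tek, day_start_interval))
    return [(rpi, interval) for rpi, interval in observed if rpi in rpis]


day_start = 2_665_152  # interval number for 2020-09-03 00:00 UTC
alice_tek = secrets.token_bytes(16)
alice_rpis = derive_rpis(alice_tek, day_start)

# A store beacon's log: Alice at 09:00 and 09:30, a stranger at 09:10.
observed = [
    (alice_rpis[54], day_start + 54),
    (secrets.token_bytes(16), day_start + 55),
    (alice_rpis[57], day_start + 57),
]
matches = linked_sightings(alice_tek, day_start, observed)
print([interval - day_start for _, interval in matches])  # [54, 57]
```

A phone’s own exposure check is essentially the same computation: regenerate the RPIs from each published TEK and look for any match in its local log.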

Possible attacks

I’ll wrap up by describing a couple of possible attacks and how they’re mitigated (or not) by ENS.

Spam

What would happen in the system I’ve described if someone just uploaded a bunch of made-up TEKs? Everyone would have to download them all, derive RPIs from them, and check whether they’d recorded contact with those IDs. This would be a very effective denial of service attack.

It’s also possible for someone to run around the city getting close to as many people as possible and then claim that they’re sick by uploading their TEK for the day. This would result in a large number of false alarm notifications, causing a bunch of unnecessary quarantining and/or reducing trust in the system.

ENS deals with this by requiring a trusted entity to authorize each upload. The server rejects uploads that don’t have a digital signature from a preregistered public health authority.
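Here’s a rough sketch of that gatekeeping in Python. ENS relies on digital signatures from preregistered authorities; since Python’s standard library has no public-key signatures, this sketch swaps in an HMAC over a shared secret, which demonstrates the same accept/reject logic but not the public-key property. The registry, authority name, and function names are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical registry: health-authority ID -> shared secret.
# A real deployment would register public keys and verify signatures instead.
REGISTERED_AUTHORITIES = {"example-health-dept": b"demo-secret-not-for-production"}


def sign_upload(authority_id, secret, teks):
    # The authority vouches for a batch of TEKs by tagging the serialized payload.
    payload = json.dumps({"authority": authority_id, "teks": [t.hex() for t in teks]},
                         sort_keys=True).encode()
    return payload, hmac.new(secret, payload, hashlib.sha256).digest()


def accept_upload(payload, tag):
    # Server side: reject anything not vouched for by a registered authority.
    data = json.loads(payload)
    secret = REGISTERED_AUTHORITIES.get(data.get("authority"))
    if secret is None:
        return False  # unknown authority
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


teks = [bytes(16)]
payload, tag = sign_upload("example-health-dept",
                           REGISTERED_AUTHORITIES["example-health-dept"], teks)
print(accept_upload(payload, tag))        # True
print(accept_upload(payload, bytes(32)))  # False: forged tag
spam_payload, spam_tag = sign_upload("spammer", b"whatever", teks)
print(accept_upload(spam_payload, spam_tag))  # False: unregistered authority
```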

Spoofing

The following attack has no mitigation that I’m aware of.

Any device can spoof any other by simply rebroadcasting the other device’s RPI. This means you can cause contacts to be recorded that never happened.

For example, you could set up a BLE beacon in each of two different cities. When one beacon sees an RPI, it transmits it to the other beacon, which then broadcasts the same RPI. This would create something of a wormhole effect: every device that passed through one location would also appear to have passed through the other. When someone inevitably got infected, the system would generate spurious exposure notifications.
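A toy simulation of the wormhole in Python, with a hypothetical `Beacon` relay and a random 16-byte value standing in for Alice’s RPI. Nothing in the RPI ties it to a place, so Bob’s phone has no way to tell the relayed copy from a real encounter.

```python
import secrets


class Beacon:
    """Hypothetical BLE relay that rebroadcasts whatever its paired beacon hears."""

    def __init__(self):
        self.peer = None
        self.queue = []

    def hear(self, rpi):
        # Forward the RPI over the network to the beacon in the other city.
        self.peer.queue.append(rpi)

    def rebroadcast(self):
        # Broadcast everything received from the peer, then clear the queue.
        out, self.queue = self.queue, []
        return out


beacon_a, beacon_b = Beacon(), Beacon()
beacon_a.peer, beacon_b.peer = beacon_b, beacon_a

alice_rpi = secrets.token_bytes(16)  # Alice walks past beacon A in city A
beacon_a.hear(alice_rpi)

bob_seen = set(beacon_b.rebroadcast())  # Bob walks past beacon B in city B

# If Alice is later diagnosed, her published TEK regenerates this RPI,
# and Bob's phone finds a "match" for a contact that never happened.
print(alice_rpi in bob_seen)  # True
```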

Setting up more beacons/repeaters would increase the scope of the attack. Fortunately, this requires physical access, so the bar is a bit higher than the (preventable) spam discussed in the previous section.

Summary

The Google/Apple Exposure Notification System is designed well with privacy in mind.
There’s a privacy tradeoff to keep the data transfer manageable.
This tradeoff means that infected users who upload their Temporary Exposure Keys are giving up a small amount of privacy.
