Clear-Site-Data for partitioned storage can be used for cross-site tracking #11

Open


Back when WebKit considered whether or not to implement Clear-Site-Data, we noted that clearing partitioned data upon receiving that header can be used for cross-site tracking purposes. Since not many others were considering partitioned storage at the time, we never filed issues about it, at least not that I'm aware of.

The attack lets one first party site control website data stored under another first party site.

Imagine site.example registering these 33 domains: haveSetPartitionedData.example and bucket1.example through bucket32.example.

site.example runs script in the first party context on a great many websites. As part of its execution on those sites, it injects 33 invisible iframes for the domains mentioned above.

Let's say site.example is executing its script on news.example. If a cross-site user ID has not yet been planted for news.example, the haveSetPartitionedData.example iframe will not have website data yet and tells the bucket1.example through bucket32.example iframes to start fresh. The bucket1.example through bucket32.example iframes all store '1' in their partitioned storage and report back to the haveSetPartitionedData.example iframe when they are done. Now the haveSetPartitionedData.example iframe stores the fact that 32 '1's have been stored in the news.example partition.

Every time the user visits site.example, site.example gets to see its unpartitioned cookies, which identify the user. Let's say it uses a 32-bit ID for the user. It now makes sure to send Clear-Site-Data response headers for the bucket domains corresponding to the '0's in the user ID from the unpartitioned cookie. For example, let's say the user ID has '0's in bits 4, 6, and 20. Then site.example would make sure website data is cleared for bucket4.example, bucket6.example, and bucket20.example.
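The encoding step can be sketched as follows. This is a minimal illustration, not WebKit's analysis code: it assumes bucketN.example carries bit N of the ID counted from the least significant bit (the comment doesn't fix a bit order), and `buckets_to_clear` is a hypothetical helper name.

```python
from typing import List

NUM_BUCKETS = 32  # one bucketN.example domain per bit of the user ID


def bit(user_id: int, n: int) -> int:
    """Return bit n of user_id, counting from 1 at the least significant bit."""
    return (user_id >> (n - 1)) & 1


def buckets_to_clear(user_id: int) -> List[str]:
    """Bucket domains whose bit is 0, i.e. that must send Clear-Site-Data."""
    if not 0 <= user_id < 2 ** NUM_BUCKETS:
        raise ValueError("user ID must fit in 32 bits")
    return [
        f"bucket{n}.example"
        for n in range(1, NUM_BUCKETS + 1)
        if bit(user_id, n) == 0
    ]
```

For an ID whose only '0's are bits 4, 6, and 20, i.e. `(2**32 - 1) & ~((1 << 3) | (1 << 5) | (1 << 19))`, it returns `['bucket4.example', 'bucket6.example', 'bucket20.example']`, matching the example above.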

Now, when the user visits news.example, the haveSetPartitionedData.example iframe finds its website data set and tells the bucket1.example through bucket32.example iframes to report their '1's and '0's (no website data means '0') to the site.example script on news.example.

Voilà, cross-site user ID established.
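To make the round trip concrete, here is a toy simulation of the news.example side. It assumes storage is a plain dictionary keyed by top-level site and then by origin, and that Clear-Site-Data wipes an origin's data in every partition (the behaviour this issue is about). All names are hypothetical; real iframes would presumably coordinate via postMessage rather than shared Python state.

```python
from typing import Dict, Optional

NUM_BUCKETS = 32
MARKER = "haveSetPartitionedData.example"
BUCKETS = [f"bucket{n}.example" for n in range(1, NUM_BUCKETS + 1)]

# storage[top_level_site][origin] -> stored value; a missing origin means no data.
Storage = Dict[str, Dict[str, str]]


def clear_site_data_everywhere(storage: Storage, origin: str) -> None:
    """The problematic behaviour: wipe `origin`'s data in *every* partition."""
    for partition in storage.values():
        partition.pop(origin, None)


def embedded_script(storage: Storage, top_level_site: str) -> Optional[int]:
    """site.example's iframes running inside `top_level_site`'s partition."""
    partition = storage.setdefault(top_level_site, {})
    if MARKER not in partition:
        # Start fresh: every bucket stores '1', then the marker records it.
        for bucket in BUCKETS:
            partition[bucket] = "1"
        partition[MARKER] = "32 ones stored"
        return None
    # Data present reads as 1; cleared (missing) data reads as 0.
    return sum(1 << i for i, bucket in enumerate(BUCKETS) if bucket in partition)
```

Calling `embedded_script(storage, "news.example")` once plants the '1's; calling `clear_site_data_everywhere` for bucket4, bucket6, and bucket20 (the site.example visit) and then `embedded_script` again yields the ID with those three bits cleared. A user who has never visited site.example reads back as all '1's, so a real tracker would need to treat that value as "no ID yet".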

Only accepting Clear-Site-Data from the current first party website would mitigate this attack but not fix it. Further, if this attack is combined with browser/device fingerprinting, the attacker only needs to add enough cross-site bits to reach ≈32 bits in total.
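The arithmetic behind that last sentence, as a hypothetical helper: each bucket domain contributes one cross-site bit, so with F bits of fingerprinting entropy the attacker needs roughly 32 - F bucket domains.

```python
import math

TARGET_BITS = 32  # the ≈32 bits of identifying information mentioned above


def bucket_domains_needed(fingerprint_bits: float, target_bits: int = TARGET_BITS) -> int:
    """Cross-site bits (one bucket domain each) still needed after fingerprinting."""
    return max(0, math.ceil(target_bits - fingerprint_bits))
```

For example, 20 bits of fingerprinting entropy would leave 12 bucket domains to register; 32 bits in total distinguish 2**32 = 4,294,967,296 IDs.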

johnwilander

johnwilander mentioned this on Jun 12, 2020 in: Do service/shared workers and BroadcastChannel deserve a special strategy? #9

wanderview

A couple questions.

How does the server know which client to send the clear-site-data header to for each bucket frame domain?
How is this different than the server communicating state to each bucket frame through cookies?

It seems to me if the server can know to send a clear-site-data header for an iframe request it could know to send a cookie header.

Edit: Or know to respond to an XHR with equivalent state.

johnwilander

Author

> A couple questions.

> How does the server know which client to send the clear-site-data header to for each bucket frame domain?

When the user is on site.example as the first party website, it makes three requests:

bucket4.example/?command=respondWithClearSiteData
bucket6.example/?command=respondWithClearSiteData
bucket20.example/?command=respondWithClearSiteData

… to which those servers respond with a Clear-Site-Data header, setting zeroes for bits 4, 6, and 20 in all partitions at once.
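A sketch of what such a bucket server might look like, using Python's standard `http.server`. The `command=respondWithClearSiteData` query parameter comes from the comment above; the choice of the `"cookies"` and `"storage"` directives is an assumption, and `clear_site_data_headers` and `BucketHandler` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler
from typing import List, Tuple
from urllib.parse import parse_qs, urlsplit


def clear_site_data_headers(path: str) -> List[Tuple[str, str]]:
    """Headers a hypothetical bucketN.example server adds for request `path`."""
    query = parse_qs(urlsplit(path).query)
    if query.get("command") == ["respondWithClearSiteData"]:
        # Clear-Site-Data takes a comma-separated list of quoted directives.
        return [("Clear-Site-Data", '"cookies", "storage"')]
    return []


class BucketHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # 204 No Content: only the response headers matter for this request.
        self.send_response(204)
        for name, value in clear_site_data_headers(self.path):
            self.send_header(name, value)
        self.end_headers()
```

To try it locally, `HTTPServer(("127.0.0.1", 8004), BucketHandler).serve_forever()` (also from `http.server`) would serve it. Browsers only honor Clear-Site-Data on responses from potentially trustworthy origins, so a real deployment would need HTTPS.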

> How is this different than the server communicating state to each bucket frame through cookies?

I'm assuming that cookies and website data are partitioned. That's the premise.

> It seems to me if the server can know to send a clear-site-data header for an iframe request it could know to send a cookie header.

> Edit: Or know to respond to an XHR with equivalent state.

When the user is on site.example, site.example can only affect website data for itself and third parties in its partition. However, if Clear-Site-Data clears data in other partitions, site.example can affect data cross-site, which is why this can be turned into a cross-site tracking vector.

wanderview

> bucket4.example/?command=respondWithClearSiteData

So this is link decoration, then, albeit with only one bit of entropy.

> I'm assuming that cookies and website data are partitioned. That's the premise.

Yes, but the server could still respond with a cookie when it sees your link decoration. The cookie would be stored in the partition cookie jar. Then the cookie state could be queried the same way you propose above (I assume with postMessage). It doesn't seem like clear-site-data is needed at all in this case?

> However, if Clear-Site-Data clears data in other partitions, site.example can affect data cross-site, which is why this can be turned into a cross-site tracking vector.

I'm sorry, but I don't understand. Above you had the iframes using link decoration to get the header added in their own partitioned context. I don't see where clear-site-data across partitions is coming in?

I do agree clear-site-data affecting across partitions would be an information leak, but is that spec'd or implemented anywhere?

wanderview

@mkruisselbrink explained to me that the link decoration is on XHR subresource requests. The issue makes more sense to me now. Sorry for my confusion.

It does seem clear-site-data should not cross partition boundaries.

wanderview

FWIW, I am told Chrome does not honor clear-site-data on 3rd party subresource requests today. It seems the spec does support it, though.

othermaciej

>> bucket4.example/?command=respondWithClearSiteData

> So this is link decoration, then, albeit with only one bit of entropy.

No, it has nothing to do with link decoration. The URL on bucket4.example can be anything, and can be a fixed value. The point is that you establish 32 1-bit values which can be read in a third-party context from all partitions.

>> I'm assuming that cookies and website data are partitioned. That's the premise.

> Yes, but the server could still respond with a cookie when it sees your link decoration. The cookie would be stored in the partition cookie jar. Then the cookie state could be queried the same way you propose above (I assume with postMessage). It doesn't seem like clear-site-data is needed at all in this case?

I don't think your understanding of the attack matches what John is outlining. It's not link decoration. John happened to use a URL with a '?' in it, but that doesn't have to be the case.

>> However, if Clear-Site-Data clears data in other partitions, site.example can affect data cross-site, which is why this can be turned into a cross-site tracking vector.

> I'm sorry, but I don't understand. Above you had the iframes using link decoration to get the header added in their own partitioned context. I don't see where clear-site-data across partitions is coming in?

That's not what is happening. Let me explain a slightly simpler version in more detail. Imagine each bucketN.example supports three URLs:

bucket1.example/read --> returns an observably different result depending on whether a cookie is set (in the current partition); this doesn't need an iframe, it can be an image that's selectively either 1x1 or 2x2.
bucket1.example/set --> responds with a Set-Cookie header, thus setting the cookie in the current partition only (since this is in a context of partitioning).
bucket1.example/clear-all --> responds with a Clear-Site-Data header, thus clearing its cookie in all partitions.
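The three endpoints can be modeled as a pure request handler. This is an illustrative sketch under stated assumptions: the cookie name `bit`, the cookie attributes, and the `handle` function are all hypothetical, and the "image" is represented only by its dimensions.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

COOKIE_NAME = "bit"  # hypothetical cookie name used by every bucketN.example

Response = Tuple[Dict[str, str], Optional[Tuple[int, int]]]


def handle(path: str, cookie_header: str = "") -> Response:
    """Return (response headers, image size or None) for one bucketN.example request."""
    jar = SimpleCookie(cookie_header)
    if path == "/read":
        # Observably different result: a 2x2 image if the cookie is present, else 1x1.
        size = (2, 2) if COOKIE_NAME in jar else (1, 1)
        return {"Content-Type": "image/png"}, size
    if path == "/set":
        # With partitioning, this cookie lands only in the current partition.
        return {"Set-Cookie": f"{COOKIE_NAME}=1; Secure; SameSite=None"}, None
    if path == "/clear-all":
        # Harmless if scoped to one partition; a tracking vector if applied to all.
        return {"Clear-Site-Data": '"cookies"'}, None
    return {}, None
```

The embedding page only needs to observe the image size from `/read`; the server never learns or receives a user ID.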

Now, imagine social.example wants to abuse servers supporting these operations to link user IDs across sites without the user's consent. Let's say social.example is a very popular site that users visit as a first party, and is also embedded in an iframe on many sites.

User visits news.example/article, which embeds social.example/widget in an iframe. The iframe checks for a Social-User-ID cookie, which would be read from social.example's partition under news.example. If it's set, then it already has the user ID, and user identity has been linked cross-site. Game over. So let's say it's not. Then it loads the resources bucket1.example/read through bucket32.example/read. Are they all 0 or all 1? If not, then use the bits as the user ID and save it in the Social-User-ID cookie. If they were all 0, the user ID is not yet set in this partition, so load bucket1.example/set through bucket32.example/set. Now the bits are all 1.

Later, the user visits social.example directly, where they are logged in. social.example retrieves a 32-bit user ID from a cookie. For every bit N of that user ID that is 0, it loads bucketN.example/clear-all. Because that operation clears the cookie in all partitions, it has now made the bits as read from the news.example partition reflect the bits of the user ID.

On the next visit to news.example, let's say news.example/video, there's another social.example/widget embedded in an iframe. It follows the same process as before. Now it sees a user ID that's neither all 0 (never visited this site before) nor all 1 (hasn't yet been back to social.example as first party). So it assembles the bits and saves the user ID in the Social-User-ID cookie in the news.example partition. User identity has now been linked across sites, without the need for any collusion beyond an iframe embed.

In summary, a Clear-Site-Data header that affects all partitions allows state to be broadcast into all partitions with some setup, and thus enables passive cross-site tracking. There is no link decoration! There was never a direct link from news.example to social.example in my example above. All the loads use fully generic URLs that do not contain a user ID.
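The whole flow can be checked with a toy model. The sketch below is an assumption-laden simulation, not browser code: cookies are a set of (top-level site, origin) pairs, `/read` returns 1 or 0 in place of the 2x2 or 1x1 image, the Social-User-ID cookie is omitted, and the user ID is assumed to be neither all 0s nor all 1s. The hypothetical `clear_all_partitions` flag switches between the vulnerable behaviour and a partition-scoped Clear-Site-Data.

```python
from typing import Optional

N = 32  # bits in the user ID, one bucketN.example domain per bit


class Browser:
    """Toy browser whose cookie jar is keyed by (top-level site, origin)."""

    def __init__(self, clear_all_partitions: bool):
        self.jar: set = set()
        self.clear_all_partitions = clear_all_partitions

    def load(self, top_level: str, origin: str, path: str) -> Optional[int]:
        key = (top_level, origin)
        if path == "/read":
            # Stands in for the 2x2 vs 1x1 image: 1 if a cookie is present.
            return 1 if key in self.jar else 0
        if path == "/set":
            # Set-Cookie only ever lands in the current partition.
            self.jar.add(key)
        elif path == "/clear-all":
            if self.clear_all_partitions:
                # Vulnerable behaviour: drop `origin` from every partition.
                self.jar = {k for k in self.jar if k[1] != origin}
            else:
                # Mitigation: only the partition the response arrived in.
                self.jar.discard(key)
        return None


def widget_visit(browser: Browser, top_level: str) -> Optional[int]:
    """social.example/widget embedded on `top_level`; returns a recovered ID or None."""
    buckets = [f"bucket{n}.example" for n in range(1, N + 1)]
    bits = [browser.load(top_level, b, "/read") for b in buckets]
    if all(b == 0 for b in bits):
        # Never seen in this partition: initialise every bit to 1.
        for b in buckets:
            browser.load(top_level, b, "/set")
        return None
    if all(b == 1 for b in bits):
        # User has not been back to social.example as first party yet.
        return None
    return sum(b << i for i, b in enumerate(bits))


def first_party_visit(browser: Browser, user_id: int) -> None:
    """Logged-in visit to social.example: load clear-all for every 0 bit."""
    for n in range(1, N + 1):
        if not (user_id >> (n - 1)) & 1:
            browser.load("social.example", f"bucket{n}.example", "/clear-all")
```

With `clear_all_partitions=True`, the second `widget_visit` on news.example recovers the ID used during the social.example visit; with `False`, the clears only touch the social.example partition and the widget keeps reading all 1s.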

> I do agree clear-site-data affecting across partitions would be an information leak, but is that spec'd or implemented anywhere?

annevk

Collaborator

Having read this thread, I'm missing an explanation for:

> Only accepting Clear-Site-Data from the current first party website would mitigate this attack but not fix it.

johnwilander

Author

> Having read this thread, I'm missing an explanation for:

>> Only accepting Clear-Site-Data from the current first party website would mitigate this attack but not fix it.

The attacker would have to navigate the user to, or open popups for, on average 16 bucket domains to set the zeroes in those partitions. 16 because, on average, half of the 32 bits in a 32-bit user ID are zeroes.
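The 16 is an expectation over IDs. A small sketch, assuming IDs are uniformly random 32-bit integers; `popups_needed` is a hypothetical helper:

```python
from math import comb

N = 32  # bits in the user ID


def popups_needed(user_id: int) -> int:
    """Navigations/popups to bucket domains needed: one per 0 bit in the ID."""
    return N - bin(user_id).count("1")


# Expected number of 0 bits over uniformly random IDs: sum_k k * C(32, k) / 2**32.
expected = sum(k * comb(N, k) for k in range(N + 1)) / 2**N
```

`popups_needed(0xDEADBEEF)` is 8, while `expected` works out to exactly 16.0, since the sum of k times C(32, k) is 32 times 2**31.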

For tracking at scale, this would have to be done continuously, for instance once a day or week to set the zeroes in the partitions of any new websites the user has visited.

annevk

Collaborator

Is that assuming that it would also clear the partitioned data of that origin?

johnwilander

Author

> Is that assuming that it would also clear the partitioned data of that origin?

Right, that is the issue. If Clear-Site-Data clears for all partitions, or can do so, it opens up for this attack.

jyasskin

Contributor

Therefore, Clear-Site-Data must operate either

1. within a single storage shelf, or
2. across the shelves keyed by the same top-level site (with varying second-level keys).

Right? I suspect (2) is roughly right, so that having foo.com send Clear-Site-Data will also clear all the partitioned storage for iframes nested inside it.
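The difference between the two scopes can be illustrated with storage modeled as a dictionary keyed by (top-level site, origin). This is a hypothetical sketch: `clear_option1` and `clear_option2` are illustrative names, and option (2) is modeled only for the case described above, where the top-level site itself (foo.com) sends the header.

```python
from typing import Any, Dict, Tuple

# Shelves keyed by (top-level site, origin); values are whatever was stored.
Storage = Dict[Tuple[str, str], Any]


def clear_option1(storage: Storage, top_level_site: str, origin: str) -> None:
    """(1) Clear only the single shelf the response was received in."""
    storage.pop((top_level_site, origin), None)


def clear_option2(storage: Storage, top_level_site: str) -> None:
    """(2) Clear every shelf keyed by this top-level site: the site's own data
    plus the partitioned data of frames nested inside it."""
    for key in [k for k in storage if k[0] == top_level_site]:
        del storage[key]
```

In both cases the shelf keyed by news.example survives, which is the property that defeats the cross-partition broadcast described earlier in this thread.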

Is there any disagreement about the desired state for this, or is it just that specs need to be updated to use the terminology about keying that this Work Item hasn't yet added to the Storage spec?

othermaciej

mentioned this

on Jun 14, 2020

Add API to allow origin to purge all storage whatwg/storage#4

othermaciej

I'm not aware of disagreement on making a change, it just needs to be specified (with appropriate tests, ideally). Storage spec does not yet provide the right infrastructure for this, but the Clear Site Data spec does not currently have a dependency on Storage.

It seems right to me that either (1) or (2) from #11 (comment) would avoid this vulnerability.

Note also: there's a proposal to add an API that does something similar to the Storage Living Standard; care must be taken to avoid the vulnerability in that case as well.

annevk

Collaborator

The Storage Standard will take over part of the definition of Clear-Site-Data. The plan to deal with partitioning there is through the storage key. If we go with (2) above, that might require some awkward lookups, though, as you'd have to go through all the keys, but nothing prose can't handle. (And I think we want partitioned and non-partitioned data to be next to each other without some kind of hierarchical relationship between them to avoid subtle leaks, even though sometimes it might make sense to perform hierarchical operations on them.)

whatwg/storage#88 discusses how Clear-Site-Data might end up working. My idea was also that if Clear-Site-Data works by replacing the storage with an empty box, we could use the same setup for migrating to non-partitioned data, by replacing it with non-partitioned data instead.

28 remaining items


Metadata
Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests


Activity

wanderview commented on Jun 12, 2020

johnwilander commented on Jun 12, 2020

wanderview commented on Jun 12, 2020

wanderview commented on Jun 12, 2020

wanderview commented on Jun 12, 2020

othermaciej commented on Jun 12, 2020

annevk commented on Jun 14, 2020

johnwilander commented on Jun 14, 2020

annevk commented on Jun 14, 2020

johnwilander commented on Jun 14, 2020

jyasskin commented on Jun 14, 2020

othermaciej commented on Jun 14, 2020

annevk commented on Jun 15, 2020
