Description
Back when WebKit considered whether or not to implement Clear-Site-Data, we noted that clearing partitioned data upon receiving that header can be used for cross-site tracking purposes. Since not many others were considering partitioned storage at the time, we never filed issues about it, at least not that I'm aware of.
The attack is about one first party site having control over website data under another first party site.
Imagine site.example registering these 33 domains: haveSetPartitionedData.example and bucket1.example through bucket32.example.
site.example runs script in the first party context on a great many websites. As part of its execution on those sites, it injects 33 invisible iframes for the domains mentioned above.
Let's say site.example is executing its script on news.example. If a cross-site user ID has not yet been planted for news.example, the haveSetPartitionedData.example iframe will not have website data yet and tells the bucket1.example through bucket32.example iframes to start fresh. The bucket1.example through bucket32.example iframes all store '1' in their partitioned storage and report back to the haveSetPartitionedData.example iframe when they are done. Now the haveSetPartitionedData.example iframe stores the fact that 32 '1's have been stored in the news.example partition.
Every time the user visits site.example, site.example gets to see its unpartitioned cookies, which identify the user. Let's say it uses a 32-bit ID for the user. It now makes sure to send Clear-Site-Data response headers matching the '0's in the unpartitioned cookie ID for the corresponding bucket domains. For example, let's say the user ID has '0's in bits 4, 6, and 20. Then site.example would make sure website data is cleared for bucket4.example, bucket6.example, and bucket20.example.
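The mapping from ID bits to bucket domains can be sketched as below. This is only an illustration: the function name is made up, and the bit order (bucket1 as the least significant bit) is an assumption, since the description does not fix one.

```python
from __future__ import annotations


def buckets_to_clear(user_id: int, bits: int = 32) -> list[str]:
    """Return the bucket domains whose data must be cleared to encode user_id.

    Assumption for illustration: bucketN.example holds bit N, counted
    from 1 at the least significant end.
    """
    return [
        f"bucket{n}.example"
        for n in range(1, bits + 1)
        if not (user_id >> (n - 1)) & 1  # a 0 bit means "clear this bucket"
    ]


# An ID with every bit set except bits 4, 6, and 20.
example_id = 0xFFFFFFFF & ~((1 << 3) | (1 << 5) | (1 << 19))
print(buckets_to_clear(example_id))
# ['bucket4.example', 'bucket6.example', 'bucket20.example']
```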
Now when the user visits news.example, the haveSetPartitionedData.example iframe will have website data set and tells the bucket1.example through bucket32.example iframes to report their '1's and '0's (no website data means '0') to the site.example script on news.example.
Voilà, cross-site user ID established.
Only accepting Clear-Site-Data from the current first party website would mitigate this attack but not fix it. Further, if this attack is combined with browser/device fingerprinting, it only needs to add enough cross-site bits to reach ≈32 bits in total.
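The whole flow can be simulated with a toy model. The sketch below assumes a browser whose storage is keyed by (top-level site, embedded domain) and in which Clear-Site-Data wipes a domain's data in every partition. `ToyBrowser` and the helper functions are hypothetical names for illustration, not any browser's API, and the bit order is again an assumption.

```python
from __future__ import annotations

from typing import Optional

BITS = 32
BUCKETS = [f"bucket{n}.example" for n in range(1, BITS + 1)]
MARKER = "haveSetPartitionedData.example"


class ToyBrowser:
    """Partitioned storage keyed by (top-level site, embedded domain)."""

    def __init__(self) -> None:
        self.storage: dict[tuple[str, str], str] = {}

    def clear_site_data_everywhere(self, domain: str) -> None:
        # The problematic behavior: one header wipes `domain` in ALL partitions.
        for key in [k for k in self.storage if k[1] == domain]:
            del self.storage[key]


def run_tracker_script(browser: ToyBrowser, top_site: str) -> Optional[int]:
    """What the site.example script does when it runs on top_site."""
    if (top_site, MARKER) not in browser.storage:
        # First visit: plant a '1' in every bucket, then record that we did.
        for domain in BUCKETS:
            browser.storage[(top_site, domain)] = "1"
        browser.storage[(top_site, MARKER)] = "set"
        return None
    # Later visit: missing data reads as 0, present data as 1.
    return sum(
        1 << i
        for i, domain in enumerate(BUCKETS)
        if (top_site, domain) in browser.storage
    )


def visit_tracker_first_party(browser: ToyBrowser, user_id: int) -> None:
    """site.example clears the buckets matching the 0 bits of user_id."""
    for i, domain in enumerate(BUCKETS):
        if not (user_id >> i) & 1:
            browser.clear_site_data_everywhere(domain)


browser = ToyBrowser()
user_id = 0xDEADBEEF  # arbitrary example ID
first = run_tracker_script(browser, "news.example")   # plants the 32 '1's
visit_tracker_first_party(browser, user_id)           # user visits site.example
second = run_tracker_script(browser, "news.example")  # reads the bits back
print(first, hex(second))
```

In this model the second run on news.example reads back exactly the ID that site.example saw in its first-party context, even though the two contexts never shared a URL or a cookie.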
Activity
wanderview commented on Jun 12, 2020
A couple questions.
It seems to me if the server can know to send a clear-site-data header for an iframe request it could know to send a cookie header.
Edit: Or know to respond to an XHR with equivalent state.
johnwilander commented on Jun 12, 2020
When the user is on site.example as first party website, it makes three requests:
… to which those servers respond with a Clear-Site-Data header to set zeroes for bits 4, 6, and 20 in all partitions at once.
I'm assuming that cookies and website data are partitioned. That's the premise.
When the user is on site.example, site.example can only affect website data for itself and third parties in its partition. However, if Clear-Site-Data clears data in other partitions, site.example can affect data cross-site, which is why this can be turned into a cross-site tracking vector.
wanderview commented on Jun 12, 2020
So this is link decoration then; albeit with only one bit of entropy.
Yes, but the server could still respond with a cookie when it sees your link decoration. The cookie would be stored in the partition cookie jar. Then the cookie state could be queried the same way you propose above (I assume with postMessage). It doesn't seem like clear-site-data is needed at all in this case?
I'm sorry, but I don't understand. Above you had the iframes using link decoration to get the header added in their own partitioned context. I don't see where clear-site-data across partitions is coming in?
I do agree clear-site-data affecting across partitions would be an information leak, but is that spec'd or implemented anywhere?
wanderview commented on Jun 12, 2020
@mkruisselbrink explained to me that the link decoration is on XHR subresource requests. The issue makes more sense to me now. Sorry for my confusion.
It does seem clear-site-data should not cross partition boundaries.
wanderview commented on Jun 12, 2020
FWIW, I am told chrome does not honor clear-site-data on 3rd party subresource requests today. It seems the spec does support it, though.
othermaciej commented on Jun 12, 2020
No, it has nothing to do with link decoration. The URL on bucket4.example.com can be anything, and can be a fixed value. The point is that you establish 32 1-bit values which can be read in third-party context from all partitions.
I don't think your understanding of the attack matches what John is outlining. It's not link decoration. John happened to use a URL with a '?' in it, but that doesn't have to be the case.
That's not what is happening. Let me explain a slightly simpler version in more detail. Imagine each bucketN.example supports three URLs:
`bucket1.example/read` --> returns an observably different result depending on whether a cookie is set (in the current partition); this doesn't need an iframe, it can be an image that's selectively either 1x1 or 2x2.
`bucket1.example/set` --> responds with a `Set-Cookie` header, thus setting the cookie in the current partition only (since this is in a context of partitioning).
`bucket1.example/clear-all` --> responds with a `Clear-Site-Data` header, thus clearing its cookie in all partitions.
Now, imagine `social.example` wants to abuse servers supporting these operations to link user ID across sites without the user's consent. Let's say `social.example` is a very popular first party visit, and is also embedded in an iframe on many sites.
User visits `news.example/article`, which embeds `social.example/widget` in an iframe. The iframe checks for a `Social-User-ID` cookie, which would read from `social.example`'s partition under `news.example`. If it's set, then it already has the user ID, and user identity has been linked cross site. Game over. So let's say it's not. Then it loads resources `bucket1.example/read` through `bucket32.example/read`. Are they all 0 or all 1? If not, then use that as the user ID, and save in the `Social-User-ID` cookie. If they were all 0, user ID is not yet set in this partition, so load `bucket1.example/set` through `bucket32.example/set`. Now the bits are all 1.
Later, the user visits `social.example` directly, where they are logged in. `social.example` retrieves a 32-bit user ID from a cookie. For all bits N in that user ID that are 0, it loads `bucketN.example/clear-all`. Because that operation clears in all partitions, it's now made the bits as read from the `news.example` partition reflect the bits of the user ID.
On the next visit to `news.example`, let's say `news.example/video`, there's another `social.example/widget` embedded in an iframe. It follows the same process as before. Now it sees a user ID that's not all 0 (never visited this site before) or all 1 (haven't yet been back to `social.example` as first party). So it assembles the bits and saves the user ID in the `Social-User-ID` cookie in the `news.example` partition. User identity has now been linked across sites, without the need for any collusion beyond an iframe embed.
In summary, a `Clear-Site-Data` header that affects all partitions allows state to be broadcast into all partitions with some setup, and thus enables passive cross-site tracking. There is no link decoration! There was never a direct link from `news.example` to `social.example` in my example above. All the loads use fully generic URLs that do not contain a user ID.
annevk commented on Jun 14, 2020
Having read this thread, I'm missing an explanation for:
johnwilander commented on Jun 14, 2020
The attacker would have to navigate the user to or open popups for on average 16 bucket domains to set the zeroes in those partitions. 16 because it’s half of 32 in the 32 bit user ID.
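The figure of 16 can be checked with a short calculation. The sketch below assumes every 32-bit ID is equally likely, which the thread does not state, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def expected_zero_bits(width: int) -> Fraction:
    """Exact mean number of 0 bits over all IDs of the given width.

    Assumption for illustration: every ID is equally likely.
    """
    # comb(width, k) IDs have exactly k zero bits.
    total = sum(k * comb(width, k) for k in range(width + 1))
    return Fraction(total, 2 ** width)


print(expected_zero_bits(32))  # prints 16
```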
For tracking at scale, this would have to be done continuously, for instance once a day or week to set the zeroes in the partitions of any new websites the user has visited.
annevk commented on Jun 15, 2020
Is that assuming that it would also clear the partitioned data of that origin?
johnwilander commented on Jun 14, 2020
Right, that is the issue. If Clear-Site-Data clears for all partitions, or can do so, it opens up for this attack.
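The read-out step from the read/set/clear-all walkthrough earlier in this thread distinguishes three states, which can be sketched as a small classifier. The names `Readout`, `classify`, and `assemble_id` are made up for this sketch, and the bit order is an assumption.

```python
from __future__ import annotations

from enum import Enum


class Readout(Enum):
    FRESH = "all 0: nothing planted in this partition yet"
    WAITING = "all 1: planted, no first-party visit to social.example since"
    USER_ID = "mixed: the bits spell out the user ID"


def classify(bits: list[int]) -> Readout:
    """Decide what the widget should do with the 32 values it just read."""
    if not any(bits):
        return Readout.FRESH      # next step: load every bucketN.example/set
    if all(bits):
        return Readout.WAITING    # nothing to learn yet
    return Readout.USER_ID        # next step: save it in Social-User-ID


def assemble_id(bits: list[int]) -> int:
    # Assumption: bits[0] belongs to bucket1 and is the least significant bit.
    return sum(bit << i for i, bit in enumerate(bits))


first_visit = [0] * 32
after_set = [1] * 32
after_clear = [1] * 32
for n in (4, 6, 20):              # example: buckets 4, 6, and 20 were cleared
    after_clear[n - 1] = 0

print(classify(first_visit).name, classify(after_set).name, classify(after_clear).name)
print(hex(assemble_id(after_clear)))
```

One consequence of this encoding is that an ID consisting only of zeros or only of ones could not be told apart from the "fresh" and "waiting" states, so a tracker would presumably avoid assigning those two values.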
jyasskin commented on Jun 14, 2020
Therefore, `Clear-Site-Data` must operate either
Right? I suspect (2) is roughly right, so that having `foo.com` send `Clear-Site-Data` will also clear all the partitioned storage for iframes nested inside it.
Is there any disagreement about the desired state for this, or is it just that specs need to be updated to use the terminology about keying that this Work Item hasn't yet added to the Storage spec?
othermaciej commented on Jun 14, 2020
I'm not aware of disagreement on making a change, it just needs to be specified (with appropriate tests, ideally). Storage spec does not yet provide the right infrastructure for this, but the Clear Site Data spec does not currently have a dependency on Storage.
It seems right to me that either (1) or (2) from #11 (comment) would avoid this vulnerability.
Note also: there's a proposal to add an API that does something similar to the Storage Living Standard; care must be taken to avoid the vulnerability in that case as well.
annevk commented on Jun 15, 2020
The Storage Standard will take over part of the definition of `Clear-Site-Data`. The plan to deal with partitioning there is through the storage key. If we go with 2 above that might require some awkward lookups though as you'd have to go through all the keys, but nothing prose can't handle. (And I think we want partitioned and non-partitioned data to be next to each other without some kind of hierarchical relationship between them to avoid subtle leaks, even though sometimes it might make sense to perform hierarchical operations on them.)
whatwg/storage#88 discusses how `Clear-Site-Data` might end up working. My idea was also that if that replaces with an empty box, we could use the same setup for migrating to non-partitioned data by replacing with non-partitioned data.