Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: lowercase emails #1182 #1192

Merged
merged 3 commits into from
Sep 17, 2019
Merged

WIP: lowercase emails #1182 #1192

merged 3 commits into from
Sep 17, 2019

Conversation

groovecoder
Copy link
Member

Superseding #1188 with this WIP more extensive fix.

@pdehaan
Copy link
Contributor

pdehaan commented Sep 16, 2019

Wow, OK, I'm getting lost in a fascinating abyss of Google searching.
https://stackoverflow.com/questions/9807909/are-email-addresses-case-sensitive

So, it looks like officially (per RFC 5321), that the local-part of an email address can be case sensitive (with handfuls of caveats based on specific email providers):

The local-part of a mailbox MUST BE treated as case sensitive.

https://stackoverflow.com/questions/9807909/are-email-addresses-case-sensitive

Not entirely sure what a majority of this means to us. Does HIBP lowercase everything? If I say my email address is Peter@foo.com, should we keep that mixed case in the database and only lowercase when checking against HIBP API?

If somebody was using a case sensitive email provider and we potentially bulk update their address and convert to lowercase, is it possible they'd no longer get emails from us in the event of a future breach?

Do we have stats on the number of users in the database who currently have upper+lowercase characters in their email addresses?

@groovecoder
Copy link
Member Author

Not entirely sure what a majority of this means to us. Does HIBP lowercase everything? If I say my email address is Peter@foo.com, should we keep that mixed case in the database and only lowercase when checking against HIBP API?

HIBP does lowercase everything. Hmm ... it's an option to keep mixed-cased plaintext addresses in our DB, and only lowercase-and-hash them during our HIBP API operations.

We would still need to update the subscribers.primary_sha1 and email_addresses.sha1 column values to their lowercased-then-hashed equivalents, because that's how we look up addresses during HIBP breach alert callbacks. (Or else add another column for subscribers.lowercased_primary_sha1 and email_addresses.lowercased_sha1)

If somebody was using a case sensitive email provider and we potentially bulk update their address and convert to lowercase, is it possible they'd no longer get emails from us in the event of a future breach?

That's true, and a good reason to keep mixed-cased plaintext addresses in our DB, while using lowercase-and-hash with HIBP.

Do we have stats on the number of users in the database who currently have upper+lowercase characters in their email addresses?

I'll look into this.

for #1182: script to lower-case sha1 db records
@@ -171,7 +172,7 @@ const DB = {
// Always add a verification_token value
const verification_token = uuidv4();
const res = await knex("subscribers")
.insert({ primary_sha1: sha1, primary_email: email, signup_language, primary_verification_token: verification_token, primary_verified: verified })
.insert({ primary_sha1: getSha1(email.toLowerCase()), primary_email: email, signup_language, primary_verification_token: verification_token, primary_verified: verified })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused here. What's the difference between sha1 (old) vs getSha1(email.toLowerCase()) (new)?
I was thinking that sha1 argument was now unused, but it looks like we still use it above on L155;

await this._getSha1EntryAndDo(sha1, async aEntry => {...})

The only place I could find that calls this async _addEmailHash() {} method is below on L198;

const emailHash = await this._addEmailHash(getSha1(email), email, signupLanguage, true);

Which feels confusing, because we are now passing getSha1(email) and then internally doing getSha1(email.toLowerCase()).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The objective is to store the sha1 of the lowercase version of the email address.

This code is kinda complex and I'll see if I can clean this up while I'm in here. But if that turns into a rabbit-hole, I'd at least like to keep this getSha1(email.toLowerCase()) fix to the existing bug.

@groovecoder
Copy link
Member Author

I was able to reproduce the original bug and spot-check this fix by:

  1. On master, subscribe an FxA account with Test4@mailinator.com
  2. Visit http://localhost:6060/user/dashboard
    • Notice the user dashboard shows 0 results 🐛
  3. Add email address Test5@mailinator.com
  4. Visit http://localhost:6060/user/dashboard
    • Notice that user dashboard still shows 0 results (i.e., 0 for Test4 and 0 for Test5) 🐛
  5. Send breach alert callback for test4@mailinator.com (i.e., how the address is normalized in HIBP. curl http://localhost:6060/hibp/notify -H 'Content-Type: application/json' -H 'Authorization: Bearer unsafe-default-token-for-dev' -d '{"breachName": "Adobe", "hashPrefix": "4fc3f7", "hashSuffixes": ["00ab2e4ac440c5c576a8fe7d090881f6ff"]}'
    • See 0 notified subscribers in the output 🐛
  6. Check out this branch of the code
  7. Run node scripts/node scripts/lowercase-all-records.js to update the subscribers and email_addresses sha1 hashes to their lower-case equivalents
  8. Visit http://localhost:6060/user/dashboard
    • See 5 breaches for Test4 and 9 breaches for Test5 ✅ 🎉
  9. Send breach alert callback for test4@mailinator.com
    • See 1 notified subscriber in the output ✅ 🎉

So, we can say that:

  • The website bug will be fixed as soon as this code is deployed and running on production
  • The breach alerts will be fixed after the scripts/lowercase-all-records.js is run on production

@groovecoder
Copy link
Member Author

I'm going to merge this so we can test it on https://fx-breach-alerts.herokuapp.com/

@groovecoder groovecoder merged commit 54d4fa7 into master Sep 17, 2019
@groovecoder groovecoder deleted the lowercase-emails-1182 branch September 17, 2019 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants