-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: lowercase emails #1182 #1192
Conversation
6f89b88
to
dc5774c
Compare
Wow, OK, I'm getting lost in a fascinating abyss of Google searching. So, it looks like officially (per RFC 5321), that the local-part of an email address can be case sensitive (with handfuls of caveats based on specific email providers):
https://stackoverflow.com/questions/9807909/are-email-addresses-case-sensitive Not entirely sure what a majority of this means to us. Does HIBP lowercase everything? If I say my email address is If somebody was using a case sensitive email provider and we potentially bulk update their address and convert to lowercase, is it possible they'd no longer get emails from us in the event of a future breach? Do we have stats on the number of users in the database who currently have upper+lowercase characters in their email addresses? |
HIBP does lowercase everything. Hmm ... it's an option to keep mixed-cased plaintext addresses in our DB, and only lowercase-and-hash them during our HIBP API operations. We would still need to update the
That's true, and a good reason to keep mixed-cased plaintext addresses in our DB, while using lowercase-and-hash with HIBP.
I'll look into this. |
dc5774c
to
5ada699
Compare
for #1182: script to lower-case sha1 db records
5ada699
to
e921363
Compare
@@ -171,7 +172,7 @@ const DB = { | |||
// Always add a verification_token value | |||
const verification_token = uuidv4(); | |||
const res = await knex("subscribers") | |||
.insert({ primary_sha1: sha1, primary_email: email, signup_language, primary_verification_token: verification_token, primary_verified: verified }) | |||
.insert({ primary_sha1: getSha1(email.toLowerCase()), primary_email: email, signup_language, primary_verification_token: verification_token, primary_verified: verified }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused here. What's the difference between sha1
(old) vs getSha1(email.toLowerCase())
(new)?
I was thinking that sha1
argument was now unused, but it looks like we still use it above on L155;
await this._getSha1EntryAndDo(sha1, async aEntry => {...})
The only place I could find that calls this async _addEmailHash() {}
method is below on L198;
const emailHash = await this._addEmailHash(getSha1(email), email, signupLanguage, true);
Which feels confusing, because we are now passing getSha1(email)
and then internally doing getSha1(email.toLowerCase())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The objective is to store the sha1 of the lowercase version of the email address.
This code is kinda complex and I'll see if I can clean this up while I'm in here. But if that turns into a rabbit-hole, I'd at least like to keep this getSha1(email.toLowerCase())
fix to the existing bug.
I was able to reproduce the original bug and spot-check this fix by:
So, we can say that:
|
I'm going to merge this so we can test it on https://fx-breach-alerts.herokuapp.com/ |
Superseding #1188 with this WIP more extensive fix.