Feature Request: Street suffixes #13
Comments
Hi @shardsofblue, thanks, this is an interesting idea! I should have some time soon to think about this and work on it some. I'll keep you posted, thanks again!
Hi @shardsofblue, I messed around with this some today. I pushed initial commits to branch address-dev. Here's a quick demo:
x <- c(
"John Smith Nulla St. Mankato Mississippi 96522",
"John Smith Nulla street Mankato Mississippi 96522",
"John Smith Nulla Rd. Mankato Mississippi 96522",
"John Smith Nulla Road Mankato Mississippi 96522"
)
refinr::key_collision_merge(x)
The original code would not make any edits to any of the strings. For now, the updated code will operate on strings If you have a chance to try it out, I'd love to get feedback. Let me know what you think. Thanks!
That's looking good! Thanks for looking into adding this feature! I'd suggest also adding Highway/Hwy, Court/Ct, Lane/Ln, Circle/Cir, Parkway/Pkwy. I tried it with your demo and got the same results, but when I tried it on a test data frame from my own data, I am not seeing results. Am I using it incorrectly?
Hello, thanks for testing and the feedback. Great suggestion to add So with the example that you gave, none of the address strings will end up being grouped together and merged, as none of them are similar enough. For example,
Ah of course, when you put it that way it makes perfect sense. I was not using it as intended. I was expecting it to turn all variations of
Yep you got it, everything you said is correct 👍 I updated the branch to cover more cases, here's an example:
x <- c(
"John Smith Nulla St., Mankato Mississippi 96522",
"John Smith Nulla street Mankato Mississippi 96522",
"John Smith Nulla Rd. Mankato Mississippi 96522",
"John Smith Nulla Road, Mankato Mississippi 96522",
"John Smith Nulla BLVD. Mankato Mississippi 96522",
"John Smith Nulla Boulevard Mankato Mississippi 96522",
"John Smith Nulla hwy., Mankato Mississippi 96522",
"John Smith Nulla HWY Mankato Mississippi 96522",
"John Smith Nulla highway Mankato Mississippi 96522",
"John Smith Nulla highWay, Mankato Mississippi 96522",
"John Smith Nulla circle, Mankato Mississippi 96522",
"John Smith Nulla cir. Mankato Mississippi 96522",
"John Smith Nulla ct Mankato Mississippi 96522",
"John Smith Nulla couRt Mankato Mississippi 96522",
"John Smith Nulla ln Mankato Mississippi 96522",
"John Smith Nulla lane, Mankato Mississippi 96522",
"John Smith Nulla pkwy Mankato Mississippi 96522",
"John Smith Nulla parkway Mankato Mississippi 96522"
)
refinr::key_collision_merge(x)
I will clean the edits up some, merge to master, and at some point soon I will send the edits to CRAN. Thanks again for the great idea!
Also, this is super random, but I just started getting involved with a non-profit that's focused on criminal justice reform based in Richmond, VA. The founder has a big need for VA court case data; we talked about web scraping options, but I noticed that you have done a few different analytical deep dives that used data from http://virginiacourtdata.org/ . I brought the resource to the founder's attention, and they weren't aware of it, so I might try to procure the de-anonymized data from the website on behalf of the non-profit. If I do that, I for sure plan on reading through your data prep documentation, but would you mind if I also reached out to you if I have questions about the data cleaning and data processing steps that you used? Thanks!
Thanks for implementing this! I'm sure it will significantly speed up the cleaning process for addresses. I'll pass notice of this update on to the Slack I'm in for data journos. As for your request, the VA court data was some of the first data processing I ever worked on, so my work there is messy at best. But you're quite welcome to contact me if I can be of help. I published my prep process notes here. GitHub's notification system seems somewhat unreliable, so feel free to use my email: rready at umd dot edu.
Awesome, glad the feature will be helpful to you, and thanks again for opening an issue about it. At some point this weekend I will update the README / Vignette / Docs, then merge to master. Cool, thank you so much for being willing to help, I appreciate it!
Would it be reasonable to create something similar to the bus_suffix function, but for common street suffixes, such as "avenue", "Ave.", "ave", "street", "st.", "St.", and so on? Avenues, Streets, Boulevards, etc. would have to remain distinct from one another, but it would be helpful for the package to be insensitive to common variations within each type.
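To make the request concrete, here is a rough base R sketch of the behavior described above: variants of one suffix type collapse to a single form, while different types stay distinct. The function name normalize_street_suffix and its suffix table are hypothetical, made up for illustration. This is not refinr's API and not necessarily how the address-dev branch implements it.

```r
# Hypothetical helper (not part of refinr): collapse common variants of
# each street suffix to one canonical spelling, keeping suffix types distinct.
normalize_street_suffix <- function(x) {
  # Canonical form -> alternation of accepted variants.
  # Illustrative only; the list is far from exhaustive.
  suffixes <- c(
    street    = "street|st",
    avenue    = "avenue|ave",
    road      = "road|rd",
    boulevard = "boulevard|blvd"
  )
  for (canonical in names(suffixes)) {
    # Match a whole word (any letter case), plus an optional trailing period.
    pattern <- paste0("\\b(", suffixes[[canonical]], ")\\b\\.?")
    x <- gsub(pattern, canonical, x, ignore.case = TRUE, perl = TRUE)
  }
  x
}

addresses <- c(
  "John Smith Nulla St. Mankato Mississippi 96522",
  "John Smith Nulla street Mankato Mississippi 96522",
  "John Smith Nulla Rd. Mankato Mississippi 96522",
  "John Smith Nulla Road Mankato Mississippi 96522"
)
normalized <- normalize_street_suffix(addresses)
print(normalized)
```

One known limitation of this sketch: a bare "St" is always read as "street", so place names such as "St. Louis" would be rewritten incorrectly. A real implementation would probably need positional or contextual rules.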