Using reweight.names in fastLink() returns only completely NA rows #62

brittlh · 2022-07-20T15:17:00Z

I've run the fastLink function both with and without the reweight.names option, to confirm that the data otherwise matches without issue.

Code:

fastLink(
  dfA = dfA, dfB = dfB,
  varnames = c("first", "last", "company"),
  stringdist.match = c("first", "last", "company"),
  stringdist.method = "lv",
  return.df = TRUE,
  reweight.names = TRUE,
  firstname.field = "first",
  dedupe.matches = FALSE,
  verbose = TRUE
)

The matched data output includes rows that are entirely NA: every field in each of these rows is "NA".
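One way to quantify this symptom is to count the rows of the returned data frame in which every field is missing. The base-R sketch below is illustrative only: it assumes the output has been stored in a data frame called `matched` (a hypothetical name), and it treats both a true `NA` and the literal string "NA" in text columns as missing, since a printed "NA" could be either.

```r
# Flag a value as missing if it is a true NA or, for text columns,
# the literal string "NA".
is_missing <- function(col) {
  if (is.character(col) || is.factor(col)) {
    is.na(col) | col == "NA"
  } else {
    is.na(col)
  }
}

# Count rows in which every field is missing.
count_all_na_rows <- function(df) {
  all_missing <- Reduce(`&`, lapply(df, is_missing))
  sum(all_missing)
}

# Toy stand-in for the returned data frame (hypothetical values).
matched <- data.frame(
  first   = c("ann", NA, "NA"),
  last    = c("lee", NA, "NA"),
  company = c("acme", NA, "NA"),
  stringsAsFactors = FALSE
)

count_all_na_rows(matched)  # 2
```

Replacing `sum(all_missing)` with `which(all_missing)` gives the positions of those rows instead, which can help when comparing runs with and without reweight.names.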

Any idea what's gone wrong here? Thank you for looking into this.


tedenamorado · 2022-07-20T15:44:25Z

Hi,

Your code looks OK. Do you happen to have a reproducible example you could share with us? More than happy to take a look.

All my best,

Ted

brittlh · 2022-07-21T16:23:52Z

I wasn't able to create a scaled-down reproducible example, so I took a simple random sample (SRS) of 10% of each of the two datasets I'm working with and tried again. This time I got 18 rows back: 8 were NA rows and 10 were matches. Is it possible the issue is related to the size of the datasets? (dfA has about 1k rows, dfB about 220k.)
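For anyone trying to reproduce the scaled-down test, a 10% simple random sample of each data frame can be drawn in base R roughly as follows. This is a sketch, not the procedure actually used in the report: the helper name `srs`, the seed, and the toy data frames (much smaller than the real dfB) are all assumptions for illustration.

```r
# Draw a simple random sample (without replacement) of a fraction of rows.
srs <- function(df, frac = 0.10) {
  n <- round(nrow(df) * frac)
  df[sample.int(nrow(df), n), , drop = FALSE]
}

set.seed(2022)  # fixing the seed makes the subsample shareable and repeatable

# Toy stand-ins for the real datasets (hypothetical sizes and values).
dfA <- data.frame(first = paste0("a", 1:1000), stringsAsFactors = FALSE)
dfB <- data.frame(first = paste0("b", 1:5000), stringsAsFactors = FALSE)

dfA_small <- srs(dfA)
dfB_small <- srs(dfB)
nrow(dfA_small)  # 100
nrow(dfB_small)  # 500
```

Recording the seed alongside the results makes it possible for others to rerun fastLink on exactly the same subsample.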

tedenamorado · 2022-08-04T00:04:51Z

Hi,

Are there NAs in the name variable?

All my best,

Ted

brittlh · 2022-08-12T22:44:00Z

Ted,

Did the check: no NAs, but there were 2 blank ("") strings. After filtering those out for testing, I reran fastLink and got the same result as described above.
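The check described here, which separates true NAs from blank strings in the linkage fields, can be sketched in base R as below. The column names follow the `varnames` in the original call. The toy data, the helper names `name_quality` and `blank_to_na`, and the choice to treat whitespace-only strings as blank are assumptions for illustration. Converting blanks to `NA`, rather than dropping those rows, is another option worth testing, because an empty string may otherwise be compared as an ordinary value instead of being treated as missing.

```r
# Count true NAs and blank (empty or whitespace-only) strings per column.
name_quality <- function(df, cols) {
  data.frame(
    column  = cols,
    n_na    = vapply(cols, function(v) sum(is.na(df[[v]])),
                     integer(1), USE.NAMES = FALSE),
    n_blank = vapply(cols, function(v) sum(trimws(df[[v]]) == "", na.rm = TRUE),
                     integer(1), USE.NAMES = FALSE)
  )
}

# Recode blank strings as NA so they are treated as missing values.
blank_to_na <- function(df, cols) {
  for (v in cols) {
    x <- df[[v]]
    x[!is.na(x) & trimws(x) == ""] <- NA
    df[[v]] <- x
  }
  df
}

# Toy data (hypothetical values).
dfA <- data.frame(
  first   = c("ann", "", "bob"),
  last    = c("lee", "kim", "   "),
  company = c("acme", NA, "zeta"),
  stringsAsFactors = FALSE
)
vars <- c("first", "last", "company")

name_quality(dfA, vars)
dfA_clean <- blank_to_na(dfA, vars)
```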

Appreciate your help. I'm going to keep looking into this in my spare time and see if any other data anomalies catch my attention that might trigger this issue.

aalexandersson · 2022-08-19T12:50:42Z

Disclaimer: I am a regular fastLink user, not a fastLink developer.

Is the scaled-down dfA about 1K rows or about 100 rows? Do the datasets look fine after being read in? Approximately how much missingness is there? How many exact matches are there? Can you show the linkage patterns for the 18 returned rows? Little or no overlap between the datasets could be the cause.
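Two of these diagnostics, the share of missing values in each linkage variable and the number of exact matches on all three variables, can be computed in base R before calling fastLink. A minimal sketch with toy data follows; `missingness` and `exact_matches` are hypothetical helper names, and "exact match" here means identical values on every listed column, with rows containing NA excluded first (base R's `merge()` would otherwise pair NA keys with each other).

```r
# Share of NA values in each linkage variable.
missingness <- function(df, cols) {
  colMeans(is.na(df[cols]))
}

# Number of (dfA row, dfB row) pairs agreeing exactly on every column.
# Rows with any NA in these columns are dropped, since merge() matches NA to NA.
exact_matches <- function(dfA, dfB, cols) {
  a <- dfA[complete.cases(dfA[cols]), cols, drop = FALSE]
  b <- dfB[complete.cases(dfB[cols]), cols, drop = FALSE]
  nrow(merge(a, b, by = cols))
}

# Toy data (hypothetical values).
dfA <- data.frame(first   = c("ann", "bob", NA),
                  last    = c("lee", "kim", "wu"),
                  company = c("acme", "zeta", "acme"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(first   = c("ann", "bob", NA, "cy"),
                  last    = c("lee", "kin", "wu", "lo"),
                  company = c("acme", "zeta", "acme", "acme"),
                  stringsAsFactors = FALSE)
vars <- c("first", "last", "company")

missingness(dfA, vars)           # first 0.333, last 0, company 0
exact_matches(dfA, dfB, vars)    # 1
```

A count of zero or close to zero on the real data would be consistent with the low-overlap explanation, though it would not prove it.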

Anders
