add Livebook notebooks and add info to readme
vanessaklee committed Sep 3, 2023
1 parent ab3565b commit 3d8c5be
Showing 3 changed files with 102 additions and 1 deletion.
8 changes: 8 additions & 0 deletions README.md
@@ -3,6 +3,14 @@ Akin

Akin is a collection of string comparison algorithms for Elixir. This solution was born of a [Record Linking](https://en.wikipedia.org/wiki/Record_linkage) project. It combines and modifies [The Fuzz](https://github.com/smashedtoatoms/the_fuzz) and [Fuzzy Compare](https://github.com/patrickdet/fuzzy_compare). Algorithms can be called independently or in total to return a map of metrics. This library was built to facilitate the disambiguation of names but can be used to compare any two binaries.

## New! Notebooks

Disambiguation
[![Run Disambiguation in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fgithub.com%2Fvanessaklee%2Fakin%2Fblob%2Fmain%2Fnotebooks%2Fdisambiguation.livemd)

Name Disambiguation
[![Run Name Disambiguation in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fgithub.com%2Fvanessaklee%2Fakin%2Fblob%2Fmain%2Fnotebooks%2Fname_disambiguation.livemd)

<details>
<summary>Table of Contents</summary>

@@ -1,4 +1,4 @@
# Akin Examples
# Disambiguation

## Akin

93 changes: 93 additions & 0 deletions notebooks/name_disambiguation.livemd
@@ -0,0 +1,93 @@
# Name Disambiguation

## Match

_UNDER DEVELOPMENT_

Identity is the challenge of author name disambiguation (AND). The aim of AND is to match an author's name to that author when the author appears in a list of many authors. Complexity arises from homonymity (many people with the same name) and synonymity (when one person uses different forms/spellings of their name in publications).

Given the name of an author divided into given, middle, and family name parts (e.g. "Virginia", nil, "Woolf") and a list of possible matching author names, find and return the matches for the author in the list. If initials exist in the left name, a separate comparison is performed for the initials and the sets of the right string.

If the comparison metrics produce a score greater than or equal to 0.9, the names are considered a match and returned in the list.
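
The threshold rule can be illustrated with a minimal sketch. This is not Akin's implementation: `ThresholdSketch` is a hypothetical module, and Elixir's built-in `String.jaro_distance/2` stands in for Akin's combined metrics. It only shows the idea of keeping candidates whose score meets a cutoff.

```elixir
defmodule ThresholdSketch do
  # Hypothetical stand-in scorer (an assumption, not Akin's metric):
  # Elixir's built-in Jaro similarity, a float between 0.0 and 1.0.
  def score(left, right) do
    String.jaro_distance(String.downcase(left), String.downcase(right))
  end

  # Keep only the candidates whose score is at or above the cutoff.
  # The default of 0.9 mirrors the threshold described above.
  def matches(name, candidates, match_at \\ 0.9) do
    Enum.filter(candidates, fn candidate -> score(name, candidate) >= match_at end)
  end
end

ThresholdSketch.matches("Virginia Woolf", ["Virginia Woolf", "V White"])
```

Raising `match_at` in this sketch narrows the result list, which is the same trade-off the `match_at` option controls in the cells that follow.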

We want to find possible matches to the name "Virginia Woolf"

```elixir
name = "Virginia Woolf"
```

in a list of other names

```elixir
other_names = [
"V Woolf",
"V Woolfe",
"Virginia Woolf",
"V White",
"Viginia Wolverine",
"Virginia Woolfe"
]
```

The most likely matches are returned.

```elixir
Akin.match_names(name, other_names)
```

Use options to require stricter matching.

```elixir
other_names = [
"Victor Woolf",
"V Woolf",
"V Woolfe",
"Virginia Woolf",
"V White",
"Viginia Wolverine",
"Virginia Woolfe"
]
```

```elixir
opts = [match_at: 0.99]

Akin.match_names(name, other_names, opts)
```

### Initials

The results are good even if we only have an initial for part of the name we are disambiguating.

```elixir
name = "V. Woolf"
```

```elixir
Akin.match_names(name, other_names)
```
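
To see why initials help and why they cannot be trusted alone, here is a rough sketch of reducing names to initials. `NameSketch` and its functions are hypothetical helpers written for illustration and are not part of Akin's API. Akin's own handling of initials may differ.

```elixir
defmodule NameSketch do
  # Hypothetical helper, not part of Akin: split a name into lowercase
  # parts, dropping periods so "V." and "V" are treated alike.
  def parts(name) do
    name
    |> String.downcase()
    |> String.replace(".", "")
    |> String.split()
  end

  # First letter of each part of the name.
  def initials(name) do
    name |> parts() |> Enum.map(&String.first/1)
  end

  # True when both names reduce to the same list of initials.
  def same_initials?(left, right), do: initials(left) == initials(right)
end

NameSketch.same_initials?("V. Woolf", "Virginia Woolfe")
NameSketch.same_initials?("V. Woolf", "V White")
```

Both calls return `true` in this sketch, since "V White" shares the initials of "V. Woolf". An initials comparison therefore has to be combined with other metrics rather than used alone.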

### Not Perfect

The results are imperfect and can lead to unwanted matches. See how "Victor" fares.

```elixir
other_names = [
"Victor Woolf",
"V Woolfe",
"Virginia Woolf",
"V White",
"Viginia Wolverine",
"Virginia Woolfe"
]
```

```elixir
Akin.match_names(name, other_names)
```

```elixir
opts = [match_at: 0.99, algorithms: ["bag_distance", "jaccard", "jaro_winkler"]]

Akin.match_names(name, other_names, opts)
```
