Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
bzaar committed Apr 27, 2014
1 parent b2c2328 commit 328d480
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ DawgSharp, a clever string dictionary in C#

DAWG stands for 'Directed Acyclic Word Graph' and is a data structure for effectivly storing and searching large word lists. It is basically a Dictionary <string, T>, only a lot faster. Just for your reference, replacing the standard Dictionary with DawgSharp in my website that uses a dictionary of over 2 million words has cut down the load time from 7 seconds to 0.3 seconds and the file size from 56M to 1.4M.

How is this possible? Why is the standard Dictionary not as clever as the DAWG? The thing is, the DAWG works well with natural language strings and may not work as well for generated strings such as license keys (). Human language words tend to have lots of common letter sequences eg _-ility_ in _ability_, _possibility_, _agility_ etc and the algorithm takes advantage of that by finding those sequences and storing them only once for all words. The history of DAWG dates back as far as 1985. For more backgroud google DAWG or DAFSA (Deterministic Acyclic Finite State Automaton).
How is this possible? Why is the standard Dictionary not as clever as the DAWG? The thing is, the DAWG works well with natural language strings and may not work as well for generated strings such as license keys (OIN1r4Be2su+UXSeOj0TaQ). Human language words tend to have lots of common letter sequences eg _-ility_ in _ability_, _possibility_, _agility_ etc and the algorithm takes advantage of that by finding those sequences and storing them only once for all words. The history of DAWG dates back as far as 1985. For more backgroud google DAWG or DAFSA (Deterministic Acyclic Finite State Automaton).

The above is true for the DAWG data structure in general. Now more about what is specific to DawgSharp.

Expand Down Expand Up @@ -55,7 +55,7 @@ if (dawg ["chihuahua"])
<TPayload>
----------

The Dawg and DawgBuilder classes have a template parameter called ```<TPayload>```. It can be any type you want. Just to be able to test if a word is in the dictionary, a bool is enough. You can also make it an int or a string or a custom class. But beware of one important limitation. The DAWG works well only when the set of values that TPayload can take is comparatively small. The smaller the better. Eg if you add a definition for each word, it will make each entry unique and it won't be able to compact the graph and thus you will loose all the benefits of DAWG.
The Dawg and DawgBuilder classes take a template parameter called ```<TPayload>```. It can be any type you want. Just to be able to test if a word is in the dictionary, a bool is enough. You can also make it an int or a string or a custom class. But beware of one important limitation. The DAWG works well only when the set of values that TPayload can take is comparatively small. The smaller the better. Eg if you add a definition for each word, it will make each entry unique and it won't be able to compact the graph and thus you will loose all the benefits of DAWG.

Now, about those lambdas that you pass to ```Load``` and ```Save```. This is how you tell these methods how to serialize and deserialize TPayload. Since you choose the type, you must tell the library how to serialize it. You must write something to the BinaryWriter, even if the value of TPayload is ```null```.

Expand Down

0 comments on commit 328d480

Please sign in to comment.