From 328d4808ffbd5fd3c303219cf01104957b19dc43 Mon Sep 17 00:00:00 2001 From: bzaar Date: Sun, 27 Apr 2014 14:48:10 +0100 Subject: [PATCH] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index b5796f2..00432c0 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,7 @@ DawgSharp, a clever string dictionary in C# DAWG stands for 'Directed Acyclic Word Graph' and is a data structure for effectivly storing and searching large word lists. It is basically a Dictionary <string, T>, only a lot faster. Just for your reference, replacing the standard Dictionary with DawgSharp in my website that uses a dictionary of over 2 million words has cut down the load time from 7 seconds to 0.3 seconds and the file size from 56M to 1.4M. -How is this possible? Why is the standard Dictionary not as clever as the DAWG? The thing is, the DAWG works well with natural language strings and may not work as well for generated strings such as license keys (). Human language words tend to have lots of common letter sequences eg _-ility_ in _ability_, _possibility_, _agility_ etc and the algorithm takes advantage of that by finding those sequences and storing them only once for all words. The history of DAWG dates back as far as 1985. For more backgroud google DAWG or DAFSA (Deterministic Acyclic Finite State Automaton). +How is this possible? Why is the standard Dictionary not as clever as the DAWG? The thing is, the DAWG works well with natural language strings and may not work as well for generated strings such as license keys (OIN1r4Be2su+UXSeOj0TaQ). Human language words tend to have lots of common letter sequences eg _-ility_ in _ability_, _possibility_, _agility_ etc and the algorithm takes advantage of that by finding those sequences and storing them only once for all words. The history of DAWG dates back as far as 1985. For more backgroud google DAWG or DAFSA (Deterministic Acyclic Finite State Automaton). The above is true for the DAWG data structure in general. Now more about what is specific to DawgSharp. @@ -55,7 +55,7 @@ if (dawg ["chihuahua"]) <TPayload> ---------- -The Dawg and DawgBuilder classes have a template parameter called ``````. It can be any type you want. Just to be able to test if a word is in the dictionary, a bool is enough. You can also make it an int or a string or a custom class. But beware of one important limitation. The DAWG works well only when the set of values that TPayload can take is comparatively small. The smaller the better. Eg if you add a definition for each word, it will make each entry unique and it won't be able to compact the graph and thus you will loose all the benefits of DAWG. +The Dawg and DawgBuilder classes take a template parameter called ``````. It can be any type you want. Just to be able to test if a word is in the dictionary, a bool is enough. You can also make it an int or a string or a custom class. But beware of one important limitation. The DAWG works well only when the set of values that TPayload can take is comparatively small. The smaller the better. Eg if you add a definition for each word, it will make each entry unique and it won't be able to compact the graph and thus you will loose all the benefits of DAWG. Now, about those lambdas that you pass to ```Load``` and ```Save```. This is how you tell these methods how to serialize and deserialize TPayload. Since you choose the type, you must tell the library how to serialize it. You must write something to the BinaryWriter, even if the value of TPayload is ```null```.