TriSpell is a small and simple console application that provides basic spell checking.
As a fun little side project of mine, TriSpell was intended to explore how basic spellchecking could be implemented using the so called Levenshtein distance metric. Named after the soviet mathematician Vladimir Levenshtein who originally defined it in 1965, this metric computes the similarity of two sequences of characters (more commonly referred to as "strings" in most programming languages of today):
- A source string, which can be thought of as a word entered by the user.
- A target string representing a word that the user may have intended to type (but misspelled).
The metric itself, sometimes also just called "edit distance", is defined as the minimum number of of single-character edits (i. e. insertions, deletions, or substitutions) required to transform the source string into the into the target string. It may be implemented using a variety of different algorithms, and as such TriSpell offers three of the more common implementations for the user to choose from:
Algorithm | Efficiency | Description |
---|---|---|
Recursive | -- | A naive, recursive implementation that more or less directly corresponds to the original mathematical definition by Vladimir Levenshtein. It shows a very poor efficiency (especially on larger inputs), since a lot of the prefixes of the source and target strings are unnecessarily compared more than once (unless memoization is used). |
Iterative Full Matrix | + | This slightly more advanced implementation employs techniques of dynamic programming. It uses a two-dimensional matrix to cache the edit distances between all prefixes of the source and target strings, thus avoiding unnecessary calculations. |
Iterative Optimized Matrix | ++ | Even more advanced implementation that makes use of an important optimization: For calculating the edit distance at a certain position, only the previous and current row of edit distances are ever needed. This usually reduces the memory footprint and runtime even further in comparison to the full matrix algorithm. |
For more details on the individual algorithms see this article from Wikipedia, which also contains useful pseudocode I used as reference for the implementation.
In addition to the algorithms, the user may choose from three different accuracy levels. While the specific implementation (i. e. the maximum edit distance per level) may be subject to change in the future, it is currently defined as follows:
Accuracy Level | Maximum Edit Distance |
---|---|
Low | 3 |
Medium | 2 |
High | 1 |
Feel free to experiment and adjust this to your own preference ;).
TriSpell was originally developed using .NET 8
and C# 12
. I plan to update the project to
to take advantage of new language features in future versions if feasible. For the time being,
just follow these steps to try out TriSpell:
-
Make sure you have
.NET 8
or a later version installed on your machine. -
Clone the repository (or download the source code) to a directory of your choice.
git clone https://github.com/Piwimau/TriSpell.git ./TriSpell cd ./TriSpell
-
Run the application in release mode to achieve the best performance.
dotnet run -c Release
Alternatively, if you have Visual Studio installed on your machine, simply open the provided solution file and proceed from there.
I was originally inspired to create TriSpell after watching this video by Creel, which features some nice animations and explanations of the algorithms. I can highly recommend the video and the channel in general for anyone interested in low-level programming concepts.
TriSpell uses a dictionary of english words which is licensed under the Unlicense and can be found here.
The three edit distance algorithms implemented are based on the information from this and this article.
TriSpell is licensed under the MIT License. Feel free to experiment with the code, adapt it to your own preferences, and share it with others.