Welcome to the Panlingo repository!
This project is a collection of language identification libraries for .NET. Its primary purpose is to bring popular language identification models to the .NET ecosystem so that developers can integrate language detection into their applications.
Model | Authors | Original source code | Wrapper docs |
---|---|---|---|
CLD2 | Google, Inc. | @CLD2Owners/cld2 | link |
CLD3 | Google, Inc. | @google/cld3 | link |
FastText | Meta Platforms, Inc. | @facebookresearch/fastText | link |
Whatlang | Serhii Potapov | @greyblake/whatlang-rs | link |
MediaPipe | Google, Inc. | @google-ai-edge/mediapipe | link |
Lingua | Peter M. Stahl | @pemistahl/lingua-rs | link |
- Zero-dependency development.
- The original code of the libraries (CLD2, CLD3, FastText, MediaPipe) is used as submodules without significant modifications or improvements, apart from some minor monkey-patching. Third-party code is not included in this repository.
- Preserve the original library behavior without breaking changes.
Feature | CLD2 | CLD3 | FastText* | Whatlang | MediaPipe** | Lingua |
---|---|---|---|---|---|---|
Single language prediction | Yes | Yes | Yes | Yes | Yes | Yes |
Multi-language prediction | Yes | Yes | Yes | No | Yes | Yes |
Supported languages | 80 | 107 | 176 or 217 | 69 | 110 | 75 |
Unknown language detection | Yes | Yes | No | No | Yes | No |
Algorithm | quadgrams | neural network | neural network | trigrams | neural network | trigrams |
Script detection | No | No | Yes (only lid218e) | Yes | No | No |
Written in | C++ | C++ | C++ | Rust | C++ | Rust |
* When using these models: lid176, lid218e
** When using MediaPipe Language Detector
Model | Linux | Windows | macOS |
---|---|---|---|
CLD2 | β | β | β |
CLD3 | β | β | 🚧 |
FastText | β | β | β |
Whatlang | β | β | β |
MediaPipe | β | β | β |
Lingua | β | β | β * |
✅ Full support | ❌ No support | 🚧 Under research
* arm64 CPU only (Apple silicon M series)
- Research support for other platforms (Windows, macOS).
- Increase unit testing coverage.
- Implement more native methods (FastText).
- Self-contained models (FastText + MediaPipe).
- Remove protobuf dependency (CLD3).
Feel free to open issues or contribute to the repository. Together, let's enhance the .NET language identification capabilities!
Happy hacking! 👩‍💻👨‍💻