Welcome to the Panlingo repository!
This project presents a comprehensive collection of language identification libraries for .NET. Its primary purpose is to bring popular language identification models to the .NET ecosystem, allowing developers to seamlessly integrate language detection functionality into their applications.
Model | Authors | Original source code | Wrapper docs |
---|---|---|---|
CLD2 | Google, Inc. | @CLD2Owners/cld2 | link |
CLD3 | Google, Inc. | @google/cld3 | link |
FastText | Meta Platforms, Inc. | @facebookresearch/fastText | link |
Whatlang | Serhii Potapov | @greyblake/whatlang-rs | link |
MediaPipe | Google, Inc. | @google-ai-edge/mediapipe | link |
Lingua | Peter M. Stahl | @pemistahl/lingua-rs | link |
- Zero-dependency development.
- The original code of the libraries (CLD2, CLD3, FastText, MediaPipe) is used as submodules without significant modifications or improvements (apart from some minor monkey-patching). Third-party code is not included in this repository.
- Preserve the original library behavior without breaking changes.
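Because the native libraries are pulled in as submodules, a checkout made without `--recurse-submodules` will have empty library directories. The following is a minimal sketch of a check for that case, assuming the standard `git submodule status` output in which a leading `-` marks a submodule that has not been initialised. The function names are hypothetical and are not part of this repository's tooling.

```shell
# Hypothetical helpers; not part of the Panlingo repository.

# Read `git submodule status` output on stdin and print the path of every
# submodule whose line starts with "-" (git's marker for "not initialised").
list_uninitialised_submodules() {
    awk '/^-/ { print $2 }'
}

# Initialise submodules in the checkout at $1 (default: current directory)
# if any of them are still missing.
ensure_submodules() {
    repo="${1:-.}"
    missing=$(git -C "$repo" submodule status --recursive | list_uninitialised_submodules)
    if [ -n "$missing" ]; then
        git -C "$repo" submodule update --init --recursive
    fi
}
```

Calling `ensure_submodules path/to/panlingo` after a plain `git clone` should bring the checkout to the state a recursive clone would have produced.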
Feature | CLD2 | CLD3 | FastText* | Whatlang | MediaPipe** | Lingua |
---|---|---|---|---|---|---|
Single language prediction | Yes | Yes | Yes | Yes | Yes | Yes |
Multi language prediction | Yes | Yes | Yes | No | Yes | Yes |
Supported languages | 80 | 107 | 176 or 217 | 69 | 110 | 75 |
Unknown language detection | Yes | Yes | No | No | Yes | No |
Algorithm | quadgrams | neural network | neural network | trigrams | neural network | trigrams |
Script detection | No | No | Yes (only lid218e) | Yes | No | No |
Written in | C++ | C++ | C++ | Rust | C++ | Rust |
* When using these models: lid176, lid218e
** When using MediaPipe Language Detector
Model | Linux | Windows | macOS |
---|---|---|---|
CLD2 | β | β | β |
CLD3 | β | β | 🚧 |
FastText | β | β | β |
Whatlang | β | β | β |
MediaPipe | β | β | β |
Lingua | β | β | β * |
✅ Full support | ❌ No support | 🚧 Under research
* arm64 CPU only (Apple silicon M series)
- Research support for other platforms (Windows, macOS).
- Increase unit testing coverage.
- Implement more native methods (FastText).
- Self-contained models (FastText + MediaPipe).
- Remove protobuf dependency (CLD3).
We welcome contributions from developers of all skill levels. Whether you're fixing a bug, adding a new feature, or improving documentation, we appreciate your help in making this project better.
To get started with contributing, follow these simple steps:
- Clone the Repository

  First, clone the repository, including its submodules, to your local machine with the following command:

      git clone --recurse-submodules --remote-submodules https://github.com/gluschenko/panlingo.git

- Create a Branch

  Before you start making changes, create a new branch to keep your work organized. Use a descriptive name so the purpose of the branch is easy to understand:

      git checkout -b feature/your-feature-name

- Make Changes

  Now you can make changes to the codebase. Please ensure your code follows the project's coding standards and includes relevant tests where applicable.

- Commit Your Changes

  Once you've made your changes, stage and commit them with a clear and informative commit message:

      git add . && git commit -m "Add description of your changes"

- Push Your Changes

  Push your branch to the remote repository:

      git push origin feature/your-feature-name

- Open a Pull Request

  Navigate to the repository on GitHub and open a pull request. Provide a detailed description of your changes and any additional information that might help reviewers understand your contribution.
After opening a pull request, it will be reviewed by one of the project maintainers. Feedback and suggestions might be provided to ensure the code meets our quality standards. Once approved, your changes will be merged into the main branch.
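The branch-naming step above asks for a descriptive name. As a small illustration, here is a sketch of a helper that turns a free-form description into a `feature/...` branch name. The `branch_name` function and its slug rules (lowercase, hyphen-separated) are assumptions made for this example, not a convention the project enforces.

```shell
# Hypothetical helper; the slug rules here are an assumption, not project policy.

# Turn a free-form description into a branch name such as
# "feature/add-lingua-macos-build".
branch_name() {
    slug=$(printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
    printf 'feature/%s\n' "$slug"
}
```

It could then be combined with the checkout command from the steps above, for example `git checkout -b "$(branch_name 'Add Lingua macOS build')"`.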
Please note that this project adheres to a Code of Conduct. By participating, you are expected to uphold this code.
Happy hacking! 👩‍💻👨‍💻