Doc-Searcher is a simple and flexible document search application, leveraging the capabilities of Rust and Elasticsearch (by default) to provide efficient and effective full-text search in documents. This project aims to offer a straightforward solution for indexing and searching through a large corpus of documents with the speed and accuracy provided by Elasticsearch.
The main goal is to implement a simple but powerful system for storing and indexing documents, with full-text and semantic search. I chose Elasticsearch as the default search engine, but you can use Tantivy, Qdrant, or your own solution instead by implementing several async traits:
- CacherService - API for interacting with the cache service (e.g. Redis);
- EmbeddingsService - API for interacting with the doc-notifier service;
- MetricsService - API for monitoring metrics;
- StorageService - API (CRUD) for indexed folders and documents;
- SearcherService - API for search functionality (full-text, vector, similar).
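To illustrate how a backend can be swapped by implementing one of these traits, here is a minimal sketch. The `Document` type, the `search_fulltext` signature, and `InMemorySearcher` are hypothetical simplifications, not the project's actual definitions. The sketch uses native `async fn` in traits (Rust 1.75+) and a tiny no-op executor so that it runs without an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical document model; the real project defines its own.
#[derive(Debug, Clone, PartialEq)]
struct Document {
    id: String,
    content: String,
}

/// Hypothetical, simplified searcher trait.
trait SearcherService {
    async fn search_fulltext(&self, query: &str) -> Vec<Document>;
}

/// Toy in-memory backend standing in for Elasticsearch, Tantivy, etc.
struct InMemorySearcher {
    docs: Vec<Document>,
}

impl SearcherService for InMemorySearcher {
    async fn search_fulltext(&self, query: &str) -> Vec<Document> {
        // Case-insensitive substring match instead of real full-text scoring.
        let query = query.to_lowercase();
        self.docs
            .iter()
            .filter(|d| d.content.to_lowercase().contains(&query))
            .cloned()
            .collect()
    }
}

/// Builds a waker whose callbacks do nothing.
fn noop_raw_waker() -> RawWaker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Busy-polls a future to completion. Only suitable for futures that
/// never wait on real I/O, like the in-memory backend above.
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions never dereference the data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let searcher = InMemorySearcher {
        docs: vec![
            Document { id: "1".into(), content: "Rust and Elasticsearch".into() },
            Document { id: "2".into(), content: "Docker compose setup".into() },
        ],
    };
    let hits = block_on(searcher.search_fulltext("rust"));
    println!("{:?}", hits.iter().map(|d| d.id.as_str()).collect::<Vec<_>>());
}
```

Because callers depend only on the trait, code generic over `S: SearcherService` works unchanged whichever backend is plugged in.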
Key features:
- Full-Text Search: Quickly find documents by their content using the chosen search engine;
- Semantic Search: Fast semantic search backed by an external embeddings service;
- Rust Performance: Benefit from the speed and safety of Rust;
- REST API: An easy-to-use REST API for searching documents and managing indexing;
- Docker Support: Easy deployment with Docker and docker-compose;
- Caching Actor: Store data in a cache service such as Redis or a custom implementation;
- Remote logging: Send errors, warnings, and other metrics to a remote server;
- Swagger: Swagger documentation for all available endpoints;
- CORS Origins: Allow web pages served from other domains to access the service's resources;
- Parsing and storing: Parse files and store them in the search engine locally.
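Semantic search compares embedding vectors rather than keywords. The sketch below shows only the ranking step, using cosine similarity over hard-coded vectors. In Doc-Searcher the embeddings come from the external embeddings service and the search engine performs the comparison, so the function names and the choice of cosine similarity here are illustrative assumptions, not the project's implementation.

```rust
/// Cosine similarity between two embeddings of equal dimension.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical helper: returns the `k` documents most similar to the query,
/// highest score first.
fn top_k<'a>(query: &[f32], docs: &'a [(String, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(id, emb)| (id.as_str(), cosine_similarity(query, emb)))
        .collect();
    // Sort descending by score; total_cmp gives a total order for f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let docs = vec![
        ("rust-guide".to_string(), vec![0.9, 0.1, 0.0]),
        ("docker-notes".to_string(), vec![0.1, 0.9, 0.0]),
        ("mixed".to_string(), vec![0.7, 0.7, 0.0]),
    ];
    for (id, score) in top_k(&[1.0, 0.0, 0.0], &docs, 2) {
        println!("{id}: {score:.3}");
    }
}
```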
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites:
- Rust
- Docker & docker-compose
- Cache (Redis)
- Elasticsearch
Installation:
- Clone the repository;
- Run `cargo install --path .` to build the project;
- Set up the `.env` file with the service credentials;
- Run `cargo run --bin doc-searcher-init` to initialize the Elasticsearch schemas;
- Run `cargo run --bin doc-searcher-run` to launch the service.
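The `.env` step supplies connection credentials to the service. The sketch below shows one way such settings could be read and validated, assuming the variables from `.env` are already in the process environment. The variable names (`ELASTIC_ADDRESS`, `ELASTIC_USER`, `ELASTIC_PASSWORD`), the `SearcherConfig` struct, and the `http://localhost:9200` fallback are hypothetical; check the project's own `.env` template for the real keys.

```rust
/// Hypothetical connection settings; real key names may differ.
#[derive(Debug, PartialEq)]
struct SearcherConfig {
    elastic_address: String,
    elastic_user: String,
    elastic_password: String,
}

/// Builds the config from any key lookup, so it can read the process
/// environment in production or a plain map in tests.
fn load_config<F>(lookup: F) -> Result<SearcherConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Required keys produce a descriptive error when absent.
    let required = |key: &str| lookup(key).ok_or_else(|| format!("missing required variable {key}"));
    Ok(SearcherConfig {
        // Optional key with an assumed local default.
        elastic_address: lookup("ELASTIC_ADDRESS")
            .unwrap_or_else(|| "http://localhost:9200".to_string()),
        elastic_user: required("ELASTIC_USER")?,
        elastic_password: required("ELASTIC_PASSWORD")?,
    })
}

fn main() {
    match load_config(|key| std::env::var(key).ok()) {
        Ok(cfg) => println!("connecting to {}", cfg.elastic_address),
        Err(err) => eprintln!("configuration error: {err}"),
    }
}
```

Passing the lookup as a closure keeps the validation logic testable without mutating the real environment.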
Optional features for parsing and storing documents locally through this service (not yet stable):
- enable-cacher - enable the cacher service (Redis or another custom implementation);
- enable-semantic - enable the LLM service for semantic search.
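If `enable-cacher` and `enable-semantic` are Cargo features (an assumption based on their naming), they would be switched on at build time, e.g. `cargo run --bin doc-searcher-run --features enable-cacher`, and gate code through `cfg` attributes. The sketch below shows that mechanism; `cacher_backend` and `semantic_search_enabled` are hypothetical functions, not part of the project.

```rust
/// Compiled in only when the (assumed) `enable-cacher` feature is active.
#[cfg(feature = "enable-cacher")]
fn cacher_backend() -> &'static str {
    "redis" // hypothetical backend name
}

/// Fallback compiled when the feature is off.
#[cfg(not(feature = "enable-cacher"))]
fn cacher_backend() -> &'static str {
    "none"
}

/// `cfg!` evaluates to a compile-time boolean instead of removing code.
fn semantic_search_enabled() -> bool {
    cfg!(feature = "enable-semantic")
}

fn main() {
    println!("cacher backend: {}", cacher_backend());
    println!("semantic search enabled: {}", semantic_search_enabled());
}
```

Gating at compile time means a build without these features carries no cache or LLM client code at all.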