Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EmbeddingStore (Metadata) Filter API #610

Merged
merged 37 commits into from
Mar 8, 2024
Merged

EmbeddingStore (Metadata) Filter API #610

merged 37 commits into from
Mar 8, 2024

Conversation

langchain4j
Copy link
Owner

@langchain4j langchain4j commented Feb 7, 2024

New EmbeddingStore (metadata) Filter API

Many embedding stores, such as Pinecone and Milvus support strict filtering (think of an SQL "WHERE" clause) during similarity search.
So, if one has an embedding store with movies, for example, one could search not only for the most semantically similar movies to the given user query but also apply strict filtering by metadata fields like year, genre, rating, etc. In this case, the similarity search will be performed only on those movies that match the filter expression.

Since LangChain4j supports (and abstracts away) many embedding stores, there needs to be an embedding-store-agnostic way for users to define the filter expression.

This PR introduces a Filter interface, which can represent both simple (e.g., type = "documentation") and composite (e.g., type in ("documentation", "tutorial") AND year > 2020) filter expressions in an embedding-store-agnostic manner.

Filter currently supports the following operations:

  • Comparison:

    • IsEqualTo
    • IsNotEqualTo
    • IsGreaterThan
    • IsGreaterThanOrEqualTo
    • IsLessThan
    • IsLessThanOrEqualTo
    • IsIn
    • IsNotIn
  • Logical:

    • And
    • Not
    • Or

These operations are supported by most embedding stores and serve as a good starting point. However, the list of operations will expand over time to include other operations (e.g., Contains) supported by embedding stores.

Currently, the DSL looks like this:

Filter onlyDocs = metadataKey("type").isEqualTo("documentation");

Filter docsAndTutorialsAfter2020 = metadataKey("type").isIn("documentation", "tutorial").and(metadataKey("year").isGreaterThan(2020));
// or
Filter docsAndTutorialsAfter2020 = and(
    metadataKey("type").isIn("documentation", "tutorial"),
    metadataKey("year").isGreaterThan(2020)
);

Filter expression as a String

Filter expression can also be specified as a String. This might be necessary, for example, if the filter expression is generated dynamically by the application or by the LLM (as in self querying).

This PR introduces a FilterParser interface with a simple Filter parse(String) API, allowing for future support of multiple syntaxes (if this will be required).

For the out-of-the-box filter syntax, ANSI SQL's WHERE clause is proposed as a suitable candidate for several reasons:

  • SQL is well-known among Java developers
  • There is extensive tooling available for SQL (e.g., parsers)
  • LLMs are pretty good at generating valid SQL, as there are tons of SQL queries on the internet, which are included in the LLM training datasets. There are also specialized LLMs that are trained for text-to-SQL task, such as SQLCoder.

The downside is that SQL's WHERE clause might not support all operations and data types that could be supported in the future by various embedding stores. In such case, we could extend it to a superset of ANSI SQL WHERE syntax and/or provide an option to express filters in the native syntax of the store.

An out-of-the-box implementation of the SQL FilterParser is provided as a SqlFilterParser in a separate module langchain4j-embedding-store-filter-parser-sql, using JSqlParser under the hood.

SqlFilterParser can parse SQL "SELECT" (or just "WHERE" clause) statement into a Filter object:

  • SELECT * FROM fake_table WHERE userId = '123-456' -> metadataKey("userId").isEqualTo("123-456")
  • userId = '123-456' -> metadataKey("userId").isEqualTo("123-456")

It can also resolve CURDATE() and CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP:
SELECT * FROM fake_table WHERE year = EXTRACT(YEAR FROM CURRENT_DATE -> metadataKey("year").isEqualTo(LocalDate.now().getYear())

Changes in Metadata API

Until now, Metadata supported only String values. This PR expands the list of supported value types to Integer, Long, Float and Double. In the future, more types may be added (if needed).
The method String get(String key) will be deprecated later in favor of:

  • String getString(String key)
  • Integer getInteger(String key)
  • Long getLong(String key)
  • etc

New overloaded put(key, value) methods are introduced to support more value types:

  • put(String key, int value)
  • put(String key, long value)
  • etc

Changes in EmbeddingStore API

New method search is added that will become the main entry point for search in the future. All findRelevant methods will be deprecated later.
New search method accepts EmbeddingSearchRequest and returns EmbeddingSearchResult.
EmbeddingSearchRequest contains all search criteria (e.g. maxResults, minScore), including new Filter.
EmbeddingSearchResult contains a list of EmbeddingMatch.

EmbeddingSearchResult search(EmbeddingSearchRequest request);

Changes in EmbeddingStoreContentRetriever API

EmbeddingStoreContentRetriever can now be configured with a static filter as well as dynamic dynamicMaxResults, dynamicMinScore and dynamicFilter in the builder:

ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                ...
                .maxResults(3)
                // or
                .dynamicMaxResults(query -> 3) // You can define maxResults dynamically. The value could, for example, depend on the query or the user associated with the query.
                ...
                .minScore(0.3)
                // or
                .dynamicMinScore(query -> 0.3)
                ...
                .filter(metadataKey("userId").isEqualTo("123-456")) // Assuming your TextSegments contain Metadata with key "userId"
                // or
                .dynamicFilter(query -> metadataKey("userId").isEqualTo(query.metadata().chatMemoryId().toString()))
                ...
                .build();

So now you can define maxResults, minScore and filter both statically and dynamically (they can depend on the query, user, etc.). These values will be propagated to the underlying EmbeddingStore.

"Self-querying"

This PR also introduces LanguageModelSqlFilterBuilder in langchain4j-embedding-store-filter-parser-sql module which can be used with EmbeddingStoreContentRetriever's dynamicFilter to automatically build a Filter object from the Query using language model and SqlFilterParser.

For example:

TextSegment groundhogDay = TextSegment.from("Groundhog Day", new Metadata().put("genre", "comedy").put("year", 1993));
TextSegment forrestGump = TextSegment.from("Forrest Gump", new Metadata().put("genre", "drama").put("year", 1994));
TextSegment dieHard = TextSegment.from("Die Hard", new Metadata().put("genre", "action").put("year", 1998));

// describe metadata keys as if they were columns in the SQL table
TableDefinition tableDefinition = TableDefinition.builder()
                .name("movies")
                .addColumn("genre", "VARCHAR", "one of [comedy, drama, action]")
                .addColumn("year", "INT")
                .build();

LanguageModelSqlFilterBuilder sqlFilterBuilder = new LanguageModelSqlFilterBuilder(model, tableDefinition);

ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .dynamicFilter(sqlFilterBuilder::build)
                .build();

String answer = assistant.answer("Recommend me a good drama from 90s"); // Forrest Gump

Which embedding store integrations will support Filter?

In the long run, all (provided the embedding store itself supports it).
In the first iteration, I aim to add support to just a few:

  • InMemoryEmbeddingStore
  • Elasticsearch
  • Milvus

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features
    • Introduced filters for checking key's value existence in a collection for improved data handling.
  • Enhancements
    • Updated InMemoryEmbeddingStoreTest to extend a different class for improved testing coverage and added a new test method.
  • Refactor
    • Made minor formatting adjustments in the assertion block for better readability.
  • Documentation
    • Updated class hierarchy information for clarity.

@andyflury
Copy link

This looks great @langchain4j Thanks for the quick turnaround! I just tested in with my application with a simple equals comparison and it worked like a charm!

Regarding your questions. I cannot comment on all of them, but here are a few thoughts

  • Which data types should Metadata support? Some embedding stores do not differentiate between number types (e.g., Integer, Long, Float, Double). Some store metadata as JSON, so type information is lost when persisting Metadata and then retrieving it back. Currently, I see two options: 1. Store an additional "type" field for each Metadata key-value pair and be strict during comparisons (e.g., fail if Integer is compared with Long). 2. Implement auto-casting (similar to Java's) during comparisons. So, if the metadata value was originally a Long, but after persistence, it is retrieved as an Integer, ensure it automatically casts to a Long filter value and also automatically casts when the user invokes getLong(key).

I would say types like:

  • Strings
  • Wrapped Primitives, (Integer, Long, Double, Boolean, etc.)
  • Dates (LocalDate, LocalTime, etc).
  • Enums
  • BigDecimal & BigInteger

I would strict comparisons.

  • Should we use and(x, y) or x.and(y)syntax for combining operations? Or both? The first is more explicit and clear in terms of precedence, while the second is more convenient for auto-completion.

Personally I would prefer x.and(y) but both are fine

  • JOOQ/QueryDSL-like (eq, lt, gte, etc) or AssertJ-like (isEqualTo, isLessThan, isGreaterThanOrEqualTo) naming?

Personally I would prefer eq, lt, gte, etc.

  • Should MetadataFilter extend Predicate<Metadata>?

that sounds reasonable to me!

@langchain4j
Copy link
Owner Author

@andyflury thanks for the feedback!

Another random thought: instead of having DSL just for metadata filtering, we could have a DSL for the whole search/retrieval, something like this:

SearchRequest.forQueryEmbedding(queryEmbedding)
                .where(metadata -> metadata.field("author").eq("Stephen King").and(metadata.field("year").gt(2020)))
                .minScore(0.77)
                .maxResults(5)
                .build();

@andyflury
Copy link

@langchain4j great idea, that would make things even more convenient. I'm all for it!

@geoand
Copy link
Contributor

geoand commented Feb 26, 2024

This feature is awesome!

@jmartisk @karesti you will probably interested in this as well

@andyflury
Copy link

@langchain4j would it potentially make sense to give the EmbeddingStore an optional filter constructor argument? That would be helpful for the new Advanced RAG when using a QueryRouter with EmbeddingStores and all of these stores are based on the same vector database index.

Let's say we have two types of information in the same vector database index:

  • information about cars (sourceType = 'CAR')
  • information about bikes (sourceType = 'BIKE')

Then you could construct one EmbeddingStore with filter sourceType = 'CAR' and one EmbeddingStore with filter sourceType = 'BIKE' plus then have the QueryRouter choose the appropriate EmbeddingStore for each query.

@langchain4j
Copy link
Owner Author

@andyflury that totally makes sense! However, it would probably be easier to define it on the EmbeddingStoreContentRetriever level, otherwise we will need to add this to constructor of every EmbeddingStore implementation (15+).

Also I plan to add a self-query retriever similar to the one in python version, where LLM will be able to construct metadata filter dynamically, potentially including multiple fields. This will help to narrow down the search.

Another thing that will probably be useful is to automatically construct filter by user (If @memoryid) is present.
Then same store/index can be used for multiple users and their data will be separated.

@andyflury
Copy link

@langchain4j Yes, you are right doing it inside EmbeddingStoreContentRetriever is probably better!

A self-query retriever sound extremely interesting.

This could potentially solve another challenge we are facing currently. We have started indexing all out customer support tickets. This works well when asking questions like "what is the best solution to solve problem xyz". But it does not work well for questions like "please summarize all tickets from the last month for customer xyz". I assume that with the self-query retriever, the system would automatically be able to set filters for customerName and ticketDate.

so far I was using Tools for this, but having a self-query retriever (in combination with the QueryRouter would definitely be the better approach).

Would you also consider having something like an SQLEmbeddingStore combined with a self-query retriever, which would automatically set appropriate WHERE clauses?

@langchain4j
Copy link
Owner Author

@andyflury

Would you also consider having something like an SQLEmbeddingStore combined with a self-query retriever, which would automatically set appropriate WHERE clauses?

Do you mean a ContentRetriever backed by an SQL DB?

@andyflury
Copy link

andyflury commented Feb 29, 2024

Do you mean a ContentRetriever backed by an SQL DB?

Yes, I think that could be useful.

I'm still trying to figure out where to use RAG (i.e. ContentRetriever) and where to use Tools. After our discussion I'm now tempted to use `ContentRetrievers? for any sort of data retrieval (vector store, SQL databases, etc.) and use Tools only for cases where some sort of action is performed by an external system (e.g. create an Email, raise a ticket, etc.).

Would you agree to this?

@langchain4j
Copy link
Owner Author

@andyflury yes, exactly. To summarise:

  • RAG is useful when you want to automatically provide LLM the external content which it does not have access to (by nature).
  • Tools are useful to provide LLM access to external actions which it does not have access to (by nature). LLM is the one who decides to call the tool or not.

However, ContentRetriever can be wrapped into a @Tool and be called by LLM on demand.

Copy link

coderabbitai bot commented Mar 7, 2024

Warning

Rate Limit Exceeded

@langchain4j has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 55 seconds before requesting another review.

How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.
Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.
Please see our FAQ for further information.

Commits Files that changed from the base of the PR and between 245b60d and 12a8872.

Walkthrough

These updates introduce new filters for checking if a specified key's value exists in a collection and enhance the InMemoryEmbeddingStoreTest functionality. The changes focus on improving type compatibility, handling number comparisons, and refining test coverage with a new method for JSON deserialization testing.

Changes

Files Change Summary
IsIn.java, IsNotIn.java Added filters to check key value existence in a collection with type compatibility and number comparison handling.
InMemoryEmbeddingStoreTest.java Updated inheritance to extend EmbeddingStoreWithFilteringIT, adjusted formatting in hash code comparison, and added test_backwards_compatibility method for JSON deserialization testing.

Poem

Within the code's domain, a rabbit did play,
Adding filters and tests to brighten the day.
🐇 "IsIn, IsNotIn, a collection to inspect,
InMemory tests now with new methods to perfect."
With a hop and a skip, the code came alive,
Testing and filtering to make systems thrive.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 40

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 200522f and 466dab4.
Files ignored due to path filters (22)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/pom.xml is excluded by: !**/*.xml
  • langchain4j-azure-ai-search/pom.xml is excluded by: !**/*.xml
  • langchain4j-bom/pom.xml is excluded by: !**/*.xml
  • langchain4j-cassandra/pom.xml is excluded by: !**/*.xml
  • langchain4j-chroma/pom.xml is excluded by: !**/*.xml
  • langchain4j-core/pom.xml is excluded by: !**/*.xml
  • langchain4j-elasticsearch/pom.xml is excluded by: !**/*.xml
  • langchain4j-infinispan/pom.xml is excluded by: !**/*.xml
  • langchain4j-milvus/pom.xml is excluded by: !**/*.xml
  • langchain4j-mongodb-atlas/pom.xml is excluded by: !**/*.xml
  • langchain4j-neo4j/pom.xml is excluded by: !**/*.xml
  • langchain4j-opensearch/pom.xml is excluded by: !**/*.xml
  • langchain4j-parent/pom.xml is excluded by: !**/*.xml
  • langchain4j-pgvector/pom.xml is excluded by: !**/*.xml
  • langchain4j-pinecone/pom.xml is excluded by: !**/*.xml
  • langchain4j-qdrant/pom.xml is excluded by: !**/*.xml
  • langchain4j-redis/pom.xml is excluded by: !**/*.xml
  • langchain4j-vearch/pom.xml is excluded by: !**/*.xml
  • langchain4j-vespa/pom.xml is excluded by: !**/*.xml
  • langchain4j-weaviate/pom.xml is excluded by: !**/*.xml
  • langchain4j/pom.xml is excluded by: !**/*.xml
  • pom.xml is excluded by: !**/*.xml
Files selected for processing (67)
  • .github/workflows/release.yaml (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/ColumnDefinition.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilder.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/TableDefinition.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParser.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderTest.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/Experimental.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/data/document/Document.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (8 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java (8 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/RetrievalAugmentor.java (2 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/content/aggregator/ContentAggregator.java (2 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/content/injector/ContentInjector.java (2 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/LanguageModelQueryRouter.java (2 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/QueryRouter.java (2 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/CompressingQueryTransformer.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/ExpandingQueryTransformer.java (2 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/QueryTransformer.java (2 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchResult.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (6 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/FilterParser.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/Equal.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThan.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThanOrEqual.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/In.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThan.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThanOrEqual.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NotEqual.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NotIn.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NumberComparator.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/TypeChecker.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/And.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Not.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Or.java (1 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/data/document/MetadataTest.java (2 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (1 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreIT.java (2 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (1 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithoutMetadataIT.java (10 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java (1 hunks)
  • langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/Document.java (1 hunks)
  • langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java (6 hunks)
  • langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java (1 hunks)
  • langchain4j-elasticsearch/src/test/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStoreCloudIT.java (1 hunks)
  • langchain4j-elasticsearch/src/test/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStoreIT.java (2 hunks)
  • langchain4j-infinispan/src/test/java/dev/langchain4j/store/embedding/infinispan/InfinispanEmbeddingStoreIT.java (1 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionOperationsExecutor.java (4 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionRequestBuilder.java (3 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Generator.java (2 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Mapper.java (6 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (5 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java (1 hunks)
  • langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreCloudIT.java (1 hunks)
  • langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreIT.java (3 hunks)
  • langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStore.java (9 hunks)
  • langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisSchema.java (4 hunks)
  • langchain4j-redis/src/test/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStoreIT.java (2 hunks)
  • langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/GsonInMemoryEmbeddingStoreJsonCodec.java (1 hunks)
  • langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java (3 hunks)
  • langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java (3 hunks)
  • langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreSerializedTest.java (1 hunks)
  • langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files not summarized due to errors (2)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java: Error: Message exceeds token limit
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java: Error: Message exceeds token limit
Files not reviewed due to errors (2)
  • SqlFilterParser.java (no review received)
  • SqlFilterParserTest.java (no review received)
Files skipped from review due to trivial changes (3)
  • langchain4j-core/src/main/java/dev/langchain4j/Experimental.java
  • langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java
  • langchain4j-infinispan/src/test/java/dev/langchain4j/store/embedding/infinispan/InfinispanEmbeddingStoreIT.java
Additional comments: 145
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/FilterParser.java (1)
  • 1-21: The introduction of the FilterParser interface is a crucial part of the new Filter API, enabling the parsing of filter expressions into Filter objects. The documentation clearly explains the purpose and the current implementation (SqlFilterParser). It's important to ensure that the documentation stays up-to-date as more implementations are added. The use of the @Experimental annotation is appropriate here, signaling that the API is subject to change.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Not.java (1)
  • 1-27: The Not class is a well-implemented logical operation for the Filter API, allowing for negating filter expressions. The use of ensureNotNull in the constructor ensures that a null expression is not passed, which is crucial for avoiding null pointer exceptions during runtime. The test method correctly negates the result of the expression's test, aligning with the expected behavior of a logical NOT operation.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchResult.java (1)
  • 1-26: The EmbeddingSearchResult class is a straightforward representation of search results, encapsulating a list of EmbeddingMatch objects. The use of ensureNotNull in the constructor is good practice, preventing null values for the matches list and thus avoiding potential null pointer exceptions. Marking the class and its methods with the @Experimental annotation is appropriate, indicating that the API may change.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Or.java (1)
  • 1-33: The Or class implements the logical OR operation for the Filter API, allowing for the combination of two filter expressions. The use of ensureNotNull for both left and right expressions in the constructor is essential for maintaining robustness. The test method's implementation, which returns true if either expression evaluates to true, correctly embodies the logical OR operation. This class is a valuable addition to the Filter API, enabling more complex filter expressions.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/And.java (1)
  • 1-33: The And class is another key component of the logical operations for the Filter API, implementing the logical AND operation. The constructor's use of ensureNotNull for both operands ensures that null expressions are not allowed, which is crucial for the operation's integrity. The test method's logic, requiring both expressions to evaluate to true, accurately represents the AND operation. This class enhances the Filter API's capability to represent complex filter expressions.
langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Generator.java (1)
  • 29-35: The addition of the generateEmptyJsons method is a practical utility for creating lists of empty JSON objects, which could be useful in various contexts within the Milvus embedding store integration. The use of new JSONObject(new HashMap<>()) for each list entry ensures that each JSON object is indeed empty and mutable. This method could be particularly useful for initializing data structures or for tests where empty JSON objects are required.
langchain4j-core/src/main/java/dev/langchain4j/rag/RetrievalAugmentor.java (1)
  • 16-16: Updating the annotation to @Experimental on the RetrievalAugmentor interface signals that this part of the API is subject to change and may not be fully stable yet. This is an important note for developers using this interface, as it sets the expectation that future versions of LangChain4j might introduce breaking changes or enhancements to this functionality. It's crucial to document the rationale behind marking specific APIs as experimental to guide developers accordingly.
langchain4j-core/src/main/java/dev/langchain4j/rag/content/injector/ContentInjector.java (1)
  • 17-17: The change from MightChangeInTheFuture to Experimental annotation on the ContentInjector interface correctly signals the current stability and maturity level of this feature. Developers should be aware that experimental features might undergo significant changes and should use them with caution in production environments.
embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/ColumnDefinition.java (1)
  • 9-38: The ColumnDefinition class is well-designed, with appropriate use of validation in constructors and getters for each field. The decision to use a string for the type field, as discussed in previous comments, allows for flexibility and future-proofing, which is suitable for the intended use case. The @Experimental annotation is appropriately applied, indicating the feature's current development stage.
langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/QueryRouter.java (1)
  • 25-25: The update from MightChangeInTheFuture to Experimental annotation on the QueryRouter interface is appropriate, signaling the current development stage of this feature. It's important for developers to note that experimental features are subject to change and should be used cautiously in production.
langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/QueryTransformer.java (1)
  • 27-27: The change from MightChangeInTheFuture to Experimental annotation on the QueryTransformer interface correctly indicates the feature's current development stage. Developers should be mindful of the experimental status of features, as they may undergo significant changes.
langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreSerializedTest.java (1)
  • 13-33: The InMemoryEmbeddingStoreSerializedTest class is well-structured for testing the serialization and deserialization functionality of InMemoryEmbeddingStore. It cleverly uses the awaitUntilPersisted method to simulate persistence. The actual testing logic relies on inherited tests from EmbeddingStoreWithFilteringIT, which is an efficient use of existing test infrastructure.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NumberComparator.java (1)
  • 6-32: The NumberComparator class provides robust utility methods for comparing numbers by converting them to BigDecimal, which is an effective way to handle floating-point inaccuracies. The clear error reporting for unsupported types in the toBigDecimal method enhances type safety and usability.
langchain4j-core/src/main/java/dev/langchain4j/rag/content/aggregator/ContentAggregator.java (1)
  • 26-26: The update from MightChangeInTheFuture to Experimental annotation on the ContentAggregator interface is appropriate, signaling the current development stage of this feature. Developers should be mindful of the experimental status of features, as they may undergo significant changes.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/Equal.java (1)
  • 15-52: The Equal class is well-implemented, providing robust functionality for equality comparisons within the metadata filtering context. It correctly handles both numeric and non-numeric comparisons, with appropriate validation for key and comparisonValue. The use of NumberComparator for numeric types and ensureTypesAreCompatible for type safety before comparison is commendable.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NotEqual.java (1)
  • 20-22: The constructor correctly validates the key and comparisonValue parameters using ensureNotBlank and ensureNotNull. This ensures that neither the key nor the comparison value is null or blank, which is crucial for the integrity of the filter operation.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThan.java (1)
  • 20-22: The constructor validation for key and comparisonValue is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThan.java (1)
  • 20-22: The constructor validation for key and comparisonValue is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThanOrEqual.java (1)
  • 20-22: The constructor validation for key and comparisonValue is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThanOrEqual.java (1)
  • 20-22: The constructor validation for key and comparisonValue is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.
langchain4j-redis/src/test/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStoreIT.java (1)
  • 46-46: The change to use createMetadata().toMap().keySet() for specifying metadata keys in the RedisEmbeddingStoreIT class is a good improvement. It ensures that the test setup dynamically reflects the actual metadata keys used, enhancing the test's relevance and accuracy.
.github/workflows/release.yaml (1)
  • 43-44: The addition of ELASTICSEARCH_CLOUD_API_KEY and ELASTICSEARCH_CLOUD_URL as environment variables in the release workflow is a good practice. It ensures that these sensitive values are securely managed and accessible to the steps that require them. Ensure that these secrets are properly configured in the GitHub repository settings.
embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/TableDefinition.java (1)
  • 22-25: The constructor correctly validates the name parameter and ensures that the columns collection is neither null nor empty. This validation is crucial for the integrity of the table definition. The description parameter is not validated for non-null values, which is acceptable if a description is optional.
langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreIT.java (3)
  • 9-9: The import change from EmbeddingStoreWithoutMetadataIT to EmbeddingStoreWithFilteringIT aligns with the PR's objective to introduce filtering capabilities. This change suggests that the test now extends a class that likely includes filtering-related tests, which is a positive step towards testing the new functionality.
  • 38-41: Adding an @AfterEach method to drop the collection after each test is a good practice. It ensures that each test runs in a clean state, preventing potential side effects from affecting other tests. This is especially important in integration tests involving external systems like Milvus.
  • 24-24: The usage of a constant COLLECTION_NAME for the collection name instead of hardcoding it in multiple places enhances maintainability and readability. This change makes it easier to update the collection name in the future if needed.
langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreCloudIT.java (4)
  • 9-9: The import change from EmbeddingStoreWithoutMetadataIT to EmbeddingStoreWithFilteringIT is consistent with the PR's goal to enhance filtering capabilities in tests. This change suggests that the test now extends a class likely including filtering-related tests, which is beneficial for testing the new functionality.
  • 32-35: Adding an @AfterEach method to drop the collection after each test ensures that each test runs in a clean state. This practice is crucial for integration tests involving external systems like Milvus, especially when testing against a cloud instance where leftover data might incur costs or affect subsequent tests.
  • 20-20: Using a constant COLLECTION_NAME for the collection name improves maintainability and readability by centralizing the collection name definition. This approach facilitates future updates to the collection name if necessary.
  • 23-28: The configuration for connecting to a Milvus cloud instance, including the use of an environment variable for the API token (MILVUS_API_KEY), is a secure practice for handling sensitive information. However, it's recommended to also externalize the URI to an environment variable or configuration file to avoid hardcoding sensitive or environment-specific information in the source code.

Consider externalizing the URI to an environment variable or configuration file for better security and flexibility.

langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisSchema.java (3)
  • 37-37: Changing the type of metadataKeys from List<String> to Collection<String> increases flexibility in the types of collections that can be used. This change allows for both ordered (List) and unordered (Set) collections, which can be beneficial depending on the use case. However, it's important to ensure that the rest of the codebase that interacts with metadataKeys is compatible with this change.
  • 51-51: Introducing a constructor that only takes dimension as a parameter simplifies the creation of RedisSchema instances when only the dimension is known or relevant. This can enhance usability in scenarios where other parameters are not needed or have sensible defaults.
  • 74-91: The method name changes to follow Java bean naming conventions (indexName(), prefix(), vectorFieldName(), scalarFieldName(), metadataKeys()) improve readability and align with common Java practices. This change makes the code more intuitive and consistent with standard Java naming conventions.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java (1)
  • 21-70: The introduction of the EmbeddingSearchRequest class is a significant addition, aligning with the PR's objectives to enhance the embedding store's functionality. This class encapsulates all necessary parameters for an embedding search request, including a new Filter for metadata filtering. The use of validation methods (ensureNotNull, ensureGreaterThanZero, ensureBetween) in the constructor ensures that the provided values are valid, enhancing the robustness of the code. The use of default values for optional parameters (maxResults, minScore) improves usability by providing sensible defaults.
langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionRequestBuilder.java (2)
  • 36-40: Introducing the buildDropCollectionRequest method to create a drop collection request for Milvus is a useful addition. This method simplifies the process of dropping a collection, encapsulating the necessary parameters and providing a clear, reusable way to perform this operation.
  • 57-74: Modifying the buildSearchRequest method to handle a Filter parameter is a crucial enhancement that aligns with the PR's objectives to introduce filtering capabilities. This change allows for the inclusion of filter expressions in search requests, significantly enhancing the flexibility and power of searches. It's important to ensure that the MilvusMetadataFilterMapper.map(filter) method correctly translates the Filter object into a Milvus-compatible expression.
langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/ExpandingQueryTransformer.java (2)
  • 49-51: Changing the access modifiers of chatLanguageModel, promptTemplate, and n from private to protected allows subclasses to directly access these fields. This change can facilitate extension and customization of the ExpandingQueryTransformer class. However, it's important to ensure that this does not unintentionally expose internal implementation details that should be encapsulated.
  • 76-76: Simplifying the transform method to directly return the parsed response enhances readability and reduces unnecessary complexity. This change makes the method's purpose and behavior more straightforward.
embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderTest.java (2)
  • 30-49: The tests for parsing valid SQL queries are comprehensive, covering various common scenarios including trailing semicolons, leading and trailing spaces, and newline characters. These tests ensure that the LanguageModelSqlFilterBuilder can correctly parse and handle SQL queries in different formats. It's important that the sqlFilterParser.parse method is called with the trimmed SQL query to avoid issues with leading or trailing whitespace.
  • 70-90: The tests for extracting valid SQL from responses with additional text or formatting are thorough, covering scenarios with code blocks, additional text before and after the SQL query, and various combinations of formatting characters. These tests validate the LanguageModelSqlFilterBuilder's ability to extract the core SQL query from a response that may include additional, non-relevant text or formatting. This capability is crucial for handling responses from chat models that might include explanatory text or formatting around the SQL query.
langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/CompressingQueryTransformer.java (1)
  • 51-52: Changing the access modifiers of promptTemplate and chatLanguageModel from private to protected in CompressingQueryTransformer allows for easier extension and customization by subclasses. This change can be beneficial for creating specialized transformers that require direct access to these fields. However, it's crucial to ensure that this change does not expose more of the class's internals than necessary.
langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (2)
  • 20-20: The change to extend EmbeddingStoreWithFilteringIT aligns with the PR's objective to introduce filtering capabilities to the EmbeddingStore. Ensure that EmbeddingStoreWithFilteringIT provides all necessary test methods and setup for filtering functionality. If additional tests specific to the InMemoryEmbeddingStore are required, consider implementing them.
  • 66-66: The minor formatting adjustment with hasSameHashCodeAs is a good practice for consistency in test assertions. This change is approved as it does not impact the functionality or logic of the tests.
langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java (2)
  • 17-43: The map method effectively handles different Filter types by delegating to specific mapping methods. This approach maintains readability and modularity. However, consider adding support for custom or future Filter types through a more extensible mechanism, such as a registry or factory pattern, to avoid modifying this method directly for each new type.
  • 41-41: When encountering an unsupported Filter type, the method throws an UnsupportedOperationException. This is appropriate for signaling to the caller that the operation cannot be completed as requested. However, ensure that upstream code properly handles this exception to avoid unexpected crashes or behavior.
langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreIT.java (1)
  • 42-117: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [21-81]

The integration test should_add_embedding_with_segment_with_metadata correctly tests the addition of embeddings with various metadata types and the retrieval of these embeddings using the new search API. It's comprehensive in testing different metadata value types. However, consider adding negative test cases or edge cases, such as adding embeddings with invalid metadata values or types, to ensure robust error handling.

langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionOperationsExecutor.java (2)
  • 60-63: The addition of a new field type for storing metadata as JSON is a significant enhancement, allowing for more flexible and rich metadata associated with embeddings. Ensure that the JSON data structure and usage are well-documented, and consider adding validation or sanitization to prevent issues related to malformed JSON data.
  • 75-79: The dropCollection method is correctly implemented for dropping a collection in Milvus. It's good practice to provide such utility methods to manage the lifecycle of collections, especially in testing or dynamic environments. Ensure that this method is used cautiously in production environments to prevent accidental data loss.
langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Mapper.java (3)
  • 42-45: The addition of toMetadataJsons method enhances the flexibility in handling metadata associated with text segments. This method correctly converts metadata to JSON objects, which can be useful for various operations. Ensure that the metadata conversion handles all expected data types correctly and consider adding error handling for unexpected types.
  • 89-99: The method toTextSegment correctly constructs a TextSegment object from a RowRecord, including handling of metadata. This method improves the modularity and readability of the code by encapsulating the conversion logic. Ensure that null checks and error handling are sufficient to prevent issues with incomplete or malformed data.
  • 104-113: The toMetadata method converts a JSONObject to a Metadata object, handling the conversion of BigDecimal to double. This is a practical approach to dealing with numeric types in JSON. However, be aware of potential precision loss when converting from BigDecimal to double and document this behavior if necessary.
embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java (4)
  • 34-48: The test should_filter_by_genre correctly verifies that the LanguageModelSqlFilterBuilder can interpret a natural language query and produce the appropriate filter for a specific genre. This test demonstrates the builder's basic functionality. Consider adding more complex queries to test the builder's ability to handle various natural language structures and intents.
  • 50-67: The test should_filter_by_genre_and_year extends the testing to include both genre and year filters based on a natural language query. This test is well-constructed and demonstrates the builder's capability to handle composite filters. Ensure that edge cases, such as ambiguous or conflicting information in the query, are also tested.
  • 69-82: The test should_filter_by_year_range effectively tests the builder's ability to interpret a query specifying a range of years. This test showcases the builder's flexibility in understanding and translating natural language queries into precise filter expressions. Consider testing additional range expressions and edge cases, such as open-ended ranges or invalid ranges.
  • 85-98: The test should_filter_by_year_using_arithmetics demonstrates the builder's capability to interpret arithmetic expressions within natural language queries, such as calculating the previous year. This test is a good example of the builder's advanced functionality. Ensure that more complex arithmetic expressions and their potential impact on filter accuracy are also tested.
langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/LanguageModelQueryRouter.java (2)
  • 56-60: The change from private to protected access modifiers for the fields chatLanguageModel, promptTemplate, options, idToRetriever, and fallbackStrategy allows subclasses to directly access these fields. This is a positive change for extensibility but consider encapsulating the fields with protected getter methods to maintain control over the access to these fields. This approach can help in maintaining the integrity of the data and provide a single point of modification in the future if needed.
  • 108-108: Changing the visibility of the fallback method from private to protected enables subclasses to override this method, which is a good practice for enhancing extensibility and allowing for customized behavior in subclasses. Ensure that the documentation is updated to guide developers on how to properly override this method and what the expected behavior should be.
langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java (1)
  • 18-151: The implementation of ElasticsearchMetadataFilterMapper provides a comprehensive mapping from the Filter API to Elasticsearch queries. This is crucial for integrating the new filtering capabilities with Elasticsearch. A few points to consider:
  • Ensure that all comparison values are properly sanitized and converted to the appropriate Elasticsearch data types to prevent any potential query injection vulnerabilities.
  • The use of metadata. prefix in field names (e.g., lines 64, 71, 78, 85) assumes a specific document structure. Make sure this aligns with the expected structure of documents stored in Elasticsearch and document this assumption for future reference.
  • The method formatKey (lines 137-151) smartly handles the distinction between keyword and non-keyword fields. This is important for correct query behavior, especially for string fields.

Overall, this mapper is a key component for enabling advanced filtering capabilities in Elasticsearch-backed embedding stores. Just ensure that the assumptions made (e.g., document structure, field naming conventions) are well-documented and validated against the actual Elasticsearch schema used in the application.

langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java (1)
  • 96-122: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [99-129]

The implementation of the search method in InMemoryEmbeddingStore correctly utilizes the EmbeddingSearchRequest to perform a search with filtering and scoring. A few points to consider:

  • The filtering logic (lines 108-113) is currently limited to instances of TextSegment. This approach is mentioned in the previous comments and discussions. However, consider future-proofing this by abstracting the metadata retrieval into a method or interface that can be implemented by different types of embedded objects. This would make the filtering capability more flexible and applicable to a wider range of use cases.
  • The scoring and ranking logic (lines 115-123) is straightforward and effective for an in-memory store. However, for larger datasets, consider optimizing this process to improve performance, such as pre-filtering entries that cannot meet the minScore criterion before calculating the cosine similarity.
  • The conversion of the priority queue to a list and then reversing the list (lines 126-129) is necessary to return the results in descending order of scores. This is correct but consider adding a comment to explain this step for future maintainers.

Overall, the search method is well-implemented and addresses the requirements of the new Filter API. Just ensure that the filtering capability is as flexible and extensible as possible to accommodate future enhancements.

langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (2)
  • 58-74: The introduction of the search method in the EmbeddingStore interface, marked as @Experimental, is a significant enhancement that aligns with the PR's objectives to provide advanced filtering and searching capabilities. This method provides a more comprehensive and flexible approach to searching embeddings by encapsulating all search criteria within the EmbeddingSearchRequest. The default implementation that leverages the deprecated findRelevant method is a good transitional solution. However, it's important to encourage implementers of the EmbeddingStore interface to provide their own optimized implementations of the search method that fully utilize the capabilities of their underlying storage systems.
  • 99-114: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [86-111]

The deprecation of the findRelevant methods in favor of the new search method is a clear indication of the direction for future development and usage of the EmbeddingStore interface. While the default implementations provided ensure backward compatibility, it's crucial to document the expected transition path for users of the EmbeddingStore interface. This includes guidance on how to switch to the new search method and the benefits of doing so, such as enhanced filtering capabilities and more precise control over search parameters. Additionally, consider providing examples or best practices for implementing the search method in custom EmbeddingStore implementations.

langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java (7)
  • 9-12: The addition of new imports indicates that the class now utilizes EmbeddingSearchRequest, EmbeddingSearchResult, EmbeddingStore, and Filter classes/interfaces. This change aligns with the PR's objective to enhance the EmbeddingStoreContentRetriever with dynamic filtering capabilities. Ensure that these new dependencies are properly managed in terms of versioning and compatibility within the project.
  • 16-16: The import of java.util.function.Function is crucial for the implementation of dynamic parameters (maxResults, minScore, and filter). This approach leverages Java's functional programming capabilities to provide flexibility in configuring the retriever based on the query context. It's a good use of Java's standard library to achieve the PR's objectives.
  • 53-55: The introduction of DEFAULT_MAX_RESULTS, DEFAULT_MIN_SCORE, and DEFAULT_METADATA_FILTER as static final functions provides sensible defaults for the dynamic parameters. This is a good design choice, ensuring that the retriever has a baseline behavior even if no dynamic configurations are provided. It also simplifies the constructors' logic by providing a fallback option.
  • 60-62: The use of Function<Query, Integer>, Function<Query, Double>, and Function<Query, Filter> for maxResultsProvider, minScoreProvider, and filterProvider respectively, is an excellent way to achieve dynamic parameter configuration based on the query. This design allows for a high degree of flexibility and customization in content retrieval, aligning with the PR's objectives.
  • 100-111: The builder pattern used here, along with the private constructor that accepts functions for dynamic parameter configuration, is a clean and effective way to instantiate EmbeddingStoreContentRetriever objects with customizable behavior. This approach enhances the class's usability and flexibility. However, ensure that the @Builder annotation is correctly processed and that all necessary builder methods are generated as expected, especially for setting dynamic parameters.
  • 115-134: The builder methods for setting maxResults, minScore, and metadataFilter dynamically are well-implemented. They provide a user-friendly way to configure these parameters while ensuring input validation (e.g., ensuring maxResults is greater than zero and minScore is within a valid range). This contributes to the robustness and usability of the API. However, consider adding null checks or documentation to clarify the behavior when null values are passed to these methods.
  • 140-151: The retrieve method's implementation effectively utilizes the dynamic parameters to configure the EmbeddingSearchRequest based on the provided Query. This is a key part of achieving the PR's objectives, allowing for flexible and context-aware content retrieval. The method's logic is clear and concise, leveraging Java 8 streams for processing the search results. Ensure that the EmbeddingModel and EmbeddingStore interactions are thoroughly tested, especially with various dynamic configurations.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java (2)
  • 43-43: The use of @Experimental annotation is appropriate here, indicating that the Filter interface is a new feature and may be subject to changes. However, it's important to ensure that there is documentation or communication around what "experimental" means in this context to manage user expectations.
  • 78-80: The MetadataKey constructor uses ensureNotBlank to validate the key parameter. This is a good practice for ensuring that keys are not null or empty. However, it's also important to document the expected format or constraints of the key for users.
langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (11)
  • 38-54: The use of @BeforeEach and @AfterEach annotations for setting up and tearing down test environments is a good practice. It ensures that each test runs in a clean state, which is crucial for reliable test results.
  • 56-71: The test should_retrieve correctly verifies the basic retrieval functionality. It's important to ensure that all possible edge cases are covered, such as empty results or errors during the search process.
// Consider adding tests for edge cases, such as empty results or errors during the search process.
  • 73-92: The test should_retrieve_builder demonstrates the use of the builder pattern for creating an EmbeddingStoreContentRetriever. This pattern enhances readability and maintainability. Ensure that the builder properly initializes all required fields.
  • 94-114: The test should_retrieve_with_custom_maxResults verifies the functionality of custom maximum results. It's crucial to test boundary conditions, such as setting the maximum results to zero or a negative number, to ensure robust error handling.
// Test boundary conditions for custom maximum results, such as zero or negative numbers.
  • 116-136: The test should_retrieve_with_custom_maxResults_builder checks the custom maximum results functionality using the builder pattern. Similar to the previous comment, testing boundary conditions for maximum results is important for comprehensive coverage.
// Ensure comprehensive testing of boundary conditions for custom maximum results using the builder pattern.
  • 138-158: The test should_retrieve_with_custom_dynamicMaxResults_builder introduces dynamic maximum results based on the query. This feature adds flexibility but also complexity. Ensure that the dynamic logic is thoroughly tested, including with null or invalid queries.
// Thoroughly test the dynamic maximum results logic, including scenarios with null or invalid queries.
  • 160-181: The test should_retrieve_with_custom_minScore_ctor verifies custom minimum score functionality. Testing for edge cases, such as extremely high or low scores, is essential to ensure that the system behaves as expected under all conditions.
// Test edge cases for custom minimum scores, such as extremely high or low values.
  • 183-203: The test should_retrieve_with_custom_minScore_builder checks the custom minimum score functionality using the builder pattern. As with maximum results, it's important to test boundary conditions for minimum scores to ensure robustness.
// Test boundary conditions for custom minimum scores using the builder pattern.
  • 205-225: The test should_retrieve_with_custom_dynamicMinScore_builder explores dynamic minimum score functionality. This adds a layer of complexity, so testing with various dynamic logic scenarios, including edge cases, is crucial for reliability.
// Test various scenarios and edge cases for dynamic minimum score functionality to ensure reliability.
  • 227-250: The test should_retrieve_with_custom_metadataFilter verifies the functionality of using a metadata filter during retrieval. It's important to test with various filter types and combinations to ensure that filtering works as expected in all cases.
// Test with various filter types and combinations to ensure comprehensive coverage of metadata filtering functionality.
  • 252-275: The test should_retrieve_with_custom_dynamicMetadataFilter introduces dynamic metadata filtering based on the query. This feature increases flexibility but also adds complexity. Thorough testing, including with complex filter expressions, is essential for ensuring correct behavior.
// Thoroughly test dynamic metadata filtering, including with complex filter expressions, to ensure correct behavior.
langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (2)
  • 37-41: The updated class documentation clearly explains the new functionalities related to managing Milvus instances, storing Metadata, and filtering. It's important to ensure that the documentation remains up-to-date as the implementation evolves.
  • 127-154: The refactored search method now uses EmbeddingSearchRequest and EmbeddingSearchResult, aligning with the new filtering functionality. It's important to verify that the method correctly handles all possible filter expressions and edge cases.
// Verify that the search method correctly handles all possible filter expressions and edge cases.
SearchParam searchParam = buildSearchRequest(...);
langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStore.java (4)
  • 50-50: Renaming metadataFieldsName to metadataKeys in the constructor parameter improves clarity and consistency with the internal representation of metadata within the RedisEmbeddingStore. This change aligns with the common terminology used in key-value stores and databases, making the codebase more intuitive for new developers and contributors.
  • 67-67: Using metadataKeys directly in the RedisSchema builder pattern is a straightforward and clean approach to passing metadata keys from the RedisEmbeddingStore to the RedisSchema. This ensures that the schema is aware of which metadata keys are important for indexing and retrieval operations, facilitating efficient data management and query execution.
  • 116-117: The addition of schema.metadataKeys() to the list of return fields in the findRelevant method ensures that all relevant metadata keys are included in the search results. This is crucial for applications that rely on metadata for filtering or post-processing of search results. It demonstrates a good understanding of the feature requirements and ensures that the API returns comprehensive information to the caller.
  • 285-288: The deprecation of metadataFieldsName in favor of metadataKeys within the Builder class is a good practice, as it signals to the users of the API that they should transition to the new naming convention. It's important to ensure that the deprecation is communicated clearly in the project's documentation and migration guides to facilitate a smooth transition for existing users.
embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilder.java (3)
  • 85-85: Marking the LanguageModelSqlFilterBuilder class as @Experimental is a good practice, as it clearly communicates to the users of the library that this feature is in an experimental stage and may undergo significant changes in the future. This tag encourages feedback and contributions while setting the right expectations regarding stability and support.
  • 90-106: The default PromptTemplate provided is well-documented and clearly outlines the task for the language model, including instructions, input, and expected response format. This detailed prompt design is crucial for achieving high-quality SQL generation from the language model. However, it's important to continuously evaluate and refine the prompt based on real-world usage and feedback to ensure it remains effective across a wide range of queries and database schemas.
  • 124-128: The constructor of LanguageModelSqlFilterBuilder properly ensures that both chatLanguageModel and tableDefinition are not null, which is crucial for the operation of this class. The use of getOrDefault for promptTemplate and sqlFilterParser provides sensible defaults while allowing for customization. This design promotes flexibility and robustness in the implementation.
langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithoutMetadataIT.java (8)
  • 56-59: The addition of tests using the new search API with EmbeddingSearchRequest parameters is a good practice, as it ensures that the new functionality is covered by automated tests. This helps in validating the correctness of the implementation and in catching regressions early. It's important to continue expanding the test coverage to include various edge cases and usage scenarios to ensure the robustness of the new API.
  • 82-85: Similar to the previous comment, the use of the new search API in this test case is commendable. It demonstrates the API's flexibility in handling different types of embedding additions, including those with specified IDs. Ensuring that the test assertions are comprehensive and reflect the expected behavior of the API is key to maintaining high-quality code.
  • 109-112: Again, testing the new search API with embeddings associated with text segments is crucial for verifying that the API correctly handles and returns relevant matches for complex queries involving both embeddings and their associated segments. This test case adds valuable coverage for scenarios where additional context (in the form of text segments) is provided alongside embeddings.
  • 149-152: Testing the search API with multiple embeddings ensures that the API can handle batch operations and return accurate results for queries involving multiple embeddings. This test case is essential for validating the API's performance and correctness in more demanding scenarios, such as bulk data processing or complex search queries.
  • 195-198: The use of the search API in testing scenarios with multiple embeddings and associated text segments further demonstrates the API's versatility. This test case is particularly important for ensuring that the API can accurately handle and return results for queries that involve a mix of embeddings and their contextual information, providing a comprehensive test coverage for the new functionality.
  • 226-229: Testing the search API with a minScore parameter is crucial for ensuring that the API correctly filters results based on the specified minimum score. This test case adds important coverage for scenarios where precision in the search results is critical, such as filtering out low-relevance matches. It's important to include tests for various minScore values to thoroughly evaluate this aspect of the API.
  • 269-273: The addition of a test case using the search API with a minScore parameter that filters out all but the highest-scoring matches is an excellent way to validate the API's filtering capabilities. This test ensures that the API can accurately apply score-based filtering to return only the most relevant results, which is essential for applications requiring high precision in search results.
  • 298-301: Testing the accuracy of the score returned by the search API is fundamental to ensuring the reliability of the search functionality. This test case validates that the scores calculated by the API are consistent with the expected relevance of the matches, which is crucial for applications that rely on the scoring mechanism to rank or filter search results.
langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java (14)
  • 15-34: The tests for test_in_string method are comprehensive, covering various scenarios including single values, lists, and type mismatches. However, consider adding a test case to ensure that the in operation behaves correctly when the metadata contains multiple keys, to verify that the filter correctly isolates and evaluates the specified key.
  • 36-55: The test_not_in_string method correctly tests the nin operation for strings. It's good to see both positive and negative cases, as well as type mismatch handling. Similar to the previous comment, adding a test case to ensure correct behavior when the metadata contains multiple keys would be beneficial.
  • 57-79: In test_in_integer, you've effectively tested various scenarios including type conversions and mismatches. It's particularly good to see tests for conversions from long, float, and double to integer. This ensures that the filter operation is robust across different numeric types. No additional comments here.
  • 81-103: The test_not_in_integer method follows a similar pattern to test_in_integer, covering a wide range of cases. The inclusion of tests for type mismatches is commendable. This method is well-tested.
  • 105-127: For test_in_long, the tests are thorough, covering conversions from integer, float, and double to long. This ensures that the filter operation can handle different numeric types seamlessly. The method is well-tested.
  • 129-151: The test_not_in_long method is comprehensive, similar to the test_in_long method. It effectively tests the nin operation for long values, including type conversions and mismatches. This method is well-tested.
  • 153-175: In test_in_float, the tests cover conversions from integer, long, and double to float. This ensures that the filter operation can handle different numeric types seamlessly. The method is well-tested.
  • 177-199: The test_not_in_float method follows a similar pattern to test_in_float, covering a wide range of cases. The inclusion of tests for type mismatches is commendable. This method is well-tested.
  • 201-223: For test_in_double, the tests are thorough, covering conversions from integer, long, and float to double. This ensures that the filter operation can handle different numeric types seamlessly. The method is well-tested.
  • 225-247: The test_not_in_double method is comprehensive, similar to the test_in_double method. It effectively tests the nin operation for double values, including type conversions and mismatches. This method is well-tested.
  • 250-258: The test_in_empty_list method correctly throws an IllegalArgumentException when an empty list is passed as a comparison value. This is a good practice to ensure that filters are not inadvertently applied with empty criteria. No additional comments here.
  • 260-269: Similarly, the test_in_list_with_null method ensures that null values within a list of comparison values result in an IllegalArgumentException. This maintains the integrity of the filter criteria. No additional comments here.
  • 271-280: The test_not_in_empty_list method follows the same pattern as test_in_empty_list, correctly handling empty lists for the nin operation. This consistency in handling empty lists across different filter operations is commendable.
  • 282-291: The test_not_in_list_with_null method ensures that null values within a list of comparison values for the nin operation result in an IllegalArgumentException, similar to the in operation. This consistency is good practice.
langchain4j-core/src/test/java/dev/langchain4j/data/document/MetadataTest.java (12)
  • 5-7: The addition of parameterized tests using @ParameterizedTest with @NullSource and @ValueSource is a good practice for testing edge cases more efficiently. This approach helps to ensure that the Metadata class behaves correctly under various input conditions.
  • 119-126: The test_asMap method correctly verifies that the asMap function returns a map representation of the metadata. This test ensures that the conversion is accurate and maintains the integrity of the data.
  • 128-135: The test_create_from_map method effectively tests the creation of Metadata from a map with a single key-value pair. It's good to see that the test verifies the correct retrieval of the value. Consider adding more complex map scenarios to ensure robustness.
  • 138-206: The should_create_from_map method is comprehensive, testing the creation of Metadata from a map with various data types. It's commendable that the tests cover a wide range of type conversions and ensure that the Metadata class can handle different types correctly. This thorough testing is crucial for maintaining the integrity and flexibility of the Metadata API.
  • 209-225: The parameterized test should_fail_to_create_from_map_when_key_is_null correctly handles edge cases for null and blank keys. Using @ParameterizedTest with @NullSource and @ValueSource is an efficient way to test these scenarios. This ensures that the Metadata class enforces key constraints properly.
  • 228-242: The should_fail_to_create_from_map_when_value_is_null method tests the important constraint that metadata values cannot be null. This test ensures that the Metadata class maintains data integrity by preventing null values.
  • 245-265: The should_fail_to_create_from_map_when_value_is_of_unsupported_type method tests the Metadata class's ability to reject unsupported value types. This is crucial for maintaining type safety and ensuring that only supported types are used within the metadata. The detailed error message provided in the assertion is helpful for debugging.
  • 268-291: The should_get_typed_values method tests the retrieval of typed values from the metadata. This ensures that the Metadata class can correctly handle and convert stored values to their respective types. It's good to see tests for null cases as well.
  • 294-319: The should_fail_when_adding_null_key method correctly tests the rejection of null keys for various value types. This ensures that the Metadata class enforces key constraints properly across different value types. The detailed error messages are helpful for understanding the constraints.
  • 322-331: The should_fail_when_adding_null_value method tests the important constraint that metadata values cannot be null. This ensures that the Metadata class maintains data integrity by preventing null values. This consistency in handling null values is commendable.
  • 334-350: The should_convert_to_map method tests the conversion of Metadata to a map. This ensures that the conversion maintains the integrity and accuracy of the data. It's good to see that the test verifies the equality of the original map and the converted map.
  • 353-357: The test_containsKey method effectively tests the containsKey function of the Metadata class. This ensures that the class can accurately report the presence or absence of keys. No additional comments here.
langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java (5)
  • 27-30: The introduction of EmbeddingSearchRequest, EmbeddingSearchResult, and Filter classes is a positive change, aligning with the PR objectives to enhance metadata-based filtering capabilities in similarity searches. It's crucial to ensure these classes are well-documented and their usage is clear to the developers.
  • 57-59: The updated class documentation clearly explains the new capabilities of storing Metadata and filtering by it using Filter provided inside EmbeddingSearchRequest. This is a valuable addition to the documentation, helping developers understand the purpose and usage of the class.
  • 272-294: The buildScriptScoreQuery method constructs the query with optional metadata filtering, which is a key part of the new functionality. The method is well-structured and follows the PR objectives. However, there's a potential improvement in handling the null filter case more explicitly. Currently, it defaults to a matchAll query, which might not be clear to all developers. Adding a comment or documentation about this behavior could enhance code readability and maintainability.

Consider adding a comment explaining the default behavior when filter is null.

  • 352-352: Directly accessing metadata for indexing in the bulk method aligns with the PR objectives to enhance metadata support. Ensure that the metadata conversion to a map (toMap()) is efficient and handles all supported types correctly. This change is crucial for the functionality and should be thoroughly tested.
  • 370-370: The toMatches method conversion from SearchResponse<Document> to List<EmbeddingMatch<TextSegment>> is implemented correctly. It's essential to ensure that the conversion logic correctly handles null values and metadata conversion. This method is a critical part of the search functionality, and its correctness directly impacts the usability of the search results.
langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (8)
  • 3-27: The expansion of the Metadata class to support additional value types (String, Integer, Long, Float, Double) is a significant improvement, aligning with the PR objectives. It's important to ensure that the documentation is updated to reflect these changes and guide developers on how to use the new functionality effectively.
  • 31-47: The introduction of SUPPORTED_VALUE_TYPES as a Set to track supported value types is a good practice, ensuring type safety and making it easier to manage supported types. However, consider using wrapper classes only (e.g., Integer instead of int) for consistency, as autoboxing will handle conversions, and it simplifies the type checking logic.

Consider removing primitive class types from SUPPORTED_VALUE_TYPES to simplify the type checking logic.

  • 88-96: The get method is marked for deprecation, which is a good decision given the introduction of type-specific getters. However, it's important to communicate this change clearly in the documentation and provide guidance on migrating to the new methods.
  • 105-118: The introduction of type-specific getters (e.g., getString, getInteger, getLong, getFloat, getDouble) is a significant improvement for type safety and usability. It's crucial to ensure these methods are well-documented and include examples of their usage. Additionally, consider adding unit tests to cover edge cases and type conversion scenarios.
  • 270-273: The add method is marked for deprecation in favor of the new put methods, which is a positive change for clarity and consistency. Ensure that the migration path is clearly documented for developers.
  • 296-356: The new put methods for adding metadata with specific value types are well-implemented, enhancing the class's usability and type safety. It's important to ensure that these methods are accompanied by comprehensive unit tests to validate their behavior, especially for edge cases and type conversions.
  • 384-391: The asMap method is marked for deprecation, which aligns with the introduction of toMap that returns a map with object values. This change is consistent with the expansion of supported value types. Ensure that the deprecation is clearly communicated and that developers are guided towards using toMap for future compatibility.
  • 398-399: The toMap method provides a way to access the metadata as a map of key-value pairs with object values, reflecting the expanded support for different value types. This method is crucial for interoperability and should be thoroughly tested to ensure it handles all supported types correctly.
langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java (5)
  • 34-36: The addition of new imports for Filter, LanguageModelSqlFilterBuilder, and TableDefinition is appropriate given the context of the PR, which introduces a new Filter API for the EmbeddingStore. These imports are necessary for the implementation of the test cases that demonstrate the usage of the new API.
  • 48-48: The introduction of the Function interface from java.util.function is a good practice, as it is used to define dynamic metadata filters in the test methods. This usage aligns with modern Java practices for functional programming and enhances the readability and flexibility of the code.
  • 429-467: This test method, should_use_LLM_generated_metadataFilter, demonstrates the use of a dynamically generated metadata filter using a language model. The test setup is clear and concise, effectively showcasing how to use the LanguageModelSqlFilterBuilder to create a filter based on SQL-like expressions. The test asserts that the correct movie recommendation is returned based on the filter criteria, which is a practical demonstration of the new feature's capabilities.

However, it's important to ensure that the LanguageModelSqlFilterBuilder and the TableDefinition are thoroughly tested in their own unit tests to guarantee their correctness and robustness in various scenarios.

  • 474-509: The test method should_use_dynamicMetadataFilter_by_user_id effectively demonstrates the use of a dynamic metadata filter based on user IDs. The use of a lambda function to create the filter condition dynamically is a good example of leveraging Java's functional programming capabilities. This test method clearly shows how the new filtering functionality can be applied in a personalized context, which is valuable for developers looking to implement similar features.

It's crucial to ensure that edge cases and error handling are covered in additional tests, especially when dealing with dynamic filters that might be subject to varying input conditions.

  • 512-541: The test method should_use_static_metadataFilter demonstrates the use of a static metadata filter to retrieve specific content from the embedding store. The setup and assertion are straightforward, effectively showing how to apply a simple filter condition. This test method complements the dynamic filter tests by showcasing another aspect of the new filtering functionality.

As with the dynamic filters, it's important to cover a range of scenarios and edge cases in separate unit tests to ensure the robustness of the static filtering functionality.

langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (3)
  • 21-24: The class-level documentation clearly explains the purpose of EmbeddingStoreWithFilteringIT. It's well-structured to serve as a base for testing any EmbeddingStore implementation that supports filtering with the Filter API. This approach promotes code reuse and ensures consistency across different implementations.
  • 26-73: The test method should_filter_by_metadata is well-structured and covers a broad range of filter operations. It tests the functionality of filtering embeddings by metadata, ensuring that only embeddings with matching metadata are returned. The use of @ParameterizedTest and @MethodSource for data-driven testing is appropriate and enhances test coverage. However, consider adding more detailed comments within the test body to explain the rationale behind specific assertions, especially for complex filter conditions.
  • 1135-1178: Similar to should_filter_by_metadata, the should_filter_by_metadata_not method effectively tests the negation of filter conditions. It's crucial for ensuring that the Filter API correctly handles not operations. Like the previous method, adding comments to explain complex assertions or test setups would improve readability and maintainability.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 466dab4 and 52c74e1.
Files selected for processing (25)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParser.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java (1 hunks)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (6 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/MetadataFilterBuilder.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsEqualTo.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThan.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThanOrEqualTo.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThan.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThanOrEqualTo.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotEqualTo.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java (1 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (1 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (1 hunks)
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java (1 hunks)
  • langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java (6 hunks)
  • langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java (1 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (5 hunks)
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java (1 hunks)
  • langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java (3 hunks)
  • langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java (3 hunks)
Files not summarized due to errors (2)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java: Error: Message exceeds token limit
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java: Error: Message exceeds token limit
Files not reviewed due to errors (1)
  • SqlFilterParserTest.java (no review received)
Files skipped from review as they are similar to previous changes (12)
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParser.java
  • embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java
  • langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java
  • langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java
  • langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java
  • langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java
  • langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java
  • langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java
  • langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java
Additional comments: 37
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsEqualTo.java (2)
  • 20-22: The constructor correctly validates the key and comparisonValue parameters to ensure they are not blank or null, respectively. This is crucial for avoiding runtime errors due to invalid inputs.
  • 33-52: The test method implements the logic for checking equality between the comparisonValue and the value retrieved from the Metadata object. It correctly handles type compatibility and uses NumberComparator.compareAsBigDecimals for comparing numeric values, which is a good practice for handling potential issues with floating-point comparisons.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotEqualTo.java (2)
  • 20-22: The constructor in IsNotEqualTo performs necessary validation on the key and comparisonValue, ensuring they are not blank or null. This validation is essential for maintaining data integrity and preventing runtime errors.
  • 33-52: The test method in IsNotEqualTo correctly implements the logic for inequality checks. It uses NumberComparator.compareAsBigDecimals for numeric comparisons, which is a robust approach for handling floating-point numbers. The method also correctly returns true when the Metadata object does not contain the specified key, which aligns with the expected behavior for an inequality check.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThan.java (2)
  • 20-22: The constructor in IsLessThan ensures that the key and comparisonValue are valid, using ensureNotBlank and ensureNotNull. This validation is crucial for preventing null pointer exceptions and ensuring that the filter operates on valid data.
  • 33-52: The test method in IsLessThan correctly implements the comparison logic, including type compatibility checks and numeric comparison using compareAsBigDecimals. This approach is effective for handling both exact and floating-point numeric comparisons, ensuring accurate results across different numeric types.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThan.java (2)
  • 20-22: The constructor in IsGreaterThan performs necessary input validation, ensuring that neither the key nor the comparisonValue are invalid. This is important for preventing errors during the filter's operation.
  • 33-52: The test method in IsGreaterThan correctly implements the greater-than comparison logic. It includes comprehensive type checks and uses compareAsBigDecimals for numeric comparisons, which is a robust approach for ensuring accurate comparisons across different numeric types.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThanOrEqualTo.java (2)
  • 20-22: The constructor in IsLessThanOrEqualTo ensures that the inputs are valid, using ensureNotBlank and ensureNotNull. This validation is essential for the filter's reliability and correctness.
  • 33-52: The test method in IsLessThanOrEqualTo implements the comparison logic effectively, including type compatibility checks and numeric comparison using compareAsBigDecimals. This method ensures accurate and consistent comparison results, which is crucial for the filter's functionality.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThanOrEqualTo.java (2)
  • 20-22: The constructor in IsGreaterThanOrEqualTo correctly validates the inputs, ensuring that the key and comparisonValue are not invalid. This validation is crucial for the filter's operation and prevents potential runtime errors.
  • 33-52: The test method in IsGreaterThanOrEqualTo effectively implements the comparison logic, including type compatibility checks and numeric comparison using compareAsBigDecimals. This approach ensures accurate comparisons across different numeric types, which is essential for the filter's functionality.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java (2)
  • 24-28: The constructor in IsIn performs necessary input validation and creates an unmodifiable set from the provided collection of comparison values. This ensures immutability and thread safety of the comparisonValues field. Additionally, it validates that none of the comparison values are null, which is important for preventing null pointer exceptions during the filter's operation.
  • 39-58: The test method in IsIn correctly implements the logic for checking if a value is within the specified set of values. It includes type compatibility checks and handles numeric comparisons using containsAsBigDecimals. This approach ensures accurate and efficient containment checks, which is crucial for the filter's functionality.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java (2)
  • 24-28: The constructor in IsNotIn ensures that the inputs are valid and creates an unmodifiable set from the provided collection of comparison values. This approach guarantees the immutability of the comparisonValues field and validates that none of the comparison values are null, which is essential for the filter's reliability.
  • 39-58: The test method in IsNotIn effectively implements the logic for checking if a value is not within the specified set of values. It includes type compatibility checks and handles numeric comparisons using containsAsBigDecimals. This method ensures accurate and efficient containment checks, which is crucial for the filter's functionality.
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/MetadataFilterBuilder.java (2)
  • 24-25: The constructor of MetadataFilterBuilder correctly validates the key parameter to ensure it is not blank. This validation is essential for preventing errors during the construction of filters.
  • 36-218: The methods in MetadataFilterBuilder provide a comprehensive and fluent API for constructing various types of filters, including equality, inequality, greater/less than, and containment checks. Each method correctly instantiates the appropriate filter class with validated inputs. This design promotes ease of use and flexibility in constructing complex filter expressions.
langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (12)
  • 38-48: The @BeforeEach method correctly sets up the necessary mocks and expectations for the tests. It's good practice to isolate test setup to ensure each test runs with a clean state.
  • 50-54: The @AfterEach method ensures that the EMBEDDING_MODEL mock is used as expected and that there are no unexpected interactions. This is a good practice for verifying mock behavior and ensuring test cleanliness.
  • 56-71: This test method should_retrieve correctly verifies the basic retrieval functionality using default configurations. It's well-structured and follows best practices for unit testing.
  • 79-92: The use of the builder pattern in should_retrieve_builder test method is a good practice, allowing for more readable and maintainable test setup. This test method correctly asserts the behavior of the retrieval process using the builder pattern.
  • 100-114: The test should_retrieve_with_custom_maxResults effectively demonstrates the functionality of customizing the maximum results. It's important to test such configurations to ensure the system behaves as expected under different settings.
  • 122-136: Similar to the previous comment, should_retrieve_with_custom_maxResults_builder confirms that the builder pattern supports custom maximum results. This consistency in testing both direct instantiation and builder pattern usage is commendable.
  • 140-158: Testing dynamic configuration with should_retrieve_with_custom_dynamicMaxResults_builder is crucial for ensuring that the system can adapt to runtime conditions. This test method correctly sets up and verifies dynamic max results functionality.
  • 161-181: The should_retrieve_with_custom_minScore_ctor test method correctly demonstrates the functionality of customizing the minimum score through the constructor. Testing various ways to configure the retriever is important for flexibility and usability.
  • 188-203: Using the builder pattern to customize the minimum score in should_retrieve_with_custom_minScore_builder is another example of thorough testing for different configurations. This ensures that the builder pattern is fully functional and supports all necessary options.
  • 206-225: Dynamic minimum score configuration tested in should_retrieve_with_custom_dynamicMinScore_builder is essential for scenarios where the score threshold might need to be adjusted based on the query. This test method effectively verifies this functionality.
  • 228-250: Testing the retrieval with a custom filter in should_retrieve_with_custom_filter is crucial for ensuring that the system can correctly apply metadata-based filtering. This test method properly sets up and verifies the filter functionality.
  • 253-275: The should_retrieve_with_custom_dynamicFilter test method extends the filtering capabilities by testing dynamic filter creation based on the query. This is an important feature for advanced use cases, and the test method correctly verifies its behavior.
langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (3)
  • 15-23: The addition of imports for Metadata, EmbeddingSearchResult, EmbeddingSearchRequest, and Filter aligns with the PR's objectives to enhance the EmbeddingStore API and introduce filtering capabilities. Ensure that these new dependencies are used effectively throughout the implementation and that there are no unused imports to maintain code cleanliness.
  • 37-41: The updated class documentation clearly communicates the enhanced capabilities of the MilvusEmbeddingStore, including support for managing Milvus instances, storing Metadata, and filtering by it. This documentation update is crucial for developers to understand the purpose and capabilities of the class. It's well-written and informative.
  • 47-47: The introduction of the METADATA_FIELD_NAME constant is a good practice for maintaining consistency and avoiding hard-coded strings throughout the code. This change supports the expanded Metadata API and the storage of metadata alongside embeddings. Ensure that this constant is used consistently wherever the metadata field name is required.
langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (4)
  • 24-24: The class is well-structured as an abstract integration test suite for EmbeddingStore implementations, ensuring that extending classes can leverage these tests to verify their filtering capabilities. This approach promotes code reuse and consistency across different EmbeddingStore implementations.
  • 26-73: The should_filter_by_metadata method correctly sets up test data, performs searches with and without filters, and asserts the expected results. The use of awaitUntilPersisted() is appropriate for handling asynchronous behavior or eventual consistency in the EmbeddingStore. This method effectively tests the filtering capabilities of the EmbeddingStore API.
  • 1135-1178: The should_filter_by_metadata_not method follows a similar pattern to should_filter_by_metadata, testing the negation of filters. It properly sets up test data, executes searches, and asserts the outcomes, including handling asynchronous behavior with awaitUntilPersisted(). This method ensures that the Filter API's negation logic works as expected.
  • 75-1132: The parameterized test data provided by the MethodSource methods should_filter_by_metadata and should_filter_by_metadata_not is comprehensive and well-organized. It covers a wide range of filter operations and scenarios, ensuring thorough testing of the Filter API's capabilities. This approach effectively validates the filtering logic across various data types and conditions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 52c74e1 and 5d262e2.
Files selected for processing (5)
  • langchain4j-core/src/main/java/dev/langchain4j/data/document/Document.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (8 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java (8 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (6 hunks)
  • langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files skipped from review as they are similar to previous changes (3)
  • langchain4j-core/src/main/java/dev/langchain4j/data/document/Document.java
  • langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java
Additional comments: 7
langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (4)
  • 31-47: The addition of new supported value types (int, long, float, double alongside their wrapper classes) is a significant improvement for the Metadata class, enhancing its flexibility and utility. This change aligns with the PR objectives to expand the Metadata API to support more value types. It's also good to see the use of both primitive and wrapper types to ensure compatibility with different use cases.
  • 105-118: The getString method is a welcome addition, providing a type-safe way to retrieve String values from the metadata. The use of @Experimental annotation is appropriate here, indicating that the API is subject to change. However, it's important to ensure that the documentation clearly explains the method's behavior, especially regarding its return of null if the key is not present or if the value is not of the expected type. This clarity will help developers understand the method's contract and handle its usage correctly.
  • 136-151: The getInteger method introduces flexibility in handling numeric metadata values, accommodating both direct Integer retrievals and parsing from String. This method, along with similar ones for Long, Float, and Double, addresses the potential issues with type loss during serialization/deserialization processes, as mentioned in the comments. It's a thoughtful design choice that enhances the Metadata class's usability across different storage implementations. However, ensure that the potential NumberFormatException from parsing strings is either documented or handled gracefully to avoid unexpected crashes.
  • 298-356: The new put methods for adding key-value pairs to the metadata are well-designed, providing a clear and type-safe way to work with different value types. Marking these methods as @Experimental allows for future adjustments based on feedback and usage patterns. It's good to see consistent validation across these methods, ensuring that keys and values meet the expected criteria before being added to the metadata. This approach enhances the robustness and reliability of the Metadata class.
langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3)
  • 23-23: The class InMemoryEmbeddingStoreTest now extends EmbeddingStoreWithFilteringIT instead of EmbeddingStoreIT. This change reflects an update in the inheritance hierarchy to accommodate the new filtering capabilities introduced in the PR. Ensure that all relevant test cases from EmbeddingStoreIT are still valid or appropriately adapted in EmbeddingStoreWithFilteringIT to maintain comprehensive test coverage.
  • 69-69: The use of .hasSameHashCodeAs in the assertion block is a minor formatting adjustment that enhances the clarity of the test's intention. This change ensures that not only the contents are equal but also that the hash codes of the deserialized and original entries match, which is a good practice for ensuring the integrity of serialization and deserialization processes.
  • 81-105: The addition of the test_backwards_compatibility method is a significant and valuable enhancement. It tests the JSON deserialization and matching functionality to ensure that the system remains compatible with data produced by previous versions. This is crucial for maintaining data integrity and system reliability across version updates. The test method is well-structured and covers essential aspects of backwards compatibility, including deserialization from a predefined JSON string and verifying the expected behavior of the findRelevant method.

However, consider adding more detailed comments within the test method to explain the purpose of each block of code, especially the JSON string's structure and the significance of the specific version it represents. This will improve the maintainability and understandability of the test case for future contributors.

Verification successful

The verification process confirms that the test_backwards_compatibility method is the only test specifically designed for testing backwards compatibility within the Java files of the project. This underscores the importance of this test in ensuring the system's compatibility with data produced by previous versions, as highlighted in the review comment.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify if there are more tests related to backwards compatibility across different versions.
rg "test_backwards_compatibility" --type java

Length of output: 186

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 5d262e2 and c08e36f.
Files selected for processing (1)
  • langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files skipped from review as they are similar to previous changes (1)
  • langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between c08e36f and 245b60d.
Files ignored due to path filters (6)
  • langchain4j-cassandra/pom.xml is excluded by: !**/*.xml
  • langchain4j-infinispan/pom.xml is excluded by: !**/*.xml
  • langchain4j-neo4j/pom.xml is excluded by: !**/*.xml
  • langchain4j-opensearch/pom.xml is excluded by: !**/*.xml
  • langchain4j-parent/pom.xml is excluded by: !**/*.xml
  • pom.xml is excluded by: !**/*.xml
Files selected for processing (3)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java (1 hunks)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java (1 hunks)
  • langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files skipped from review as they are similar to previous changes (3)
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java
  • langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java
  • langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java

@langchain4j langchain4j merged commit 1acb7a6 into main Mar 8, 2024
6 checks passed
@langchain4j langchain4j deleted the metadata_filtering branch March 8, 2024 16:06
@ansonma
Copy link

ansonma commented Mar 26, 2024

@langchain4j , Thanks a lot for this new feature, it is exactly what we need. May I know if I can use the metadata filter function for AWS OpenSearch embedding store? If not now, may I know when it will be available? Thanks.

@langchain4j
Copy link
Owner Author

@ansonma I do not plan to add this feature to all embedding store integrations as I do not have enough time. I hope that someone from community could contribute this, considering that foundation is there, examples (elasticsearch, milvus) are there, it should be pretty easy.

@YahiaDev
Copy link

YahiaDev commented May 17, 2024

Hello,
thank you for the new feature. Can i use the metadata filter with PgVectorEmbeddingStore ?
If not it will be available for futur release ? Did you have a work around to filter metadata with PgVectorEmbeddingStore ? Thanks

@langchain4j
Copy link
Owner Author

langchain4j commented May 17, 2024

Hi, Filter support for PgVectorEmbeddingStore will be released next week.

@ansonma
Copy link

ansonma commented Dec 16, 2024

Hello, my company is investigating Langchain4J as our long term AI framework for our Java stack apps, can someone help to add the OpenSearch metadata filtering function? This is the only piece missing for our requirements for now. Thanks in advance.

@langchain4j
Copy link
Owner Author

@andyflury would you like to contribute this? It should be pretty straightforward. There is an implementation for Elasticsearch already, I assume the one for OpenSearch will be similar/same? Thanks a lot in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants