-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EmbeddingStore (Metadata) Filter API #610
Conversation
This looks great @langchain4j Thanks for the quick turnaround! I just tested in with my application with a simple Regarding your questions. I cannot comment on all of them, but here are a few thoughts
I would say types like:
I would strict comparisons.
Personally I would prefer
Personally I would prefer
that sounds reasonable to me! |
@andyflury thanks for the feedback! Another random thought: instead of having DSL just for metadata filtering, we could have a DSL for the whole search/retrieval, something like this: SearchRequest.forQueryEmbedding(queryEmbedding)
.where(metadata -> metadata.field("author").eq("Stephen King").and(metadata.field("year").gt(2020)))
.minScore(0.77)
.maxResults(5)
.build(); |
@langchain4j great idea, that would make things even more convenient. I'm all for it! |
@langchain4j would it potentially make sense to give the Let's say we have two types of information in the same vector database index:
Then you could construct one |
@andyflury that totally makes sense! However, it would probably be easier to define it on the Also I plan to add a self-query retriever similar to the one in python version, where LLM will be able to construct metadata filter dynamically, potentially including multiple fields. This will help to narrow down the search. Another thing that will probably be useful is to automatically construct filter by user (If @memoryid) is present. |
@langchain4j Yes, you are right doing it inside A self-query retriever sound extremely interesting. This could potentially solve another challenge we are facing currently. We have started indexing all out customer support tickets. This works well when asking questions like "what is the best solution to solve problem xyz". But it does not work well for questions like "please summarize all tickets from the last month for customer xyz". I assume that with the self-query retriever, the system would automatically be able to set filters for so far I was using Tools for this, but having a self-query retriever (in combination with the Would you also consider having something like an |
Do you mean a |
Yes, I think that could be useful. I'm still trying to figure out where to use RAG (i.e. Would you agree to this? |
@andyflury yes, exactly. To summarise:
However, |
Warning Rate Limit Exceeded@langchain4j has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 55 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. WalkthroughThese updates introduce new filters for checking if a specified key's value exists in a collection and enhance the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 40
Configuration used: CodeRabbit UI
Files ignored due to path filters (22)
embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/pom.xml
is excluded by:!**/*.xml
langchain4j-azure-ai-search/pom.xml
is excluded by:!**/*.xml
langchain4j-bom/pom.xml
is excluded by:!**/*.xml
langchain4j-cassandra/pom.xml
is excluded by:!**/*.xml
langchain4j-chroma/pom.xml
is excluded by:!**/*.xml
langchain4j-core/pom.xml
is excluded by:!**/*.xml
langchain4j-elasticsearch/pom.xml
is excluded by:!**/*.xml
langchain4j-infinispan/pom.xml
is excluded by:!**/*.xml
langchain4j-milvus/pom.xml
is excluded by:!**/*.xml
langchain4j-mongodb-atlas/pom.xml
is excluded by:!**/*.xml
langchain4j-neo4j/pom.xml
is excluded by:!**/*.xml
langchain4j-opensearch/pom.xml
is excluded by:!**/*.xml
langchain4j-parent/pom.xml
is excluded by:!**/*.xml
langchain4j-pgvector/pom.xml
is excluded by:!**/*.xml
langchain4j-pinecone/pom.xml
is excluded by:!**/*.xml
langchain4j-qdrant/pom.xml
is excluded by:!**/*.xml
langchain4j-redis/pom.xml
is excluded by:!**/*.xml
langchain4j-vearch/pom.xml
is excluded by:!**/*.xml
langchain4j-vespa/pom.xml
is excluded by:!**/*.xml
langchain4j-weaviate/pom.xml
is excluded by:!**/*.xml
langchain4j/pom.xml
is excluded by:!**/*.xml
pom.xml
is excluded by:!**/*.xml
Files selected for processing (67)
- .github/workflows/release.yaml (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/ColumnDefinition.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilder.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/TableDefinition.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParser.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderTest.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/Experimental.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/data/document/Document.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (8 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java (8 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/RetrievalAugmentor.java (2 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/content/aggregator/ContentAggregator.java (2 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/content/injector/ContentInjector.java (2 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/LanguageModelQueryRouter.java (2 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/QueryRouter.java (2 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/CompressingQueryTransformer.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/ExpandingQueryTransformer.java (2 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/QueryTransformer.java (2 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchResult.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (6 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/FilterParser.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/Equal.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThan.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThanOrEqual.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/In.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThan.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThanOrEqual.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NotEqual.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NotIn.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NumberComparator.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/TypeChecker.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/And.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Not.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Or.java (1 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/data/document/MetadataTest.java (2 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (1 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreIT.java (2 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (1 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithoutMetadataIT.java (10 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java (1 hunks)
- langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/Document.java (1 hunks)
- langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java (6 hunks)
- langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java (1 hunks)
- langchain4j-elasticsearch/src/test/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStoreCloudIT.java (1 hunks)
- langchain4j-elasticsearch/src/test/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStoreIT.java (2 hunks)
- langchain4j-infinispan/src/test/java/dev/langchain4j/store/embedding/infinispan/InfinispanEmbeddingStoreIT.java (1 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionOperationsExecutor.java (4 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionRequestBuilder.java (3 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Generator.java (2 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Mapper.java (6 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (5 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java (1 hunks)
- langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreCloudIT.java (1 hunks)
- langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreIT.java (3 hunks)
- langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStore.java (9 hunks)
- langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisSchema.java (4 hunks)
- langchain4j-redis/src/test/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStoreIT.java (2 hunks)
- langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/GsonInMemoryEmbeddingStoreJsonCodec.java (1 hunks)
- langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java (3 hunks)
- langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java (3 hunks)
- langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreSerializedTest.java (1 hunks)
- langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files not summarized due to errors (2)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java: Error: Message exceeds token limit
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java: Error: Message exceeds token limit
Files not reviewed due to errors (2)
- SqlFilterParser.java (no review received)
- SqlFilterParserTest.java (no review received)
Files skipped from review due to trivial changes (3)
- langchain4j-core/src/main/java/dev/langchain4j/Experimental.java
- langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java
- langchain4j-infinispan/src/test/java/dev/langchain4j/store/embedding/infinispan/InfinispanEmbeddingStoreIT.java
Additional comments: 145
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/FilterParser.java (1)
- 1-21: The introduction of the
FilterParser
interface is a crucial part of the newFilter
API, enabling the parsing of filter expressions intoFilter
objects. The documentation clearly explains the purpose and the current implementation (SqlFilterParser
). It's important to ensure that the documentation stays up-to-date as more implementations are added. The use of the@Experimental
annotation is appropriate here, signaling that the API is subject to change.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Not.java (1)
- 1-27: The
Not
class is a well-implemented logical operation for theFilter
API, allowing for negating filter expressions. The use ofensureNotNull
in the constructor ensures that a null expression is not passed, which is crucial for avoiding null pointer exceptions during runtime. Thetest
method correctly negates the result of the expression's test, aligning with the expected behavior of a logical NOT operation.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchResult.java (1)
- 1-26: The
EmbeddingSearchResult
class is a straightforward representation of search results, encapsulating a list ofEmbeddingMatch
objects. The use ofensureNotNull
in the constructor is good practice, preventing null values for thematches
list and thus avoiding potential null pointer exceptions. Marking the class and its methods with the@Experimental
annotation is appropriate, indicating that the API may change.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/Or.java (1)
- 1-33: The
Or
class implements the logical OR operation for theFilter
API, allowing for the combination of two filter expressions. The use ofensureNotNull
for bothleft
andright
expressions in the constructor is essential for maintaining robustness. Thetest
method's implementation, which returns true if either expression evaluates to true, correctly embodies the logical OR operation. This class is a valuable addition to theFilter
API, enabling more complex filter expressions.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/logical/And.java (1)
- 1-33: The
And
class is another key component of the logical operations for theFilter
API, implementing the logical AND operation. The constructor's use ofensureNotNull
for both operands ensures that null expressions are not allowed, which is crucial for the operation's integrity. Thetest
method's logic, requiring both expressions to evaluate to true, accurately represents the AND operation. This class enhances theFilter
API's capability to represent complex filter expressions.langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Generator.java (1)
- 29-35: The addition of the
generateEmptyJsons
method is a practical utility for creating lists of empty JSON objects, which could be useful in various contexts within the Milvus embedding store integration. The use ofnew JSONObject(new HashMap<>())
for each list entry ensures that each JSON object is indeed empty and mutable. This method could be particularly useful for initializing data structures or for tests where empty JSON objects are required.langchain4j-core/src/main/java/dev/langchain4j/rag/RetrievalAugmentor.java (1)
- 16-16: Updating the annotation to
@Experimental
on theRetrievalAugmentor
interface signals that this part of the API is subject to change and may not be fully stable yet. This is an important note for developers using this interface, as it sets the expectation that future versions of LangChain4j might introduce breaking changes or enhancements to this functionality. It's crucial to document the rationale behind marking specific APIs as experimental to guide developers accordingly.langchain4j-core/src/main/java/dev/langchain4j/rag/content/injector/ContentInjector.java (1)
- 17-17: The change from
MightChangeInTheFuture
toExperimental
annotation on theContentInjector
interface correctly signals the current stability and maturity level of this feature. Developers should be aware that experimental features might undergo significant changes and should use them with caution in production environments.embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/ColumnDefinition.java (1)
- 9-38: The
ColumnDefinition
class is well-designed, with appropriate use of validation in constructors and getters for each field. The decision to use a string for thetype
field, as discussed in previous comments, allows for flexibility and future-proofing, which is suitable for the intended use case. The@Experimental
annotation is appropriately applied, indicating the feature's current development stage.langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/QueryRouter.java (1)
- 25-25: The update from
MightChangeInTheFuture
toExperimental
annotation on theQueryRouter
interface is appropriate, signaling the current development stage of this feature. It's important for developers to note that experimental features are subject to change and should be used cautiously in production.langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/QueryTransformer.java (1)
- 27-27: The change from
MightChangeInTheFuture
toExperimental
annotation on theQueryTransformer
interface correctly indicates the feature's current development stage. Developers should be mindful of the experimental status of features, as they may undergo significant changes.langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreSerializedTest.java (1)
- 13-33: The
InMemoryEmbeddingStoreSerializedTest
class is well-structured for testing the serialization and deserialization functionality ofInMemoryEmbeddingStore
. It cleverly uses theawaitUntilPersisted
method to simulate persistence. The actual testing logic relies on inherited tests fromEmbeddingStoreWithFilteringIT
, which is an efficient use of existing test infrastructure.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NumberComparator.java (1)
- 6-32: The
NumberComparator
class provides robust utility methods for comparing numbers by converting them toBigDecimal
, which is an effective way to handle floating-point inaccuracies. The clear error reporting for unsupported types in thetoBigDecimal
method enhances type safety and usability.langchain4j-core/src/main/java/dev/langchain4j/rag/content/aggregator/ContentAggregator.java (1)
- 26-26: The update from
MightChangeInTheFuture
toExperimental
annotation on theContentAggregator
interface is appropriate, signaling the current development stage of this feature. Developers should be mindful of the experimental status of features, as they may undergo significant changes.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/Equal.java (1)
- 15-52: The
Equal
class is well-implemented, providing robust functionality for equality comparisons within the metadata filtering context. It correctly handles both numeric and non-numeric comparisons, with appropriate validation forkey
andcomparisonValue
. The use ofNumberComparator
for numeric types andensureTypesAreCompatible
for type safety before comparison is commendable.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/NotEqual.java (1)
- 20-22: The constructor correctly validates the
key
andcomparisonValue
parameters usingensureNotBlank
andensureNotNull
. This ensures that neither the key nor the comparison value is null or blank, which is crucial for the integrity of the filter operation.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThan.java (1)
- 20-22: The constructor validation for
key
andcomparisonValue
is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThan.java (1)
- 20-22: The constructor validation for
key
andcomparisonValue
is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/LessThanOrEqual.java (1)
- 20-22: The constructor validation for
key
andcomparisonValue
is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/GreaterThanOrEqual.java (1)
- 20-22: The constructor validation for
key
andcomparisonValue
is correctly implemented, ensuring that essential parameters are neither null nor blank. This validation is crucial for the integrity of the filter operation.langchain4j-redis/src/test/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStoreIT.java (1)
- 46-46: The change to use
createMetadata().toMap().keySet()
for specifying metadata keys in theRedisEmbeddingStoreIT
class is a good improvement. It ensures that the test setup dynamically reflects the actual metadata keys used, enhancing the test's relevance and accuracy..github/workflows/release.yaml (1)
- 43-44: The addition of
ELASTICSEARCH_CLOUD_API_KEY
andELASTICSEARCH_CLOUD_URL
as environment variables in the release workflow is a good practice. It ensures that these sensitive values are securely managed and accessible to the steps that require them. Ensure that these secrets are properly configured in the GitHub repository settings.embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/TableDefinition.java (1)
- 22-25: The constructor correctly validates the
name
parameter and ensures that thecolumns
collection is neither null nor empty. This validation is crucial for the integrity of the table definition. Thedescription
parameter is not validated for non-null values, which is acceptable if a description is optional.langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreIT.java (3)
- 9-9: The import change from
EmbeddingStoreWithoutMetadataIT
toEmbeddingStoreWithFilteringIT
aligns with the PR's objective to introduce filtering capabilities. This change suggests that the test now extends a class that likely includes filtering-related tests, which is a positive step towards testing the new functionality.- 38-41: Adding an
@AfterEach
method to drop the collection after each test is a good practice. It ensures that each test runs in a clean state, preventing potential side effects from affecting other tests. This is especially important in integration tests involving external systems like Milvus.- 24-24: The usage of a constant
COLLECTION_NAME
for the collection name instead of hardcoding it in multiple places enhances maintainability and readability. This change makes it easier to update the collection name in the future if needed.langchain4j-milvus/src/test/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStoreCloudIT.java (4)
- 9-9: The import change from
EmbeddingStoreWithoutMetadataIT
toEmbeddingStoreWithFilteringIT
is consistent with the PR's goal to enhance filtering capabilities in tests. This change suggests that the test now extends a class likely including filtering-related tests, which is beneficial for testing the new functionality.- 32-35: Adding an
@AfterEach
method to drop the collection after each test ensures that each test runs in a clean state. This practice is crucial for integration tests involving external systems like Milvus, especially when testing against a cloud instance where leftover data might incur costs or affect subsequent tests.- 20-20: Using a constant
COLLECTION_NAME
for the collection name improves maintainability and readability by centralizing the collection name definition. This approach facilitates future updates to the collection name if necessary.- 23-28: The configuration for connecting to a Milvus cloud instance, including the use of an environment variable for the API token (
MILVUS_API_KEY
), is a secure practice for handling sensitive information. However, it's recommended to also externalize the URI to an environment variable or configuration file to avoid hardcoding sensitive or environment-specific information in the source code.Consider externalizing the URI to an environment variable or configuration file for better security and flexibility.
langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisSchema.java (3)
- 37-37: Changing the type of
metadataKeys
fromList<String>
toCollection<String>
increases flexibility in the types of collections that can be used. This change allows for both ordered (List) and unordered (Set) collections, which can be beneficial depending on the use case. However, it's important to ensure that the rest of the codebase that interacts withmetadataKeys
is compatible with this change.- 51-51: Introducing a constructor that only takes
dimension
as a parameter simplifies the creation ofRedisSchema
instances when only the dimension is known or relevant. This can enhance usability in scenarios where other parameters are not needed or have sensible defaults.- 74-91: The method name changes to follow Java bean naming conventions (
indexName()
,prefix()
,vectorFieldName()
,scalarFieldName()
,metadataKeys()
) improve readability and align with common Java practices. This change makes the code more intuitive and consistent with standard Java naming conventions.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java (1)
- 21-70: The introduction of the
EmbeddingSearchRequest
class is a significant addition, aligning with the PR's objectives to enhance the embedding store's functionality. This class encapsulates all necessary parameters for an embedding search request, including a newFilter
for metadata filtering. The use of validation methods (ensureNotNull
,ensureGreaterThanZero
,ensureBetween
) in the constructor ensures that the provided values are valid, enhancing the robustness of the code. The use of default values for optional parameters (maxResults
,minScore
) improves usability by providing sensible defaults.langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionRequestBuilder.java (2)
- 36-40: Introducing the
buildDropCollectionRequest
method to create a drop collection request for Milvus is a useful addition. This method simplifies the process of dropping a collection, encapsulating the necessary parameters and providing a clear, reusable way to perform this operation.- 57-74: Modifying the
buildSearchRequest
method to handle aFilter
parameter is a crucial enhancement that aligns with the PR's objectives to introduce filtering capabilities. This change allows for the inclusion of filter expressions in search requests, significantly enhancing the flexibility and power of searches. It's important to ensure that theMilvusMetadataFilterMapper.map(filter)
method correctly translates theFilter
object into a Milvus-compatible expression.langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/ExpandingQueryTransformer.java (2)
- 49-51: Changing the access modifiers of
chatLanguageModel
,promptTemplate
, andn
from private to protected allows subclasses to directly access these fields. This change can facilitate extension and customization of theExpandingQueryTransformer
class. However, it's important to ensure that this does not unintentionally expose internal implementation details that should be encapsulated.- 76-76: Simplifying the
transform
method to directly return the parsed response enhances readability and reduces unnecessary complexity. This change makes the method's purpose and behavior more straightforward.embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderTest.java (2)
- 30-49: The tests for parsing valid SQL queries are comprehensive, covering various common scenarios including trailing semicolons, leading and trailing spaces, and newline characters. These tests ensure that the
LanguageModelSqlFilterBuilder
can correctly parse and handle SQL queries in different formats. It's important that thesqlFilterParser.parse
method is called with the trimmed SQL query to avoid issues with leading or trailing whitespace.- 70-90: The tests for extracting valid SQL from responses with additional text or formatting are thorough, covering scenarios with code blocks, additional text before and after the SQL query, and various combinations of formatting characters. These tests validate the
LanguageModelSqlFilterBuilder
's ability to extract the core SQL query from a response that may include additional, non-relevant text or formatting. This capability is crucial for handling responses from chat models that might include explanatory text or formatting around the SQL query.langchain4j-core/src/main/java/dev/langchain4j/rag/query/transformer/CompressingQueryTransformer.java (1)
- 51-52: Changing the access modifiers of
promptTemplate
andchatLanguageModel
from private to protected inCompressingQueryTransformer
allows for easier extension and customization by subclasses. This change can be beneficial for creating specialized transformers that require direct access to these fields. However, it's crucial to ensure that this change does not expose more of the class's internals than necessary.langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (2)
- 20-20: The change to extend
EmbeddingStoreWithFilteringIT
aligns with the PR's objective to introduce filtering capabilities to theEmbeddingStore
. Ensure thatEmbeddingStoreWithFilteringIT
provides all necessary test methods and setup for filtering functionality. If additional tests specific to theInMemoryEmbeddingStore
are required, consider implementing them.- 66-66: The minor formatting adjustment with
hasSameHashCodeAs
is a good practice for consistency in test assertions. This change is approved as it does not impact the functionality or logic of the tests.langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java (2)
- 17-43: The
map
method effectively handles differentFilter
types by delegating to specific mapping methods. This approach maintains readability and modularity. However, consider adding support for custom or futureFilter
types through a more extensible mechanism, such as a registry or factory pattern, to avoid modifying this method directly for each new type.- 41-41: When encountering an unsupported
Filter
type, the method throws anUnsupportedOperationException
. This is appropriate for signaling to the caller that the operation cannot be completed as requested. However, ensure that upstream code properly handles this exception to avoid unexpected crashes or behavior.langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreIT.java (1)
- 42-117: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [21-81]
The integration test
should_add_embedding_with_segment_with_metadata
correctly tests the addition of embeddings with various metadata types and the retrieval of these embeddings using the new search API. It's comprehensive in testing different metadata value types. However, consider adding negative test cases or edge cases, such as adding embeddings with invalid metadata values or types, to ensure robust error handling.langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/CollectionOperationsExecutor.java (2)
- 60-63: The addition of a new field type for storing metadata as JSON is a significant enhancement, allowing for more flexible and rich metadata associated with embeddings. Ensure that the JSON data structure and usage are well-documented, and consider adding validation or sanitization to prevent issues related to malformed JSON data.
- 75-79: The
dropCollection
method is correctly implemented for dropping a collection in Milvus. It's good practice to provide such utility methods to manage the lifecycle of collections, especially in testing or dynamic environments. Ensure that this method is used cautiously in production environments to prevent accidental data loss.langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/Mapper.java (3)
- 42-45: The addition of
toMetadataJsons
method enhances the flexibility in handling metadata associated with text segments. This method correctly converts metadata to JSON objects, which can be useful for various operations. Ensure that the metadata conversion handles all expected data types correctly and consider adding error handling for unexpected types.- 89-99: The method
toTextSegment
correctly constructs aTextSegment
object from aRowRecord
, including handling of metadata. This method improves the modularity and readability of the code by encapsulating the conversion logic. Ensure that null checks and error handling are sufficient to prevent issues with incomplete or malformed data.- 104-113: The
toMetadata
method converts aJSONObject
to aMetadata
object, handling the conversion ofBigDecimal
todouble
. This is a practical approach to dealing with numeric types in JSON. However, be aware of potential precision loss when converting fromBigDecimal
todouble
and document this behavior if necessary.embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java (4)
- 34-48: The test
should_filter_by_genre
correctly verifies that theLanguageModelSqlFilterBuilder
can interpret a natural language query and produce the appropriate filter for a specific genre. This test demonstrates the builder's basic functionality. Consider adding more complex queries to test the builder's ability to handle various natural language structures and intents.- 50-67: The test
should_filter_by_genre_and_year
extends the testing to include both genre and year filters based on a natural language query. This test is well-constructed and demonstrates the builder's capability to handle composite filters. Ensure that edge cases, such as ambiguous or conflicting information in the query, are also tested.- 69-82: The test
should_filter_by_year_range
effectively tests the builder's ability to interpret a query specifying a range of years. This test showcases the builder's flexibility in understanding and translating natural language queries into precise filter expressions. Consider testing additional range expressions and edge cases, such as open-ended ranges or invalid ranges.- 85-98: The test
should_filter_by_year_using_arithmetics
demonstrates the builder's capability to interpret arithmetic expressions within natural language queries, such as calculating the previous year. This test is a good example of the builder's advanced functionality. Ensure that more complex arithmetic expressions and their potential impact on filter accuracy are also tested.langchain4j-core/src/main/java/dev/langchain4j/rag/query/router/LanguageModelQueryRouter.java (2)
- 56-60: The change from private to protected access modifiers for the fields
chatLanguageModel
,promptTemplate
,options
,idToRetriever
, andfallbackStrategy
allows subclasses to directly access these fields. This is a positive change for extensibility but consider encapsulating the fields with protected getter methods to maintain control over the access to these fields. This approach can help in maintaining the integrity of the data and provide a single point of modification in the future if needed.- 108-108: Changing the visibility of the
fallback
method from private to protected enables subclasses to override this method, which is a good practice for enhancing extensibility and allowing for customized behavior in subclasses. Ensure that the documentation is updated to guide developers on how to properly override this method and what the expected behavior should be.langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java (1)
- 18-151: The implementation of
ElasticsearchMetadataFilterMapper
provides a comprehensive mapping from theFilter
API to Elasticsearch queries. This is crucial for integrating the new filtering capabilities with Elasticsearch. A few points to consider:
- Ensure that all comparison values are properly sanitized and converted to the appropriate Elasticsearch data types to prevent any potential query injection vulnerabilities.
- The use of
metadata.
prefix in field names (e.g., lines 64, 71, 78, 85) assumes a specific document structure. Make sure this aligns with the expected structure of documents stored in Elasticsearch and document this assumption for future reference.- The method
formatKey
(lines 137-151) smartly handles the distinction between keyword and non-keyword fields. This is important for correct query behavior, especially for string fields.Overall, this mapper is a key component for enabling advanced filtering capabilities in Elasticsearch-backed embedding stores. Just ensure that the assumptions made (e.g., document structure, field naming conventions) are well-documented and validated against the actual Elasticsearch schema used in the application.
langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java (1)
- 96-122: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [99-129]
The implementation of the
search
method inInMemoryEmbeddingStore
correctly utilizes theEmbeddingSearchRequest
to perform a search with filtering and scoring. A few points to consider:
- The filtering logic (lines 108-113) is currently limited to instances of
TextSegment
. This approach is mentioned in the previous comments and discussions. However, consider future-proofing this by abstracting the metadata retrieval into a method or interface that can be implemented by different types of embedded objects. This would make the filtering capability more flexible and applicable to a wider range of use cases.- The scoring and ranking logic (lines 115-123) is straightforward and effective for an in-memory store. However, for larger datasets, consider optimizing this process to improve performance, such as pre-filtering entries that cannot meet the
minScore
criterion before calculating the cosine similarity.- The conversion of the priority queue to a list and then reversing the list (lines 126-129) is necessary to return the results in descending order of scores. This is correct but consider adding a comment to explain this step for future maintainers.
Overall, the
search
method is well-implemented and addresses the requirements of the newFilter
API. Just ensure that the filtering capability is as flexible and extensible as possible to accommodate future enhancements.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (2)
- 58-74: The introduction of the
search
method in theEmbeddingStore
interface, marked as@Experimental
, is a significant enhancement that aligns with the PR's objectives to provide advanced filtering and searching capabilities. This method provides a more comprehensive and flexible approach to searching embeddings by encapsulating all search criteria within theEmbeddingSearchRequest
. The default implementation that leverages the deprecatedfindRelevant
method is a good transitional solution. However, it's important to encourage implementers of theEmbeddingStore
interface to provide their own optimized implementations of thesearch
method that fully utilize the capabilities of their underlying storage systems.- 99-114: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [86-111]
The deprecation of the
findRelevant
methods in favor of the newsearch
method is a clear indication of the direction for future development and usage of theEmbeddingStore
interface. While the default implementations provided ensure backward compatibility, it's crucial to document the expected transition path for users of theEmbeddingStore
interface. This includes guidance on how to switch to the newsearch
method and the benefits of doing so, such as enhanced filtering capabilities and more precise control over search parameters. Additionally, consider providing examples or best practices for implementing thesearch
method in customEmbeddingStore
implementations.langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java (7)
- 9-12: The addition of new imports indicates that the class now utilizes
EmbeddingSearchRequest
,EmbeddingSearchResult
,EmbeddingStore
, andFilter
classes/interfaces. This change aligns with the PR's objective to enhance theEmbeddingStoreContentRetriever
with dynamic filtering capabilities. Ensure that these new dependencies are properly managed in terms of versioning and compatibility within the project.- 16-16: The import of
java.util.function.Function
is crucial for the implementation of dynamic parameters (maxResults
,minScore
, andfilter
). This approach leverages Java's functional programming capabilities to provide flexibility in configuring the retriever based on the query context. It's a good use of Java's standard library to achieve the PR's objectives.- 53-55: The introduction of
DEFAULT_MAX_RESULTS
,DEFAULT_MIN_SCORE
, andDEFAULT_METADATA_FILTER
as static final functions provides sensible defaults for the dynamic parameters. This is a good design choice, ensuring that the retriever has a baseline behavior even if no dynamic configurations are provided. It also simplifies the constructors' logic by providing a fallback option.- 60-62: The use of
Function<Query, Integer>
,Function<Query, Double>
, andFunction<Query, Filter>
formaxResultsProvider
,minScoreProvider
, andfilterProvider
respectively, is an excellent way to achieve dynamic parameter configuration based on the query. This design allows for a high degree of flexibility and customization in content retrieval, aligning with the PR's objectives.- 100-111: The builder pattern used here, along with the private constructor that accepts functions for dynamic parameter configuration, is a clean and effective way to instantiate
EmbeddingStoreContentRetriever
objects with customizable behavior. This approach enhances the class's usability and flexibility. However, ensure that the@Builder
annotation is correctly processed and that all necessary builder methods are generated as expected, especially for setting dynamic parameters.- 115-134: The builder methods for setting
maxResults
,minScore
, andmetadataFilter
dynamically are well-implemented. They provide a user-friendly way to configure these parameters while ensuring input validation (e.g., ensuringmaxResults
is greater than zero andminScore
is within a valid range). This contributes to the robustness and usability of the API. However, consider adding null checks or documentation to clarify the behavior when null values are passed to these methods.- 140-151: The
retrieve
method's implementation effectively utilizes the dynamic parameters to configure theEmbeddingSearchRequest
based on the providedQuery
. This is a key part of achieving the PR's objectives, allowing for flexible and context-aware content retrieval. The method's logic is clear and concise, leveraging Java 8 streams for processing the search results. Ensure that theEmbeddingModel
andEmbeddingStore
interactions are thoroughly tested, especially with various dynamic configurations.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java (2)
- 43-43: The use of
@Experimental
annotation is appropriate here, indicating that theFilter
interface is a new feature and may be subject to changes. However, it's important to ensure that there is documentation or communication around what "experimental" means in this context to manage user expectations.- 78-80: The
MetadataKey
constructor usesensureNotBlank
to validate thekey
parameter. This is a good practice for ensuring that keys are not null or empty. However, it's also important to document the expected format or constraints of the key for users.langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (11)
- 38-54: The use of
@BeforeEach
and@AfterEach
annotations for setting up and tearing down test environments is a good practice. It ensures that each test runs in a clean state, which is crucial for reliable test results.- 56-71: The test
should_retrieve
correctly verifies the basic retrieval functionality. It's important to ensure that all possible edge cases are covered, such as empty results or errors during the search process.// Consider adding tests for edge cases, such as empty results or errors during the search process.
- 73-92: The test
should_retrieve_builder
demonstrates the use of the builder pattern for creating anEmbeddingStoreContentRetriever
. This pattern enhances readability and maintainability. Ensure that the builder properly initializes all required fields.- 94-114: The test
should_retrieve_with_custom_maxResults
verifies the functionality of custom maximum results. It's crucial to test boundary conditions, such as setting the maximum results to zero or a negative number, to ensure robust error handling.// Test boundary conditions for custom maximum results, such as zero or negative numbers.
- 116-136: The test
should_retrieve_with_custom_maxResults_builder
checks the custom maximum results functionality using the builder pattern. Similar to the previous comment, testing boundary conditions for maximum results is important for comprehensive coverage.// Ensure comprehensive testing of boundary conditions for custom maximum results using the builder pattern.
- 138-158: The test
should_retrieve_with_custom_dynamicMaxResults_builder
introduces dynamic maximum results based on the query. This feature adds flexibility but also complexity. Ensure that the dynamic logic is thoroughly tested, including with null or invalid queries.// Thoroughly test the dynamic maximum results logic, including scenarios with null or invalid queries.
- 160-181: The test
should_retrieve_with_custom_minScore_ctor
verifies custom minimum score functionality. Testing for edge cases, such as extremely high or low scores, is essential to ensure that the system behaves as expected under all conditions.// Test edge cases for custom minimum scores, such as extremely high or low values.
- 183-203: The test
should_retrieve_with_custom_minScore_builder
checks the custom minimum score functionality using the builder pattern. As with maximum results, it's important to test boundary conditions for minimum scores to ensure robustness.// Test boundary conditions for custom minimum scores using the builder pattern.
- 205-225: The test
should_retrieve_with_custom_dynamicMinScore_builder
explores dynamic minimum score functionality. This adds a layer of complexity, so testing with various dynamic logic scenarios, including edge cases, is crucial for reliability.// Test various scenarios and edge cases for dynamic minimum score functionality to ensure reliability.
- 227-250: The test
should_retrieve_with_custom_metadataFilter
verifies the functionality of using a metadata filter during retrieval. It's important to test with various filter types and combinations to ensure that filtering works as expected in all cases.// Test with various filter types and combinations to ensure comprehensive coverage of metadata filtering functionality.
- 252-275: The test
should_retrieve_with_custom_dynamicMetadataFilter
introduces dynamic metadata filtering based on the query. This feature increases flexibility but also adds complexity. Thorough testing, including with complex filter expressions, is essential for ensuring correct behavior.// Thoroughly test dynamic metadata filtering, including with complex filter expressions, to ensure correct behavior.
langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (2)
- 37-41: The updated class documentation clearly explains the new functionalities related to managing Milvus instances, storing
Metadata
, and filtering. It's important to ensure that the documentation remains up-to-date as the implementation evolves.- 127-154: The refactored
search
method now usesEmbeddingSearchRequest
andEmbeddingSearchResult
, aligning with the new filtering functionality. It's important to verify that the method correctly handles all possible filter expressions and edge cases.// Verify that the search method correctly handles all possible filter expressions and edge cases. SearchParam searchParam = buildSearchRequest(...);langchain4j-redis/src/main/java/dev/langchain4j/store/embedding/redis/RedisEmbeddingStore.java (4)
- 50-50: Renaming
metadataFieldsName
tometadataKeys
in the constructor parameter improves clarity and consistency with the internal representation of metadata within theRedisEmbeddingStore
. This change aligns with the common terminology used in key-value stores and databases, making the codebase more intuitive for new developers and contributors.- 67-67: Using
metadataKeys
directly in theRedisSchema
builder pattern is a straightforward and clean approach to passing metadata keys from theRedisEmbeddingStore
to theRedisSchema
. This ensures that the schema is aware of which metadata keys are important for indexing and retrieval operations, facilitating efficient data management and query execution.- 116-117: The addition of
schema.metadataKeys()
to the list of return fields in thefindRelevant
method ensures that all relevant metadata keys are included in the search results. This is crucial for applications that rely on metadata for filtering or post-processing of search results. It demonstrates a good understanding of the feature requirements and ensures that the API returns comprehensive information to the caller.- 285-288: The deprecation of
metadataFieldsName
in favor ofmetadataKeys
within theBuilder
class is a good practice, as it signals to the users of the API that they should transition to the new naming convention. It's important to ensure that the deprecation is communicated clearly in the project's documentation and migration guides to facilitate a smooth transition for existing users.embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilder.java (3)
- 85-85: Marking the
LanguageModelSqlFilterBuilder
class as@Experimental
is a good practice, as it clearly communicates to the users of the library that this feature is in an experimental stage and may undergo significant changes in the future. This tag encourages feedback and contributions while setting the right expectations regarding stability and support.- 90-106: The default
PromptTemplate
provided is well-documented and clearly outlines the task for the language model, including instructions, input, and expected response format. This detailed prompt design is crucial for achieving high-quality SQL generation from the language model. However, it's important to continuously evaluate and refine the prompt based on real-world usage and feedback to ensure it remains effective across a wide range of queries and database schemas.- 124-128: The constructor of
LanguageModelSqlFilterBuilder
properly ensures that bothchatLanguageModel
andtableDefinition
are not null, which is crucial for the operation of this class. The use ofgetOrDefault
forpromptTemplate
andsqlFilterParser
provides sensible defaults while allowing for customization. This design promotes flexibility and robustness in the implementation.langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithoutMetadataIT.java (8)
- 56-59: The addition of tests using the new
search
API withEmbeddingSearchRequest
parameters is a good practice, as it ensures that the new functionality is covered by automated tests. This helps in validating the correctness of the implementation and in catching regressions early. It's important to continue expanding the test coverage to include various edge cases and usage scenarios to ensure the robustness of the new API.- 82-85: Similar to the previous comment, the use of the new
search
API in this test case is commendable. It demonstrates the API's flexibility in handling different types of embedding additions, including those with specified IDs. Ensuring that the test assertions are comprehensive and reflect the expected behavior of the API is key to maintaining high-quality code.- 109-112: Again, testing the new
search
API with embeddings associated with text segments is crucial for verifying that the API correctly handles and returns relevant matches for complex queries involving both embeddings and their associated segments. This test case adds valuable coverage for scenarios where additional context (in the form of text segments) is provided alongside embeddings.- 149-152: Testing the
search
API with multiple embeddings ensures that the API can handle batch operations and return accurate results for queries involving multiple embeddings. This test case is essential for validating the API's performance and correctness in more demanding scenarios, such as bulk data processing or complex search queries.- 195-198: The use of the
search
API in testing scenarios with multiple embeddings and associated text segments further demonstrates the API's versatility. This test case is particularly important for ensuring that the API can accurately handle and return results for queries that involve a mix of embeddings and their contextual information, providing a comprehensive test coverage for the new functionality.- 226-229: Testing the
search
API with aminScore
parameter is crucial for ensuring that the API correctly filters results based on the specified minimum score. This test case adds important coverage for scenarios where precision in the search results is critical, such as filtering out low-relevance matches. It's important to include tests for variousminScore
values to thoroughly evaluate this aspect of the API.- 269-273: The addition of a test case using the
search
API with aminScore
parameter that filters out all but the highest-scoring matches is an excellent way to validate the API's filtering capabilities. This test ensures that the API can accurately apply score-based filtering to return only the most relevant results, which is essential for applications requiring high precision in search results.- 298-301: Testing the accuracy of the score returned by the
search
API is fundamental to ensuring the reliability of the search functionality. This test case validates that the scores calculated by the API are consistent with the expected relevance of the matches, which is crucial for applications that rely on the scoring mechanism to rank or filter search results.langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java (14)
- 15-34: The tests for
test_in_string
method are comprehensive, covering various scenarios including single values, lists, and type mismatches. However, consider adding a test case to ensure that thein
operation behaves correctly when the metadata contains multiple keys, to verify that the filter correctly isolates and evaluates the specified key.- 36-55: The
test_not_in_string
method correctly tests thenin
operation for strings. It's good to see both positive and negative cases, as well as type mismatch handling. Similar to the previous comment, adding a test case to ensure correct behavior when the metadata contains multiple keys would be beneficial.- 57-79: In
test_in_integer
, you've effectively tested various scenarios including type conversions and mismatches. It's particularly good to see tests for conversions from long, float, and double to integer. This ensures that the filter operation is robust across different numeric types. No additional comments here.- 81-103: The
test_not_in_integer
method follows a similar pattern totest_in_integer
, covering a wide range of cases. The inclusion of tests for type mismatches is commendable. This method is well-tested.- 105-127: For
test_in_long
, the tests are thorough, covering conversions from integer, float, and double to long. This ensures that the filter operation can handle different numeric types seamlessly. The method is well-tested.- 129-151: The
test_not_in_long
method is comprehensive, similar to thetest_in_long
method. It effectively tests thenin
operation for long values, including type conversions and mismatches. This method is well-tested.- 153-175: In
test_in_float
, the tests cover conversions from integer, long, and double to float. This ensures that the filter operation can handle different numeric types seamlessly. The method is well-tested.- 177-199: The
test_not_in_float
method follows a similar pattern totest_in_float
, covering a wide range of cases. The inclusion of tests for type mismatches is commendable. This method is well-tested.- 201-223: For
test_in_double
, the tests are thorough, covering conversions from integer, long, and float to double. This ensures that the filter operation can handle different numeric types seamlessly. The method is well-tested.- 225-247: The
test_not_in_double
method is comprehensive, similar to thetest_in_double
method. It effectively tests thenin
operation for double values, including type conversions and mismatches. This method is well-tested.- 250-258: The
test_in_empty_list
method correctly throws anIllegalArgumentException
when an empty list is passed as a comparison value. This is a good practice to ensure that filters are not inadvertently applied with empty criteria. No additional comments here.- 260-269: Similarly, the
test_in_list_with_null
method ensures that null values within a list of comparison values result in anIllegalArgumentException
. This maintains the integrity of the filter criteria. No additional comments here.- 271-280: The
test_not_in_empty_list
method follows the same pattern astest_in_empty_list
, correctly handling empty lists for thenin
operation. This consistency in handling empty lists across different filter operations is commendable.- 282-291: The
test_not_in_list_with_null
method ensures that null values within a list of comparison values for thenin
operation result in anIllegalArgumentException
, similar to thein
operation. This consistency is good practice.langchain4j-core/src/test/java/dev/langchain4j/data/document/MetadataTest.java (12)
- 5-7: The addition of parameterized tests using
@ParameterizedTest
with@NullSource
and@ValueSource
is a good practice for testing edge cases more efficiently. This approach helps to ensure that theMetadata
class behaves correctly under various input conditions.- 119-126: The
test_asMap
method correctly verifies that theasMap
function returns a map representation of the metadata. This test ensures that the conversion is accurate and maintains the integrity of the data.- 128-135: The
test_create_from_map
method effectively tests the creation ofMetadata
from a map with a single key-value pair. It's good to see that the test verifies the correct retrieval of the value. Consider adding more complex map scenarios to ensure robustness.- 138-206: The
should_create_from_map
method is comprehensive, testing the creation ofMetadata
from a map with various data types. It's commendable that the tests cover a wide range of type conversions and ensure that theMetadata
class can handle different types correctly. This thorough testing is crucial for maintaining the integrity and flexibility of theMetadata
API.- 209-225: The parameterized test
should_fail_to_create_from_map_when_key_is_null
correctly handles edge cases for null and blank keys. Using@ParameterizedTest
with@NullSource
and@ValueSource
is an efficient way to test these scenarios. This ensures that theMetadata
class enforces key constraints properly.- 228-242: The
should_fail_to_create_from_map_when_value_is_null
method tests the important constraint that metadata values cannot be null. This test ensures that theMetadata
class maintains data integrity by preventing null values.- 245-265: The
should_fail_to_create_from_map_when_value_is_of_unsupported_type
method tests theMetadata
class's ability to reject unsupported value types. This is crucial for maintaining type safety and ensuring that only supported types are used within the metadata. The detailed error message provided in the assertion is helpful for debugging.- 268-291: The
should_get_typed_values
method tests the retrieval of typed values from the metadata. This ensures that theMetadata
class can correctly handle and convert stored values to their respective types. It's good to see tests for null cases as well.- 294-319: The
should_fail_when_adding_null_key
method correctly tests the rejection of null keys for various value types. This ensures that theMetadata
class enforces key constraints properly across different value types. The detailed error messages are helpful for understanding the constraints.- 322-331: The
should_fail_when_adding_null_value
method tests the important constraint that metadata values cannot be null. This ensures that theMetadata
class maintains data integrity by preventing null values. This consistency in handling null values is commendable.- 334-350: The
should_convert_to_map
method tests the conversion ofMetadata
to a map. This ensures that the conversion maintains the integrity and accuracy of the data. It's good to see that the test verifies the equality of the original map and the converted map.- 353-357: The
test_containsKey
method effectively tests thecontainsKey
function of theMetadata
class. This ensures that the class can accurately report the presence or absence of keys. No additional comments here.langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java (5)
- 27-30: The introduction of
EmbeddingSearchRequest
,EmbeddingSearchResult
, andFilter
classes is a positive change, aligning with the PR objectives to enhance metadata-based filtering capabilities in similarity searches. It's crucial to ensure these classes are well-documented and their usage is clear to the developers.- 57-59: The updated class documentation clearly explains the new capabilities of storing
Metadata
and filtering by it usingFilter
provided insideEmbeddingSearchRequest
. This is a valuable addition to the documentation, helping developers understand the purpose and usage of the class.- 272-294: The
buildScriptScoreQuery
method constructs the query with optional metadata filtering, which is a key part of the new functionality. The method is well-structured and follows the PR objectives. However, there's a potential improvement in handling thenull
filter case more explicitly. Currently, it defaults to amatchAll
query, which might not be clear to all developers. Adding a comment or documentation about this behavior could enhance code readability and maintainability.Consider adding a comment explaining the default behavior when
filter
isnull
.
- 352-352: Directly accessing metadata for indexing in the
bulk
method aligns with the PR objectives to enhance metadata support. Ensure that the metadata conversion to a map (toMap()
) is efficient and handles all supported types correctly. This change is crucial for the functionality and should be thoroughly tested.- 370-370: The
toMatches
method conversion fromSearchResponse<Document>
toList<EmbeddingMatch<TextSegment>>
is implemented correctly. It's essential to ensure that the conversion logic correctly handles null values and metadata conversion. This method is a critical part of the search functionality, and its correctness directly impacts the usability of the search results.langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (8)
- 3-27: The expansion of the
Metadata
class to support additional value types (String
,Integer
,Long
,Float
,Double
) is a significant improvement, aligning with the PR objectives. It's important to ensure that the documentation is updated to reflect these changes and guide developers on how to use the new functionality effectively.- 31-47: The introduction of
SUPPORTED_VALUE_TYPES
as aSet
to track supported value types is a good practice, ensuring type safety and making it easier to manage supported types. However, consider using wrapper classes only (e.g.,Integer
instead ofint
) for consistency, as autoboxing will handle conversions, and it simplifies the type checking logic.Consider removing primitive class types from
SUPPORTED_VALUE_TYPES
to simplify the type checking logic.
- 88-96: The
get
method is marked for deprecation, which is a good decision given the introduction of type-specific getters. However, it's important to communicate this change clearly in the documentation and provide guidance on migrating to the new methods.- 105-118: The introduction of type-specific getters (e.g.,
getString
,getInteger
,getLong
,getFloat
,getDouble
) is a significant improvement for type safety and usability. It's crucial to ensure these methods are well-documented and include examples of their usage. Additionally, consider adding unit tests to cover edge cases and type conversion scenarios.- 270-273: The
add
method is marked for deprecation in favor of the newput
methods, which is a positive change for clarity and consistency. Ensure that the migration path is clearly documented for developers.- 296-356: The new
put
methods for adding metadata with specific value types are well-implemented, enhancing the class's usability and type safety. It's important to ensure that these methods are accompanied by comprehensive unit tests to validate their behavior, especially for edge cases and type conversions.- 384-391: The
asMap
method is marked for deprecation, which aligns with the introduction oftoMap
that returns a map with object values. This change is consistent with the expansion of supported value types. Ensure that the deprecation is clearly communicated and that developers are guided towards usingtoMap
for future compatibility.- 398-399: The
toMap
method provides a way to access the metadata as a map of key-value pairs with object values, reflecting the expanded support for different value types. This method is crucial for interoperability and should be thoroughly tested to ensure it handles all supported types correctly.langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java (5)
- 34-36: The addition of new imports for
Filter
,LanguageModelSqlFilterBuilder
, andTableDefinition
is appropriate given the context of the PR, which introduces a newFilter
API for the EmbeddingStore. These imports are necessary for the implementation of the test cases that demonstrate the usage of the new API.- 48-48: The introduction of the
Function
interface fromjava.util.function
is a good practice, as it is used to define dynamic metadata filters in the test methods. This usage aligns with modern Java practices for functional programming and enhances the readability and flexibility of the code.- 429-467: This test method,
should_use_LLM_generated_metadataFilter
, demonstrates the use of a dynamically generated metadata filter using a language model. The test setup is clear and concise, effectively showcasing how to use theLanguageModelSqlFilterBuilder
to create a filter based on SQL-like expressions. The test asserts that the correct movie recommendation is returned based on the filter criteria, which is a practical demonstration of the new feature's capabilities.However, it's important to ensure that the
LanguageModelSqlFilterBuilder
and theTableDefinition
are thoroughly tested in their own unit tests to guarantee their correctness and robustness in various scenarios.
- 474-509: The test method
should_use_dynamicMetadataFilter_by_user_id
effectively demonstrates the use of a dynamic metadata filter based on user IDs. The use of a lambda function to create the filter condition dynamically is a good example of leveraging Java's functional programming capabilities. This test method clearly shows how the new filtering functionality can be applied in a personalized context, which is valuable for developers looking to implement similar features.It's crucial to ensure that edge cases and error handling are covered in additional tests, especially when dealing with dynamic filters that might be subject to varying input conditions.
- 512-541: The test method
should_use_static_metadataFilter
demonstrates the use of a static metadata filter to retrieve specific content from the embedding store. The setup and assertion are straightforward, effectively showing how to apply a simple filter condition. This test method complements the dynamic filter tests by showcasing another aspect of the new filtering functionality.As with the dynamic filters, it's important to cover a range of scenarios and edge cases in separate unit tests to ensure the robustness of the static filtering functionality.
langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (3)
- 21-24: The class-level documentation clearly explains the purpose of
EmbeddingStoreWithFilteringIT
. It's well-structured to serve as a base for testing anyEmbeddingStore
implementation that supports filtering with theFilter
API. This approach promotes code reuse and ensures consistency across different implementations.- 26-73: The test method
should_filter_by_metadata
is well-structured and covers a broad range of filter operations. It tests the functionality of filtering embeddings by metadata, ensuring that only embeddings with matching metadata are returned. The use of@ParameterizedTest
and@MethodSource
for data-driven testing is appropriate and enhances test coverage. However, consider adding more detailed comments within the test body to explain the rationale behind specific assertions, especially for complex filter conditions.- 1135-1178: Similar to
should_filter_by_metadata
, theshould_filter_by_metadata_not
method effectively tests the negation of filter conditions. It's crucial for ensuring that theFilter
API correctly handlesnot
operations. Like the previous method, adding comments to explain complex assertions or test setups would improve readability and maintainability.
langchain4j-core/src/main/java/dev/langchain4j/data/document/Document.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java
Outdated
Show resolved
Hide resolved
...hain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java
Outdated
Show resolved
Hide resolved
...hain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java
Outdated
Show resolved
Hide resolved
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java
Show resolved
Hide resolved
...t/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStoreCloudIT.java
Show resolved
Hide resolved
...c/test/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStoreIT.java
Show resolved
Hide resolved
...in4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/Document.java
Show resolved
Hide resolved
.../main/java/dev/langchain4j/store/embedding/inmemory/GsonInMemoryEmbeddingStoreJsonCodec.java
Show resolved
Hide resolved
...hain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/TypeChecker.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (25)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParser.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java (1 hunks)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (6 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/MetadataFilterBuilder.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsEqualTo.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThan.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThanOrEqualTo.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThan.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThanOrEqualTo.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotEqualTo.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java (1 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (1 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (1 hunks)
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java (1 hunks)
- langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java (6 hunks)
- langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java (1 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (5 hunks)
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java (1 hunks)
- langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java (3 hunks)
- langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java (3 hunks)
Files not summarized due to errors (2)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParserTest.java: Error: Message exceeds token limit
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java: Error: Message exceeds token limit
Files not reviewed due to errors (1)
- SqlFilterParserTest.java (no review received)
Files skipped from review as they are similar to previous changes (12)
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/main/java/dev/langchain4j/store/embedding/filter/parser/sql/SqlFilterParser.java
- embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql/src/test/java/dev/langchain4j/store/embedding/filter/builder/sql/LanguageModelSqlFilterBuilderIT.java
- langchain4j-core/src/main/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetriever.java
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingSearchRequest.java
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/Filter.java
- langchain4j-core/src/test/java/dev/langchain4j/store/embedding/filter/FilterTest.java
- langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchEmbeddingStore.java
- langchain4j-elasticsearch/src/main/java/dev/langchain4j/store/embedding/elasticsearch/ElasticsearchMetadataFilterMapper.java
- langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusMetadataFilterMapper.java
- langchain4j/src/main/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStore.java
- langchain4j/src/test/java/dev/langchain4j/service/AiServicesWithRagIT.java
Additional comments: 37
langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsEqualTo.java (2)
- 20-22: The constructor correctly validates the
key
andcomparisonValue
parameters to ensure they are not blank or null, respectively. This is crucial for avoiding runtime errors due to invalid inputs.- 33-52: The
test
method implements the logic for checking equality between thecomparisonValue
and the value retrieved from theMetadata
object. It correctly handles type compatibility and usesNumberComparator.compareAsBigDecimals
for comparing numeric values, which is a good practice for handling potential issues with floating-point comparisons.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotEqualTo.java (2)
- 20-22: The constructor in
IsNotEqualTo
performs necessary validation on thekey
andcomparisonValue
, ensuring they are not blank or null. This validation is essential for maintaining data integrity and preventing runtime errors.- 33-52: The
test
method inIsNotEqualTo
correctly implements the logic for inequality checks. It usesNumberComparator.compareAsBigDecimals
for numeric comparisons, which is a robust approach for handling floating-point numbers. The method also correctly returnstrue
when theMetadata
object does not contain the specified key, which aligns with the expected behavior for an inequality check.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThan.java (2)
- 20-22: The constructor in
IsLessThan
ensures that thekey
andcomparisonValue
are valid, usingensureNotBlank
andensureNotNull
. This validation is crucial for preventing null pointer exceptions and ensuring that the filter operates on valid data.- 33-52: The
test
method inIsLessThan
correctly implements the comparison logic, including type compatibility checks and numeric comparison usingcompareAsBigDecimals
. This approach is effective for handling both exact and floating-point numeric comparisons, ensuring accurate results across different numeric types.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThan.java (2)
- 20-22: The constructor in
IsGreaterThan
performs necessary input validation, ensuring that neither thekey
nor thecomparisonValue
are invalid. This is important for preventing errors during the filter's operation.- 33-52: The
test
method inIsGreaterThan
correctly implements the greater-than comparison logic. It includes comprehensive type checks and usescompareAsBigDecimals
for numeric comparisons, which is a robust approach for ensuring accurate comparisons across different numeric types.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsLessThanOrEqualTo.java (2)
- 20-22: The constructor in
IsLessThanOrEqualTo
ensures that the inputs are valid, usingensureNotBlank
andensureNotNull
. This validation is essential for the filter's reliability and correctness.- 33-52: The
test
method inIsLessThanOrEqualTo
implements the comparison logic effectively, including type compatibility checks and numeric comparison usingcompareAsBigDecimals
. This method ensures accurate and consistent comparison results, which is crucial for the filter's functionality.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsGreaterThanOrEqualTo.java (2)
- 20-22: The constructor in
IsGreaterThanOrEqualTo
correctly validates the inputs, ensuring that thekey
andcomparisonValue
are not invalid. This validation is crucial for the filter's operation and prevents potential runtime errors.- 33-52: The
test
method inIsGreaterThanOrEqualTo
effectively implements the comparison logic, including type compatibility checks and numeric comparison usingcompareAsBigDecimals
. This approach ensures accurate comparisons across different numeric types, which is essential for the filter's functionality.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java (2)
- 24-28: The constructor in
IsIn
performs necessary input validation and creates an unmodifiable set from the provided collection of comparison values. This ensures immutability and thread safety of thecomparisonValues
field. Additionally, it validates that none of the comparison values are null, which is important for preventing null pointer exceptions during the filter's operation.- 39-58: The
test
method inIsIn
correctly implements the logic for checking if a value is within the specified set of values. It includes type compatibility checks and handles numeric comparisons usingcontainsAsBigDecimals
. This approach ensures accurate and efficient containment checks, which is crucial for the filter's functionality.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java (2)
- 24-28: The constructor in
IsNotIn
ensures that the inputs are valid and creates an unmodifiable set from the provided collection of comparison values. This approach guarantees the immutability of thecomparisonValues
field and validates that none of the comparison values are null, which is essential for the filter's reliability.- 39-58: The
test
method inIsNotIn
effectively implements the logic for checking if a value is not within the specified set of values. It includes type compatibility checks and handles numeric comparisons usingcontainsAsBigDecimals
. This method ensures accurate and efficient containment checks, which is crucial for the filter's functionality.langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/MetadataFilterBuilder.java (2)
- 24-25: The constructor of
MetadataFilterBuilder
correctly validates thekey
parameter to ensure it is not blank. This validation is essential for preventing errors during the construction of filters.- 36-218: The methods in
MetadataFilterBuilder
provide a comprehensive and fluent API for constructing various types of filters, including equality, inequality, greater/less than, and containment checks. Each method correctly instantiates the appropriate filter class with validated inputs. This design promotes ease of use and flexibility in constructing complex filter expressions.langchain4j-core/src/test/java/dev/langchain4j/rag/content/retriever/EmbeddingStoreContentRetrieverTest.java (12)
- 38-48: The
@BeforeEach
method correctly sets up the necessary mocks and expectations for the tests. It's good practice to isolate test setup to ensure each test runs with a clean state.- 50-54: The
@AfterEach
method ensures that theEMBEDDING_MODEL
mock is used as expected and that there are no unexpected interactions. This is a good practice for verifying mock behavior and ensuring test cleanliness.- 56-71: This test method
should_retrieve
correctly verifies the basic retrieval functionality using default configurations. It's well-structured and follows best practices for unit testing.- 79-92: The use of the builder pattern in
should_retrieve_builder
test method is a good practice, allowing for more readable and maintainable test setup. This test method correctly asserts the behavior of the retrieval process using the builder pattern.- 100-114: The test
should_retrieve_with_custom_maxResults
effectively demonstrates the functionality of customizing the maximum results. It's important to test such configurations to ensure the system behaves as expected under different settings.- 122-136: Similar to the previous comment,
should_retrieve_with_custom_maxResults_builder
confirms that the builder pattern supports custom maximum results. This consistency in testing both direct instantiation and builder pattern usage is commendable.- 140-158: Testing dynamic configuration with
should_retrieve_with_custom_dynamicMaxResults_builder
is crucial for ensuring that the system can adapt to runtime conditions. This test method correctly sets up and verifies dynamic max results functionality.- 161-181: The
should_retrieve_with_custom_minScore_ctor
test method correctly demonstrates the functionality of customizing the minimum score through the constructor. Testing various ways to configure the retriever is important for flexibility and usability.- 188-203: Using the builder pattern to customize the minimum score in
should_retrieve_with_custom_minScore_builder
is another example of thorough testing for different configurations. This ensures that the builder pattern is fully functional and supports all necessary options.- 206-225: Dynamic minimum score configuration tested in
should_retrieve_with_custom_dynamicMinScore_builder
is essential for scenarios where the score threshold might need to be adjusted based on the query. This test method effectively verifies this functionality.- 228-250: Testing the retrieval with a custom filter in
should_retrieve_with_custom_filter
is crucial for ensuring that the system can correctly apply metadata-based filtering. This test method properly sets up and verifies the filter functionality.- 253-275: The
should_retrieve_with_custom_dynamicFilter
test method extends the filtering capabilities by testing dynamic filter creation based on the query. This is an important feature for advanced use cases, and the test method correctly verifies its behavior.langchain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java (3)
- 15-23: The addition of imports for
Metadata
,EmbeddingSearchResult
,EmbeddingSearchRequest
, andFilter
aligns with the PR's objectives to enhance theEmbeddingStore
API and introduce filtering capabilities. Ensure that these new dependencies are used effectively throughout the implementation and that there are no unused imports to maintain code cleanliness.- 37-41: The updated class documentation clearly communicates the enhanced capabilities of the
MilvusEmbeddingStore
, including support for managing Milvus instances, storingMetadata
, and filtering by it. This documentation update is crucial for developers to understand the purpose and capabilities of the class. It's well-written and informative.- 47-47: The introduction of the
METADATA_FIELD_NAME
constant is a good practice for maintaining consistency and avoiding hard-coded strings throughout the code. This change supports the expandedMetadata
API and the storage of metadata alongside embeddings. Ensure that this constant is used consistently wherever the metadata field name is required.langchain4j-core/src/test/java/dev/langchain4j/store/embedding/EmbeddingStoreWithFilteringIT.java (4)
- 24-24: The class is well-structured as an abstract integration test suite for
EmbeddingStore
implementations, ensuring that extending classes can leverage these tests to verify their filtering capabilities. This approach promotes code reuse and consistency across differentEmbeddingStore
implementations.- 26-73: The
should_filter_by_metadata
method correctly sets up test data, performs searches with and without filters, and asserts the expected results. The use ofawaitUntilPersisted()
is appropriate for handling asynchronous behavior or eventual consistency in theEmbeddingStore
. This method effectively tests the filtering capabilities of theEmbeddingStore
API.- 1135-1178: The
should_filter_by_metadata_not
method follows a similar pattern toshould_filter_by_metadata
, testing the negation of filters. It properly sets up test data, executes searches, and asserts the outcomes, including handling asynchronous behavior withawaitUntilPersisted()
. This method ensures that theFilter
API's negation logic works as expected.- 75-1132: The parameterized test data provided by the
MethodSource
methodsshould_filter_by_metadata
andshould_filter_by_metadata_not
is comprehensive and well-organized. It covers a wide range of filter operations and scenarios, ensuring thorough testing of theFilter
API's capabilities. This approach effectively validates the filtering logic across various data types and conditions.
...hain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java
Show resolved
Hide resolved
...hain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java
Show resolved
Hide resolved
...hain4j-milvus/src/main/java/dev/langchain4j/store/embedding/milvus/MilvusEmbeddingStore.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (5)
- langchain4j-core/src/main/java/dev/langchain4j/data/document/Document.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (8 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java (8 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java (6 hunks)
- langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files skipped from review as they are similar to previous changes (3)
- langchain4j-core/src/main/java/dev/langchain4j/data/document/Document.java
- langchain4j-core/src/main/java/dev/langchain4j/data/segment/TextSegment.java
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/EmbeddingStore.java
Additional comments: 7
langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java (4)
- 31-47: The addition of new supported value types (
int
,long
,float
,double
alongside their wrapper classes) is a significant improvement for theMetadata
class, enhancing its flexibility and utility. This change aligns with the PR objectives to expand theMetadata
API to support more value types. It's also good to see the use of both primitive and wrapper types to ensure compatibility with different use cases.- 105-118: The
getString
method is a welcome addition, providing a type-safe way to retrieveString
values from the metadata. The use of@Experimental
annotation is appropriate here, indicating that the API is subject to change. However, it's important to ensure that the documentation clearly explains the method's behavior, especially regarding its return ofnull
if the key is not present or if the value is not of the expected type. This clarity will help developers understand the method's contract and handle its usage correctly.- 136-151: The
getInteger
method introduces flexibility in handling numeric metadata values, accommodating both directInteger
retrievals and parsing fromString
. This method, along with similar ones forLong
,Float
, andDouble
, addresses the potential issues with type loss during serialization/deserialization processes, as mentioned in the comments. It's a thoughtful design choice that enhances theMetadata
class's usability across different storage implementations. However, ensure that the potentialNumberFormatException
from parsing strings is either documented or handled gracefully to avoid unexpected crashes.- 298-356: The new
put
methods for adding key-value pairs to the metadata are well-designed, providing a clear and type-safe way to work with different value types. Marking these methods as@Experimental
allows for future adjustments based on feedback and usage patterns. It's good to see consistent validation across these methods, ensuring that keys and values meet the expected criteria before being added to the metadata. This approach enhances the robustness and reliability of theMetadata
class.langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3)
- 23-23: The class
InMemoryEmbeddingStoreTest
now extendsEmbeddingStoreWithFilteringIT
instead ofEmbeddingStoreIT
. This change reflects an update in the inheritance hierarchy to accommodate the new filtering capabilities introduced in the PR. Ensure that all relevant test cases fromEmbeddingStoreIT
are still valid or appropriately adapted inEmbeddingStoreWithFilteringIT
to maintain comprehensive test coverage.- 69-69: The use of
.hasSameHashCodeAs
in the assertion block is a minor formatting adjustment that enhances the clarity of the test's intention. This change ensures that not only the contents are equal but also that the hash codes of the deserialized and original entries match, which is a good practice for ensuring the integrity of serialization and deserialization processes.- 81-105: The addition of the
test_backwards_compatibility
method is a significant and valuable enhancement. It tests the JSON deserialization and matching functionality to ensure that the system remains compatible with data produced by previous versions. This is crucial for maintaining data integrity and system reliability across version updates. The test method is well-structured and covers essential aspects of backwards compatibility, including deserialization from a predefined JSON string and verifying the expected behavior of thefindRelevant
method.However, consider adding more detailed comments within the test method to explain the purpose of each block of code, especially the JSON string's structure and the significance of the specific version it represents. This will improve the maintainability and understandability of the test case for future contributors.
Verification successful
The verification process confirms that the
test_backwards_compatibility
method is the only test specifically designed for testing backwards compatibility within the Java files of the project. This underscores the importance of this test in ensuring the system's compatibility with data produced by previous versions, as highlighted in the review comment.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify if there are more tests related to backwards compatibility across different versions. rg "test_backwards_compatibility" --type javaLength of output: 186
langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java
Show resolved
Hide resolved
langchain4j-core/src/main/java/dev/langchain4j/data/document/Metadata.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java
…va client which enables Java 8 support (#663)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to path filters (6)
langchain4j-cassandra/pom.xml
is excluded by:!**/*.xml
langchain4j-infinispan/pom.xml
is excluded by:!**/*.xml
langchain4j-neo4j/pom.xml
is excluded by:!**/*.xml
langchain4j-opensearch/pom.xml
is excluded by:!**/*.xml
langchain4j-parent/pom.xml
is excluded by:!**/*.xml
pom.xml
is excluded by:!**/*.xml
Files selected for processing (3)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java (1 hunks)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java (1 hunks)
- langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java (3 hunks)
Files skipped from review as they are similar to previous changes (3)
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsIn.java
- langchain4j-core/src/main/java/dev/langchain4j/store/embedding/filter/comparison/IsNotIn.java
- langchain4j/src/test/java/dev/langchain4j/store/embedding/inmemory/InMemoryEmbeddingStoreTest.java
@langchain4j , Thanks a lot for this new feature, it is exactly what we need. May I know if I can use the metadata filter function for AWS OpenSearch embedding store? If not now, may I know when it will be available? Thanks. |
@ansonma I do not plan to add this feature to all embedding store integrations as I do not have enough time. I hope that someone from community could contribute this, considering that foundation is there, examples (elasticsearch, milvus) are there, it should be pretty easy. |
Hello, |
Hi, |
Hello, my company is investigating Langchain4J as our long term AI framework for our Java stack apps, can someone help to add the OpenSearch metadata filtering function? This is the only piece missing for our requirements for now. Thanks in advance. |
@andyflury would you like to contribute this? It should be pretty straightforward. There is an implementation for Elasticsearch already, I assume the one for OpenSearch will be similar/same? Thanks a lot in advance! |
New EmbeddingStore (metadata)
Filter
APIMany embedding stores, such as Pinecone and Milvus support strict filtering (think of an SQL "WHERE" clause) during similarity search.
So, if one has an embedding store with movies, for example, one could search not only for the most semantically similar movies to the given user query but also apply strict filtering by metadata fields like year, genre, rating, etc. In this case, the similarity search will be performed only on those movies that match the filter expression.
Since LangChain4j supports (and abstracts away) many embedding stores, there needs to be an embedding-store-agnostic way for users to define the filter expression.
This PR introduces a
Filter
interface, which can represent both simple (e.g.,type = "documentation"
) and composite (e.g.,type in ("documentation", "tutorial") AND year > 2020
) filter expressions in an embedding-store-agnostic manner.Filter
currently supports the following operations:Comparison:
IsEqualTo
IsNotEqualTo
IsGreaterThan
IsGreaterThanOrEqualTo
IsLessThan
IsLessThanOrEqualTo
IsIn
IsNotIn
Logical:
And
Not
Or
These operations are supported by most embedding stores and serve as a good starting point. However, the list of operations will expand over time to include other operations (e.g.,
Contains
) supported by embedding stores.Currently, the DSL looks like this:
Filter expression as a
String
Filter expression can also be specified as a
String
. This might be necessary, for example, if the filter expression is generated dynamically by the application or by the LLM (as in self querying).This PR introduces a
FilterParser
interface with a simpleFilter parse(String)
API, allowing for future support of multiple syntaxes (if this will be required).For the out-of-the-box filter syntax, ANSI SQL's
WHERE
clause is proposed as a suitable candidate for several reasons:The downside is that SQL's
WHERE
clause might not support all operations and data types that could be supported in the future by various embedding stores. In such case, we could extend it to a superset of ANSI SQLWHERE
syntax and/or provide an option to express filters in the native syntax of the store.An out-of-the-box implementation of the SQL
FilterParser
is provided as aSqlFilterParser
in a separate modulelangchain4j-embedding-store-filter-parser-sql
, using JSqlParser under the hood.SqlFilterParser
can parse SQL "SELECT" (or just "WHERE" clause) statement into aFilter
object:SELECT * FROM fake_table WHERE userId = '123-456'
->metadataKey("userId").isEqualTo("123-456")
userId = '123-456'
->metadataKey("userId").isEqualTo("123-456")
It can also resolve
CURDATE()
andCURRENT_DATE
/CURRENT_TIME
/CURRENT_TIMESTAMP
:SELECT * FROM fake_table WHERE year = EXTRACT(YEAR FROM CURRENT_DATE
->metadataKey("year").isEqualTo(LocalDate.now().getYear())
Changes in
Metadata
APIUntil now,
Metadata
supported onlyString
values. This PR expands the list of supported value types toInteger
,Long
,Float
andDouble
. In the future, more types may be added (if needed).The method
String get(String key)
will be deprecated later in favor of:String getString(String key)
Integer getInteger(String key)
Long getLong(String key)
New overloaded
put(key, value)
methods are introduced to support more value types:put(String key, int value)
put(String key, long value)
Changes in
EmbeddingStore
APINew method
search
is added that will become the main entry point for search in the future. AllfindRelevant
methods will be deprecated later.New
search
method acceptsEmbeddingSearchRequest
and returnsEmbeddingSearchResult
.EmbeddingSearchRequest
contains all search criteria (e.g.maxResults
,minScore
), including newFilter
.EmbeddingSearchResult
contains a list ofEmbeddingMatch
.Changes in
EmbeddingStoreContentRetriever
APIEmbeddingStoreContentRetriever
can now be configured with a staticfilter
as well as dynamicdynamicMaxResults
,dynamicMinScore
anddynamicFilter
in the builder:So now you can define
maxResults
,minScore
andfilter
both statically and dynamically (they can depend on the query, user, etc.). These values will be propagated to the underlyingEmbeddingStore
."Self-querying"
This PR also introduces
LanguageModelSqlFilterBuilder
inlangchain4j-embedding-store-filter-parser-sql
module which can be used withEmbeddingStoreContentRetriever
'sdynamicFilter
to automatically build aFilter
object from theQuery
using language model andSqlFilterParser
.For example:
Which embedding store integrations will support
Filter
?In the long run, all (provided the embedding store itself supports it).
In the first iteration, I aim to add support to just a few:
InMemoryEmbeddingStore
Summary by CodeRabbit
Summary by CodeRabbit
InMemoryEmbeddingStoreTest
to extend a different class for improved testing coverage and added a new test method.