Changes Neo4j add_edge method and implements unit tests around get_graph_from_model logic #474

hajdul88 · 2025-01-27T16:30:16Z

Description

This PR includes extensive unit tests around the get_graph_from_model logic and changes the neo4j add_edges logic.

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

Release Notes

New Features
- None
Bug Fixes
- None
Refactor
- Updated relationship creation logic in Neo4j database adapter
- Simplified graph processing by removing only_root parameter from various functions
Tests
- Added comprehensive unit tests for graph model generation functionality
Chores
- Streamlined task and data point processing workflows

…om_model

coderabbitai · 2025-01-27T16:30:23Z

Walkthrough

This pull request focuses on removing the only_root parameter across multiple files in the Cognee project. The changes primarily affect the add_data_points task and related graph-related functions, simplifying the function signatures and removing conditional logic associated with the only_root flag. The modifications span several files, including the API, infrastructure, tasks, and test modules, suggesting a systematic approach to removing this parameter from the codebase.

Changes

File	Change Summary
`cognee/api/v1/cognify/cognify_v2.py`	Removed `only_root=True` from `Task` instantiation in `get_default_tasks` function
`cognee/infrastructure/databases/graph/neo4j_driver/adapter.py`	Updated `add_edges` method to use `apoc.merge.relationship` with explicit node ID properties
`cognee/modules/graph/utils/get_graph_from_model.py`	Removed `only_root` parameter and associated conditional logic
`cognee/tasks/storage/add_data_points.py`	Removed `only_root` parameter from function signature
`cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py`	Added new unit tests for `get_graph_from_model` function

Sequence Diagram

sequenceDiagram
    participant API as Cognify API
    participant Tasks as Add Data Points Task
    participant GraphUtils as Graph Utilities
    participant Database as Neo4j Database

    API->>Tasks: Invoke add_data_points
    Tasks->>GraphUtils: Get graph from model
    GraphUtils-->>Tasks: Return graph nodes and edges
    Tasks->>Database: Add nodes and edges
    Database-->>Tasks: Confirm addition

Possibly related PRs

test: Add github action to test code graph #471: Removal of only_root parameter in add_data_points task

Suggested labels

run-checks

Poem

🐰 A Rabbit's Ode to Code Simplification 🐰

Farewell, only_root, you've served your time,
Streamlined and clean, our code now shines!
Nodes and edges dance with glee,
Complexity reduced, we're finally free!
Hop, hop, hooray for elegant design! 🌟

✨ Finishing Touches

📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

…nsive-checks-around-the

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

208-221: Consider indexing for performance when using MATCH with UNWIND.

These lines create or merge a large number of relationships via apoc.merge.relationship. If edge.from_node and edge.to_node fields are not indexed in Neo4j, the repeated MATCH (from_node {id: edge.from_node}) and MATCH (to_node {id: edge.to_node}) operations under UNWIND could degrade performance for large lists of edges. Ensuring proper indexing on the id property will help optimize these queries.
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1)
8-8: Remove unused import.

The is_type import from IPython.utils.wildcard is not used in any of the test cases.
-from IPython.utils.wildcard import is_type

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8bde54 and e904e64.

📒 Files selected for processing (5)

cognee/api/v1/cognify/cognify_v2.py (1 hunks)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1 hunks)
cognee/modules/graph/utils/get_graph_from_model.py (1 hunks)
cognee/tasks/storage/add_data_points.py (1 hunks)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: Publish Cognee Docker image
GitHub Check: run_notebook_test / test
GitHub Check: test
GitHub Check: windows-latest
GitHub Check: docker-compose-test

🔇 Additional comments (7)

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

208-221: Verify the new apoc.merge.relationship logic.

Previously, if this code used apoc.create.relationship, changing it to apoc.merge.relationship can alter behavior for existing relationships. Specifically, apoc.merge.relationship will merge properties on any matched relationship, rather than creating duplicates. If this is intentional (avoiding duplicates), it’s a beneficial move; otherwise, re-check whether you prefer always creating new relationships vs. merging.

cognee/tasks/storage/add_data_points.py (1)

8-8: Confirm backward compatibility after removing only_root.

Removing the only_root parameter from add_data_points may affect code depending on that argument. If any callers pass only_root or expect root-only behavior, this could introduce unexpected runtime errors or logic changes. Double-check all call sites to confirm they do not rely on that parameter.
cognee/modules/graph/utils/get_graph_from_model.py (1)

100-100: Validate that field_value always has an id attribute.

At line 100, the check depends on field_value.id. Ensure that field_value is indeed a DataPoint or a compatible type with the id attribute. If not, this code could raise runtime errors or skip creating nodes for valid objects. Consider adding input validations or type hints to make assumptions explicit.

As a follow-up, you can run a script to confirm that all field_value instances passed here are DataPoint or subtypes:

✅ Verification successful

The code already validates that field_value has an id attribute

The implementation includes proper type checking through isinstance(field_value, DataPoint) before accessing the id attribute, ensuring type safety at runtime. This validation prevents any potential attribute errors.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Use ast-grep to find calls to get_graph_from_model and examine argument types.
ast-grep --pattern $'await get_graph_from_model($_, $$$)' 
Length of output: 2472

Script:
#!/bin/bash
# Get function definition and surrounding context
ast-grep --pattern 'async def get_graph_from_model($$$)' -A 10

# Get context around the id access
rg "field_value.id" -B 5 -A 5

# Get test setup to see field_value types
rg "field_value.*=|data_point.*=" cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py -B 2 -A 2
Length of output: 5531
cognee/api/v1/cognify/cognify_v2.py (1)

163-163: Verify the impact of removing only_root parameter.

The removal of only_root=True from the add_data_points task configuration aligns with the broader refactoring effort. However, we should ensure this change doesn't affect the existing data processing behavior.

Run the following script to check for any remaining usages of only_root in the task configuration and related files:

✅ Verification successful

Removal of only_root parameter is safe

The parameter has been consistently removed across all task configurations in the codebase, and all similar tasks follow the same pattern without using this parameter. The change aligns with the broader refactoring effort and maintains consistency with other pipeline configurations.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for any remaining references to only_root in task configurations and related files
rg -g '!*.pyc' -g '!*test*' 'only_root' .
Length of output: 41

Script:
#!/bin/bash
# Find Task class definition and similar configurations
echo "=== Searching for Task class definition ==="
ast-grep --pattern 'class Task'

echo -e "\n=== Searching for task configurations ==="
rg -g '!*.pyc' -g '!*test*' 'Task\(' -A 2
Length of output: 6869
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (3)

17-32: LGTM! Well-structured test case.

The basic initialization test is well-organized with clear setup, execution, and assertions. The docstring effectively describes the test's purpose.

34-62: LGTM! Comprehensive test for single neighbor relationship.

The test effectively validates both node creation and edge formation for a simple entity-type relationship.

64-121: LGTM! Thorough test for complex nested relationships.

The test comprehensively covers multiple aspects:

Multiple entity nodes with shared type

Document and chunk relationships

Multiple edge types (is_a, is_part_of, contains)

coderabbitai · 2025-01-27T16:36:02Z

cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py

+@pytest.mark.asyncio
+async def test_get_graph_from_model_only_root():
+    """Test the behavior of get_graph_from_model when the 'only_root' option is set to True."""
+    type_node = EntityType(
+        id=uuid4(),
+        name="Vehicle",
+        description="This is a Vehicle node",
+    )
+
+    entity_node = Entity(
+        id=uuid4(),
+        name="Car",
+        is_a=type_node,
+        description="This is a car node",
+    )
+
+    document = Document(
+        name="main_document", raw_data_location="home/", metadata_id=uuid4(), mime_type="test"
+    )
+
+    chunk = DocumentChunk(
+        id=uuid4(),
+        word_count=8,
+        chunk_index=0,
+        cut_type="test",
+        text="The car and the bus are transportation tools",
+        is_part_of=document,
+        contains=[entity_node],
+    )
+
+    added_nodes = {}
+    added_edges = {}
+    visited_properties = {}
+
+    nodes, edges = await get_graph_from_model(
+        chunk, added_nodes, added_edges, visited_properties, only_root=True
+    )
+
+    # assert len(nodes) == 1 :TODO THIS HAS TO BE FINISHED
+    # assert len(edges) == 0
+    # assert str(entity_node.id) in added_nodes
+    # assert str(type_node.id) not in added_nodes


⚠️ Potential issue

Remove or update the incomplete test case.

Since the only_root parameter is being removed from the codebase (as evidenced by changes in cognify_v2.py and other files), this test case should either be:

Removed entirely as it's testing deprecated functionality, or

Updated to test the new behavior without the only_root parameter

If you choose to keep the test, here's how to update it:

-async def test_get_graph_from_model_only_root(): - """Test the behavior of get_graph_from_model when the 'only_root' option is set to True.""" +async def test_get_graph_from_model_with_document_chunk(): + """Test the behavior of get_graph_from_model with a document chunk containing an entity.""" # ... (keep the setup code) ... nodes, edges = await get_graph_from_model( - chunk, added_nodes, added_edges, visited_properties, only_root=True + chunk, added_nodes, added_edges, visited_properties ) - # assert len(nodes) == 1 :TODO THIS HAS TO BE FINISHED - # assert len(edges) == 0 - # assert str(entity_node.id) in added_nodes - # assert str(type_node.id) not in added_nodes + assert len(nodes) == 4 # chunk, document, entity, and type nodes + assert len(edges) == 3 # is_part_of, contains, and is_a edges + assert str(chunk.id) in added_nodes + assert str(document.id) in added_nodes + assert str(entity_node.id) in added_nodes + assert str(type_node.id) in added_nodes

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@pytest.mark.asyncio

async def test_get_graph_from_model_only_root():

"""Test the behavior of get_graph_from_model when the 'only_root' option is set to True."""

type_node = EntityType(

id=uuid4(),

name="Vehicle",

description="This is a Vehicle node",

)

entity_node = Entity(

id=uuid4(),

name="Car",

is_a=type_node,

description="This is a car node",

)

document = Document(

name="main_document", raw_data_location="home/", metadata_id=uuid4(), mime_type="test"

)

chunk = DocumentChunk(

id=uuid4(),

word_count=8,

chunk_index=0,

cut_type="test",

text="The car and the bus are transportation tools",

is_part_of=document,

contains=[entity_node],

)

added_nodes = {}

added_edges = {}

visited_properties = {}

nodes, edges = await get_graph_from_model(

chunk, added_nodes, added_edges, visited_properties, only_root=True

)

# assert len(nodes) == 1 :TODO THIS HAS TO BE FINISHED

# assert len(edges) == 0

# assert str(entity_node.id) in added_nodes

# assert str(type_node.id) not in added_nodes

@pytest.mark.asyncio

async def test_get_graph_from_model_with_document_chunk():

"""Test the behavior of get_graph_from_model with a document chunk containing an entity."""

type_node = EntityType(

id=uuid4(),

name="Vehicle",

description="This is a Vehicle node",

)

entity_node = Entity(

id=uuid4(),

name="Car",

is_a=type_node,

description="This is a car node",

)

document = Document(

name="main_document", raw_data_location="home/", metadata_id=uuid4(), mime_type="test"

)

chunk = DocumentChunk(

id=uuid4(),

word_count=8,

chunk_index=0,

cut_type="test",

text="The car and the bus are transportation tools",

is_part_of=document,

contains=[entity_node],

)

added_nodes = {}

added_edges = {}

visited_properties = {}

nodes, edges = await get_graph_from_model(

chunk, added_nodes, added_edges, visited_properties

)

assert len(nodes) == 4 # chunk, document, entity, and type nodes

assert len(edges) == 3 # is_part_of, contains, and is_a edges

assert str(chunk.id) in added_nodes

assert str(document.id) in added_nodes

assert str(entity_node.id) in added_nodes

assert str(type_node.id) in added_nodes

borisarzentar · 2025-01-29T09:28:07Z

cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py

+
+
+@pytest.mark.asyncio
+async def test_get_graph_from_model_only_root():


If only_root is removed from the method, let's remove the test as well.

hajdul88 added 2 commits January 20, 2025 16:01

feat: adds some unit tests for get_graph_from_model

cd61f71

feat: updates neo4j add_edges cypher and deletes shallow get_graph_fr…

354b071

…om_model

hajdul88 added the do not merge label Jan 27, 2025

Merge branch 'dev' into feature/cog-754-implement-unit-tests-and-exte…

e904e64

…nsive-checks-around-the

hajdul88 changed the title ~~Feature/cog 754 implement unit tests and extensive checks around the~~ Changes Neo4j add_edge method and implements unit tests around get_graph_from_model logic Jan 27, 2025

coderabbitai bot reviewed Jan 27, 2025

View reviewed changes

borisarzentar reviewed Jan 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes Neo4j add_edge method and implements unit tests around get_graph_from_model logic #474

Changes Neo4j add_edge method and implements unit tests around get_graph_from_model logic #474

hajdul88 commented Jan 27, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 27, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Jan 27, 2025

borisarzentar Jan 29, 2025



		@pytest.mark.asyncio
		async def test_get_graph_from_model_only_root():

Changes Neo4j add_edge method and implements unit tests around get_graph_from_model logic #474

Are you sure you want to change the base?

Changes Neo4j add_edge method and implements unit tests around get_graph_from_model logic #474

Conversation

hajdul88 commented Jan 27, 2025 • edited by coderabbitai bot Loading

Description

DCO Affirmation

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Jan 27, 2025 • edited Loading

Walkthrough

Changes

Sequence Diagram

Possibly related PRs

Suggested labels

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 27, 2025

Choose a reason for hiding this comment

borisarzentar Jan 29, 2025

Choose a reason for hiding this comment

hajdul88 commented Jan 27, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 27, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)