Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(skill): redesign skills and fix code exec env bug #836

Merged
merged 7 commits into from
Dec 29, 2023

Conversation

mspronesti
Copy link
Contributor

@mspronesti mspronesti commented Dec 26, 2023

Hi @gventuri,
this PR aims at making Skill a proper, pythonic type while supporting a (proper) decorator. This new implementation requires (purposely) the function to have a docstring when decorated with @skill. In addition, one can create a skill as follows

skill = Skill.from_function(func=some_func, description=...)

Summary by CodeRabbit

  • Refactor

    • Streamlined the process for adding skills to the system.
    • Enhanced the logic for SQL comparison extraction.
    • Improved the Skill class with new creation methods and properties.
    • Updated the handling of skills within the SmartDataframe class.
  • Bug Fixes

    • Fixed incorrect type hints and method return types.
  • Tests

    • Refined test cases for skill addition and prompt display functionalities.
  • Documentation

    • Added docstrings to clarify the purpose of test functions.

Copy link
Contributor

coderabbitai bot commented Dec 26, 2023

Walkthrough

The updates involve reorganizing import statements, refining method parameters and return types, and enhancing the Skill class with new functionalities. The add_skills method now accepts a single Skill object, and the Skill class has gained a constructor from functions, improved string representation, and a new decorator. Changes in the code manager reflect a more accurate SQL validation logic, while the skills manager shows better integration with the Skill class.

Changes

File Path Change Summary
pandasai/agent/__init__.py, pandasai/smart_datalake/__init__.py Reorganized imports; add_skills accepts single Skill object.
pandasai/helpers/code_manager.py Removed can_direct_sql, renamed _ast_comparator_map, changed _validate_direct_sql return type to bool, updated comparison logic.
pandasai/helpers/skills_manager.py Modified Skill import, updated add_skills type hint, refactored __str__, changed prompt_display return type.
pandasai/skills/__init__.py Enhanced Skill class with from_function, new __str__, name property; added skill decorator.
pandasai/smart_dataframe/__init__.py Updated imports, altered __init__ parameter type, changed add_skills type signature.
tests/skills/test_skills.py Updated Skill imports, replaced Mock with functions and Skill instances, modified test_prompt_display, new test_skill_decorator_test_code, updated test_add_skills_with... methods, added docstring in test_run_prompt_agent and test_code_exec_with_skills.

Assessment against linked issues

Objective Addressed Explanation
Fix NameError in SkillManager parsing logic (#839) The changes made to the skills management and the way Skill is imported and used suggest that the issue would be resolved.

Poem

Code hops through the fields of AI,
Bugs scatter as the updates fly.
A single Skill now joins the fray,
In Panda's land, the smart data play. 🐼🌟

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • You can directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • You can tag CodeRabbit on specific lines of code or entire files in the PR by tagging @coderabbitai in a comment. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • You can tag @coderabbitai in a PR comment and ask questions about the PR and the codebase. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid.
    • @coderabbitai read the files in the src/scheduler package and generate README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@codecov-commenter
Copy link

codecov-commenter commented Dec 26, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (5b2d0ef) 85.11% compared to head (1e07e73) 85.12%.

Files Patch % Lines
pandasai/skills/__init__.py 90.62% 3 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #836   +/-   ##
=======================================
  Coverage   85.11%   85.12%           
=======================================
  Files          88       88           
  Lines        3816     3838   +22     
=======================================
+ Hits         3248     3267   +19     
- Misses        568      571    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gventuri
Copy link
Collaborator

@mspronesti thanks a lot for the PR, great improvement. Can I review it or is it still on draft?

@mspronesti
Copy link
Contributor Author

@gventuri Ready :) I also solved #839.

@mspronesti mspronesti marked this pull request as ready for review December 28, 2023 02:09
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 5b2d0ef and ee70d0c.
Files selected for processing (7)
  • pandasai/agent/init.py (3 hunks)
  • pandasai/helpers/code_manager.py (6 hunks)
  • pandasai/helpers/skills_manager.py (3 hunks)
  • pandasai/skills/init.py (1 hunks)
  • pandasai/smart_dataframe/init.py (4 hunks)
  • pandasai/smart_datalake/init.py (2 hunks)
  • tests/skills/test_skills.py (10 hunks)
Additional comments: 32
pandasai/helpers/skills_manager.py (4)
  • 1-2: The import modifications align with the changes to the Skill class. Ensure that the List type is used correctly for _skills and _used_skills attributes.

  • 17-17: The add_skills method now correctly expects Skill objects as its parameters, which is consistent with the refactor. Ensure that all calls to add_skills throughout the codebase have been updated to pass Skill objects.

  • 67-67: The refactoring of the __str__ method to use list comprehension is a good improvement for readability and performance.

  • 69-69: The prompt_display method now correctly returns Optional[str]. This change should be verified to ensure that all usages of prompt_display handle the possibility of a None return value correctly.

pandasai/skills/__init__.py (4)
  • 6-12: The Skill class is well-defined with appropriate attributes and private attributes for internal use. Ensure that the private attributes _name and _signature are set correctly during instantiation and not just in the from_function class method.

  • 55-60: The __str__ method provides a clear string representation of the Skill. Ensure that the _signature attribute is correctly populated for this representation to be accurate.

  • 62-64: The name property provides access to the private _name attribute. Ensure that this property is used consistently throughout the codebase instead of direct access to the _name attribute.

  • 67-93: The skill decorator is correctly defined to create a Skill object from a function with a docstring. Ensure that the error handling for the decorator is consistent with the expected usage patterns.

pandasai/agent/__init__.py (2)
  • 15-15: The import statement has been updated to reflect the new Skill class. Ensure that the rest of the file uses the Skill class correctly.

  • 52-52: The add_skills method signature now correctly accepts Skill objects. Ensure that all calls to this method pass Skill objects and that the method's implementation handles them correctly.

tests/skills/test_skills.py (10)
  • 12-12: The import statement has been updated to include both skill and Skill. Ensure that the tests use these imports correctly.

  • 82-90: The test_add_skills method has been updated to use the skill decorator and Skill.from_function. Ensure that the tests verify the correct behavior of adding skills and handling duplicate skill names.

  • 163-164: The test_prompt_display method checks for the correct prompt display. Ensure that the test covers scenarios where no skills are present and the method should return None.

  • 170-176: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [173-190]

The test_skill_decorator_test_code method tests the skill decorator. Ensure that the test verifies the correct creation of Skill objects and their string representation.

  • 198-201: The test_add_skills_with_agent method has been updated to reflect the new Skill class usage. Ensure that the test verifies the agent's ability to add and manage skills correctly.

  • 216-219: The test_add_skills_with_smartDataframe method has been updated similarly to the test_add_skills_with_agent method. Verify that the SmartDataframe class interacts correctly with the Skill class.

  • 234-235: The test_run_prompt method includes a string representation of the plot_salaries function. Ensure that the test verifies the correct inclusion of this representation in the prompt.

  • 261-262: The test_run_prompt_agent method is similar to test_run_prompt. Verify that the agent correctly includes the plot_salaries function in the prompt.

  • 294-295: The test_run_prompt_without_skills method checks that no skills are included in the prompt when none are added. Ensure that the test covers this scenario correctly.

  • 314-314: The test_code_exec_with_skills method tests the execution of code with skills. Verify that the test checks for the correct usage of skills during code execution.

pandasai/helpers/code_manager.py (7)
  • 32-37: The introduction of the CodeExecutionContext class with prompt_id and skills_manager as properties is a good encapsulation of the execution context. This should help in maintaining a clean execution state.

  • 53-56: Renaming _ast_comparatos_map to _ast_comparator_map corrects a typo and improves code readability.

  • 121-124: The _validate_direct_sql method has been added to ensure that direct SQL execution is only allowed when certain conditions are met. This is a good security practice to prevent unauthorized database access.

  • 197-203: The addition of _validate_direct_sql in the execute_code method is a good security measure. However, ensure that the method is thoroughly tested, especially since it raises an exception that could affect the execution flow.

Verification successful

The shell script output indicates that the _validate_direct_sql method is indeed called within the tests, specifically in tests/test_codemanager.py and tests/test_smartdatalake.py. This suggests that the method is covered by the test suite, which is a positive sign for the changes made in the PR.

  • The method is directly used in a test in tests/test_codemanager.py.
  • There is also a dedicated test named test_validate_direct_sql_with_connector in tests/test_smartdatalake.py.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the _validate_direct_sql method is covered by tests.
rg --type python "_validate_direct_sql" --files-with-matches | xargs rg "def test_.*_validate_direct_sql"

Length of output: 136



Script:

#!/bin/bash
# Correct the flag for Python files and verify that the _validate_direct_sql method is covered by tests.
rg --type py "_validate_direct_sql" --files-with-matches | xargs rg "def test_.*_validate_direct_sql"

Length of output: 101



Script:

#!/bin/bash
# Search for any tests that call the _validate_direct_sql method.
rg --type py "def test_" --files-with-matches | xargs rg "_validate_direct_sql"

Length of output: 266

* 200-203: Adding skills to the execution environment based on `used_skills` is a dynamic way to extend functionality. It's important to ensure that the skills being added are secure and do not introduce any vulnerabilities.
Verification successful


The script output provides the following information:

  • The file tests/skills/test_skills.py exists and contains several test cases related to skills management, including tests for adding skills, checking if a skill exists, retrieving a skill by function name, and adding used skills. Notably, the test_add_used_skill method directly tests the addition of used skills to the SkillsManager, which is relevant to the original concern about adding skills to the execution environment.
  • The files pandasai/tests/skills_manager.py and pandasai/tests/code_manager.py do not exist. This suggests that the inferred test file paths based on the directory structure were incorrect, or these test files are not present in the repository.

Given the context, we can conclude:

  • The test_skills.py file contains tests that likely cover the security and functionality of the used_skills feature.
  • There is no direct evidence of test files named skills_manager.py or code_manager.py in the pandasai/tests directory, which may indicate a gap in testing or a different file structure for tests.

Based on the available information, the following points can be made:

  • : The test_skills.py file includes tests that cover the functionality related to used_skills, which suggests that the feature is tested.
  • : The non-existence of pandasai/tests/skills_manager.py and pandasai/tests/code_manager.py should be manually verified to ensure that all relevant code paths are tested, especially since the script could not find these files. It's possible that tests for the skills_manager and code_manager functionalities are located in different files or under different naming conventions.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the skills being added to the environment are secure and tested.
rg --type python "used_skills" --files-with-matches | xargs rg "def test_.*_used_skills"

Length of output: 119



Script:

#!/bin/bash
# Verify that the skills being added to the environment are secure and tested.
# Removing the --type python argument to ensure ripgrep searches all files.
rg "used_skills" --files-with-matches | xargs rg "def test_.*_used_skills"

Length of output: 74



Script:

#!/bin/bash
# Search for test files related to the files where `used_skills` is mentioned.
# This will help us identify if there are tests covering the functionality of `used_skills`.
used_skills_files=$(rg "used_skills" --files-with-matches)
echo "Files with 'used_skills':"
echo "$used_skills_files"
echo ""

# For each file that contains `used_skills`, find related test files.
echo "Related test files:"
for file in $used_skills_files; do
  test_file=$(echo $file | sed 's/\/helpers\//\/tests\//')
  echo $test_file
done

Length of output: 1138



Script:

#!/bin/bash
# Verify the existence of the inferred test files and list their contents to check for relevant tests.
test_files=("tests/skills/test_skills.py" "pandasai/tests/skills_manager.py" "pandasai/tests/code_manager.py")

for test_file in "${test_files[@]}"; do
  if [ -f "$test_file" ]; then
    echo "Contents of $test_file:"
    cat "$test_file"
    echo ""
  else
    echo "$test_file does not exist."
  fi
done

Length of output: 11153

* 560-563: The use of the corrected `_ast_comparator_map` in `_extract_comparisons` aligns with the renaming change and ensures consistency in the codebase.
  • 575-578: The update in _extract_comparisons to use the renamed _ast_comparator_map is consistent with the changes made earlier in the class. This maintains the integrity of the comparison extraction logic.
pandasai/smart_dataframe/__init__.py (3)
  • 30-30: The import statement has been modified to reflect the new location of the Skill class. This change seems consistent with the PR's objective to refactor Skill into a proper type.

  • 236-236: The __init__ method's parameter type has been updated to accept both DataFrameType and BaseConnector. Ensure that all usages of SmartDataframe throughout the codebase have been updated to handle the new parameter type correctly.

  • 292-292: The add_skills method now accepts a variable number of Skill objects directly, rather than a list. This change simplifies the addition of skills but requires verification that all calls to add_skills have been updated accordingly.

Verification successful

The search with rg for the string "add_skills" has produced several results across the codebase, including test files and documentation. Here are the findings:

  • Calls to add_skills are found in tests/skills/test_skills.py, which suggests that the test suite has been updated to use the new method signature.
  • The method is also used in pandasai/smart_datalake/__init__.py, pandasai/smart_dataframe/__init__.py, pandasai/helpers/skills_manager.py, and pandasai/agent/__init__.py. These instances indicate that the method is part of the public API and is used within the library itself.
  • There are examples of add_skills being used in examples/skills_example.py and in the documentation files docs/skills.md and docs/examples.md.

Given the results, it appears that the calls to add_skills within the test suite and possibly within the library have been updated to match the new method signature. However, without seeing the exact context of each call, it's not possible to definitively confirm that all calls pass Skill objects correctly according to the new signature.

Since the test suite includes calls to add_skills and the tests are passing (as mentioned in the PR objectives), it's reasonable to infer that the method is being used correctly in those instances. For the other occurrences in the library and documentation, manual verification by the user or the maintainers would be necessary to ensure that all instances conform to the new signature.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for calls to `add_skills` to ensure they are passing `Skill` objects correctly.
ast-grep --lang python --pattern $'add_skills($$$)'

Length of output: 52



Script:

#!/bin/bash
# Search for calls to `add_skills` within the context of the SmartDataframe class.
ast-grep --lang python --pattern $'class SmartDataframe {
  $$$
  def add_skills($$$) {
    $$$
  }
  $$$
  $$$
}'

Length of output: 460



Script:

#!/bin/bash
# Use ripgrep to search for the string "add_skills" to find any textual occurrences of the method call.
rg 'add_skills'

Length of output: 1941

pandasai/smart_datalake/__init__.py (2)
  • 26-26: The import statement has been updated to import Skill instead of skill. This change aligns with the PR objectives to refactor Skill into a proper type.

  • 246-249: The method signature of add_skills has been changed to accept a variable number of Skill objects. Ensure that the implementation of add_skills within SkillsManager is compatible with this change and that it can handle multiple Skill objects being passed as separate arguments.

Verification successful


The add_skills method is indeed present in multiple files within the pandas-ai codebase, including smart_datalake/__init__.py, smart_dataframe/__init__.py, helpers/skills_manager.py, and agent/__init__.py. Each instance of the method accepts a variable number of Skill objects as indicated by the *skills: Skill in the method signature.

Given this information, it can be inferred that the method signature change to accept a variable number of Skill objects has been applied consistently across these files. However, without the method bodies, we cannot confirm whether the implementation has been updated to handle individual Skill objects properly.

To verify the implementation, we would need to inspect the bodies of the add_skills methods in these files. Since the rg command only provided the lines with the method signatures, we should now attempt to retrieve the method bodies.


The method bodies for add_skills have been successfully retrieved from the four files where the method is defined. Here's a summary of the implementation in each file:

  • In pandasai/smart_datalake/__init__.py, the add_skills method delegates the call to an internal _skills object with the same method name.
  • In pandasai/smart_dataframe/__init__.py, the add_skills method delegates the call to a lake object.
  • In pandasai/helpers/skills_manager.py, the add_skills method iterates over the skills variable argument list and performs a check before adding each skill to an internal _skills list.
  • In pandasai/agent/__init__.py, the add_skills method delegates the call to an internal _lake object.

From this information, it can be verified that:

  • The add_skills method is consistently defined to accept a variable number of Skill objects across different modules.
  • The implementations in smart_datalake, smart_dataframe, and agent are straightforward, as they pass the Skill objects to another object's add_skills method.
  • The implementation in skills_manager is more complex, involving a loop and a conditional check to ensure no skill with the same name is added twice.

Given the consistency of the method signatures and the appropriate handling of the variable number of Skill objects in the method bodies, it can be concluded that the implementation of add_skills is compatible with the change in the method signature.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the SkillsManager's add_skills method can handle multiple Skill objects.
ast-grep --lang python --pattern $'class SkillsManager {
  $$$
  def add_skills($_) {
    $$$
  }
  $$$
}'

Length of output: 452



Script:

#!/bin/bash
# Refine the ast-grep pattern to search for the add_skills method within any class, to handle potential variations in class names or method definitions.
ast-grep --lang python --pattern $'class $_ {
  $$$
  def add_skills($$$) {
    $$$
  }
  $$$
}'

Length of output: 442



Script:

#!/bin/bash
# Use ripgrep to search for the add_skills method definition in the codebase.
rg "def add_skills" --python

Length of output: 441



Script:

#!/bin/bash
# Correct the ripgrep command by removing the invalid --python flag.
rg "def add_skills"

Length of output: 324



Script:

#!/bin/bash
# Retrieve the method bodies of the add_skills method from the files identified.
rg -A 10 "def add_skills" pandasai/smart_datalake/__init__.py
rg -A 10 "def add_skills" pandasai/smart_dataframe/__init__.py
rg -A 10 "def add_skills" pandasai/helpers/skills_manager.py
rg -A 10 "def add_skills" pandasai/agent/__init__.py

Length of output: 1527

pandasai/skills/__init__.py Outdated Show resolved Hide resolved
pandasai/skills/__init__.py Outdated Show resolved Hide resolved
@mspronesti mspronesti changed the title feat(skill): make Skill a type feat(skill): redesign skills and fix code exec env bug Dec 28, 2023
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between ee70d0c and e588910.
Files selected for processing (1)
  • pandasai/skills/init.py (1 hunks)
Additional comments: 3
pandasai/skills/__init__.py (3)
  • 48-50: The __call__ method implementation is correct and adheres to the expected behavior of making the Skill instance callable.

  • 52-64: The from_function class method is a clean and Pythonic way to create Skill instances from functions. It correctly uses class method syntax and forwards the received arguments to the class constructor.

  • 66-71: The __str__ method provides a string representation of the Skill instance, including the signature and description. This is useful for debugging and logging purposes.

pandasai/skills/__init__.py Show resolved Hide resolved
pandasai/skills/__init__.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between e588910 and 1e07e73.
Files selected for processing (1)
  • examples/skills_example.py (1 hunks)
Additional comments: 2
examples/skills_example.py (2)
  • 23-29: The docstring for plot_salaries has been updated to reflect the use of matplotlib, which aligns with the changes mentioned in the summary. Ensure that the rest of the codebase that may rely on plot_salaries is aware of this change from streamlit to matplotlib.
Verification successful

The previous script contained an error with the file type specification. Let's correct this and rerun the search to ensure that usages of plot_salaries are consistent with the changes.


The search results indicate that the plot_salaries function is used in tests, examples, and documentation. To ensure consistency with the changes, we need to verify that the documentation and examples reflect the switch from streamlit to matplotlib.


The executed script did not find any mentions of 'streamlit' in relation to 'plot_salaries' within the documentation and examples, suggesting that these references have been updated to be consistent with the use of 'matplotlib'.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for usages of plot_salaries to ensure they are updated to handle matplotlib.
rg --type python 'plot_salaries'

Length of output: 63



Script:

#!/bin/bash
# Search for usages of plot_salaries to ensure they are updated to handle matplotlib.
rg 'plot_salaries'

Length of output: 1426



Script:

#!/bin/bash
# Check the documentation and examples for mentions of streamlit in relation to plot_salaries
rg 'plot_salaries' docs/ examples/ | rg 'streamlit'

Length of output: 49

* 23-29: The `plot_salaries` function is correctly decorated with `@skill`, which is in line with the PR's objective to enhance the `Skill` construct. The function is then added to the agent's skills, which seems appropriate. However, ensure that the `YOUR-API-KEY` placeholder is meant to be replaced by the user and that there is no risk of committing actual API keys to the codebase.

examples/skills_example.py Show resolved Hide resolved
@mspronesti
Copy link
Contributor Author

@gventuri ready to go :)
I've also added the option to specify a custom name for the skill and fixed the skill example (which was not working).

@gventuri gventuri merged commit 75b431a into Sinaptik-AI:main Dec 29, 2023
10 checks passed
@gventuri
Copy link
Collaborator

@mspronesti thanks a lot, great changes (and thanks for fixing the skills, I should probably enforce with more tests to make sure this is more robust and cannot easily be broken)! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NameError: name 'pandasai' is not defined
3 participants