-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(skill): redesign skills and fix code exec env bug #836
Conversation
WalkthroughThe updates involve reorganizing import statements, refining method parameters and return types, and enhancing the Changes
Assessment against linked issues
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChat with CodeRabbit Bot (
|
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #836 +/- ##
=======================================
Coverage 85.11% 85.12%
=======================================
Files 88 88
Lines 3816 3838 +22
=======================================
+ Hits 3248 3267 +19
- Misses 568 571 +3 ☔ View full report in Codecov by Sentry. |
@mspronesti thanks a lot for the PR, great improvement. Can I review it or is it still on draft? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (7)
- pandasai/agent/init.py (3 hunks)
- pandasai/helpers/code_manager.py (6 hunks)
- pandasai/helpers/skills_manager.py (3 hunks)
- pandasai/skills/init.py (1 hunks)
- pandasai/smart_dataframe/init.py (4 hunks)
- pandasai/smart_datalake/init.py (2 hunks)
- tests/skills/test_skills.py (10 hunks)
Additional comments: 32
pandasai/helpers/skills_manager.py (4)
1-2: The import modifications align with the changes to the
Skill
class. Ensure that theList
type is used correctly for_skills
and_used_skills
attributes.17-17: The
add_skills
method now correctly expectsSkill
objects as its parameters, which is consistent with the refactor. Ensure that all calls toadd_skills
throughout the codebase have been updated to passSkill
objects.67-67: The refactoring of the
__str__
method to use list comprehension is a good improvement for readability and performance.69-69: The
prompt_display
method now correctly returnsOptional[str]
. This change should be verified to ensure that all usages ofprompt_display
handle the possibility of aNone
return value correctly.pandasai/skills/__init__.py (4)
6-12: The
Skill
class is well-defined with appropriate attributes and private attributes for internal use. Ensure that the private attributes_name
and_signature
are set correctly during instantiation and not just in thefrom_function
class method.55-60: The
__str__
method provides a clear string representation of theSkill
. Ensure that the_signature
attribute is correctly populated for this representation to be accurate.62-64: The
name
property provides access to the private_name
attribute. Ensure that this property is used consistently throughout the codebase instead of direct access to the_name
attribute.67-93: The
skill
decorator is correctly defined to create aSkill
object from a function with a docstring. Ensure that the error handling for the decorator is consistent with the expected usage patterns.pandasai/agent/__init__.py (2)
15-15: The import statement has been updated to reflect the new
Skill
class. Ensure that the rest of the file uses theSkill
class correctly.52-52: The
add_skills
method signature now correctly acceptsSkill
objects. Ensure that all calls to this method passSkill
objects and that the method's implementation handles them correctly.tests/skills/test_skills.py (10)
12-12: The import statement has been updated to include both
skill
andSkill
. Ensure that the tests use these imports correctly.82-90: The
test_add_skills
method has been updated to use theskill
decorator andSkill.from_function
. Ensure that the tests verify the correct behavior of adding skills and handling duplicate skill names.163-164: The
test_prompt_display
method checks for the correct prompt display. Ensure that the test covers scenarios where no skills are present and the method should returnNone
.170-176: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [173-190]
The
test_skill_decorator_test_code
method tests theskill
decorator. Ensure that the test verifies the correct creation ofSkill
objects and their string representation.
198-201: The
test_add_skills_with_agent
method has been updated to reflect the newSkill
class usage. Ensure that the test verifies the agent's ability to add and manage skills correctly.216-219: The
test_add_skills_with_smartDataframe
method has been updated similarly to thetest_add_skills_with_agent
method. Verify that theSmartDataframe
class interacts correctly with theSkill
class.234-235: The
test_run_prompt
method includes a string representation of theplot_salaries
function. Ensure that the test verifies the correct inclusion of this representation in the prompt.261-262: The
test_run_prompt_agent
method is similar totest_run_prompt
. Verify that the agent correctly includes theplot_salaries
function in the prompt.294-295: The
test_run_prompt_without_skills
method checks that no skills are included in the prompt when none are added. Ensure that the test covers this scenario correctly.314-314: The
test_code_exec_with_skills
method tests the execution of code with skills. Verify that the test checks for the correct usage of skills during code execution.pandasai/helpers/code_manager.py (7)
32-37: The introduction of the
CodeExecutionContext
class withprompt_id
andskills_manager
as properties is a good encapsulation of the execution context. This should help in maintaining a clean execution state.53-56: Renaming
_ast_comparatos_map
to_ast_comparator_map
corrects a typo and improves code readability.121-124: The
_validate_direct_sql
method has been added to ensure that direct SQL execution is only allowed when certain conditions are met. This is a good security practice to prevent unauthorized database access.197-203: The addition of
_validate_direct_sql
in theexecute_code
method is a good security measure. However, ensure that the method is thoroughly tested, especially since it raises an exception that could affect the execution flow.Verification successful
The shell script output indicates that the
_validate_direct_sql
method is indeed called within the tests, specifically intests/test_codemanager.py
andtests/test_smartdatalake.py
. This suggests that the method is covered by the test suite, which is a positive sign for the changes made in the PR.
- The method is directly used in a test in
tests/test_codemanager.py
.- There is also a dedicated test named
test_validate_direct_sql_with_connector
intests/test_smartdatalake.py
.* 200-203: Adding skills to the execution environment based on `used_skills` is a dynamic way to extend functionality. It's important to ensure that the skills being added are secure and do not introduce any vulnerabilities.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the _validate_direct_sql method is covered by tests. rg --type python "_validate_direct_sql" --files-with-matches | xargs rg "def test_.*_validate_direct_sql"Length of output: 136
Script:
#!/bin/bash # Correct the flag for Python files and verify that the _validate_direct_sql method is covered by tests. rg --type py "_validate_direct_sql" --files-with-matches | xargs rg "def test_.*_validate_direct_sql"Length of output: 101
Script:
#!/bin/bash # Search for any tests that call the _validate_direct_sql method. rg --type py "def test_" --files-with-matches | xargs rg "_validate_direct_sql"Length of output: 266
Verification successful
The script output provides the following information:
- The file
tests/skills/test_skills.py
exists and contains several test cases related to skills management, including tests for adding skills, checking if a skill exists, retrieving a skill by function name, and adding used skills. Notably, thetest_add_used_skill
method directly tests the addition of used skills to theSkillsManager
, which is relevant to the original concern about adding skills to the execution environment.- The files
pandasai/tests/skills_manager.py
andpandasai/tests/code_manager.py
do not exist. This suggests that the inferred test file paths based on the directory structure were incorrect, or these test files are not present in the repository.Given the context, we can conclude:
- The
test_skills.py
file contains tests that likely cover the security and functionality of theused_skills
feature.- There is no direct evidence of test files named
skills_manager.py
orcode_manager.py
in thepandasai/tests
directory, which may indicate a gap in testing or a different file structure for tests.Based on the available information, the following points can be made:
- : The
test_skills.py
file includes tests that cover the functionality related toused_skills
, which suggests that the feature is tested.- : The non-existence of
pandasai/tests/skills_manager.py
andpandasai/tests/code_manager.py
should be manually verified to ensure that all relevant code paths are tested, especially since the script could not find these files. It's possible that tests for theskills_manager
andcode_manager
functionalities are located in different files or under different naming conventions.* 560-563: The use of the corrected `_ast_comparator_map` in `_extract_comparisons` aligns with the renaming change and ensures consistency in the codebase.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the skills being added to the environment are secure and tested. rg --type python "used_skills" --files-with-matches | xargs rg "def test_.*_used_skills"Length of output: 119
Script:
#!/bin/bash # Verify that the skills being added to the environment are secure and tested. # Removing the --type python argument to ensure ripgrep searches all files. rg "used_skills" --files-with-matches | xargs rg "def test_.*_used_skills"Length of output: 74
Script:
#!/bin/bash # Search for test files related to the files where `used_skills` is mentioned. # This will help us identify if there are tests covering the functionality of `used_skills`. used_skills_files=$(rg "used_skills" --files-with-matches) echo "Files with 'used_skills':" echo "$used_skills_files" echo "" # For each file that contains `used_skills`, find related test files. echo "Related test files:" for file in $used_skills_files; do test_file=$(echo $file | sed 's/\/helpers\//\/tests\//') echo $test_file doneLength of output: 1138
Script:
#!/bin/bash # Verify the existence of the inferred test files and list their contents to check for relevant tests. test_files=("tests/skills/test_skills.py" "pandasai/tests/skills_manager.py" "pandasai/tests/code_manager.py") for test_file in "${test_files[@]}"; do if [ -f "$test_file" ]; then echo "Contents of $test_file:" cat "$test_file" echo "" else echo "$test_file does not exist." fi doneLength of output: 11153
- 575-578: The update in
_extract_comparisons
to use the renamed_ast_comparator_map
is consistent with the changes made earlier in the class. This maintains the integrity of the comparison extraction logic.pandasai/smart_dataframe/__init__.py (3)
30-30: The import statement has been modified to reflect the new location of the
Skill
class. This change seems consistent with the PR's objective to refactorSkill
into a proper type.236-236: The
__init__
method's parameter type has been updated to accept bothDataFrameType
andBaseConnector
. Ensure that all usages ofSmartDataframe
throughout the codebase have been updated to handle the new parameter type correctly.292-292: The
add_skills
method now accepts a variable number ofSkill
objects directly, rather than a list. This change simplifies the addition of skills but requires verification that all calls toadd_skills
have been updated accordingly.Verification successful
The search with
rg
for the string "add_skills" has produced several results across the codebase, including test files and documentation. Here are the findings:
- Calls to
add_skills
are found intests/skills/test_skills.py
, which suggests that the test suite has been updated to use the new method signature.- The method is also used in
pandasai/smart_datalake/__init__.py
,pandasai/smart_dataframe/__init__.py
,pandasai/helpers/skills_manager.py
, andpandasai/agent/__init__.py
. These instances indicate that the method is part of the public API and is used within the library itself.- There are examples of
add_skills
being used inexamples/skills_example.py
and in the documentation filesdocs/skills.md
anddocs/examples.md
.Given the results, it appears that the calls to
add_skills
within the test suite and possibly within the library have been updated to match the new method signature. However, without seeing the exact context of each call, it's not possible to definitively confirm that all calls passSkill
objects correctly according to the new signature.Since the test suite includes calls to
add_skills
and the tests are passing (as mentioned in the PR objectives), it's reasonable to infer that the method is being used correctly in those instances. For the other occurrences in the library and documentation, manual verification by the user or the maintainers would be necessary to ensure that all instances conform to the new signature.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for calls to `add_skills` to ensure they are passing `Skill` objects correctly. ast-grep --lang python --pattern $'add_skills($$$)'Length of output: 52
Script:
#!/bin/bash # Search for calls to `add_skills` within the context of the SmartDataframe class. ast-grep --lang python --pattern $'class SmartDataframe { $$$ def add_skills($$$) { $$$ } $$$ $$$ }'Length of output: 460
Script:
#!/bin/bash # Use ripgrep to search for the string "add_skills" to find any textual occurrences of the method call. rg 'add_skills'Length of output: 1941
pandasai/smart_datalake/__init__.py (2)
26-26: The import statement has been updated to import
Skill
instead ofskill
. This change aligns with the PR objectives to refactorSkill
into a proper type.246-249: The method signature of
add_skills
has been changed to accept a variable number ofSkill
objects. Ensure that the implementation ofadd_skills
withinSkillsManager
is compatible with this change and that it can handle multipleSkill
objects being passed as separate arguments.Verification successful
The
add_skills
method is indeed present in multiple files within thepandas-ai
codebase, includingsmart_datalake/__init__.py
,smart_dataframe/__init__.py
,helpers/skills_manager.py
, andagent/__init__.py
. Each instance of the method accepts a variable number ofSkill
objects as indicated by the*skills: Skill
in the method signature.Given this information, it can be inferred that the method signature change to accept a variable number of
Skill
objects has been applied consistently across these files. However, without the method bodies, we cannot confirm whether the implementation has been updated to handle individualSkill
objects properly.To verify the implementation, we would need to inspect the bodies of the
add_skills
methods in these files. Since therg
command only provided the lines with the method signatures, we should now attempt to retrieve the method bodies.
The method bodies for
add_skills
have been successfully retrieved from the four files where the method is defined. Here's a summary of the implementation in each file:
- In
pandasai/smart_datalake/__init__.py
, theadd_skills
method delegates the call to an internal_skills
object with the same method name.- In
pandasai/smart_dataframe/__init__.py
, theadd_skills
method delegates the call to alake
object.- In
pandasai/helpers/skills_manager.py
, theadd_skills
method iterates over theskills
variable argument list and performs a check before adding each skill to an internal_skills
list.- In
pandasai/agent/__init__.py
, theadd_skills
method delegates the call to an internal_lake
object.From this information, it can be verified that:
- The
add_skills
method is consistently defined to accept a variable number ofSkill
objects across different modules.- The implementations in
smart_datalake
,smart_dataframe
, andagent
are straightforward, as they pass theSkill
objects to another object'sadd_skills
method.- The implementation in
skills_manager
is more complex, involving a loop and a conditional check to ensure no skill with the same name is added twice.Given the consistency of the method signatures and the appropriate handling of the variable number of
Skill
objects in the method bodies, it can be concluded that the implementation ofadd_skills
is compatible with the change in the method signature.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the SkillsManager's add_skills method can handle multiple Skill objects. ast-grep --lang python --pattern $'class SkillsManager { $$$ def add_skills($_) { $$$ } $$$ }'Length of output: 452
Script:
#!/bin/bash # Refine the ast-grep pattern to search for the add_skills method within any class, to handle potential variations in class names or method definitions. ast-grep --lang python --pattern $'class $_ { $$$ def add_skills($$$) { $$$ } $$$ }'Length of output: 442
Script:
#!/bin/bash # Use ripgrep to search for the add_skills method definition in the codebase. rg "def add_skills" --pythonLength of output: 441
Script:
#!/bin/bash # Correct the ripgrep command by removing the invalid --python flag. rg "def add_skills"Length of output: 324
Script:
#!/bin/bash # Retrieve the method bodies of the add_skills method from the files identified. rg -A 10 "def add_skills" pandasai/smart_datalake/__init__.py rg -A 10 "def add_skills" pandasai/smart_dataframe/__init__.py rg -A 10 "def add_skills" pandasai/helpers/skills_manager.py rg -A 10 "def add_skills" pandasai/agent/__init__.pyLength of output: 1527
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- pandasai/skills/init.py (1 hunks)
Additional comments: 3
pandasai/skills/__init__.py (3)
48-50: The
__call__
method implementation is correct and adheres to the expected behavior of making theSkill
instance callable.52-64: The
from_function
class method is a clean and Pythonic way to createSkill
instances from functions. It correctly uses class method syntax and forwards the received arguments to the class constructor.66-71: The
__str__
method provides a string representation of theSkill
instance, including the signature and description. This is useful for debugging and logging purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- examples/skills_example.py (1 hunks)
Additional comments: 2
examples/skills_example.py (2)
- 23-29: The docstring for
plot_salaries
has been updated to reflect the use of matplotlib, which aligns with the changes mentioned in the summary. Ensure that the rest of the codebase that may rely onplot_salaries
is aware of this change fromstreamlit
tomatplotlib
.Verification successful
The previous script contained an error with the file type specification. Let's correct this and rerun the search to ensure that usages of
plot_salaries
are consistent with the changes.
The search results indicate that the
plot_salaries
function is used in tests, examples, and documentation. To ensure consistency with the changes, we need to verify that the documentation and examples reflect the switch fromstreamlit
tomatplotlib
.
The executed script did not find any mentions of 'streamlit' in relation to 'plot_salaries' within the documentation and examples, suggesting that these references have been updated to be consistent with the use of 'matplotlib'.
* 23-29: The `plot_salaries` function is correctly decorated with `@skill`, which is in line with the PR's objective to enhance the `Skill` construct. The function is then added to the agent's skills, which seems appropriate. However, ensure that the `YOUR-API-KEY` placeholder is meant to be replaced by the user and that there is no risk of committing actual API keys to the codebase.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for usages of plot_salaries to ensure they are updated to handle matplotlib. rg --type python 'plot_salaries'Length of output: 63
Script:
#!/bin/bash # Search for usages of plot_salaries to ensure they are updated to handle matplotlib. rg 'plot_salaries'Length of output: 1426
Script:
#!/bin/bash # Check the documentation and examples for mentions of streamlit in relation to plot_salaries rg 'plot_salaries' docs/ examples/ | rg 'streamlit'Length of output: 49
@gventuri ready to go :) |
@mspronesti thanks a lot, great changes (and thanks for fixing the skills, I should probably enforce with more tests to make sure this is more robust and cannot easily be broken)! 🚀 |
Hi @gventuri,
this PR aims at making
Skill
a proper, pythonic type while supporting a (proper) decorator. This new implementation requires (purposely) the function to have a docstring when decorated with@skill
. In addition, one can create a skill as followsSummary by CodeRabbit
Refactor
Skill
class with new creation methods and properties.Bug Fixes
Tests
Documentation