CodeXpert: An AI-powered code analysis tool that uses CodeLlama, FAISS, and HuggingFace to understand, explain, and optimize code. 🚀✨

MohammedNasserAhmed/CodeXpert

CodeExp with CodeLlama & FAISS 🧠

Python 3.9+ CodeLlama FAISS HuggingFace License


Welcome to CodeExp, a framework for analyzing, explaining, and optimizing Python codebases. It combines CodeLlama, LangChain, and FAISS so you can ask questions about a codebase and get explanations and improvement suggestions interactively.


🚀 Purpose

The Code Analysis Pipeline provides an automated solution for:

  • Code Understanding: Analyze Python code for functionality and structure.
  • Knowledge Extraction: Generate clear and actionable insights using LLMs.
  • Code Optimization: Suggest performance improvements and best practices.
  • Technical Education: Simplify complex code concepts for learners and professionals.

🎯 Techniques & Workflow

  1. Document Loading & Splitting:
    • Recursively scans the specified directory for Python files.
    • Splits large files into manageable chunks for efficient processing.
  2. Semantic Embedding Generation:
    • Generates an embedding for each chunk with a HuggingFace embedding model.
  3. Vector Store Creation:
    • Builds a FAISS vector store for semantic search and retrieval.
  4. Question Answering (QA):
    • Answers user queries with a retrieval-based QA chain that fetches the most relevant chunks from the vector store.
  5. Code Analysis & Explanation:
    • Uses CodeLlama to analyze the retrieved code and prompt templates to produce simplified explanations.
  6. Improvement Suggestions:
    • Leverages LLMs to suggest actionable optimizations.
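Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the function names `load_documents` and `split_text` are hypothetical and are not taken from `load_document.py` or `split_text.py`, and the splitter is a simple fixed-size character window with overlap rather than whatever splitter the repository actually configures.

```python
import os
from pathlib import Path


def load_documents(root_dir: str, extensions: tuple = (".py",)) -> list:
    """Recursively collect (path, text) pairs for files with the given extensions.

    Hypothetical helper; os.walk silently yields nothing for a missing directory.
    """
    documents = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if name.endswith(extensions):  # str.endswith accepts a tuple of suffixes
                path = Path(dirpath) / name
                text = path.read_text(encoding="utf-8", errors="replace")
                documents.append((str(path), text))
    return documents


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a statement that straddles a chunk boundary visible in both neighbouring chunks, and keeping the source path next to each file's text lets retrieved chunks point back to where they came from.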

✨ Features

  • 📂 Recursive Document Loading: Processes entire directories with customizable file extensions.
  • ✂️ Text Splitting: Splits large files into smaller chunks for precise embeddings.
  • 🧠 Advanced Embedding Models: Uses HuggingFace embedding models for high-quality vector representations.
  • 🔍 Efficient Retrieval: Semantic search powered by FAISS.
  • 🦙 LLM-Powered Analysis: Code analysis and explanations via CodeLlama.
  • 📈 Optimization Suggestions: Provides practical tips for code improvements.
  • 🔗 Seamless Integration: Designed to integrate with other AI tools and pipelines.
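To make the embedding and retrieval features concrete, here is a toy, dependency-free version of the idea. It swaps in a bag-of-tokens term-count vector for the HuggingFace embedding model and brute-force cosine similarity for FAISS; the names `embed`, `cosine`, and `TinyVectorStore` are hypothetical. The project itself uses neural embeddings and a FAISS index, but the flow is the same: embed every chunk once, embed the query, and return the k most similar chunks.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts identifier-like tokens."""
    return Counter(re.findall(r"[A-Za-z_]\w*", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())  # missing keys read as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Brute-force similarity search, playing the role FAISS plays in the pipeline."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.vectors = [embed(chunk) for chunk in self.chunks]  # embed each chunk once

    def search(self, query: str, k: int = 2) -> list:
        """Return the k chunks most similar to the query, best first."""
        q = embed(query)
        scored = sorted(
            ((cosine(q, vector), chunk) for vector, chunk in zip(self.vectors, self.chunks)),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [chunk for _score, chunk in scored[:k]]
```

A linear scan like this is fine for a handful of chunks; FAISS provides optimized exact and approximate indexes for the same nearest-neighbour search, which matters once a codebase yields many thousands of chunks.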

🛠 Technologies

Technology        | Purpose
------------------|--------------------------------------------------------
LangChain         | Modular framework for building LLM-based workflows.
FAISS             | Vector similarity search for efficient code retrieval.
CodeLlama         | Advanced code understanding via LLMs.
HuggingFace Hub   | Hosting and serving LLMs and embeddings.
Python            | Primary programming language.

📋 Getting Started

1️⃣ Clone the Repository

git clone https://github.com/MohammedNasserAhmed/CodeExp.git
cd CodeExp

2️⃣ Install Dependencies

Install required libraries with:

pip install -r requirements.txt

3️⃣ Set Environment Variables

Create a .env file or export these variables directly:

MODEL=<YOUR_LLAMA_MODEL_VERSION>
HUGGINGFACEHUB_API_TOKEN=<Your_HuggingFace_Token>
REPO_ID=<Your_HuggingFace_Repo_ID>
CODEBASE_DIR=<Path_to_Your_Codebase>
EMBEDDING_MODEL=<HuggingFace_Embedding_Model>
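A settings loader for these variables might look like the sketch below. It is hypothetical: the README says `config/constants.py` holds API tokens and file paths but does not show how it reads them. This version uses only the standard library and fails fast with a clear message when a variable is missing; it does not parse a `.env` file itself (a package such as python-dotenv is commonly used for that).

```python
import os
from dataclasses import dataclass

# The five variables listed above, in the order Settings expects them.
REQUIRED_VARS = (
    "MODEL",
    "HUGGINGFACEHUB_API_TOKEN",
    "REPO_ID",
    "CODEBASE_DIR",
    "EMBEDDING_MODEL",
)


@dataclass(frozen=True)
class Settings:
    model: str
    huggingfacehub_api_token: str
    repo_id: str
    codebase_dir: str
    embedding_model: str


def load_settings(env=None) -> Settings:
    """Read pipeline settings from a mapping (os.environ by default).

    Raises RuntimeError listing every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(*(env[name] for name in REQUIRED_VARS))
```

Accepting an optional mapping keeps the function easy to test without touching the real environment.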

4️⃣ Run the Pipeline

python app.py

5️⃣ Interact with the Agent

Provide a query like:

How can I replace FAISS with Chroma?

🌟 Pipeline Architecture

+--------------------+       +--------------------+       +----------------------+
| Document Loader    |-----> | Text Splitter      |-----> | Embedding Generator  |
+--------------------+       +--------------------+       +----------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | FAISS Vector Store                          |
                                   +---------------------------------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | Retrieval-Based QA Chain                    |
                                   +---------------------------------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | CodeLlama Agent for Analysis & Explanations |
                                   +---------------------------------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | Suggestions for Code Improvement            |
                                   +---------------------------------------------+
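The flow in the diagram can be expressed as a small composition of stages. In this sketch the retriever and the LLM are passed in as plain callables, so the same wiring could sit in front of a FAISS retriever and a CodeLlama endpoint or in front of fakes in tests. The names `run_pipeline`, `QA_TEMPLATE`, and `IMPROVE_TEMPLATE`, and the prompt wording, are assumptions rather than the project's actual prompts.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # any function that maps a prompt to a completion

# Hypothetical prompt templates; the project's real templates may differ.
QA_TEMPLATE = (
    "Use the following code snippets to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\nAnswer:"
)

IMPROVE_TEMPLATE = (
    "Here is an explanation of some code:\n{analysis}\n\n"
    "Suggest concrete improvements to this code."
)


def run_pipeline(question: str, retrieve: Callable[[str], List[str]], llm: LLM) -> dict:
    """Retrieval -> analysis -> improvement suggestions, mirroring the diagram above."""
    chunks = retrieve(question)                   # vector-store lookup
    context = "\n\n---\n\n".join(chunks)          # "stuff" the chunks into one prompt
    analysis = llm(QA_TEMPLATE.format(context=context, question=question))
    suggestions = llm(IMPROVE_TEMPLATE.format(analysis=analysis))
    return {"sources": chunks, "analysis": analysis, "suggestions": suggestions}
```

Returning the retrieved sources alongside the answers makes it easy to show which files an explanation was based on.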

📂 Project Structure

CodeExp/
│
├── codeexp/
│   ├── components/
│   │   ├── load_document.py           # Handles document loading from the codebase
│   │   ├── split_text.py              # Splits documents into manageable chunks
│   │   ├── get_embeddings.py          # Generates embeddings using HuggingFace models
│   │   ├── codellama_agent.py         # Code analysis agent powered by Llama models
│   │   ├── vector_store.py            # Manages FAISS vector store initialization
│   │   └── llm_agent.py               # Handles LLM setup and question-answering
│   │
│   └── config/
│       └── constants.py               # Contains configurations like API tokens and file paths
│
├── tests/                             # Contains unit tests for all components
│   ├── test_load_document.py          # Tests for the document loader
│   ├── test_split_text.py             # Tests for the text splitter
│   ├── test_get_embeddings.py         # Tests for the embedding generator
│   ├── test_codellama_agent.py        # Tests for the CodeLlama agent
│   ├── test_initialize_vector_store.py # Tests for the FAISS vector store
│   └── test_llm_agent.py              # Tests for the LLM setup and QA chain
│
├── .gitignore                         # Specifies files and folders to ignore in version control
├── requirements.txt                   # Dependencies required for the project
└── README.md                          # Project documentation (you are here!)

🔍 Run Tests

To verify the functionality of the components, run pytest from the repository root:

Run all tests:

pytest tests/

Run tests with detailed output:

pytest -v tests/

Run tests for a specific component:

pytest tests/test_<component_name>.py

Generate a coverage report (requires pytest-cov):

pip install pytest-cov
pytest --cov=codeexp tests/
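Components that call a hosted model are easiest to test with the model replaced by a stand-in. The sketch below uses `unittest.mock.Mock` in place of the CodeLlama endpoint so the test runs without network access or an API token; `answer_question` is a hypothetical helper, not a function from `llm_agent.py`. pytest collects `test_*` functions from `test_*.py` files by default, so a test like this could live in the `tests/` directory.

```python
from unittest.mock import Mock


def answer_question(llm, question: str, context: str) -> str:
    """Hypothetical QA helper: put retrieved context into a prompt and query the LLM."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt).strip()


def test_answer_question_uses_context():
    # The Mock stands in for the CodeLlama endpoint and returns a canned reply.
    fake_llm = Mock(return_value="  It splits text.  ")
    result = answer_question(fake_llm, "What does split_text do?", "def split_text(): ...")
    assert result == "It splits text."
    fake_llm.assert_called_once()
    # Check that the retrieved context actually reached the prompt.
    prompt = fake_llm.call_args.args[0]
    assert "def split_text(): ..." in prompt
```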

🎓 Use Cases

  • Developers: Enhance understanding of complex codebases.
  • Educators: Provide clear code explanations for learners.
  • Researchers: Analyze algorithmic code for optimization.
  • Organizations: Maintain clean, optimized, and well-documented repositories.

🛑 Best Practices

  • File Types: Ensure the target codebase contains supported extensions (e.g., .py).
  • Environment Setup: Use a virtual environment to isolate dependencies.
  • Model Performance: Adjust embedding and LLM parameters for optimal results.

🤝 Contributing

We welcome contributions! If you'd like to improve the pipeline, please:

  1. Fork this repository.
  2. Create a new branch for your feature or fix.
  3. Submit a pull request with a detailed description.

🔧 Project Maintenance

Key Maintainer


📜 License

This project is licensed under the Apache License. See the LICENSE file for details.


🌐 Contact

Feel free to reach out for questions or feedback:


🏆 Acknowledgments

Special thanks to:

  • HuggingFace for hosting world-class AI models.
  • LangChain for simplifying LLM workflows.
  • FAISS for fast and efficient retrieval.

🚀 Ready to revolutionize code analysis? Dive in today and supercharge your development process! 🦾
