CodeExp with CodeLlama & FAISS 🧠

Welcome to CodeExp, a framework for analyzing, explaining, and optimizing Python codebases. It combines CodeLlama, LangChain, and FAISS so you can explore a codebase interactively: ask questions about it, get plain-language explanations, and receive improvement suggestions.

🚀 Purpose

CodeExp provides an automated solution for:

Code Understanding: Analyze Python code for functionality and structure.
Knowledge Extraction: Generate clear and actionable insights using LLMs.
Code Optimization: Suggest performance improvements and best practices.
Technical Education: Simplify complex code concepts for learners and professionals.

🎯 Techniques & Workflow

Document Loading & Splitting:
- Recursively scans the specified directory for Python files.
- Splits large files into manageable chunks for efficient processing.
Semantic Embedding Generation:
- Generates embeddings for each chunk using a HuggingFace embedding model.
Vector Store Creation:
- Builds a FAISS vector store for semantic search and retrieval.
Question Answering (QA):
- Processes user queries through a QA Chain with a retriever.
Code Analysis & Explanation:
- Analyzes results using CodeLlama and simplifies explanations with templates.
Improvement Suggestions:
- Leverages LLMs to suggest actionable optimizations.
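The loading and splitting steps can be sketched with the standard library alone. This is an illustration under assumptions, not code taken from `codeexp/components/load_document.py` or `split_text.py` (the repository may rely on LangChain utilities instead): the function names `load_source_files` and `split_into_chunks` and the default `chunk_size`/`chunk_overlap` values are hypothetical. Overlapping windows keep code that straddles a chunk boundary partly visible in both neighbouring chunks.

```python
from pathlib import Path


def load_source_files(root: str, extensions: tuple[str, ...] = (".py",)) -> dict[str, str]:
    """Hypothetical loader: recursively read every file under `root` whose
    suffix is in `extensions`, returning {relative_path: file_text}."""
    root_path = Path(root)
    documents = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            relative = str(path.relative_to(root_path))
            documents[relative] = path.read_text(encoding="utf-8", errors="replace")
    return documents


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical splitter: fixed-size character windows that overlap by
    `chunk_overlap` characters (default sizes are assumptions)."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
    # prints ['abcd', 'defg', 'ghij']
```

Larger chunks give the embedding model more context per vector; smaller ones make retrieval more precise. The overlap trades a little duplicated text for fewer functions cut in half.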

✨ Features

📂 Recursive Document Loading: Processes entire directories with customizable file extensions.
✂️ Text Splitting: Splits large files into smaller chunks for precise embeddings.
🧠 Advanced Embedding Models: Uses HuggingFace's embeddings for high-quality vector representations.
🔍 Efficient Retrieval: Semantic search powered by FAISS.
🦙 LLM-Powered Analysis: Code analysis and explanations via CodeLlama.
📈 Optimization Suggestions: Provides practical tips for code improvements.
🔗 Seamless Integration: Designed to integrate with other AI tools and pipelines.

🛠 Technologies

Technology	Purpose
LangChain	Modular framework for building LLM-based workflows.
FAISS	Vector similarity search for efficient code retrieval.
CodeLlama	Advanced code understanding via LLMs.
HuggingFace Hub	Hosting and serving LLMs and embeddings.
Python	Primary programming language.

📋 Getting Started

1️⃣ Clone the Repository

git clone https://github.com/MohammedNasserAhmed/CodeExp.git
cd CodeExp

2️⃣ Install Dependencies

Install required libraries with:

pip install -r requirements.txt

3️⃣ Set Environment Variables

Create a .env file or export these variables directly:

MODEL=<YOUR_LLAMA_MODEL_VERSION>
HUGGINGFACEHUB_API_TOKEN=<Your_HuggingFace_Token>
REPO_ID=<Your_HuggingFace_Repo_ID>
CODEBASE_DIR=<Path_to_Your_Codebase>
EMBEDDING_MODEL=<HuggingFace_Embedding_Model>
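A minimal sketch of how these variables could be read and validated at startup so the pipeline fails fast when one is missing. The `Settings` dataclass and `load_settings` function are hypothetical and are not the contents of `codeexp/config/constants.py`. To populate the environment from a `.env` file first, a package such as python-dotenv (`from dotenv import load_dotenv`) can be used; that dependency is an assumption and may not be listed in `requirements.txt`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Variable names as listed in this README.
REQUIRED_VARS = (
    "MODEL",
    "HUGGINGFACEHUB_API_TOKEN",
    "REPO_ID",
    "CODEBASE_DIR",
    "EMBEDDING_MODEL",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the pipeline's environment configuration."""
    model: str
    huggingface_token: str
    repo_id: str
    codebase_dir: str
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required variables from `env`, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        model=env["MODEL"],
        huggingface_token=env["HUGGINGFACEHUB_API_TOKEN"],
        repo_id=env["REPO_ID"],
        codebase_dir=env["CODEBASE_DIR"],
        embedding_model=env["EMBEDDING_MODEL"],
    )
```

Passing a plain dictionary as `env` makes the function easy to test without touching the real environment. Keep the `.env` file holding your token out of version control.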

4️⃣ Run the Pipeline

python app.py

5️⃣ Interact with the Agent

Provide a query like:

How do I replace FAISS with Chroma?

🌟 Pipeline Architecture

+--------------------+       +--------------------+       +----------------------+
| Document Loader    |-----> | Text Splitter      |-----> | Embedding Generator  |
+--------------------+       +--------------------+       +----------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | FAISS Vector Store                          |
                                   +---------------------------------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | Retrieval-Based QA Chain                    |
                                   +---------------------------------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | CodeLlama Agent for Analysis & Explanations |
                                   +---------------------------------------------+
                                                                     |
                                                                     v
                                   +---------------------------------------------+
                                   | Suggestions for Code Improvement            |
                                   +---------------------------------------------+
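To make the middle of the diagram concrete, here is a stdlib-only sketch of the retrieve-then-prompt step. It is not the project's implementation: a toy bag-of-words vector stands in for the HuggingFace embedding model, an exhaustive cosine-similarity scan stands in for the FAISS index (conceptually what a flat inner-product index does over normalized vectors), and the names `embed`, `retrieve`, and `build_prompt` plus the prompt wording are hypothetical. The resulting prompt is what would be passed to CodeLlama.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding (stand-in for a real model): L2-normalised bag of
    identifier-like tokens, stored as a sparse {token: weight} dict."""
    counts = Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two normalised sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (exhaustive search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a hypothetical QA prompt from the retrieved chunks."""
    joined = "\n\n---\n\n".join(context)
    return (
        "You are a code assistant. Use only the code below to answer.\n\n"
        f"Code:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "def build_faiss_index(vectors): return faiss.IndexFlatL2(len(vectors[0]))",
        "def split_text(text, size): return [text[i:i + size] for i in range(0, len(text), size)]",
        "def load_documents(path): return sorted(Path(path).rglob('*.py'))",
    ]
    top = retrieve("where is the faiss index built?", chunks, k=1)
    print(build_prompt("Where is the FAISS index built?", top))
```

A real index avoids the linear scan shown here, which is the reason for using FAISS once the codebase grows beyond a handful of chunks.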

📂 Project Structure

CodeExp/
│
├── codeexp/
│   ├── components/
│   │   ├── load_document.py           # Handles document loading from the codebase
│   │   ├── split_text.py              # Splits documents into manageable chunks
│   │   ├── get_embeddings.py          # Generates embeddings using HuggingFace models
│   │   ├── codellama_agent.py         # Code analysis agent powered by Llama models
│   │   ├── vector_store.py            # Manages FAISS vector store initialization
│   │   └── llm_agent.py               # Handles LLM setup and question-answering
│   │
│   └── config/
│       └── constants.py               # Contains configurations like API tokens and file paths
│
├── tests/                             # Contains unit tests for all components
│   ├── test_load_document.py          # Tests for the document loader
│   ├── test_split_text.py             # Tests for the text splitter
│   ├── test_get_embeddings.py         # Tests for the embedding generator
│   ├── test_codellama_agent.py        # Tests for the CodeLlama agent
│   ├── test_initialize_vector_store.py # Tests for the FAISS vector store
│   └── test_llm_agent.py              # Tests for the LLM setup and QA chain
│
├── .gitignore                         # Specifies files and folders to ignore in version control
├── requirements.txt                   # Dependencies required for the project
└── README.md                          # Project documentation (you are here!)

🔍 Run Tests

To verify the functionality of the components, run pytest from the repository root:

Run all tests:

pytest tests/

Run tests with detailed output:

pytest -v

Run tests for a specific component:

pytest tests/test_<component_name>.py

Generate a coverage report (requires pytest-cov):

pip install pytest-cov
pytest --cov=codeexp tests/
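Tests for components that call CodeLlama or the HuggingFace Hub are easier to run offline if the model is injected and replaced with a stub. The sketch below uses pytest's plain-`assert` style and the standard library's `unittest.mock.Mock`; `explain_code` and its prompt are hypothetical stand-ins defined inline, not functions from `codeexp/components/codellama_agent.py`. In a real test module you would import the component under test instead.

```python
from unittest.mock import Mock


def explain_code(code: str, llm) -> str:
    """Hypothetical component: ask an injected LLM callable to explain `code`."""
    prompt = f"Explain what the following Python code does:\n\n{code}"
    return llm(prompt).strip()


def test_explain_code_uses_llm_output():
    # The Mock replaces the real model, so no network call or token is needed.
    fake_llm = Mock(return_value="  Adds two numbers.  ")
    result = explain_code("def add(a, b):\n    return a + b", fake_llm)

    assert result == "Adds two numbers."
    fake_llm.assert_called_once()
    sent_prompt = fake_llm.call_args.args[0]
    assert "def add(a, b)" in sent_prompt
```

Saved under `tests/` in a file whose name starts with `test_` (for example, a hypothetical `tests/test_explain_code.py`), pytest collects it automatically.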

🎓 Use Cases

Developers: Enhance understanding of complex codebases.
Educators: Provide clear code explanations for learners.
Researchers: Analyze algorithmic code for optimization.
Organizations: Maintain clean, optimized, and well-documented repositories.

🛡 Best Practices

File Types: Ensure the target codebase contains supported extensions (e.g., .py).
Environment Setup: Use a virtual environment to isolate dependencies.
Model Performance: Tune the chunking, embedding, and LLM parameters to suit the size and style of your codebase.

🤝 Contributing

We welcome contributions! If you'd like to improve the pipeline, please:

Fork this repository.
Create a new branch for your feature or fix.
Submit a pull request with a detailed description.

🔧 Project Maintenance

Key Maintainer

M. N. Gaber

📜 License

This project is licensed under the Apache License. See the LICENSE file for details.

🌐 Contact

Feel free to reach out for questions or feedback:

📧 Email: abunasserip@gmail.com
🐦 LinkedIn: @M.N.Gaber

🏆 Acknowledgments

Special thanks to:

HuggingFace for hosting world-class AI models.
LangChain for simplifying LLM workflows.
FAISS for fast and efficient retrieval.

🚀 Ready to revolutionize code analysis? Dive in today and supercharge your development process! 🦾
