Resume Analysis System

A comprehensive system for analyzing resumes in both PDF and Word document formats, extracting key information, and generating detailed analysis reports.

Features

Multi-format Support: Process both PDF and Word documents (.pdf, .docx, .doc)
Intelligent Text Extraction:
- PDF processing using LlamaParse
- Word document processing using python-docx
Comprehensive Analysis:
- Contact information extraction
- Technical skills assessment
- Education history analysis
- Experience evaluation
- Project analysis
- Overall fit scoring
Output Formats:
- Detailed Markdown reports
- CSV format for data analysis
- Filtered reports for top candidates

Prerequisites

Install the dependencies:

pip install -r requirements.txt

Required packages:

python-docx
llama-parse
pandas
pydantic
pydantic-ai
python-dotenv
nest-asyncio

Environment Setup

Create a .env file in the root directory
Add your API keys:

OPENAI_API_KEY=your_openai_api_key
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key
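
A quick way to catch a missing key before any API call is made is to check the environment up front. The snippet below is a minimal sketch, not the project's actual startup code: the helper names missing_api_keys and require_api_keys are hypothetical, and it assumes the variables from .env are already in the process environment (for example after calling load_dotenv() from python-dotenv).

```python
import os

# The two keys named in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def require_api_keys(env=None):
    """Raise a readable error if any required key is missing (hypothetical helper)."""
    missing = missing_api_keys(env)
    if missing:
        raise RuntimeError("Missing API keys in environment: " + ", ".join(missing))
```

Calling require_api_keys() at the top of a script fails fast with the names of the absent keys instead of a less obvious authentication error later.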

Directory Structure

├── main.py                     # Main resume processing script
├── filter_high_scores.py       # Script for filtering top candidates
├── results/                    # Output directory (git ignored)
│   ├── resume_analysis_results.md
│   ├── resume_analysis_results.csv
│   └── top_candidates.md
├── CVs/                       # Directory containing resumes
│   └── cv_leads/             # Subdirectory for resumes (git ignored)
├── .gitignore                # Git exclusion patterns
└── requirements.txt

Note: The results/ directory and CVs/cv_leads/ directory are excluded from git tracking to avoid committing sensitive data and large files. These directories will be created locally when running the scripts.

Usage

1. Processing Resumes

Run the main script to process all resumes:

python main.py

This will:

Process all PDF and Word documents in the specified directory
Generate detailed analysis for each resume
Save results in both markdown and CSV formats
Create a results directory if it doesn't exist
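
The file discovery step can be sketched as follows. This is an illustration under stated assumptions rather than the code in main.py: find_resumes is a hypothetical name, the search is non-recursive, and the suffix set mirrors the formats listed under Features.

```python
from pathlib import Path

# Extensions listed under Features; matching is case-insensitive.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".doc"}


def find_resumes(directory):
    """Return the supported resume files directly inside directory, sorted by path."""
    root = Path(directory)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


def ensure_results_dir(path="results"):
    """Create the output directory if it does not exist yet and return it."""
    results = Path(path)
    results.mkdir(parents=True, exist_ok=True)
    return results
```

For example, find_resumes("CVs/cv_leads") would return the PDF and Word files in that folder and skip anything else, such as notes or images.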

2. Filtering Top Candidates

After processing resumes, run the filtering script:

python filter_high_scores.py

This will:

Read the CSV results file
Filter candidates with scores >= 8.0
Generate a new markdown file with detailed information about top candidates
Include a summary of the filtering results
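
The filtering step above amounts to a threshold on one CSV column. The sketch below uses only the standard library; the function names, the column name "score", and the sample rows are assumptions made for illustration, and the real script may use pandas and different column names.

```python
import csv
import io


def filter_high_scores(rows, min_score=8.0, score_field="score"):
    """Keep rows whose score is >= min_score, highest score first."""
    kept = []
    for row in rows:
        try:
            score = float(row[score_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric score
        if score >= min_score:
            kept.append(row)
    return sorted(kept, key=lambda r: float(r[score_field]), reverse=True)


def summarize(total, kept, min_score):
    """One-line summary of the filtering result."""
    return f"{kept} of {total} candidates scored >= {min_score}"


# Made-up sample data standing in for resume_analysis_results.csv.
SAMPLE_CSV = "name,score\nCandidate A,8.5\nCandidate B,6.0\nCandidate C,9.1\nCandidate D,n/a\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
top = filter_high_scores(rows)
print([r["name"] for r in top])  # ['Candidate C', 'Candidate A']
print(summarize(len(rows), len(top), 8.0))
```

Rows with an unparseable score are dropped rather than treated as zero, so a failed analysis does not silently appear as a low-scoring candidate.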

Analysis Components

Resume Analysis

The system analyzes the following aspects:

Contact Information:
- Full name
- Email address
- Phone number
Technical Skills:
- Python experience and frameworks
- Other programming languages
- Django experience
- SQL proficiency
- Cloud platform experience (Azure, AWS)
- GitHub repositories and profiles
Education:
- Degrees (Bachelor's, Master's, PhD)
- Universities and graduation years
- Awards and certifications
Experience:
- Years of relevant experience
- Data science projects
- Healthcare industry experience
- Leadership roles
Scoring System:
- Technical expertise (40%)
- Relevant experience (30%)
- Education (20%)
- Leadership potential (10%)
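
The scoring weights above can be read as a weighted sum of four component scores. How the project actually applies them is not shown here, so the following is one plausible reading: it assumes each component is scored on a 0 to 10 scale, and the names WEIGHTS and overall_score are hypothetical.

```python
# Weights from the Scoring System list above; they sum to 1.0.
WEIGHTS = {
    "technical": 0.40,
    "experience": 0.30,
    "education": 0.20,
    "leadership": 0.10,
}


def overall_score(components):
    """Combine per-category scores (assumed 0-10) into a weighted total."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(total, 2)
```

For instance, component scores of 9 (technical), 8 (experience), 7 (education), and 6 (leadership) give 3.6 + 2.4 + 1.4 + 0.6 = 8.0.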

Output Formats

Markdown Report (resume_analysis_results.md):
- Detailed analysis for each candidate
- Formatted sections with emojis
- Easy to read and share
CSV File (resume_analysis_results.csv):
- Structured data format
- Easy to import into other tools
- Suitable for further analysis
Top Candidates Report (top_candidates.md):
- Filtered view of best candidates
- Sorted by score
- Summary statistics

Error Handling

The system includes:

Retry logic for API calls
Graceful handling of missing data
Error reporting for failed processing
File format validation
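
Retry logic for API calls is commonly written as a small wrapper with exponential backoff. The sketch below shows that general pattern; it is not taken from the project's code, and call_with_retries, max_attempts, and base_delay are illustrative names and defaults.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**attempt, then retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists so the wrapper can be tested without waiting. In real use a narrower exception type (for example, a timeout or rate-limit error from the API client) would be a better choice than catching Exception.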

Customization

Adjusting Score Threshold

In filter_high_scores.py, change the min_score argument (the filter described under Usage keeps scores >= 8.0):

filter_and_save_high_scores(input_csv, output_md, min_score=7.0)  # Change to desired threshold

Modifying Output Directory

In both scripts, update the output paths:

markdown_file = 'your/custom/path/results.md'
csv_file = 'your/custom/path/results.csv'

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

LlamaParse for PDF processing
python-docx for Word document processing
Pandas for data manipulation


IyadSultan/CV_extract
