Enhance and Debug PDF Data Extraction Program (Python)

Closed Posted last week Paid on delivery

I’m looking for an experienced Python programmer to debug, improve, and optimize a PDF data extraction program that I’ve been developing. The program is designed to extract specific information (e.g., ID Constructie, Suprafata constructie, Nr. CF, Nume Proprietar, and Intravilan status) from Romanian land registry PDFs. While it’s functional in some areas, certain parts do not work as expected, and the code overall needs optimization.

What the Program Does:

Extracts data from PDF files using pdfplumber and regex.

Extracts key fields like:

ID Constructie (Construction IDs)

Suprafata constructie (Construction Area)

Nr. CF (Land Registry Number)

Nume Proprietar (Owner Name)

Intravilan status (Whether the land is "DA" or "NU")

Categorie de folosinta (Land Usage Category)

The extracted data is saved into an Excel file using pandas and openpyxl.
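A minimal sketch of that flow, under stated assumptions: the function names, the "Fisier" column, and the "Carte Funciară Nr." pattern for the registry number are illustrative guesses, not the existing program's code. The third-party imports sit inside the functions so the regex part can be tried without them installed.

```python
import re

# Assumed shape of the registry-number line; real PDFs may differ.
NR_CF_RE = re.compile(r"Carte\s+Funciar[ăa]\s+Nr\.?\s*(\d+)", re.IGNORECASE)


def read_pdf_text(pdf_path):
    """Concatenate the text of every page in the PDF."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can yield nothing for image-only pages
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_nr_cf(text):
    """Return the registry number as a string, or None when absent."""
    match = NR_CF_RE.search(text)
    return match.group(1) if match else None


def build_row(pdf_name, text):
    """One spreadsheet row per PDF (hypothetical column names)."""
    return {"Fisier": pdf_name, "Nr. CF": extract_nr_cf(text)}


def save_rows(rows, xlsx_path):
    """Write the collected rows to an Excel file."""
    import pandas as pd  # third-party: pip install pandas openpyxl

    pd.DataFrame(rows).to_excel(xlsx_path, index=False, engine="openpyxl")
```

Typical use would be to call read_pdf_text on each file, pass the text to build_row, and hand the list of rows to save_rows.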

Issues Faced:

ID Construcție & Suprafață Construcție:

These fields are not extracted accurately. The correct logic should be based on the A1.x format for IDs and values following specific patterns. Currently, the function doesn't meet expectations.
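One possible way to pair each A1.x ID with its area is sketched below. It assumes that the area appears as a number followed by "mp" somewhere between one ID and the next; that layout is an assumption, since the exact patterns are not described here, and the function name is hypothetical.

```python
import re

ID_RE = re.compile(r"\bA1\.\d+\b")
# Assumed area notation: "85 mp" or "22,5 mp"
AREA_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*mp\b", re.IGNORECASE)


def extract_constructions(text):
    """Return (id, area) pairs; area is None when no 'mp' value follows the ID."""
    results = []
    matches = list(ID_RE.finditer(text))
    for index, match in enumerate(matches):
        # The chunk for this ID runs up to the next ID (or the end of the text).
        if index + 1 < len(matches):
            chunk_end = matches[index + 1].start()
        else:
            chunk_end = len(text)
        chunk = text[match.end():chunk_end]
        area = AREA_RE.search(chunk)
        value = float(area.group(1).replace(",", ".")) if area else None
        results.append((match.group(0), value))
    return results
```

Limiting the area search to the chunk that belongs to each ID keeps a missing area from silently borrowing the value of the next construction.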

Inconsistent PDF Formats:

PDFs often vary in structure, especially for key phrases like "Date referitoare la teren" or "Lungime Segmente". Some PDFs lack these sections altogether, causing failures.
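A hedged sketch of how a missing section could be tolerated rather than causing a failure: the helper below (a hypothetical name) returns None when no start phrase is found and reads to the end of the text when no end phrase is found.

```python
import re


def _first_match(text, markers, start=0):
    """Return the earliest case-insensitive match of any marker, or None."""
    best = None
    for marker in markers:
        found = re.compile(re.escape(marker), re.IGNORECASE).search(text, start)
        if found and (best is None or found.start() < best.start()):
            best = found
    return best


def slice_section(text, start_markers, end_markers):
    """Text between a start phrase and the next end phrase.

    Returns None if no start phrase exists; runs to the end of the
    text if no end phrase follows the start phrase.
    """
    start = _first_match(text, start_markers)
    if start is None:
        return None
    end = _first_match(text, end_markers, start.end())
    stop = end.start() if end else len(text)
    return text[start.end():stop].strip()
```

Callers can then treat None as "section absent" and move on to another strategy instead of raising.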

Fallback Mechanisms:

When sections like "679/2016" are missing, the program should search alternate ranges, but this logic needs fine-tuning.
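The fallback idea could look roughly like this. The ordered list of ranges and the bare DA/NU match are assumptions made for illustration; the real function would need the actual markers and a stricter pattern.

```python
import re

# Crude on purpose: any standalone uppercase DA or NU inside the range.
STATUS_RE = re.compile(r"\b(DA|NU)\b")


def extract_intravilan_status(text, ranges):
    """Try each (start_marker, end_marker) range in order.

    None as a marker means "start of text" or "end of text".
    A range is skipped when one of its markers is missing.
    """
    for start_marker, end_marker in ranges:
        start = 0
        if start_marker is not None:
            pos = text.find(start_marker)
            if pos == -1:
                continue  # this range is not available, try the next one
            start = pos + len(start_marker)
        end = len(text)
        if end_marker is not None:
            pos = text.find(end_marker, start)
            if pos == -1:
                continue
            end = pos
        match = STATUS_RE.search(text[start:end])
        if match:
            return match.group(1)
    return None
```

For example, the ranges could be tried in the order ("Date referitoare la teren", "679/2016"), then ("Date referitoare la teren", None), then (None, None), so the search widens only when the narrower range is unavailable.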

Other Enhancements:

General code improvements: Robust error handling, optimized regular expressions, and flexibility to adapt to varied PDF layouts.
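As one illustration of the error handling meant here, each field extractor could be wrapped so that a failure in one field is logged and replaced by a default instead of aborting the whole PDF. The wrapper name and logger name are hypothetical.

```python
import logging

logger = logging.getLogger("cf_extract")


def safe_extract(field_name, extractor, text, default=None):
    """Run one extractor; log and return the default on error or empty result."""
    try:
        value = extractor(text)
    except Exception:
        # Keep the traceback in the log, but let the other fields proceed.
        logger.exception("Extraction failed for %s", field_name)
        return default
    return default if value in (None, "", []) else value
```

With this in place, a row can still be written to the Excel file with the fields that did extract correctly.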

Here's a solid description for your Freelancer post, tailored to your specific needs and project progress so far:


Project Title:

"Debug and Enhance PDF Data Extraction Program (Python)"

Description:

I have a Python program designed to extract specific data fields from PDFs, such as property documents ("Cărți Funciare"). The program uses libraries like pdfplumber, re, and pandas to process the PDFs and output results into an Excel file. While the core functionality is implemented, there are issues and areas for improvement that need an expert to resolve.

What the Program Does:

The current program extracts:

Nr. CF - Land registry number.

Nume Proprietar - Owner's name(s).

Suprafață Teren - Land area.

ID Construcție & Suprafață Construcție - IDs of constructions and their respective areas.

Intravilan - Status ("DA" or "NU") indicating land classification.

Categorie de Folosință - Category of land usage (e.g., Arabil, Padure, etc.).

The extracted data is then saved into an Excel file using pandas and openpyxl.

What I Need:

Debug and fix the extraction of "ID Construcție" and "Suprafață Construcție". IDs should be accurately matched (e.g., "A1.x" format).

Improve extract_intravilan_status and ensure it searches multiple ranges if one fails.

Enhance program flexibility to handle PDFs with inconsistent or missing sections.

Clean and optimize regular expressions and search logic for better accuracy.

Implement fallback mechanisms for edge cases when specific sections are not found.

Review other functions (like extract_categorie_folosinta and extract_nume_proprietar) and improve efficiency and reliability.
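For the category lookup, one reliability gain could come from matching without diacritics, since the same PDFs may spell "Pădure" with or without them. The sketch below is not the existing extract_categorie_folosinta; the category list is a short illustrative sample and would need to be completed.

```python
import re
import unicodedata

# Illustrative subset, written without diacritics.
CATEGORIES = ["arabil", "padure", "pasune", "faneata", "vie", "livada", "curti constructii"]


def strip_diacritics(value):
    """Decompose accented letters and drop the combining marks."""
    normalized = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch))


def extract_categorie_folosinta(text):
    """Return the known categories found in the text, in CATEGORIES order."""
    flat = strip_diacritics(text).lower()
    found = []
    for category in CATEGORIES:
        if re.search(r"\b" + re.escape(category) + r"\b", flat):
            found.append(category)
    return found
```

The same normalization step could be reused before matching owner names or section headings.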

Ideal Skills:

- Proficient in Python

- Experience with PDF data extraction

- Strong debugging skills

- Ability to enhance program functionality

- Familiarity with handling varied data formats

Deliverables:

A working Python script with improved functionality.

Debugged and accurate extraction for all required fields (IDs, areas, intravilan status, etc.).

Clear documentation on updates made, especially new logic added.

Program capable of handling varied PDF formats and edge cases.

Skills: Python, Debugging, Data Processing, Pandas

Project ID: #38897992

About the project

36 proposals Remote project Active last week

36 freelancers are bidding on average $83 for this job

schoudhary1553

Top 1% in Freelancer.com Hi, Greetings! ✅ Checked your project details ✅ Completion time: within the project deadline. We have worked on 900+ projects. I have 6+ years of experience in the same kind of projects. If you are look More

$180 USD in 4 days
(454 Reviews)
8.4
hafeelmo

Hi, I hope you’re doing well! I’m interested in the job you posted. Based on my experience as a senior software engineer, I believe I could be a good fit. Are you open to a quick chat to discuss the details? I’d love More

$100 USD in 7 days
(8 Reviews)
5.3
elhadfi

Hey there, I am a Python programmer with over 5 years of experience specializing in PDF data extraction and optimization. I can debug and enhance your program to accurately extract fields like ID Construcție, Suprafață More

$175 USD in 2 days
(18 Reviews)
5.1
saim2105

With over x years of experience in Python development and data processing, I’m well-equipped to debug, enhance, and optimize your PDF data extraction program. I have proven my worth in solving complex data extraction c More

$75 USD in 1 day
(21 Reviews)
4.7
trm66614

Hi there, good afternoon. I am Talha. I have read your project details and saw that you need help with Debugging, Python, Data Processing and Pandas. I am writing to propose an innovative approach to tackle your project. Our pro More

$50 USD in 9 days
(5 Reviews)
4.7
MQamar123

With my comprehensive understanding of both Python and problematic data extraction, I strongly believe that I hold the very skills that your project demands. Over the past 7 years, I have cultivated my programming expe More

$75 USD in 7 days
(21 Reviews)
5.9
sarbtech123

No challenge is too complex for me to overcome, and bringing your PDF data extraction program to its full potential is my top priority. My proficiency in Python, especially with renowned libraries like pdfplumber, re, More

$95 USD in 5 days
(23 Reviews)
4.4
Anas981

Absolutely. I am excited to offer my expertise to debug, improve, and optimize your PDF data extraction program tailored to Romanian land registry documents. With extensive experience in Python and data extraction from More

$100 USD in 1 day
(14 Reviews)
4.5
ahmadkamaleddin

I can help by leveraging my expertise in Python and document processing, specifically with the PDF-based models I use for conversion, extraction, updating, and merging. My experience with open-source ERP systems allows More

$75 USD in 7 days
(5 Reviews)
3.9
changshanlife

I am confident that I am the ideal candidate to enhance and debug your PDF data extraction program. With my in-depth comprehension of the Python environment, as well as extensive experience working with libraries such More

$100 USD in 7 days
(3 Reviews)
3.6
malkesh3m

⭐ Hi, My availability is immediate. I read your project post on Python Developer to Enhance and Debug PDF Data Extraction Program (Python). We are experienced full-stack Python developers with skill sets in - Python, More

$89 USD in 1 day
(14 Reviews)
4.2
eslamafify

Hello, I hope everything is well with you. I read your job description and I'm interested in it. I have done similar projects related to PDF data extraction. Let's discuss more details.

$70 USD in 7 days
(10 Reviews)
3.6
zainabsaleem13

Hello! I’m an experienced Python programmer with a strong background in PDF data extraction and optimization. I can help debug, improve, and optimize your existing PDF data extraction program to ensure accuracy, effici More

$100 USD in 8 days
(4 Reviews)
2.9
vipuls22

I am a data scientist with 3+ years of relevant experience and extensive knowledge of Python, NumPy, Pandas, PyTorch, Keras, sci-kit-learn, Hugging Face and TensorFlow. I have successfully implemented object detection More

$65 USD in 2 days
(3 Reviews)
3.0
Shabanahoney1976

As a seasoned Python pro, I possess the tenacity and precision that debugging entails. I can troubleshoot the current challenges you're confronting with "ID Construcție" and "Suprafață Construcție", ensuring they are a More

$50 USD in 2 days
(1 Review)
2.0
ivans69

Hi Tim O., I’m genuinely excited about the possibility of contributing to your project. With hands-on experience in building scalable solutions using technologies like Data Processing, Pandas, Python and Debugging, I More

$100 USD in 2 days
(1 Review)
1.4
vishals397

Dear client, Hope you are doing well! I am passionate about data and have a lot of experience in handling it. I have worked on many projects where I build strong and reliable systems to collect, clean, and store data More

$75 USD in 7 days
(1 Review)
0.5
OliviaRoach29

Hi, there. I have read your job detail carefully and I can do this project "Enhance and Debug PDF Data Extraction Program (Python)". As a Full Stack Web developer, for last 7 years, I've developed many web applications More

$85 USD in 5 days
(0 Reviews)
0.0
di254117

Hello, I can start working right now and can work full-time in your timezone. I think I can deliver high-quality work on time. I excel in communication, problem-solving, and collaboration. My attention to More

$50 USD in 1 day
(0 Reviews)
0.0
codingsparrow

Hi, I am experienced in this and can start right now, but I have a few doubts and questions. Let's have a quick chat and get it started. Waiting for your reply!

$75 USD in 7 days
(0 Reviews)
0.0