Enhance and Debug PDF Data Extraction Program (Python)

Closed · Posted last week · Paid on delivery

$50-100 USD

I’m looking for an experienced Python programmer to debug, improve, and optimize a PDF data extraction program that I’ve been developing. The program is designed to extract specific information (e.g., ID Constructie, Suprafata constructie, Nr. CF, Nume Proprietar, and Intravilan status) from Romanian land registry PDFs. While it’s functional in some areas, certain parts do not work as expected, and the code overall needs optimization.

What the Program Does:

Extracts data from PDF files using pdfplumber and regex.

Extracts key fields like:

ID Constructie (Construction IDs)

Suprafata constructie (Construction Area)

Nr. CF (Land Registry Number)

Nume Proprietar (Owner Name)

Intravilan status ("DA" or "NU", i.e. whether the land lies within the built-up area)

Categorie de folosinta (Land Usage Category)

The extracted data is saved into an Excel file using pandas and openpyxl.
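As a rough illustration of the pipeline described above, a minimal sketch could look like the following. The function names (`read_pdf_text`, `build_row`, `save_rows`) and the "Source file" column are hypothetical and are not taken from the actual program; the sketch only assumes that pdfplumber, pandas, and openpyxl are installed.

```python
def read_pdf_text(pdf_path):
    """Concatenate the text of every page; pages without text contribute ''."""
    import pdfplumber  # imported lazily so the pure helpers work without it

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def build_row(source_name, text, extractors):
    """Apply each field extractor to the text and collect one spreadsheet row.

    `extractors` maps a column name to a callable that takes the full text
    and returns the extracted value (hypothetical interface).
    """
    row = {"Source file": source_name}
    for column, extract in extractors.items():
        row[column] = extract(text)
    return row


def save_rows(rows, excel_path):
    """Write the collected rows to an .xlsx file, one row per PDF."""
    import pandas as pd  # .xlsx output additionally requires openpyxl

    pd.DataFrame(rows).to_excel(excel_path, index=False, engine="openpyxl")
```

Keeping the extractors as plain callables that take text means each field can be tested on pasted sample text without opening a PDF.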

Issues Faced:

ID Construcție & Suprafață Construcție:

These fields are not extracted accurately. The correct logic should be based on the A1.x format for IDs and values following specific patterns. Currently, the function doesn't meet expectations.
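To make the expected behavior concrete, here is a minimal sketch of A1.x matching. It assumes (this is not confirmed by the PDFs themselves) that each construction row starts a line with an ID such as "A1.1" and that its area appears somewhere before the next ID as a number followed by "mp". The names `ID_RE`, `AREA_RE`, and `extract_constructions` are hypothetical.

```python
import re

# Assumption: a construction row begins a line with an ID like "A1.1", "A1.2".
# A bare "A1" (without ".x") is deliberately not matched.
ID_RE = re.compile(r"^\s*(A1\.\d+)\b", re.MULTILINE)

# Assumption: the area is written as a number followed by "mp", e.g. "85 mp".
AREA_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*mp\b", re.IGNORECASE)


def extract_constructions(text):
    """Return a list of (construction_id, area_text_or_None) pairs.

    The area is searched only between one ID and the next, so a value
    belonging to a later construction is not attached to an earlier one.
    """
    matches = list(ID_RE.finditer(text))
    results = []
    for index, match in enumerate(matches):
        # The chunk for this ID ends where the next ID starts.
        if index + 1 < len(matches):
            chunk_end = matches[index + 1].start()
        else:
            chunk_end = len(text)
        area = AREA_RE.search(text, match.end(), chunk_end)
        results.append((match.group(1), area.group(1) if area else None))
    return results
```

The area is returned as the raw matched text because the number format (decimal comma, thousands separator) is not known here. The chunk for the last ID runs to the end of the text, so in practice it would be worth cutting the text at the end of the constructions section first.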

Inconsistent PDF Formats:

PDFs often vary in structure, especially for key phrases like "Date referitoare la teren" or "Lungime Segmente". Some PDFs lack these sections altogether, causing failures.
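One way to avoid failures when a section is missing is to treat both boundaries as optional. The sketch below is only an illustration under that assumption; `slice_section` is a hypothetical helper, and the marker strings in the usage note are the phrases quoted above.

```python
import re


def slice_section(text, start_markers, end_markers):
    """Return the text after the first start marker that is present.

    The slice stops at the nearest end marker found after the start, or
    runs to the end of the text when no end marker is present. Returns
    None when none of the start markers occur, so the caller can decide
    what to do instead of crashing. Matching is case-insensitive.
    """
    start = None
    for marker in start_markers:
        found = re.search(re.escape(marker), text, re.IGNORECASE)
        if found:
            start = found.end()
            break
    if start is None:
        return None

    end = len(text)
    for marker in end_markers:
        pattern = re.compile(re.escape(marker), re.IGNORECASE)
        found = pattern.search(text, start)
        if found:
            end = min(end, found.start())
    return text[start:end]
```

For example, `slice_section(text, ["Date referitoare la teren"], ["Lungime Segmente"])` returns `None` for a PDF without the land section instead of raising an exception.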

Fallback Mechanisms:

When sections like "679/2016" are missing, the program should search alternate ranges, but this logic needs fine-tuning.
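The fallback idea can be sketched as an ordered list of candidate ranges that are tried one after another. This is only an illustration: `between`, `intravilan_with_fallback`, and the particular marker pairs in the usage note are assumptions, not the current program's logic, and it assumes the status appears as an uppercase "DA" or "NU" token inside the range.

```python
import re

# Assumption: the status is an uppercase standalone token, "DA" or "NU".
STATUS_RE = re.compile(r"\b(DA|NU)\b")


def between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


def intravilan_with_fallback(text, ranges):
    """Try each (start_marker, end_marker) range in order.

    The first range that exists and contains a status token wins.
    Returns None when every range fails, so the caller can flag the
    file for manual review rather than record a guessed value.
    """
    for start_marker, end_marker in ranges:
        section = between(text, start_marker, end_marker)
        if section is None:
            continue  # this range is not present in the PDF, try the next
        found = STATUS_RE.search(section)
        if found:
            return found.group(1)
    return None
```

A call might pass `[("Date referitoare la teren", "679/2016"), ("Date referitoare la teren", "Lungime Segmente")]`, so that a PDF without "679/2016" falls through to the second range.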

Other Enhancements:

General code improvements: Robust error handling, optimized regular expressions, and flexibility to adapt to varied PDF layouts.
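One small example of making the regular expressions less fragile: field labels may appear with or without Romanian diacritics (for instance "Construcție" and "Constructie"), so the text can be normalized before matching. The sketch below uses only the standard library; `strip_diacritics` and `normalize_text` are hypothetical helper names.

```python
import re
import unicodedata


def strip_diacritics(value):
    """Remove combining marks so that 'ț', 'ţ', 'ă', 'â', 'î', 'ș' become plain letters."""
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_text(value):
    """Strip diacritics and collapse runs of spaces and tabs to one space.

    Newlines are kept, since line structure may matter to the patterns.
    """
    return re.sub(r"[ \t]+", " ", strip_diacritics(value))
```

With the text normalized once up front, each pattern only needs the unaccented spelling of a label.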

Project Title:

"Debug and Enhance PDF Data Extraction Program (Python)"

Description:

I have a Python program designed to extract specific data fields from PDFs, such as property documents ("Cărți Funciare"). The program uses libraries like pdfplumber, re, and pandas to process the PDFs and output results into an Excel file. While the core functionality is implemented, there are issues and areas for improvement that need an expert to resolve.

What the Program Does:

The current program extracts:

Nr. CF - Land registry number.

Nume Proprietar - Owner's name(s).

Suprafață Teren - Land area.

ID Construcție & Suprafață Construcție - IDs of constructions and their respective areas.

Intravilan - Status ("DA" or "NU") indicating land classification.

Categorie de Folosință - Category of land usage (e.g., Arabil, Padure, etc.).

The extracted data is then saved into an Excel file using pandas and openpyxl.

What I Need:

Debug and fix the extraction of "ID Construcție" and "Suprafață Construcție". IDs should be accurately matched (e.g., "A1.x" format).

Improve extract_intravilan_status and ensure it searches multiple ranges if one fails.

Enhance program flexibility to handle PDFs with inconsistent or missing sections.

Clean and optimize regular expressions and search logic for better accuracy.

Implement fallback mechanisms for edge cases when specific sections are not found.

Review other functions (like extract_categorie_folosinta and extract_nume_proprietar) and improve efficiency and reliability.
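For the robustness points in this list, one possible approach is to run every field extractor in isolation so that a failure in one field leaves the others intact. The sketch below is an assumption about structure, not the existing code: `run_extractors` and the logger name are hypothetical, and each extractor is assumed to be a callable that takes the PDF text.

```python
import logging

logger = logging.getLogger("cf_extract")  # hypothetical logger name


def run_extractors(text, extractors):
    """Run each field extractor independently.

    Returns (row, errors): `row` maps every field name to its value, or
    None if the extractor raised; `errors` maps the failed field names
    to a short description of the exception.
    """
    row = {}
    errors = {}
    for field, extract in extractors.items():
        try:
            row[field] = extract(text)
        except Exception as exc:  # keep going: one bad field must not stop the rest
            logger.warning("Extraction of %r failed: %r", field, exc)
            row[field] = None
            errors[field] = repr(exc)
    return row, errors
```

The `errors` mapping can be written to a separate sheet or log so that problematic PDFs are easy to find afterwards.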

Ideal Skills:

- Proficient in Python

- Experience with PDF data extraction

- Strong debugging skills

- Ability to enhance program functionality

- Familiarity with handling varied data formats

Deliverables:

A working Python script with improved functionality.

Debugged and accurate extraction for all required fields (IDs, areas, intravilan status, etc.).

Clear documentation on updates made, especially new logic added.

Program capable of handling varied PDF formats and edge cases.

Python, Debugging, Data Processing, Pandas

Project ID: #38897992

About the project

36 proposals · Remote project · Active last week

36 freelancers have bid an average of $83 on this job

schoudhary1553

Top 1% in Freelancer.com Hi, Greetings! ✅ Checked your project details. ✅ Completion time: within the project deadline. We have worked on 900+ projects. I have 6+ years of experience in the same kind of projects. If you are look...

$180 USD in 4 days

(454 reviews)

8.4

hafeelmo

Hi, I hope you’re doing well! I’m interested in the job you posted. Based on my experience as a senior software engineer, I believe I could be a good fit. Are you open to a quick chat to discuss the details? I’d love...

$100 USD in 7 days

(8 reviews)

5.3

elhadfi

Hey there, I am a Python programmer with over 5 years of experience specializing in PDF data extraction and optimization. I can debug and enhance your program to accurately extract fields like ID Construcție, Suprafață...

$175 USD in 2 days

(18 reviews)

5.1

saim2105

With over x years of experience in Python development and data processing, I’m well-equipped to debug, enhance, and optimize your PDF data extraction program. I have proven my worth in solving complex data extraction c...

$75 USD in 1 day

(21 reviews)

4.7

trm66614

Hi there, good afternoon. I am Talha. I have read your project details and saw that you need help with Debugging, Python, Data Processing and Pandas. I am writing to propose an innovative approach to tackle your project. Our pro...

$50 USD in 9 days

(5 reviews)

4.7

MQamar123

With my comprehensive understanding of both Python and problematic data extraction, I strongly believe that I hold the very skills that your project demands. Over the past 7 years, I have cultivated my programming expe...

$75 USD in 7 days

(21 reviews)

5.9

sarbtech123

No challenge is too complex for me to overcome, and bringing your PDF data extraction program to its full potential is my top priority. My proficiency in Python, especially with renowned libraries like pdfplumber, re, ...

$95 USD in 5 days

(23 reviews)

4.4

Anas981

Absolutely. I am excited to offer my expertise to debug, improve, and optimize your PDF data extraction program tailored to Romanian land registry documents. With extensive experience in Python and data extraction from...

$100 USD in 1 day

(14 reviews)

4.5

ahmadkamaleddin

I can help by leveraging my expertise in Python and document processing, specifically with the PDF-based models I use for conversion, extraction, updating, and merging. My experience with open-source ERP systems allows...

$75 USD in 7 days

(5 reviews)

3.9

changshanlife

I am confident that I am the ideal candidate to enhance and debug your PDF data extraction program. With my in-depth comprehension of the Python environment, as well as extensive experience working with libraries such...

$100 USD in 7 days

(3 reviews)

3.6

malkesh3m

⭐ Hi, my availability is immediate. I read your project post on Python Developer to Enhance and Debug PDF Data Extraction Program (Python). We are experienced full-stack Python developers with skill sets in - Python, ...

$89 USD in 1 day

(14 reviews)

4.2

eslamafify

Hello, I hope everything is well with you. I read your job description and I'm interested in it. I have done similar projects related to PDF data extraction before. Let's discuss more details.

$70 USD in 7 days

(10 reviews)

3.6

zainabsaleem13

Hello! I’m an experienced Python programmer with a strong background in PDF data extraction and optimization. I can help debug, improve, and optimize your existing PDF data extraction program to ensure accuracy, effici...

$100 USD in 8 days

(4 reviews)

2.9

vipuls22

I am a data scientist with 3+ years of relevant experience and extensive knowledge of Python, NumPy, Pandas, PyTorch, Keras, scikit-learn, Hugging Face and TensorFlow. I have successfully implemented object detection...

$65 USD in 2 days

(3 reviews)

3.0

Shabanahoney1976

As a seasoned Python pro, I possess the tenacity and precision that debugging entails. I can troubleshoot the current challenges you're confronting with "ID Construcție" and "Suprafață Construcție", ensuring they are a...

$50 USD in 2 days

(1 review)

2.0

ivans69

Hi Tim O., I’m genuinely excited about the possibility of contributing to your project. With hands-on experience in building scalable solutions using technologies like Data Processing, Pandas, Python and Debugging, I...

$100 USD in 2 days

(1 review)

1.4

vishals397

Dear client, I hope you are doing well! I am passionate about data and have a lot of experience in handling it. I have worked on many projects where I built strong and reliable systems to collect, clean, and store data...

$75 USD in 7 days

(1 review)

0.5

OliviaRoach29

Hi there. I have read your job details carefully and I can do this project, "Enhance and Debug PDF Data Extraction Program (Python)". As a full-stack web developer for the last 7 years, I've developed many web applications...

$85 USD in 5 days

(0 reviews)

0.0

di254117

Hello, I can start working right now and am able to work full-time in your timezone. I think I can deliver high-quality work on time. I excel in communication, problem-solving, and collaboration. My attention to...

$50 USD in 1 day

(0 reviews)

0.0

codingsparrow

Hi, I am experienced in this and can start right now, but I have a few doubts and questions. Let's have a quick chat and get started. Waiting for your reply!

$75 USD in 7 days

(0 reviews)

0.0
