Only for @prayogo803: Automated Extraction and Structuring of Recipes from PDF Cookbooks
$250-750 USD
Paid on delivery
I have a collection of approximately 50 PDF cookbooks, containing an estimated total of 13,000 recipes. Some of these cookbooks have already been split into individual recipe files, while others remain in their original format. I am seeking a freelancer to automate the process of separating the remaining recipes, using the most appropriate tools, potentially including DALL-E or other advanced OCR and text recognition software.
Scope of Work:
The selected freelancer will be responsible for the following tasks:
1. Text Recognition:
• Extract text from the recipes, even if they are embedded in images within the PDF files.
• Ensure the recognition process accurately captures all text, including any non-standard fonts or formats used in the cookbooks.
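
One possible way to approach the text recognition step is to use a page's embedded text layer when it has one and to fall back to OCR only for image-only pages. The sketch below is illustrative, not a prescribed method: `ocr_fn` is a hypothetical stand-in for whichever OCR engine is chosen, and the `min_chars` threshold is an assumed value that would need tuning on the actual cookbooks.

```python
from typing import Callable, Optional, Tuple


def extract_page_text(
    text_layer: Optional[str],
    page_image: bytes,
    ocr_fn: Callable[[bytes], str],
    min_chars: int = 20,  # assumed threshold, tune on real pages
) -> Tuple[str, str]:
    """Return (text, source) for one page.

    Uses the embedded text layer when it holds at least `min_chars`
    characters; otherwise runs the supplied OCR callable on the image.
    """
    embedded = (text_layer or "").strip()
    if len(embedded) >= min_chars:
        return embedded, "text_layer"
    # Image-only (or nearly empty) page: fall back to OCR.
    return ocr_fn(page_image).strip(), "ocr"
```

Recording the source ("text_layer" or "ocr") for each page makes it easier to spot-check the OCR pages, where recognition errors from non-standard fonts are most likely.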
2. Recipe Identification:
• Detect and separate individual recipes, even if they span across multiple pages.
• Ensure that each recipe is captured in full and kept as a single unit, including when the original recipe runs over several pages.
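
As a rough illustration of multi-page recipe detection, the sketch below groups consecutive pages into recipes. It assumes, purely for the example, that every recipe starts on a page with a line beginning with "Ingredients" and that pages without such a line continue the previous recipe. Real cookbooks will need layout-specific rules, and front matter before the first recipe would have to be filtered out separately.

```python
import re
from typing import Dict, List

# Assumed marker for the first page of a recipe: a line starting with "Ingredients".
INGREDIENTS_HEADING = re.compile(r"^\s*ingredients\b", re.IGNORECASE | re.MULTILINE)


def group_pages_into_recipes(pages: List[str]) -> List[Dict]:
    """Merge page texts into recipes, keeping multi-page recipes together."""
    recipes: List[Dict] = []
    for page_number, text in enumerate(pages, start=1):
        starts_recipe = bool(INGREDIENTS_HEADING.search(text))
        if starts_recipe or not recipes:
            # New recipe (or the very first page of the book).
            recipes.append({"pages": [page_number], "text": text})
        else:
            # Continuation page: append to the recipe in progress.
            recipes[-1]["pages"].append(page_number)
            recipes[-1]["text"] += "\n" + text
    return recipes
```

Keeping the page numbers with each recipe lets a reviewer trace any entry back to its place in the source PDF.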
3. Data Conversion:
• Convert the extracted text from each recipe into a structured JSON format.
• The JSON should include fields such as Title, Title of the Book, Short Description, Ingredients, Cooking Process, Categories, and Tags.
• Categories and Tags will be provided by me.
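
To make the target structure concrete, here is a minimal sketch of one recipe record serialized to JSON. The field list follows the posting; the snake_case key names and the sample values are assumptions made up for illustration, and the real categories and tags would come from the client's list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Recipe:
    title: str
    book_title: str
    short_description: str
    ingredients: List[str]
    cooking_process: List[str]
    categories: List[str] = field(default_factory=list)  # supplied by the client
    tags: List[str] = field(default_factory=list)        # supplied by the client

    def to_json(self) -> str:
        # ensure_ascii=False keeps accented characters readable in the output.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Made-up sample data, not taken from any of the cookbooks.
example = Recipe(
    title="Tomato Soup",
    book_title="Sample Cookbook",
    short_description="A simple soup made from fresh tomatoes.",
    ingredients=["4 ripe tomatoes", "1 onion", "500 ml vegetable stock"],
    cooking_process=["Chop the vegetables.", "Simmer for 20 minutes.", "Blend until smooth."],
    categories=["Soups"],
    tags=["vegetarian"],
)
print(example.to_json())
```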
4. Language Consistency:
• Ensure that all extracted recipes are in English. For any non-English recipes, a translation process may be necessary.
5. Database Creation:
• Input the JSON-formatted recipes into a database platform, such as Airtable, which I will provide access to.
• Each recipe entry in the database must include a unique identification number and all the relevant fields.
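
The database step could be prepared along the lines of the sketch below, which assigns each recipe an identifier and groups records into request bodies for Airtable's record-creation endpoint (which accepts up to 10 records per request). The ID scheme (a hash of book title plus recipe title), the `recipe_id` field name, and the `title` and `book_title` keys are assumptions for illustration. The code only builds the payloads and does not send them.

```python
import hashlib
from typing import Dict, Iterator, List

BATCH_SIZE = 10  # Airtable's limit for records per create request


def recipe_id(book_title: str, title: str) -> str:
    """Deterministic ID: the same book and recipe title always give the same ID."""
    key = f"{book_title.strip().lower()}|{title.strip().lower()}"
    return "rcp_" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def airtable_payloads(recipes: List[Dict]) -> Iterator[Dict]:
    """Yield request bodies of the form {"records": [{"fields": {...}}, ...]}."""
    for start in range(0, len(recipes), BATCH_SIZE):
        batch = recipes[start:start + BATCH_SIZE]
        yield {
            "records": [
                {"fields": {"recipe_id": recipe_id(r["book_title"], r["title"]), **r}}
                for r in batch
            ]
        }
```

A deterministic ID means re-running the pipeline does not mint new identifiers for the same recipe, but two recipes with identical titles in one book would collide, so a page number or counter may need to be added to the key. How list values such as ingredients map onto Airtable field types depends on how the table is set up.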
Deliverables:
• A fully populated Airtable database containing all the recipes, accurately categorized and tagged.
• JSON files for each recipe, stored in a systematic folder structure.
• A report detailing the process, including any challenges encountered and how they were addressed.
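
For the per-recipe JSON files, one plausible folder layout is a directory per book with one file per recipe. The sketch below assumes that layout and the `title` and `book_title` key names, which are illustrative choices and not requirements stated in the posting.

```python
import json
import re
import unicodedata
from pathlib import Path
from typing import Dict


def slugify(text: str) -> str:
    """Lowercase ASCII slug suitable for file and folder names."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "untitled"


def recipe_path(root: Path, book_title: str, title: str) -> Path:
    """Build <root>/<book-slug>/<recipe-slug>.json."""
    return root / slugify(book_title) / f"{slugify(title)}.json"


def save_recipe(root: Path, recipe: Dict) -> Path:
    """Write one recipe dict to its JSON file, creating folders as needed."""
    path = recipe_path(root, recipe["book_title"], recipe["title"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(recipe, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

With this scheme a recipe titled "Tomato & Basil Soup" from "The Joy of Soup" would be saved as `the-joy-of-soup/tomato-basil-soup.json` under the chosen root folder.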
To ensure that we can be autonomous in future projects involving new books, it is essential that the deliverables include all the supporting files used throughout the process. This should encompass everything from initial materials to final versions, as well as any relevant documentation explaining the workflow and configurations used. These files will enable us to replicate and adapt the processes independently in the future, ensuring continuity and efficiency in our publishing projects.
Skills Required:
• Expertise in OCR technology and text extraction from PDFs, especially where text is embedded in images.
• Experience with tools like DALL-E or similar for image and text recognition.
• Strong knowledge of JSON formatting and database management.
• Familiarity with Airtable or similar database platforms.
• Fluency in English, with experience in translation if necessary.
Timeline:
Please provide an estimated timeline for completing this project, considering the volume of work involved.
Budget:
I am open to bids, but please provide a detailed breakdown of costs, including any software licenses or tools that may be required.
Application Requirements:
• Please provide examples of similar projects you have completed.
• A brief outline of the tools and methods you would use to accomplish this project.
• Your proposed timeline and budget.
Project ID: #38527493
About the project
Awarded to:
⭐Good day⭐Thanks for your job posting because it well fits to my skills. If you want perfect result, please ping me asap. thanks, Prayogo
36 freelancers are bidding an average of $530 for this job
Hello Good evening , I hope you are doing great. Just finished reading the brief details of your job . I see you have been looking for a freelancer who has experience with JSON, Data Collection, OCR, Python and Dall-E…
⭐⭐⭐⭐⭐OCR is one of the essential tools in our repertoire. We have extensive experience working with OCR technology, especially where text is embedded in images, using a variety of software and libraries. Additionally,…
Hello Dear! Here is the best freelancer to automate the process of separating the recipes, using the most appropriate tools, potentially including DALL-E or other advanced OCR and text recognition software. I will shar…
Hey Jose M C. , I just finished reading the job description and I see you are looking for someone experienced in JSON, Dall-E, Data Collection, OCR and Python. This is something I can do, Please review my profile to c…
Hi there, I understand the complexity and importance of accurately extracting and organizing the 13,000 recipes from your PDF cookbooks. The key challenge here lies in ensuring precise text recognition, even when reci…
Hi I've read your description carefully and I am sure I can deliver the result you want. I would like to discuss your project in more detail via chat. Regards, Joseph
Dear Jose M C., I went through your project description and it seems like I am a great fit for this job. I am an expert full stack developer with 7+ years of experience in software development. With years of experie…
Hello @prayogo803, I understand that you require automation for extracting and structuring recipes from PDF cookbooks using tools like DALL-E and OCR technology. As an experienced Python developer with expertise in OC…
Hello. Expert here! I have rich experience in your project. ✔️Opportunities don't just happen. I create them.✔️ If u hire me, I will do my best and deliver perfect result. Thanks. Paul
I am writing to express my strong interest in your project. With my expertise in JSON, Dall-E, Data Collection, OCR and Python, I am sure I can deliver the best solutions and high-quality results for your needs. I g…