Website Content Extractor in Python

Closed. Posted 3 months ago. Paid on delivery.

Request for Proposal (RFP)

Project Title

Python Program for Extracting Articles from a Website Site Map into .docx Files

Project Overview

We are seeking a proficient Python developer to create a program that extracts articles from a specific website’s site map (e.g., [login to view URL]) and downloads each article published within a specified time range (e.g., the past 24 hours). Each article should be saved into a separate .docx file, named according to the publication date and time. The final program should be user-friendly and well-documented to allow non-technical users to configure and run the script.

Project Scope and Deliverables

1. Python Script (.py file):

• Develop a Python script that takes a site map URL (e.g., [login to view URL]) as input and extracts all article URLs from the specified page.

• Implement logic to filter articles based on a given time range (e.g., the last 24 hours or between specific start and end times).

• Download each article found in the specified time range and save it in a separate .docx file. The .docx file should include:

• Title of the article (as the document header)

• URL of the source page

• Publication Date and Time

• Author Name (if available)

• Main Body Text

• Filename format: [login to view URL] (e.g., [login to view URL]).

• Implement options to include/exclude metadata (e.g., tags, categories) as needed.
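The extraction and filtering steps in item 1 could look roughly like the sketch below. It assumes two things the RFP does not confirm. First, the site map is a standard XML sitemap (the sitemaps.org <urlset> format). Second, each entry's <lastmod> value is an acceptable stand-in for the publication time. A news site may instead expose an HTML site map page or a sitemap index. Also, <lastmod> can reflect later edits, in which case the publication date would have to be read from each article page. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace used by standard sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_w3c_datetime(value):
    """Parse '2024-05-01T08:30:00Z', '2024-05-01T08:30:00+00:00' or '2024-05-01'.

    Values without an explicit offset are assumed to be UTC.
    """
    value = value.strip()
    if value.endswith("Z"):
        # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def parse_sitemap(xml_data):
    """Return (url, timestamp) pairs; timestamp is None when <lastmod> is missing.

    xml_data may be str or bytes (e.g. the raw body of an HTTP response).
    """
    root = ET.fromstring(xml_data)
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = (url_el.findtext("sm:lastmod", namespaces=SITEMAP_NS) or "").strip()
        if loc:
            timestamp = parse_w3c_datetime(lastmod) if lastmod else None
            entries.append((loc, timestamp))
    return entries


def filter_by_range(entries, start, end):
    """Keep entries whose timestamp lies in [start, end]; both must be timezone-aware."""
    return [(url, ts) for url, ts in entries if ts is not None and start <= ts <= end]
```

In a full script, the XML would typically be fetched first, for example with requests.get(sitemap_url, timeout=30).content, and the bytes passed to parse_sitemap. This sketch drops entries that have no timestamp. Another reasonable choice is to keep them for a second pass that reads the date from the article page.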

2. Output Files:

• Each article should be saved in a separate .docx file in the specified output directory.

• Store additional metadata or a summary file (e.g., a .csv file listing all downloaded articles with their URLs and publication times) if needed.
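The per-article .docx files and the optional CSV summary could be produced roughly as sketched below, using python-docx and the standard csv module. The RFP's actual filename pattern is not visible, so YYYY-MM-DD_HH-MM-SS.docx is a placeholder. It avoids colons because they are not allowed in Windows filenames. The function names are hypothetical. python-docx is imported inside save_article_docx, so the other helpers run without it installed.

```python
import csv
from pathlib import Path


def docx_filename(published, output_dir):
    """Build a Windows-safe path such as 2024-05-01_08-30-00.docx (placeholder pattern).

    Appends _2, _3, ... if a file with that name already exists.
    """
    stem = published.strftime("%Y-%m-%d_%H-%M-%S")
    candidate = Path(output_dir) / f"{stem}.docx"
    counter = 2
    while candidate.exists():
        candidate = Path(output_dir) / f"{stem}_{counter}.docx"
        counter += 1
    return candidate


def save_article_docx(path, title, url, published, body_paragraphs, author=None):
    """Write one article: title, source URL, publication time, optional author, body."""
    from docx import Document  # pip install python-docx

    doc = Document()
    doc.add_heading(title, level=0)  # level 0 uses the "Title" style
    doc.add_paragraph(f"Source: {url}")
    doc.add_paragraph(f"Published: {published.isoformat()}")
    if author:
        doc.add_paragraph(f"Author: {author}")
    for text in body_paragraphs:
        doc.add_paragraph(text)
    doc.save(str(path))


def write_summary_csv(path, rows):
    """Write a summary; rows are (title, url, published datetime, docx path) tuples."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["title", "url", "published", "file"])
        for title, url, published, docx_path in rows:
            writer.writerow([title, url, published.isoformat(), Path(docx_path).name])
```

The RFP asks for the title "as the document header". This sketch uses a Title-style heading in the document body. If a page header is meant instead, python-docx also exposes one through doc.sections[0].header.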

3. User Interface & Usability:

• Provide a user-friendly interface or command-line options for configuring parameters such as:

• Site Map URL: Input the URL of the site map page (e.g., [login to view URL]).

• Time Range: Specify a time range for filtering articles (e.g., “last 24 hours” or between YYYY-MM-DD and YYYY-MM-DD).

• Output Directory: Set the destination folder for saving the downloaded .docx files.

• Error handling should be robust, with clear messages for common issues (e.g., “Invalid site map URL” or “No articles found in the specified time range”).
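A command-line front end for the three parameters in item 3 could be built with the standard argparse module, as in the sketch below. The RFP does not specify flag names, so --sitemap-url, --last, --start, --end and --output-dir are assumptions, as is the 24-hour default. Invalid input is reported through parser.error, which prints a usage message and exits with status 2.

```python
import argparse
from urllib.parse import urlparse


def build_parser():
    """Hypothetical CLI: flag names and defaults are illustrative only."""
    parser = argparse.ArgumentParser(
        description="Save articles from a site map into .docx files."
    )
    parser.add_argument("--sitemap-url", required=True, help="URL of the site map page")
    window = parser.add_mutually_exclusive_group()
    window.add_argument("--last", help='relative window such as "24h" or "7d"')
    window.add_argument("--start", help="start date YYYY-MM-DD (requires --end)")
    parser.add_argument("--end", help="end date YYYY-MM-DD (requires --start)")
    parser.add_argument("--output-dir", default="articles",
                        help="folder for the .docx files (default: ./articles)")
    return parser


def parse_args(argv=None):
    """Parse and validate arguments; parser.error() exits with status 2."""
    parser = build_parser()
    args = parser.parse_args(argv)
    url = urlparse(args.sitemap_url)
    if url.scheme not in ("http", "https") or not url.netloc:
        parser.error(f"Invalid site map URL: {args.sitemap_url}")
    if (args.start is None) != (args.end is None):
        parser.error("--start and --end must be used together")
    if args.last is None and args.start is None:
        args.last = "24h"  # assumed default: the last 24 hours
    return args
```

The RFP's other example message, "No articles found in the specified time range", belongs later in the run. It would be printed after filtering if the result list is empty.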

4. Detailed Documentation:

• Provide a README file with:

• Installation instructions (including dependencies).

• Detailed usage instructions, covering:

• How to set up and run the script.

• How to specify the time range and site map URL.

• Optional configuration settings.

• Troubleshooting guide for common errors.

5. Code Quality:

• The code should be clean, modular, and well-commented, adhering to Python best practices and the PEP 8 style guide.

• Use meaningful variable names and clear function structures.

Technical Requirements

1. Programming Language: Python (latest stable version).

2. Libraries:

• Suggested libraries include BeautifulSoup, requests, lxml, and python-docx.

• The developer can recommend additional libraries as needed, but must document their usage in a [login to view URL] file.

3. Environment Compatibility: The script should be compatible with Windows and Unix-based systems.

4. Time Range Specification: Implement logic to handle time ranges in hours or days (e.g., articles published within the last 24 hours, or between specific start and end dates).

5. Data Compliance: Ensure the solution adheres to the target website’s Terms of Service and does not violate any legal restrictions.
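For requirement 4, the user's time range has to be converted into a concrete start and end time before articles can be filtered. The sketch below accepts either a relative window such as "24h" or "7d", or a pair of YYYY-MM-DD dates. It assumes all times are UTC and treats the end date as inclusive through the end of that day. The client would need to confirm both choices, especially if the site publishes in local time. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Accepts e.g. "24h", "48 h", "7d" (case-insensitive).
_RELATIVE = re.compile(r"^\s*(\d+)\s*([hd])\s*$", re.IGNORECASE)


def parse_relative(spec):
    """Convert '24h' or '7d' into a timedelta; raise ValueError otherwise."""
    match = _RELATIVE.match(spec)
    if not match:
        raise ValueError(f"Unrecognized time range {spec!r}; use e.g. '24h' or '7d'")
    amount, unit = int(match.group(1)), match.group(2).lower()
    return timedelta(hours=amount) if unit == "h" else timedelta(days=amount)


def resolve_range(last=None, start=None, end=None, now=None):
    """Return (start, end) as UTC-aware datetimes.

    Pass either `last` or both `start` and `end` (YYYY-MM-DD). The end date is
    treated as inclusive, i.e. up to 23:59:59.999999 UTC on that day.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last is not None:
        return now - parse_relative(last), now
    if start is None or end is None:
        raise ValueError("Give either a relative range or both start and end dates")
    start_dt = datetime.strptime(start, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end_day = datetime.strptime(end, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end_dt = end_day + timedelta(days=1) - timedelta(microseconds=1)
    if start_dt > end_dt:
        raise ValueError("Start date must not be after end date")
    return start_dt, end_dt
```

Accepting an explicit `now` keeps the function deterministic in tests. In normal use it falls back to the current UTC time.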

Project Timeline

The project is expected to be completed within 4 weeks from the award date, with the following milestones:

1. Day 1: Initial project setup and development of site map extraction module.

2. Day 2: Implementation of time range filtering and .docx export functionality.

3. Day 3: Internal testing and optimization of the script.

4. Day 4: Delivery of a beta version for client review, followed by final adjustments and delivery of the completed project.

Project Budget

Proposals should include a detailed cost breakdown, including estimated hours for each development phase and any additional costs for third-party libraries or tools.

Submission Requirements

1. Proposal Submission Deadline: [Insert Deadline Date]

2. Proposal Format:

• Company or freelancer profile.

• Portfolio of relevant Python and web scraping projects.

• Proposed approach and implementation strategy.

• Project timeline and cost estimate.

• Contact details.

3. Evaluation Criteria:

• Expertise in Python programming, web scraping, and data extraction.

• Experience in working with .docx file formats.

• Ability to create a user-friendly solution.

• Adherence to the timeline and budget constraints.

Submission Contact

All proposals should be submitted to:

• Contact Name:

• Email Address:

Additional Notes

1. The developer must provide post-delivery support for a period of 2 weeks to address any bugs or issues discovered in the program.

2. All intellectual property rights to the source code and documentation will be transferred to the client upon project completion and final payment.

3. Any changes to the project scope should be mutually agreed upon and documented.

Python MySQL

Project ID: #38665755

About the project

63 proposals. Remote project. Active 2 months ago.

63 freelancers are bidding an average of £168 for this job

MashoodurRehman1

I am a Python developer familiar with web scraping and data extraction, and I can create a user-friendly Python program to extract articles from a specific website's site map and save them as .docx files based on specifi…

£250 GBP in 2 days
(197 reviews)
8.0
schoudhary1553

Top 1% on Freelancer.com. Hi, greetings! ✅ Checked your project details. ✅ Completion time: within the project deadline. We have worked on 900+ projects. I have 6+ years of experience in the same kind of projects. If you are look…

£220 GBP in 4 days
(141 reviews)
7.3
mananraja

Hi, I have expertise in web scraping with Python and can develop a program for you that downloads articles from TechCrunch's sitemap into separate, properly named .docx files. I understand the scope and requirements of th…

£130 GBP in 2 days
(177 reviews)
7.0
Softeria

✅ I am a proficient Python developer with extensive experience in web scraping and data extraction, poised to deliver a tailored Python program for extracting articles from a website's sitemap into .docx files. My skil…

£250 GBP in 7 days
(53 reviews)
6.6
elhadfi

Hey there, I am a Python developer with over 5 years of experience in web scraping and automation. I can help you build a Python program that efficiently extracts articles from your specified sitemap and downloads each…

£135 GBP in 3 days
(17 reviews)
5.1
Muhammadzeesha59

As a Senior Full Stack Developer with over six years of experience, I have gained extensive knowledge and expertise in a wide range of programming areas that align perfectly with this project's requirements. My deep…

£135 GBP in 2 days
(31 reviews)
5.3
prayogo803

Hello, Sai Wing L., my name is Prayogo, and I have been working as a full-stack engineer for 12 years. I have carefully read your job description and feel confident that I can successfully complete your project. I am pr…

£150 GBP in 29 days
(7 reviews)
4.7
JustinJcob

Hi, thank you for the opportunity to bid on your project. Bid proposal: an experienced Python developer ready to kick off your project to create a Python program that extracts and downloads articles from a designated w…

£135 GBP in 7 days
(7 reviews)
4.6
skalogir

As a seasoned Python developer and scrapaholic, I'm confident I can develop a top-notch website content extractor tailored specifically to your needs. Over the years, I've built a strong portfolio with proficiencies in…

£135 GBP in 7 days
(8 reviews)
4.7
saim2105

As a seasoned Python developer with a specialty in data extraction, I am uniquely positioned to deliver a high-quality solution for your project. Over my career, I have built a reputation for creating customized script…

£80 GBP in 1 day
(14 reviews)
4.1
easycoders

Hi, I hope you are doing well. I have already scraped the TechCrunch site using PHP and a Node headless browser. Can we discuss the job further?

£500 GBP in 7 days
(26 reviews)
5.6
malkesh3m

⭐ Hi, my availability is immediate. I read your project post on Python Developer for Extracting Articles from a Website Site Map into .docx Files. We are experienced full-stack Python developers with skill sets in - P…

£230 GBP in 3 days
(16 reviews)
4.2
stheven19

As a Python enthusiast with a keen eye for detail, I believe I have the necessary skills and experience to successfully deliver your Website Content Extractor project. Over 5 years of professional experience in Python…

£250 GBP in 3 days
(8 reviews)
3.7
asifmahmudpro

Hi there, how are you? I am a software engineer. Let's do this right now. I am a Python, Django, and Flask developer. I know Python, Beautiful Soup, Scrapy, Playwright, Requests, urllib3, Selenium, Chromium, and other…

£250 GBP in 6 days
(20 reviews)
4.0
VasiliosMakris

Estimated timeline: one day. Budget: 20.0 GBP fixed. I will provide the first result within one day. Dear [Client], I have successfully completed similar projects involving Python web scraping and document extracti…

£20 GBP in 3 days
(6 reviews)
3.1
DeveloperStation

Hi, I have extensive experience in Python development, particularly in web scraping, data extraction, and automation using libraries like BeautifulSoup, requests, lxml, and python-docx. My previous work includes build…

£135 GBP in 7 days
(1 review)
3.1
Khelifa90

Hello, I have good experience scraping a variety of sites with Python. I can start right away; contact me to discuss the project details further. Thanks.

£120 GBP in 4 days
(12 reviews)
2.9
rmttdmkk

Welcome, Sir. I am writing to express my strong interest in the Python development project for creating a web scraping program to extract articles from site maps. With extensive experience in Python, web scraping, and d…

£135 GBP in 7 days
(4 reviews)
3.0
amhatre6

Greetings! With ample experience in Python, I can get your job done efficiently. Kindly confirm whether you are ready to proceed with this deal. Good day! Thanks, Abhishek

£112 GBP in 7 days
(5 reviews)
2.4
namns1412

Hello, based on my history of completed scraping jobs on Freelancer.com, you can see that 100% were completed and received strong recommendations. I am sure your requirements can be met easily. Thank you for your attention. Have a…

£220 GBP in 7 days
(9 reviews)
2.6