The project is to build 4 scrapers in Java 6 with jsoup.
Each scraper must be a command-line tool or a library function, with no user interface.
Scrapers 1 & 2 will browse 2 museum websites to extract the list of paintings with details. The data will be output to a CSV text file.
The data we would like to have are:
- Painting page URL
- Painting image JPG URL (large image)
- Painting artist name
- Painting artist birth/death dates (if available)
- Painting title
- Painting medium (if available)
- Painting material (if available)
- Painting technique (if available)
- Painting subjects/tags (if available)
- Painting type (if available)
- Painting date (if available)
- Painting dimensions/size (if available)
- Painting description/comment (if available)
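Since titles and descriptions can contain commas, quotes, or line breaks, each field needs escaping before it is written to the CSV file. The following is only a minimal sketch of one way to do that, using the common double-quote convention; the class name `CsvRow` and its methods are hypothetical, and the posting does not specify a delimiter or quoting style. It sticks to Java 6 language features.

```java
import java.util.List;

/** Hypothetical helper (not part of the posting): formats one painting record as a CSV line. */
final class CsvRow {
    private CsvRow() {}

    /** Returns the field unchanged unless it needs quoting; a missing (null) value becomes an empty field. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the escaped fields with commas (assumed delimiter). */
    static String toLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }
}
```

Fields marked "if available" can simply be passed as `null`, which keeps the column count constant across rows.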
Scraper 1:
[login to view URL] (search for the word "peinture" and enable "avecimage")
Scraper 2:
[login to view URL]:on-55326/page/2
Scraper 3 is a Java function with 2 parameters: the name of an artist and a LanguageCode.
The LanguageCode parameter can be: EN DE IT ES FR PT JA RU CZ
The scraper searches for the artist's page in the given language on [login to view URL] and returns the following information:
- URL of the artist's Wikipedia page in LanguageCode
- Artist name (some names vary between languages)
- URL of the artist's portrait image (JPG)
- Artist's date of birth
- Artist's date of death
- Country where the artist was born
- City where the artist was born
- Extract of the artist's biography in LanguageCode (maximum 10 KB), as plain text without references or painting examples, covering only the most important part of the biography
Example request: GetArtistInfo( "Leonardo da vinci" , "FR" ) returns info from [login to view URL]éonard_de_Vinci
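The LanguageCode values listed above do not all match Wikipedia's language subdomains directly: the Czech edition uses `cs`, not `cz`. The following is a rough sketch of how the parameter might be validated and normalized, assuming that "CZ" means Czech and that codes should be accepted case-insensitively (the two example requests use "FR" and "fr"). The class `WikiLanguage` and its methods are hypothetical names, not part of the requested API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps the posting's LanguageCode values to Wikipedia language subdomains. */
final class WikiLanguage {
    private static final Map<String, String> SUBDOMAINS = new HashMap<String, String>();

    static {
        // These codes match the Wikipedia subdomain once lowercased.
        String[] sameAsSubdomain = {"EN", "DE", "IT", "ES", "FR", "PT", "JA", "RU"};
        for (String code : sameAsSubdomain) {
            SUBDOMAINS.put(code, code.toLowerCase(Locale.ENGLISH));
        }
        // Assumption: "CZ" in the posting means Czech, whose Wikipedia subdomain is "cs".
        SUBDOMAINS.put("CZ", "cs");
    }

    private WikiLanguage() {}

    /** True if the code (any case, surrounding spaces ignored) is one of the nine listed codes. */
    static boolean isSupported(String languageCode) {
        return languageCode != null
                && SUBDOMAINS.containsKey(languageCode.trim().toUpperCase(Locale.ENGLISH));
    }

    /** Returns the Wikipedia language subdomain, e.g. "fr" for "FR"; rejects unknown codes. */
    static String wikiCode(String languageCode) {
        if (!isSupported(languageCode)) {
            throw new IllegalArgumentException("Unsupported language code: " + languageCode);
        }
        return SUBDOMAINS.get(languageCode.trim().toUpperCase(Locale.ENGLISH));
    }
}
```

Rejecting unknown codes up front avoids issuing a request to a host that does not exist.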
Scraper 4 is a Java function with 4 parameters: the name of the museum, the city of the museum, the country of the museum, and a LanguageCode.
The LanguageCode parameter can be: EN DE IT ES FR PT JA RU CZ
The scraper searches for the museum's page in the given language on [login to view URL] and returns the following information:
- URL of the museum's Wikipedia page in LanguageCode
- Museum name found in LanguageCode (some names vary between languages)
- URL of the museum's photo image (JPG)
- Country of the museum as found on the page
- City of the museum as found on the page
- Museum description text in LanguageCode (maximum 5 KB), as plain text without references or painting examples, covering only the most important part of the museum description
- URL of the museum's official website
Example request: GetMuseumInfos( "Louvre" , "paris", "france", "fr" ) returns info from [login to view URL]ée_du_Louvre
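Both the biography extract (10 KB) and the museum description (5 KB) have a size cap. Below is a minimal sketch of how the cap might be enforced, assuming the limits are measured in bytes of UTF-8 text (the posting does not state an encoding or whether "Kb" means 1000 or 1024 bytes). The class `TextLimit` and its method are hypothetical. It cuts on a character boundary so that accented or non-Latin characters are never split in half.

```java
/** Hypothetical helper: trims text so its UTF-8 encoding fits in a byte budget. */
final class TextLimit {
    private TextLimit() {}

    /**
     * Returns the longest prefix of text whose UTF-8 encoding is at most maxBytes long.
     * A null input yields an empty string.
     */
    static String truncateUtf8(String text, int maxBytes) {
        if (text == null) {
            return "";
        }
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // Number of bytes UTF-8 uses for this code point.
            int size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + size > maxBytes) {
                break;
            }
            bytes += size;
            // Advance by 1 char, or by 2 for a surrogate pair.
            i += Character.charCount(cp);
        }
        return text.substring(0, i);
    }
}
```

Under these assumptions, a call such as `TextLimit.truncateUtf8(description, 5 * 1024)` would cover the museum case and `10 * 1024` the artist case. Choosing which part of the page counts as "most important" is a separate problem that this sketch does not address.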
The project deliverables are:
- Java source code for the 4 scrapers
- 4 sample outputs, each containing at least 50 lines (Scrapers 1 & 2) or 10 requests (Scrapers 3 & 4)
Hello
I am a Java expert with previous experience in jsoup and am interested in this project. I have reviewed your requirements and am confident I can handle this project.
Please message me to discuss further.
Regards
Anshu
Hi sir,
I am a scraping expert and have done many similar projects; please check my feedback to see for yourself.
Can you give me more details? Then I will provide demo data for you.
Thanks,
Kimi
Hey,
I have a lot of experience with jsoup and Java. I've just completed a jsoup project where I was able to achieve 1000 connections per 1-2 minutes on a large shopping website called ezbuy. Your requirements can be done within 7 days.
Message me if you're interested.
Sincerely,
Owen McMonagle.
Software Eureka.
I am experienced with jsoup and need 5 days to finish the project because of the French sites.
Questions:
Should scrapers 1 & 2 download all the data, or only the data matching a given request? (If the latter, please give me an example request.)
Do you need logging? If yes, tell me which library you prefer (for example log4j).
If you want, I can store the data in a database using Hibernate and any database (+$20).