
liaojianqiang/Ali_Auction

 
 


Ali_Auction

Publication

SIGHCI 2023 Proceedings: When Online Auction Meets Virtual Reality: An Empirical Investigation

Data

Source

Historical auction records of houses from the Ali auction website

Details

Collection requirements:

  • Type of subject matter: residential houses
  • Location of the subject matter: Suzhou, Wuxi, Hangzhou, Wenzhou, Hefei, Chengdu
  • Type of asset: unlimited
  • Auction status: terminated (main), suspended, withdrawn
  • Date range: January 1, 2020 to June 30, 2022
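The requirements above can be read as a record filter. The following is only a minimal sketch of that idea, not code from this repository: the class name `CollectionRequirements` and the record keys (`subject_type`, `city`, `status`, `auction_date`) are hypothetical, and the asset type is left unchecked because it is unlimited.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectionRequirements:
    """Hypothetical container for the collection requirements listed above."""

    subject_type: str = "residential houses"
    cities: tuple = ("Suzhou", "Wuxi", "Hangzhou", "Wenzhou", "Hefei", "Chengdu")
    statuses: tuple = ("terminated", "suspended", "withdrawn")
    start: date = date(2020, 1, 1)
    end: date = date(2022, 6, 30)

    def matches(self, record: dict) -> bool:
        """Return True when a record satisfies every requirement.

        The record keys are assumptions made for this sketch.
        """
        return (
            record.get("subject_type") == self.subject_type
            and record.get("city") in self.cities
            and record.get("status") in self.statuses
            and self.start <= record["auction_date"] <= self.end
        )
```

Both ends of the date range are treated as inclusive, so a record dated June 30, 2022 is kept.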

Crawling

Workflow

[Image: crawling workflow diagram]

Crawling the listpage and detail page

crawler_alfp.py

Crawl the content of each subject matter, covering both the listpage and the corresponding detail page. First, set crawler_list = True on line 347 of crawler_alfp_city.py to collect the listpage (the results need to be sliced). Then set crawler_list = False to collect the detail pages. The second step takes a long time.

This step produces listpage.csv, source.csv, and local HTML files.
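The two-phase use of the crawler_list flag could look roughly like the sketch below. It is an illustration under assumptions, not the contents of the crawler script: the function `crawl`, the injected `fetch_list` and `fetch_detail` callables, and the CSV column names are all hypothetical, and the slicing step mentioned above is not modeled.

```python
import csv
from pathlib import Path


def crawl(crawler_list, out_dir, fetch_list=None, fetch_detail=None, list_urls=()):
    """Run one phase of a two-phase crawl, selected by the crawler_list flag.

    fetch_list(url) is assumed to return (item_id, detail_url) pairs and
    fetch_detail(url) is assumed to return the page HTML as a string.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    listpage_csv = out_dir / "listpage.csv"

    if crawler_list:
        # Phase 1: collect the listpage entries into listpage.csv.
        with listpage_csv.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["item_id", "detail_url"])
            for url in list_urls:
                for item_id, detail_url in fetch_list(url):
                    writer.writerow([item_id, detail_url])
        return listpage_csv

    # Phase 2: read listpage.csv, save each detail page, and index it in source.csv.
    source_csv = out_dir / "source.csv"
    html_dir = out_dir / "html"
    html_dir.mkdir(exist_ok=True)
    with listpage_csv.open(newline="", encoding="utf-8") as src, \
            source_csv.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["item_id", "html_path"])
        for row in csv.DictReader(src):
            html_path = html_dir / f"{row['item_id']}.html"
            html_path.write_text(fetch_detail(row["detail_url"]), encoding="utf-8")
            writer.writerow([row["item_id"], str(html_path)])
    return source_csv
```

Passing the fetch functions in as arguments keeps the sketch free of network code, so each phase can be tried with stub functions before any real request is sent.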

Parsing data

parse_source.py

Run the parsing code to clean the fields in source.csv and produce the parsed data. After all the pages are collected, edit the last line of parse_source.py so that the argument passed to run is the path of the existing source.csv. The script then parses and normalizes the data.

This step produces the std_city_final.csv file.
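As a rough illustration of the parse-and-normalize step, the sketch below reads a source.csv, trims every field, converts a price column to a number, and writes std_city_final.csv next to the input. The actual signature of run and the real field names are not given here, so `run(source_path, output_name)`, `clean_row`, `normalize_price`, and the `price` column are assumptions.

```python
import csv
import re
from pathlib import Path


def normalize_price(text):
    """Turn a price string such as '1,234,500' into a float, or None if empty."""
    digits = re.sub(r"[^\d.]", "", text or "")
    try:
        return float(digits)
    except ValueError:
        return None


def clean_row(row):
    """Strip whitespace from every field and normalize the assumed 'price' field."""
    cleaned = {key: (value or "").strip() for key, value in row.items()}
    cleaned["price"] = normalize_price(cleaned.get("price", ""))
    return cleaned


def run(source_path, output_name="std_city_final.csv"):
    """Parse source.csv and write the standardized file beside it."""
    source_path = Path(source_path)
    output_path = source_path.with_name(output_name)
    with source_path.open(newline="", encoding="utf-8") as src:
        rows = [clean_row(row) for row in csv.DictReader(src)]
    if rows:
        with output_path.open("w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return output_path
```

In this sketch the call on the last line of the script would be something like run("path/to/source.csv"), with the path pointing at the file produced by the crawling step.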

Downloading attachments

get_file.py

After all the fields are parsed and standardized, run get_file.py to download the attachments to the local machine.
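A minimal sketch of the download step, using only the Python standard library, is shown below. It is not the code in get_file.py: the function name `download_attachment`, the way the file name is derived from the URL, and the skip-if-present behavior are assumptions made for illustration.

```python
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def download_attachment(url, dest_dir):
    """Download one attachment into dest_dir and return the local path.

    The file name is taken from the last segment of the URL path. A file that
    already exists locally is not downloaded again.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    name = Path(unquote(urlparse(url).path)).name or "attachment"
    target = dest_dir / name
    if target.exists():
        return target
    # Stream the response to disk so large files are not held in memory.
    with urlopen(url, timeout=30) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Skipping files that are already present lets an interrupted run be restarted without fetching everything again, although a partially written file would need to be deleted by hand first.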

About

Crawl data from https://sf.taobao.com/ and clean data
