update readme

csuldw · Sep 4, 2019 · 9fc4a8b
1 parent 302eaaa

Showing 3 changed files with 4 additions and 2 deletions.
diff --git a/README.MD b/README.MD
@@ -1,8 +1,8 @@
-# Crawling Ten Million Douban Movie Reviews
+# Guide to Crawling 3 Million Douban Movie Reviews
 
 ## Introduction
 
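-This project is a crawler that covers Douban movies, celebrities, books, and reviews in one place. I am still writing up the details of the crawler code, so please bear with me. For a description of the crawler framework and an analysis of the crawled data, see my articles below. With a proxy configured and the concurrency raised to 1000, a single Mac can crawl ten million reviews in one night; without a proxy, it is a different story. Free proxies can be found online, but they do not work well. I paid a few dozen yuan to a proxy provider for one week of service, which was plenty. I will not advertise any provider here; if you would like a recommendation, follow my official account and leave a message there. I reply every day.
+This project is a crawler that covers Douban movies, celebrities, books, and reviews in one place. I am still writing up the details of the crawler code, so please bear with me. For a description of the crawler framework and an analysis of the crawled data, see my articles below. With a proxy configured and the concurrency raised, a single Mac can crawl 3 million review records (movies + actors + reviews) in less than one night; without a proxy, it is a different story. Free proxies can be found online, but they do not work well. I paid a few dozen yuan to a proxy provider for one week of service, which was plenty. I will not advertise any provider here; if you would like a recommendation, follow my official account and leave a message there. I reply every day.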
 
 
 1. [How the Crawl of 130,000 Douban Movie Records Works](http://www.csuldw.com/2019/08/29/2019-08-29-douban-spider/)

diff --git a/scrapy/douban/spiders/__pycache__/person_meta.cpython-37.pyc b/scrapy/douban/spiders/__pycache__/person_meta.cpython-37.pyc
diff --git a/scrapy/douban/spiders/person_meta.py b/scrapy/douban/spiders/person_meta.py
@@ -77,6 +77,8 @@ def get_birth(self, meta, response):
         print("============get_birth:", data)
         if data:
             meta['birth'] = validator.str_to_date(validator.match_date(data[0].strip("\n")))
+            if not meta['birth']:
+                meta['birth'] = data[0].strip("\n").split(":")[-1]
         return meta
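
The fallback added in `get_birth` can be illustrated in isolation. The sketch below is not the project's code: `match_date` and `str_to_date` are hypothetical stand-ins for the repository's `validator` helpers (whose real implementations are not shown in this diff), and `parse_birth` is an illustrative name. It assumes the helpers return a falsy value when no full date can be parsed, which is the case the new lines handle.

```python
import re
from datetime import date, datetime
from typing import Optional, Union

# Assumed date shape for this sketch only; the real validator may accept more.
DATE_RE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")


def match_date(text: str) -> Optional[str]:
    """Hypothetical stand-in: return the first YYYY-MM-DD substring, or None."""
    found = DATE_RE.search(text)
    return found.group(0) if found else None


def str_to_date(value: Optional[str]) -> Optional[date]:
    """Hypothetical stand-in: parse YYYY-MM-DD, returning None on failure."""
    if not value:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return None


def parse_birth(raw: str) -> Union[date, str]:
    """Mirror the commit's logic: try a strict date first, else keep raw text."""
    text = raw.strip("\n")
    parsed = str_to_date(match_date(text))
    if parsed:
        return parsed
    # Fallback from the commit: keep whatever follows the last ASCII colon.
    return text.split(":")[-1]


print(parse_birth("Birth: 1962-07-03\n"))
print(repr(parse_birth("Birth: 1962\n")))
```

Two properties of the fallback are worth noting: the returned string keeps any whitespace after the colon, and `split(":")` only splits on the ASCII colon, so a field separated by a full-width colon would be returned whole.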