Focuspoints' Blog

A blog about my IT life.


Hey, welcome to my little blog~

Python crawler: scraping posts from Qiushibaike

Techniques used

  • Python
  • The requests module
  • The re module
  • Regular expressions
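Below is a small offline illustration of how the re module and the regular expression in the script combine. It runs against a made-up HTML fragment, so the markup, the pic.example.com URLs and the variable names are assumptions for illustration and do not reproduce Qiushibaike's real page. It shows why the pattern is used together with re.S and the non-greedy .*?:

```python
import re

# Hypothetical HTML fragment, shaped like the markup the pattern expects
sample_html = '''
<div class="thumb">
<a href="/article/1"><img src="//pic.example.com/a.jpg" alt="first"></a>
</div>
<div class="thumb">
<a href="/article/2"><img src="//pic.example.com/b.jpg" alt="second"></a>
</div>
'''

# Same pattern as in the script: capture the src of the <img> inside each thumb div
pattern = '<div class="thumb">.*?<img src="(.*?)" alt=.*?</div>'

# Without re.S, "." does not match "\n", so the pattern cannot cross the
# line break between <div class="thumb"> and <img>, and nothing is found.
without_dotall = re.findall(pattern, sample_html)

# With re.S (DOTALL), "." also matches "\n". The non-greedy ".*?" stops at the
# nearest possible match, so each thumb div yields exactly one src value.
with_dotall = re.findall(pattern, sample_html, re.S)

print(without_dotall)
print(with_dotall)
```

With a greedy .* instead, the first match could swallow everything up to the last </div> on the page, so the non-greedy form is what keeps one result per image.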

Scraping Qiushibaike's trending images

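import os
import re

import requests

# Save the images into a local "pic" folder (created if it does not exist)
os.makedirs('pic', exist_ok=True)

# Send a browser User-Agent so the site does not reject the request as a bot
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Safari/537.36"
}

url = 'https://www.qiushibaike.com/imgrank/'

# Fetch the HTML of the trending-images page
response = requests.get(url=url, headers=headers)
response.raise_for_status()
page_text = response.text

# Extract the src attribute of each <img> inside <div class="thumb">;
# re.S lets "." match newlines so the pattern can span several lines
ex = '<div class="thumb">.*?<img src="(.*?)" alt=.*?</div>'
img_url_list = re.findall(ex, page_text, re.S)

for img_url in img_url_list:
    # The src values are protocol-relative ("//host/..."), so prepend the scheme
    img_url = 'https:' + img_url
    imgPath = os.path.join('pic', img_url.split('/')[-1])
    # Request the image with the same headers and write its raw bytes to disk
    img_data = requests.get(url=img_url, headers=headers).content
    with open(imgPath, 'wb') as fp:
        fp.write(img_data)
    print(imgPath + ' downloaded successfully!')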
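The loop builds each image URL by prepending 'https:', which assumes every src is protocol-relative ("//host/path"); an absolute src would turn into "https:https://...". Here is a sketch of a more forgiving alternative using the standard library's urllib.parse. The helper names to_absolute and filename_for and the sample src values are hypothetical, chosen only to illustrate the idea:

```python
import os
from urllib.parse import urljoin, urlsplit

# Page the images were scraped from (taken from the script above)
PAGE_URL = 'https://www.qiushibaike.com/imgrank/'


def to_absolute(src, page_url=PAGE_URL):
    """Hypothetical helper: resolve an <img> src against the page URL.

    Handles protocol-relative ("//host/path"), root-relative ("/path")
    and already-absolute ("https://...") values alike.
    """
    return urljoin(page_url, src)


def filename_for(img_url, folder='pic'):
    """Hypothetical helper: local file path from the last segment of the URL path.

    Only the path component is used, so a query string such as "?v=1" is dropped.
    """
    path = urlsplit(img_url).path
    return os.path.join(folder, os.path.basename(path))


# Hypothetical src values covering the three cases
for src in ['//pic.example.com/img/a.jpg',
            'https://cdn.example.com/b.png?v=1',
            '/img/c.gif']:
    full = to_absolute(src)
    print(full, '->', filename_for(full))
```

urljoin leaves absolute URLs untouched and fills in the scheme and host for relative ones, and taking the basename of the parsed path keeps any query string out of the saved file name.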