# Scraping Baotu Short Videos with Python Requests
## Preface
Video scraping has long been one of the trickier problems in web-crawler development. Using Baotu (ibaotu.com) as an example, this article walks through automating small-video downloads with Python's Requests library. It covers the complete workflow: environment setup, page analysis, anti-scraping countermeasures, and video downloading, with runnable code samples throughout.
---
## 1. Environment Setup
### 1.1 Install the Required Libraries
```bash
pip install requests
pip install beautifulsoup4
pip install fake-useragent
```
## 2. Page Analysis
### 2.1 URL Structure
A Baotu video detail page URL looks like:

`https://ibaotu.com/shipin/7-0-0-0-0-1.html`

The key parameters:

- the digit `7` is the video ID
- the trailing `1` is the page number

### 2.2 Locating the Video Source
Inspecting the page with the browser's developer tools reveals that:

1. The real video address is hidden in the `src` attribute of the `<video>` tag.
2. Some videos are loaded dynamically, which requires parsing the JavaScript.
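Given the URL pattern above, the list-page URLs for a multi-page crawl can be generated programmatically. A minimal sketch, assuming the other positions in the pattern stay fixed (which may vary by category):

```python
def list_page_url(page):
    """Build a Baotu list-page URL following the 7-0-0-0-0-<page>.html
    pattern shown above; the fixed positions are an assumption."""
    return f'https://ibaotu.com/shipin/7-0-0-0-0-{page}.html'

# Generate URLs for the first three pages
urls = [list_page_url(p) for p in range(1, 4)]
```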
## 3. Basic Scraping Code
### 3.1 Fetching and Parsing the Detail Page

```python
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}

def get_video_page(video_id):
    url = f'https://ibaotu.com/shipin/{video_id}.html'
    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup

def extract_video_url(soup):
    video_tag = soup.find('video', {'class': 'vjs-tech'})
    if video_tag:
        return video_tag.get('src')
    return None
```
### 3.2 Handling Dynamically Loaded Videos
Some videos are loaded via AJAX, so the XHR request has to be reproduced directly:

```python
def get_dynamic_video(video_id):
    api_url = f'https://ibaotu.com/ajax/getVideoInfo.php?id={video_id}'
    response = requests.get(api_url, headers=headers, timeout=10)
    return response.json().get('video_url')
```
## 4. Anti-Scraping Countermeasures
### 4.1 Randomized User-Agent

```python
from fake_useragent import UserAgent

ua = UserAgent()
headers = {
    'User-Agent': ua.random,
    'Referer': 'https://ibaotu.com/'
}
```

### 4.2 Routing Through a Proxy

```python
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888'
}
response = requests.get(url, proxies=proxies)
```

### 4.3 Randomized Request Delays

```python
import time
import random

time.sleep(random.uniform(1, 3))
```
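The delay idea combines naturally with retries. A minimal helper sketch, not part of the original article, that backs off exponentially with jitter between failed attempts:

```python
import random
import time

import requests

def get_with_retry(url, headers=None, retries=3, timeout=10):
    """Fetch a URL, retrying on failure with exponential backoff plus jitter.

    Illustrative helper; parameter defaults are assumptions, not values
    from the article.
    """
    for attempt in range(retries):
        try:
            resp = requests.get(url, headers=headers, timeout=timeout)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            # Wait 2^attempt seconds plus random jitter before retrying
            time.sleep((2 ** attempt) + random.uniform(0, 1))
```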
## 5. Downloading the Video
### 5.1 Streaming Download
Downloading with `stream=True` keeps memory usage flat by writing the file in chunks:

```python
def download_video(url, save_path):
    response = requests.get(url, stream=True, headers=headers)
    with open(save_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            if chunk:
                f.write(chunk)
```
### 5.2 Resuming an Interrupted Download
A `Range` header tells the server to resume from the bytes already on disk:

```python
import os

def resume_download(url, save_path):
    # Copy the headers so the shared global dict is not mutated
    resume_headers = dict(headers)
    resume_headers['Range'] = f'bytes={os.path.getsize(save_path)}-'
    response = requests.get(url, headers=resume_headers, stream=True)
    with open(save_path, 'ab') as f:
        for chunk in response.iter_content(chunk_size=1024):
            f.write(chunk)
```
## 6. Complete Example

```python
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent


class BaoTuVideoDownloader:
    def __init__(self):
        self.ua = UserAgent()
        self.session = requests.Session()

    def get_headers(self):
        return {
            'User-Agent': self.ua.random,
            'Referer': 'https://ibaotu.com/'
        }

    def get_video_info(self, video_id):
        # Combine both extraction strategies
        html_url = f'https://ibaotu.com/shipin/{video_id}.html'
        api_url = f'https://ibaotu.com/ajax/getVideoInfo.php?id={video_id}'
        # Try the API endpoint first
        try:
            resp = self.session.get(api_url, headers=self.get_headers())
            if resp.status_code == 200:
                return resp.json().get('video_url')
        except Exception:
            pass
        # Fall back to parsing the HTML
        try:
            resp = self.session.get(html_url, headers=self.get_headers())
            soup = BeautifulSoup(resp.text, 'html.parser')
            video = soup.find('video', {'class': 'vjs-tech'})
            if video:
                return video.get('src')
        except Exception:
            pass
        return None

    def download(self, video_url, save_path):
        try:
            with self.session.get(video_url, stream=True,
                                  headers=self.get_headers()) as r:
                r.raise_for_status()
                with open(save_path, 'wb') as f:
                    for chunk in r.iter_content(chunk_size=8192):
                        f.write(chunk)
            return True
        except Exception as e:
            print(f"Download failed: {e}")
            return False


if __name__ == '__main__':
    downloader = BaoTuVideoDownloader()
    video_url = downloader.get_video_info('12345')  # Replace with a real video ID
    if video_url:
        downloader.download(video_url, 'demo.mp4')
```
## 7. Format Conversion
If the downloaded file is an FLV, FFmpeg can remux it to MP4 without re-encoding:

```bash
ffmpeg -i input.flv -c copy output.mp4
```
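To fold the conversion into the Python pipeline, the same command can be driven through `subprocess`. A sketch assuming the `ffmpeg` binary is on `PATH`; the wrapper itself is illustrative, not from the original article:

```python
import shutil
import subprocess

def remux_to_mp4(src, dst):
    """Remux a downloaded FLV to MP4 via FFmpeg (stream copy, no re-encode)."""
    if shutil.which('ffmpeg') is None:
        raise RuntimeError('ffmpeg not found on PATH')
    # -y overwrites an existing output file; check=True raises on failure
    subprocess.run(['ffmpeg', '-y', '-i', src, '-c', 'copy', dst], check=True)
```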
## Summary
This article walked through the full workflow of scraping Baotu videos with the Requests library. A production scraper would also need to consider advanced features such as:

- CAPTCHA recognition
- Maintaining a logged-in session
- Distributed crawling
- Resource deduplication
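Of these, resource deduplication is the easiest to sketch: hash each downloaded file and skip content whose hash has already been seen. A minimal illustration, not part of the original article:

```python
import hashlib

def file_md5(path, chunk_size=8192):
    """Hash a file in chunks so large videos never load fully into memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

class Deduplicator:
    """Track seen hashes; report whether a digest is new or a duplicate."""
    def __init__(self):
        self.seen = set()

    def is_new(self, digest):
        if digest in self.seen:
            return False
        self.seen.add(digest)
        return True
```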
Start with a small-scale test run, and only widen the crawl once the results check out. The complete project code is hosted on GitHub (example address).

Note: the code samples in this article are for reference only. Comply with the target site's terms of service, and consider confirming permission with the site before scraping, to avoid legal risk.