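# How to Scrape HD Anime Desktop Wallpapers with Python
HD anime wallpapers are popular with fans. This article explains how to scrape anime desktop wallpapers with Python, covering technology choices, anti-scraping countermeasures, and working code.
## I. Technology Choices and Tool Setup
### 1. Core Packages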
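```python
import requests                 # HTTP requests
from bs4 import BeautifulSoup   # HTML parsing
import os                       # file operations
import time                     # delays between requests
```

Install the third-party packages with:

```bash
pip install requests beautifulsoup4
```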
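A typical wallpaper site has the following characteristics:
- Paginated URLs follow a predictable pattern, e.g. https://wallhaven.cc/search?q=anime&page=2
- Each image's detail page contains the download link for the original-size file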
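
The pagination pattern above can be generated rather than typed by hand. The following is a minimal sketch, assuming the search endpoint and the `q` / `page` parameters shown in the example URL; the helper names `build_search_url` and `build_search_urls` are hypothetical and not part of the original article.

```python
from urllib.parse import urlencode

# Search endpoint taken from the example URL in the article.
BASE_URL = "https://wallhaven.cc/search"


def build_search_url(keyword, page):
    """Return the search URL for one results page (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the keyword.
    return f"{BASE_URL}?{urlencode({'q': keyword, 'page': page})}"


def build_search_urls(keyword, pages):
    """Return the URLs for pages 1..pages, in order."""
    return [build_search_url(keyword, p) for p in range(1, pages + 1)]


if __name__ == "__main__":
    for url in build_search_urls("anime", 3):
        print(url)
```

Using `urlencode` instead of an f-string on the raw keyword keeps the URL valid when the keyword contains spaces or non-ASCII characters.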
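The relevant CSS selectors, where `soup` is the parsed page:

```python
# On a search results page: the thumbnail containers
thumbnails = soup.select('figure.thumb')
# On an image detail page: the link to the full-resolution image
hd_url = soup.select_one('#wallpaper')['src']
```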
Fetching the search result pages:

```python
def get_wallpapers(keyword='anime', pages=3):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    for page in range(1, pages + 1):
        url = f'https://wallhaven.cc/search?q={keyword}&page={page}'
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Parse the image list...
```
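Downloading a single image:

```python
def download_image(url, save_dir='wallpapers'):
    # Create the target directory if it does not exist yet
    os.makedirs(save_dir, exist_ok=True)
    filename = os.path.join(save_dir, url.split('/')[-1])
    # Stream the response so large images are not held in memory
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # do not save an error page as an image
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f'Saved: {filename}')
```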
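
Taking the last path segment with `url.split('/')[-1]` works for plain image URLs, but it keeps any query string or fragment in the file name. A more defensive variant might look like the sketch below; `filename_from_url` and its `default` argument are hypothetical names introduced here for illustration, and the URLs in the usage lines are placeholders.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="image.jpg"):
    """Derive a local file name from a URL (hypothetical helper).

    Only the path component is used, so query strings and fragments
    are dropped; percent-escapes such as %20 are decoded.
    """
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    # Fall back to a default when the URL ends with a slash.
    return name or default


if __name__ == "__main__":
    print(filename_from_url("https://example.com/full/ab/pic-123.jpg?download=1"))
    print(filename_from_url("https://example.com/"))
```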
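To vary the User-Agent header, the `fake_useragent` package can generate one at random:

```python
from fake_useragent import UserAgent

ua = UserAgent()
headers = {'User-Agent': ua.random}
```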
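
If installing an extra package is not desirable, a similar effect can be approximated with the standard library alone. This is only a sketch: the `USER_AGENTS` list and the `random_headers` helper are assumptions made for illustration, and the strings are sample browser identifiers that would need to be kept current in practice.

```python
import random

# Illustrative sample values, not an authoritative or up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers(rng=random):
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


if __name__ == "__main__":
    print(random_headers())
```

The returned dictionary can be passed as the `headers` argument of `requests.get`, in the same way as the `headers` dictionaries elsewhere in this article.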
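Requests can also be routed through a proxy (the address below is a placeholder for a local proxy):

```python
proxies = {
    'http': 'http://127.0.0.1:1080',
    # Most local proxies speak plain HTTP, so HTTPS traffic is also sent
    # via an http:// proxy URL; use https:// only if the proxy accepts TLS.
    'https': 'http://127.0.0.1:1080'
}
```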
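Downloading with a thread pool, where `img_urls` is the list of collected image URLs:

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(download_image, img_urls)
```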
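
One caveat with `executor.map` is that an exception raised in a worker only surfaces when the results are iterated, so a failed download goes unnoticed if the return value is discarded. The sketch below shows one way to collect failures explicitly with `submit` and `as_completed`; `download_all` is a hypothetical helper, and `worker` stands in for a function such as `download_image`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, worker, max_workers=5):
    """Run worker(url) for every URL; return (succeeded, failed) lists.

    `failed` holds (url, exception) pairs so the caller can retry or log.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its URL so failures can be reported.
        futures = {executor.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(url)
            except Exception as exc:
                failed.append((url, exc))
    return succeeded, failed


if __name__ == "__main__":
    def fake_worker(url):
        # Stand-in for a real download function; fails on one input.
        if "bad" in url:
            raise ValueError(f"cannot fetch {url}")

    ok, bad = download_all(["a.jpg", "bad.jpg", "c.jpg"], fake_worker)
    print(len(ok), "succeeded,", len(bad), "failed")
```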
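Adding a random delay between requests:

```python
import random
import time

time.sleep(random.uniform(1, 3))  # pause 1 to 3 seconds between requests
```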
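
The random delay can be wrapped in a small function so that every request path uses the same policy. This is a minimal sketch; `polite_sleep` and its parameters are hypothetical names, and the injectable `sleep` argument exists only so the function can be exercised without actually waiting.

```python
import random
import time


def polite_sleep(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Pause for a random duration in [min_s, max_s] and return it."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


if __name__ == "__main__":
    waited = polite_sleep(0.1, 0.3)
    print(f"waited {waited:.2f}s")
```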
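The complete script:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

HEADERS = {'User-Agent': 'Mozilla/5.0'}

def get_hd_url(detail_url):
    """Return the full-size image URL from a detail page, or None."""
    res = requests.get(detail_url, headers=HEADERS, timeout=10)
    img = BeautifulSoup(res.text, 'html.parser').select_one('#wallpaper')
    return img['src'] if img else None

def download_image(url, save_dir='wallpapers'):
    """Stream one image into save_dir."""
    os.makedirs(save_dir, exist_ok=True)
    filename = os.path.join(save_dir, url.split('/')[-1])
    with requests.get(url, headers=HEADERS, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f'Saved: {filename}')

def main():
    keyword = input("Search keyword (e.g. anime): ")
    pages = int(input("Number of pages to scrape: "))
    base_url = "https://wallhaven.cc/search"
    img_urls = []
    for page in range(1, pages + 1):
        params = {'q': keyword, 'page': page}
        res = requests.get(base_url, params=params, headers=HEADERS, timeout=10)
        soup = BeautifulSoup(res.text, 'html.parser')
        for fig in soup.select('figure.thumb'):
            # urljoin handles absolute, relative, and protocol-relative links
            detail_link = urljoin(res.url, fig.a['href'])
            hd_url = get_hd_url(detail_link)
            if hd_url:
                img_urls.append(hd_url)
        time.sleep(2)  # pause between result pages
    with ThreadPoolExecutor(4) as executor:
        list(executor.map(download_image, img_urls))  # iterate to surface errors

if __name__ == '__main__':
    main()
```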
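With the approach described in this article, you can build your own anime wallpaper collection. Start with a small test run, follow the target site's crawling rules, and enjoy the convenience and fun the technique brings!

Note: before running the code, replace the example site with a target you are legally permitted to scrape, and make sure you comply with that site's terms of service.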