# How to Scrape Wallpaper Site Data with Python
Web crawlers are an effective tool for collecting publicly available data from the internet. Using a wallpaper site as an example, this article walks through the complete workflow of scraping image data with Python, built on the `requests`, `BeautifulSoup`, and `os` libraries.
## 1. Preparation

### 1.1 Install the Required Libraries
```bash
pip install requests beautifulsoup4
```
### 1.2 Analyze the Target Site

Using Wallhaven.cc as the example (when crawling for real, respect the site's robots.txt rules; see the sketch below):

- Page structure: wallpapers are displayed as thumbnails
- Image URL pattern: clicking a thumbnail opens a detail page that holds the full-resolution image
- Pagination: the URL carries a page parameter
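For responsible crawling, you can check the site's robots.txt before fetching anything. Below is a minimal sketch using the standard library's `urllib.robotparser`; the user-agent string and URLs are illustrative assumptions:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name and target URLs, for illustration only
rp = RobotFileParser()
rp.set_url("https://wallhaven.cc/robots.txt")
rp.read()  # fetch and parse robots.txt

if rp.can_fetch("MyCrawler/1.0", "https://wallhaven.cc/latest"):
    print("Crawling this path is permitted")
else:
    print("robots.txt disallows this path")
```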
## 2. Build the Basic Crawler

### 2.1 Fetch Page HTML

First, write a helper that requests a page and returns its HTML, handling network errors gracefully:

```python
import requests
from bs4 import BeautifulSoup

def get_html(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    }
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # raise on 4xx/5xx responses
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None
```
### 2.2 Parse Image Links

Extract the thumbnail URLs from the listing page and rewrite them to point at the full-size images:

```python
def parse_links(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Thumbnails carry the image URL in a data-src attribute
    thumbnails = soup.select('img.preview[data-src]')
    # Swap the 'small' path segment for 'full' to get the original image
    return [img['data-src'].replace('small', 'full') for img in thumbnails]
```
### 2.3 Crawl Multiple Pages

Loop over the paginated listing, parse each page, and hand the links to the downloader:

```python
def crawl_multiple_pages(base_url, max_pages=5):
    for page in range(1, max_pages + 1):
        url = f"{base_url}/page/{page}"
        html = get_html(url)
        if html:
            image_links = parse_links(html)
            download_images(image_links)
```
### 2.4 Download Images

Stream each image to disk in chunks so large files do not have to fit in memory. The request is made before the file is opened, so a failed request does not leave an empty file behind:

```python
import os

def download_images(urls, folder='wallpapers'):
    if not os.path.exists(folder):
        os.makedirs(folder)
    for url in urls:
        try:
            filename = os.path.join(folder, url.split('/')[-1])
            response = requests.get(url, stream=True)
            response.raise_for_status()
            with open(filename, 'wb') as f:
                for chunk in response.iter_content(1024):
                    f.write(chunk)
            print(f"Downloaded: {filename}")
        except Exception as e:
            print(f"Download failed {url}: {e}")
```
## 3. Anti-Scraping Measures

Many sites block requests that look automated. Sending browser-like headers helps:

```python
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://wallhaven.cc/',
    'Accept-Language': 'en-US,en;q=0.9'
}
```
Randomized delays between requests reduce server load and make traffic look less mechanical:

```python
import time
import random

def delayed_request(url):
    # Sleep 1-3 seconds before each request
    time.sleep(random.uniform(1, 3))
    return requests.get(url, headers=headers)
```
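Beyond fixed delays, a shared `requests.Session` with automatic retries smooths over transient failures and reuses connections. This is a sketch under the assumption that the `headers` dict above is in scope; the retry parameters are illustrative:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
session.headers.update(headers)  # reuse the browser-like headers defined above

# Retry up to 3 times with exponential backoff on common transient errors
retry = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503])
adapter = HTTPAdapter(max_retries=retry)
session.mount("https://", adapter)
session.mount("http://", adapter)

response = session.get("https://wallhaven.cc/latest", timeout=10)
```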
## 4. Complete Example

Putting everything together into a reusable class:

```python
import os
import time
import random
import requests
from bs4 import BeautifulSoup

class WallpaperCrawler:
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0',
            'Referer': 'https://wallhaven.cc/'
        }

    def crawl(self, category='general', pages=3):
        base_url = f"https://wallhaven.cc/{category}"
        for page in range(1, pages + 1):
            self.download_page(f"{base_url}?page={page}")
            # Pause between pages to avoid hammering the server
            time.sleep(random.uniform(2, 5))

    def download_page(self, url):
        html = self.get_html(url)
        if html:
            soup = BeautifulSoup(html, 'html.parser')
            wallpapers = soup.select('figure.thumb')
            for wp in wallpapers:
                detail_url = wp.find('a')['href']
                self.download_wallpaper(detail_url)

    def download_wallpaper(self, url):
        html = self.get_html(url)
        if html:
            soup = BeautifulSoup(html, 'html.parser')
            img_url = soup.select_one('img#wallpaper')['src']
            self.save_image(img_url)

    def save_image(self, url):
        try:
            os.makedirs('wallpapers', exist_ok=True)  # ensure output folder exists
            filename = os.path.join('wallpapers', url.split('/')[-1])
            response = requests.get(url, headers=self.headers)
            response.raise_for_status()
            with open(filename, 'wb') as f:
                f.write(response.content)
            print(f"Saved: {filename}")
        except Exception as e:
            print(f"Error saving {url}: {e}")

    def get_html(self, url):
        try:
            response = requests.get(url, headers=self.headers)
            response.raise_for_status()
            return response.text
        except Exception as e:
            print(f"Request failed: {e}")
            return None

if __name__ == "__main__":
    crawler = WallpaperCrawler()
    crawler.crawl(pages=2)
```
With these techniques you can build an efficient wallpaper crawler. Possible extensions include:

- Filtering by resolution
- Multithreaded downloads (see the sketch below)
- A GUI front end
- Automatic wallpaper rotation
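Of these, multithreaded downloading is straightforward with the standard library. Here is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`; the worker count and helper names are illustrative assumptions, not part of the original code:

```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

def download_one(url, folder='wallpapers'):
    # Hypothetical helper: fetch one image and write it to disk
    os.makedirs(folder, exist_ok=True)
    filename = os.path.join(folder, url.split('/')[-1])
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open(filename, 'wb') as f:
        f.write(response.content)
    return filename

def download_all(urls, workers=4):
    # Fetch several images concurrently; I/O-bound work suits threads well
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for saved in pool.map(download_one, urls):
            print(f"Saved: {saved}")
```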