# How to Scrape All League of Legends Champion Skins with Python3

Published: 2021-10-11 17:54:56 · Author: 柒染 · Source: 亿速云 (Yisu Cloud)

## Preface

League of Legends is one of the most popular MOBA games in the world, with more than 150 champions and thousands of beautifully drawn skins. This article walks through how to use Python3 to scrape all champion skin data from the official League of Legends site, including high-resolution splash art and skin metadata.

---

## Preparing the Tech Stack

### Required Tools
- Python 3.8+
- requests (HTTP requests)
- BeautifulSoup4 (HTML parsing)
- json (data handling, standard library)
- os (file operations, standard library)
- concurrent.futures (multithreaded downloads, standard library)

### Installing Dependencies
```bash
pip install requests beautifulsoup4
```

## Scraper Workflow Design

### 1. Analyzing the Data Source

Inspecting the official League of Legends data site (https://lol.qq.com/data/) with the browser developer tools reveals two endpoints:

- Hero list: https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js
- Per-hero skin data: https://game.gtimg.cn/images/lol/act/img/js/hero/{heroId}.js (where {heroId} is the hero's numeric ID)
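
These `.js` endpoints may wrap the JSON payload in a JavaScript assignment (e.g. `var data = {...};`), which is why the scraping code slices the response before parsing. A minimal sketch of that extraction, using a made-up sample payload for illustration:

```python
import json

def extract_json(js_text):
    # Slice from the first '{' to the last '}' to drop any JS wrapper text.
    start = js_text.index('{')
    end = js_text.rindex('}') + 1
    return json.loads(js_text[start:end])

# Hypothetical wrapped payload, for illustration only:
sample = 'var heroData = {"hero": [{"heroId": "157", "name": "Yasuo"}]};'
data = extract_json(sample)
print(data["hero"][0]["heroId"])  # → 157
```

Slicing up to the last `}` (rather than taking everything after the first `{`) also guards against a trailing `;` breaking `json.loads`.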

### 2. Scraping Steps

1. Fetch the IDs and names of all heroes
2. Iterate over the hero list and fetch each hero's skin data
3. Download the high-resolution splash art (1920x1080)
4. Save the skin metadata to a JSON file

## Code Implementation

### 1. Fetching the Hero List

```python
import requests
import json
import os

def get_hero_list():
    url = "https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    }
    response = requests.get(url, headers=headers)
    # Slice from the first '{' in case the response carries a JS prefix
    data = json.loads(response.text[response.text.index('{'):])
    return data["hero"]

hero_list = get_hero_list()
print(f"Fetched data for {len(hero_list)} heroes")
```

### 2. Fetching a Hero's Skin Data

```python
def get_hero_skins(hero_id):
    url = f"https://game.gtimg.cn/images/lol/act/img/js/hero/{hero_id}.js"
    response = requests.get(url)
    skin_data = json.loads(response.text[response.text.index('{'):])
    return skin_data["skins"]

# Example: fetch Yasuo's skins (hero ID 157)
yasuo_skins = get_hero_skins("157")
print(json.dumps(yasuo_skins[:2], indent=2, ensure_ascii=False))
```

### 3. Downloading Splash Art

```python
def download_skin(skin, save_dir="skins"):
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    hero_name = skin["heroName"]
    skin_name = skin["name"]
    skin_id = skin["skinId"]

    # Build the high-resolution splash art URL (1920x1080)
    img_url = f"https://game.gtimg.cn/images/lol/act/img/skin/big{skin_id}.jpg"

    try:
        response = requests.get(img_url, stream=True)
        if response.status_code == 200:
            file_path = f"{save_dir}/{hero_name}_{skin_name}.jpg"
            with open(file_path, 'wb') as f:
                for chunk in response.iter_content(1024):
                    f.write(chunk)
            print(f"Downloaded: {hero_name} - {skin_name}")
    except Exception as e:
        print(f"Download failed: {skin_name} - {str(e)}")
```

### 4. Speeding Up Downloads with Multithreading

```python
from concurrent.futures import ThreadPoolExecutor

def batch_download_skins(max_workers=5):
    hero_list = get_hero_list()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for hero in hero_list:
            skins = get_hero_skins(hero["heroId"])
            for skin in skins:
                if skin["mainImg"]:  # filter out the default skin
                    executor.submit(download_skin, skin)
```

## Complete Code Example

```python
import requests
import json
import os
from concurrent.futures import ThreadPoolExecutor

class LOLSkinSpider:
    def __init__(self):
        self.base_url = "https://game.gtimg.cn/images/lol/act/img/js"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        }

    def get_hero_list(self):
        url = f"{self.base_url}/heroList/hero_list.js"
        response = requests.get(url, headers=self.headers)
        data = json.loads(response.text[response.text.index('{'):])
        return data["hero"]

    def get_hero_skins(self, hero_id):
        url = f"{self.base_url}/hero/{hero_id}.js"
        response = requests.get(url, headers=self.headers)
        skin_data = json.loads(response.text[response.text.index('{'):])
        return skin_data["skins"]

    def download_skin(self, skin, save_dir="skins"):
        if not os.path.exists(save_dir):
            os.makedirs(save_dir)

        hero_name = skin["heroName"]
        skin_name = skin["name"].replace("/", "_")  # sanitize characters invalid in filenames
        skin_id = skin["skinId"]

        img_url = f"https://game.gtimg.cn/images/lol/act/img/skin/big{skin_id}.jpg"

        try:
            response = requests.get(img_url, headers=self.headers, stream=True)
            if response.status_code == 200:
                file_path = f"{save_dir}/{hero_name}_{skin_name}.jpg"
                with open(file_path, 'wb') as f:
                    for chunk in response.iter_content(1024):
                        f.write(chunk)
                print(f"Downloaded: {hero_name} - {skin_name}")
        except Exception as e:
            print(f"Download failed: {skin_name} - {str(e)}")

    def run(self, max_workers=5):
        heroes = self.get_hero_list()
        print(f"Scraping skin data for {len(heroes)} heroes...")

        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            for hero in heroes:
                skins = self.get_hero_skins(hero["heroId"])
                for skin in skins:
                    if skin["mainImg"] and skin["skinId"] != "0":  # filter out the default skin
                        executor.submit(self.download_skin, skin)

        print("All skins downloaded!")

if __name__ == "__main__":
    spider = LOLSkinSpider()
    spider.run()
```

## Data Processing and Storage

### 1. Saving Metadata to JSON

```python
def save_metadata(hero_list):
    all_skins = []
    for hero in hero_list:
        skins = get_hero_skins(hero["heroId"])
        all_skins.extend([{
            "hero_id": hero["heroId"],
            "hero_name": hero["name"],
            "skin_id": skin["skinId"],
            "skin_name": skin["name"],
            "price": skin.get("price", "unknown"),
            "release_date": skin.get("publishTime", "unknown")
        } for skin in skins if skin["mainImg"]])

    with open("lol_skins_metadata.json", "w", encoding="utf-8") as f:
        json.dump(all_skins, f, indent=2, ensure_ascii=False)
```

### 2. Removing Duplicate Files

```python
import hashlib

def remove_duplicates(dir_path):
    # Hash each file's content; delete any file whose hash was already seen
    unique_files = {}
    for filename in os.listdir(dir_path):
        file_path = os.path.join(dir_path, filename)
        with open(file_path, "rb") as f:
            file_hash = hashlib.md5(f.read()).hexdigest()
        if file_hash not in unique_files:
            unique_files[file_hash] = file_path
        else:
            os.remove(file_path)
```

## Handling Anti-Scraping Measures

### 1. Disguising Request Headers

```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://lol.qq.com/data/",
    "Accept-Language": "zh-CN,zh;q=0.9"
}
```

### 2. Throttling Request Intervals

```python
import time
import random

def delayed_request(url):
    time.sleep(random.uniform(0.5, 1.5))  # random delay between requests
    return requests.get(url, headers=headers)
```
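
A fixed random delay can be combined with retries for transient failures. Below is a sketch of exponential backoff; the function name, retry count, and delays are illustrative, not part of the original code:

```python
import random
import time

def fetch_with_backoff(fetch, retries=3, base_delay=0.5):
    # Retry fetch() up to `retries` times, doubling the wait after each failure.
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.2))

# Usage with a callable that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient error")
    return "ok"

print(fetch_with_backoff(flaky, base_delay=0.01))  # → ok
```

The jitter term keeps concurrent workers from retrying in lockstep.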

### 3. Using a Proxy Pool (Optional)

```python
proxies = {
    "http": "http://your_proxy:port",
    "https": "https://your_proxy:port"
}
response = requests.get(url, headers=headers, proxies=proxies)
```

## Suggested Improvements

1. Resumable downloads: record the skin IDs already downloaded to avoid re-downloading
2. Incremental updates: periodically check for new heroes and skins
3. Organized storage: create a subdirectory per hero for its skins
4. Resolution options: support downloading skin images at different sizes (large/medium/small)
5. GUI: build a visual tool with PyQt or Tkinter
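
The first suggestion, resumable downloads, can be sketched as follows: keep a record file of downloaded skin IDs and skip anything already recorded. The record-file name and the inline ID list are illustrative only:

```python
import json
import os

RECORD_FILE = "downloaded_ids.json"  # illustrative record-file name

def load_downloaded():
    # Load the set of skin IDs that were already downloaded, if any.
    if os.path.exists(RECORD_FILE):
        with open(RECORD_FILE, encoding="utf-8") as f:
            return set(json.load(f))
    return set()

def mark_downloaded(done, skin_id):
    # Add the ID and persist the record so an interrupted run can resume.
    done.add(skin_id)
    with open(RECORD_FILE, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)

done = load_downloaded()
for skin_id in ["157000", "157001", "157000"]:  # illustrative IDs
    if skin_id in done:
        continue  # already on disk from a previous run
    # ... download the skin image here ...
    mark_downloaded(done, skin_id)
```

Writing the record after every download is slow but crash-safe; batching the writes trades safety for speed.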

## Legal and Ethical Notice

1. This code is for learning and discussion only; do not use it commercially
2. Throttle your request rate when scraping to avoid burdening the server
3. All League of Legends art assets are copyrighted by Riot Games
4. If you use them in a personal project, add a copyright notice

## Conclusion

With the Python scraping techniques introduced in this article, we can efficiently collect skin data for every champion in League of Legends. The same approach applies to other games' data as well; only the endpoint URLs need to change. Hopefully this article helps you better understand the development workflow and practical techniques of web scraping.

The complete project code is on GitHub: https://github.com/yourname/lol-skin-spider
