How to Implement Concurrent Requests in a Python Crawler

Published: 2024-12-14 13:18:47 Author: 小樊
Source: 亿速云

In Python, a crawler can issue requests concurrently with several libraries, such as requests, aiohttp, and Scrapy. This article covers each of the three in turn.

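Using the requests library:

The requests library is synchronous and has no asynchronous API, but requests can still be issued concurrently with ThreadPoolExecutor or ProcessPoolExecutor from the concurrent.futures module. Because fetching pages is I/O-bound, a thread pool is usually the better fit: threads release the GIL while waiting on the network, whereas a process pool adds startup and serialization overhead.

Example code: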

import requests
from concurrent.futures import ThreadPoolExecutor

url_list = ['https://example.com/page1', 'https://example.com/page2', 'https://example.com/page3']

def fetch(url):
    # Set a timeout so a stalled server cannot block a worker thread forever
    response = requests.get(url, timeout=10)
    return response.text

# executor.map runs fetch in up to 5 threads and returns results in input order
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fetch, url_list))

print(results)
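
One limitation of executor.map is that an exception raised by any single request is re-raised while the results are being collected, so one bad URL aborts the whole batch. The following is a minimal sketch of an alternative that uses submit and as_completed to record failures per URL. The names fetch_all and fake_fetch are hypothetical and not part of the original article; fake_fetch only simulates a download so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(urls, fetch, max_workers=5):
    """Call fetch(url) for every URL in a thread pool.

    Returns (results, errors): two dicts keyed by URL, so a failed
    request does not discard the pages that were fetched successfully.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which URL each future belongs to
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        # as_completed yields futures as they finish, not in input order
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors

def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP call; page2 always fails
    if url.endswith('page2'):
        raise ValueError('simulated failure for ' + url)
    return 'body of ' + url

if __name__ == '__main__':
    urls = ['https://example.com/page1', 'https://example.com/page2']
    ok, failed = fetch_all(urls, fake_fetch)
    print(ok)
    print(failed)
```

In a real crawler, the requests-based fetch function from the example above would be passed in place of fake_fetch.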

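Using the aiohttp library:

aiohttp is an asynchronous HTTP client library built on asyncio. A single thread can keep many requests in flight at once, which makes it efficient for concurrent crawling.

Example code: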

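import aiohttp
import asyncio

async def fetch(session, url):
    # Reuse the shared session so connections are pooled across requests
    async with session.get(url) as response:
        return await response.text()

async def main():
    url_list = ['https://example.com/page1', 'https://example.com/page2', 'https://example.com/page3']
    # One ClientSession for the whole run, closed automatically on exit
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in url_list]
        # gather runs the coroutines concurrently and keeps the input order
        results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())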
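
With asyncio.gather alone, every request starts at once, which can overwhelm a target site when the URL list is long. A common way to cap the number of requests in flight is an asyncio.Semaphore. The sketch below assumes a hypothetical helper gather_limited and a FakeFetcher class that simulates a download with asyncio.sleep; neither appears in the original article.

```python
import asyncio

async def gather_limited(urls, fetch, limit=2):
    """Run fetch(url) for every URL, with at most `limit` running at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # Wait here until one of the `limit` slots is free
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True puts a failure in the result list instead of
    # raising it, so the other URLs still produce results
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)

class FakeFetcher:
    """Hypothetical stand-in for an HTTP call; records peak concurrency."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def __call__(self, url):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # simulate network latency
        self.active -= 1
        return 'body of ' + url

if __name__ == '__main__':
    fetcher = FakeFetcher()
    urls = ['https://example.com/page%d' % i for i in range(1, 6)]
    pages = asyncio.run(gather_limited(urls, fetcher, limit=2))
    print(len(pages), 'pages fetched, peak concurrency', fetcher.peak)
```

Any coroutine function that takes a URL can be passed as fetch, for example a small wrapper around an aiohttp session.get call.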

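Using the Scrapy framework:

Scrapy is a full-featured crawling framework with built-in support for concurrent requests: it schedules and downloads requests asynchronously, so the spider does not need to manage threads or an event loop itself.

First, install Scrapy: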

pip install scrapy

Then create a Scrapy project:

scrapy startproject my_project

Next, create a file named my_spider.py in the project's my_project/spiders directory and add the following code:

import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'
    start_urls = ['https://example.com/page1', 'https://example.com/page2', 'https://example.com/page3']

    def parse(self, response):
        yield {
            'url': response.url,
            'content': response.text
        }

Finally, run the following command from the project root directory to start the crawler:

scrapy crawl my_spider -o output.json
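
Scrapy's degree of concurrency is controlled through project settings rather than through code in the spider. As a rough sketch, a fragment like the following could be placed in the project's settings.py. The setting names are standard Scrapy settings, but the values shown are illustrative assumptions and should be tuned to the target site.

```python
# settings.py (fragment); values are illustrative assumptions, not recommendations

# Upper bound on requests Scrapy keeps in flight across all domains
CONCURRENT_REQUESTS = 8

# Upper bound on simultaneous requests to any single domain
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Seconds to wait between consecutive requests to the same domain
DOWNLOAD_DELAY = 0.5

# Let the AutoThrottle extension adapt the delay to observed server latency
AUTOTHROTTLE_ENABLED = True
```

Lower values are gentler on the target server; higher values finish the crawl sooner at the risk of being rate-limited or blocked.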

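All three approaches can implement concurrent requests in a Python crawler. The requests library with a thread pool suits simple scripts that make a modest number of HTTP requests, aiohttp suits asynchronous fetching at higher concurrency, and the Scrapy framework suits larger, more complex crawling projects.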
