How to write a web crawler in Python - Q&A

To write a web crawler in Python, you need a few libraries and tools. Here are the suggested steps:

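Install the dependencies: before you start writing the crawler, install the Python libraries it relies on, such as requests (for sending HTTP requests) and BeautifulSoup (for parsing HTML documents). You can install them with pip: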

pip install requests
pip install beautifulsoup4

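import requests
from bs4 import BeautifulSoup

# Send the HTTP request; fail early on a timeout or an error status code.
url = 'https://example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML document.
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the text of every <p> element and print it.
paragraphs = soup.find_all('p')
for p in paragraphs:
    print(p.get_text())

# Save the extracted text to a file, one paragraph per line.
with open('output.txt', 'w', encoding='utf-8') as f:
    for p in paragraphs:
        f.write(p.get_text() + '\n')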

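This is only a simple crawler example. In practice, you may need to handle more complex site structures and pagination. You can also look at other libraries such as Scrapy (for building full-featured crawlers) and Selenium (for sites rendered with JavaScript).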
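Since pagination comes up quickly in practice, here is a minimal sketch of one way to handle it, assuming the site marks its "next page" link as <a rel="next" href="...">. Many sites do not, so that selector is an assumption you would adapt. The names NextLinkFinder and crawl_pages are hypothetical helpers made up for this illustration, and the link search uses the standard library's HTMLParser only to keep the sketch dependency-free. The fetch argument is any function that takes a URL and returns the page's HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a rel="next"> tag in a page.

    Assumption: the target site marks its pagination link with rel="next".
    """

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get('rel') or '').split()
        if tag == 'a' and 'next' in rel_values and self.next_href is None:
            self.next_href = attr.get('href')


def crawl_pages(start_url, fetch, max_pages=10):
    """Follow "next" links from start_url and return a list of (url, html).

    Stops when there is no next link, when a URL repeats, or after
    max_pages pages, whichever comes first.
    """
    pages = []
    seen = set()
    url = start_url
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        html = fetch(url)
        pages.append((url, html))

        finder = NextLinkFinder()
        finder.feed(html)
        # Resolve relative links such as "/list?page=2" against the current URL.
        url = urljoin(url, finder.next_href) if finder.next_href else None
    return pages
```

With requests, fetch could be something like lambda u: requests.get(u, timeout=10).text, and each returned html string can be handed to BeautifulSoup in the same way as response.text. The max_pages cap and the seen set are there so that a link loop on the site cannot keep the crawler running forever.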
