To get around JavaScript rendering when scraping a page, you can use one of the following approaches:
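# Selenium: drives a real browser, so the page's JavaScript runs before the HTML is read
from selenium import webdriver

url = 'https://example.com'
driver = webdriver.Chrome()
try:
    driver.get(url)
    content = driver.page_source  # HTML after the browser has executed the page's scripts
finally:
    driver.quit()  # always close the browser, even if the page load fails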
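
# requests + BeautifulSoup: fetches and parses the static HTML only (no JavaScript is executed)
import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')
content = soup.prettify()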
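
# Scrapy + scrapy-splash: renders pages through a Splash server
# (needs a running Splash instance and scrapy-splash enabled in settings.py)
import scrapy
from scrapy_splash import SplashRequest

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://example.com']

    def start_requests(self):
        for url in self.start_urls:
            # 'wait' gives the page's scripts time (in seconds) to run before Splash returns the HTML
            yield SplashRequest(url=url, callback=self.parse, args={'wait': 1})

    def parse(self, response):
        content = response.text  # rendered HTML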
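Note that skipping JavaScript rendering, as the requests example does, can leave you without some of the data, because some content is loaded dynamically. Whichever method you use, make sure you comply with the target site's robots.txt rules and with applicable laws and regulations.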
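
As a rough illustration of the robots.txt point, the sketch below uses the standard library's urllib.robotparser to check whether a URL may be fetched. The helper name is_allowed, the user agent "example-bot", and the sample rules are all made up for this example; a real site publishes its own rules at /robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, for illustration only.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

allowed = is_allowed(SAMPLE_RULES, "example-bot", "https://example.com/index.html")
blocked = is_allowed(SAMPLE_RULES, "example-bot", "https://example.com/private/data")
```

Against a live site, you would typically call set_url() with the site's robots.txt address and then read() on the parser instead of passing the text in, and run the check before each request in any of the approaches above.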