A Python crawler can disguise itself in the following ways to avoid being blocked or rate-limited by a website:
1. Set the User-Agent header so requests look like they come from a real browser instead of the default requests client:

import requests

url = 'https://www.example.com'  # placeholder target URL

# Impersonate a desktop Chrome browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
response = requests.get(url, headers=headers)
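To make traffic look less uniform, crawlers often rotate among several User-Agent strings rather than reusing a single one. A minimal sketch (the strings below are only examples; any current browser User-Agents work):

import random

import requests

# A small pool of browser User-Agent strings (examples)
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
]

url = 'https://www.example.com'  # placeholder target URL

# Pick a different User-Agent per request
headers = {'User-Agent': random.choice(USER_AGENTS)}
response = requests.get(url, headers=headers)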
2. Set the Referer header so the request appears to have been reached by following a link from a legitimate page:

import requests

url = 'https://www.example.com'  # placeholder target URL

# Claim the request was triggered by a link on this page
headers = {
    'Referer': 'https://www.example.com'
}
response = requests.get(url, headers=headers)
3. Carry a Cookie header so the request looks like it belongs to an existing (e.g. logged-in) session:

import requests

url = 'https://www.example.com'  # placeholder target URL

# Session cookie copied from a real browser session (placeholder value)
headers = {
    'Cookie': 'sessionid=xxxxxx'
}
response = requests.get(url, headers=headers)
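Rather than pasting cookie values by hand, a requests.Session can capture and resend cookies automatically. A minimal sketch, assuming a hypothetical login endpoint and credentials:

import requests

session = requests.Session()

# Hypothetical login form; any cookies the server sets (e.g. sessionid)
# are stored on the session object automatically
session.post('https://www.example.com/login',
             data={'username': 'user', 'password': 'pass'})

# Later requests reuse those cookies without a manual Cookie header
response = session.get('https://www.example.com/profile')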
4. Route traffic through a proxy so the target site sees the proxy's IP address rather than your own:

import requests

url = 'https://www.example.com'  # placeholder target URL

# Placeholder proxy addresses; point these at a real proxy server
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'https://127.0.0.1:8888'
}
response = requests.get(url, proxies=proxies)
Note that none of these disguises is foolproof; some sites deploy more sophisticated anti-crawling measures. When crawling, respect the site's scraping rules, honor its robots.txt, and keep your request rate moderate so you do not put excessive load on the target server.
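As a concrete illustration of that last point, here is a minimal sketch (the site, paths, and crawler name are hypothetical) that checks robots.txt with the standard library's urllib.robotparser and pauses between requests:

import time
import urllib.robotparser

import requests

BASE = 'https://www.example.com'  # hypothetical target site
USER_AGENT = 'MyCrawler/1.0'      # hypothetical crawler name

# Parse the site's robots.txt to learn which paths may be fetched
rp = urllib.robotparser.RobotFileParser()
rp.set_url(BASE + '/robots.txt')
rp.read()

paths = ['/page1', '/page2']      # hypothetical paths to crawl

for path in paths:
    if not rp.can_fetch(USER_AGENT, BASE + path):
        continue                  # robots.txt disallows this path: skip it
    response = requests.get(BASE + path, headers={'User-Agent': USER_AGENT})
    time.sleep(2)                 # throttle: pause between requests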