When developing a Python web crawler, exception handling is key to keeping the program running stably. Below are some common approaches:
Using try-except blocks:
Wrap code that might raise an exception in a try block and handle it with except blocks; a timeout-and-retry variant follows the example below.
import requests

try:
    response = requests.get('http://example.com')
    response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
except requests.exceptions.HTTPError as e:
    print(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"Request Exception: {e}")
except Exception as e:
    print(f"Unexpected Error: {e}")
else:
    print("Request successful")
    # process the successful response here
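Network requests can also hang indefinitely, so in practice a timeout is usually passed alongside the exception handling, and transient failures are retried. A minimal sketch, where the helper name fetch_with_retry and the retry count, delay, and timeout values are illustrative assumptions, not from the original:

import time
import requests

def fetch_with_retry(url, retries=3, delay=2):
    # hypothetical helper: try the request up to `retries` times
    for attempt in range(1, retries + 1):
        try:
            # timeout= makes the call fail fast instead of hanging
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response
        except requests.exceptions.Timeout as e:
            print(f"Timeout on attempt {attempt}: {e}")
        except requests.exceptions.RequestException as e:
            print(f"Request failed on attempt {attempt}: {e}")
        if attempt < retries:
            time.sleep(delay)  # wait before the next attempt
    return None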
Using the logging module:
Use the logging module to record exception details for later analysis and debugging; a traceback-logging variant follows the example below.
import logging
import requests

logging.basicConfig(filename='spider.log', level=logging.ERROR)

try:
    response = requests.get('http://example.com')
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    logging.error(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    logging.error(f"Request Exception: {e}")
except Exception as e:
    logging.error(f"Unexpected Error: {e}")
else:
    print("Request successful")
    # process the successful response here
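logging.error records only the message text; if the full traceback should also land in spider.log for debugging, the standard-library logging.exception call (valid inside an except block) captures it automatically. A minimal sketch:

import logging
import requests

logging.basicConfig(filename='spider.log', level=logging.ERROR)

try:
    response = requests.get('http://example.com')
    response.raise_for_status()
except requests.exceptions.RequestException:
    # logs at ERROR level and appends the current traceback to the record
    logging.exception("Request failed")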
Using finally blocks:
Code in a finally block runs whether or not an exception occurs, which makes it well suited to cleaning up resources; a session-cleanup sketch follows the example below.
import requests

try:
    response = requests.get('http://example.com')
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"Request Exception: {e}")
except Exception as e:
    print(f"Unexpected Error: {e}")
else:
    print("Request successful")
    # process the successful response here
finally:
    print("Request completed")
Using asyncio and aiohttp for asynchronous crawling:
In an asynchronous crawler, try-except blocks catch and handle exceptions in the same way; a multi-URL variant using asyncio.gather follows the example below.
import aiohttp
import asyncio

async def fetch(session, url):
    try:
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.text()
    except aiohttp.ClientError as e:
        print(f"Client Error: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://example.com')
        print(html)

asyncio.run(main())
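When many URLs are crawled concurrently, asyncio.gather(..., return_exceptions=True) returns exceptions as ordinary results instead of aborting the whole batch, so each failure can be inspected on its own. A minimal sketch, assuming an illustrative two-URL list; fetch here omits its own try-except because gather collects the exceptions:

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.text()

async def main():
    urls = ['http://example.com', 'http://example.org']  # illustrative URLs
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch(session, url) for url in urls),
            return_exceptions=True,  # exceptions are returned, not raised
        )
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            print(f"{url} failed: {result}")
        else:
            print(f"{url} returned {len(result)} characters")

asyncio.run(main())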
With these techniques, a crawler can handle the various exceptions it may encounter and stay stable and reliable.