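# How to Build a Static Web Scraper and Convert Addresses to Coordinates with Python
## 1. Static Web Scraping Basics
A static web scraper extracts data directly from HTML pages that need no JavaScript rendering. Python's mature libraries for HTTP requests, HTML parsing, and tabular data make it a common choice for this kind of work.
### 1.1 Core Libraries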
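```python
import requests                # HTTP requests
from bs4 import BeautifulSoup  # HTML parsing
import pandas as pd            # tabular data storage

url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
soup = BeautifulSoup(response.text, 'html.parser')

# Example: collect the href of every link on the page
links = [a['href'] for a in soup.find_all('a', href=True)]
```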
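The `href` values collected this way are often relative paths or non-HTTP links. A minimal sketch of one way to normalize them with the standard library follows; the helper name `absolutize_links` and the sample hrefs are illustrative assumptions, not part of the original article.

```python
from urllib.parse import urljoin, urlparse


def absolutize_links(base_url, hrefs):
    """Resolve hrefs against base_url, keep http(s) links, drop duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical hrefs, as they might appear on a scraped page
sample_hrefs = [
    "/about",
    "news/today.html",
    "https://example.com/about",
    "mailto:hi@example.com",
    "#top",
]
resolved = absolutize_links("https://example.com/index.html", sample_hrefs)
print(resolved)
```

Passing the page's own URL as `base_url` is usually what you want, since relative links are resolved against the document that contains them.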
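The following example shows how to collect address records from a static page:

```python
def scrape_addresses():
    """Scrape address records into a DataFrame; return None on failure."""
    url = "http://www.address-source.com/cities"  # placeholder URL
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        # The 'lxml' parser requires the lxml package to be installed
        soup = BeautifulSoup(response.content, 'lxml')
        addresses = []
        for item in soup.select('.address-item'):
            addr = {
                'city': item.find('h2').text.strip(),
                'street': item.find('span', class_='street').text.strip(),
                'zipcode': item.find('span', class_='zip').text.strip()
            }
            addresses.append(addr)
        return pd.DataFrame(addresses)
    except (requests.RequestException, AttributeError) as e:
        # AttributeError means an item lacks one of the expected tags
        print(f"Scraping failed: {e}")
        return None
```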
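The following geocoding services are recommended:

- AMap (Gaode Maps) API (recommended for addresses in mainland China)
- Google Maps Geocoding API
- Baidu Maps API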
```python
import requests

def gaode_geocode(address, api_key):
    """Return (longitude, latitude) for an address via the AMap API, or None."""
    base_url = "https://restapi.amap.com/v3/geocode/geo"
    params = {
        'address': address,
        'key': api_key,
        'output': 'JSON'
    }
    response = requests.get(base_url, params=params, timeout=10)
    data = response.json()
    # status '1' signals success; 'geocodes' holds the matched results
    if data.get('status') == '1' and data.get('geocodes'):
        location = data['geocodes'][0]['location']  # "lng,lat" string
        lng, lat = location.split(',')
        return float(lng), float(lat)
    return None
```
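The response handling in `gaode_geocode` can be separated from the network call so it can be checked without an API key. The sketch below assumes only the fields the function above already reads (`status`, `geocodes`, `location`); the helper name `parse_amap_location` and the sample payloads are hand-written for illustration and are not captured API output.

```python
def parse_amap_location(data):
    """Return (lng, lat) as floats from a geocode response dict, or None."""
    if not isinstance(data, dict) or data.get("status") != "1":
        return None
    geocodes = data.get("geocodes") or []
    if not geocodes:
        return None
    location = geocodes[0].get("location")
    # Guard against a missing or malformed "lng,lat" string
    if not isinstance(location, str) or "," not in location:
        return None
    try:
        lng_text, lat_text = location.split(",", 1)
        return float(lng_text), float(lat_text)
    except ValueError:
        return None


# Hand-written sample payloads (assumed shape, not real API responses)
ok = {"status": "1", "geocodes": [{"location": "116.481488,39.990464"}]}
empty = {"status": "1", "geocodes": []}
failed = {"status": "0", "info": "example error"}

print(parse_amap_location(ok))
print(parse_amap_location(empty))
print(parse_amap_location(failed))
```

Keeping the parser pure makes the failure cases (no match, error status, malformed location) easy to exercise before any quota is spent on live requests.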
```python
def batch_geocode(df, api_key):
    """Geocode every value in df['full_address']; failed lookups yield None."""
    results = []
    for addr in df['full_address']:
        coords = gaode_geocode(addr, api_key)
        results.append({
            'address': addr,
            'longitude': coords[0] if coords else None,
            'latitude': coords[1] if coords else None
        })
    return pd.DataFrame(results)
```
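Scraped address lists often contain duplicates, and each repeated lookup costs an API call. One possible refinement, sketched here under the assumption that the geocoder is passed in as a plain function, is to cache results per address. The names `geocode_unique` and `fake_geocoder` are hypothetical and stand in for a real call such as `gaode_geocode`.

```python
def geocode_unique(addresses, geocode_fn):
    """Geocode each distinct address once; return one row per input address."""
    cache = {}
    rows = []
    for addr in addresses:
        if addr not in cache:
            cache[addr] = geocode_fn(addr)  # only the first occurrence hits the API
        coords = cache[addr]
        rows.append({
            "address": addr,
            "longitude": coords[0] if coords else None,
            "latitude": coords[1] if coords else None,
        })
    return rows


# Stand-in geocoder that records how often it is called (illustration only)
calls = []

def fake_geocoder(addr):
    calls.append(addr)
    return (116.0, 39.0) if addr != "unknown" else None

rows = geocode_unique(["A", "B", "A", "unknown", "A"], fake_geocoder)
print(len(rows), "rows from", len(calls), "lookups")
```

The returned list of dictionaries can be handed to `pd.DataFrame(rows)` to get the same shape that `batch_geocode` produces.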
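```python
# Step 1: scrape the address data
address_df = scrape_addresses()
if address_df is None or address_df.empty:
    raise SystemExit("No addresses scraped; nothing to geocode.")

# Step 2: build the full address string
address_df['full_address'] = (address_df['city'] +
                              address_df['street'] +
                              address_df['zipcode'])

# Step 3: convert addresses to coordinates
api_key = "your_amap_api_key"  # apply for a key beforehand
geo_df = batch_geocode(address_df, api_key)

# Step 4: save the results
geo_df.to_csv('address_with_coordinates.csv', index=False)
```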
```python
import time

# Call between consecutive requests to limit the request rate
time.sleep(1)  # pause 1 second between requests
```
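A fixed `time.sleep(1)` always waits the full second, even when the previous request itself was slow. A small throttle that only sleeps for the remaining time is one alternative; the `Throttle` class below is a hypothetical helper written for this sketch, and the 0.05-second interval is chosen only to keep the demo fast.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between successive wait() calls."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(min_interval=0.05)
start = time.monotonic()
for address in ["address A", "address B", "address C"]:
    throttle.wait()
    # a real loop would call the geocoding function here
elapsed = time.monotonic() - start
print(f"3 throttled iterations took {elapsed:.2f}s")
```

For real use against a geocoding service, the interval should follow the rate limit stated for your API key.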
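```python
import folium

def create_map(geo_df):
    """Plot each geocoded address as a marker on a folium map."""
    # Rows that failed geocoding have no coordinates and cannot be plotted
    points = geo_df.dropna(subset=['latitude', 'longitude'])
    # folium expects [latitude, longitude] order
    m = folium.Map(location=[points['latitude'].mean(),
                             points['longitude'].mean()],
                   zoom_start=12)
    for _, row in points.iterrows():
        folium.Marker([row['latitude'], row['longitude']],
                      popup=row['address']).add_to(m)
    return m
```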
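Because the geocoder returns longitude first while folium takes latitude first, swapped or missing coordinates are an easy mistake to make before plotting. The sketch below shows one way to screen rows with a range check; `is_valid_coordinate` and the sample rows are illustrative assumptions, and the check only catches a swap when the longitude falls outside the valid latitude range.

```python
import math


def is_valid_coordinate(lng, lat):
    """True if lng and lat are real numbers within geographic bounds."""
    for value in (lng, lat):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False  # rejects None, strings, and booleans
        if math.isnan(value):
            return False
    return -180.0 <= lng <= 180.0 and -90.0 <= lat <= 90.0


# Hypothetical rows: one good, one failed lookup, one with lng/lat swapped
sample_rows = [
    {"address": "good", "longitude": 116.48, "latitude": 39.99},
    {"address": "failed", "longitude": None, "latitude": None},
    {"address": "swapped", "longitude": 39.99, "latitude": 116.48},
]
plottable = [
    row["address"]
    for row in sample_rows
    if is_valid_coordinate(row["longitude"], row["latitude"])
]
print(plottable)
```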
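With the methods above, you can collect address information and convert it to geographic coordinates, which provides the data foundation for later spatial analysis.
(Note: in real use, replace the example URLs and API key with your own, and make sure you comply with the terms of use of the websites involved.)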