[Python Development from Scratch] Mastering Python urllib: Advanced HTTP Development

Published: 2020-05-19 05:50:23  Author: jextop
Source: Internet

Contents:

1. Introduction to urllib

2. Feature development: http_util.py

3. Unit tests: test_http_util.py, calling the real Baidu AI speech synthesis API

4. Common problems and solutions

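I. Introduction to urllib

Python is a mainstream language in AI, and more general application development is adopting it as well. It offers many easy-to-use standard-library modules and third-party packages. urllib is part of the Python standard library, so nothing needs to be installed. It is used for HTTP development, and its main modules are:

1. urllib.request: sends HTTP requests, then receives and processes HTTP responses

2. urllib.error: defines the exceptions raised by urllib.request

3. urllib.parse: parses URLs and encodes data

4. urllib.robotparser: parses robots.txt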
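
As a quick illustration of the urllib.parse module listed above, the following minimal sketch splits a URL into its parts and encodes a dict as a query string. The query parameters in the sample URL are chosen only for demonstration and are not a statement about what the Baidu API requires.

```python
from urllib import parse

# Sample URL for illustration only; the query parameters are assumptions.
url = 'https://tsn.baidu.com/text2audio?lan=zh&spd=6'

parts = parse.urlparse(url)           # split into scheme, host, path, query
query = parse.parse_qs(parts.query)   # query string -> dict of lists
encoded = parse.urlencode({'tex': 'hello world', 'lan': 'zh'})

print(parts.scheme, parts.netloc, parts.path)  # https tsn.baidu.com /text2audio
print(query)                                   # {'lan': ['zh'], 'spd': ['6']}
print(encoded)                                 # tex=hello+world&lan=zh
```

Note that urlencode() returns a str, so it still has to be encoded to bytes before it can be used as a POST body.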

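This article shows how the request, error, and parse modules of urllib are used in HTTP development and wraps them in http_util. The unit tests call the real Baidu AI speech synthesis API, just as a project would:

1. Call the authentication API to get a token: https://openapi.baidu.com/oauth/2.0/token

2. Call the speech synthesis API to convert text to speech: https://tsn.baidu.com/text2audio

Open-source project: https://github.com/jextop/starter_service

Sample code: https://github.com/rickding/HelloPython/tree/master/hello_http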

├── util
│   ├── http_util.py
│   ├── file_util.py
│   └── code_util.py
└── tests
    └── test_http_util.py

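II. Feature Development

Category	Code file	Key points
Feature development	http_util.py	Uses urllib to send requests, then receives and processes responses
Unit tests	test_http_util.py	Tests the wrapped functions: calls the Baidu AI authentication API to get an access_token, then calls the speech synthesis API to convert text to speech
Helper	file_util.py	Saves the file when downloading
Helper	code_util.py	Generates a unique name when saving a file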

1. urllib.request.urlopen() can open a URL directly, return the response, and read its data, which is convenient for simple HTTP requests:

from urllib import request

resp = request.urlopen(r'http://www.baidu.com')
data = resp.read()
data_str = data.decode('utf-8')
print(data_str)

2. urllib.request.urlopen() also accepts a Request object. When constructing the Request you can set more parameters, such as headers and data:

from urllib import parse
from urllib import request

req = request.Request(
    'https://tsn.baidu.com/text2audio',
    headers={
        'Content-Type': 'application/x-www-form-urlencoded',
    },
    data=parse.urlencode({
        'tex': 'Python开发HTTP请求和处理响应',
    }).encode('utf-8'),
    method='POST'
)
resp = request.urlopen(req)
data = resp.read()
data_str = data.decode('utf-8')
print(data_str)
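
A Request object can be inspected before it is sent, which helps when debugging. The sketch below sends nothing over the network; it only shows that the method defaults to GET without data and to POST with data, and that Request stores header names in capitalized form. The form field used here is an illustrative assumption.

```python
from urllib import parse, request

url = 'https://tsn.baidu.com/text2audio'
body = parse.urlencode({'tex': 'hello'}).encode('utf-8')  # bytes, as Request expects

get_req = request.Request(url)
post_req = request.Request(
    url,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
    data=body,
)

print(get_req.get_method())    # GET: no data and no explicit method
print(post_req.get_method())   # POST: data is present
# Request normalizes header names with str.capitalize(): 'Content-type'
print(post_req.get_header('Content-type'))
```

Passing method='POST' explicitly, as the article does, keeps the intent clear even though the default would be the same here.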

3. Wrap the request and response handling in http_data(). Note that encode() wraps the call to parse.urlencode() and converts data to bytes:

import logging
from urllib import error
from urllib import parse
from urllib import request

log = logging.getLogger(__name__)

def http_data(url, headers={}, data=None, method=None):
    try:
        req = request.Request(url, headers=headers, method=method, data=encode(data))
        resp = request.urlopen(req)
        data = resp.read()
        log.info('http_data returns: %d, %s, %s' % (len(data), type(data), url))
        return data
    except (ConnectionError, error.URLError) as e:
        log.error('ConnectionError in http_data: %s, %s' % (url, str(e)))
    except Exception as e:
        log.error('Exception in http_data: %s, %s' % (url, str(e)))

    return None

def encode(data):
    if isinstance(data, dict):
        data = parse.urlencode(data)

    if isinstance(data, str):
        data = data.encode('utf-8')

    return data
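
http_data() logs every failure in much the same way. If the caller needs to tell an HTTP status error (such as 404) apart from a network failure, the exception types in urllib.error can be checked separately. The helper below is a hypothetical sketch, not part of http_util; describe_error is a name invented for this example.

```python
import io
from email.message import Message
from urllib import error


def describe_error(exc):
    """Classify an exception raised by urllib.request.urlopen() (hypothetical helper)."""
    # HTTPError must be checked first because it is a subclass of URLError.
    if isinstance(exc, error.HTTPError):
        return 'http %d' % exc.code
    if isinstance(exc, error.URLError):
        return 'network: %s' % exc.reason
    return 'other: %s' % exc


# Exceptions constructed by hand, so the sketch runs without a network connection.
not_found = error.HTTPError('http://example.invalid', 404, 'Not Found', Message(), io.BytesIO())
timed_out = error.URLError('timed out')

print(describe_error(not_found))   # http 404
print(describe_error(timed_out))   # network: timed out
```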

4. Wrap http_str(), which decodes the HTTP response data into a string:

def http_str(url, headers={}, data=None, method=None):
    data = http_data(url, headers, data, method)
    return None if data is None else data.decode('utf-8')
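
http_str() assumes the response is UTF-8. When a server declares another charset in its Content-Type header, the declared value can be used instead. The sketch below assumes the caller has access to the response headers (the headers attribute of the object returned by urlopen() is an email.message.Message subclass); decode_body is a hypothetical helper, not part of http_util.

```python
from email.message import Message


def decode_body(data, headers, default='utf-8'):
    """Decode bytes using the charset from Content-Type, else a default (hypothetical helper)."""
    charset = headers.get_content_charset() or default
    # errors='replace' avoids raising on stray bytes; an unknown charset
    # name would still raise LookupError.
    return data.decode(charset, errors='replace')


# Headers built by hand so the sketch runs offline.
headers = Message()
headers['Content-Type'] = 'text/html; charset=GBK'

print(decode_body('中文'.encode('gbk'), headers))   # 中文
```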

5. Wrap http_json(), which parses the response into a dict when the body is JSON:

import json

def http_json(url, headers={}, data=None, method=None):
    data = http_data(url, headers, data, method)
    return None if data is None else json.loads(data)
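
json.loads() accepts bytes directly (Python 3.6+), but it raises an exception when the body is not valid JSON, for example when the server returns an HTML error page. One possible way to handle this is sketched below; parse_json is a hypothetical helper, and returning None on failure is a design assumption that mirrors how http_data() reports errors.

```python
import json


def parse_json(data):
    """Parse response bytes as JSON, returning None on failure (hypothetical helper)."""
    if data is None:
        return None
    try:
        return json.loads(data)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both derive from ValueError
        return None


print(parse_json(b'{"access_token": "abc"}'))   # {'access_token': 'abc'}
print(parse_json(b'<html>error</html>'))        # None
```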

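6. Wrap http_file(). When downloading a file, the response data is the binary file content, which is stored on the server or the client. See the http_util.http_file() function for the code. The flow is:

a) Assemble the request parameters, get the response, and call resp.read() to read the binary file content

b) Read the Content-Disposition header to see whether a file name was returned, for example: attachment;fileName=zip.zip

c) Read the Content-Type header and parse the file format, for example: audio/wav

d) Save the file data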
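
Step d) can be sketched as follows. This is not the code from file_util.py or code_util.py, which the article does not show; save_bytes is a hypothetical helper, and uuid stands in for whatever unique-name scheme code_util uses.

```python
import os
import tempfile
import uuid


def save_bytes(data, out_dir, file_name=None, file_type=None):
    """Write downloaded bytes to out_dir and return the path (hypothetical helper)."""
    # basename() drops any directory part that a server-supplied name may carry.
    file_name = os.path.basename(file_name or '')
    if not file_name:
        # No usable name from the server: generate a unique one.
        file_name = '%s.%s' % (uuid.uuid4().hex, file_type or 'dat')

    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, file_name)
    with open(path, 'wb') as f:
        f.write(data)
    return path


# Usage: a temporary directory keeps the example self-contained.
demo_dir = tempfile.mkdtemp()
saved = save_bytes(b'fake audio bytes', demo_dir, file_type='mp3')
print(saved)
```

Stripping the directory part matters because the file name comes from a response header and should not be trusted to stay inside the download directory.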

III. Unit Tests: test_http_util.py

The tests call the functions wrapped in http_util:

1. test_http_str() requests http://www.baidu.com and gets the response as a string:

class HttpUtilTest(TestCase):
    def test_http_str(self):
        ret_str = http_str('http://www.baidu.com')
        # Assert before slicing: http_str() returns None on failure
        self.assertIsNotNone(ret_str)
        log.info('http_str returns: %s' % ret_str[0: 20])

2. test_http_json() calls the Baidu authentication API, which returns JSON data, and parses it to get the access_token:

class HttpUtilTest(TestCase):
    def test_http_json(self):
        ret_json = http_json('https://openapi.baidu.com/oauth/2.0/token', headers={
            'Content-Type': 'application/x-www-form-urlencoded',
        }, data={
            'grant_type': 'client_credentials',
            'client_id': 'kVcnfD9iW2XVZSMaLMrtLYIz',
            'client_secret': 'O9o1O213UgG5LFn0bDGNtoRN3VWl2du6',
        }, method='POST')

        log.info('http_json returns: %s' % ret_json)
        self.assertIsNotNone(ret_json)

        token = ret_json.get('access_token')
        print(token)

3. call_http_file() calls the Baidu AI speech synthesis API, gets the audio data converted from the text, and saves it as a file:

class HttpUtilTest(TestCase):
    def call_http_file(self, token):
        [ret, file_name, data] = http_file('https://tsn.baidu.com/text2audio', headers={
            'Content-Type': 'application/x-www-form-urlencoded',
        }, data={
            'tex': 'Python开发异步任务调度和业务处理',
            'tok': token,
            'cuid': 'starter_service_http_util',
            'ctp': '1',
            'lan': 'zh',
            'spd': '6',
            'per': '1',
        }, method='POST', save_to_disc=True, save_as_temp=False)

        log.info('http_file returns: %s, %s, %s' % (ret, str(file_name), type(data)))
        # assertIsNotNone() always passes on a bool, so assert the condition itself
        self.assertTrue(ret is False or len(data) > 0)

4. Run python manage.py test to produce the audio file <project path>/tmp/http_download/xxx.mp3.
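
The tests above need network access and valid credentials. Where that is not available, the response can be replaced with a mock so the parsing logic is still exercised. The sketch below is one possible approach and is not part of the sample project: fetch_json is a simplified stand-in for http_json(), and the URL is a placeholder.

```python
import json
from unittest import mock
from urllib import request


def fetch_json(url):
    """Simplified stand-in for http_json(), for illustration only."""
    with request.urlopen(url) as resp:
        return json.loads(resp.read())


def fetch_json_offline(payload):
    """Run fetch_json() against a mocked urlopen() that returns payload as JSON."""
    fake_resp = mock.MagicMock()
    fake_resp.read.return_value = json.dumps(payload).encode('utf-8')
    fake_resp.__enter__.return_value = fake_resp   # support the with-statement
    with mock.patch('urllib.request.urlopen', return_value=fake_resp):
        return fetch_json('https://example.invalid/token')   # placeholder URL


print(fetch_json_offline({'access_token': 'abc'}))   # {'access_token': 'abc'}
```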


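IV. Common Problems and Solutions

• When downloading a file, how do you get the file name and type?

Solution: the HTTP response headers contain the file name and type, but they may be missing, so the code must handle the case where they are absent:

- The Content-Disposition header contains the file name, for example: attachment;fileName=zip.zip

- The Content-Type header contains the file format, for example: audio/wav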

# parse file type, e.g. audio/wav
file_type = None
content_type = resp.getheader('Content-Type')
if '/' in str(content_type):
    file_type = content_type.split('/')[1]

# parse file name, e.g. attachment;fileName=zip.zip
file_name = None
disposition = resp.getheader('Content-Disposition')
if '=' in str(disposition):
    file_name = disposition.split('=')[1]

# use a new file name
if file_name is None:
    file_name = '%s.%s' % (get_code(), file_type or 'dat')
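
The split-based parsing above keeps any trailing parameters, so a header such as audio/wav; charset=binary would yield "wav; charset=binary". Because the headers attribute of a urlopen() response is an email.message.Message subclass, its own parsing methods can be used instead. The sketch below assumes that headers object is available; parse_download_headers is a hypothetical helper.

```python
from email.message import Message


def parse_download_headers(headers):
    """Return (file_name, file_type) from response headers; either may be None."""
    file_type = None
    # Without this check, get_content_subtype() falls back to 'plain'.
    if headers.get('Content-Type'):
        file_type = headers.get_content_subtype()

    # Reads the filename parameter of Content-Disposition (case-insensitive).
    file_name = headers.get_filename()
    return file_name, file_type


# Headers built by hand so the sketch runs offline.
headers = Message()
headers['Content-Type'] = 'audio/wav; charset=binary'
headers['Content-Disposition'] = 'attachment;fileName=zip.zip'

print(parse_download_headers(headers))     # ('zip.zip', 'wav')
print(parse_download_headers(Message()))   # (None, None)
```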

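• An error occurs when passing parameters to a request:

POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.

A related error of the same kind:

can't concat str to bytes

Solution: encode the data with urllib.parse.urlencode() followed by .encode('utf-8'), or call the wrapped http_util.encode()

Cause: the data parameter of urllib.request.Request() must be bytes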
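
The two encoding steps can be seen in this minimal sketch. The form fields are illustrative; the point is that urlencode() yields a str and only the subsequent .encode() yields the bytes that Request expects.

```python
from urllib import parse

form = {'tex': '中', 'lan': 'zh'}   # illustrative form fields

as_str = parse.urlencode(form)       # dict -> percent-encoded str
as_bytes = as_str.encode('utf-8')    # str -> bytes, suitable for Request(data=...)

print(type(as_str).__name__, as_str)   # str tex=%E4%B8%AD&lan=zh
print(type(as_bytes).__name__)         # bytes
```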
