Python Web Scraping with the urllib Module: GET Requests

Published: 2020-07-07 14:04:20  Author: 虎皮喵的喵
Source: Internet

This program uses the Baidu homepage as its crawling example.

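Basic pattern:

Import urllib.request

Open the page to crawl: response = urllib.request.urlopen('URL') (the URL must include the scheme, e.g. 'http://www.baidu.com')

Read the page source: html = response.read()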
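These steps can be wrapped in a small reusable function. This is a minimal sketch that assumes only the Python 3 standard library. The name fetch and the 10-second timeout default are illustrative choices and do not come from the original example. The with statement closes the connection once the body has been read.

```python
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send a GET request to url and return the raw response body (bytes).

    url must include a scheme such as "http://"; a bare "www.baidu.com"
    makes urllib raise ValueError("unknown url type: ...").
    """
    # With no data argument, urlopen() issues a GET request.
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, html = fetch('http://www.baidu.com') should give the same bytes as the open-then-read steps above.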

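Printing:

1. Without decode

print(html) # read() returns bytes, so this prints a b'...' literal on a single line, with line breaks shown as \n and Chinese characters as \x.. escapes; hard to read

2. With decode

print(html.decode()) # decode() turns the bytes into a str (UTF-8 by default), so the page source prints with its real line breaks and is easy to read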
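The difference between the two prints can be reproduced without touching the network. This is a sketch only. The helper decode_body is hypothetical, and so is the idea of taking the charset from response.headers.get_content_charset(). When no charset is given, the helper falls back to UTF-8, which is what the Baidu page declares in its meta tag.

```python
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Turn the bytes returned by response.read() into a str.

    charset is meant to come from the response headers, e.g.
    response.headers.get_content_charset(); if it is missing, fall back to
    UTF-8, the encoding Baidu declares in its <meta> tag.
    """
    # errors="replace" keeps a few undecodable bytes from aborting the decode.
    return raw.decode(charset or "utf-8", errors="replace")


# A small stand-in for what response.read() returns: UTF-8 bytes of HTML.
sample = b"<html>\n<head>\n<title>\xe7\x99\xbe\xe5\xba\xa6</title>\n"

print(sample)               # one line: b'<html>\n<head>\n<title>\xe7\x99...'
print(decode_body(sample))  # real line breaks, and the title shows as 百度
```

With a live response, the call would be decode_body(html, response.headers.get_content_charset()).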

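Checking the result of the request:

a. response.status # the HTTP status code of the response; 200 means the request succeeded

b. response.getcode() # returns the same status code as response.status

For error codes such as 404 (page not found), urlopen() does not return a response at all. It raises urllib.error.HTTPError, and the status code is available as the exception's .code attribute.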
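Because urlopen() raises on a 404 instead of returning, a script that needs the status code in both cases has to catch urllib.error.HTTPError. The sketch below checks this against a throwaway local server, so it runs without internet access. The names DemoHandler, get_status and demo, and the /missing path, are all hypothetical. The empty ProxyHandler is there only so that a proxy set in environment variables does not intercept requests to 127.0.0.1.

```python
import http.server
import threading
import urllib.error
import urllib.request


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: "/" answers 200, every other path 404."""

    def do_GET(self):
        if self.path == "/":
            body = b"<html>ok</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def get_status(url, timeout=10.0, opener=None):
    """Return the HTTP status code of a GET request to url.

    A successful response is returned by urlopen(); error codes such as 404
    are raised as urllib.error.HTTPError, whose .code carries the status.
    """
    open_url = opener.open if opener is not None else urllib.request.urlopen
    try:
        with open_url(url, timeout=timeout) as response:
            return response.status  # same value as response.getcode()
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    # Port 0 lets the OS pick a free port for the local server.
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Empty ProxyHandler: bypass any http_proxy environment setting.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        return (get_status(base + "/", opener=opener),
                get_status(base + "/missing", opener=opener))
    finally:
        server.shutdown()
        server.server_close()


print(demo())  # (200, 404)
```

Against a real site, get_status('http://www.baidu.com') uses plain urlopen() just like the programs in this article.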

1. Program without decode:

import urllib.request

# The scheme is required: urlopen('www.baidu.com') raises ValueError (unknown url type)
response = urllib.request.urlopen('http://www.baidu.com')
html = response.read()  # raw page source as bytes
print(html)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)  # 200 on success

Output:

[Screenshot of the program output]

2. Program with decode:

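import urllib.request

# The scheme is required: urlopen('www.baidu.com') raises ValueError (unknown url type)
response = urllib.request.urlopen('http://www.baidu.com')
html = response.read()

# decode() converts the bytes to a str (UTF-8 by default)
print(html.decode())
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)  # 200 on success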

Output:

<!DOCTYPE html>
<!--STATUS OK-->


<html>
<head>
    
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta content="always" name="referrer">
    <meta name="theme-color" content="#2932e1">
    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
    <link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="百度搜索" />
    <link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg">


<link rel="dns-prefetch" href="//s1.bdstatic.com"/>
<link rel="dns-prefetch" href="//t1.baidu.com"/>
<link rel="dns-prefetch" href="//t2.baidu.com"/>
<link rel="dns-prefetch" href="//t3.baidu.com"/>
<link rel="dns-prefetch" href="//t10.baidu.com"/>
<link rel="dns-prefetch" href="//t11.baidu.com"/>
<link rel="dns-prefetch" href="//t12.baidu.com"/>
<link rel="dns-prefetch" href="//b1.bdstatic.com"/>
    
    <title>百度一下，你就知道</title>
    

<style id="css_index" index="index" type="text/css">html,body{height:100%}
.
.
.
.


</body>
</html>






------------------------------------------------------------------
------------------------------------------------------------------
200
