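Publishing House of Electronics Industry Co., Ltd.
Publishing House of Electronics Industry Co., Ltd. is an official supplier on Youzan, providing customers with first-class knowledge products and services.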

Practical Python Applications: Web Crawling, Text Analysis, and Visualization (Python应用实战:爬虫、文本分析与可视化)

Price: 31.50
Shipping: free

Product Details

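Title: Practical Python Applications: Web Crawling, Text Analysis, and Visualization (Python应用实战:爬虫、文本分析与可视化)
List price: 42.0
ISBN: 9787121380136
Author: Zhang Li
Edition: 2020
Publication date: 2020-03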

Synopsis:

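Welcome to the world of Python. This book covers Python fundamentals such as syntax and data structures, along with classic Python web crawling, web text analysis, and visualization. Readers will not only get acquainted with Python but also meet a new "friend": the browser's developer tools. With them, readers learn HTML, the language in which web pages are written, analyze the structure of web pages, and extract the data they need.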

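The "take what is useful" philosophy is an apt analogy for Python's libraries: the book combines Python with classic libraries such as re, requests, and lxml to build crawlers that scrape web data automatically. Pandas then performs text analysis on the scraped data and turns dry numbers into attractive visualizations.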

A journey of a thousand miles begins with a single step. Welcome aboard this book's fascinating journey.




About the Author:

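Zhang Li (female) is a faculty member at the Big Data Research Center of the University of Electronic Science and Technology of China. She mainly teaches and does research in data analysis, Python applications, and project development.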



Table of Contents:

   


 

1  Getting Started with Python

1.1  Using IDLE

1.2  Starting with Strings

1.3  A Boon for Complex Data: Lists

1.3.1  Creating Lists

1.3.2  List Operations

1.4  Processing Data: Conditionals

1.5  Processing Data: Loops

1.6  Advanced Data Processing: Nested Statements

1.7  Functions

1.8  Ready to Use: Modules

1.9  Files

1.10  Handling Exceptions

2  Web Pages

2.1  Preparing the Tools

2.2  Starting with URLs

2.2.1  Simply Retrieving a URL

2.2.2  Links and URLs

2.3  The Language of Web Pages: HTML

2.3.1  Creating Your First Web Page

2.3.2  Tags: The Building Blocks of Web Pages

2.3.3  Tag Attributes

2.4  CSS and class

2.5  JavaScript and id

2.6  Web Page Analysis Tools

2.6.1  Chrome Developer Tools

2.6.2  Viewing the Page Structure

2.6.3  Locating a Specific Element

2.6.4  Filtering Different Resources

2.7  The Web's Courier: HTTP

2.7.1  HTTP Requests

2.7.2  HTTP Responses

2.7.3  Applications of HTTP: Cookies and Sessions

2.7.4  Hands-On: The HTTP Interaction Process

2.8  Ending with URLs

2.9  Chapter Summary

3  Data Scraping

3.1  Preparing the Tools

3.2  XPath and lxml.html

3.2.1  A Powerful Tool for Page Analysis: lxml

3.2.2  XPath

3.2.3  XPath Usage Examples

3.2.4  XPath Demonstration

3.3  About robots.txt

3.4  A First Try

3.4.1  Analyzing the Process

3.4.2  Writing the Code

3.4.3  Summary

3.4.4  Extensions

3.5  Getting Movie Data (Part 1)

3.5.1  Analyzing the Process

3.5.2  Writing the Code

3.5.3  Summary

3.6  Getting Movie Data (Part 2)

3.6.1  Analyzing the Process

3.6.2  Writing the Code

3.6.3  Making the Code More Robust

3.6.4  Summary

3.7  An Unconventional Kind of Web Scraping

3.7.1  Analyzing the Process

3.7.2  Writing the Code

3.7.3  Summary

3.8  Crawlers and Web Robots

3.9  Chapter Summary

4  Text Processing

4.1  Regular Expressions

4.1.1  How Matching Works

4.1.2  Common Metacharacters

4.2  A More Powerful Text Tool: Python's re Library

4.2.1  Using Match Objects

4.2.2  Searching with Regex

4.2.3  Replacing with Regex

4.2.4  More Convenient Lookups

4.2.5  Control Flags in the re Library

4.2.6  replace() and re.sub()

4.2.7  Implementing a More Advanced strip() Method

4.2.8  A New Way to Split: re.split()

4.2.9  How to Extract Chinese Text

4.3  Processing the Movie Data

4.3.1  Observing Before Extracting

4.3.2  What Data Do We Need?

4.3.3  Diverse Methods

4.3.4  Formatted Data

4.4  Chapter Summary

5  Data Analysis

5.1  Preparing the Tools

5.1.1  Configuring Jupyter Notebook

5.1.2  A Data Generation Helper: NumPy

5.1.3  Data Structures in Pandas

5.2  Series: Like a One-Dimensional Array

5.2.1  Getting Information About a Series

5.2.2  Mathematical Operations on Series

5.2.3  Performing Other Operations on Series

5.2.4  Method Chaining
