Product details
List price: 42.0
ISBN: 9787121380136
Author: Zhang Li (张丽)
Edition: 2020
Publication date: 2020-03
Synopsis:
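Welcome to the world of Python. This book covers Python fundamentals such as syntax and data structures, along with classic Python web crawling, web text analysis, and visualization. Readers will not only get acquainted with Python but also meet a new "friend", the browser's developer tools, and use them to understand HTML (the language web pages are written in), analyze page structure, and extract the data they need.
The idea of "taking what already exists and putting it to use" is an apt analogy for libraries in Python. The book combines Python with classic libraries such as re, requests, and lxml to build crawlers that scrape web page data automatically. Pandas is then used to analyze the scraped text and to turn dry data into attractive visualizations.
A journey of a thousand miles begins with a single step. Welcome to the wonderful journey this book offers.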
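About the author:
Zhang Li (female) is a faculty member at the Big Data Research Center of the University of Electronic Science and Technology of China. Her teaching and research focus on courses in data analysis, applications of the Python language, and project development.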
Table of contents:
Chapter 1 Getting to Know Python
1.1 Using IDLE
1.2 Starting with strings
1.3 Good news for complex data: lists
1.3.1 Creating lists
1.3.2 List operations
1.4 Processing data: conditionals
1.5 Processing data: loops
1.6 Processing data, advanced: nested statements
1.7 Functions
1.8 Ready to use: modules
1.9 Files
1.10 Handling exceptions
Chapter 2 Web Pages
2.1 Preparing the tools
2.2 Starting with URLs
2.2.1 Fetching a URL the simple way
2.2.2 Links and URLs
2.3 The language of web pages: HTML
2.3.1 Creating your first web page
2.3.2 Tags: the building blocks of web pages
2.3.3 Tag attributes
2.4 CSS and class
2.5 JavaScript and id
2.6 Web page analysis tools
2.6.1 Google's developer tools
2.6.2 Viewing page structure
2.6.3 Locating a specific element
2.6.4 Filtering different resources
2.7 The courier of web pages: HTTP
2.7.1 HTTP requests
2.7.2 HTTP responses
2.7.3 HTTP in practice: Cookie and Session
2.7.4 Hands-on: the HTTP interaction process
2.8 Ending with URLs
2.9 Chapter summary
Chapter 3 Data Scraping
3.1 Preparing the tools
3.2 XPath and lxml.html
3.2.1 A powerful tool for web page analysis: lxml
3.2.2 XPath
3.2.3 XPath usage examples
3.2.4 XPath demonstration
3.3 About robots.txt
3.4 A first try
3.4.1 Process analysis
3.4.2 Writing the code
3.4.3 Summary
3.4.4 Extensions
3.5 Fetching movie data (part 1)
3.5.1 Process analysis
3.5.2 Writing the code
3.5.3 Summary
3.6 Fetching movie data (part 2)
3.6.1 Process analysis
3.6.2 Writing the code
3.6.3 Making the code more robust
3.6.4 Summary
3.7 An unconventional way to scrape web pages
3.7.1 Process analysis
3.7.2 Writing the code
3.7.3 Summary
3.8 Crawlers and web robots
3.9 Chapter summary
Chapter 4 Text Processing
4.1 Regular expressions
4.1.1 How matching works
4.1.2 Common metacharacters
4.2 A more powerful text tool: Python's re library
4.2.1 How to use match objects
4.2.2 Searching with regex
4.2.3 Replacing with regex
4.2.4 More convenient searching
4.2.5 Control flags in the re library
4.2.6 replace() and re.sub()
4.2.7 Implementing a more advanced strip() method
4.2.8 A new way to split: re.split()
4.2.9 How to extract Chinese text
4.3 Processing the movie data
4.3.1 Observations before extraction
4.3.2 Which data to extract
4.3.3 A variety of methods
4.3.4 Formatted data
4.4 Chapter summary
Chapter 5 Data Analysis
5.1 Preparing the tools
5.1.1 Configuring Jupyter Notebook
5.1.2 A helper for generating data: NumPy
5.1.3 Data structures in Pandas
5.2 Series: like a one-dimensional array
5.2.1 Getting Series information
5.2.2 Mathematical operations on Series
5.2.3 Some operations on Series
5.2.4 Method chaining
- Publishing House of Electronics Industry Co., Ltd.