Python - Process PDF
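Python can read a PDF file and print its contents after extracting the text from it. To do this, we first need to install the required module, PyPDF2. The command below installs the module; pip should already be available in your Python environment.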
pip install pypdf2
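Before running the examples, it can help to confirm that the module is importable from the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical and not part of PyPDF2.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "PyPDF2" is the import name of the package installed by the pip command above.
    if module_available("PyPDF2"):
        print("PyPDF2 is installed")
    else:
        print("PyPDF2 is missing; run: pip install pypdf2")
```

The check only looks the module up; it does not import it, so it is safe to run even when the installation is broken.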
Once the module is installed, we can use the methods it provides to read a PDF file.
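from PyPDF2 import PdfReader

# PyPDF2 3.x names are used here; older releases spelled these
# PdfFileReader, getPage() and extractText().
pdfName = r'path\Tutorialspoint.pdf'  # raw string keeps the backslash literal
read_pdf = PdfReader(pdfName)
page = read_pdf.pages[0]  # first page (zero-based index)
page_content = page.extract_text()
print(page_content)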
When we run the above program, we get the following output:
Tutorials Point originated from the idea that there exists a class of readers who respond better to online content and prefer to learn new skills at their own pace from the comforts of their drawing rooms. The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated, we worked our way to adding fresh tutorials to our repository which now proudly flaunts a wealth of tutorials and allied articles on topics ranging from programming languages to web designing to academics and much more.
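The single-page read above fails with an unhelpful error if the requested page does not exist. The sketch below wraps it in a helper that checks the index first. It assumes PyPDF2 3.x, where the reader class is `PdfReader` and pages are exposed as `reader.pages`; the function names `page_text` and `read_pdf_page` are hypothetical.

```python
def page_text(reader, index=0):
    """Return the text of page `index` from a PdfReader-like object.

    `reader` only needs a `pages` sequence whose items offer extract_text().
    """
    pages = reader.pages
    if not 0 <= index < len(pages):
        raise IndexError(
            "page index %d out of range (document has %d pages)" % (index, len(pages))
        )
    # Fall back to an empty string if the page yields no text.
    return pages[index].extract_text() or ""


def read_pdf_page(path, index=0):
    """Open the PDF at `path` and return the text of one page."""
    # Imported lazily so page_text() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return page_text(PdfReader(path), index)
```

A call such as `read_pdf_page(r'path\Tutorialspoint.pdf', 0)` would return the same text the example prints, assuming the file exists at that location.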
Reading Multiple Pages
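To read a multi-page PDF and print each page's number, we loop over the pages and call get_page_number() (named getPageNumber() in older PyPDF2 releases). In the example below, the PDF file has two pages, and the content is printed under two separate page headings.
from PyPDF2 import PdfReader

pdfName = r'Path\Tutorialspoint2.pdf'  # raw string keeps the backslash literal
read_pdf = PdfReader(pdfName)
for page in read_pdf.pages:
    # get_page_number() is zero-based, so add 1 for display
    print('Page No - ' + str(1 + read_pdf.get_page_number(page)))
    page_content = page.extract_text()
    print(page_content)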
When we run the above program, we get the following output:
Page No - 1
Tutorials Point originated from the idea that there exists a class of readers who respond better to online content and prefer to learn new skills at their own pace from the comforts of their drawing rooms.
Page No - 2
The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated, we worked our way to adding fresh tutorials to our repository which now proudly flaunts a wealth of tutorials and allied articles on topics ranging from programming languages to web designing to academics and much more.
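The page headings in this output can also be produced by counting pages while iterating, which avoids a lookup per page. The following is a sketch of that alternative, not the tutorial's own method. It assumes a reader object that exposes `pages` with an `extract_text()` method on each page, as PyPDF2 3.x's `PdfReader` does; the names `iter_page_texts` and `format_report` are hypothetical.

```python
def iter_page_texts(reader):
    """Yield (page_number, text) pairs, numbering pages from 1."""
    for number, page in enumerate(reader.pages, start=1):
        # Fall back to an empty string if the page yields no text.
        yield number, page.extract_text() or ""


def format_report(reader):
    """Build the 'Page No - N' report as a single string."""
    lines = []
    for number, text in iter_page_texts(reader):
        lines.append("Page No - %d" % number)
        lines.append(text.strip())
    return "\n".join(lines)
```

With PyPDF2 installed, `print(format_report(PdfReader(pdfName)))` would print one heading per page followed by that page's text. Returning a string rather than printing inside the loop makes the result easy to write to a file or inspect.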