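Python - Reformatting Paragraphs
When we process large amounts of text and convert it into a presentable format, paragraphs need to be formatted. We may want to print each line at a specific width, or increase the indentation of each successive line when printing a poem. In this chapter, we use a module named textwrap3 to format paragraphs as needed.
First, install the required package: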
pip install textwrap3
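Wrapping to a Fixed Width
In this example, we specify a width of 30 characters per line for the paragraph. The wrap function takes the width as a parameter and returns a list of lines, none longer than that width.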
from textwrap3 import wrap

text = 'In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando), the head of the Corleone Mafia family, is known to friends and associates as Godfather. He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughters wedding day.'

# wrap() returns a list of lines, each at most 30 characters wide
x = wrap(text, 30)
for line in x:
    print(line)
When we run the above program, we get the following output:
In late summer 1945, guests
are gathered for the wedding
reception of Don Vito
Corleones daughter Connie
(Talia Shire) and Carlo Rizzi
(Gianni Russo). Vito (Marlon
Brando), the head of the
Corleone Mafia family, is
known to friends and
associates as Godfather. He
and Tom Hagen (Robert Duvall),
the Corleone family lawyer,
are hearing requests for
favors because, according to
Italian tradition, no Sicilian
can refuse a request on his
daughters wedding day.
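As a variation on the example above, the following sketch uses the standard library's textwrap module instead of textwrap3, an assumption made so that the snippet runs without installing anything. It shows fill, which wraps like wrap but returns a single newline-joined string, and the subsequent_indent option for a hanging indent. The sample text and variable names are illustrative only.

```python
import textwrap

# Illustrative sample text, not taken from the chapter.
sample = ("The quick brown fox jumps over the lazy dog. "
          "Pack my box with five dozen liquor jugs.")

# fill() wraps the text and joins the resulting lines with newlines.
wrapped = textwrap.fill(sample, width=30)
print(wrapped)

# wrap() gives the same lines as a list.
lines = textwrap.wrap(sample, width=30)

# subsequent_indent prefixes every line except the first;
# the prefix counts toward the 30-character width.
hanging = textwrap.fill(sample, width=30, subsequent_indent="    ")
print(hanging)
```

fill is convenient when the wrapped paragraph is written out in one call, while wrap is the better fit when each line needs further processing.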
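Variable Indentation
In this example, each line of the poem is indented further than the one before it. We first print the poem as it is stored in the file, then remove the indentation from each line with the dedent function.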
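import textwrap3

# Placeholder path to the poem file; a raw string keeps the backslash literal
file_name = r"path\poem.txt"

# Read the poem once and reuse the lines for both passes
with open(file_name) as f:
    data = f.readlines()

print("**Before Formatting**")
print(" ")
for line in data:
    # Keep the leading whitespace, drop only the trailing newline
    print(line.rstrip("\n"))

print(" ")
print("**After Formatting**")
print(" ")
for line in data:
    # dedent() removes the line's leading whitespace; strip() drops the newline
    dedented_text = textwrap3.dedent(line).strip()
    print(dedented_text)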
When we run the above program, we get the following output:
**Before Formatting**

Summer is here.
 Sky is bright.
  Birds are gone.
   Nests are empty.
    Where is Rain?

**After Formatting**

Summer is here.
Sky is bright.
Birds are gone.
Nests are empty.
Where is Rain?
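The chapter's introduction also mentions increasing the indentation of each successive line of a poem. A minimal sketch of that opposite direction is shown below. It assumes the standard library's textwrap module (Python 3.3 or later, where textwrap.indent is available) rather than textwrap3, and the helper name staircase and its step parameter are hypothetical names chosen for illustration.

```python
import textwrap

# Poem lines as they appear in the output above.
poem = [
    "Summer is here.",
    "Sky is bright.",
    "Birds are gone.",
    "Nests are empty.",
    "Where is Rain?",
]

def staircase(lines, step=2):
    """Indent line n (counting from 0) by n * step spaces."""
    return [textwrap.indent(line, " " * (step * n))
            for n, line in enumerate(lines)]

indented = staircase(poem)
for line in indented:
    print(line)

# Applying dedent() and strip() line by line undoes the indentation.
restored = [textwrap.dedent(line).strip() for line in indented]
```

Keeping the indentation step as a parameter makes it easy to adjust how steeply the lines step to the right.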