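Python - Text Summarization
Text summarization means generating a short summary from a large body of text that conveys, to some extent, the context of that text. In the example below, we use the gensim module and its summarize function for this purpose. The gensim.summarization module used in the examples is part of Gensim 3.x and was removed in Gensim 4.0, so an older Gensim release is required to run them as written. We install the package below.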
pip install gensim_sum_ext
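The paragraph below describes a movie plot. The summarize function builds the summary by selecting a few sentences from the body of the text itself.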
    from gensim.summarization import summarize

    # Sample text: a movie plot, built from adjacent string literals.
    text = (
        "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones "
        "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando), "
        "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "
        "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "
        "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding "
        "day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "
        "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she "
        "refused their advances; the men received minimal punishment from the presiding judge. "
        "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's "
        "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, "
        "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees "
        "to have his men punish the young men responsible (in a non-lethal manner) in return for "
        "future service if necessary."
    )

    # Print the sentences that gensim selects as the summary.
    print(summarize(text))

When we run the above program, we get the following output:
He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding day.
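To make the idea of picking sentences from the text itself more concrete, here is a minimal sketch of extractive summarization that uses only the standard library. It is not gensim's method (gensim's summarize is based on a TextRank-style ranking); it scores each sentence by the average frequency of its words, which is a simpler stand-in. The names simple_summarize, split_sentences, tokenize, and STOPWORDS, as well as the tiny stop-word list and the sample text, are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "his", "her"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def simple_summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


sample = (
    "The Don hears requests on his daughter's wedding day. "
    "Bonasera asks the Don for a favor. "
    "Guests dance in the garden."
)
print(simple_summarize(sample, max_sentences=1))
# prints: Bonasera asks the Don for a favor.
```

Like gensim's summarize, this sketch only selects existing sentences and never writes new ones, so the summary quality depends on how well a single sentence can stand for the whole text.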
Extracting Keywords
We can also extract keywords from a body of text by using the keywords function from the gensim library, as shown below.
    from gensim.summarization import keywords

    # Sample text: the same movie plot, built from adjacent string literals.
    text = (
        "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones "
        "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando), "
        "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "
        "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "
        "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding "
        "day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "
        "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she "
        "refused their advances; the men received minimal punishment from the presiding judge. "
        "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's "
        "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, "
        "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees "
        "to have his men punish the young men responsible (in a non-lethal manner) in return for "
        "future service if necessary."
    )

    # Print the extracted keywords, one per line.
    print(keywords(text))

When we run the above program, we get the following output:
corleone
men
corleones
daughter
wedding
summer
new
vito
family
hagen
robert
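As a rough illustration of what keyword extraction does, the sketch below ranks words by how often they occur after dropping stop words and very short tokens. This is a simplified stand-in, not the graph-based ranking that gensim's keywords function uses, so its results will generally differ from the output above. The function name simple_keywords, the STOPWORDS set, and the sample text are hypothetical and exist only for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"the", "and", "for", "with", "that", "this"}


def simple_keywords(text, top_n=5):
    """Return the top_n most frequent words, ignoring stop words and short tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


sample = (
    "The Don hears requests at the wedding. "
    "The Don agrees to help the family. "
    "The family thanks the Don."
)
# Print one keyword per line, mirroring the layout of the gensim output.
print("\n".join(simple_keywords(sample, top_n=2)))
# prints:
# don
# family
```

Raw frequency tends to favor names and repeated nouns, which is often a reasonable first approximation for short texts; a graph-based ranking also takes into account which words appear near each other.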