- Python pandas教程
- Python Pandas - 主页
- Python Pandas - 简介
- Python Pandas - 环境设置
- 数据结构简介
- Python pandas - 系列
- Python Pandas - 数据帧
- Python Pandas - 面板
- Python Pandas - 基本功能
- 描述性统计
- 功能应用
- Python Pandas - 重新索引
- Python Pandas - 迭代
- Python Pandas - 排序
- 处理文本数据
- 选项和定制
- 索引和选择数据
- 统计功能
- Python Pandas - 窗口函数
- Python Pandas - 聚合
- Python Pandas - 缺失数据
- Python Pandas - GroupBy
- Python Pandas - 合并/连接
- Python Pandas - 连接
- Python Pandas - 日期功能
- Python Pandas - Timedelta
- Python Pandas - 分类数据
- Python Pandas - 可视化
- Python Pandas - IO 工具
- Python Pandas - 稀疏数据
- Python Pandas - 注意事项和陷阱
- 与SQL的比较
- Python Pandas 有用资源
- Python Pandas - 快速指南
- Python Pandas - 有用的资源
- Python Pandas - 讨论
Python Pandas - 聚合
创建滚动、扩展和ewm对象后,可以使用多种方法对数据执行聚合。
在 DataFrame 上应用聚合
让我们创建一个 DataFrame 并在其上应用聚合。
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r
其输出如下 -
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 0.790670 -0.387854 -0.668132 0.267283 2000-01-03 -0.575523 -0.965025 0.060427 -2.179780 2000-01-04 1.669653 1.211759 -0.254695 1.429166 2000-01-05 0.100568 -0.236184 0.491646 -0.466081 2000-01-06 0.155172 0.992975 -1.205134 0.320958 2000-01-07 0.309468 -0.724053 -1.412446 0.627919 2000-01-08 0.099489 -1.028040 0.163206 -1.274331 2000-01-09 1.639500 -0.068443 0.714008 -0.565969 2000-01-10 0.326761 1.479841 0.664282 -1.361169 Rolling [window=3,min_periods=1,center=False,axis=0]
我们可以通过将函数传递给整个 DataFrame 来进行聚合,或者通过标准的get item方法选择一个列。
在整个数据帧上应用聚合
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r.aggregate(np.sum)
其输出如下 -
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469
对数据框的单列应用聚合
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r['A'].aggregate(np.sum)
其输出如下 -
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 2000-01-01 1.088512 2000-01-02 1.879182 2000-01-03 1.303660 2000-01-04 1.884801 2000-01-05 1.194699 2000-01-06 1.925393 2000-01-07 0.565208 2000-01-08 0.564129 2000-01-09 2.048458 2000-01-10 2.065750 Freq: D, Name: A, dtype: float64
对 DataFrame 的多个列应用聚合
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r[['A','B']].aggregate(np.sum)
其输出如下 -
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 A B 2000-01-01 1.088512 -0.650942 2000-01-02 1.879182 -1.038796 2000-01-03 1.303660 -2.003821 2000-01-04 1.884801 -0.141119 2000-01-05 1.194699 0.010551 2000-01-06 1.925393 1.968551 2000-01-07 0.565208 0.032738 2000-01-08 0.564129 -0.759118 2000-01-09 2.048458 -1.820537 2000-01-10 2.065750 0.383357
在 DataFrame 的单列上应用多个函数
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r['A'].aggregate([np.sum,np.mean])
其输出如下 -
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 sum mean 2000-01-01 1.088512 1.088512 2000-01-02 1.879182 0.939591 2000-01-03 1.303660 0.434553 2000-01-04 1.884801 0.628267 2000-01-05 1.194699 0.398233 2000-01-06 1.925393 0.641798 2000-01-07 0.565208 0.188403 2000-01-08 0.564129 0.188043 2000-01-09 2.048458 0.682819 2000-01-10 2.065750 0.688583
在 DataFrame 的多个列上应用多个函数
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r[['A','B']].aggregate([np.sum,np.mean])
其输出如下 -
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 A B sum mean sum mean 2000-01-01 1.088512 1.088512 -0.650942 -0.650942 2000-01-02 1.879182 0.939591 -1.038796 -0.519398 2000-01-03 1.303660 0.434553 -2.003821 -0.667940 2000-01-04 1.884801 0.628267 -0.141119 -0.047040 2000-01-05 1.194699 0.398233 0.010551 0.003517 2000-01-06 1.925393 0.641798 1.968551 0.656184 2000-01-07 0.565208 0.188403 0.032738 0.010913 2000-01-08 0.564129 0.188043 -0.759118 -0.253039 2000-01-09 2.048458 0.682819 -1.820537 -0.606846 2000-01-10 2.065750 0.688583 0.383357 0.127786
将不同的函数应用于数据框的不同列
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(3, 4), index = pd.date_range('1/1/2000', periods=3), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r.aggregate({'A' : np.sum,'B' : np.mean})
其输出如下 -
A B C D 2000-01-01 -1.575749 -1.018105 0.317797 0.545081 2000-01-02 -0.164917 -1.361068 0.258240 1.113091 2000-01-03 1.258111 1.037941 -0.047487 0.867371 A B 2000-01-01 -1.575749 -1.018105 2000-01-02 -1.740666 -1.189587 2000-01-03 -0.482555 -0.447078