Lesson 02: Trying Out Pandas, Part 2

The pandas-ply package must be installed.

  1. At the command prompt, run

    pip install pandas-ply
  2. Stop Jupyter Notebook with Ctrl-C, then restart it

  3. Close the browser, then open it again
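
After restarting, one way to confirm that the installation worked is to ask Python whether the module can be found. The sketch below uses only the standard library; the helper name `is_installed` is made up for this example. Note that the package is installed as `pandas-ply` but imported as `pandas_ply`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


# The import name uses an underscore, unlike the pip package name.
found = is_installed("pandas_ply")
print("pandas_ply found:", found)
```

If this prints False, the package was probably installed into a different Python environment than the one Jupyter is using.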

pandas-ply/README.rst at master · coursera/pandas-ply
Using pandas-ply - Qiita
Data manipulation in Python: trying out Pandas_plyr - Qiita

In [1]:
import pandas as pd
from pandas_ply import install_ply, X, sym_call

install_ply(pd)
In [28]:
import pandas as pd
csv_file_name = 'data/WA_Fn-UseC_-HR-Employee-Attrition.csv'

df = pd.read_csv(csv_file_name)
df.head()
Out[28]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns

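1. Selecting columns (renaming columns, creating columns whose values depend on a condition)

ply_select

(Example) Set the value to True when the age is 40 or older, and to False otherwise

In [3]:
df2 = df.ply_select("Department", "Age",
                Distance = X.DistanceFromHome,  ## a column can be renamed
                is_adult = (X.Age >= 40)  ## a new column can also be defined
                )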
df2
Out[3]:
Department Age Distance is_adult
0 Sales 41 1 True
1 Research & Development 49 8 True
2 Research & Development 37 2 False
3 Research & Development 33 3 False
4 Research & Development 27 2 False
5 Research & Development 32 2 False
6 Research & Development 59 3 True
7 Research & Development 30 24 False
8 Research & Development 38 23 False
9 Research & Development 36 27 False
10 Research & Development 35 16 False
11 Research & Development 29 15 False
12 Research & Development 31 26 False
13 Research & Development 34 19 False
14 Research & Development 28 24 False
15 Research & Development 29 21 False
16 Research & Development 32 5 False
17 Research & Development 22 16 False
18 Sales 53 2 True
19 Research & Development 38 2 False
20 Research & Development 24 11 False
21 Sales 36 9 False
22 Research & Development 34 7 False
23 Research & Development 21 15 False
24 Research & Development 34 6 False
25 Research & Development 53 5 True
26 Research & Development 32 16 False
27 Sales 42 8 True
28 Research & Development 44 7 True
29 Sales 46 2 True
... ... ... ... ...
1440 Research & Development 36 4 False
1441 Research & Development 56 1 True
1442 Research & Development 29 1 False
1443 Research & Development 42 2 True
1444 Research & Development 56 7 True
1445 Research & Development 41 28 True
1446 Sales 34 28 False
1447 Sales 36 15 False
1448 Sales 41 3 True
1449 Research & Development 32 2 False
1450 Human Resources 35 26 False
1451 Sales 38 10 False
1452 Sales 50 1 True
1453 Sales 36 11 False
1454 Sales 45 20 True
1455 Research & Development 40 2 True
1456 Research & Development 35 18 False
1457 Research & Development 40 2 True
1458 Research & Development 35 1 False
1459 Research & Development 29 13 False
1460 Research & Development 29 28 False
1461 Sales 50 28 True
1462 Sales 39 24 False
1463 Research & Development 31 5 False
1464 Sales 26 5 False
1465 Research & Development 36 23 False
1466 Research & Development 39 6 False
1467 Research & Development 27 4 False
1468 Sales 49 2 True
1469 Research & Development 34 8 False

1470 rows × 4 columns
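
For comparison, roughly the same result can be obtained with plain pandas. This is only a sketch on a small made-up frame (not the HR data), and it uses `rename` and `assign` as a stand-in for `ply_select` rather than reproducing how pandas-ply works internally.

```python
import pandas as pd

# Small made-up sample with the same column names as the lesson data.
sample = pd.DataFrame({
    "Department": ["Sales", "Research & Development", "Sales"],
    "Age": [41, 37, 40],
    "DistanceFromHome": [1, 2, 9],
})

selected = (
    sample[["Department", "Age", "DistanceFromHome"]]
    .rename(columns={"DistanceFromHome": "Distance"})  # rename a column
    .assign(is_adult=lambda d: d["Age"] >= 40)         # derive a new column
)
print(selected)
```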

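(Example) Rename a column, divide its values by 100, and keep the first 10 rows. Method calls are chained with a period (.). Note that the new column is named DailyRate_x100 even though it holds DailyRate divided by 100.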

In [4]:
(df
    .ply_select(
      EducationField=X.EducationField,
      DailyRate_x100 = X.DailyRate / 100
      )
    .head(10)
    )
Out[4]:
DailyRate_x100 EducationField
0 11.02 Life Sciences
1 2.79 Life Sciences
2 13.73 Other
3 13.92 Life Sciences
4 5.91 Medical
5 10.05 Life Sciences
6 13.24 Medical
7 13.58 Life Sciences
8 2.16 Life Sciences
9 12.99 Medical

2. Grouping by a column to get means and counts

groupby

(Example) mean() gives the average and size() gives the count

In [5]:
dataSummarize = (
    df
    .groupby('Department')
    .ply_select(
      meanAge=X.Age.mean(),
      candidateNum=X.Age.size(),
      )
    )
dataSummarize
Out[5]:
meanAge candidateNum
Department
Human Resources 37.809524 63
Research & Development 37.042664 961
Sales 36.542601 446
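
The same kind of summary can also be written with plain pandas. The sketch below assumes pandas 0.25 or later (for named aggregation) and uses a tiny made-up frame; it is an alternative to the `ply_select` call above, not a description of what pandas-ply does internally.

```python
import pandas as pd

sample = pd.DataFrame({
    "Department": ["Sales", "Sales", "Human Resources"],
    "Age": [30, 40, 50],
})

summary = (
    sample
    .groupby("Department")
    .agg(
        meanAge=("Age", "mean"),       # average age per department
        candidateNum=("Age", "size"),  # number of rows per department
    )
)
print(summary)
```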

3. Selecting rows (keeping only the rows that match a condition)

ply_where

In [6]:
df.ply_where(X.Age>40, 
               X.BusinessTravel == "Travel_Rarely",
               X.EducationField == "Life Sciences"
               )  # only rows that satisfy all of the conditions (combined with AND) are selected
Out[6]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
18 53 No Travel_Rarely 1219 Sales 2 4 Life Sciences 1 23 ... 3 80 0 31 3 3 25 8 3 7
50 48 Yes Travel_Rarely 626 Research & Development 1 2 Life Sciences 1 64 ... 4 80 0 23 2 3 1 0 0 0
63 59 No Travel_Rarely 1435 Sales 25 3 Life Sciences 1 81 ... 4 80 0 28 3 2 21 16 7 9
67 45 No Travel_Rarely 1339 Research & Development 7 3 Life Sciences 1 86 ... 3 80 1 25 2 3 1 0 0 0
82 55 No Travel_Rarely 111 Sales 1 2 Life Sciences 1 106 ... 4 80 1 24 4 3 1 0 1 0
85 56 No Travel_Rarely 1400 Research & Development 7 3 Life Sciences 1 112 ... 1 80 0 37 3 2 6 4 0 2
87 51 No Travel_Rarely 432 Research & Development 9 4 Life Sciences 1 116 ... 2 80 2 10 4 3 4 2 0 3
122 56 Yes Travel_Rarely 441 Research & Development 14 4 Life Sciences 1 161 ... 1 80 3 7 2 3 5 4 4 3
123 51 No Travel_Rarely 684 Research & Development 6 3 Life Sciences 1 162 ... 3 80 0 23 5 3 20 18 15 15
133 41 No Travel_Rarely 802 Sales 9 1 Life Sciences 1 176 ... 3 80 1 12 2 3 9 7 0 7
148 41 No Travel_Rarely 933 Research & Development 9 4 Life Sciences 1 200 ... 4 80 1 7 2 3 5 0 1 4
153 45 No Travel_Rarely 194 Research & Development 9 3 Life Sciences 1 206 ... 3 80 1 20 2 1 17 9 0 15
163 57 No Travel_Rarely 334 Research & Development 24 2 Life Sciences 1 223 ... 2 80 1 12 2 1 5 3 1 4
165 50 No Travel_Rarely 1452 Research & Development 11 3 Life Sciences 1 226 ... 2 80 0 21 5 3 5 4 4 4
166 41 No Travel_Rarely 465 Research & Development 14 3 Life Sciences 1 227 ... 1 80 1 13 2 3 9 8 1 8
174 45 No Travel_Rarely 1268 Sales 4 2 Life Sciences 1 240 ... 1 80 1 9 3 4 5 4 0 3
175 56 No Travel_Rarely 713 Research & Development 8 3 Life Sciences 1 241 ... 3 80 1 19 3 3 2 2 2 2
190 52 No Travel_Rarely 699 Research & Development 1 4 Life Sciences 1 259 ... 1 80 1 34 5 3 33 18 11 9
213 51 No Travel_Rarely 1469 Research & Development 8 4 Life Sciences 1 296 ... 4 80 2 16 5 1 10 9 4 7
215 41 No Travel_Rarely 896 Sales 6 3 Life Sciences 1 298 ... 3 80 0 16 3 3 1 0 0 0
225 59 No Travel_Rarely 142 Research & Development 3 3 Life Sciences 1 309 ... 1 80 1 7 6 3 1 0 0 0
230 52 No Travel_Rarely 1323 Research & Development 2 3 Life Sciences 1 316 ... 2 80 0 6 3 2 2 2 2 2
242 41 No Travel_Rarely 1411 Research & Development 19 2 Life Sciences 1 334 ... 1 80 2 17 2 2 1 0 0 0
253 42 No Travel_Rarely 916 Research & Development 17 2 Life Sciences 1 347 ... 3 80 0 10 1 3 3 2 0 2
258 51 No Travel_Rarely 833 Research & Development 1 3 Life Sciences 1 353 ... 2 80 0 1 0 2 1 0 0 0
279 50 No Travel_Rarely 797 Research & Development 4 1 Life Sciences 1 385 ... 1 80 2 28 4 2 10 4 1 6
281 42 No Travel_Rarely 635 Sales 1 1 Life Sciences 1 387 ... 3 80 0 20 3 3 20 16 11 6
300 41 No Travel_Rarely 334 Sales 2 4 Life Sciences 1 410 ... 2 80 0 22 2 3 22 10 0 4
329 47 No Travel_Rarely 1482 Research & Development 5 5 Life Sciences 1 447 ... 2 80 1 21 2 3 3 2 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1194 47 No Travel_Rarely 1225 Sales 2 4 Life Sciences 1 1676 ... 3 80 3 29 2 3 3 2 1 2
1195 49 No Travel_Rarely 809 Research & Development 1 3 Life Sciences 1 1677 ... 1 80 0 23 2 3 8 7 0 0
1196 41 No Travel_Rarely 1206 Sales 23 2 Life Sciences 1 1678 ... 4 80 0 21 2 3 2 0 0 2
1200 44 No Travel_Rarely 528 Human Resources 1 3 Life Sciences 1 1683 ... 1 80 3 8 2 3 2 2 2 2
1214 44 No Travel_Rarely 921 Research & Development 2 3 Life Sciences 1 1703 ... 2 80 1 9 2 3 8 7 6 7
1231 46 No Travel_Rarely 717 Research & Development 13 4 Life Sciences 1 1727 ... 4 80 0 19 3 3 10 7 0 9
1235 46 No Travel_Rarely 1277 Sales 2 3 Life Sciences 1 1732 ... 2 80 1 13 5 2 10 6 0 3
1243 45 No Travel_Rarely 176 Human Resources 4 3 Life Sciences 1 1744 ... 3 80 2 9 2 4 5 0 0 3
1266 41 No Travel_Rarely 548 Research & Development 9 4 Life Sciences 1 1772 ... 2 80 2 5 2 3 5 3 0 4
1269 43 No Travel_Rarely 244 Human Resources 2 3 Life Sciences 1 1778 ... 2 80 0 10 5 3 9 7 1 8
1294 41 No Travel_Rarely 447 Research & Development 5 3 Life Sciences 1 1814 ... 1 80 0 11 3 1 3 2 1 2
1303 47 No Travel_Rarely 1001 Research & Development 4 3 Life Sciences 1 1827 ... 3 80 1 28 4 3 22 11 14 10
1321 47 No Travel_Rarely 207 Research & Development 9 4 Life Sciences 1 1856 ... 3 80 0 7 2 3 2 2 2 0
1322 46 No Travel_Rarely 706 Research & Development 2 2 Life Sciences 1 1857 ... 3 80 1 12 4 2 9 8 4 7
1325 42 No Travel_Rarely 1142 Research & Development 8 3 Life Sciences 1 1860 ... 4 80 0 8 3 3 0 0 0 0
1331 48 No Travel_Rarely 1224 Research & Development 10 3 Life Sciences 1 1867 ... 4 80 0 29 3 3 22 10 12 9
1333 46 Yes Travel_Rarely 1254 Sales 10 3 Life Sciences 1 1869 ... 3 80 3 14 2 3 8 7 0 7
1346 45 No Travel_Rarely 556 Research & Development 25 2 Life Sciences 1 1888 ... 4 80 2 10 2 2 9 8 3 8
1352 44 No Travel_Rarely 170 Research & Development 1 4 Life Sciences 1 1903 ... 4 80 1 10 5 3 2 0 2 2
1354 56 Yes Travel_Rarely 1162 Research & Development 24 2 Life Sciences 1 1907 ... 4 80 0 5 3 3 4 2 1 0
1374 58 No Travel_Rarely 605 Sales 21 3 Life Sciences 1 1938 ... 3 80 1 29 2 2 1 0 0 0
1396 53 Yes Travel_Rarely 1168 Sales 24 4 Life Sciences 1 1968 ... 2 80 0 15 2 2 2 2 2 2
1397 54 No Travel_Rarely 155 Research & Development 9 2 Life Sciences 1 1969 ... 3 80 2 9 6 2 4 3 2 3
1399 43 No Travel_Rarely 574 Research & Development 11 3 Life Sciences 1 1971 ... 2 80 1 10 1 3 10 9 0 9
1419 42 No Travel_Rarely 557 Research & Development 18 4 Life Sciences 1 1998 ... 3 80 1 9 3 2 4 3 1 2
1420 41 No Travel_Rarely 642 Research & Development 1 3 Life Sciences 1 1999 ... 1 80 1 12 3 3 5 3 1 0
1443 42 No Travel_Rarely 300 Research & Development 2 3 Life Sciences 1 2031 ... 1 80 0 24 2 2 22 6 4 14
1445 41 No Travel_Rarely 582 Research & Development 28 4 Life Sciences 1 2034 ... 3 80 1 21 3 3 20 7 0 10
1448 41 No Travel_Rarely 930 Sales 3 3 Life Sciences 1 2037 ... 3 80 1 14 5 3 5 4 0 4
1454 45 No Travel_Rarely 374 Sales 20 3 Life Sciences 1 2046 ... 3 80 0 8 3 3 5 3 0 1

137 rows × 35 columns
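
In plain pandas, the equivalent of passing several conditions is a boolean mask built with `&`. The following is a minimal sketch on made-up rows that reuse the lesson's column names.

```python
import pandas as pd

sample = pd.DataFrame({
    "Age": [41, 53, 38, 45],
    "BusinessTravel": ["Travel_Rarely", "Travel_Rarely", "Travel_Rarely", "Non-Travel"],
    "EducationField": ["Life Sciences", "Medical", "Life Sciences", "Life Sciences"],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than > and ==.
mask = (
    (sample["Age"] > 40)
    & (sample["BusinessTravel"] == "Travel_Rarely")
    & (sample["EducationField"] == "Life Sciences")
)
matched = sample[mask]
print(matched)
```

Only the first row passes all three tests; each of the other rows fails exactly one of them.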

In [7]:
## under 30 
(df
    .ply_where(X.Age < 30)
    .head(10)
    )
Out[7]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2
11 29 No Travel_Rarely 153 Research & Development 15 2 Life Sciences 1 15 ... 4 80 0 10 3 3 9 5 0 8
14 28 Yes Travel_Rarely 103 Research & Development 24 3 Life Sciences 1 19 ... 2 80 0 6 4 3 4 2 0 3
15 29 No Travel_Rarely 1389 Research & Development 21 4 Life Sciences 1 20 ... 3 80 1 10 1 3 10 9 8 8
17 22 No Non-Travel 1123 Research & Development 16 2 Medical 1 22 ... 2 80 2 1 2 2 1 0 0 0
20 24 No Non-Travel 673 Research & Development 11 2 Other 1 26 ... 4 80 1 5 5 2 4 2 1 3
23 21 No Travel_Rarely 391 Research & Development 15 2 Life Sciences 1 30 ... 4 80 0 0 6 3 0 0 0 0
34 24 Yes Travel_Rarely 813 Research & Development 1 3 Medical 1 45 ... 1 80 1 6 2 2 2 0 2 0
41 27 No Travel_Rarely 1240 Research & Development 2 4 Life Sciences 1 54 ... 4 80 1 1 6 3 1 0 0 0
42 26 Yes Travel_Rarely 1357 Research & Development 25 3 Life Sciences 1 55 ... 3 80 0 1 2 2 1 0 0 1

10 rows × 35 columns

4. Renaming columns

In [8]:
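# Rename columns to Japanese labels: 年齢 = age, 自然減 = attrition, 出張 = business travel, 日当 = daily rate,
# 部署 = department, 通勤距離 = distance from home, 教育 = education, 教育領域 = education field
df.rename(columns={'Age': '年齢', 'Attrition': '自然減', 'BusinessTravel': '出張', 'DailyRate': '日当',
        'Department': '部署', 'DistanceFromHome': '通勤距離', 'Education': '教育', 'EducationField': '教育領域'},
        inplace=True)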
df.head()
Out[8]:
年齢 自然減 出張 日当 部署 通勤距離 教育 教育領域 EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns
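
Two details of `rename` are easy to miss: without `inplace=True` it returns a new frame, and by default it silently skips labels that do not exist. A small sketch with made-up data (the key `NoSuchColumn` is deliberately absent):

```python
import pandas as pd

sample = pd.DataFrame({"Age": [41, 49], "Department": ["Sales", "Sales"]})

# Without inplace=True, rename returns a new frame and leaves `sample` as is.
renamed = sample.rename(columns={"Age": "年齢", "NoSuchColumn": "x"})

print(list(sample.columns))   # original names are untouched
print(list(renamed.columns))  # only 'Age' was found and renamed
```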

5. Checking whether a Japanese font can be used

In [9]:
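%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font='IPAexGothic')
# Newer seaborn releases no longer expose sns.plt, so pyplot is called directly
plt.plot([0,1], [0,1]);  plt.title('tofu - 豆腐')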
Out[9]:
<matplotlib.text.Text at 0x146bb196a58>

If the characters 「豆腐」 do not appear in the plot above
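
One thing worth checking in that case is whether matplotlib can find the font at all. The sketch below looks the name up in matplotlib's font list; the helper name `font_available` is made up for this example, and it only tests for an exact name match.

```python
from matplotlib import font_manager


def font_available(font_name):
    """Return True if matplotlib's font list contains a font with this name."""
    known = {entry.name for entry in font_manager.fontManager.ttflist}
    return font_name in known


has_ipaex = font_available("IPAexGothic")
print("IPAexGothic available:", has_ipaex)
```

If this prints False, the font is either not installed or matplotlib's font cache was built before it was installed.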

6. Drawing a simple plot

In [10]:
dataSummarize = (
    df
    .groupby('部署')
    .ply_select(
      平均通勤距離=X.通勤距離.mean(),
      #candidateNum=X.年齢.size(),
      )
    )
dataSummarize
Out[10]:
平均通勤距離
部署
Human Resources 8.698413
Research & Development 9.144641
Sales 9.365471
In [11]:
import matplotlib.pyplot as plt
dataSummarize.plot()
plt.show()

7. Trying out a pivot table

Working with pivot tables in pandas · For myself tomorrow

In [12]:
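pt = pd.pivot_table(df,
            # row keys to aggregate by
               index=['部署','出張'],

            # column keys to aggregate by (more than one may be given)
               columns='教育領域',

            # column to aggregate (if omitted, every column not used as a key)
               values='EmployeeCount',

            # count the rows; without this, the mean of the values is computed
               aggfunc=lambda x : len(x),

            # fill NaN with 0
               fill_value = 0,
            )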
pt
Out[12]:
教育領域                                   Human Resources  Life Sciences  Marketing  Medical  Other  Technical Degree
部署                   出張
Human Resources        Non-Travel                        4              1          0        1      0                 0
                       Travel_Frequently                 6              2          0        2      1                 0
                       Travel_Rarely                    17             13          0       10      2                 4
Research & Development Non-Travel                        0             43          0       39      4                11
                       Travel_Frequently                 0             91          0       68     10                13
                       Travel_Rarely                     0            306          0      256     50                70
Sales                  Non-Travel                        0             19         12       10      3                 3
                       Travel_Frequently                 0             30         27       16      3                 8
                       Travel_Rarely                     0            101        120       62      9                23
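
Since the table above is simply a count of rows per combination, `pd.crosstab` is another way to get it. The sketch below compares the two approaches on a tiny made-up frame with shortened column names (`dept`, `field`); it is an alternative shown for comparison, not the method used in this lesson.

```python
import pandas as pd

sample = pd.DataFrame({
    "dept": ["Sales", "Sales", "HR", "HR", "HR"],
    "field": ["Marketing", "Marketing", "Medical", "Medical", "Marketing"],
    "EmployeeCount": [1, 1, 1, 1, 1],
})

# Counting via pivot_table: len() of each group, empty cells filled with 0.
by_pivot = pd.pivot_table(
    sample, index="dept", columns="field",
    values="EmployeeCount", aggfunc=len, fill_value=0,
)

# crosstab counts co-occurrences directly and needs no values column.
by_crosstab = pd.crosstab(sample["dept"], sample["field"])

print(by_pivot)
print(by_crosstab)
```
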
In [13]:
pt.plot()
plt.show()
In [14]:
pt = pd.pivot_table(df,
            # row keys to aggregate by
               index=['部署','出張'],

            # column keys to aggregate by (more than one may be given)
               columns=['教育領域', '自然減'],

            # column to aggregate (if omitted, every column not used as a key)
               values='EmployeeCount',

            # count the rows; without this, the mean of the values is computed
               aggfunc=lambda x : len(x),

            # fill NaN with 0
               fill_value = 0,
            )
pt
Out[14]:
教育領域                                    Human Resources     Life Sciences         Marketing           Medical             Other  Technical Degree
自然減                                          No      Yes       No      Yes       No      Yes       No      Yes       No      Yes       No      Yes
部署                   出張
Human Resources        Non-Travel                4        0        1        0        0        0        1        0        0        0        0        0
                       Travel_Frequently         3        3        1        1        0        0        2        0        1        0        0        0
                       Travel_Rarely            13        4       13        0        0        0        8        2        2        0        2        2
Research & Development Non-Travel                0        0       40        3        0        0       36        3        4        0        9        2
                       Travel_Frequently         0        0       70       21        0        0       57       11        9        1        9        4
                       Travel_Rarely             0        0      271       35        0        0      223       33       44        6       56       14
Sales                  Non-Travel                0        0       18        1       11        1        9        1        2        1        3        0
                       Travel_Frequently         0        0       20       10       19        8       14        2        1        2        2        6
                       Travel_Rarely             0        0       83       18       94       26       51       11        8        1       19        4

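8. Arranging the row order

  1. Sort by name
    • Ascending order
      • df.reindex(index = natsorted(df.index))
    • Descending order
      • df.reindex(index = reversed(natsorted(df.index)))
  2. Use a specified order
    1. Get the current order as a list
      • olist = list(df.index)
    2. Rearrange the elements of olist
    3. df.reindex(index = olist)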
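
Both procedures can be sketched on a small made-up table. This example uses Python's built-in `sorted` in place of `natsorted` so that it runs without extra packages; the labels `dept` and `travel` and the counts are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("HR", "Non-Travel"), ("HR", "Travel_Rarely"),
     ("Sales", "Non-Travel"), ("Sales", "Travel_Rarely")],
    names=["dept", "travel"],
)
table = pd.DataFrame({"count": [4, 13, 18, 83]}, index=index)

# 1. Name order: sort the index labels, then reindex.
descending = table.reindex(index=sorted(table.index, reverse=True))

# 2. Custom order: copy the labels into a list, rearrange it, then reindex.
olist = list(table.index)
olist = olist[2:] + olist[:2]   # move the Sales rows to the top
custom = table.reindex(index=olist)

print(descending)
print(custom)
```

Copying the index into a list matters for the second procedure, because a pandas index object itself cannot be rearranged in place.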

In [15]:
pt
Out[15]:
教育領域                                    Human Resources     Life Sciences         Marketing           Medical             Other  Technical Degree
自然減                                          No      Yes       No      Yes       No      Yes       No      Yes       No      Yes       No      Yes
部署                   出張
Human Resources        Non-Travel                4        0        1        0        0        0        1        0        0        0        0        0
                       Travel_Frequently         3        3        1        1        0        0        2        0        1        0        0        0
                       Travel_Rarely            13        4       13        0        0        0        8        2        2        0        2        2
Research & Development Non-Travel                0        0       40        3        0        0       36        3        4        0        9        2
                       Travel_Frequently         0        0       70       21        0        0       57       11        9        1        9        4
                       Travel_Rarely             0        0      271       35        0        0      223       33       44        6       56       14
Sales                  Non-Travel                0        0       18        1       11        1        9        1        2        1        3        0
                       Travel_Frequently         0        0       20       10       19        8       14        2        1        2        2        6
                       Travel_Rarely             0        0       83       18       94       26       51       11        8        1       19        4
In [16]:
pt.index
Out[16]:
MultiIndex(levels=[['Human Resources', 'Research & Development', 'Sales'], ['Non-Travel', 'Travel_Frequently', 'Travel_Rarely']],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]],
           names=['部署', '出張'])
In [17]:
from natsort import natsorted
#pt.sort_index(ascending=True, inplace=True)
pt.reindex(index=reversed(natsorted(pt.index)))
Out[17]:
教育領域                                    Human Resources     Life Sciences         Marketing           Medical             Other  Technical Degree
自然減                                          No      Yes       No      Yes       No      Yes       No      Yes       No      Yes       No      Yes
部署                   出張
Sales                  Travel_Rarely             0        0       83       18       94       26       51       11        8        1       19        4
                       Travel_Frequently         0        0       20       10       19        8       14        2        1        2        2        6
                       Non-Travel                0        0       18        1       11        1        9        1        2        1        3        0
Research & Development Travel_Rarely             0        0      271       35        0        0      223       33       44        6       56       14
                       Travel_Frequently         0        0       70       21        0        0       57       11        9        1        9        4
                       Non-Travel                0        0       40        3        0        0       36        3        4        0        9        2
Human Resources        Travel_Rarely            13        4       13        0        0        0        8        2        2        0        2        2
                       Travel_Frequently         3        3        1        1        0        0        2        0        1        0        0        0
                       Non-Travel                4        0        1        0        0        0        1        0        0        0        0        0
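
The labels in this table contain no digits, so a natural sort and an ordinary string sort give the same order here. The difference shows up once labels contain numbers. The sketch below is a simplified stand-in written with the standard library, not the natsort implementation; `natural_key` is a made-up helper name.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs so numbers compare by value."""
    parts = re.split(r"(\d+)", text)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


labels = ["item10", "item2", "item1"]
plain_order = sorted(labels)                     # compares character by character
natural_order = sorted(labels, key=natural_key)  # compares digit runs as numbers
print(plain_order)
print(natural_order)
```
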
In [18]:
index3 = natsorted(pt.index, reverse=True)
index3
Out[18]:
[('Sales', 'Travel_Rarely'),
 ('Sales', 'Travel_Frequently'),
 ('Sales', 'Non-Travel'),
 ('Research & Development', 'Travel_Rarely'),
 ('Research & Development', 'Travel_Frequently'),
 ('Research & Development', 'Non-Travel'),
 ('Human Resources', 'Travel_Rarely'),
 ('Human Resources', 'Travel_Frequently'),
 ('Human Resources', 'Non-Travel')]
In [19]:
index3 = [('Sales', 'Travel_Rarely'),
 ('Sales', 'Travel_Frequently'),
 ('Sales', 'Non-Travel'),
 ('Human Resources', 'Travel_Rarely'),
 ('Human Resources', 'Travel_Frequently'),
 ('Human Resources', 'Non-Travel'),
 ('Research & Development', 'Travel_Rarely'),
 ('Research & Development', 'Travel_Frequently'),
 ('Research & Development', 'Non-Travel')]
In [20]:
ptx = pt.reindex(index=index3)
ptx
Out[20]:
教育領域                                    Human Resources     Life Sciences         Marketing           Medical             Other  Technical Degree
自然減                                          No      Yes       No      Yes       No      Yes       No      Yes       No      Yes       No      Yes
部署                   出張
Sales                  Travel_Rarely             0        0       83       18       94       26       51       11        8        1       19        4
                       Travel_Frequently         0        0       20       10       19        8       14        2        1        2        2        6
                       Non-Travel                0        0       18        1       11        1        9        1        2        1        3        0
Human Resources        Travel_Rarely            13        4       13        0        0        0        8        2        2        0        2        2
                       Travel_Frequently         3        3        1        1        0        0        2        0        1        0        0        0
                       Non-Travel                4        0        1        0        0        0        1        0        0        0        0        0
Research & Development Travel_Rarely             0        0      271       35        0        0      223       33       44        6       56       14
                       Travel_Frequently         0        0       70       21        0        0       57       11        9        1        9        4
                       Non-Travel                0        0       40        3        0        0       36        3        4        0        9        2
In [21]:
import seaborn as sns
sns.set(font='IPAexGothic')
#ptx = pt.reindex(index=natsorted(pt.index, reverse=True))
ptx = pt.reindex(index=index3)
ptx.plot(kind='barh', stacked=False)
plt.show()
In [22]:
import seaborn as sns
sns.set(font='IPAexGothic')
#ptx = pt.reindex(index=natsorted(pt.index, reverse=True))
ptx = pt.reindex(index=index3)
ptx.plot(kind='barh', stacked=True)
plt.show()
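
The only difference between the two plots above is the `stacked` flag. As a rough sketch of what it changes, the example below draws both variants for a small made-up table and inspects where the bars of the second series begin. It selects the off-screen `Agg` backend so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the example runs without a display
import pandas as pd

counts = pd.DataFrame(
    {"No": [18, 40], "Yes": [1, 3]},
    index=["Sales", "Research & Development"],
)

side_by_side = counts.plot(kind="barh", stacked=False)
stacked = counts.plot(kind="barh", stacked=True)

# In the stacked chart the second series starts where the first one ends.
second_series_starts = [bar.get_x() for bar in stacked.patches[2:]]
print(second_series_starts)
```

With `stacked=False` every bar starts at zero and the series are drawn next to each other instead.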
In [23]:
df_age = df.sort_values(by=["年齢"], ascending=True)
df_age
Out[23]:
年齢 自然減 出張 日当 部署 通勤距離 教育 教育領域 EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
1311 18 No Non-Travel 1431 Research & Development 14 3 Medical 1 1839 ... 3 80 0 0 4 1 0 0 0 0
457 18 Yes Travel_Frequently 1306 Sales 5 3 Marketing 1 614 ... 4 80 0 0 3 3 0 0 0 0
972 18 No Non-Travel 1124 Research & Development 1 3 Life Sciences 1 1368 ... 3 80 0 0 5 4 0 0 0 0
301 18 No Travel_Rarely 812 Sales 10 3 Medical 1 411 ... 1 80 0 0 2 3 0 0 0 0
296 18 Yes Travel_Rarely 230 Research & Development 3 3 Life Sciences 1 405 ... 3 80 0 0 2 3 0 0 0 0
1153 18 Yes Travel_Frequently 544 Sales 3 2 Medical 1 1624 ... 3 80 0 0 2 4 0 0 0 0
727 18 No Non-Travel 287 Research & Development 5 2 Life Sciences 1 1012 ... 4 80 0 0 2 3 0 0 0 0
828 18 Yes Non-Travel 247 Research & Development 8 1 Medical 1 1156 ... 4 80 0 0 0 3 0 0 0 0
909 19 No Travel_Rarely 265 Research & Development 25 3 Life Sciences 1 1269 ... 4 80 0 1 2 3 1 0 0 1
422 19 Yes Travel_Rarely 489 Human Resources 2 2 Technical Degree 1 566 ... 3 80 0 1 3 4 1 0 0 0
688 19 Yes Travel_Rarely 419 Sales 21 3 Other 1 959 ... 2 80 0 1 3 4 1 0 0 0
127 19 Yes Travel_Rarely 528 Sales 22 1 Marketing 1 167 ... 4 80 0 0 2 2 0 0 0 0
171 19 Yes Travel_Frequently 602 Sales 1 1 Technical Degree 1 235 ... 1 80 0 1 5 4 0 0 0 0
177 19 Yes Travel_Rarely 303 Research & Development 2 3 Life Sciences 1 243 ... 3 80 0 1 3 2 1 0 1 0
892 19 Yes Non-Travel 504 Research & Development 10 3 Medical 1 1248 ... 2 80 0 1 2 4 1 1 0 0
149 19 No Travel_Rarely 1181 Research & Development 3 1 Medical 1 201 ... 4 80 0 1 3 3 1 0 0 0
853 19 No Travel_Rarely 645 Research & Development 9 2 Life Sciences 1 1193 ... 3 80 0 1 4 3 1 1 0 0
689 20 Yes Travel_Rarely 129 Research & Development 4 3 Technical Degree 1 960 ... 2 80 0 1 2 3 1 0 0 0
876 20 No Travel_Rarely 654 Sales 21 3 Marketing 1 1226 ... 4 80 0 2 2 3 2 1 2 2
662 20 Yes Travel_Rarely 500 Sales 2 3 Medical 1 922 ... 4 80 0 2 3 2 2 2 0 2
776 20 Yes Travel_Frequently 769 Sales 9 3 Marketing 1 1077 ... 2 80 0 2 3 3 2 2 0 2
856 20 No Travel_Rarely 805 Research & Development 3 3 Life Sciences 1 1198 ... 1 80 0 2 2 2 2 2 1 2
487 20 No Travel_Rarely 959 Research & Development 1 3 Life Sciences 1 657 ... 4 80 0 1 0 4 1 0 0 0
513 20 Yes Travel_Rarely 1362 Research & Development 10 1 Medical 1 701 ... 4 80 0 1 5 3 1 0 1 1
1178 20 No Travel_Rarely 1141 Sales 2 3 Medical 1 1657 ... 1 80 0 2 3 3 2 2 2 2
1197 20 No Travel_Rarely 727 Sales 9 1 Life Sciences 1 1680 ... 1 80 0 2 3 3 2 2 0 2
102 20 Yes Travel_Frequently 871 Research & Development 6 3 Life Sciences 1 137 ... 2 80 0 1 5 3 1 0 1 0
731 20 Yes Travel_Rarely 1097 Research & Development 11 3 Medical 1 1016 ... 1 80 0 1 2 3 1 0 0 0
815 21 No Travel_Rarely 984 Research & Development 1 1 Technical Degree 1 1131 ... 3 80 0 2 6 4 2 2 2 2
370 21 Yes Travel_Rarely 156 Sales 12 3 Life Sciences 1 494 ... 4 80 0 1 0 3 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 57 No Travel_Rarely 334 Research & Development 24 2 Life Sciences 1 223 ... 2 80 1 12 2 1 5 3 1 4
1374 58 No Travel_Rarely 605 Sales 21 3 Life Sciences 1 1938 ... 3 80 1 29 2 2 1 0 0 0
1301 58 No Non-Travel 350 Sales 2 3 Medical 1 1824 ... 4 80 1 37 0 2 16 9 14 14
308 58 No Non-Travel 390 Research & Development 1 4 Life Sciences 1 422 ... 4 80 1 12 2 3 5 3 1 2
700 58 Yes Travel_Rarely 289 Research & Development 2 3 Technical Degree 1 977 ... 1 80 0 7 4 3 1 0 0 0
966 58 Yes Travel_Rarely 601 Research & Development 7 4 Medical 1 1360 ... 4 80 0 31 0 2 10 9 5 9
98 58 No Travel_Rarely 682 Sales 10 4 Medical 1 131 ... 3 80 0 38 1 2 37 10 1 8
1310 58 No Travel_Frequently 1216 Research & Development 15 4 Life Sciences 1 1837 ... 2 80 0 23 3 3 2 2 2 2
1009 58 No Travel_Rarely 1055 Research & Development 1 3 Medical 1 1423 ... 3 80 1 32 3 3 9 8 1 5
938 58 No Travel_Rarely 848 Research & Development 23 4 Life Sciences 1 1308 ... 4 80 2 2 3 3 2 2 2 2
157 58 No Travel_Rarely 1145 Research & Development 9 3 Medical 1 214 ... 2 80 1 9 3 2 1 0 0 0
674 58 No Travel_Rarely 1272 Research & Development 5 3 Technical Degree 1 940 ... 4 80 1 24 3 3 6 0 0 4
126 58 Yes Travel_Rarely 147 Research & Development 23 4 Medical 1 165 ... 4 80 1 40 3 2 40 10 15 6
595 58 Yes Travel_Rarely 286 Research & Development 2 4 Life Sciences 1 825 ... 4 80 0 40 2 3 31 15 13 8
660 58 Yes Travel_Frequently 781 Research & Development 2 1 Life Sciences 1 918 ... 4 80 1 3 3 2 1 0 0 0
743 59 No Travel_Rarely 715 Research & Development 2 3 Life Sciences 1 1032 ... 1 80 0 30 4 3 5 3 4 3
6 59 No Travel_Rarely 1324 Research & Development 3 3 Medical 1 10 ... 1 80 3 12 3 2 1 0 0 0
758 59 No Travel_Rarely 1089 Sales 1 2 Technical Degree 1 1048 ... 3 80 1 14 1 1 6 4 0 4
897 59 No Travel_Rarely 326 Sales 3 3 Life Sciences 1 1254 ... 4 80 0 13 2 3 6 1 0 5
919 59 No Travel_Rarely 1429 Research & Development 18 4 Medical 1 1283 ... 4 80 0 25 6 2 9 7 5 4
225 59 No Travel_Rarely 142 Research & Development 3 3 Life Sciences 1 309 ... 1 80 1 7 6 3 1 0 0 0
70 59 No Travel_Frequently 1225 Sales 1 1 Life Sciences 1 91 ... 4 80 0 20 2 2 4 3 1 3
105 59 No Non-Travel 1420 Human Resources 2 4 Human Resources 1 140 ... 4 80 1 30 3 3 3 2 2 2
63 59 No Travel_Rarely 1435 Sales 25 3 Life Sciences 1 81 ... 4 80 0 28 3 2 21 16 7 9
232 59 No Travel_Rarely 818 Human Resources 6 2 Medical 1 321 ... 4 80 0 7 2 2 2 2 2 2
536 60 No Travel_Rarely 1179 Sales 16 4 Marketing 1 732 ... 4 80 0 10 1 3 2 2 2 2
427 60 No Travel_Frequently 1499 Sales 28 3 Marketing 1 573 ... 4 80 0 22 5 4 18 13 13 11
411 60 No Travel_Rarely 422 Research & Development 7 3 Life Sciences 1 549 ... 4 80 0 33 5 1 29 8 11 10
879 60 No Travel_Rarely 696 Sales 7 4 Marketing 1 1233 ... 2 80 1 12 3 3 11 7 1 9
1209 60 No Travel_Rarely 370 Research & Development 1 4 Medical 1 1697 ... 3 80 1 19 2 4 1 0 0 0

1470 rows × 35 columns
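
In the output above, rows that share the same age do not appear in their original index order. The default sorting algorithm of `sort_values` (quicksort) does not promise to keep tied rows in order. If that order matters, a stable algorithm can be requested, as in this sketch with made-up data:

```python
import pandas as pd

sample = pd.DataFrame({"年齢": [19, 18, 18, 19, 18]}, index=[10, 11, 12, 13, 14])

# mergesort is stable: rows with equal keys stay in their original order.
stable = sample.sort_values(by=["年齢"], ascending=True, kind="mergesort")
stable_order = stable.index.tolist()
print(stable_order)
```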

In [24]:
dataSummarize
Out[24]:
平均通勤距離
部署
Human Resources 8.698413
Research & Development 9.144641
Sales 9.365471
In [25]:
dataSummarize.plot(kind='barh', legend=False)
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x146bc977710>
In [26]:
dataSummarize_r = dataSummarize.sort_values(by=["平均通勤距離"], ascending=False)
dataSummarize_r
Out[26]:
平均通勤距離
部署
Sales 9.365471
Research & Development 9.144641
Human Resources 8.698413
In [27]:
dataSummarize_r.plot(kind='barh', legend=False)
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x146bc907e80>
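
A horizontal bar chart draws the first row at the bottom of the axis, so a frame sorted in descending order ends up with its largest value at the bottom. To put the largest value at the top, sort in ascending order before plotting. A minimal sketch using the three averages shown above, entered in shuffled order:

```python
import pandas as pd

summary = pd.DataFrame(
    {"平均通勤距離": [9.365471, 8.698413, 9.144641]},
    index=["Sales", "Human Resources", "Research & Development"],
)

# barh places the first row at the bottom of the axis, so sorting in
# ascending order puts the largest value at the top of the chart.
for_plot = summary.sort_values(by=["平均通勤距離"], ascending=True)
order_bottom_to_top = for_plot.index.tolist()
print(order_bottom_to_top)
```

Calling `for_plot.plot(kind='barh', legend=False)` would then show Sales as the top bar.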