Lesson 01 Let's Try Pandas

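Overview of pandas DataFrame and how to create one | hydrocul's notes
pandas has two data structures, Series and DataFrame. A Series resembles a one-dimensional array, whereas a DataFrame resembles a two-dimensional array, or rather a spreadsheet such as Excel.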
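
The difference between the two structures can be sketched with a tiny example. The values below are hypothetical and only borrow column names from the dataset used later in this lesson.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([41, 49, 37], name="Age")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
people = pd.DataFrame({
    "Age": [41, 49, 37],
    "Department": ["Sales", "Research & Development", "Research & Development"],
})

print(ages.ndim)                      # number of dimensions of the Series
print(people.ndim)                    # number of dimensions of the DataFrame
print(type(people["Age"]).__name__)   # type of a single column
```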

In [1]:
import pandas as pd

1. Reading a CSV file into a DataFrame

In [2]:
csv_file_name = 'data/WA_Fn-UseC_-HR-Employee-Attrition.csv'

df = pd.read_csv(csv_file_name)
df.head()
Out[2]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns
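
When only part of a file is needed, `pd.read_csv` can limit what it parses. The sketch below uses a hypothetical inline CSV (via `io.StringIO`) in place of a file on disk, so it runs without the dataset.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for a file on disk.
csv_text = """Age,Attrition,Department,DailyRate
41,Yes,Sales,1102
49,No,Research & Development,279
37,Yes,Research & Development,1373
"""

# usecols limits which columns are parsed; nrows limits how many rows are read.
sample = pd.read_csv(io.StringIO(csv_text), usecols=["Age", "Attrition"], nrows=2)
print(sample)
```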

2. Reading an xlsx file into a DataFrame

  • Each of the following assigns the first sheet to df.

2.1 pd.ExcelFile(...)

In [3]:
xlsx_file_name = 'data/WA_Fn-UseC_-HR-Employee-Attrition.xlsx'

xl = pd.ExcelFile(xlsx_file_name)
xl.sheet_names
Out[3]:
['WA_Fn-UseC_-HR-Employee-Attriti', 'Data Definitions']
In [4]:
df = xl.parse('WA_Fn-UseC_-HR-Employee-Attriti')
df.head()
Out[4]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns

In [5]:
df = xl.parse(xl.sheet_names[0])
df.head()
Out[5]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns

2.2 pd.read_excel(...)

In [6]:
df = pd.read_excel(xlsx_file_name, sheet_name='WA_Fn-UseC_-HR-Employee-Attriti')
df.head()
Out[6]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns

In [7]:
df = pd.read_excel(xlsx_file_name)
df.head()
Out[7]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns

3. Checking the number of rows

In [8]:
len(df)
Out[8]:
1470

4. Checking the shape (rows and columns)

In [9]:
df.shape  # returns a tuple of (number of rows, number of columns)
Out[9]:
(1470, 35)
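
How `len(df)` and `df.shape` relate can be checked on a small frame. The frame `toy` below is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = toy.shape   # shape unpacks into (rows, columns)

print(len(toy) == n_rows)    # len() counts rows only
print(toy.ndim)              # number of axes of a DataFrame
print(toy.size)              # total number of cells (rows * columns)
```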

5. Overview of the DataFrame

In [10]:
df.info()  # lists each column name with its non-null count and dtype
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 35 columns):
Age                         1470 non-null int64
Attrition                   1470 non-null object
BusinessTravel              1470 non-null object
DailyRate                   1470 non-null int64
Department                  1470 non-null object
DistanceFromHome            1470 non-null int64
Education                   1470 non-null int64
EducationField              1470 non-null object
EmployeeCount               1470 non-null int64
EmployeeNumber              1470 non-null int64
EnvironmentSatisfaction     1470 non-null int64
Gender                      1470 non-null object
HourlyRate                  1470 non-null int64
JobInvolvement              1470 non-null int64
JobLevel                    1470 non-null int64
JobRole                     1470 non-null object
JobSatisfaction             1470 non-null int64
MaritalStatus               1470 non-null object
MonthlyIncome               1470 non-null int64
MonthlyRate                 1470 non-null int64
NumCompaniesWorked          1470 non-null int64
Over18                      1470 non-null object
OverTime                    1470 non-null object
PercentSalaryHike           1470 non-null int64
PerformanceRating           1470 non-null int64
RelationshipSatisfaction    1470 non-null int64
StandardHours               1470 non-null int64
StockOptionLevel            1470 non-null int64
TotalWorkingYears           1470 non-null int64
TrainingTimesLastYear       1470 non-null int64
WorkLifeBalance             1470 non-null int64
YearsAtCompany              1470 non-null int64
YearsInCurrentRole          1470 non-null int64
YearsSinceLastPromotion     1470 non-null int64
YearsWithCurrManager        1470 non-null int64
dtypes: int64(26), object(9)
memory usage: 402.0+ KB
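
`df.info()` only prints its summary. If the same facts are needed as values, `dtypes` and `notna().sum()` return them as Series. A minimal sketch on a hypothetical frame with one missing value:

```python
import pandas as pd

# Hypothetical frame; the None in "Age" becomes NaN, so the column is float.
toy = pd.DataFrame({
    "Age": [41, 49, None],
    "Attrition": ["Yes", "No", "Yes"],
})

dtypes = toy.dtypes            # dtype of each column
non_null = toy.notna().sum()   # non-null count of each column

print(dtypes)
print(non_null)
```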

6. Histogram

df.hist(...)

In [11]:
import matplotlib.pyplot as plt
import seaborn as sns

# Use a font with Japanese glyphs (IPAexGothic must be installed on the system)
sns.set(font='IPAexGothic')
df["DailyRate"].hist(linewidth = 1, alpha=.5)
plt.xlabel("DailyRate")
plt.ylabel("Freq")
plt.show()
In [12]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font='IPAexGothic')
# With a horizontal histogram, frequency is on the x-axis and the values are on the y-axis
df["DailyRate"].hist(orientation='horizontal', alpha=.5)
plt.xlabel("Freq")
plt.ylabel("DailyRate")
plt.show()
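
The numbers behind a histogram can be inspected without drawing it. The sketch below uses `np.histogram` with 10 bins, which is the default bin count of `Series.hist`. The values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical DailyRate values.
daily_rate = pd.Series([102, 279, 591, 802, 1102, 1373, 1392, 1499], name="DailyRate")

# counts[i] is the number of values between bin_edges[i] and bin_edges[i + 1].
counts, bin_edges = np.histogram(daily_rate, bins=10)

print(counts)
print(bin_edges)
```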

7. Scatter plot

plt.scatter(...)

In [13]:
plt.scatter(df['HourlyRate'], df['DailyRate'])
plt.show()

8. Scatter plot matrix (multiple variables)

pd.plotting.scatter_matrix(df, ...)

In [14]:
pd.plotting.scatter_matrix(df[['HourlyRate', 'DailyRate', 'DistanceFromHome']], alpha=0.2, figsize=(6, 6), diagonal='kde')
plt.show()

9. Covariance matrix

In [15]:
df[['HourlyRate', 'DailyRate', 'DistanceFromHome']].cov()
Out[15]:
                  HourlyRate      DailyRate  DistanceFromHome
HourlyRate        413.285626     191.800350          5.130567
DailyRate         191.800350  162819.593737        -16.308004
DistanceFromHome    5.130567     -16.308004         65.721251
In [16]:
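# Restrict to numeric columns; recent pandas versions raise an error on string columns
df.select_dtypes(include='number').cov()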
Out[16]:
Age DailyRate DistanceFromHome Education EmployeeCount EmployeeNumber EnvironmentSatisfaction HourlyRate JobInvolvement JobLevel ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
Age 83.455049 39.298434 -0.124873 1.946390 0.0 -55.797199 0.101319 4.510422 0.193841 5.153276 ... 0.528776 0.0 0.291977 48.361684 -0.231093 -0.138695 17.423359 7.046750 6.373743 6.587332
DailyRate 39.298434 162819.593737 -16.308004 -6.945424 0.0 -12386.713294 8.095750 191.800350 13.246309 1.324944 ... 3.423048 0.0 14.489565 45.570709 1.275892 -10.789322 -84.187085 14.520296 -43.206982 -37.957055
DistanceFromHome -0.124873 -16.308004 65.721251 0.174705 0.0 160.649502 -0.142451 5.130567 0.050667 0.047586 ... 0.057478 0.0 0.309961 0.291951 -0.386118 -0.152094 0.472219 0.553521 0.261991 0.416715
Education 1.946390 -6.945424 0.174705 1.048914 0.0 25.939251 -0.030370 0.349263 0.030927 0.115170 ... -0.010097 0.0 0.016076 1.181612 -0.033143 0.007105 0.433659 0.223515 0.179056 0.252390
EmployeeCount 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
EmployeeNumber -55.797199 -12386.713294 160.649502 25.939251 0.0 362433.299749 11.595582 430.551701 -2.950629 -12.341279 ... -45.473775 0.0 31.920482 -67.289749 18.320126 4.384426 -41.458396 -18.357800 -17.496817 -19.755358
EnvironmentSatisfaction 0.101319 8.095750 -0.142451 -0.030370 0.0 11.595582 1.194829 -1.107908 -0.006438 0.001466 ... 0.009059 0.0 0.003197 -0.022905 -0.027283 0.021335 0.009761 0.071317 0.057040 -0.019496
HourlyRate 4.510422 191.800350 5.130567 0.349263 0.0 430.551701 -1.107908 413.285626 0.620006 -0.626800 ... 0.029244 0.0 0.870674 -0.369139 -0.224036 -0.066170 -2.438866 -1.775575 -1.750142 -1.459700
JobInvolvement 0.193841 13.246309 0.050667 0.030927 0.0 -2.950629 -0.006438 0.620006 0.506319 -0.009948 ... 0.026386 0.0 0.013049 -0.030634 -0.014071 -0.007348 -0.093097 0.022473 -0.055454 0.065951
JobLevel 5.153276 1.324944 0.047586 0.115170 0.0 -12.341279 0.001466 -0.626800 -0.009948 1.225316 ... 0.025901 0.0 0.013190 6.737044 -0.025961 0.029574 3.626435 1.561913 1.262322 1.482250
JobSatisfaction -0.049285 13.604357 -0.032802 -0.012759 0.0 -30.705067 -0.008179 -1.599339 -0.016853 -0.002373 ... -0.014850 0.0 0.010046 -0.173208 -0.008217 -0.015161 -0.025693 -0.009209 -0.064728 -0.108830
MonthlyIncome 21412.198982 14641.125975 -649.386355 457.874204 0.0 -42028.530023 -32.210416 -1511.673923 -51.159481 4952.416922 ... 131.703156 0.0 21.693112 28312.303770 -131.935513 102.053699 14833.730990 6205.846259 5233.677307 5780.054075
MonthlyRate 1823.988823 -92428.502266 1585.264627 -190.148240 0.0 54198.679015 292.537298 -2213.447553 -82.667086 311.714963 ... -31.439933 0.0 -208.164513 1464.435332 13.461200 40.043086 -1031.535222 -330.479133 35.937006 -933.244190
NumCompaniesWorked 6.837739 38.457493 -0.592359 0.323165 0.0 -1.881380 0.034389 1.125195 0.026684 0.394036 ... 0.142425 0.0 0.064016 4.618854 -0.212734 -0.014764 -1.812334 -0.821380 -0.296339 -0.983301
PercentSalaryHike 0.121489 33.529204 1.193809 -0.041648 0.0 -28.520432 -0.126824 -0.674252 -0.044805 -0.140705 ... -0.160226 0.0 0.023476 -0.586872 -0.024636 -0.008480 -0.807021 -0.020156 -0.261286 -0.156517
PerformanceRating 0.006276 0.068910 0.079300 -0.009068 0.0 -4.422436 -0.011654 -0.015930 -0.007464 -0.008476 ... -0.012231 0.0 0.001078 0.018933 -0.007247 0.000656 0.007594 0.045738 0.020808 0.029389
RelationshipSatisfaction 0.528776 3.423048 0.057478 -0.010097 0.0 -45.473775 0.009059 0.029244 0.026386 0.025901 ... 1.169013 0.0 -0.042335 0.202360 0.003480 0.014975 0.128287 -0.059242 0.116692 -0.003347
StandardHours 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
StockOptionLevel 0.291977 14.489565 0.309961 0.016076 0.0 31.920482 0.003197 0.870674 0.013049 0.013190 ... -0.042335 0.0 0.726035 0.067200 0.012385 0.002485 0.078607 0.156884 0.039408 0.075091
TotalWorkingYears 48.361684 45.570709 0.291951 1.181612 0.0 -67.289749 -0.022905 -0.369139 -0.030634 6.737044 ... 0.202360 0.0 0.067200 60.540563 -0.357740 0.005539 29.942577 12.978065 10.151009 12.748396
TrainingTimesLastYear -0.231093 1.275892 -0.386118 -0.033143 0.0 18.320126 -0.027283 -0.224036 -0.014071 -0.025961 ... 0.003480 0.0 0.012385 -0.357740 1.662219 0.025569 0.028188 -0.026801 -0.008586 -0.018841
WorkLifeBalance -0.138695 -10.789322 -0.152094 0.007105 0.0 4.384426 0.021335 -0.066170 -0.007348 0.029574 ... 0.014975 0.0 0.002485 0.005539 0.025569 0.499108 0.052325 0.127616 0.020355 0.006956
YearsAtCompany 17.423359 -84.187085 0.472219 0.433659 0.0 -41.458396 0.009761 -2.438866 -0.093097 3.626435 ... 0.128287 0.0 0.078607 29.942577 0.028188 0.052325 37.534310 16.842239 12.208813 16.815196
YearsInCurrentRole 7.046750 14.520296 0.553521 0.223515 0.0 -18.357800 0.071317 -1.775575 0.022473 1.561913 ... -0.059242 0.0 0.156884 12.978065 -0.026801 0.127616 16.842239 13.127122 6.398725 9.235198
YearsSinceLastPromotion 6.373743 -43.206982 0.261991 0.179056 0.0 -17.496817 0.057040 -1.750142 -0.055454 1.262322 ... 0.116692 0.0 0.039408 10.151009 -0.008586 0.020355 12.208813 6.398725 10.384057 5.866587
YearsWithCurrManager 6.587332 -37.957055 0.416715 0.252390 0.0 -19.755358 -0.019496 -1.459700 0.065951 1.482250 ... -0.003347 0.0 0.075091 12.748396 -0.018841 0.006956 16.815196 9.235198 5.866587 12.731595

26 rows × 26 columns

In [17]:
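# Restrict to numeric columns; recent pandas versions raise an error on string columns
df.select_dtypes(include='number').corr()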
Out[17]:
Age DailyRate DistanceFromHome Education EmployeeCount EmployeeNumber EnvironmentSatisfaction HourlyRate JobInvolvement JobLevel ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
Age 1.000000 0.010661 -0.001686 0.208034 NaN -0.010145 0.010146 0.024287 0.029820 0.509604 ... 0.053535 NaN 0.037510 0.680381 -0.019621 -0.021490 0.311309 0.212901 0.216513 0.202089
DailyRate 0.010661 1.000000 -0.004985 -0.016806 NaN -0.050990 0.018355 0.023381 0.046135 0.002966 ... 0.007846 NaN 0.042143 0.014515 0.002453 -0.037848 -0.034055 0.009932 -0.033229 -0.026363
DistanceFromHome -0.001686 -0.004985 1.000000 0.021042 NaN 0.032916 -0.016075 0.031131 0.008783 0.005303 ... 0.006557 NaN 0.044872 0.004628 -0.036942 -0.026556 0.009508 0.018845 0.010029 0.014406
Education 0.208034 -0.016806 0.021042 1.000000 NaN 0.042070 -0.027128 0.016775 0.042438 0.101589 ... -0.009118 NaN 0.018422 0.148280 -0.025100 0.009819 0.069114 0.060236 0.054254 0.069065
EmployeeCount NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
EmployeeNumber -0.010145 -0.050990 0.032916 0.042070 NaN 1.000000 0.017621 0.035179 -0.006888 -0.018519 ... -0.069861 NaN 0.062227 -0.014365 0.023603 0.010309 -0.011240 -0.008416 -0.009019 -0.009197
EnvironmentSatisfaction 0.010146 0.018355 -0.016075 -0.027128 NaN 0.017621 1.000000 -0.049857 -0.008278 0.001212 ... 0.007665 NaN 0.003432 -0.002693 -0.019359 0.027627 0.001458 0.018007 0.016194 -0.004999
HourlyRate 0.024287 0.023381 0.031131 0.016775 NaN 0.035179 -0.049857 1.000000 0.042861 -0.027853 ... 0.001330 NaN 0.050263 -0.002334 -0.008548 -0.004607 -0.019582 -0.024106 -0.026716 -0.020123
JobInvolvement 0.029820 0.046135 0.008783 0.042438 NaN -0.006888 -0.008278 0.042861 1.000000 -0.012630 ... 0.034297 NaN 0.021523 -0.005533 -0.015338 -0.014617 -0.021355 0.008717 -0.024184 0.025976
JobLevel 0.509604 0.002966 0.005303 0.101589 NaN -0.018519 0.001212 -0.027853 -0.012630 1.000000 ... 0.021642 NaN 0.013984 0.782208 -0.018191 0.037818 0.534739 0.389447 0.353885 0.375281
JobSatisfaction -0.004892 0.030571 -0.003669 -0.011296 NaN -0.046247 -0.006784 -0.071335 -0.021476 -0.001944 ... -0.012454 NaN 0.010690 -0.020185 -0.005779 -0.019459 -0.003803 -0.002305 -0.018214 -0.027656
MonthlyIncome 0.497855 0.007707 -0.017014 0.094961 NaN -0.014829 -0.006259 -0.015794 -0.015271 0.950300 ... 0.025873 NaN 0.005408 0.772893 -0.021736 0.030683 0.514285 0.363818 0.344978 0.344079
MonthlyRate 0.028051 -0.032182 0.027473 -0.026084 NaN 0.012648 0.037600 -0.015297 -0.016322 0.039563 ... -0.004085 NaN -0.034323 0.026442 0.001467 0.007963 -0.023655 -0.012815 0.001567 -0.036746
NumCompaniesWorked 0.299635 0.038153 -0.029251 0.126317 NaN -0.001251 0.012594 0.022157 0.015012 0.142501 ... 0.052733 NaN 0.030075 0.237639 -0.066054 -0.008366 -0.118421 -0.090754 -0.036814 -0.110319
PercentSalaryHike 0.003634 0.022704 0.040235 -0.011111 NaN -0.012944 -0.031701 -0.009062 -0.017205 -0.034730 ... -0.040490 NaN 0.007528 -0.020608 -0.005221 -0.003280 -0.035991 -0.001520 -0.022154 -0.011985
PerformanceRating 0.001904 0.000473 0.027110 -0.024539 NaN -0.020359 -0.029548 -0.002172 -0.029071 -0.021222 ... -0.031351 NaN 0.003506 0.006744 -0.015579 0.002572 0.003435 0.034986 0.017896 0.022827
RelationshipSatisfaction 0.053535 0.007846 0.006557 -0.009118 NaN -0.069861 0.007665 0.001330 0.034297 0.021642 ... 1.000000 NaN -0.045952 0.024054 0.002497 0.019604 0.019367 -0.015123 0.033493 -0.000867
StandardHours NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
StockOptionLevel 0.037510 0.042143 0.044872 0.018422 NaN 0.062227 0.003432 0.050263 0.021523 0.013984 ... -0.045952 NaN 1.000000 0.010136 0.011274 0.004129 0.015058 0.050818 0.014352 0.024698
TotalWorkingYears 0.680381 0.014515 0.004628 0.148280 NaN -0.014365 -0.002693 -0.002334 -0.005533 0.782208 ... 0.024054 NaN 0.010136 1.000000 -0.035662 0.001008 0.628133 0.460365 0.404858 0.459188
TrainingTimesLastYear -0.019621 0.002453 -0.036942 -0.025100 NaN 0.023603 -0.019359 -0.008548 -0.015338 -0.018191 ... 0.002497 NaN 0.011274 -0.035662 1.000000 0.028072 0.003569 -0.005738 -0.002067 -0.004096
WorkLifeBalance -0.021490 -0.037848 -0.026556 0.009819 NaN 0.010309 0.027627 -0.004607 -0.014617 0.037818 ... 0.019604 NaN 0.004129 0.001008 0.028072 1.000000 0.012089 0.049856 0.008941 0.002759
YearsAtCompany 0.311309 -0.034055 0.009508 0.069114 NaN -0.011240 0.001458 -0.019582 -0.021355 0.534739 ... 0.019367 NaN 0.015058 0.628133 0.003569 0.012089 1.000000 0.758754 0.618409 0.769212
YearsInCurrentRole 0.212901 0.009932 0.018845 0.060236 NaN -0.008416 0.018007 -0.024106 0.008717 0.389447 ... -0.015123 NaN 0.050818 0.460365 -0.005738 0.049856 0.758754 1.000000 0.548056 0.714365
YearsSinceLastPromotion 0.216513 -0.033229 0.010029 0.054254 NaN -0.009019 0.016194 -0.026716 -0.024184 0.353885 ... 0.033493 NaN 0.014352 0.404858 -0.002067 0.008941 0.618409 0.548056 1.000000 0.510224
YearsWithCurrManager 0.202089 -0.026363 0.014406 0.069065 NaN -0.009197 -0.004999 -0.020123 0.025976 0.375281 ... -0.000867 NaN 0.024698 0.459188 -0.004096 0.002759 0.769212 0.714365 0.510224 1.000000

26 rows × 26 columns
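
The NaN rows for EmployeeCount and StandardHours follow from the definition of correlation: covariance divided by the product of the two standard deviations. A column that never varies has a standard deviation of 0, so the ratio is undefined. A minimal sketch with hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "const" never varies, like EmployeeCount above.
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 5.0, 9.0],
    "const": [1.0, 1.0, 1.0, 1.0],
})

cov = toy.cov()   # sample covariance (ddof=1)
std = toy.std()   # sample standard deviation (ddof=1)

# Correlation is covariance divided by the product of the standard deviations.
manual_corr = cov / np.outer(std, std)
corr = toy.corr()

print(corr)
```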

10. Row information

df.index

In [18]:
df.index
Out[18]:
RangeIndex(start=0, stop=1470, step=1)

11. Column information

df.columns

In [19]:
df.columns
Out[19]:
Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department',
       'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount',
       'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate',
       'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction',
       'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
       'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating',
       'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel',
       'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance',
       'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
       'YearsWithCurrManager'],
      dtype='object')
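
`df.index` and `df.columns` are Index objects, which can be converted to plain lists or used for membership checks. A small sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex.
toy = pd.DataFrame({"Age": [41, 49], "Department": ["Sales", "Research & Development"]})

column_names = toy.columns.tolist()   # column labels as a plain list
row_labels = toy.index.tolist()       # row labels as a plain list
has_age = "Age" in toy.columns        # membership test on the column labels

print(column_names, row_labels, has_age)
```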

12. Converting a DataFrame to an array

df.values

In [20]:
df.values
Out[20]:
array([[41, 'Yes', 'Travel_Rarely', ..., 4, 0, 5],
       [49, 'No', 'Travel_Frequently', ..., 7, 1, 7],
       [37, 'Yes', 'Travel_Rarely', ..., 0, 0, 0],
       ..., 
       [27, 'No', 'Travel_Rarely', ..., 2, 0, 3],
       [49, 'No', 'Travel_Frequently', ..., 6, 0, 8],
       [34, 'No', 'Travel_Rarely', ..., 3, 1, 2]], dtype=object)
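
The `dtype=object` in the output above appears because the frame mixes strings and integers, so the array falls back to a common type. Selecting only numeric columns first keeps a numeric dtype. The sketch uses `to_numpy()`, which returns the same kind of array as `df.values`. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing integer and string columns.
toy = pd.DataFrame({
    "Age": [41, 49],
    "Attrition": ["Yes", "No"],
    "DailyRate": [1102, 279],
})

mixed = toy.to_numpy()                          # mixed columns: object array
numeric = toy[["Age", "DailyRate"]].to_numpy()  # numeric columns only

print(mixed.dtype)
print(numeric.dtype)
```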

13. Displaying summary statistics at once

df.describe()

In [21]:
df.describe()
Out[21]:
Age DailyRate DistanceFromHome Education EmployeeCount EmployeeNumber EnvironmentSatisfaction HourlyRate JobInvolvement JobLevel ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
count 1470.000000 1470.000000 1470.000000 1470.000000 1470.0 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 ... 1470.000000 1470.0 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000
mean 36.923810 802.485714 9.192517 2.912925 1.0 1024.865306 2.721769 65.891156 2.729932 2.063946 ... 2.712245 80.0 0.793878 11.279592 2.799320 2.761224 7.008163 4.229252 2.187755 4.123129
std 9.135373 403.509100 8.106864 1.024165 0.0 602.024335 1.093082 20.329428 0.711561 1.106940 ... 1.081209 0.0 0.852077 7.780782 1.289271 0.706476 6.126525 3.623137 3.222430 3.568136
min 18.000000 102.000000 1.000000 1.000000 1.0 1.000000 1.000000 30.000000 1.000000 1.000000 ... 1.000000 80.0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000
25% 30.000000 465.000000 2.000000 2.000000 1.0 491.250000 2.000000 48.000000 2.000000 1.000000 ... 2.000000 80.0 0.000000 6.000000 2.000000 2.000000 3.000000 2.000000 0.000000 2.000000
50% 36.000000 802.000000 7.000000 3.000000 1.0 1020.500000 3.000000 66.000000 3.000000 2.000000 ... 3.000000 80.0 1.000000 10.000000 3.000000 3.000000 5.000000 3.000000 1.000000 3.000000
75% 43.000000 1157.000000 14.000000 4.000000 1.0 1555.750000 4.000000 83.750000 3.000000 3.000000 ... 4.000000 80.0 1.000000 15.000000 3.000000 3.000000 9.000000 7.000000 3.000000 7.000000
max 60.000000 1499.000000 29.000000 5.000000 1.0 2068.000000 4.000000 100.000000 4.000000 5.000000 ... 4.000000 80.0 3.000000 40.000000 6.000000 4.000000 40.000000 18.000000 15.000000 17.000000

8 rows × 26 columns
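
By default `describe()` covers only the numeric columns, which is why the output above has 26 of the 35 columns. Non-numeric columns can be summarized by excluding the numeric ones. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical frame with one numeric and one string column.
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "Department": ["Sales", "Research & Development", "Sales"],
})

num_summary = toy.describe()                   # numeric columns only
cat_summary = toy.describe(exclude="number")   # count, unique, top, freq

print(num_summary)
print(cat_summary)
```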

14. Checking the first 10 rows

df.head(10)

In [22]:
df.head(10) 
Out[22]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2
5 32 No Travel_Frequently 1005 Research & Development 2 2 Life Sciences 1 8 ... 3 80 0 8 2 2 7 7 3 6
6 59 No Travel_Rarely 1324 Research & Development 3 3 Medical 1 10 ... 1 80 3 12 3 2 1 0 0 0
7 30 No Travel_Rarely 1358 Research & Development 24 1 Life Sciences 1 11 ... 2 80 1 1 2 3 1 0 0 0
8 38 No Travel_Frequently 216 Research & Development 23 3 Life Sciences 1 12 ... 2 80 0 10 2 3 9 7 1 8
9 36 No Travel_Rarely 1299 Research & Development 27 3 Medical 1 13 ... 2 80 2 17 3 2 7 7 7 7

10 rows × 35 columns

15. Checking the last 10 rows

df.tail(10)

In [23]:
df.tail(10)
Out[23]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
1460 29 No Travel_Rarely 468 Research & Development 28 4 Medical 1 2054 ... 2 80 0 5 3 1 5 4 0 4
1461 50 Yes Travel_Rarely 410 Sales 28 3 Marketing 1 2055 ... 2 80 1 20 3 3 3 2 2 0
1462 39 No Travel_Rarely 722 Sales 24 1 Marketing 1 2056 ... 1 80 1 21 2 2 20 9 9 6
1463 31 No Non-Travel 325 Research & Development 5 3 Medical 1 2057 ... 2 80 0 10 2 3 9 4 1 7
1464 26 No Travel_Rarely 1167 Sales 5 3 Other 1 2060 ... 4 80 0 5 2 3 4 2 0 0
1465 36 No Travel_Frequently 884 Research & Development 23 2 Medical 1 2061 ... 3 80 1 17 3 3 5 2 0 3
1466 39 No Travel_Rarely 613 Research & Development 6 1 Medical 1 2062 ... 1 80 1 9 5 3 7 7 1 7
1467 27 No Travel_Rarely 155 Research & Development 4 3 Life Sciences 1 2064 ... 2 80 1 6 0 3 6 2 0 3
1468 49 No Travel_Frequently 1023 Sales 2 3 Medical 1 2065 ... 4 80 0 17 3 2 9 6 0 8
1469 34 No Travel_Rarely 628 Research & Development 8 3 Medical 1 2068 ... 1 80 0 6 3 4 4 3 1 2

10 rows × 35 columns
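
`head(n)` and `tail(n)` are shorthand for positional slices. The sketch below checks this on a hypothetical 25-row frame.

```python
import pandas as pd

# Hypothetical frame with 25 rows.
toy = pd.DataFrame({"n": range(25)})

first_ten = toy.head(10)   # same rows as toy.iloc[:10]
last_ten = toy.tail(10)    # same rows as toy.iloc[-10:]
default_head = toy.head()  # n defaults to 5

print(first_ten.equals(toy.iloc[:10]))
print(last_ten.equals(toy.iloc[-10:]))
```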

16. Selecting a column as a Series

df["column_name"]

In [24]:
df["Department"]
Out[24]:
0                        Sales
1       Research & Development
2       Research & Development
3       Research & Development
4       Research & Development
5       Research & Development
6       Research & Development
7       Research & Development
8       Research & Development
9       Research & Development
10      Research & Development
11      Research & Development
12      Research & Development
13      Research & Development
14      Research & Development
15      Research & Development
16      Research & Development
17      Research & Development
18                       Sales
19      Research & Development
20      Research & Development
21                       Sales
22      Research & Development
23      Research & Development
24      Research & Development
25      Research & Development
26      Research & Development
27                       Sales
28      Research & Development
29                       Sales
                 ...          
1440    Research & Development
1441    Research & Development
1442    Research & Development
1443    Research & Development
1444    Research & Development
1445    Research & Development
1446                     Sales
1447                     Sales
1448                     Sales
1449    Research & Development
1450           Human Resources
1451                     Sales
1452                     Sales
1453                     Sales
1454                     Sales
1455    Research & Development
1456    Research & Development
1457    Research & Development
1458    Research & Development
1459    Research & Development
1460    Research & Development
1461                     Sales
1462                     Sales
1463    Research & Development
1464                     Sales
1465    Research & Development
1466    Research & Development
1467    Research & Development
1468                     Sales
1469    Research & Development
Name: Department, Length: 1470, dtype: object

17. Selecting multiple columns as a DataFrame

df[["column_name_1", "column_name_2", ...]]

In [25]:
df[["Department","Education"]]
Out[25]:
Department Education
0 Sales 2
1 Research & Development 1
2 Research & Development 2
3 Research & Development 4
4 Research & Development 1
5 Research & Development 2
6 Research & Development 3
7 Research & Development 1
8 Research & Development 3
9 Research & Development 3
10 Research & Development 3
11 Research & Development 2
12 Research & Development 1
13 Research & Development 2
14 Research & Development 3
15 Research & Development 4
16 Research & Development 2
17 Research & Development 2
18 Sales 4
19 Research & Development 3
20 Research & Development 2
21 Sales 4
22 Research & Development 4
23 Research & Development 2
24 Research & Development 1
25 Research & Development 3
26 Research & Development 1
27 Sales 4
28 Research & Development 4
29 Sales 4
... ... ...
1440 Research & Development 2
1441 Research & Development 4
1442 Research & Development 4
1443 Research & Development 3
1444 Research & Development 2
1445 Research & Development 4
1446 Sales 3
1447 Sales 4
1448 Sales 3
1449 Research & Development 3
1450 Human Resources 4
1451 Sales 2
1452 Sales 4
1453 Sales 4
1454 Sales 3
1455 Research & Development 4
1456 Research & Development 4
1457 Research & Development 4
1458 Research & Development 4
1459 Research & Development 2
1460 Research & Development 4
1461 Sales 3
1462 Sales 1
1463 Research & Development 3
1464 Sales 3
1465 Research & Development 2
1466 Research & Development 1
1467 Research & Development 3
1468 Sales 3
1469 Research & Development 3

1470 rows × 2 columns

In [26]:
# Use list notation to select multiple columns
df.loc[:, ["Department", "Education"]]
Out[26]:
Department Education
0 Sales 2
1 Research & Development 1
2 Research & Development 2
3 Research & Development 4
4 Research & Development 1
5 Research & Development 2
6 Research & Development 3
7 Research & Development 1
8 Research & Development 3
9 Research & Development 3
10 Research & Development 3
11 Research & Development 2
12 Research & Development 1
13 Research & Development 2
14 Research & Development 3
15 Research & Development 4
16 Research & Development 2
17 Research & Development 2
18 Sales 4
19 Research & Development 3
20 Research & Development 2
21 Sales 4
22 Research & Development 4
23 Research & Development 2
24 Research & Development 1
25 Research & Development 3
26 Research & Development 1
27 Sales 4
28 Research & Development 4
29 Sales 4
... ... ...
1440 Research & Development 2
1441 Research & Development 4
1442 Research & Development 4
1443 Research & Development 3
1444 Research & Development 2
1445 Research & Development 4
1446 Sales 3
1447 Sales 4
1448 Sales 3
1449 Research & Development 3
1450 Human Resources 4
1451 Sales 2
1452 Sales 4
1453 Sales 4
1454 Sales 3
1455 Research & Development 4
1456 Research & Development 4
1457 Research & Development 4
1458 Research & Development 4
1459 Research & Development 2
1460 Research & Development 4
1461 Sales 3
1462 Sales 1
1463 Research & Development 3
1464 Sales 3
1465 Research & Development 2
1466 Research & Development 1
1467 Research & Development 3
1468 Sales 3
1469 Research & Development 3

1470 rows × 2 columns
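
The bracket style decides the return type: a single label gives a Series, while a list of labels gives a DataFrame even when the list has one element. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical three-row frame.
toy = pd.DataFrame({
    "Department": ["Sales", "Research & Development", "Sales"],
    "Education": [2, 1, 4],
})

single = toy["Department"]            # single label: Series
one_col_frame = toy[["Department"]]   # list with one label: DataFrame

print(type(single).__name__, single.shape)
print(type(one_col_frame).__name__, one_col_frame.shape)
```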

18. Selecting columns with loc to get a Series or DataFrame

  • Syntax: loc[rows, columns]
  • Rows can be subset at the same time as columns
In [27]:
# ':' in the row position selects all rows.
df.loc[:,"Department"]
Out[27]:
0                        Sales
1       Research & Development
2       Research & Development
3       Research & Development
4       Research & Development
5       Research & Development
6       Research & Development
7       Research & Development
8       Research & Development
9       Research & Development
10      Research & Development
11      Research & Development
12      Research & Development
13      Research & Development
14      Research & Development
15      Research & Development
16      Research & Development
17      Research & Development
18                       Sales
19      Research & Development
20      Research & Development
21                       Sales
22      Research & Development
23      Research & Development
24      Research & Development
25      Research & Development
26      Research & Development
27                       Sales
28      Research & Development
29                       Sales
                 ...          
1440    Research & Development
1441    Research & Development
1442    Research & Development
1443    Research & Development
1444    Research & Development
1445    Research & Development
1446                     Sales
1447                     Sales
1448                     Sales
1449    Research & Development
1450           Human Resources
1451                     Sales
1452                     Sales
1453                     Sales
1454                     Sales
1455    Research & Development
1456    Research & Development
1457    Research & Development
1458    Research & Development
1459    Research & Development
1460    Research & Development
1461                     Sales
1462                     Sales
1463    Research & Development
1464                     Sales
1465    Research & Development
1466    Research & Development
1467    Research & Development
1468                     Sales
1469    Research & Development
Name: Department, Length: 1470, dtype: object
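
The call above selects only one column. As a minimal sketch of subsetting rows and columns together with loc, the example below uses a small stand-in frame named demo (a hypothetical miniature of the HR table, not the real file) and shows when the result is a Series and when it is a DataFrame.

```python
import pandas as pd

# Hypothetical four-row stand-in for the HR table used in this lesson.
demo = pd.DataFrame(
    {
        "Age": [41, 49, 37, 33],
        "Attrition": ["Yes", "No", "Yes", "No"],
        "Department": [
            "Sales",
            "Research & Development",
            "Research & Development",
            "Research & Development",
        ],
    }
)

# A single column label returns a Series.
dept = demo.loc[:, "Department"]

# A list of labels returns a DataFrame, even when the list has one item.
dept_frame = demo.loc[:, ["Department"]]

# Rows and columns together. Label slices in loc include the end label,
# so 1:2 selects the rows labelled 1 and 2.
subset = demo.loc[1:2, ["Age", "Department"]]

# A boolean mask in the row position filters rows by condition.
leavers_age = demo.loc[demo["Attrition"] == "Yes", "Age"]
```

Note that the slice 1:2 here refers to index labels, not positions, which is why both ends are included.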

19. Selecting columns with iloc to get a Series or DataFrame¶

  • Syntax: write it as iloc[row positions, column positions], using 0-based integers
In [28]:
# Select by integer position
df.iloc[:, 0]
Out[28]:
0       41
1       49
2       37
3       33
4       27
5       32
6       59
7       30
8       38
9       36
10      35
11      29
12      31
13      34
14      28
15      29
16      32
17      22
18      53
19      38
20      24
21      36
22      34
23      21
24      34
25      53
26      32
27      42
28      44
29      46
        ..
1440    36
1441    56
1442    29
1443    42
1444    56
1445    41
1446    34
1447    36
1448    41
1449    32
1450    35
1451    38
1452    50
1453    36
1454    45
1455    40
1456    35
1457    40
1458    35
1459    29
1460    29
1461    50
1462    39
1463    31
1464    26
1465    36
1466    39
1467    27
1468    49
1469    34
Name: Age, Length: 1470, dtype: int64
In [29]:
# Several consecutive columns via a slice (the end position is excluded). A list such as [0, 1] also works.
df.iloc[:, 0:2]
Out[29]:
Age Attrition
0 41 Yes
1 49 No
2 37 Yes
3 33 No
4 27 No
5 32 No
6 59 No
7 30 No
8 38 No
9 36 No
10 35 No
11 29 No
12 31 No
13 34 No
14 28 Yes
15 29 No
16 32 No
17 22 No
18 53 No
19 38 No
20 24 No
21 36 Yes
22 34 No
23 21 No
24 34 Yes
25 53 No
26 32 Yes
27 42 No
28 44 No
29 46 No
... ... ...
1440 36 No
1441 56 No
1442 29 Yes
1443 42 No
1444 56 Yes
1445 41 No
1446 34 No
1447 36 No
1448 41 No
1449 32 No
1450 35 No
1451 38 No
1452 50 Yes
1453 36 No
1454 45 No
1455 40 No
1456 35 No
1457 40 No
1458 35 No
1459 29 No
1460 29 No
1461 50 Yes
1462 39 No
1463 31 No
1464 26 No
1465 36 No
1466 39 No
1467 27 No
1468 49 No
1469 34 No

1470 rows × 2 columns
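
As a minimal sketch of the slice and list forms mentioned in the comment above, the example below again uses a hypothetical stand-in frame named demo. It also shows that a list can pick non-adjacent columns, which a slice cannot.

```python
import pandas as pd

# Hypothetical stand-in frame; the column names mirror the HR table.
demo = pd.DataFrame(
    {
        "Age": [41, 49, 37, 33],
        "Attrition": ["Yes", "No", "Yes", "No"],
        "DailyRate": [1102, 279, 1373, 1392],
    }
)

# Slice: positions 0 and 1. The end position 2 is excluded.
by_slice = demo.iloc[:, 0:2]

# List: the same two columns written out explicitly.
by_list = demo.iloc[:, [0, 1]]

# A list can also pick non-adjacent columns.
picked = demo.iloc[:, [0, 2]]

# Two integers return a single value (row position 1, column position 2).
cell = demo.iloc[1, 2]
```

Unlike label slices in loc, integer slices in iloc follow the usual Python convention of excluding the end position.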

20. Transposing a DataFrame (swapping rows and columns)¶

df.T¶

In [30]:
df.T
Out[30]:
0 1 2 3 4 5 6 7 8 9 ... 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469
Age 41 49 37 33 27 32 59 30 38 36 ... 29 50 39 31 26 36 39 27 49 34
Attrition Yes No Yes No No No No No No No ... No Yes No No No No No No No No
BusinessTravel Travel_Rarely Travel_Frequently Travel_Rarely Travel_Frequently Travel_Rarely Travel_Frequently Travel_Rarely Travel_Rarely Travel_Frequently Travel_Rarely ... Travel_Rarely Travel_Rarely Travel_Rarely Non-Travel Travel_Rarely Travel_Frequently Travel_Rarely Travel_Rarely Travel_Frequently Travel_Rarely
DailyRate 1102 279 1373 1392 591 1005 1324 1358 216 1299 ... 468 410 722 325 1167 884 613 155 1023 628
Department Sales Research & Development Research & Development Research & Development Research & Development Research & Development Research & Development Research & Development Research & Development Research & Development ... Research & Development Sales Sales Research & Development Sales Research & Development Research & Development Research & Development Sales Research & Development
DistanceFromHome 1 8 2 3 2 2 3 24 23 27 ... 28 28 24 5 5 23 6 4 2 8
Education 2 1 2 4 1 2 3 1 3 3 ... 4 3 1 3 3 2 1 3 3 3
EducationField Life Sciences Life Sciences Other Life Sciences Medical Life Sciences Medical Life Sciences Life Sciences Medical ... Medical Marketing Marketing Medical Other Medical Medical Life Sciences Medical Medical
EmployeeCount 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
EmployeeNumber 1 2 4 5 7 8 10 11 12 13 ... 2054 2055 2056 2057 2060 2061 2062 2064 2065 2068
EnvironmentSatisfaction 2 3 4 4 1 4 3 4 4 3 ... 4 4 2 2 4 3 4 2 4 2
Gender Female Male Male Female Male Male Female Male Male Male ... Female Male Female Male Female Male Male Male Male Male
HourlyRate 94 61 92 56 40 79 81 67 44 94 ... 73 39 60 74 30 41 42 87 63 82
JobInvolvement 3 2 2 3 3 3 4 3 2 3 ... 2 2 2 3 2 4 2 4 2 4
JobLevel 2 2 1 1 1 1 1 1 3 2 ... 1 3 4 2 1 2 3 2 2 2
JobRole Sales Executive Research Scientist Laboratory Technician Research Scientist Laboratory Technician Laboratory Technician Laboratory Technician Laboratory Technician Manufacturing Director Healthcare Representative ... Research Scientist Sales Executive Sales Executive Manufacturing Director Sales Representative Laboratory Technician Healthcare Representative Manufacturing Director Sales Executive Laboratory Technician
JobSatisfaction 4 2 3 3 2 4 1 3 3 3 ... 1 1 4 1 3 4 1 2 2 3
MaritalStatus Single Married Single Married Married Single Married Divorced Single Married ... Single Divorced Married Single Single Married Married Married Married Married
MonthlyIncome 5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ... 3785 10854 12031 9936 2966 2571 9991 6142 5390 4404
MonthlyRate 19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ... 8489 16586 8828 3787 21378 12290 21457 5174 13243 10228
NumCompaniesWorked 8 1 6 1 9 0 4 1 0 6 ... 1 4 0 0 0 4 4 1 2 2
Over18 Y Y Y Y Y Y Y Y Y Y ... Y Y Y Y Y Y Y Y Y Y
OverTime Yes No Yes Yes No No Yes No No No ... No Yes No No No No No Yes No No
PercentSalaryHike 11 23 15 11 12 13 20 22 21 13 ... 14 13 11 19 18 17 15 20 14 12
PerformanceRating 3 4 3 3 3 3 4 4 4 3 ... 3 3 3 3 3 3 3 4 3 3
RelationshipSatisfaction 1 4 2 3 4 3 1 2 2 2 ... 2 2 1 2 4 3 1 2 4 1
StandardHours 80 80 80 80 80 80 80 80 80 80 ... 80 80 80 80 80 80 80 80 80 80
StockOptionLevel 0 1 0 0 1 0 3 1 0 2 ... 0 1 1 0 0 1 1 1 0 0
TotalWorkingYears 8 10 7 8 6 8 12 1 10 17 ... 5 20 21 10 5 17 9 6 17 6
TrainingTimesLastYear 0 3 3 3 3 2 3 2 2 3 ... 3 3 2 2 2 3 5 0 3 3
WorkLifeBalance 1 3 3 3 3 2 2 3 3 2 ... 1 3 2 3 3 3 3 3 2 4
YearsAtCompany 6 10 0 8 2 7 1 1 9 7 ... 5 3 20 9 4 5 7 6 9 4
YearsInCurrentRole 4 7 0 7 2 7 0 0 7 7 ... 4 2 9 4 2 2 7 2 6 3
YearsSinceLastPromotion 0 1 0 3 2 3 0 0 1 7 ... 0 2 9 1 0 0 1 0 0 1
YearsWithCurrManager 5 7 0 0 2 6 0 0 8 7 ... 4 0 6 7 0 3 7 3 8 2

35 rows × 1470 columns
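
To make the effect of df.T easier to see than in the wide output above, here is a small sketch on a hypothetical three-row frame named demo. The former column labels become the index, and the former index becomes the column labels.

```python
import pandas as pd

# Hypothetical stand-in frame with three rows and two columns.
demo = pd.DataFrame(
    {"Age": [41, 49, 37], "Attrition": ["Yes", "No", "Yes"]}
)

# Transpose: 3 rows x 2 columns becomes 2 rows x 3 columns.
t = demo.T

# The old column labels are now row labels, so loc takes them first.
age_of_row_1 = t.loc["Age", 1]

# Transposing twice restores the original shape and labels.
round_trip = t.T
```

When the columns hold mixed types, as here, the transposed columns generally have to fall back to a generic dtype, so a round trip restores the layout but not necessarily the original dtypes.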

21. Sorting along either axis¶

  • For example, sort the column labels in descending order.
In [31]:
df.sort_index(axis=1, ascending=False)
Out[31]:
YearsWithCurrManager YearsSinceLastPromotion YearsInCurrentRole YearsAtCompany WorkLifeBalance TrainingTimesLastYear TotalWorkingYears StockOptionLevel StandardHours RelationshipSatisfaction ... EmployeeNumber EmployeeCount EducationField Education DistanceFromHome Department DailyRate BusinessTravel Attrition Age
0 5 0 4 6 1 0 8 0 80 1 ... 1 1 Life Sciences 2 1 Sales 1102 Travel_Rarely Yes 41
1 7 1 7 10 3 3 10 1 80 4 ... 2 1 Life Sciences 1 8 Research & Development 279 Travel_Frequently No 49
2 0 0 0 0 3 3 7 0 80 2 ... 4 1 Other 2 2 Research & Development 1373 Travel_Rarely Yes 37
3 0 3 7 8 3 3 8 0 80 3 ... 5 1 Life Sciences 4 3 Research & Development 1392 Travel_Frequently No 33
4 2 2 2 2 3 3 6 1 80 4 ... 7 1 Medical 1 2 Research & Development 591 Travel_Rarely No 27
5 6 3 7 7 2 2 8 0 80 3 ... 8 1 Life Sciences 2 2 Research & Development 1005 Travel_Frequently No 32
6 0 0 0 1 2 3 12 3 80 1 ... 10 1 Medical 3 3 Research & Development 1324 Travel_Rarely No 59
7 0 0 0 1 3 2 1 1 80 2 ... 11 1 Life Sciences 1 24 Research & Development 1358 Travel_Rarely No 30
8 8 1 7 9 3 2 10 0 80 2 ... 12 1 Life Sciences 3 23 Research & Development 216 Travel_Frequently No 38
9 7 7 7 7 2 3 17 2 80 2 ... 13 1 Medical 3 27 Research & Development 1299 Travel_Rarely No 36
10 3 0 4 5 3 5 6 1 80 3 ... 14 1 Medical 3 16 Research & Development 809 Travel_Rarely No 35
11 8 0 5 9 3 3 10 0 80 4 ... 15 1 Life Sciences 2 15 Research & Development 153 Travel_Rarely No 29
12 3 4 2 5 2 1 5 1 80 4 ... 16 1 Life Sciences 1 26 Research & Development 670 Travel_Rarely No 31
13 2 1 2 2 3 2 3 1 80 3 ... 18 1 Medical 2 19 Research & Development 1346 Travel_Rarely No 34
14 3 0 2 4 3 4 6 0 80 2 ... 19 1 Life Sciences 3 24 Research & Development 103 Travel_Rarely Yes 28
15 8 8 9 10 3 1 10 1 80 3 ... 20 1 Life Sciences 4 21 Research & Development 1389 Travel_Rarely No 29
16 5 0 2 6 2 5 7 2 80 4 ... 21 1 Life Sciences 2 5 Research & Development 334 Travel_Rarely No 32
17 0 0 0 1 2 2 1 2 80 2 ... 22 1 Medical 2 16 Research & Development 1123 Non-Travel No 22
18 7 3 8 25 3 3 31 0 80 3 ... 23 1 Life Sciences 4 2 Sales 1219 Travel_Rarely No 53
19 2 1 2 3 3 3 6 0 80 3 ... 24 1 Life Sciences 3 2 Research & Development 371 Travel_Rarely No 38
20 3 1 2 4 2 5 5 1 80 4 ... 26 1 Other 2 11 Research & Development 673 Non-Travel No 24
21 3 0 3 5 3 4 10 0 80 2 ... 27 1 Life Sciences 4 9 Sales 1218 Travel_Rarely Yes 36
22 11 2 6 12 3 4 13 0 80 3 ... 28 1 Life Sciences 4 7 Research & Development 419 Travel_Rarely No 34
23 0 0 0 0 3 6 0 0 80 4 ... 30 1 Life Sciences 2 15 Research & Development 391 Travel_Rarely No 21
24 3 1 2 4 3 2 8 0 80 3 ... 31 1 Medical 1 6 Research & Development 699 Travel_Rarely Yes 34
25 8 4 13 14 2 3 26 1 80 4 ... 32 1 Other 3 5 Research & Development 1282 Travel_Rarely No 53
26 7 6 2 10 3 5 10 0 80 2 ... 33 1 Life Sciences 1 16 Research & Development 1125 Travel_Frequently Yes 32
27 2 4 7 9 3 2 10 1 80 4 ... 35 1 Marketing 4 8 Sales 691 Travel_Rarely No 42
28 17 5 6 22 3 4 24 1 80 4 ... 36 1 Medical 4 7 Research & Development 477 Travel_Rarely No 44
29 1 2 2 2 2 2 22 0 80 4 ... 38 1 Marketing 4 2 Sales 705 Travel_Rarely No 46
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1440 2 0 2 4 3 3 18 3 80 2 ... 2025 1 Life Sciences 2 4 Research & Development 688 Travel_Frequently No 36
1441 9 1 12 13 2 2 13 1 80 1 ... 2026 1 Life Sciences 4 1 Research & Development 667 Non-Travel No 56
1442 2 2 2 2 4 3 4 3 80 2 ... 2027 1 Medical 4 1 Research & Development 1092 Travel_Rarely Yes 29
1443 14 4 6 22 2 2 24 0 80 1 ... 2031 1 Life Sciences 3 2 Research & Development 300 Travel_Rarely No 42
1444 8 9 9 10 1 4 14 1 80 4 ... 2032 1 Technical Degree 2 7 Research & Development 310 Travel_Rarely Yes 56
1445 10 0 7 20 3 3 21 1 80 3 ... 2034 1 Life Sciences 4 28 Research & Development 582 Travel_Rarely No 41
1446 7 1 7 8 3 2 8 2 80 4 ... 2035 1 Marketing 3 28 Sales 704 Travel_Rarely No 34
1447 11 11 12 15 2 4 15 1 80 1 ... 2036 1 Marketing 4 15 Sales 301 Non-Travel No 36
1448 4 0 4 5 3 5 14 1 80 3 ... 2037 1 Life Sciences 3 3 Sales 930 Travel_Rarely No 41
1449 2 1 2 4 3 4 4 0 80 4 ... 2038 1 Technical Degree 3 2 Research & Development 529 Travel_Rarely No 32
1450 7 1 0 9 3 2 9 0 80 3 ... 2040 1 Life Sciences 4 26 Human Resources 1146 Travel_Rarely No 35
1451 9 1 7 10 3 1 10 1 80 3 ... 2041 1 Life Sciences 2 10 Sales 345 Travel_Rarely No 38
1452 1 0 3 6 3 3 12 2 80 4 ... 2044 1 Life Sciences 4 1 Sales 878 Travel_Frequently Yes 50
1453 0 0 3 6 2 2 8 1 80 1 ... 2045 1 Marketing 4 11 Sales 1120 Travel_Rarely No 36
1454 1 0 3 5 3 3 8 0 80 3 ... 2046 1 Life Sciences 3 20 Sales 374 Travel_Rarely No 45
1455 2 2 2 2 3 2 8 0 80 4 ... 2048 1 Life Sciences 4 2 Research & Development 1322 Travel_Rarely No 40
1456 2 0 2 10 4 2 10 2 80 4 ... 2049 1 Life Sciences 4 18 Research & Development 1199 Travel_Frequently No 35
1457 2 0 3 5 3 2 20 3 80 2 ... 2051 1 Medical 4 2 Research & Development 1194 Travel_Rarely No 40
1458 1 1 3 4 3 5 4 1 80 4 ... 2052 1 Life Sciences 4 1 Research & Development 287 Travel_Rarely No 35
1459 3 0 3 4 3 2 10 1 80 1 ... 2053 1 Other 2 13 Research & Development 1378 Travel_Rarely No 29
1460 4 0 4 5 1 3 5 0 80 2 ... 2054 1 Medical 4 28 Research & Development 468 Travel_Rarely No 29
1461 0 2 2 3 3 3 20 1 80 2 ... 2055 1 Marketing 3 28 Sales 410 Travel_Rarely Yes 50
1462 6 9 9 20 2 2 21 1 80 1 ... 2056 1 Marketing 1 24 Sales 722 Travel_Rarely No 39
1463 7 1 4 9 3 2 10 0 80 2 ... 2057 1 Medical 3 5 Research & Development 325 Non-Travel No 31
1464 0 0 2 4 3 2 5 0 80 4 ... 2060 1 Other 3 5 Sales 1167 Travel_Rarely No 26
1465 3 0 2 5 3 3 17 1 80 3 ... 2061 1 Medical 2 23 Research & Development 884 Travel_Frequently No 36
1466 7 1 7 7 3 5 9 1 80 1 ... 2062 1 Medical 1 6 Research & Development 613 Travel_Rarely No 39
1467 3 0 2 6 3 0 6 1 80 2 ... 2064 1 Life Sciences 3 4 Research & Development 155 Travel_Rarely No 27
1468 8 0 6 9 2 3 17 0 80 4 ... 2065 1 Medical 3 2 Sales 1023 Travel_Frequently No 49
1469 2 1 3 4 4 3 6 0 80 1 ... 2068 1 Medical 3 8 Research & Development 628 Travel_Rarely No 34

1470 rows × 35 columns

In [32]:
# Sort rows by the values in the "Age" column, in ascending order.
df2 = df.sort_values(by=["Age"], ascending=True)
df2.head()
Out[32]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
1311 18 No Non-Travel 1431 Research & Development 14 3 Medical 1 1839 ... 3 80 0 0 4 1 0 0 0 0
457 18 Yes Travel_Frequently 1306 Sales 5 3 Marketing 1 614 ... 4 80 0 0 3 3 0 0 0 0
972 18 No Non-Travel 1124 Research & Development 1 3 Life Sciences 1 1368 ... 3 80 0 0 5 4 0 0 0 0
301 18 No Travel_Rarely 812 Sales 10 3 Medical 1 411 ... 1 80 0 0 2 3 0 0 0 0
296 18 Yes Travel_Rarely 230 Research & Development 3 3 Life Sciences 1 405 ... 3 80 0 0 2 3 0 0 0 0

5 rows × 35 columns
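
The output above keeps the original index labels (1311, 457, ...) after sorting. As a sketch on a hypothetical frame named demo, the example below sorts by two keys with different directions, then shows two ways to deal with the shuffled index.

```python
import pandas as pd

# Hypothetical stand-in frame with a tie in "Age".
demo = pd.DataFrame(
    {
        "Age": [41, 18, 37, 18],
        "DailyRate": [1102, 1306, 1373, 230],
    }
)

# Age ascending; ties are broken by DailyRate descending.
ordered = demo.sort_values(by=["Age", "DailyRate"], ascending=[True, False])

# The index labels travel with their rows. Renumber them from 0 if needed.
renumbered = ordered.reset_index(drop=True)

# sort_index with the default axis=0 sorts by row label,
# which here brings back the original row order.
restored = ordered.sort_index()
```

Both sort_values and sort_index return a new DataFrame by default, so demo itself is left unchanged.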

In [33]:
# Distinct department names
set(df.Department.tolist())
Out[33]:
{'Human Resources', 'Research & Development', 'Sales'}
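
Converting to a list and then to a set works, but pandas also has built-in methods for the same job. The sketch below uses a hypothetical one-column frame named demo to show unique, nunique and value_counts.

```python
import pandas as pd

# Hypothetical stand-in for the "Department" column.
demo = pd.DataFrame(
    {
        "Department": [
            "Sales",
            "Research & Development",
            "Research & Development",
            "Human Resources",
        ]
    }
)

# Distinct values, in order of first appearance.
names = demo["Department"].unique()

# Number of distinct values.
n_departments = demo["Department"].nunique()

# How many rows fall in each department.
counts = demo["Department"].value_counts()
```

Bracket access such as demo["Department"] is used here instead of demo.Department because it also works for column names that contain spaces or clash with method names.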

22. Dropping a specific column¶

  • Drop column A

    df.drop("A", axis=1, inplace=True)
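
The line above modifies df in place. As a sketch of the non-mutating alternative, the example below uses a hypothetical frame named demo with columns A, B and C, and the columns= keyword, which is equivalent to passing axis=1.

```python
import pandas as pd

# Hypothetical frame whose column names match the "A" used above.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Without inplace=True, drop returns a new frame and leaves demo untouched.
without_a = demo.drop(columns="A")

# Several columns can be dropped at once with a list.
only_c = demo.drop(columns=["A", "B"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
unchanged = demo.drop(columns="Z", errors="ignore")
```

Assigning the result to a new name keeps the original frame available, which is convenient when re-running notebook cells out of order.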

23. Dropping a specific row¶

  • Drop the row whose index label is 5

    df.drop(5, inplace=True)
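
The 5 passed to drop is an index label, not a position. The sketch below makes the difference visible with a hypothetical frame named demo whose index is deliberately out of order.

```python
import pandas as pd

# Hypothetical frame: the row labelled 5 sits at position 0.
demo = pd.DataFrame({"Age": [41, 49, 37]}, index=[5, 0, 7])

# Drops the row labelled 5, wherever it is located.
dropped_by_label = demo.drop(5)

# To drop by position, look up the label at that position first.
dropped_by_position = demo.drop(demo.index[2])

# A list drops several rows at once.
dropped_two = demo.drop([5, 7])
```

On a frame with the default 0, 1, 2, ... index, labels and positions coincide, so the distinction only matters after sorting or filtering.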

24. Matrix product of DataFrames¶

  • Case of transposed matrix × matrix

    df.T.dot(df)
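
As a small worked sketch of this product, the example below uses a hypothetical numeric frame named demo. A matrix product needs numeric data, so for a frame that also contains text columns (like the HR table) one option, shown here as an assumption rather than as part of the original lesson, is to keep only the numeric columns with select_dtypes first.

```python
import pandas as pd

# Hypothetical numeric frame: 3 rows x 2 columns.
demo = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

# (2 x 3) times (3 x 2) gives a 2 x 2 result labelled by the column names.
gram = demo.T.dot(demo)

# The @ operator is an equivalent spelling of dot.
gram_at = demo.T @ demo

# With a text column added, select the numeric columns before multiplying.
mixed = demo.assign(label=["a", "b", "c"])
numeric = mixed.select_dtypes(include="number")
gram_numeric = numeric.T.dot(numeric)
```

Each entry of gram is the sum of products of two columns, for example x with y gives 1*4 + 2*5 + 3*6 = 32.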
