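# Multinomial Logit Model

**Multinomial logit model**
A multinomial logit model applies when the dependent variable can take three or more values that have no natural ordering.

For example, it can be used when a traveler chooses a mode of transport from a set of options (bus, car, train, walking, bicycle, other), or when a consumer chooses which product to buy among Brand A, Brand B, and Brand C.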


We estimate the model with `statsmodels`.
 


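Here we consider an example that infers a smartphone user's mode of transport from sensor data collected with smartphones.

The data were collected from the smartphones of 13 volunteers (10 men and 3 women) and cover five classes: walking, car, still, train, and bus.

Data source: http://cs.unibo.it/projects/us-tm2017/index.html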



In [1]:
import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

First, download the data.

In [2]:
url = "http://cs.unibo.it/projects/us-tm2017/static/dataset/extension/5second/csv/dataset_5secondWindow.tar.gz"
df = pd.read_csv(url, compression='gzip', header=0, sep=',', quotechar='"', on_bad_lines='warn')
# Display the first few rows
pd.set_option('display.max_columns', 100)  # Change the display option so that all columns are shown in the notebook.
df.head(2)

Unnamed: 0,dataset_5secondWindow.csv,id,time,activityrecognition#0,activityrecognition#1,android.sensor.accelerometer#mean,android.sensor.accelerometer#min,android.sensor.accelerometer#max,android.sensor.accelerometer#std,android.sensor.game_rotation_vector#mean,android.sensor.game_rotation_vector#min,android.sensor.game_rotation_vector#max,android.sensor.game_rotation_vector#std,android.sensor.gravity#mean,android.sensor.gravity#min,android.sensor.gravity#max,android.sensor.gravity#std,android.sensor.gyroscope#mean,android.sensor.gyroscope#min,android.sensor.gyroscope#max,android.sensor.gyroscope#std,android.sensor.gyroscope_uncalibrated#mean,android.sensor.gyroscope_uncalibrated#min,android.sensor.gyroscope_uncalibrated#max,android.sensor.gyroscope_uncalibrated#std,android.sensor.light#mean,android.sensor.light#min,android.sensor.light#max,android.sensor.light#std,android.sensor.linear_acceleration#mean,android.sensor.linear_acceleration#min,android.sensor.linear_acceleration#max,android.sensor.linear_acceleration#std,android.sensor.magnetic_field#mean,android.sensor.magnetic_field#min,android.sensor.magnetic_field#max,android.sensor.magnetic_field#std,android.sensor.magnetic_field_uncalibrated#mean,android.sensor.magnetic_field_uncalibrated#min,android.sensor.magnetic_field_uncalibrated#max,android.sensor.magnetic_field_uncalibrated#std,android.sensor.orientation#mean,android.sensor.orientation#min,android.sensor.orientation#max,android.sensor.orientation#std,android.sensor.pressure#mean,android.sensor.pressure#min,android.sensor.pressure#max,android.sensor.pressure#std,android.sensor.proximity#mean,android.sensor.proximity#min,android.sensor.proximity#max,android.sensor.proximity#std,android.sensor.rotation_vector#mean,android.sensor.rotation_vector#min,android.sensor.rotation_vector#max,android.sensor.rotation_vector#std,android.sensor.step_counter#mean,android.sensor.step_counter#min,android.sensor.step_counter#max,android.sensor.step_counter#std,sound#mean,sound#min,sound#max,sound#std,speed#mean,speed#min,speed#max,speed#std,target,user
0,0.0,16170.0,78.0,,100.0,9.811476,9.758895,9.849411,0.014626,0.02934,0.029014,0.029526,0.000119,9.80665,9.806649,9.806651,4.780692e-07,0.001651,0.0,0.003533,0.000737,0.016221,0.014172,0.018695,0.000982,0.0,0.0,0.0,0.0,0.020978,0.002495,0.05241,0.011045,57.099638,56.690387,57.57595,0.177549,51.363566,51.199707,51.539208,0.080899,354.286933,353.598335,354.942707,0.245676,1004.090261,1004.0554,1004.1279,0.017416,8.0,8.0,8.0,,0.050413,0.044777,0.056351,0.002109,28966.0,28966.0,28966.0,,,,,,0.0,0.0,0.0,0.0,Still,U12
1,1.0,15871.0,145.0,,100.0,9.939207,7.707437,17.146631,1.775944,0.999925,0.999903,0.999946,3e-05,9.806624,9.806624,9.806624,6.474977e-07,0.036326,0.011669,0.059388,0.02029,0.039023,0.014132,0.085494,0.018629,0.0,0.0,0.0,,0.87922,0.641117,1.18581,0.27873,29.351288,28.172505,30.386017,0.921547,82.76776,82.40989,83.12563,0.506105,332.695577,330.461054,339.108607,1.705816,1008.27466,1008.27466,1008.27466,,,,,,0.999981,0.999963,0.999999,2.6e-05,,,,,89.20021,89.065143,89.335277,0.191013,16.539349,16.539349,16.539349,0.628595,Car,U12
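
The stray column named `dataset_5secondWindow.csv` in the output above suggests that the tar header is being parsed as part of the CSV header, since the file is a `.tar.gz` archive and not a plain gzip-compressed CSV. The following is a minimal sketch of one way to read the CSV member out of such an archive. The helper names `read_csv_from_targz` and `make_demo_archive` are hypothetical, and the demo uses a tiny made-up archive built in memory, not the real dataset.

```python
import io
import tarfile

import pandas as pd


def read_csv_from_targz(fileobj, member_suffix=".csv"):
    """Return the first CSV member of a gzip-compressed tar archive as a DataFrame."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                # Read the member's bytes and hand them to pandas
                csv_bytes = tar.extractfile(member).read()
                return pd.read_csv(io.BytesIO(csv_bytes))
    raise FileNotFoundError(f"no member ending in {member_suffix!r} in archive")


def make_demo_archive():
    """Build a small in-memory .tar.gz holding one made-up CSV file."""
    csv_bytes = b"id,target\n1,Car\n2,Bus\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="demo.csv")
        info.size = len(csv_bytes)
        tar.addfile(info, io.BytesIO(csv_bytes))
    buf.seek(0)
    return buf


demo_df = read_csv_from_targz(make_demo_archive())
print(demo_df.columns.tolist())
```

For the real archive, the bytes could presumably be fetched with `urllib.request.urlopen(url).read()` and wrapped in `io.BytesIO` before calling the helper; this has not been checked against the actual file.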


In [3]:
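# Rename the columns to make them easier to read
# regex=False treats 'android.sensor.' as a literal string, not a regular expression
df.columns = df.columns.str.replace('android.sensor.', '', regex=False).str.replace('#', '_', regex=False)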

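
Older pandas versions treated the pattern passed to `str.replace` as a regular expression by default, in which case each `.` in `'android.sensor.'` matches any character. The sketch below illustrates the difference on two column names; the second name is made up purely to show the effect and does not occur in the dataset.

```python
import pandas as pd

# The second name is a made-up example, not a real column in the dataset
cols = pd.Index(["android.sensor.gyroscope#mean", "androidXsensorXlight#min"])

# regex=True: "." matches any character, so the made-up name is stripped too
as_regex = cols.str.replace("android.sensor.", "", regex=True)

# regex=False: only the literal prefix "android.sensor." is removed
as_literal = cols.str.replace("android.sensor.", "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

Passing `regex` explicitly keeps the behavior the same regardless of the pandas version in use.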


Here we consider a model that predicts the mode of transport (Car, Still, Train, Bus, Walking) from the following variables.

* `accelerometer_mean`
* `game_rotation_vector_mean`
* `gyroscope_mean`
* `linear_acceleration_mean`
* `orientation_mean` 
* `pressure_mean`
* `rotation_vector_mean`
* `sound_mean`

Rows with missing values in any of the variables above are excluded from the analysis.

In [4]:
df = df[[
    'target','accelerometer_mean','game_rotation_vector_mean',
    'gyroscope_mean','linear_acceleration_mean','orientation_mean', 
    'pressure_mean','rotation_vector_mean','sound_mean'
]]
df = df.dropna()

In [5]:
df.describe()

Unnamed: 0,accelerometer_mean,game_rotation_vector_mean,gyroscope_mean,linear_acceleration_mean,orientation_mean,pressure_mean,rotation_vector_mean,sound_mean
count,1255.0,1255.0,1255.0,1255.0,1255.0,1255.0,1255.0,1255.0
mean,9.996223,0.732083,0.325744,1.507555,208.533199,1020.577142,0.768484,75.202876
std,0.720097,0.246679,0.496171,1.890074,92.273671,15.173493,0.224416,12.239499
min,7.369055,0.042684,0.001164,0.009447,17.13536,946.33984,0.120813,0.0
25%,9.738144,0.570153,0.027239,0.272564,135.326907,1011.055625,0.668633,71.869349
50%,9.830731,0.763646,0.083699,0.707634,206.208459,1020.669,0.813465,77.707224
75%,10.049428,0.969717,0.512946,2.083757,291.281798,1029.1233,0.959668,82.794433
max,14.661139,0.999999,3.26886,18.14177,396.133498,1058.6099,1.0,90.308734


In [7]:
x = df.drop('target', axis=1)
y = df['target']

In [8]:
x.corr()

Unnamed: 0,accelerometer_mean,game_rotation_vector_mean,gyroscope_mean,linear_acceleration_mean,orientation_mean,pressure_mean,rotation_vector_mean,sound_mean
accelerometer_mean,1.0,0.129695,0.500263,0.441453,0.043532,-0.045848,0.081403,0.138451
game_rotation_vector_mean,0.129695,1.0,0.297029,0.395925,0.215494,-0.144826,0.488242,0.179971
gyroscope_mean,0.500263,0.297029,1.0,0.732075,0.050534,-0.032837,0.169035,0.017961
linear_acceleration_mean,0.441453,0.395925,0.732075,1.0,0.096761,-0.099997,0.226298,0.072577
orientation_mean,0.043532,0.215494,0.050534,0.096761,1.0,0.007241,-0.007124,0.016743
pressure_mean,-0.045848,-0.144826,-0.032837,-0.099997,0.007241,1.0,-0.288926,0.054731
rotation_vector_mean,0.081403,0.488242,0.169035,0.226298,-0.007124,-0.288926,1.0,-0.010011
sound_mean,0.138451,0.179971,0.017961,0.072577,0.016743,0.054731,-0.010011,1.0


In [9]:
y.value_counts().to_frame()

Unnamed: 0,target
Walking,346
Train,311
Car,310
Bus,220
Still,68


In [10]:
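# Append the constant (intercept) column after the explanatory variables
x = sm.add_constant(x, prepend=False)
mnlogit_mod = sm.MNLogit(y, x)
# fit() uses its default optimizer; if it does not converge, options such as
# method="bfgs" and maxiter=250 can be passed to fit()
mnlogit_fit = mnlogit_mod.fit()
print(mnlogit_fit.summary())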

Optimization terminated successfully.
         Current function value: 0.488472
         Iterations 11
                          MNLogit Regression Results                          
Dep. Variable:                 target   No. Observations:                 1255
Model:                        MNLogit   Df Residuals:                     1219
Method:                           MLE   Df Model:                           32
Date:                Wed, 17 May 2023   Pseudo R-squ.:                  0.6764
Time:                        11:16:00   Log-Likelihood:                -613.03
converged:                       True   LL-Null:                       -1894.5
Covariance Type:            nonrobust   LLR p-value:                     0.000
               target=Car       coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------------
accelerometer_mean           -1.3405      0.326     -4.117      0.000      -1

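With k alternatives, a multinomial logit model estimates k-1 sets of coefficients.
In the output above, Bus is set as the base (reference group) among the five alternatives, so each parameter estimate must be read relative to the reference group, Bus.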
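
To illustrate how k-1 coefficient sets still yield probabilities for all k alternatives, here is a minimal sketch in which the linear predictor of the reference category is fixed at zero. The coefficient values and the observation `x_obs` are made up for illustration and are not estimates from the model above; `mnlogit_probabilities` is a hypothetical helper name.

```python
import numpy as np

categories = ["Bus", "Car", "Still", "Train", "Walking"]  # Bus is the reference

# Hypothetical coefficients (NOT the fitted values): one row per
# non-reference category, columns = [variable 1, variable 2, constant]
beta = np.array([
    [-1.0,  0.5,  0.2],   # Car vs. Bus
    [ 0.3, -0.7, -1.0],   # Still vs. Bus
    [ 0.8,  0.1,  0.0],   # Train vs. Bus
    [-0.2,  1.2,  0.4],   # Walking vs. Bus
])
x_obs = np.array([0.5, 1.0, 1.0])  # made-up observation; last entry is the constant


def mnlogit_probabilities(x, beta):
    """Choice probabilities when the reference category's predictor is 0."""
    eta = np.concatenate(([0.0], beta @ x))  # reference category comes first
    eta = eta - eta.max()                    # shift for numerical stability
    expo = np.exp(eta)
    return expo / expo.sum()


probs = mnlogit_probabilities(x_obs, beta)
for name, p in zip(categories, probs):
    print(f"{name}: {p:.3f}")
```

The log of `probs[1] / probs[0]` recovers the Car row's linear predictor, which is why each coefficient set is read relative to the reference group.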

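The usual interpretation in a multinomial logit model is that, for a one-unit change in an explanatory variable, the log odds of an outcome relative to the reference group are expected to change by the parameter estimate, holding the other variables in the model constant.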
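
Written as an equation, with Bus as the reference category, this interpretation reads as follows. The notation is introduced here for illustration: $x$ is the vector of explanatory variables including the constant, and $\beta_j$ is the coefficient vector for category $j$.

$$
\ln\frac{P(y=j\mid x)}{P(y=\mathrm{Bus}\mid x)} = x^{\top}\beta_j,
\qquad j\in\{\mathrm{Car},\ \mathrm{Still},\ \mathrm{Train},\ \mathrm{Walking}\}
$$

Increasing the $m$-th variable by one unit, with everything else held fixed, therefore shifts this log odds by $\beta_{jm}$.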

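In the results above, the `target=Car` block shows the comparison of car relative to bus.

The estimated coefficient on `accelerometer_mean` is `-1.340`. Holding the other variables in the model constant, a one-unit increase in `accelerometer_mean` is expected to decrease the log odds of car relative to bus by 1.340.

Likewise, holding the other variables in the model constant, a one-unit increase in `game_rotation_vector_mean` is expected to decrease the log odds of the user's mode of transport being car rather than bus by 2.8793.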
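
Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The sketch below uses the two coefficient values quoted above; the variable names are chosen here for illustration.

```python
import math

# Coefficients for target=Car (relative to Bus), as quoted in the text
coef_accelerometer = -1.3405
coef_game_rotation = -2.8793

# exp(coef) is the factor by which the odds of Car vs. Bus are multiplied
# for a one-unit increase in the variable, other variables held constant
odds_ratio_accelerometer = math.exp(coef_accelerometer)  # about 0.262
odds_ratio_game_rotation = math.exp(coef_game_rotation)  # about 0.056

print(round(odds_ratio_accelerometer, 3))
print(round(odds_ratio_game_rotation, 3))
```

An odds ratio of about 0.262 means that a one-unit increase in `accelerometer_mean` multiplies the odds of car versus bus by roughly 0.26, a reduction of about 74 percent.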