6.3. Multinomial Logit Model

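The multinomial logit model is used when the dependent variable can take three or more values that have no inherent order.

For example, it applies when a traveler chooses a mode of transportation from a set of options (bus, car, train, walking, bicycle, other), or when a consumer chooses which product to buy among brand A, brand B, and brand C.

We estimate the model with statsmodels.

Here we consider an example of inferring a smartphone user's mode of transportation from sensor data collected with smartphones.

The data were collected from the smartphones of 13 volunteers (10 men and 3 women) and consist of five classes: walking, car, still, train, and bus.

Data source: http://cs.unibo.it/projects/us-tm2017/index.html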

import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

First, download the data.

url = "http://cs.unibo.it/projects/us-tm2017/static/dataset/extension/5second/csv/dataset_5secondWindow.tar.gz"
df = pd.read_csv(url, compression='gzip', header=0, sep=',', quotechar='"', on_bad_lines='warn')
# Display the first few rows
pd.set_option('display.max_columns', 100) # Change the display option so that all columns are shown in the notebook.
df.head(2)

	dataset_5secondWindow.csv	id	time	activityrecognition#0	activityrecognition#1	android.sensor.accelerometer#mean	android.sensor.accelerometer#min	android.sensor.accelerometer#max	android.sensor.accelerometer#std	android.sensor.game_rotation_vector#mean	android.sensor.game_rotation_vector#min	android.sensor.game_rotation_vector#max	android.sensor.game_rotation_vector#std	android.sensor.gravity#mean	android.sensor.gravity#min	android.sensor.gravity#max	android.sensor.gravity#std	android.sensor.gyroscope#mean	android.sensor.gyroscope#min	android.sensor.gyroscope#max	android.sensor.gyroscope#std	android.sensor.gyroscope_uncalibrated#mean	android.sensor.gyroscope_uncalibrated#min	android.sensor.gyroscope_uncalibrated#max	android.sensor.gyroscope_uncalibrated#std	android.sensor.light#mean	android.sensor.light#min	android.sensor.light#max	android.sensor.light#std	android.sensor.linear_acceleration#mean	android.sensor.linear_acceleration#min	android.sensor.linear_acceleration#max	android.sensor.linear_acceleration#std	android.sensor.magnetic_field#mean	android.sensor.magnetic_field#min	android.sensor.magnetic_field#max	android.sensor.magnetic_field#std	android.sensor.magnetic_field_uncalibrated#mean	android.sensor.magnetic_field_uncalibrated#min	android.sensor.magnetic_field_uncalibrated#max	android.sensor.magnetic_field_uncalibrated#std	android.sensor.orientation#mean	android.sensor.orientation#min	android.sensor.orientation#max	android.sensor.orientation#std	android.sensor.pressure#mean	android.sensor.pressure#min	android.sensor.pressure#max	android.sensor.pressure#std	android.sensor.proximity#mean	android.sensor.proximity#min	android.sensor.proximity#max	android.sensor.proximity#std	android.sensor.rotation_vector#mean	android.sensor.rotation_vector#min	android.sensor.rotation_vector#max	android.sensor.rotation_vector#std	android.sensor.step_counter#mean	android.sensor.step_counter#min	android.sensor.step_counter#max	android.sensor.step_counter#std	sound#mean	sound#min	sound#max	sound#std	speed#mean	speed#min	speed#max	speed#std	target	user
0	0.0	16170.0	78.0	NaN	100.0	9.811476	9.758895	9.849411	0.014626	0.029340	0.029014	0.029526	0.000119	9.806650	9.806649	9.806651	4.780692e-07	0.001651	0.000000	0.003533	0.000737	0.016221	0.014172	0.018695	0.000982	0.0	0.0	0.0	0.0	0.020978	0.002495	0.05241	0.011045	57.099638	56.690387	57.575950	0.177549	51.363566	51.199707	51.539208	0.080899	354.286933	353.598335	354.942707	0.245676	1004.090261	1004.05540	1004.12790	0.017416	8.0	8.0	8.0	NaN	0.050413	0.044777	0.056351	0.002109	28966.0	28966.0	28966.0	NaN	NaN	NaN	NaN	NaN	0.000000	0.000000	0.000000	0.000000	Still	U12
1	1.0	15871.0	145.0	NaN	100.0	9.939207	7.707437	17.146631	1.775944	0.999925	0.999903	0.999946	0.000030	9.806624	9.806624	9.806624	6.474977e-07	0.036326	0.011669	0.059388	0.020290	0.039023	0.014132	0.085494	0.018629	0.0	0.0	0.0	NaN	0.879220	0.641117	1.18581	0.278730	29.351288	28.172505	30.386017	0.921547	82.767760	82.409890	83.125630	0.506105	332.695577	330.461054	339.108607	1.705816	1008.274660	1008.27466	1008.27466	NaN	NaN	NaN	NaN	NaN	0.999981	0.999963	0.999999	0.000026	NaN	NaN	NaN	NaN	89.20021	89.065143	89.335277	0.191013	16.539349	16.539349	16.539349	0.628595	Car	U12

# Rename the columns to make them easier to read
df.columns = df.columns.str.replace('android.sensor.', '', regex=False).str.replace('#', '_', regex=False)

Here we consider a model that predicts the mode of transportation (Car, Still, Train, Bus, Walking) from the following variables.

accelerometer_mean
game_rotation_vector_mean
gyroscope_mean
linear_acceleration_mean
orientation_mean
pressure_mean
rotation_vector_mean
sound_mean

Rows with missing values in any of the variables above are excluded from the analysis.

df = df[[
    'target','accelerometer_mean','game_rotation_vector_mean',
    'gyroscope_mean','linear_acceleration_mean','orientation_mean', 
    'pressure_mean','rotation_vector_mean','sound_mean'
]]
df = df.dropna()
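
Before dropping rows, it can be useful to check how many values are missing in each column and how many rows survive. The following is a minimal sketch on a made-up toy table; the `toy` frame and its values are hypothetical and only stand in for the sensor data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the sensor table
toy = pd.DataFrame({
    "target": ["Car", "Bus", "Walking", "Train"],
    "sound_mean": [89.2, np.nan, 75.1, 80.3],
    "pressure_mean": [1008.3, 1004.1, np.nan, np.nan],
})

missing_per_column = toy.isna().sum()  # number of NaN values in each column
complete_rows = toy.dropna()           # keep only rows without any NaN

print(missing_per_column)
print(f"{len(toy)} rows before, {len(complete_rows)} rows after dropna()")
```

Because `dropna()` removes a row as soon as one selected column is missing, restricting the table to the model variables first (as done above) avoids losing rows over columns that the model never uses.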

df.describe()

	accelerometer_mean	game_rotation_vector_mean	gyroscope_mean	linear_acceleration_mean	orientation_mean	pressure_mean	rotation_vector_mean	sound_mean
count	1255.000000	1255.000000	1255.000000	1255.000000	1255.000000	1255.000000	1255.000000	1255.000000
mean	9.996223	0.732083	0.325744	1.507555	208.533199	1020.577142	0.768484	75.202876
std	0.720097	0.246679	0.496171	1.890074	92.273671	15.173493	0.224416	12.239499
min	7.369055	0.042684	0.001164	0.009447	17.135360	946.339840	0.120813	0.000000
25%	9.738144	0.570153	0.027239	0.272564	135.326907	1011.055625	0.668633	71.869349
50%	9.830731	0.763646	0.083699	0.707634	206.208459	1020.669000	0.813465	77.707224
75%	10.049428	0.969717	0.512946	2.083757	291.281798	1029.123300	0.959668	82.794433
max	14.661139	0.999999	3.268860	18.141770	396.133498	1058.609900	1.000000	90.308734

x = df.drop('target', axis=1)
y = df['target']

x.corr()

	accelerometer_mean	game_rotation_vector_mean	gyroscope_mean	linear_acceleration_mean	orientation_mean	pressure_mean	rotation_vector_mean	sound_mean
accelerometer_mean	1.000000	0.129695	0.500263	0.441453	0.043532	-0.045848	0.081403	0.138451
game_rotation_vector_mean	0.129695	1.000000	0.297029	0.395925	0.215494	-0.144826	0.488242	0.179971
gyroscope_mean	0.500263	0.297029	1.000000	0.732075	0.050534	-0.032837	0.169035	0.017961
linear_acceleration_mean	0.441453	0.395925	0.732075	1.000000	0.096761	-0.099997	0.226298	0.072577
orientation_mean	0.043532	0.215494	0.050534	0.096761	1.000000	0.007241	-0.007124	0.016743
pressure_mean	-0.045848	-0.144826	-0.032837	-0.099997	0.007241	1.000000	-0.288926	0.054731
rotation_vector_mean	0.081403	0.488242	0.169035	0.226298	-0.007124	-0.288926	1.000000	-0.010011
sound_mean	0.138451	0.179971	0.017961	0.072577	0.016743	0.054731	-0.010011	1.000000

y.value_counts().to_frame()

	count
target
Walking	346
Train	311
Car	310
Bus	220
Still	68

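x = sm.add_constant(x, prepend=False)  # append the intercept column "const" after the regressors
mnlogit_mod = sm.MNLogit(y, x)
# If the default optimizer does not converge, options such as method="bfgs"
# and maxiter=250 can be passed to fit().
mnlogit_fit = mnlogit_mod.fit()
print(mnlogit_fit.summary())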

Optimization terminated successfully.
         Current function value: 0.488472
         Iterations 11
                          MNLogit Regression Results                          
==============================================================================
Dep. Variable:                 target   No. Observations:                 1255
Model:                        MNLogit   Df Residuals:                     1219
Method:                           MLE   Df Model:                           32
Date:                Mon, 30 Dec 2024   Pseudo R-squ.:                  0.6764
Time:                        08:24:57   Log-Likelihood:                -613.03
converged:                       True   LL-Null:                       -1894.5
Covariance Type:            nonrobust   LLR p-value:                     0.000
=============================================================================================
               target=Car       coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------------
accelerometer_mean           -1.3405      0.326     -4.117      0.000      -1.979      -0.702
game_rotation_vector_mean    -2.8793      0.932     -3.088      0.002      -4.707      -1.052
gyroscope_mean               -1.2694      0.827     -1.535      0.125      -2.890       0.351
linear_acceleration_mean     -0.3871      0.224     -1.730      0.084      -0.826       0.052
orientation_mean             -0.0100      0.002     -4.722      0.000      -0.014      -0.006
pressure_mean                -0.2423      0.022    -11.012      0.000      -0.285      -0.199
rotation_vector_mean         -6.5886      1.024     -6.433      0.000      -8.596      -4.581
sound_mean                    0.1831      0.023      8.124      0.000       0.139       0.227
const                       255.7548     23.220     11.014      0.000     210.244     301.266
---------------------------------------------------------------------------------------------
             target=Still       coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------------
accelerometer_mean           -2.3018      1.406     -1.638      0.101      -5.057       0.453
game_rotation_vector_mean    -1.4923      1.166     -1.280      0.201      -3.778       0.793
gyroscope_mean                2.0263      1.769      1.145      0.252      -1.441       5.494
linear_acceleration_mean     -6.1539      1.390     -4.427      0.000      -8.879      -3.429
orientation_mean             -0.0112      0.003     -3.412      0.001      -0.018      -0.005
pressure_mean                -0.1061      0.031     -3.456      0.001      -0.166      -0.046
rotation_vector_mean         -3.7290      1.312     -2.843      0.004      -6.300      -1.158
sound_mean                   -0.1043      0.025     -4.151      0.000      -0.154      -0.055
const                       144.7582     35.347      4.095      0.000      75.479     214.037
---------------------------------------------------------------------------------------------
             target=Train       coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------------
accelerometer_mean           -0.4580      0.518     -0.883      0.377      -1.474       0.558
game_rotation_vector_mean     0.9464      0.830      1.140      0.254      -0.681       2.574
gyroscope_mean                0.0587      0.632      0.093      0.926      -1.180       1.298
linear_acceleration_mean     -1.7283      0.255     -6.766      0.000      -2.229      -1.228
orientation_mean             -0.0237      0.002    -11.086      0.000      -0.028      -0.019
pressure_mean                 0.0276      0.011      2.432      0.015       0.005       0.050
rotation_vector_mean         -8.2604      0.901     -9.171      0.000     -10.026      -6.495
sound_mean                    0.1133      0.020      5.724      0.000       0.075       0.152
const                       -20.3954     13.175     -1.548      0.122     -46.217       5.426
---------------------------------------------------------------------------------------------
           target=Walking       coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------------
accelerometer_mean            0.1972      0.211      0.934      0.350      -0.216       0.611
game_rotation_vector_mean     8.2232      1.216      6.762      0.000       5.840      10.607
gyroscope_mean                3.3952      0.576      5.897      0.000       2.267       4.524
linear_acceleration_mean      0.8137      0.132      6.149      0.000       0.554       1.073
orientation_mean             -0.0108      0.002     -4.662      0.000      -0.015      -0.006
pressure_mean                 0.0122      0.015      0.817      0.414      -0.017       0.042
rotation_vector_mean         -6.9962      1.184     -5.908      0.000      -9.317      -4.675
sound_mean                   -0.0786      0.016     -4.811      0.000      -0.111      -0.047
const                        -9.8708     15.770     -0.626      0.531     -40.779      21.037
=============================================================================================

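A multinomial logit model with k alternatives estimates k-1 sets of coefficients. In the output above, Bus (the first of the five categories in alphabetical order) does not appear because it serves as the base (reference group), so every parameter estimate should be read relative to Bus.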
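
Written out with Bus as the base category, the model can be sketched as follows. The notation is introduced here for illustration only: $x_i$ is the vector of explanatory variables (including the constant) for observation $i$, and $\beta_j$ is the coefficient vector of category $j$, which corresponds to one block of the summary table.

```latex
\ln \frac{P(y_i = j \mid x_i)}{P(y_i = \text{Bus} \mid x_i)} = x_i^\top \beta_j,
\qquad j \in \{\text{Car}, \text{Still}, \text{Train}, \text{Walking}\}

P(y_i = j \mid x_i) = \frac{\exp(x_i^\top \beta_j)}{1 + \sum_{m \neq \text{Bus}} \exp(x_i^\top \beta_m)},
\qquad
P(y_i = \text{Bus} \mid x_i) = \frac{1}{1 + \sum_{m \neq \text{Bus}} \exp(x_i^\top \beta_m)}
```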

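The general interpretation is as follows: for a one-unit increase in an explanatory variable, holding the other variables in the model constant, the log-odds of the category in question relative to the reference group are expected to change by the parameter estimate.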
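
To check that a coefficient is a change in log-odds relative to the base category, the sketch below plugs the rounded coefficients printed in the summary into the multinomial logit probabilities. The helper `predict_proba` and the observation `x0` (the sample means from `describe()`) are illustrative assumptions, not part of the original analysis. Because the printed coefficients are rounded, the probabilities themselves are only approximate.

```python
import math

# Coefficients copied from the summary table above (rounded as printed).
# Order: accelerometer, game_rotation_vector, gyroscope, linear_acceleration,
#        orientation, pressure, rotation_vector, sound (all *_mean), const
coefs = {
    "Car":     [-1.3405, -2.8793, -1.2694, -0.3871, -0.0100, -0.2423, -6.5886,  0.1831, 255.7548],
    "Still":   [-2.3018, -1.4923,  2.0263, -6.1539, -0.0112, -0.1061, -3.7290, -0.1043, 144.7582],
    "Train":   [-0.4580,  0.9464,  0.0587, -1.7283, -0.0237,  0.0276, -8.2604,  0.1133, -20.3954],
    "Walking": [ 0.1972,  8.2232,  3.3952,  0.8137, -0.0108,  0.0122, -6.9962, -0.0786,  -9.8708],
}

def predict_proba(x):
    """Return class probabilities; the base category Bus has linear score 0."""
    scores = {"Bus": 0.0}
    for label, beta in coefs.items():
        scores[label] = sum(b * v for b, v in zip(beta, x))
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical observation: the sample means from describe(), plus const = 1
x0 = [9.996223, 0.732083, 0.325744, 1.507555, 208.533199,
      1020.577142, 0.768484, 75.202876, 1.0]
x1 = list(x0)
x1[0] += 1.0  # raise accelerometer_mean by one unit, everything else fixed

p0, p1 = predict_proba(x0), predict_proba(x1)
log_odds_0 = math.log(p0["Car"] / p0["Bus"])
log_odds_1 = math.log(p1["Car"] / p1["Bus"])

# The change in log-odds matches the Car coefficient of accelerometer_mean (-1.3405)
print(round(log_odds_1 - log_odds_0, 4))
```

The change in log-odds does not depend on the starting point `x0`, whereas the change in the predicted probabilities does. This is why the coefficients are read on the log-odds scale.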

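The target=Car block in the output above compares car with bus.

The estimated coefficient of accelerometer_mean is -1.3405. Holding the other variables in the model constant, a one-unit increase in accelerometer_mean lowers the log-odds of car relative to bus by 1.3405, which multiplies the odds of car versus bus by exp(-1.3405) ≈ 0.26.

Similarly, holding the other variables in the model constant, a one-unit increase in game_rotation_vector_mean lowers the log-odds that the user is traveling by car rather than by bus by 2.8793.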
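
Log-odds are often easier to communicate as odds ratios, obtained by exponentiating the coefficients. The sketch below does this for a few target=Car coefficients copied from the summary table; the dictionary `car_coefs` is an illustrative construct, not part of the original analysis.

```python
import math

# Coefficients for target=Car, copied from the summary table (base category: Bus)
car_coefs = {
    "accelerometer_mean": -1.3405,
    "game_rotation_vector_mean": -2.8793,
    "sound_mean": 0.1831,
}

# exp(coefficient) is the factor by which the odds of Car versus Bus are
# multiplied when the variable increases by one unit, other variables fixed
odds_ratios = {name: math.exp(b) for name, b in car_coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

An odds ratio below 1 means that higher values of the variable make car less likely relative to bus, and a ratio above 1 means the opposite. With the fitted result from above, `np.exp(mnlogit_fit.params)` should give the same quantities for every coefficient at once.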