{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "aa5ed150-c487-4697-b8cf-d7a89fafe745", "metadata": {}, "source": [ "# Regression Analysis\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d73b8b34-7416-4cf0-9d4e-13a5259fff07", "metadata": {}, "source": [ "\n", "## Simple Regression\n", "\n", "We first consider simple regression, a model with a single explanatory variable.\n", "\n", "$y_i = \\beta_0+\\beta_1x_i+\\epsilon_i$\n", "\n", "\n", "* $y_i$: explained variable, also called the dependent variable or outcome variable\n", "\n", "* $x_i$: explanatory variable, also called the independent variable\n", "\n", "* $\\epsilon_i$: error term, assumed to have mean zero\n", "\n", "* $i$: index of the observation\n", "\n", "* $\\beta_0$: constant term (intercept parameter)\n", "\n", "* $\\beta_1$: coefficient (slope parameter)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "19e8e080-95f3-499f-8ec1-7298b829ad4c", "metadata": {}, "source": [ "The parameters can be estimated by ordinary least squares (OLS), the method of moments, maximum likelihood, and other methods.\n", "\n", "With OLS, we choose $\\beta_0$ and $\\beta_1$ to minimize the sum of squared residuals:\n", "\n", "$\\sum_{i=1}^{n}(y_i-\\beta_0-\\beta_1x_i)^2$" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ab86b3da-7435-4e0d-9a06-d84410c835e9", "metadata": {}, "source": [ "The estimated coefficient $\\beta_1$ can be interpreted as the marginal effect of the explanatory variable $x_i$ on the explained variable $y_i$, that is, the change in the explained variable when the explanatory variable changes by one unit ($\\frac{\\partial y_i}{\\partial x_i}$)." ] }, { "attachments": {}, "cell_type": "markdown", "id": "c08f99a2-f086-4464-842d-724b36403080", "metadata": {}, "source": [ "\n", "\n", "Here we run a simple regression with `statsmodels`.\n", "\n", "`statsmodels` is a package that covers linear regression, generalized linear models, limited dependent variable models, ARIMA, VAR models, and more.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "75f0d4ed-6a37-42fc-92eb-431249ad332d", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.api as sm\n", "import matplotlib.pyplot as plt" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7775cbd7-ac46-4c43-a91c-5145573d2a9f", "metadata": {}, "source": [ "We load the data used for the regression.\n", "\n", 
"Here we run the regression on `statecrime`, one of the datasets bundled with `statsmodels`; it holds US state-level crime data together with covariates such as the poverty rate and the high-school graduation rate." ] }, { "cell_type": "code", "execution_count": 18, "id": "6eda48b5-9d42-4f0f-9273-719ff44d925c", "metadata": {}, "outputs": [], "source": [ "data = sm.datasets.statecrime.load_pandas().data" ] }, { "cell_type": "code", "execution_count": 19, "id": "73638e8b-8435-4103-b1b7-107530831332", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | violent | \n", "murder | \n", "hs_grad | \n", "poverty | \n", "single | \n", "white | \n", "urban | \n", "
---|---|---|---|---|---|---|---|
state | \n", "\n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " |
Alabama | \n", "459.9 | \n", "7.1 | \n", "82.1 | \n", "17.5 | \n", "29.0 | \n", "70.0 | \n", "48.65 | \n", "
Alaska | \n", "632.6 | \n", "3.2 | \n", "91.4 | \n", "9.0 | \n", "25.5 | \n", "68.3 | \n", "44.46 | \n", "