9.2. Visualization Exercise Using Yahoo! FINANCE Data

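  • From here on, we visualize correlations using stock price data from the U.S. Yahoo! FINANCE site.

  • We use stock price data for tech stocks (the tech_stock list) and Japanese companies, covering the most recent year up to today.

  • We use yfinance, a Python package for collecting data from Yahoo! FINANCE.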

Warning

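The yfinance package used here is an open-source tool built on APIs that Yahoo makes publicly available, and it is intended for research and educational use. For details on your rights to use the data you actually download, refer to Yahoo's terms of use (Yahoo Developer API Terms of Use; Yahoo Terms of Service; Yahoo Terms).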

  • To try companies other than those introduced here, look up the company's ticker symbol on the Yahoo! FINANCE page and substitute it.

Warning

Install the required packages. If you have already installed the packages using requirements.txt, proceed directly to the next cell.

If you have not, install the packages by following the installation page, or try the steps below.

  1. Windows users: launch Anaconda Prompt from the Windows menu. Mac users: launch Terminal.

  2. Enter conda install git and press Enter. After a short while, Proceed ([y]/n)? is displayed; enter y and press Enter to continue.

  3. Enter pip install git+https://github.com/pydata/pandas-datareader and press Enter to install pandas-datareader.

  4. In the terminal, enter pip install yfinance --upgrade --no-cache-dir and press Enter to install yfinance.
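
After the steps above, a quick way to confirm that the installation worked is to check whether each package can be imported. The following is a minimal sketch using only the standard library. The helper name missing_packages is hypothetical (it is not part of any of these packages), and the list of names is simply the set of import names used later in this section.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the module names that cannot be found in the current environment.

    Hypothetical helper for illustration; it only looks the modules up and
    does not import them.
    """
    return [name for name in names if find_spec(name) is None]


# Import names used later in this section
required = ["pandas", "pandas_datareader", "matplotlib", "numpy", "yfinance"]
print("Missing packages:", missing_packages(required))
```

An empty list means every package was found. Any name that is printed still needs to be installed.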

from datetime import datetime
import os
import pandas as pd
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import numpy as np
import yfinance as yf
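# Download window: from one year ago through today
end = datetime.now()
start =  datetime(end.year-1, end.month, end.day)

# Route pandas-datareader's get_data_yahoo() through yfinance
yf.pdr_override()

tech_stock = ['GOOG', 'AAPL', 'META', 'AMZN', 'NFLX', 'TSLA']

# Store each company's DataFrame in a global variable named after its ticker (GOOG, AAPL, ...)
for company in tech_stock:
    globals()[company] = pdr.get_data_yahoo(tickers=company, start=start, end=end)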
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed

[*********************100%%**********************]  1 of 1 completed

[*********************100%%**********************]  1 of 1 completed

[*********************100%%**********************]  1 of 1 completed

[*********************100%%**********************]  1 of 1 completed
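
The start date above is built as datetime(end.year-1, end.month, end.day). One caveat is that this expression raises a ValueError when the code is run on February 29, because that date does not exist in the previous year. The sketch below shows one possible way to handle that case. The function name one_year_before is hypothetical, and falling back to February 28 is an assumption made for illustration.

```python
from datetime import datetime


def one_year_before(end):
    """Return the same calendar date one year earlier.

    Hypothetical helper: if that date does not exist (February 29),
    fall back to February 28 of the previous year.
    """
    try:
        return end.replace(year=end.year - 1)
    except ValueError:
        return end.replace(year=end.year - 1, day=28)


print(one_year_before(datetime(2024, 5, 17)))  # 2023-05-17 00:00:00
print(one_year_before(datetime(2024, 2, 29)))  # 2023-02-28 00:00:00
```

With such a helper, start = one_year_before(end) could be used in place of the expression in the cell above.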

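Open: opening price
High: highest price of the day
Low: lowest price of the day
Close: closing price
Volume: trading volume (number of shares traded during the day)
Adj Close: adjusted closing price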

GOOG.describe()
Open High Low Close Adj Close Volume
count 252.000000 252.000000 252.000000 252.000000 252.000000 2.520000e+02
mean 138.559992 140.053734 137.347151 138.762500 138.762500 2.260032e+07
std 12.946310 13.035053 12.901735 13.092495 13.092495 8.305644e+06
min 116.760002 118.224998 115.830002 116.870003 116.870003 8.828600e+06
25% 129.822498 131.403748 128.881252 130.079994 130.079994 1.761085e+07
50% 137.032501 138.620003 135.899994 137.275002 137.275002 2.043080e+07
75% 144.568756 145.868748 143.352505 144.487495 144.487495 2.455272e+07
max 175.990005 177.490005 174.979996 177.380005 177.380005 5.879610e+07
GOOG.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       252 non-null    float64
 1   High       252 non-null    float64
 2   Low        252 non-null    float64
 3   Close      252 non-null    float64
 4   Adj Close  252 non-null    float64
 5   Volume     252 non-null    int64  
dtypes: float64(5), int64(1)
memory usage: 13.8 KB
# Plot Google's adjusted closing price
GOOG['Adj Close'].plot(legend=True, figsize=(10,4))
plt.title("Google Adjusted Closing Price", fontsize=15)
plt.ylabel('price (USD)')
plt.grid()
plt.show()
../../_images/2f4de1c4a6a8fca9a5a359b244c9892b2c1067bb125b1533ee5b111df47ff5de.png

9.2.1. Moving Average

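A moving average is the mean of a time series over a fixed-length window, computed repeatedly as the window slides along the series.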
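
To make the definition concrete, here is a small sketch on made-up prices (the values are placeholders, not real stock data). It also shows why the first rows of a moving-average column are NaN: a window of n values cannot be filled until n observations are available.

```python
import pandas as pd

# Made-up prices for illustration only
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# 3-day moving average: mean of the current value and the two before it
ma_3 = prices.rolling(3).mean()
print(ma_3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]

# The third entry computed by hand: (10 + 11 + 12) / 3
manual = (prices.iloc[0] + prices.iloc[1] + prices.iloc[2]) / 3
print(manual)  # 11.0
```

A longer window smooths the series more strongly but leaves more leading NaN values.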

ma_day = [10, 20, 30]  # Create new columns (MA_10, MA_20, MA_30) holding the 10-, 20-, and 30-day moving averages
for ma in ma_day:
    for company in [GOOG, AAPL, META, NFLX, AMZN]:
        company['MA_{}'.format(ma)] = company['Adj Close'].rolling(ma).mean()  # rolling(n).mean() computes the n-day moving average
AAPL.head(3)
Open High Low Close Adj Close Volume MA_10 MA_20 MA_30
Date
2023-05-18 173.000000 175.240005 172.580002 175.050003 174.125275 65496700 NaN NaN NaN
2023-05-19 176.389999 176.389999 174.940002 175.160004 174.234695 55772400 NaN NaN NaN
2023-05-22 173.979996 174.710007 173.449997 174.199997 173.279739 43570900 NaN NaN NaN
AAPL[['Adj Close','MA_10', 'MA_20','MA_30']].plot(subplots=False, figsize=(10,5))
plt.title('Moving Average (10 days, 20 days, 30 days windows)')
plt.ylabel('price (USD)')
plt.grid()
plt.show()
../../_images/eee0171ccdc349ff9fc84292c15186172a41ca248ac89b5edd85eed461777f05.png

9.2.2. Reference: Computing the Day-over-Day Percentage Change in Stock Prices

for company in [GOOG, AAPL, META, AMZN, NFLX,TSLA]:
    company['returns'] = company['Adj Close'].pct_change()
colors = ['orange','black','blue','red','yellow','green']
plt.figure(figsize=(8,5))
# Overlay one semi-transparent histogram of daily returns per company
for company, color, name in zip([GOOG, AAPL, META, AMZN, NFLX, TSLA], colors, tech_stock):
    plt.hist(company['returns'].dropna(), bins=100, color=color, alpha=0.2, label=name)
plt.legend()
plt.xlabel('Percentage change')
plt.ylabel('Frequency')
plt.grid(axis='x')
plt.show()
../../_images/995bf8d31e55de7745b4ab619bbb0e2cb2d7befa07ebcd1b43b129118aabeee1.png

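Plotting the six companies' rates of change as histograms, as above, shows that for all six companies the change from the previous day is within ??% on most days.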
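
As a small worked example of what pct_change() computes, the sketch below uses made-up prices. It also shows one way to measure the share of days whose change stays within a given threshold. The 5% threshold is a hypothetical value chosen only for illustration and is not a result from the data above.

```python
import pandas as pd

# Made-up prices for illustration only
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# pct_change() returns (p_t - p_{t-1}) / p_{t-1} as a fraction (0.10 means +10%).
# The first entry has no previous day, so it is NaN.
returns = prices.pct_change()
print(returns.round(4).tolist())

# Share of days whose absolute change is within a hypothetical 5% threshold
threshold = 0.05
share_within = (returns.dropna().abs() <= threshold).mean()
print(share_within)
```

Here the changes are roughly +10%, -10%, and 0%, so one of the three days falls within the threshold.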

Create a DataFrame that holds the adjusted closing prices of the six companies.

tech_stock_close = pd.DataFrame({'GOOG':GOOG['Adj Close'],
                           'AAPL':AAPL['Adj Close'],
                           'META': META['Adj Close'],
                           'AMZN': AMZN['Adj Close'],
                           'NFLX': NFLX['Adj Close'],
                           'TSLA': TSLA['Adj Close']})
tech_stock_close.describe()
GOOG AAPL META AMZN NFLX TSLA
count 252.000000 252.000000 252.000000 252.000000 252.000000 252.000000
mean 138.762500 181.291907 363.562652 149.101488 483.176001 221.755397
std 13.092495 8.370487 83.577548 21.289901 83.789208 36.301131
min 116.870003 164.776505 245.379654 114.989998 346.190002 142.050003
25% 130.079994 174.119308 299.317421 130.202499 421.292503 187.507500
50% 137.275002 181.373085 325.689438 144.545006 453.830002 232.120003
75% 144.487495 188.820126 466.163452 171.847500 564.242493 252.699993
max 177.380005 197.589523 527.340027 189.500000 636.179993 293.339996
tech_stock_close.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   GOOG    252 non-null    float64
 1   AAPL    252 non-null    float64
 2   META    252 non-null    float64
 3   AMZN    252 non-null    float64
 4   NFLX    252 non-null    float64
 5   TSLA    252 non-null    float64
dtypes: float64(6)
memory usage: 13.8 KB

9.2.2.1. Google and Apple

Compute the correlation coefficient between the closing prices of Google and Apple over the past year.

tech_stock_close['GOOG'].corr(tech_stock_close['AAPL'])
-0.16504613915410987
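
Series.corr() computes the Pearson correlation coefficient by default. The sketch below evaluates the Pearson formula by hand on made-up values and compares it with the pandas result. The function name pearson and the sample values are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up values for illustration only
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
b = pd.Series([2.0, 1.0, 4.0, 3.0, 5.0])

print(pearson(a, b))  # approximately 0.8
print(a.corr(b))      # the pandas result agrees
```

The coefficient ranges from -1 to 1. A value near 0, such as the one obtained for Google and Apple above, indicates a weak linear relationship.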

Show the correlation matrix of the closing prices of Google and Apple over the past year.

tech_stock_close[['GOOG','AAPL']].corr()
GOOG AAPL
GOOG 1.000000 -0.165046
AAPL -0.165046 1.000000

Show a scatter plot of the closing prices of Google and Apple over the past year.

plt.figure(figsize=(5,5))
plt.scatter(tech_stock_close['GOOG'],tech_stock_close['AAPL'],color ='y',alpha=0.5)
plt.xlabel('Google')
plt.ylabel('Apple')
plt.title("Closing prices of Google and Apple")
plt.show()
../../_images/eb9b55d6ad2503cda9e7520e05fd4036ef5c9f9b1e2dd7692b0c6e059f64348e.png

9.2.2.2. Showing Histograms and a Scatter Plot in a Single Figure

import seaborn as sns

sns.jointplot(data=tech_stock_close, x='GOOG', y='TSLA')
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
../../_images/1e6fe294f8edf51fc7b9b709a0b7d7a3a42375b19eae300f813ada61be4b204d.png
sns.pairplot(tech_stock_close, plot_kws=dict(color = 'b', edgecolor='b', alpha = 0.2))
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
../../_images/f8de5e699837992354155082e7689872a09913123341c92875dd64a521c4fe48.png

Show the correlation matrix of the six companies.

tech_stock_close.corr()
GOOG AAPL META AMZN NFLX TSLA
GOOG 1.000000 -0.165046 0.777365 0.850440 0.736902 -0.656292
AAPL -0.165046 1.000000 -0.244836 -0.153824 -0.094929 0.497660
META 0.777365 -0.244836 1.000000 0.962847 0.952720 -0.765094
AMZN 0.850440 -0.153824 0.962847 1.000000 0.956115 -0.730409
NFLX 0.736902 -0.094929 0.952720 0.956115 1.000000 -0.725121
TSLA -0.656292 0.497660 -0.765094 -0.730409 -0.725121 1.000000
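
A correlation matrix lists every pair twice and becomes hard to scan as it grows. One option is to flatten it into a list of pairs sorted by the strength of the correlation. The sketch below does this for a three-company subset whose values are copied from the matrix above. The function name ranked_pairs is hypothetical.

```python
import numpy as np
import pandas as pd


def ranked_pairs(corr):
    """Return (column_a, column_b, r) for each pair once, strongest |r| first."""
    cols = list(corr.columns)
    # Indices strictly above the diagonal, so each pair appears exactly once
    rows, columns = np.triu_indices(len(cols), k=1)
    pairs = [(cols[i], cols[j], float(corr.iloc[i, j])) for i, j in zip(rows, columns)]
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


# Subset of the correlation matrix shown above
names = ["GOOG", "AAPL", "META"]
corr = pd.DataFrame(
    [[1.000000, -0.165046, 0.777365],
     [-0.165046, 1.000000, -0.244836],
     [0.777365, -0.244836, 1.000000]],
    index=names, columns=names,
)

for a, b, r in ranked_pairs(corr):
    print(f"{a}-{b}: {r:+.3f}")
```

Applied to the full matrix, the same function could be called as ranked_pairs(tech_stock_close.corr()).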

9.2.2.2.1. Reference

Next, we also collect the past year's stock prices of the Japanese automakers Toyota Motor Corporation and Honda Motor Co., Ltd.

end = datetime.now()
start =  datetime(end.year-1, end.month, end.day)

yf.pdr_override()

vehicles = ['TM', 'HMC'] # TM : Toyota Motor Corporation, HMC: Honda Motor Co., Ltd.
for company in vehicles:
    globals()[company] = pdr.get_data_yahoo(tickers=company, start=start, end=end) 
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed

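# DataFrame.info() prints its summary itself and returns None, so print() also outputs "None None" at the end
print(TM.info(), HMC.info())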
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       252 non-null    float64
 1   High       252 non-null    float64
 2   Low        252 non-null    float64
 3   Close      252 non-null    float64
 4   Adj Close  252 non-null    float64
 5   Volume     252 non-null    int64  
dtypes: float64(5), int64(1)
memory usage: 13.8 KB
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       252 non-null    float64
 1   High       252 non-null    float64
 2   Low        252 non-null    float64
 3   Close      252 non-null    float64
 4   Adj Close  252 non-null    float64
 5   Volume     252 non-null    int64  
dtypes: float64(5), int64(1)
memory usage: 13.8 KB
None None
tm_hmc = pd.DataFrame({'TM':TM['Adj Close'],'HMC':HMC['Adj Close']})
tech_stock_th = pd.concat([tech_stock_close, tm_hmc], axis=1)
tech_stock_th.sample(4)
GOOG AAPL META AMZN NFLX TSLA TM HMC
Date
2023-11-20 137.919998 190.947021 339.609680 146.130005 474.470001 235.600006 187.679993 31.580000
2023-10-17 140.990005 176.452118 323.656586 131.470001 355.720001 254.850006 178.220001 33.770000
2023-10-11 141.699997 179.091675 327.472565 131.830002 365.929993 262.989990 178.029999 33.810001
2024-05-03 168.990005 183.131607 451.959991 186.210007 579.340027 181.190002 232.869995 34.590000
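
pd.concat(..., axis=1) places the DataFrames side by side and aligns their rows on the index, here the trading date. The sketch below uses made-up values to show what happens when one DataFrame is missing a date: the row is kept and the missing cell becomes NaN.

```python
import pandas as pd

# Made-up values for illustration only; "right" has no row for 2024-01-03
left = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
right = pd.DataFrame(
    {"B": [10.0, 30.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-04"]),
)

# Rows are matched by date, not by position
combined = pd.concat([left, right], axis=1)
print(combined)

# Keep only the dates on which every column has a value
complete = combined.dropna()
print(len(combined), len(complete))  # 3 2
```

In the data above, all eight series have 252 rows for the same period, so no such gaps are expected there.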
# Save the data
os.makedirs('./data', exist_ok=True)
tech_stock_close.to_csv('./data/tech_stock_close.csv')
tech_stock_close.to_pickle('./data/tech_stock_close.pkl')
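
The saved files can be loaded again in a later session. The sketch below shows a CSV round trip with made-up values written to a temporary directory. Passing index_col="Date" together with parse_dates=True is one way to restore the date index. The column values and the temporary path are placeholders.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up values for illustration only
df = pd.DataFrame(
    {"GOOG": [1.5, 2.25], "AAPL": [3.0, 4.5]},
    index=pd.to_datetime(["2024-05-16", "2024-05-17"]),
)
df.index.name = "Date"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tech_stock_close.csv")
    df.to_csv(path)
    # Restore the Date column as a datetime index
    loaded = pd.read_csv(path, index_col="Date", parse_dates=True)

print(loaded)
```

The pickle file written above can be read back with pd.read_pickle('./data/tech_stock_close.pkl'), which preserves the dtypes and index without extra arguments.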

Show the correlation matrix of the six tech stocks and the two automakers.

tech_stock_th.corr()
GOOG AAPL META AMZN NFLX TSLA TM HMC
GOOG 1.000000 -0.165046 0.777365 0.850440 0.736902 -0.656292 0.765138 0.655115
AAPL -0.165046 1.000000 -0.244836 -0.153824 -0.094929 0.497660 -0.294416 -0.447050
META 0.777365 -0.244836 1.000000 0.962847 0.952720 -0.765094 0.961607 0.814911
AMZN 0.850440 -0.153824 0.962847 1.000000 0.956115 -0.730409 0.939077 0.749034
NFLX 0.736902 -0.094929 0.952720 0.956115 1.000000 -0.725121 0.900644 0.694955
TSLA -0.656292 0.497660 -0.765094 -0.730409 -0.725121 1.000000 -0.712842 -0.545477
TM 0.765138 -0.294416 0.961607 0.939077 0.900644 -0.712842 1.000000 0.890915
HMC 0.655115 -0.447050 0.814911 0.749034 0.694955 -0.545477 0.890915 1.000000

9.2.2.3. Reference: Showing Correlations with a Heatmap

When there are many variables, a heatmap makes the correlations easier to grasp visually.

def CorrMtx(df, dropDuplicates = True):
    """Draw an annotated heatmap of the correlation matrix df."""

    # Hide the cells above the diagonal, which duplicate those below it
    mask = None
    if dropDuplicates:
        mask = np.zeros_like(df, dtype=bool)
        mask[np.triu_indices_from(mask, 1)] = True

    sns.set_style(style = 'white')

    fig, ax = plt.subplots(figsize=(7, 7))

    # Diverging palette: blue for negative, red for positive correlations
    cmap = sns.diverging_palette(250, 10, as_cmap=True)

    sns.heatmap(df, mask=mask, vmin=-1, vmax=1, annot=True, cmap=cmap)


CorrMtx(tech_stock_th.corr(), dropDuplicates = True)
/opt/anaconda3/lib/python3.11/site-packages/seaborn/matrix.py:260: FutureWarning: Format strings passed to MaskedConstant are ignored, but in future may error or produce different behavior
  annotation = ("{:" + self.fmt + "}").format(val)
../../_images/4a87e9c0d123fe0f5857475a4a6243df4ad6699f4beed7d18719356391223711.png
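
The mask passed to sns.heatmap in CorrMtx is a boolean array in which True means "do not draw this cell". The sketch below builds the same kind of mask for a 3 x 3 matrix so that the pattern is easy to see. The size 3 is chosen only for illustration.

```python
import numpy as np

n = 3  # illustrative size; the heatmap above uses an 8 x 8 matrix
mask = np.zeros((n, n), dtype=bool)

# Mark the cells strictly above the diagonal (offset 1 excludes the diagonal itself)
mask[np.triu_indices_from(mask, 1)] = True
print(mask)
# [[False  True  True]
#  [False False  True]
#  [False False False]]
```

Because a correlation matrix is symmetric, hiding the upper triangle removes only duplicated values. The diagonal of ones is still drawn.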

Show a scatter plot of Apple and Toyota.

sns.jointplot(x='AAPL', y='TM', data=tech_stock_th, color="purple", alpha = 0.5)
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
../../_images/96f2ac7a43161eeea2942c59f33b8bef38a061b871c13ab9ed541fb0b0b27bda.png

Show a scatter plot of Honda and Toyota.

sns.jointplot(x='HMC', y='TM', data=tech_stock_th, color="orange", alpha = 0.5)
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
../../_images/d8c4333cf5b6fff84184f1fb913e055b994e30593f897dfed344ffa848b23eda.png
sns.pairplot(tech_stock_th, plot_kws=dict(color = 'blue', edgecolor='b', alpha = 0.2))
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
../../_images/77c4e39b968e175923317c1ba322454d28b75a0a29c908651335b288cc87a29d.png

9.2.2.4. Reference: Drawing a Candlestick Chart

Note

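Below, we use the cufflinks package for visualization. If you have already installed the required packages using requirements.txt, proceed to the next cell. For installation instructions, see the required-packages page.

Otherwise, run pip install cufflinks in a terminal to install cufflinks before running the next cell.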

import cufflinks as cf
cf.set_config_file(offline=True)

qf = cf.QuantFig(AAPL, legend='top', title = 'Apple Candle Chart')
qf.iplot()
qf = cf.QuantFig(AAPL, legend='top', title = 'Apple Candle Chart')
qf.add_volume()  # Also plot the trading volume
qf.add_sma([10,50],width=2, color=['red', 'green'])  # Also plot the 10- and 50-day simple moving averages
qf.iplot()
/opt/anaconda3/lib/python3.11/site-packages/cufflinks/quant_figure.py:1061: FutureWarning:

Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`