9.2. Visualization Exercise Using Yahoo! FINANCE Data
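From here on, we visualize correlations using stock price data from the US Yahoo! FINANCE site.
We use daily stock prices over the most recent year up to today for the tech stocks (the tech_stock list below) and for Japanese companies.
We collect the Yahoo! FINANCE data in Python with the package yfinance.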
Warning
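The yfinance package used here is an open-source tool that uses Yahoo's publicly available APIs and is intended for research and educational purposes. For details on your rights to use the actual data you download, refer to Yahoo's terms of use (Yahoo Developer API Terms of Use; Yahoo Terms of Service; Yahoo Terms).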
To try companies other than the ones introduced here, look up each company's ticker symbol on the Yahoo! FINANCE website and substitute it.
Warning
Install the required packages. If you have already installed the packages using requirements.txt, go straight to the next cell.
If not, install the packages by following this page, or try the steps below.
On Windows, launch Anaconda Prompt from the Windows menu; on Mac, launch Terminal. Enter the following command and press Enter:
conda install git
After a while, Proceed ([y]/n)? is displayed; type y and press Enter to continue. Next, enter the following command and press Enter to install pandas-datareader:
pip install git+https://github.com/pydata/pandas-datareader
Finally, enter the following command in the terminal and press Enter to install yfinance:
pip install yfinance --upgrade --no-cache-dir
from datetime import datetime
import os
import pandas as pd
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import numpy as np
import yfinance as yf
end = datetime.now()
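start = end - pd.DateOffset(years=1)  # one year back; also valid when today is February 29
yf.pdr_override()
tech_stock = ['GOOG', 'AAPL', 'META', 'AMZN', 'NFLX', 'TSLA']
for company in tech_stock:
    # Store each DataFrame in a global variable named after its ticker (GOOG, AAPL, ...)
    globals()[company] = pdr.get_data_yahoo(tickers=company, start=start, end=end)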
[*********************100%%**********************] 1 of 1 completed
[*********************100%%**********************] 1 of 1 completed
[*********************100%%**********************] 1 of 1 completed
[*********************100%%**********************] 1 of 1 completed
[*********************100%%**********************] 1 of 1 completed
[*********************100%%**********************] 1 of 1 completed
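A naive start date such as datetime(end.year-1, end.month, end.day) raises a ValueError if the notebook happens to run on February 29, because the previous year has no such day. The following is a minimal sketch of a leap-day-safe alternative, assuming pandas is available: pd.DateOffset(years=1) clips to the last valid day of the month. The helper name one_year_before is hypothetical.

```python
from datetime import datetime

import pandas as pd


def one_year_before(end):
    """Return the date one year before `end` (hypothetical helper).

    pd.DateOffset clips invalid dates, so February 29 maps to February 28.
    """
    return end - pd.DateOffset(years=1)


leap_day = datetime(2024, 2, 29)

# The naive construction fails because 2023-02-29 does not exist
try:
    datetime(leap_day.year - 1, leap_day.month, leap_day.day)
    naive_failed = False
except ValueError:
    naive_failed = True

safe_start = one_year_before(leap_day)
print(naive_failed, safe_start)  # True 2023-02-28 00:00:00
```

On any other day the result is simply the same calendar date one year earlier.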
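Open: opening price
High: highest price of the day
Low: lowest price of the day
Close: closing price
Volume: trading volume (number of shares traded during the day)
Adj Close: adjusted closing price (closing price adjusted for stock splits and dividends)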
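These columns obey simple invariants: on every row the high is at least the open and the close, and the low is at most both. The following is a minimal sketch of a sanity check on such a DataFrame, assuming the column names listed above; the function name check_ohlc and the sample values are hypothetical.

```python
import pandas as pd


def check_ohlc(df):
    """Return the rows that violate basic OHLC invariants (hypothetical helper)."""
    body_high = df[["Open", "Close"]].max(axis=1)
    body_low = df[["Open", "Close"]].min(axis=1)
    bad = (df["High"] < body_high) | (df["Low"] > body_low) | (df["Volume"] < 0)
    return df[bad]


# Tiny hand-made example; the second row is deliberately inconsistent
sample = pd.DataFrame(
    {
        "Open": [100.0, 101.0],
        "High": [102.0, 100.5],  # 100.5 is below that day's Open of 101.0
        "Low": [99.0, 99.5],
        "Close": [101.0, 100.0],
        "Volume": [1000, 1200],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)
violations = check_ohlc(sample)
print(violations.index.strftime("%Y-%m-%d").tolist())  # ['2024-01-03']
```

Applied to a downloaded frame such as GOOG, an empty result means no row breaks these invariants.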
GOOG.describe()
| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| count | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 2.520000e+02 |
| mean | 138.559992 | 140.053734 | 137.347151 | 138.762500 | 138.762500 | 2.260032e+07 |
| std | 12.946310 | 13.035053 | 12.901735 | 13.092495 | 13.092495 | 8.305644e+06 |
| min | 116.760002 | 118.224998 | 115.830002 | 116.870003 | 116.870003 | 8.828600e+06 |
| 25% | 129.822498 | 131.403748 | 128.881252 | 130.079994 | 130.079994 | 1.761085e+07 |
| 50% | 137.032501 | 138.620003 | 135.899994 | 137.275002 | 137.275002 | 2.043080e+07 |
| 75% | 144.568756 | 145.868748 | 143.352505 | 144.487495 | 144.487495 | 2.455272e+07 |
| max | 175.990005 | 177.490005 | 174.979996 | 177.380005 | 177.380005 | 5.879610e+07 |
GOOG.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 252 non-null float64
1 High 252 non-null float64
2 Low 252 non-null float64
3 Close 252 non-null float64
4 Adj Close 252 non-null float64
5 Volume 252 non-null int64
dtypes: float64(5), int64(1)
memory usage: 13.8 KB
# Plot Google's adjusted closing price
GOOG['Adj Close'].plot(legend=True, figsize=(10,4))
plt.title("Google Adjusted Closing Price", fontsize=15)
plt.ylabel('price (USD)')
plt.grid()
plt.show()
9.2.1. Moving Average
The mean of a time series over a fixed-length window, recomputed as the window slides along the series.
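For a window of n days, the moving average at day t is the mean of the prices from day t-n+1 through day t, so the first n-1 values are undefined (NaN). The following is a minimal sketch with made-up prices that compares a hand-written loop against pandas' rolling(n).mean(); the helper name simple_moving_average is hypothetical.

```python
import numpy as np
import pandas as pd


def simple_moving_average(values, n):
    """Mean over a sliding window of length n; NaN until the window is full."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    for t in range(n - 1, len(values)):
        out[t] = values[t - n + 1 : t + 1].mean()
    return out


prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
manual = simple_moving_average(prices, 3)
by_pandas = prices.rolling(3).mean()
print(manual)  # [nan nan 11. 12. 13. 14.]
```

The loop is only for illustration; rolling() is the idiomatic and faster choice on real data.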
# Create new columns MA_10, MA_20, MA_30 holding the 10-, 20-, and 30-day moving averages
ma_day = [10, 20, 30]
for ma in ma_day:
    for company in [GOOG, AAPL, META, NFLX, AMZN]:
        # rolling(n).mean() computes the n-day moving average
        company['MA_{}'.format(ma)] = company['Adj Close'].rolling(ma).mean()
AAPL.head(3)
| Date | Open | High | Low | Close | Adj Close | Volume | MA_10 | MA_20 | MA_30 |
|---|---|---|---|---|---|---|---|---|---|
| 2023-05-18 | 173.000000 | 175.240005 | 172.580002 | 175.050003 | 174.125275 | 65496700 | NaN | NaN | NaN |
| 2023-05-19 | 176.389999 | 176.389999 | 174.940002 | 175.160004 | 174.234695 | 55772400 | NaN | NaN | NaN |
| 2023-05-22 | 173.979996 | 174.710007 | 173.449997 | 174.199997 | 173.279739 | 43570900 | NaN | NaN | NaN |
AAPL[['Adj Close','MA_10', 'MA_20','MA_30']].plot(subplots=False, figsize=(10,5))
plt.title('Moving Average (10 days, 20 days, 30 days windows)')
plt.ylabel('price (USD)')
plt.grid()
plt.show()
9.2.2. Reference: Computing the Daily Percentage Change in Stock Prices
for company in [GOOG, AAPL, META, AMZN, NFLX, TSLA]:
    company['returns'] = company['Adj Close'].pct_change()
colors = ['orange', 'black', 'blue', 'red', 'yellow', 'green']
plt.figure(figsize=(8,5))
# Overlay one semi-transparent histogram of daily returns per company
for company, name, color in zip([GOOG, AAPL, META, AMZN, NFLX, TSLA], tech_stock, colors):
    plt.hist(company['returns'].dropna(), bins=100, color=color, alpha=0.2, label=name)
plt.legend()
plt.xlabel('Percentage change')
plt.ylabel('Frequency')
plt.grid(axis='x')
plt.show()
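As the histograms of the six companies' daily changes above show, for all six companies the change from the previous day stays within ??% on most days. Note that pct_change() returns fractions, so 0.02 on the horizontal axis means a 2% change.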
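To put a number on "most days", one can count the share of days whose absolute daily return stays within a threshold. Because pct_change() returns fractions, a 2% threshold is written as 0.02. The following is a minimal sketch on synthetic returns; the 2% threshold, the helper name share_within, and the random data are illustrative assumptions, not results for the six companies.

```python
import numpy as np
import pandas as pd


def share_within(returns, threshold):
    """Fraction of non-missing daily returns with |return| <= threshold."""
    r = returns.dropna()
    return float((r.abs() <= threshold).mean())


# Synthetic prices built from random daily returns (illustrative only, not real stock data)
rng = np.random.default_rng(0)
fake_prices = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.015, 250)))
fake_returns = fake_prices.pct_change()

print(f"{share_within(fake_returns, 0.02):.1%} of days within ±2%")
```

Calling share_within(GOOG['returns'], 0.02) would apply the same count to real data.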
Create a DataFrame holding the adjusted closing prices of the six companies.
tech_stock_close = pd.DataFrame({'GOOG':GOOG['Adj Close'],
'AAPL':AAPL['Adj Close'],
'META': META['Adj Close'],
'AMZN': AMZN['Adj Close'],
'NFLX': NFLX['Adj Close'],
'TSLA': TSLA['Adj Close']})
tech_stock_close.describe()
| | GOOG | AAPL | META | AMZN | NFLX | TSLA |
|---|---|---|---|---|---|---|
| count | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 |
| mean | 138.762500 | 181.291907 | 363.562652 | 149.101488 | 483.176001 | 221.755397 |
| std | 13.092495 | 8.370487 | 83.577548 | 21.289901 | 83.789208 | 36.301131 |
| min | 116.870003 | 164.776505 | 245.379654 | 114.989998 | 346.190002 | 142.050003 |
| 25% | 130.079994 | 174.119308 | 299.317421 | 130.202499 | 421.292503 | 187.507500 |
| 50% | 137.275002 | 181.373085 | 325.689438 | 144.545006 | 453.830002 | 232.120003 |
| 75% | 144.487495 | 188.820126 | 466.163452 | 171.847500 | 564.242493 | 252.699993 |
| max | 177.380005 | 197.589523 | 527.340027 | 189.500000 | 636.179993 | 293.339996 |
tech_stock_close.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 GOOG 252 non-null float64
1 AAPL 252 non-null float64
2 META 252 non-null float64
3 AMZN 252 non-null float64
4 NFLX 252 non-null float64
5 TSLA 252 non-null float64
dtypes: float64(6)
memory usage: 13.8 KB
9.2.2.1. Google and Apple
Compute the correlation coefficient between the adjusted closing prices of Google and Apple over the past year.
tech_stock_close['GOOG'].corr(tech_stock_close['AAPL'])
-0.16504613915410987
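By default Series.corr computes the Pearson correlation coefficient, r = Σ(x_i - x̄)(y_i - ȳ) / sqrt(Σ(x_i - x̄)² · Σ(y_i - ȳ)²). The following is a minimal sketch that evaluates this formula by hand with NumPy on made-up numbers and checks it against pandas; the helper name pearson is hypothetical.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson correlation coefficient computed from deviations from the mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())


a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
b = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])
print(round(pearson(a, b), 4), round(a.corr(b), 4))  # 0.7746 0.7746
```

Here the deviation products sum to 6, Σdx² = 10 and Σdy² = 6, so r = 6 / sqrt(60).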
Show the correlation matrix of Google's and Apple's adjusted closing prices over the past year.
tech_stock_close[['GOOG','AAPL']].corr()
| | GOOG | AAPL |
|---|---|---|
| GOOG | 1.000000 | -0.165046 |
| AAPL | -0.165046 | 1.000000 |
Show a scatter plot of Google's and Apple's adjusted closing prices over the past year.
plt.figure(figsize=(5,5))
plt.scatter(tech_stock_close['GOOG'],tech_stock_close['AAPL'],color ='y',alpha=0.5)
plt.xlabel('Google')
plt.ylabel('Apple')
plt.title("Closing prices of Google and Apple")
plt.show()
9.2.2.2. Showing Histograms and a Scatter Plot in One Figure
import seaborn as sns
sns.jointplot(data=tech_stock_close, x='GOOG', y='TSLA')
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
sns.pairplot(tech_stock_close, plot_kws=dict(color = 'b', edgecolor='b', alpha = 0.2))
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
Show the correlation matrix of the six companies.
tech_stock_close.corr()
| | GOOG | AAPL | META | AMZN | NFLX | TSLA |
|---|---|---|---|---|---|---|
| GOOG | 1.000000 | -0.165046 | 0.777365 | 0.850440 | 0.736902 | -0.656292 |
| AAPL | -0.165046 | 1.000000 | -0.244836 | -0.153824 | -0.094929 | 0.497660 |
| META | 0.777365 | -0.244836 | 1.000000 | 0.962847 | 0.952720 | -0.765094 |
| AMZN | 0.850440 | -0.153824 | 0.962847 | 1.000000 | 0.956115 | -0.730409 |
| NFLX | 0.736902 | -0.094929 | 0.952720 | 0.956115 | 1.000000 | -0.725121 |
| TSLA | -0.656292 | 0.497660 | -0.765094 | -0.730409 | -0.725121 | 1.000000 |
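With many companies it can help to list the pairs in order of correlation. The following is a minimal sketch that reads only the cells strictly above the diagonal of a correlation matrix (so each pair appears once) and sorts them. The helper name ranked_pairs is hypothetical, and the small example matrix reuses three entries from the table above (GOOG, META, TSLA).

```python
import numpy as np
import pandas as pd


def ranked_pairs(corr):
    """List each off-diagonal pair once, sorted from highest to lowest correlation."""
    values = corr.to_numpy()
    rows, cols = np.triu_indices_from(values, k=1)  # cells strictly above the diagonal
    index = pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]])
    return pd.Series(values[rows, cols], index=index).sort_values(ascending=False)


names = ["GOOG", "META", "TSLA"]
corr = pd.DataFrame(
    [[1.000000, 0.777365, -0.656292],
     [0.777365, 1.000000, -0.765094],
     [-0.656292, -0.765094, 1.000000]],
    index=names,
    columns=names,
)
print(ranked_pairs(corr))
```

Passing tech_stock_close.corr() instead would rank all fifteen pairs of the six companies.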
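9.2.2.2.1. Reference
Next, we also collect the past year of stock prices for the Japanese automakers Toyota Motor Corporation and Honda Motor Co., Ltd., using their US-listed tickers TM and HMC (prices in USD).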
end = datetime.now()
start = end - pd.DateOffset(years=1)  # one year back; also valid when today is February 29
yf.pdr_override()
vehicles = ['TM', 'HMC'] # TM : Toyota Motor Corporation, HMC: Honda Motor Co., Ltd.
for company in vehicles:
    globals()[company] = pdr.get_data_yahoo(tickers=company, start=start, end=end)
[*********************100%%**********************] 1 of 1 completed
[*********************100%%**********************] 1 of 1 completed
TM.info()
HMC.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 252 non-null float64
1 High 252 non-null float64
2 Low 252 non-null float64
3 Close 252 non-null float64
4 Adj Close 252 non-null float64
5 Volume 252 non-null int64
dtypes: float64(5), int64(1)
memory usage: 13.8 KB
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2023-05-18 to 2024-05-17
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 252 non-null float64
1 High 252 non-null float64
2 Low 252 non-null float64
3 Close 252 non-null float64
4 Adj Close 252 non-null float64
5 Volume 252 non-null int64
dtypes: float64(5), int64(1)
memory usage: 13.8 KB
tm_hmc = pd.DataFrame({'TM':TM['Adj Close'],'HMC':HMC['Adj Close']})
tech_stock_th = pd.concat([tech_stock_close, tm_hmc], axis=1)
tech_stock_th.sample(4)
| Date | GOOG | AAPL | META | AMZN | NFLX | TSLA | TM | HMC |
|---|---|---|---|---|---|---|---|---|
| 2023-11-20 | 137.919998 | 190.947021 | 339.609680 | 146.130005 | 474.470001 | 235.600006 | 187.679993 | 31.580000 |
| 2023-10-17 | 140.990005 | 176.452118 | 323.656586 | 131.470001 | 355.720001 | 254.850006 | 178.220001 | 33.770000 |
| 2023-10-11 | 141.699997 | 179.091675 | 327.472565 | 131.830002 | 365.929993 | 262.989990 | 178.029999 | 33.810001 |
| 2024-05-03 | 168.990005 | 183.131607 | 451.959991 | 186.210007 | 579.340027 | 181.190002 | 232.869995 | 34.590000 |
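pd.concat(..., axis=1) aligns the frames on their date index, so a date present in only one of them (for example, a trading day missing from one download) yields NaN in the other columns. The following is a minimal sketch with hypothetical prices. Dropping incomplete rows with dropna() is one option; DataFrame.corr itself already excludes missing values pairwise.

```python
import pandas as pd

# Hypothetical prices; 2024-01-03 is missing from the second frame
left = pd.DataFrame(
    {"A": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
right = pd.DataFrame(
    {"B": [20.0, 21.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-04"]),
)

combined = pd.concat([left, right], axis=1)  # outer join on the DatetimeIndex
print(combined)
complete = combined.dropna()  # keep only dates present in both frames
print(len(combined), len(complete))  # 3 2
```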
# Save the data
os.makedirs('./data', exist_ok=True)
tech_stock_close.to_csv('./data/tech_stock_close.csv')
tech_stock_close.to_pickle('./data/tech_stock_close.pkl')
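When the saved files are loaded again, the pickle restores the DataFrame as it was, whereas the CSV needs index_col=0 and parse_dates=True to turn the first column back into a DatetimeIndex. The following is a minimal sketch that uses a temporary directory and a small hypothetical frame in place of ./data and tech_stock_close.

```python
import os
import tempfile

import pandas as pd

# Small hypothetical frame standing in for tech_stock_close
frame = pd.DataFrame(
    {"GOOG": [140.0, 141.5], "AAPL": [180.0, 179.2]},
    index=pd.DatetimeIndex(["2024-01-02", "2024-01-03"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "close.csv")
    pkl_path = os.path.join(tmp, "close.pkl")
    frame.to_csv(csv_path)
    frame.to_pickle(pkl_path)

    # Without index_col/parse_dates the dates would come back as a plain string column
    from_csv = pd.read_csv(csv_path, index_col=0, parse_dates=True)
    from_pkl = pd.read_pickle(pkl_path)

print(type(from_csv.index).__name__)  # DatetimeIndex
```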
Show the correlation matrix of the six tech companies and the two automakers.
tech_stock_th.corr()
| | GOOG | AAPL | META | AMZN | NFLX | TSLA | TM | HMC |
|---|---|---|---|---|---|---|---|---|
| GOOG | 1.000000 | -0.165046 | 0.777365 | 0.850440 | 0.736902 | -0.656292 | 0.765138 | 0.655115 |
| AAPL | -0.165046 | 1.000000 | -0.244836 | -0.153824 | -0.094929 | 0.497660 | -0.294416 | -0.447050 |
| META | 0.777365 | -0.244836 | 1.000000 | 0.962847 | 0.952720 | -0.765094 | 0.961607 | 0.814911 |
| AMZN | 0.850440 | -0.153824 | 0.962847 | 1.000000 | 0.956115 | -0.730409 | 0.939077 | 0.749034 |
| NFLX | 0.736902 | -0.094929 | 0.952720 | 0.956115 | 1.000000 | -0.725121 | 0.900644 | 0.694955 |
| TSLA | -0.656292 | 0.497660 | -0.765094 | -0.730409 | -0.725121 | 1.000000 | -0.712842 | -0.545477 |
| TM | 0.765138 | -0.294416 | 0.961607 | 0.939077 | 0.900644 | -0.712842 | 1.000000 | 0.890915 |
| HMC | 0.655115 | -0.447050 | 0.814911 | 0.749034 | 0.694955 | -0.545477 | 0.890915 | 1.000000 |
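The correlation matrices above are computed on price levels. Two price series that merely share an upward trend over the year can show a correlation close to 1 even when their day-to-day moves are unrelated, so it is often informative to also correlate daily returns (pct_change()). The following is a minimal sketch with two synthetic series that share only a linear trend; the trend slope, noise level, and seed are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
days = np.arange(250)
trend = 100 + 0.5 * days  # common upward trend: the only thing the two series share

# Independent day-to-day noise for each synthetic "stock"
prices = pd.DataFrame(
    {
        "X": trend + rng.normal(0, 1.0, days.size),
        "Y": trend + rng.normal(0, 1.0, days.size),
    }
)

level_corr = prices["X"].corr(prices["Y"])
returns = prices.pct_change()  # first row is NaN and is ignored by corr()
return_corr = returns["X"].corr(returns["Y"])
print(f"price levels: {level_corr:.3f}, daily returns: {return_corr:.3f}")
```

The same comparison on real data would be tech_stock_th.pct_change().corr() next to tech_stock_th.corr().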
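9.2.2.3. Reference: Showing Correlations with a Heatmap
When there are many variables, a heatmap makes the correlations easier to read at a glance.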
def CorrMtx(df, dropDuplicates = True):
    # A correlation matrix is symmetric, so optionally hide the upper triangle
    # (cells strictly above the diagonal) to show each pair only once
    mask = None
    if dropDuplicates:
        mask = np.zeros_like(df, dtype=bool)
        mask[np.triu_indices_from(mask, 1)] = True
    sns.set_style(style='white')
    fig, ax = plt.subplots(figsize=(7, 7))
    # Diverging palette: blue hues for negative, red hues for positive correlations
    cmap = sns.diverging_palette(250, 10, as_cmap=True)
    sns.heatmap(df, mask=mask, vmin=-1, vmax=1, annot=True, cmap=cmap, ax=ax)
CorrMtx(tech_stock_th.corr(), dropDuplicates = True)
/opt/anaconda3/lib/python3.11/site-packages/seaborn/matrix.py:260: FutureWarning: Format strings passed to MaskedConstant are ignored, but in future may error or produce different behavior
annotation = ("{:" + self.fmt + "}").format(val)
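In CorrMtx, np.triu_indices_from(mask, 1) returns the row and column indices strictly above the diagonal (the offset 1 excludes the diagonal itself). Setting those cells to True tells sns.heatmap to hide them, which is enough because a correlation matrix is symmetric. The following minimal sketch prints the mask for a 3×3 matrix.

```python
import numpy as np

mask = np.zeros((3, 3), dtype=bool)
rows, cols = np.triu_indices_from(mask, 1)  # indices strictly above the diagonal
mask[rows, cols] = True
print(mask)
# [[False  True  True]
#  [False False  True]
#  [False False False]]
```

With an offset of 0 instead of 1, the diagonal of 1.00 values would be hidden as well.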
Show a joint plot (a scatter plot with marginal histograms) of Apple and Toyota.
sns.jointplot(x='AAPL', y='TM', data=tech_stock_th, color="purple", alpha = 0.5)
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
Show a joint plot of Honda and Toyota.
sns.jointplot(x='HMC', y='TM', data=tech_stock_th, color="orange", alpha = 0.5)
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
sns.pairplot(tech_stock_th, plot_kws=dict(color = 'blue', edgecolor='b', alpha = 0.2))
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
9.2.2.4. Reference: Drawing a Candlestick Chart
Note
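The following visualizations use the package cufflinks.
If you have already installed the required packages using requirements.txt, go on to the next cell. For installation instructions, see the required-packages page.
Otherwise, run the following command in a terminal to install cufflinks, then run the next cell:
pip install cufflinks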
import cufflinks as cf
cf.set_config_file(offline=True)
qf = cf.QuantFig(AAPL, legend='top', title = 'Apple Candle Chart')
qf.iplot()
qf = cf.QuantFig(AAPL, legend='top', title = 'Apple Candle Chart')
qf.add_volume()  # also plot the trading volume
qf.add_sma([10,50], width=2, color=['red', 'green'])  # also plot the 10- and 50-day simple moving averages
qf.iplot()
/opt/anaconda3/lib/python3.11/site-packages/cufflinks/quant_figure.py:1061: FutureWarning:
Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
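Each candle encodes the open, high, low and close of one period. If daily candles are too dense, daily OHLC data can be aggregated into weekly candles before plotting: the first open, the highest high, the lowest low, the last close, and the summed volume of each week. The following is a minimal sketch with hypothetical daily data; the weekly rule "W-FRI" (weeks ending on Friday) is an assumption, and the result keeps the same Open/High/Low/Close/Volume column layout as the daily frame.

```python
import pandas as pd

# Hypothetical daily OHLCV data for two trading weeks (Mon 2024-01-08 to Fri 2024-01-19)
idx = pd.bdate_range("2024-01-08", "2024-01-19")
daily = pd.DataFrame(
    {
        "Open": [10, 11, 12, 11, 13, 14, 13, 15, 16, 15],
        "High": [12, 13, 13, 13, 15, 15, 16, 17, 17, 17],
        "Low": [9, 10, 10, 10, 12, 12, 12, 14, 14, 14],
        "Close": [11, 12, 11, 12, 14, 13, 15, 16, 15, 16],
        "Volume": [100] * 10,
    },
    index=idx,
)

# One row per week, labelled by the Friday that ends the week
weekly = daily.resample("W-FRI").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(weekly)
```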