{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# ファイルの読み書き\n",
"\n",
"## 絶対パスと相対パス\n",
"\n",
"一般的なファイル構造\n",
"\n",
"Mac/Linuxの場合\n",
"\n",
"
\n",
"\n",
"Winodowsの場合\n",
"\n",
"
\n",
"\n",
"### 絶対パス(absolute path)\n",
"- ルートからの道順(パス)を指定する方法\n",
"- 厳密なルートで間違いが少ない\n",
"- 長くなりがち\n",
"\n",
"* Windowsの例\n",
" * C:¥Users¥username¥Desktop¥Folder1¥test.txt\n",
"* Mac/Linuxの例\n",
" * /Users/username/folder1/test.txt\n",
"\n",
"### 相対パス(relative path)\n",
"- 基準となるディレクトリ(カレントディレクトリ)からの道順(パス)を指定する方法\n",
"\n",
"`..` で1階層上のディレクトリ、`.`で同じディレクトリ\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"現在の場所(カレントディレクトリ)は、`%pwd`で確認できます。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%pwd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"現在の場所(カレントディレクトリ)にあるファイルをリストアップするためには`%ls`で確認できます。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%ls"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## pandasを用いたcsvの読み書き"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" num | \n",
" datetime | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 100.0 | \n",
" 5 | \n",
" 2018-06-01 00:00:00 | \n",
"
\n",
" \n",
" 1 | \n",
" 40.0 | \n",
" 2 | \n",
" 2018-06-27 03:00:00 | \n",
"
\n",
" \n",
" 2 | \n",
" 300.0 | \n",
" 1 | \n",
" 2018-07-23 06:00:00 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" 0 | \n",
" 2018-08-18 09:00:00 | \n",
"
\n",
" \n",
" 4 | \n",
" 500.0 | \n",
" 4 | \n",
" 2018-09-13 12:00:00 | \n",
"
\n",
" \n",
" 5 | \n",
" 1000.0 | \n",
" 200 | \n",
" 2018-10-09 15:00:00 | \n",
"
\n",
" \n",
" 6 | \n",
" 300.0 | \n",
" 7 | \n",
" 2018-11-04 18:00:00 | \n",
"
\n",
" \n",
" 7 | \n",
" 400.0 | \n",
" 19 | \n",
" 2018-11-30 21:00:00 | \n",
"
\n",
" \n",
" 8 | \n",
" 240.0 | \n",
" 20 | \n",
" 2018-12-27 00:00:00 | \n",
"
\n",
" \n",
" 9 | \n",
" 3000.0 | \n",
" 100 | \n",
" 2019-01-22 03:00:00 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price num datetime\n",
"0 100.0 5 2018-06-01 00:00:00\n",
"1 40.0 2 2018-06-27 03:00:00\n",
"2 300.0 1 2018-07-23 06:00:00\n",
"3 NaN 0 2018-08-18 09:00:00\n",
"4 500.0 4 2018-09-13 12:00:00\n",
"5 1000.0 200 2018-10-09 15:00:00\n",
"6 300.0 7 2018-11-04 18:00:00\n",
"7 400.0 19 2018-11-30 21:00:00\n",
"8 240.0 20 2018-12-27 00:00:00\n",
"9 3000.0 100 2019-01-22 03:00:00"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"price = [100, 40, 300, np.nan , 500, 1000, 300, 400, 240, 3000]\n",
"num = [5, 2, 1, 0, 4, 200, 7, 19, 20, 100]\n",
"datetimes = pd.date_range('20180601', periods=10, freq= '627H')\n",
"\n",
"df = pd.DataFrame({'price':price, 'num': num, 'datetime': datetimes })\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('./testcsv.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ちゃんと保存されているか、確認してみましょう。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%ls "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### pandasを用いたcsvの読み込み"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"保存したcsvファイルをDataFrameとして読み込む"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" num | \n",
" datetime | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 100.0 | \n",
" 5 | \n",
" 2018-06-01 00:00:00 | \n",
"
\n",
" \n",
" 1 | \n",
" 40.0 | \n",
" 2 | \n",
" 2018-06-27 03:00:00 | \n",
"
\n",
" \n",
" 2 | \n",
" 300.0 | \n",
" 1 | \n",
" 2018-07-23 06:00:00 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" 0 | \n",
" 2018-08-18 09:00:00 | \n",
"
\n",
" \n",
" 4 | \n",
" 500.0 | \n",
" 4 | \n",
" 2018-09-13 12:00:00 | \n",
"
\n",
" \n",
" 5 | \n",
" 1000.0 | \n",
" 200 | \n",
" 2018-10-09 15:00:00 | \n",
"
\n",
" \n",
" 6 | \n",
" 300.0 | \n",
" 7 | \n",
" 2018-11-04 18:00:00 | \n",
"
\n",
" \n",
" 7 | \n",
" 400.0 | \n",
" 19 | \n",
" 2018-11-30 21:00:00 | \n",
"
\n",
" \n",
" 8 | \n",
" 240.0 | \n",
" 20 | \n",
" 2018-12-27 00:00:00 | \n",
"
\n",
" \n",
" 9 | \n",
" 3000.0 | \n",
" 100 | \n",
" 2019-01-22 03:00:00 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price num datetime\n",
"0 100.0 5 2018-06-01 00:00:00\n",
"1 40.0 2 2018-06-27 03:00:00\n",
"2 300.0 1 2018-07-23 06:00:00\n",
"3 NaN 0 2018-08-18 09:00:00\n",
"4 500.0 4 2018-09-13 12:00:00\n",
"5 1000.0 200 2018-10-09 15:00:00\n",
"6 300.0 7 2018-11-04 18:00:00\n",
"7 400.0 19 2018-11-30 21:00:00\n",
"8 240.0 20 2018-12-27 00:00:00\n",
"9 3000.0 100 2019-01-22 03:00:00"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r_df = pd.read_csv('testcsv.csv', index_col = 0)\n",
"r_df"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 参考 pandasを用いた様々なファイル形式の読み書き"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pandas`を用いたcsvの読み書きは以前紹介しましたが、その他の形式のファイルの読み書きもpandasでは行えます。\n",
"\n",
"いくつか代表的なものを紹介します。"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
 `read_csv`">
"> `read_csv`: reads delimiter-separated data<br>\n",
"> `read_excel`: reads data from Excel XLS or XLSX files<br>\n",
"> `read_json`: reads data from a JSON (JavaScript Object Notation) string representation<br>\n",
"> `read_pickle`: reads an object that was written in Python's pickle binary format<br>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 250 entries, 0 to 249\n",
"Data columns (total 7 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Date 250 non-null object \n",
" 1 GOOG 250 non-null float64\n",
" 2 AAPL 250 non-null float64\n",
" 3 META 250 non-null float64\n",
" 4 AMZN 250 non-null float64\n",
" 5 NFLX 250 non-null float64\n",
" 6 TSLA 250 non-null float64\n",
"dtypes: float64(6), object(1)\n",
"memory usage: 13.8+ KB\n"
]
}
],
"source": [
"r_df = pd.read_csv('testcsv.csv', index_col = 0)\n",
"r_df.info()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"r_df.to_pickle('./test.pkl')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"DatetimeIndex: 250 entries, 2022-04-04 to 2023-03-31\n",
"Data columns (total 6 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 GOOG 250 non-null float64\n",
" 1 AAPL 250 non-null float64\n",
" 2 META 250 non-null float64\n",
" 3 AMZN 250 non-null float64\n",
" 4 NFLX 250 non-null float64\n",
" 5 TSLA 250 non-null float64\n",
"dtypes: float64(6)\n",
"memory usage: 13.7 KB\n"
]
}
],
"source": [
"r_df_pkl = pd.read_pickle('./test.pkl')\n",
"r_df_pkl.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## pickleの読み書き"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"new_list = [1, 2, 3, 4, 5, 10, 12, 4, 14]\n",
"\n",
"with open('./new_list.pkl','wb') as f:\n",
" pickle.dump(new_list, f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open('./new_list.pkl','rb') as f:\n",
" new_list_2 = pickle.load(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"type(new_list_2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 10, 12, 4, 14]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"new_list_2"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}