{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ファイルの読み書き\n",
    "\n",
    "## 絶対パスと相対パス\n",
    "\n",
    "一般的なファイル構造\n",
    "\n",
    "Mac/Linuxの場合\n",
    "\n",
    "<img src=\"https://www.yuyashibuya.com/files/studygroup/Day2_path_mac.png\" width =\"80%\" >\n",
    "\n",
    "Winodowsの場合\n",
    "\n",
    "<img src=\"https://www.yuyashibuya.com/files/studygroup/Day2_path_win.png\" width =\"80%\" >\n",
    "\n",
    "### 絶対パス（absolute path）\n",
    "- ルートからの道順(パス)を指定する方法\n",
    "- 厳密なルートで間違いが少ない\n",
    "- 長くなりがち\n",
    "\n",
    "* Windowsの例\n",
    "    * C:¥Users¥username¥Desktop¥Folder1¥test.txt\n",
    "* Mac/Linuxの例\n",
    "    * /Users/username/folder1/test.txt\n",
    "\n",
    "### 相対パス（relative path）\n",
    "- 基準となるディレクトリ（カレントディレクトリ）からの道順(パス)を指定する方法\n",
    "\n",
    "`..` で１階層上のディレクトリ、`.`で同じディレクトリ\n",
    "\n",
    "<img src=\"https://www.yuyashibuya.com/files/studygroup/Day2_absolute_path.png\" width =\"80%\" >"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "現在の場所（カレントディレクトリ）は、`%pwd`で確認できます。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%pwd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "現在の場所（カレントディレクトリ）にあるファイルをリストアップするためには`%ls`で確認できます。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%ls"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## pandasを用いたcsvの読み書き"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>price</th>\n",
       "      <th>num</th>\n",
       "      <th>datetime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>100.0</td>\n",
       "      <td>5</td>\n",
       "      <td>2018-06-01 00:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>40.0</td>\n",
       "      <td>2</td>\n",
       "      <td>2018-06-27 03:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>300.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2018-07-23 06:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>2018-08-18 09:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>500.0</td>\n",
       "      <td>4</td>\n",
       "      <td>2018-09-13 12:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>200</td>\n",
       "      <td>2018-10-09 15:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>300.0</td>\n",
       "      <td>7</td>\n",
       "      <td>2018-11-04 18:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>400.0</td>\n",
       "      <td>19</td>\n",
       "      <td>2018-11-30 21:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>240.0</td>\n",
       "      <td>20</td>\n",
       "      <td>2018-12-27 00:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>3000.0</td>\n",
       "      <td>100</td>\n",
       "      <td>2019-01-22 03:00:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    price  num            datetime\n",
       "0   100.0    5 2018-06-01 00:00:00\n",
       "1    40.0    2 2018-06-27 03:00:00\n",
       "2   300.0    1 2018-07-23 06:00:00\n",
       "3     NaN    0 2018-08-18 09:00:00\n",
       "4   500.0    4 2018-09-13 12:00:00\n",
       "5  1000.0  200 2018-10-09 15:00:00\n",
       "6   300.0    7 2018-11-04 18:00:00\n",
       "7   400.0   19 2018-11-30 21:00:00\n",
       "8   240.0   20 2018-12-27 00:00:00\n",
       "9  3000.0  100 2019-01-22 03:00:00"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "price = [100, 40, 300, np.nan , 500, 1000, 300, 400, 240, 3000]\n",
    "num = [5, 2, 1, 0, 4, 200, 7, 19, 20, 100]\n",
    "datetimes = pd.date_range('20180601', periods=10, freq= '627H')\n",
    "\n",
    "df = pd.DataFrame({'price':price, 'num': num, 'datetime': datetimes })\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('./testcsv.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ちゃんと保存されているか、確認してみましょう。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%ls "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### pandasを用いたcsvの読み込み"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "保存したcsvファイルをDataFrameとして読み込む"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>price</th>\n",
       "      <th>num</th>\n",
       "      <th>datetime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>100.0</td>\n",
       "      <td>5</td>\n",
       "      <td>2018-06-01 00:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>40.0</td>\n",
       "      <td>2</td>\n",
       "      <td>2018-06-27 03:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>300.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2018-07-23 06:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>2018-08-18 09:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>500.0</td>\n",
       "      <td>4</td>\n",
       "      <td>2018-09-13 12:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>200</td>\n",
       "      <td>2018-10-09 15:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>300.0</td>\n",
       "      <td>7</td>\n",
       "      <td>2018-11-04 18:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>400.0</td>\n",
       "      <td>19</td>\n",
       "      <td>2018-11-30 21:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>240.0</td>\n",
       "      <td>20</td>\n",
       "      <td>2018-12-27 00:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>3000.0</td>\n",
       "      <td>100</td>\n",
       "      <td>2019-01-22 03:00:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    price  num             datetime\n",
       "0   100.0    5  2018-06-01 00:00:00\n",
       "1    40.0    2  2018-06-27 03:00:00\n",
       "2   300.0    1  2018-07-23 06:00:00\n",
       "3     NaN    0  2018-08-18 09:00:00\n",
       "4   500.0    4  2018-09-13 12:00:00\n",
       "5  1000.0  200  2018-10-09 15:00:00\n",
       "6   300.0    7  2018-11-04 18:00:00\n",
       "7   400.0   19  2018-11-30 21:00:00\n",
       "8   240.0   20  2018-12-27 00:00:00\n",
       "9  3000.0  100  2019-01-22 03:00:00"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "r_df = pd.read_csv('testcsv.csv', index_col = 0)\n",
    "r_df"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 参考　pandasを用いた様々なファイル形式の読み書き"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pandas`を用いたcsvの読み書きは以前紹介しましたが、その他の形式のファイルの読み書きもpandasでは行えます。\n",
    "\n",
    "いくつか代表的なものを紹介します。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> `read_csv`: 区切り文字で区切られたデータを読み込む<br>\n",
    "> `read_excel`:　ExcelのXLSやXLSXファイルからデータを読み込む<br>\n",
    "> `read_json`:　JSON(JavaScript Object Notation)の文字列表現からデータを読み込む<br>\n",
    "> `read_pickle`:　Pythonのpickleバイナリ形式で書き出されたオブジェクトを読み込む<br>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 250 entries, 0 to 249\n",
      "Data columns (total 7 columns):\n",
      " #   Column  Non-Null Count  Dtype  \n",
      "---  ------  --------------  -----  \n",
      " 0   Date    250 non-null    object \n",
      " 1   GOOG    250 non-null    float64\n",
      " 2   AAPL    250 non-null    float64\n",
      " 3   META    250 non-null    float64\n",
      " 4   AMZN    250 non-null    float64\n",
      " 5   NFLX    250 non-null    float64\n",
      " 6   TSLA    250 non-null    float64\n",
      "dtypes: float64(6), object(1)\n",
      "memory usage: 13.8+ KB\n"
     ]
    }
   ],
   "source": [
    "r_df = pd.read_csv('testcsv.csv', index_col = 0)\n",
    "r_df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r_df.to_pickle('./test.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "DatetimeIndex: 250 entries, 2022-04-04 to 2023-03-31\n",
      "Data columns (total 6 columns):\n",
      " #   Column  Non-Null Count  Dtype  \n",
      "---  ------  --------------  -----  \n",
      " 0   GOOG    250 non-null    float64\n",
      " 1   AAPL    250 non-null    float64\n",
      " 2   META    250 non-null    float64\n",
      " 3   AMZN    250 non-null    float64\n",
      " 4   NFLX    250 non-null    float64\n",
      " 5   TSLA    250 non-null    float64\n",
      "dtypes: float64(6)\n",
      "memory usage: 13.7 KB\n"
     ]
    }
   ],
   "source": [
    "r_df_pkl = pd.read_pickle('./test.pkl')\n",
    "r_df_pkl.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## pickleの読み書き"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "new_list = [1, 2, 3, 4, 5, 10, 12, 4, 14]\n",
    "\n",
    "with open('./new_list.pkl','wb') as f:\n",
    "    pickle.dump(new_list, f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('./new_list.pkl','rb') as f:\n",
    "    new_list_2 = pickle.load(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "list"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "type(new_list_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4, 5, 10, 12, 4, 14]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "new_list_2"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}