EDA of Stock Market using Time Series

Usharbudha Dev
6 min read · Jun 21, 2020

EDA (Exploratory Data Analysis) is an approach to analyzing datasets to summarize their main characteristics, often with visual methods. Here, I've analysed the stocks of four top tech companies to see how Covid-19 has affected them.

A time series is a series of data points indexed in time order, whereas time series forecasting is the use of a model to predict future values based on previously observed values. (Here, I am only visualising the stocks, not predicting their future values.)

I've used Python for the analysis, so let's jump into the code.

Importing the packages required for reading the data from Yahoo Finance and for plotting:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
# Jupyter magic that renders plots inline; remove this line outside a notebook
%matplotlib inline
from pandas_datareader.data import DataReader
from datetime import datetime

I've taken the stocks of four tech companies (Apple, Microsoft, Google and Amazon) for the time series EDA. The stock data comes from Yahoo Finance.

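# The tech stocks I'll be using are Apple, Google, Microsoft and Amazon
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

# One-year window ending today. The charts below were produced with a window
# ending 30 March 2020; set end = datetime(2020, 3, 30) to reproduce them.
end = datetime.now()
# Unlike datetime(end.year - 1, end.month, end.day), this also works on 29 February
start = end - pd.DateOffset(years=1)

# Download each stock into a global variable named after its ticker (AAPL, GOOG, ...).
# Note: the Yahoo reader in pandas_datareader has stopped working for many users
# since Yahoo changed its API; the yfinance package is a commonly used alternative.
for stock in tech_list:
    globals()[stock] = DataReader(stock, 'yahoo', start, end)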

I then made a single DataFrame from the Yahoo Finance data. Its columns are ordered 'Date' (the index), 'High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close', plus a 'company_name' column that I added to identify each stock.
* Volume is the number of shares that changed hands (were traded) during a given day.
* Adj Close, the adjusted closing price used to examine historical returns, is the closing price adjusted for corporate actions such as stock splits and dividends, so prices before and after such events can be compared directly.
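To make the idea of an adjusted price concrete, here is a simplified sketch that handles stock splits only and ignores dividend adjustments. The function `split_adjust` and the prices are hypothetical illustrations, not Yahoo's actual adjustment procedure: every close before an N-for-1 split is divided by N so that it is comparable with the post-split prices.

```python
def split_adjust(closes, split_index, ratio):
    """Hypothetical helper: adjust closing prices for a single N-for-1 split.

    closes      -- closing prices in time order
    split_index -- position of the first trading day after the split
    ratio       -- new shares per old share (4 for a 4-for-1 split)
    """
    # Prices before the split are divided by the ratio; later prices are unchanged
    return [price / ratio if i < split_index else price
            for i, price in enumerate(closes)]


# Hypothetical prices: a 4-for-1 split takes effect at position 2
raw = [400.0, 404.0, 101.5, 102.0]
print(split_adjust(raw, split_index=2, ratio=4))  # [100.0, 101.0, 101.5, 102.0]
```

Without the adjustment, the drop from 404 to 101.5 would look like a 75% crash. After the adjustment, the series shows the small day-to-day moves that actually happened.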

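company_list = [AAPL, GOOG, MSFT, AMZN]
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

# Tag each company's rows so they can be told apart after concatenation
for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name

# Stack the four DataFrames vertically and show 10 random rows
df = pd.concat(company_list, axis=0)
df.sample(10)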

Output of the DataFrame

# Summary statistics of Apple's stock data
AAPL.describe()

# Column types and non-null counts of Apple's stock data
AAPL.info()

# Historical view of the closing price; I'll use the adjusted closing price
plt.figure(figsize=(12, 8))

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot()
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"{tech_list[i - 1]}")

# Space the 2x2 panels so titles and labels do not overlap
plt.tight_layout()

The charts above show the adjusted closing price of all four companies from 1st April 2019 to 30th March 2020. All four fell sharply from the start of March 2020 due to Covid-19, and compared with Apple and Microsoft, Google and Amazon fell to their lowest levels in mid-March.

# Plot the total volume of stock traded each day
plt.figure(figsize=(12, 8))

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Volume'].plot()
    plt.ylabel('Volume')
    plt.xlabel(None)
    plt.title(f"{tech_list[i - 1]}")

plt.tight_layout()

The charts above show the daily trading volume (number of shares traded) of each tech company from 1st April 2019 to 30th March 2020.

Next, I calculated moving averages of the adjusted closing price over windows of 10, 20 and 50 days:
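# Moving averages of the adjusted close over 10-, 20- and 50-day windows
ma_day = [10, 20, 50]

for ma in ma_day:
    for company in company_list:
        column_name = f"MA for {ma} days"
        # Mean of the last `ma` prices; the first ma - 1 rows are NaN
        company[column_name] = company['Adj Close'].rolling(ma).mean()

print(GOOG.columns)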
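For a series with no missing values, rolling(ma).mean() averages each price together with the ma - 1 prices before it. It returns NaN until a full window is available. The stdlib sketch below reproduces that behaviour; `rolling_mean` is a hypothetical helper and the prices are made up.

```python
import math


def rolling_mean(values, window):
    """Hypothetical helper: simple moving average padded with NaN like pandas."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            # Fewer than `window` observations so far: pandas returns NaN here
            out.append(math.nan)
        else:
            # Average of the current value and the window - 1 values before it
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical adjusted closes
print(rolling_mean(prices, 3))  # [nan, nan, 11.0, 12.0, 13.0]
```

This is why the 50-day moving-average line starts roughly 50 trading days into each chart.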

# Histograms of every numeric column, one set per company
df.groupby("company_name").hist(figsize=(12, 12));

Below, each company's adjusted closing price is plotted together with its 10-, 20- and 50-day moving averages:

fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(8)
fig.set_figwidth(15)

AAPL[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,0])
axes[0,0].set_title('APPLE')

GOOG[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,1])
axes[0,1].set_title('GOOGLE')

MSFT[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,0])
axes[1,0].set_title('MICROSOFT')

AMZN[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,1])
axes[1,1].set_title('AMAZON')

fig.tight_layout()

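Below, I've plotted the daily return, i.e. the percentage change in the adjusted closing price from one trading day to the next, computed with pct_change(). Note that pct_change() returns fractions, so 0.02 means a 2% gain.
# pct_change() gives (today's price / previous day's price) - 1 for each day
for company in company_list:
    company['Daily Return'] = company['Adj Close'].pct_change()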
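The daily return for day t is P_t / P_(t-1) - 1, where P is the adjusted closing price. The first day has no previous price, so pandas puts NaN there. A stdlib sketch of the same calculation follows; `daily_returns` is a hypothetical helper, it uses None in place of NaN, and the prices are made up.

```python
def daily_returns(prices):
    """Hypothetical helper: fractional day-over-day change, like pandas' pct_change().

    The first entry is None because there is no previous day (pandas uses NaN).
    """
    returns = [None]
    for prev, curr in zip(prices, prices[1:]):
        returns.append(curr / prev - 1)
    return returns


closes = [100.0, 102.0, 99.96]  # hypothetical adjusted closes
print([None if r is None else round(r, 4) for r in daily_returns(closes)])
# [None, 0.02, -0.02]
```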

# Plot each company's daily returns (fractions, e.g. 0.05 = 5%)
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(8)
fig.set_figwidth(15)

AAPL['Daily Return'].plot(ax=axes[0,0], legend=True, linestyle='--', marker='o')
axes[0,0].set_title('APPLE')

GOOG['Daily Return'].plot(ax=axes[0,1], legend=True, linestyle='--', marker='o')
axes[0,1].set_title('GOOGLE')

MSFT['Daily Return'].plot(ax=axes[1,0], legend=True, linestyle='--', marker='o')
axes[1,0].set_title('MICROSOFT')

AMZN['Daily Return'].plot(ax=axes[1,1], legend=True, linestyle='--', marker='o')
axes[1,1].set_title('AMAZON')

fig.tight_layout()

The same daily returns can also be viewed with seaborn's distplot(), which gives a quick view of the univariate distribution of each stock's returns. (distplot() is deprecated in recent seaborn releases in favour of histplot() and displot().)
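A distribution plot is built on a histogram: the range of returns is split into equal-width bins, and the returns falling into each bin are counted. Below is a minimal stdlib sketch of only that counting step. `histogram_counts` is a hypothetical helper that follows NumPy's convention of including the right edge in the last bin. It does not draw the optional KDE curve, and the returns are made up.

```python
def histogram_counts(values, bins):
    """Hypothetical helper: count values in `bins` equal-width bins over min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            # Half-open bins [edge, next_edge); the last bin also includes max(values)
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


returns = [-0.025, -0.005, 0.0, 0.004, 0.012, 0.025]  # hypothetical daily returns
print(histogram_counts(returns, 3))  # [1, 3, 2]
```

A tall, narrow histogram centred near zero indicates a calm stock. Wide tails indicate large daily swings like those in March 2020.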

# New DataFrame holding the adjusted closing price of every company, one column per ticker
closing_df = DataReader(tech_list, 'yahoo', start, end)['Adj Close']
closing_df.head()

# New DataFrame of daily returns (the first row is NaN: there is no previous day)
tech_rets = closing_df.pct_change()
tech_rets.head()

Comparing Google to itself with seaborn's jointplot() shows a perfectly linear relationship, as expected:

# Comparing Google to itself should show a perfectly linear relationship
# (newer seaborn versions require x, y and data as keyword arguments)
sns.jointplot(x='GOOG', y='GOOG', data=tech_rets, kind='scatter', color='seagreen')

Below, I've used jointplot() to compare the daily returns of Google and Microsoft. The points cluster around an upward-sloping line, which suggests that the two stocks' daily returns are positively correlated.

# Compare the daily returns of Google and Microsoft
sns.jointplot(x='GOOG', y='MSFT', data=tech_rets, kind='scatter')

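Below, I've used pairplot() to compare all four companies at once. Each off-diagonal panel plots one stock's daily returns against another's, and the panels suggest that the daily returns of all four tech companies are positively correlated, i.e. they tend to rise and fall together. Correlation alone does not show that a rise in one company's stock causes the others to rise; they may simply be reacting to the same market-wide news.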

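# kind='reg' adds a fitted regression line to each scatter panel; dropna() removes the empty first row
sns.pairplot(tech_rets.dropna(), kind='reg')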
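With kind='reg', each off-diagonal panel shows a linear regression line (drawn by seaborn's regplot(), together with a confidence band). The line itself is an ordinary least-squares fit, whose slope is the covariance of x and y divided by the variance of x. The stdlib sketch below computes that slope and the intercept for two return series. `least_squares_line` is a hypothetical name and the data are made up.

```python
from statistics import mean


def least_squares_line(x, y):
    """Hypothetical helper: (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    # slope = covariance(x, y) / variance(x); the 1 / (n - 1) factors cancel
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical daily returns where y is exactly 2 * x + 0.001
x = [0.01, -0.02, 0.015, 0.0]
y = [2 * xi + 0.001 for xi in x]
slope, intercept = least_squares_line(x, y)
print(round(slope, 6), round(intercept, 6))  # 2.0 0.001
```

A positive slope in a panel means the second stock tended to rise on days when the first one did.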

For the chart below, I've used seaborn's PairGrid(), which lets each part of the grid use a different plot: matplotlib's scatter plot in the upper triangle, seaborn's kdeplot() in the lower triangle and a histogram on the diagonal. It shows the same pairwise relationships as the pairplot() chart; the only difference is the chart types used to represent them.

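return_fig = sns.PairGrid(tech_rets.dropna())

# Upper triangle: scatter plots
return_fig.map_upper(plt.scatter, color='purple')

# Lower triangle: KDE contours. 'cool' is a standard matplotlib colormap;
# the original 'cool_d' may be rejected by newer seaborn versions
return_fig.map_lower(sns.kdeplot, cmap='cool')

# Diagonal: histogram of each stock's daily returns
return_fig.map_diag(plt.hist, bins=30)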

In the chart below, I've used PairGrid() again, but this time with the adjusted closing prices instead of the daily returns:

returns_fig = sns.PairGrid(closing_df)

returns_fig.map_upper(plt.scatter, color='purple')

# 'cool' instead of 'cool_d', as in the previous chart
returns_fig.map_lower(sns.kdeplot, cmap='cool')

returns_fig.map_diag(plt.hist, bins=30)

Below, I've used seaborn's heatmap() to view the correlations between the stocks' daily returns numerically:

# Correlation plot for the daily returns
sns.heatmap(tech_rets.corr(), annot=True, cmap='summer')
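Each number in the heatmap is a Pearson correlation coefficient, which DataFrame.corr() computes by default, excluding missing values pair by pair. For two return series x and y, r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²). The stdlib sketch below also shows why comparing Google with itself gave a perfect line: any series has r = 1 with itself. `pearson` is a hypothetical helper and the returns are made up.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


goog = [0.012, -0.034, 0.021, 0.005, -0.010]  # hypothetical daily returns
msft = [0.010, -0.030, 0.025, 0.002, -0.008]

print(round(pearson(goog, goog), 6))  # 1.0: a series compared with itself
print(round(pearson(goog, msft), 3))  # strongly positive for these made-up numbers
```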

Below, the same kind of heatmap() shows the correlations between the stocks' adjusted closing prices:

# Correlation plot for the closing price
sns.heatmap(closing_df.corr(), annot=True, cmap='summer')

In conclusion, I plotted expected return against risk for each stock. The x-axis is the expected return, measured as the mean of the daily returns, and the y-axis is the risk, measured as their standard deviation. Over the period analysed, the chart suggests that:

· Google had lower returns but was riskier

· Amazon had higher returns and was less risky

· Microsoft had higher returns but greater risk

· Apple had the highest returns of the four but was also the riskiest stock.

These figures describe the past year of data; they are not a forecast of future returns.

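# Drop the first row, which is NaN because there is no previous day
rets = tech_rets.dropna()

# Marker size for the scatter points
area = np.pi * 20

plt.figure(figsize=(12, 10))
# x: mean daily return (expected return), y: standard deviation (risk)
plt.scatter(rets.mean(), rets.std(), s=area)
plt.xlabel('Expected return')
plt.ylabel('Risk')

# Label each point with its ticker, offset by 50 points and joined by a curved line
for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(label, xy=(x, y), xytext=(50, 50), textcoords='offset points', ha='right', va='bottom',
                 arrowprops=dict(arrowstyle='-', color='blue', connectionstyle='arc3,rad=-0.3'))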
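Each point's coordinates are simply the sample mean and the sample standard deviation of that stock's daily returns. pandas' std() uses the sample (n − 1) formula by default, as does statistics.stdev, so the stdlib sketch below reproduces the calculation. `risk_return` is a hypothetical helper, and the tickers and returns are made up.

```python
from statistics import mean, stdev


def risk_return(returns_by_ticker):
    """Hypothetical helper: map ticker -> (expected return, risk) = (mean, sample std)."""
    return {ticker: (mean(rets), stdev(rets))
            for ticker, rets in returns_by_ticker.items()}


# Hypothetical tickers and daily returns, not real market data
sample = {
    "AAA": [0.01, -0.01, 0.02, 0.0],
    "BBB": [0.005, 0.004, 0.006, 0.005],
}
for ticker, (exp_ret, risk) in risk_return(sample).items():
    print(f"{ticker}: expected return {exp_ret:.4f}, risk {risk:.4f}")
```

Both made-up stocks have the same mean return, but AAA's returns are far more spread out. AAA would therefore sit much higher on the risk axis.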

Written by Usharbudha Dev

Currently pursuing Masters in Data Science from NMIMS University