Sep 26, 2021

# Introduction

In this notebook we will focus on exploratory data analysis of bank stock prices. We will use pandas to read data directly from Yahoo Finance. The main objective is to show, step by step, how to analyze and visualize different features of the dataset in order to better understand the banking industry and how it behaves.

We will focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2020.

The following questions will be answered throughout the Notebook:

• What is the max Close price for each bank’s stock throughout the time period?
• On what date did Citigroup stock reach its highest price?
• Why does the first row have NaN values?
• Is there a stock that stands out?
• Did anything significant happen on 2009-01-20?
• Which stock would you classify as the riskiest over the entire time period?
• Which would you classify as the riskiest for the year 2015?

# Data

• Stock data from Jan 1st 2006 to Jan 1st 2020.
• Six banks.
• 6 columns and 3523 rows.
• Source: Yahoo Finance.

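We will get stock information for the following banks:

• Bank of America (BAC)
• Citigroup (C)
• Goldman Sachs (GS)
• JPMorgan Chase (JPM)
• Morgan Stanley (MS)
• Wells Fargo (WFC)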

Feature Columns

• High: Is the highest price at which a stock traded during the course of the trading day.
• Low: Is the lowest price at which a stock traded during the course of the trading day.
• Open: Is the price at which a stock started trading when the opening bell rang.
• Close: Is the last price at which a stock trades during a regular trading session.
• Volume: Is the number of shares that changed hands during a given day.
• Adj Close: The adjusted closing price amends a stock’s closing price to reflect that stock’s value after accounting for corporate actions such as stock splits, dividends, and rights offerings.

We want to analyze the behavior of the stock price of these banks. We will get stock data from Jan 1st 2006 to Jan 1st 2020 for each of them, then store each bank's data in a separate DataFrame whose variable name is the bank's ticker symbol. This involves a few steps:

1. Use datetime to set start and end datetime objects.
2. Use the pandas-datareader DataReader function to download the data for each stock.

Documentation: Remote Data Access
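The two steps above could be sketched as follows. This is a minimal sketch rather than the notebook's original code: it assumes the `pandas_datareader` package is installed and that its `"yahoo"` source is reachable, which has not always been the case. The names `fetch_yahoo`, `load_banks` and `_demo_fetch` are hypothetical helpers, the frames are kept in a dict instead of one variable per ticker, and the column level names are taken from the outputs shown later in this post.

```python
import datetime as dt

import pandas as pd

TICKERS = ["BAC", "C", "GS", "JPM", "MS", "WFC"]
START = dt.datetime(2006, 1, 1)
END = dt.datetime(2020, 1, 1)


def fetch_yahoo(ticker, start, end):
    """Download one ticker's daily prices (needs network access)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from pandas_datareader import data as web
    return web.DataReader(ticker, "yahoo", start, end)


def load_banks(fetch, tickers=TICKERS, start=START, end=END):
    """Fetch every ticker and join the frames under two-level columns."""
    frames = {t: fetch(t, start, end) for t in tickers}
    bank_stocks = pd.concat(
        [frames[t] for t in tickers], axis=1, keys=list(tickers)
    )
    bank_stocks.columns.names = ["Bank Ticker", "Stock Info"]
    return bank_stocks


def _demo_fetch(ticker, start, end):
    """Offline stand-in for fetch_yahoo: made-up prices, not real data."""
    idx = pd.bdate_range(start, periods=3)
    return pd.DataFrame(
        {"Close": [1.0, 2.0, 3.0], "Volume": [10, 20, 30]}, index=idx
    )


# Real usage (requires network): bank_stocks = load_banks(fetch_yahoo)
demo = load_banks(_demo_fetch)
print(demo.columns.names)
```

Passing the fetch function as an argument keeps the joining logic testable offline. Any other source that returns a date-indexed DataFrame per ticker can be swapped in.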

# Exploratory Data Analysis

Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. In this case we are going to visualize and analyze the historical data of these banks and try to find relevant information.

What is the max Close price for each bank’s stock throughout the time period?

• The bank with the highest stock price is Citigroup.

| Bank Ticker | Max Close |
|---|---|
| BAC | 54.900002 |
| C | 564.099976 |
| GS | 273.380005 |
| JPM | 139.399994 |
| MS | 89.300003 |
| WFC | 65.930000 |
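
One way to get a per-bank maximum like this is a cross-section on the inner column level. The sketch below assumes the combined DataFrame has two column levels named `Bank Ticker` and `Stock Info`; the numbers are made up for illustration and are not market data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: two tickers, two columns each.
dates = pd.bdate_range("2006-01-03", periods=4)
columns = pd.MultiIndex.from_product(
    [["BAC", "C"], ["Close", "Volume"]],
    names=["Bank Ticker", "Stock Info"],
)
values = np.array([
    [10.0, 100, 50.0, 200],
    [12.0, 110, 55.0, 210],
    [11.0, 120, 53.0, 220],
    [9.0, 130, 51.0, 230],
])
bank_stocks = pd.DataFrame(values, index=dates, columns=columns)

# Select every 'Close' column across banks, then take the column-wise max.
close = bank_stocks.xs(key="Close", axis=1, level="Stock Info")
max_close = close.max()
print(max_close)
```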

On what date did Citigroup stock reach its highest price?

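248 (this is the zero-based row position of the maximum rather than a date; with roughly 250 trading days per year, it falls near the end of 2006)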
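
An integer such as 248 is the kind of value `argmax` returns, while `idxmax` returns the index label, which for date-indexed prices is the date itself. The sketch below uses a short made-up series in place of Citigroup's Close column.

```python
import pandas as pd

# Made-up prices on four business days (not real Citigroup data).
dates = pd.bdate_range("2020-01-06", periods=4)
citi_close = pd.Series([10.0, 30.0, 20.0, 15.0], index=dates, name="C")

position = citi_close.argmax()   # integer row position of the maximum
peak_date = citi_close.idxmax()  # index label (a Timestamp) of the maximum

print(position, peak_date.date())
```

The two answers are linked by `citi_close.index[position]`, which converts a position back to its date.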

### Returns DataFrame

Now we are going to create a new empty DataFrame called returns. This dataframe will contain the returns for each bank’s stock.

Returns are typically defined by:

$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$

We can use the pandas pct_change() method on the Close column to compute this return value. A for loop over the bank stock tickers can then create a returns column for each bank and set it as a column in the returns DataFrame.
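
A minimal sketch of that loop, using a small made-up table of Close prices in place of the downloaded data (the `" Return"` column suffix follows the column names that appear in the outputs of this post):

```python
import pandas as pd

tickers = ["BAC", "C"]
dates = pd.bdate_range("2006-01-03", periods=4)

# Made-up Close prices, one column per bank.
close = pd.DataFrame(
    {"BAC": [10.0, 11.0, 9.9, 9.9], "C": [50.0, 55.0, 55.0, 44.0]},
    index=dates,
)

returns = pd.DataFrame(index=close.index)
for tick in tickers:
    # pct_change computes r_t = p_t / p_{t-1} - 1 row by row.
    returns[tick + " Return"] = close[tick].pct_change()

print(returns)
```

The first row of `returns` is NaN for every bank, since there is no previous price on the first day.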

Why does the first row have NaN values?

• The first row is NaN because a percent return cannot be computed for the very first day: there is no earlier price to compare it with.

Let’s create a pairplot using seaborn of the returns dataframe.

Is there a stock that stands out?

[Figure: seaborn pairplot of the returns DataFrame]

Using this returns DataFrame, we will figure out on what dates each bank stock had the best and worst single day returns. Notice that 4 of the banks share the same day for the worst drop.

Did anything significant happen on 2009-01-20?

• It can be seen that of the 6 banks, 4 had their worst performance on the same day.
• President Obama took office on Jan. 20, 2009.
• The subprime mortgage crisis also played a major part in the decline of prices.
• Markets had little confidence in the economy and the future was uncertain.
• The banking sector in general declined by 30%.
• Bank of America Corporation (BAC) dropped 29%, and Citigroup Inc. (C) sank 20%.
• The S&P 500 and the Nasdaq took similar hits on inauguration day, dropping 5.3% and 5.8%, respectively.
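
Row position of the worst single-day return for each bank. Four banks share position 766, which corresponds to 2009-01-20:

| Column | Row position |
|---|---|
| BAC Return | 766 |
| C Return | 793 |
| GS Return | 766 |
| JPM Return | 766 |
| MS Return | 697 |
| WFC Return | 766 |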
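
The dates of the best and worst single-day returns can be read off directly with `idxmax` and `idxmin`, and the row positions with `argmin`. The sketch below uses a small made-up returns table, so the dates it prints are illustrative only.

```python
import pandas as pd

dates = pd.bdate_range("2020-01-06", periods=5)

# Made-up daily returns for two banks.
returns = pd.DataFrame(
    {
        "BAC Return": [0.01, -0.20, 0.05, 0.02, -0.01],
        "C Return": [0.03, -0.10, -0.15, 0.08, 0.00],
    },
    index=dates,
)

worst_day = returns.idxmin()  # date of the largest single-day drop
best_day = returns.idxmax()   # date of the largest single-day gain
worst_pos = returns.apply(lambda col: col.argmin())  # same as row positions

print(worst_day)
print(best_day)
print(worst_pos)
```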

## Standard Deviation

Let’s take a look at the standard deviation of the returns.

Standard Deviation: Is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

Standard deviation is the statistical measure of market volatility, measuring how widely prices are dispersed from the average price. If prices trade in a narrow trading range, the standard deviation will return a low value that indicates low volatility. Conversely, if prices swing wildly up and down, then standard deviation returns a high value that indicates high volatility.

Basically, standard deviation rises as prices become more volatile. As price action calms, standard deviation heads lower.

Which stock would you classify as the riskiest over the entire time period?

• The two riskiest stocks appear to be Citigroup and Morgan Stanley.
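
Both risk questions, over the whole period and for 2015 alone, come down to comparing standard deviations of returns. A minimal sketch on randomly generated returns, where the "C" column is deliberately generated with twice the volatility (these are not real results):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.bdate_range("2014-01-01", "2016-12-30")

# Made-up returns: "C" is drawn with twice the daily volatility of "BAC".
returns = pd.DataFrame(
    {
        "BAC Return": rng.normal(0, 0.01, len(dates)),
        "C Return": rng.normal(0, 0.02, len(dates)),
    },
    index=dates,
)

std_all = returns.std()  # whole period
returns_2015 = returns.loc["2015-01-01":"2015-12-31"]  # one calendar year
std_2015 = returns_2015.std()

riskiest_all = std_all.idxmax()
riskiest_2015 = std_2015.idxmax()
print(riskiest_all, riskiest_2015)
```

Slicing a date-indexed DataFrame with date strings keeps only the rows in that range, so the same `.std()` call answers the single-year question.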

Create a distplot using seaborn of the 2015 returns for Morgan Stanley.

• We can see that the distribution is pretty stable.

Create a distplot using seaborn of the 2008 returns for Citigroup.

• Notice that the distribution is much more spread out.
• In a normal year like 2015 the deviation is around 0.06, while for Citigroup in 2008 it was around 0.6, about 10 times larger.

## Pearson correlation matrix

We use the Pearson correlation coefficient to examine the strength and direction of the linear relationship between two continuous variables.

The correlation coefficient can range in value from −1 to +1. The larger the absolute value of the coefficient, the stronger the relationship between the variables. For the Pearson correlation, an absolute value of 1 indicates a perfect linear relationship. A correlation close to 0 indicates no linear relationship between the variables.

The sign of the coefficient indicates the direction of the relationship. If both variables tend to increase or decrease together, the coefficient is positive, and the line that represents the correlation slopes upward. If one variable tends to increase as the other decreases, the coefficient is negative, and the line that represents the correlation slopes downward.

Let’s create a heatmap of the correlation between the stocks’ Close prices.
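
The matrix behind such a heatmap can be computed with the pandas `corr` method, which uses Pearson by default. The sketch below uses synthetic prices: two series that rise together and a hypothetical series labeled "INV" that moves the opposite way, to show both signs of the coefficient.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
trend = np.linspace(0, 10, 250)  # shared upward drift

# Synthetic Close prices (illustration only).
close = pd.DataFrame({
    "BAC": 30 + trend + rng.normal(0, 0.5, 250),
    "JPM": 60 + 2 * trend + rng.normal(0, 0.5, 250),
    "INV": 50 - trend + rng.normal(0, 0.5, 250),  # hypothetical inverse mover
})

corr = close.corr()  # Pearson correlation matrix
print(corr.round(2))
```

The resulting `corr` DataFrame can be passed to `sns.heatmap(corr, annot=True)` or `sns.clustermap(corr, annot=True)` to draw it.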

[Figure: heatmap titled "Pearson Correlation Matrix", with Bank Ticker on both axes]

Now we can use the same correlation matrix to plot a clustermap.

[Figure: seaborn clustermap of the same correlation matrix]

# Financial Charts

Let’s create a line plot showing Close price for each bank for the entire index of time. We are going to show different ways of plotting the same information.

• Now we can clearly see the crash of Citigroup in 2008.
• Goldman Sachs also crashed in 2008 but bounced back quite quickly after the recession.
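
One of the simpler ways to draw such a line plot is a matplotlib loop over the tickers. This sketch uses randomly generated prices for three tickers, so it will not reproduce the crashes described above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
dates = pd.bdate_range("2006-01-03", periods=200)
tickers = ["BAC", "C", "GS"]

# Synthetic Close prices from a random walk (illustration only).
steps = rng.normal(0, 0.02, size=(len(dates), len(tickers)))
close = pd.DataFrame(
    100 * np.exp(np.cumsum(steps, axis=0)), index=dates, columns=tickers
)

fig, ax = plt.subplots(figsize=(12, 4))
for tick in tickers:
    ax.plot(close.index, close[tick], label=tick)
ax.set_xlabel("Date")
ax.set_ylabel("Close")
ax.legend()
```

The pandas wrapper `close.plot(figsize=(12, 4))` produces an equivalent chart in a single call.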

## Returns Histograms

Documentation: Histograms in Python

We can also use plotly to show histograms. In this case we are showing the returns of each bank from 2018 to 2020.

## Candlestick Charts

Documentation: Candlestick Charts in Python

The candlestick chart is a style of financial chart describing open, high, low and close for a given x coordinate (most likely time). The boxes represent the spread between the open and close values and the lines represent the spread between the low and high values. Sample points where the close value is higher (lower) than the open value are called increasing (decreasing). By default, increasing candles are drawn in green whereas decreasing are drawn in red.

Let’s plot Bank of America stock price in Candlestick format.

## Bank Facet plot

Documentation: Time Series and Date Axes in Python

Facet plots, also known as trellis plots or small multiples, are figures made up of multiple subplots which have the same set of axes, where each subplot shows a subset of the data.

In this faceted area plot we can see all the banks separately.

## OHLC Charts

The OHLC chart (for open, high, low and close) is a style of financial chart describing open, high, low and close values for a given x coordinate (most likely time). The tips of the lines represent the low and high values and the horizontal segments represent the open and close values. Sample points where the close value is higher (lower) than the open value are called increasing (decreasing). By default, increasing items are drawn in green whereas decreasing are drawn in red.