The first article of this three-part series covered the broad strokes of the issues to be aware of regarding all the “data” and “relationships” that get thrown around by the financial media (print and television). Most of that discussion draws conclusions from data points that are not statistically significant. In fact, time series data is notoriously hard to model and to use for predicting the future. Additionally, the specific time series data of stock market returns is even more difficult.
You can refer to the link below to examine the content of the first article:
The task at hand for the second article is to put some “meat on the bones” of the discussion. I realize that anything to do with math and statistics is not easy for everyone (or of interest either). Therefore, I will be writing a supplemental article that covers the mathematics and statistics in more detail. The goal here is to be able to identify some of the more common errors that you will encounter.
The first item to talk about is any sort of data that has a substantial trend component. In layman’s terms, this is a data series whose line graph goes up or down in more or less a straight line. You can think of the Gross Domestic Product (GDP) of the United States here. Every year the GDP figures will generally go up unless there is a recession. But, even after the recession passes, the trend for GDP will resume upward. So, where does the problem come in?
I am going to give a contrived example to illustrate why it is dangerous to compare two series that are trending. The example will consist of two different equations, each of which is a trend. Both have the same trend component and an error term (we will call that eta). The x variable, however, enters the two equations with exactly opposite signs. More specifically, the two equations we will use are the following:
Trend_1 = Time + 100 + 0.9 * x + eta
Trend_2 = Time + 100 - 0.9 * x + eta
Now the x values and the eta values were each generated by selecting numbers at random between 0 and 1. You can think of eta as representing the general “noise” that occurs on a daily basis when observing stock prices in the financial markets. So, let’s graph the first 100 observations for these two equations:
You will notice that the trend component dominates the line graphs. However, we know by construction that the two equations which produce trend_1 and trend_2 are fundamentally different. Now the correlation coefficient between those two equations is 0.9984. A correlation coefficient of 1 means that the two lines move in lockstep, so a value this close to 1 suggests a nearly perfect relationship. Why is this important? Why is it very dangerous?
Well, financial pundits will talk about these types of graphs all the time. It looks like there is some relationship, but we know there is very little relationship between the two trends. In fact, we can look at these equations by subtracting the current value from the previous value to see what changes. Formally, this topic is called first differencing. It will allow us to see more clearly what we already know. Here is the graph:
Now we have a totally different picture. We can see that at many times the two trend equations are moving in exactly the opposite direction. In fact, the correlation coefficient for the first-differenced equations is 0.2675. There is only a slight positive relationship between the two trends.
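For readers who would like to try this themselves, here is a minimal Python sketch of the experiment. It is not the code used to produce the graphs above. It assumes that x and eta are drawn uniformly between 0 and 1 and that both trends share the same eta at each time step; the function names and the random seed are my own choices for illustration. Because the random draws differ, the two correlations will not match 0.9984 and 0.2675 exactly, but the pattern should be the same: very high for the raw trends and far lower after first differencing.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


def first_difference(series):
    """Change from each observation to the next one."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def simulate_trends(n=100, seed=0):
    """Build Trend_1 and Trend_2 from the two equations in the text.

    Assumption: x and eta are uniform on [0, 1) and both trends
    share the same eta draw at each time step.
    """
    rng = random.Random(seed)
    trend_1, trend_2 = [], []
    for time in range(1, n + 1):
        x = rng.random()
        eta = rng.random()
        trend_1.append(time + 100 + 0.9 * x + eta)
        trend_2.append(time + 100 - 0.9 * x + eta)
    return trend_1, trend_2


trend_1, trend_2 = simulate_trends()
levels_corr = pearson(trend_1, trend_2)
diff_corr = pearson(first_difference(trend_1), first_difference(trend_2))
print(f"correlation of the raw trends:        {levels_corr:.4f}")
print(f"correlation of the first differences: {diff_corr:.4f}")
```

Changing the seed gives a different set of random draws, which is a quick way to see how much the first-differenced correlation bounces around while the raw-trend correlation stays pinned near 1.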
In the example above, we can see that looking at the two trends is very deceiving. Remember that I added the eta term to represent “noise” that is always present in financial market data. So, anytime someone talks to you about the comparison of two trends, you should be very skeptical. You always want to see first-differenced data or at least a comparison of changes in some manner. Otherwise, you will mistakenly assume that there is a strong positive or negative relationship between two time series.
The second example that I am going to use is stock market returns for the S&P 500 Index from 1966 through 2018. Why start at 1966? Well, the S&P 500 Index started with its current number of component stocks back in 1957, and I would like to show both annual stock returns and ten-year annualized returns, which need ten years of history before the first figure can be calculated. This particular topic can get messy quite quickly, so I am not going to cover it in a lot of depth with statistical and mathematical jargon. For those of you who are interested, as I mentioned, that detail will be contained in a forthcoming supplemental article.
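If the term “ten-year annualized return” is unfamiliar, the short Python sketch below shows one common way to calculate it: compound the ten annual returns together and then take the tenth root. The function names are hypothetical, the input numbers are made up for illustration (they are not actual S&P 500 returns), and the sketch assumes overlapping ten-year windows that move forward one year at a time.

```python
def annualized_return(annual_returns):
    """Geometric average return per year over the given period."""
    growth = 1.0
    for r in annual_returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(annual_returns)) - 1.0


def rolling_annualized(annual_returns, window=10):
    """Annualized return for every consecutive window of years."""
    return [
        annualized_return(annual_returns[i:i + window])
        for i in range(len(annual_returns) - window + 1)
    ]


# Hypothetical annual returns as decimals, NOT actual S&P 500 data.
example_returns = [0.10, -0.05, 0.20, 0.08, -0.12, 0.15,
                   0.03, 0.25, -0.02, 0.11, 0.07, -0.09]
ten_year = rolling_annualized(example_returns)
print([round(r, 4) for r in ten_year])
```

Note that twelve years of annual returns produce only three ten-year figures under this approach, and neighboring windows share nine of their ten years.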
A great many individuals in the financial markets talk about stock market returns in the same breath as the normal distribution. What is the normal distribution? It is the old bell curve that you are familiar with. The normal distribution is symmetrical, with most data points clustered around the average and fewer and fewer of them out in the tails. Well, stock market returns are anything but strongly normal.
Let’s first take a look at one-year stock market returns for the S&P 500 Index.
A useful test to see if a particular distribution is normal is the Jarque-Bera test. Now it is not necessary to know exactly what is being calculated. However, you should refer to the bottom of the box that reads “Probability”. The value of 0.179 is called a p-value. A p-value less than or equal to 0.10 means that we can reject the hypothesis that the one-year distribution of stock returns is normal. At a value of 0.179, we would not reject the hypothesis of normality for this distribution. However, failing to reject normality is not the same as proving it, and the p-value in our case is not large enough to be totally sure and confident. But what about looking at annualized stock market returns over ten-year periods?
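For the curious, here is a rough Python sketch of what the Jarque-Bera test calculates. It is my own illustration and not the software that produced the “Probability” figure above. The statistic combines skewness (lopsidedness) and kurtosis (fatness of the tails), and the p-value uses the standard large-sample approximation of a chi-square distribution with 2 degrees of freedom, which can be rough for small samples. The data below is simulated and deliberately lopsided, so the test should reject normality.

```python
import math
import random


def jarque_bera(data):
    """Return (JB statistic, approximate p-value) for a sample.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness
    and K is the sample kurtosis. Under normality JB is approximately
    chi-square with 2 degrees of freedom, whose upper-tail
    probability is exp(-JB / 2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Simulated data only: 500 draws from a strongly skewed distribution.
rng = random.Random(1)
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]
jb_stat, p_value = jarque_bera(skewed_sample)
print(f"JB = {jb_stat:.2f}, p-value = {p_value:.4g}")
```

A p-value at or below the 0.10 cutoff used in this article would lead us to reject normality, while a larger one would not.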
We can look at a similar graph to check to see if stock market returns over longer timeframes are indeed akin to the normal distribution (i.e. the bell curve). Here is the graph:
Looking at the same “Probability” value, we have 0.489. Therefore, we cannot reject the hypothesis that these stock market returns follow a normal distribution. Looking at ten-year annualized stock market data tells us that the normal distribution is a reasonable assumption to use for calculating statistics.
Now why does this matter? Well, you will hear over and over again statistics that apply only to the normal distribution being used in relation to actual, observed stock market returns. We have just seen that short-term stock market returns only weakly follow the normal distribution. On the other hand, long-term stock returns fit the normal distribution much more comfortably. Now I will not get into the technicalities, but time series data is indeed asymptotically normal. What? Say again?
This is just a fancy way of saying that, as the number of data points (sample size) approaches infinity, the time series will look like the normal distribution. Pretty much all financial market and economic data have very few data points. In fact, you usually need several hundred data points prior to making any assumptions and using the statistics related to the normal distribution (think standard deviation or correlation coefficients).
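To get a feel for why sample size matters so much, here is a small Python sketch (my own illustration, with hypothetical function names and simulated data). It repeatedly generates pairs of completely unrelated random series and records the largest correlation coefficient, in absolute value, that shows up purely by chance. It assumes uniform random draws between 0 and 1 and uses 200 trials for each sample size.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


def largest_chance_correlation(n, trials=200, seed=0):
    """Largest |correlation| seen between pairs of unrelated series."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a = [rng.random() for _ in range(n)]
        b = [rng.random() for _ in range(n)]
        worst = max(worst, abs(pearson(a, b)))
    return worst


small_sample = largest_chance_correlation(20)
large_sample = largest_chance_correlation(500)
print(f"20 observations:  {small_sample:.3f}")
print(f"500 observations: {large_sample:.3f}")
```

With only 20 observations, two series that have nothing to do with each other can still show a sizable correlation by luck alone. With 500 observations, the chance correlations shrink considerably.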
Thus, most of the banter in the financial media is just subjective notions of what is going on in the stock market and the economy. More often than not, an assertion by someone in the financial print or television media is more of an educated guess than a statement based on a solid mathematical foundation. That fact explains why financial pundits hedge their statements. As I say half-jokingly, “I see the stock market going up in the next several months, but of course it might not resume its uptrend or could even take a leg downward”.
Well yes, I guarantee you that every day stocks will go up, down, or remain unchanged. This type of daily commentary in the financial press about the short-term performance of stocks (or other financial assets) is just not helpful and can downright distract you from investing for your long-term financial goals.
I apologize for getting too detailed in certain parts of this article. What are the key takeaways? First, you should be extremely leery of drawing any conclusions from the comparison of two or more data series that are trending upward or downward. Second, you need to have several hundred observations prior to invoking any reference to the normal distribution. So, what is left after that? As you might imagine, there are not too many comparisons or studies that pass muster and give you insights on investing or actionable information to make changes to your investment portfolio.
Don’t focus on the mathematics or statistics. All you need to remember are the two takeaways above. And, first and foremost, you should always be skeptical whenever you are presented with comparisons and statistics related to the financial market or the economy as a whole.