A first exploration of financial data

Some FAQs about financial data

Is it hard to find financial data?

It depends on what you are looking for. Data on stocks (equities) is reasonably easy to get hold of: you can go to Google Finance or Yahoo Finance, for example, and you then have access to the prices of many different stocks around the world. There are some places where you can get hold of foreign exchange (FX) data, such as Olsen, but there is usually some restriction, such as having to register, only being able to download a limited amount of data (500 days at Olsen), or having to pay for the data. Options data is essentially unobtainable without paying up front or having an inside connection in the industry. Much the same is true of futures price data.

Is it hard to download stuff?

If you go to Google Finance you can get hold of some data very rapidly. First type the stock ticker (the stock's identifier) into the search box (for example, "T" is the ticker for AT&T); clicking on the `Get quotes' button will then take you to the page for that stock. On the left there is a link to historical prices, which takes you through to the historical data. You can then choose your data window and download the data to a spreadsheet. You will get the high, low, open and close prices, as well as the volume of trade. Yahoo Finance works similarly.
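Once the spreadsheet is saved as a CSV file, it can be read with a few lines of code. The following is a minimal sketch in Python (used here purely for illustration; it is not one of the packages discussed in these notes). The column names Date, Open, High, Low, Close and Volume, the ISO date format, and the sample numbers are all assumptions made up for the example, so check them against the header of the file you actually download.

```python
import csv
import io

# Hypothetical sample in the shape of a downloaded price file; the column
# names and the numbers are invented for illustration only.
SAMPLE = """Date,Open,High,Low,Close,Volume
2010-01-05,28.40,28.60,28.05,28.20,950000
2010-01-04,28.10,28.50,27.95,28.40,1200000
2010-01-06,28.20,28.25,27.60,27.70,1430000
"""

def read_prices(text):
    """Return one dict per trading day, oldest first, with numeric fields as floats."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        row = {"Date": rec["Date"]}
        for key in ("Open", "High", "Low", "Close", "Volume"):
            row[key] = float(rec[key])
        rows.append(row)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    rows.sort(key=lambda r: r["Date"])
    return rows

rows = read_prices(SAMPLE)
closes = [r["Close"] for r in rows]
print(closes)
```

To read a real file, pass its contents instead of SAMPLE, for example `read_prices(open("T.csv").read())`, where the file name is again hypothetical.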

How do I find the stock ticker for a given company?

You can get the tickers of the stocks in the S&P 500 index from Wikipedia by typing "S&P 500" in the search box.

OK, so I have some data in a spreadsheet - what now?

You now need to load the data into some package where it is easy to work with. There are at least two good (and completely free) alternatives here: one is Scilab, and the other is R. Both can be downloaded and installed in a matter of minutes. Scilab is a general mathematical computational package similar to MATLAB, and is in my view very easy to use. I find R a bit clumsier, but it is well set up to handle standard statistical analyses, and copes with missing data and dates more efficiently than Scilab.

I've installed Scilab and R - how do I import and work on the data?

For now, let me just explain the use of Scilab, for which there is a worked example in DATAPAK. Download these files, and place them all in one directory.

In Windows, double-click on the Scilab icon to open a Scilab session. Click on the `File' button on the toolbar and select `Change current directory ...' from the list, then select the directory where the data and programs from DATAPAK have been placed. In Linux, open an xterm, cd to the directory where the files from DATAPAK have been put, and type the command `scilab' at the prompt. Either way, you now have an open Scilab window in the correct directory.

The main script is called intro.sci. This contains a lot of comments (prefixed by the character string "//") as well as a small amount of code. You can read the script in whatever text editor you are used to, and reading it will get you started with the use of Scilab commands. The file SomeSP.txt contains ten years of daily prices for 29 stocks from the S&P 500 index, arranged in a big array with 30 columns: the first contains the date (in days counted from some arbitrary starting date), and the rest contain the closing prices. There are also column headers to identify the different stocks; Scilab and R can both cope with these. Perhaps the simplest thing is just to type

exec intro.sci;

at the prompt in the Scilab window. This will execute all the commands in the script intro.sci; at various points, you will be offered self-explanatory choices, which you can work through. This first script steps you through some simple exploratory data analyses designed to answer some simple but natural questions about stock price data:

  • Are log-returns Gaussian, as they would be if the Black-Scholes model were correct?

  • Are log-returns stationary?

  • Are the tails of the distribution of log-returns polynomial?

  • Is the autocorrelation of log-returns consistent with a model with stationary independent increments?

There is no formal statistical testing carried out here, but once you have seen the plots, you will be left in no doubt that the answer to all of the above questions is "No".
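For readers who want to see what lies behind such plots, here is a minimal sketch of two of the underlying calculations, written in Python rather than Scilab; it is not the code in intro.sci. It computes log-returns from a price series, the sample excess kurtosis (close to zero for Gaussian data) and the sample autocorrelation of the returns and of their absolute values. The helper names are hypothetical, and the price series is a simulated Gaussian random walk, so on this toy input everything looks well behaved; the interesting comparison is with a column of real prices.

```python
import math
import random
import statistics

def log_returns(prices):
    """Daily log-returns log(P[t+1] / P[t]) of one price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def excess_kurtosis(x):
    """Sample excess kurtosis; approximately 0 for Gaussian data."""
    m = statistics.fmean(x)
    dev = [v - m for v in x]
    m2 = sum(d * d for d in dev) / len(x)
    m4 = sum(d ** 4 for d in dev) / len(x)
    return m4 / (m2 * m2) - 3.0

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    m = statistics.fmean(x)
    dev = [v - m for v in x]
    num = sum(a * b for a, b in zip(dev, dev[lag:]))
    den = sum(d * d for d in dev)
    return num / den

# Toy stand-in for one column of prices: a geometric random walk with
# independent Gaussian steps (simulated, not market data).
random.seed(0)
prices = [100.0]
for _ in range(2500):
    prices.append(prices[-1] * math.exp(random.gauss(0.0, 0.01)))

r = log_returns(prices)
print("excess kurtosis:", excess_kurtosis(r))
print("lag-1 autocorrelation of r:", autocorrelation(r, 1))
print("lag-1 autocorrelation of |r|:", autocorrelation([abs(v) for v in r], 1))
```

Independent increments would make the autocorrelation of the absolute returns small as well as that of the returns themselves, which is why both are printed.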

If log-returns are not Gaussian, stationary, or independent, why do so many people use models which assume that they are?

That's not so easy to answer. At some point, when I have the time to expand these notes, I will explain why it's not completely stupid.

How could I make some money trading these stocks?

This is a more interesting question! Well, you have the data, you have Scilab, so now you could try to build some trading rule and see how it would have performed. For example, you could simply buy-and-hold the stocks, splitting your wealth equally between them at time 0 and just sitting on that portfolio. Or you could try the 1/N strategy, where at the beginning of each day you adjust your holdings of the different stocks so that the monetary values of your holdings of each stock are the same. You should find that this beats the buy-and-hold strategy pretty conclusively; I have not tried it on this data, but I would expect that this will give you a Sharpe ratio in the range 0.6 to 0.8. Don't forget to allow for transaction costs, which might amount to (say) 10 basis points (= 1 part in 1000) on the change in your position.
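The two rules just described can be sketched as follows, again in Python rather than Scilab. This is one possible reading of the rules, not a definitive implementation: the function names are hypothetical, the riskless rate is taken to be zero in the Sharpe ratio, a year is assumed to have 252 trading days, the initial purchase is assumed to be free, and the transaction cost is deducted after rebalancing, so the weights are only approximately equal once costs are paid. The small price table is invented to show the calling convention and says nothing about real performance.

```python
import math
import statistics

def buy_and_hold(prices, w0=1.0):
    """Wealth path when initial wealth is split equally and never traded.

    prices: list of rows, one row per day, one price per stock.
    """
    n = len(prices[0])
    shares = [w0 / n / p for p in prices[0]]
    return [sum(s * p for s, p in zip(shares, row)) for row in prices]

def one_over_n(prices, w0=1.0, cost=0.001):
    """Wealth path when holdings are reset to equal value every day.

    cost is the proportional charge on the value traded (0.001 = 10 bp).
    """
    n = len(prices[0])
    holdings = [w0 / n] * n          # monetary value held in each stock
    wealth = [w0]
    for prev, cur in zip(prices, prices[1:]):
        # Let each holding drift with the day's price move.
        holdings = [h * c / p for h, p, c in zip(holdings, prev, cur)]
        total = sum(holdings)
        target = total / n
        # Value that must change hands to get back to equal weights.
        traded = sum(abs(h - target) for h in holdings)
        total -= cost * traded
        holdings = [total / n] * n
        wealth.append(total)
    return wealth

def sharpe(wealth, periods_per_year=252):
    """Annualised Sharpe ratio of daily log-returns, riskless rate taken as zero."""
    r = [math.log(b / a) for a, b in zip(wealth, wealth[1:])]
    return statistics.fmean(r) / statistics.stdev(r) * math.sqrt(periods_per_year)

# Invented prices for two stocks over five days, for illustration only.
prices = [[100.0, 50.0], [102.0, 49.0], [101.0, 51.0], [104.0, 50.0], [103.0, 52.0]]
bh = buy_and_hold(prices)
en = one_over_n(prices)
print("buy-and-hold final wealth:", bh[-1])
print("1/N final wealth:", en[-1])
# Five days is far too short for the Sharpe ratio to mean anything.
print("Sharpe, buy-and-hold:", sharpe(bh))
print("Sharpe, 1/N:", sharpe(en))
```

With the real data, each row of `prices` would be one day's closing prices for the 29 stocks, and setting `cost=0.0` shows how much of the result is eaten by trading.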