Massive Stock Datasets

Monday, March 31st, 2008

When data mining, the first step is to obtain the data that you would like to mine. I have decided that I would like to try my hand at playing the stock market, so it became necessary for me to obtain historical stock market data. To that end, I have devised a method to obtain end-of-day results for every listing on NYSE, AMEX and NASDAQ since their inception. The data is in the process of being assembled and I expect it to be complete within a few days. I currently estimate that the data will take up approximately 2 GB, making it the largest single dataset that I have ever played with. Just having this much data makes my data-hoarding senses tingle.

I’ll probably spend a little bit of time putting the data into a format that is easy to understand and use, and then I’ll start looking for patterns. I’m hoping to throw my modeling background and experience at the stock market to see if I can’t beat the system. If I can beat the stock market and make bajillions of dollars (or euros if the dollar collapses), that would be pretty sweet, but if I can’t, at the very least I expect to have fun playing with lots and lots of numbers.

As a second approach, since it turns out to be rather difficult to get this sort of data in the first place, I’m half-considering the idea of cleaning it up a bit and then reselling it myself.