Classification done at 2009-05-04T00:00:11.602436
The best symbol is HBC.
Scheduling trades for today.
Opening positions at 2009-05-04T09:29:30.100081
I am going to trade today.
action=buy~quantity=772~symbol=HBC~ordtype=Limit~price=36.65~expire=day~accountid=xxx
Waiting for order to fill...
772.0 shares filled at 36.65
action=sell~quantity=772.0~symbol=HBC~ordtype=Limit~price=37.38~expire=day~accountid=xxx
Closing positions at 2009-05-04T15:57:00.100089
We won today! All closed out.
I'm a fair way along writing the trade engine for arbit, my statistical arbitrage program that's been something like 2.5 years in the making (and whose name still makes me think of these). I've started to realize that I'm writing a finite state machine, one which looks alarmingly like the one in TIBCO BusinessEvents (BE).
So, out of curiosity, I built up the state model using BE. I'm now thinking BE makes a lot of sense for the trade life cycle. I've pitched the idea before, but I think trying to write a trade engine from scratch has made it really obvious how valuable this is.

Now I just have to figure out how to do this in Python. I'm thinking for each trading day I create a new object and then progress it through with a bunch of flags. Yay for non-scalable approaches!
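For what it's worth, here is a minimal sketch of the object-per-trading-day idea written as an explicit state machine instead of a pile of flags. The state and event names are made up for illustration (loosely following the log output at the top); they are not taken from arbit or from BE.

```python
from enum import Enum, auto


class State(Enum):
    CLASSIFIED = auto()        # best symbol picked, trades scheduled
    OPENING = auto()           # market open, about to send the buy order
    WAITING_FOR_FILL = auto()  # buy order sent, not yet filled
    HOLDING = auto()           # filled, sell limit order working
    CLOSED = auto()            # flat again, done for the day


# Allowed transitions: (current state, event) -> next state.
# Event names are hypothetical.
TRANSITIONS = {
    (State.CLASSIFIED, "market_open"): State.OPENING,
    (State.OPENING, "order_sent"): State.WAITING_FOR_FILL,
    (State.WAITING_FOR_FILL, "order_filled"): State.HOLDING,
    (State.WAITING_FOR_FILL, "market_close"): State.CLOSED,  # never filled
    (State.HOLDING, "exit_filled"): State.CLOSED,
    (State.HOLDING, "market_close"): State.CLOSED,           # forced close-out
}


class TradingDay:
    """One object per trading day, moved along by events."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.state = State.CLASSIFIED
        self.history = [self.state]

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {self.state.name}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


day = TradingDay("HBC")
for event in ["market_open", "order_sent", "order_filled", "exit_filled"]:
    day.handle(event)
print(day.state.name)
```

Keeping the transitions in one table means an out-of-order event fails loudly instead of silently flipping a flag.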
I'm also starting to wonder if my one-language-to-rule-them-all approach is a good idea. I use Python for most everything, with occasional calls into C for a couple of libraries I need that don't exist in Python. Python interoperates natively with C, so this isn't a big deal. That said, it'd be nice to be able to call something mathy (Matlab, R, S+, even Mathematica), and there are about a billion things in Java that'd be nice to have.
I don't want to use Jython because I'm not convinced it has a future. The integration to the Ameritrade API is already as much code as my entire arbitrage program. Likewise, the cluster code is easily 3x the size of the arbitrage code. Then there's the market data scraper which is probably 5x the arbitrage code. Integration is hard... That said, I really like the idea of keeping it all in Python so this problem doesn't get worse.
import xml.dom.minidom

def xmltodict(xmlstring):
    doc = xml.dom.minidom.parseString(xmlstring)
    return elementtodict(doc.documentElement)

def elementtodict(parent):
    child = parent.firstChild
    if (child.nodeType == xml.dom.minidom.Node.TEXT_NODE):
        return child.nodeValue
    d={}
    while child is not None:
        if (child.nodeType == xml.dom.minidom.Node.ELEMENT_NODE):
            try:
                d[child.tagName]
            except KeyError:
                d[child.tagName]=[]
            d[child.tagName].append(elementtodict(child))
        child = child.nextSibling
    return d
Updated: Fixed a bug that didn't like null nodes and another about whitespace, though I stole some code from an O'Reilly book for that.
import xml.dom.minidom

def xmltodict(xmlstring):
    doc = xml.dom.minidom.parseString(xmlstring)
    remove_whitespace_nodes(doc.documentElement)
    return elementtodict(doc.documentElement)

def elementtodict(parent):
    child = parent.firstChild
    if (not child):
        return None
    elif (child.nodeType == xml.dom.minidom.Node.TEXT_NODE):
        return child.nodeValue
    d={}
    while child is not None:
        if (child.nodeType == xml.dom.minidom.Node.ELEMENT_NODE):
            try:
                d[child.tagName]
            except KeyError:
                d[child.tagName]=[]
            d[child.tagName].append(elementtodict(child))
        child = child.nextSibling
    return d

def remove_whitespace_nodes(node, unlink=True):
    # Collect whitespace-only text nodes first, then remove them,
    # so the child list is not modified while iterating over it.
    remove_list = []
    for child in node.childNodes:
        if child.nodeType == xml.dom.Node.TEXT_NODE and not child.data.strip():
            remove_list.append(child)
        elif child.hasChildNodes():
            remove_whitespace_nodes(child, unlink)
    for node in remove_list:
        node.parentNode.removeChild(node)
        if unlink:
            node.unlink()
Update 2: Someone already did this in the 2nd edition of the Python Cookbook. There's another one here too: http://code.activestate.com/recipes/116539/. So much for there only being one way to do something in Python...
from xml.dom.minidom import parse, Node

def xmltodict(filename):
    doc = parse(filename)
    return elementtodict(doc.documentElement)

def elementtodict(parent):
    child = parent.firstChild
    # Skip leading whitespace-only text nodes; stop if we run out of children.
    while (child is not None and child.nodeType == Node.TEXT_NODE
           and not child.data.strip()):
        child = child.nextSibling
    if child is None:
        return None
    if child.nodeType == Node.TEXT_NODE:
        return child.nodeValue
    d = {}
    while child is not None:
        if child.nodeType == Node.ELEMENT_NODE:
            d.setdefault(child.tagName, []).append(elementtodict(child))
        child = child.nextSibling
    # Collapse single-element lists only after the loop, so that repeated
    # tags can still be appended to while iterating.
    for key, val in d.items():
        if isinstance(val, list) and len(val) == 1:
            d[key] = val[0]
    return d
import xml.dom.minidom

def elementtodict(parent):
    child = parent.firstChild
    if (not child):
        return None
    elif (child.nodeType == xml.dom.minidom.Node.TEXT_NODE):
        # Convert numeric-looking text to float or int; leave the rest as strings.
        val = child.nodeValue
        try:
            if '.' in val:
                val = float(val)
            else:
                val = int(val)
        except ValueError:
            pass
        return val
    d={}
    while child is not None:
        if (child.nodeType == xml.dom.minidom.Node.ELEMENT_NODE):
            try:
                d[child.tagName]
            except KeyError:
                d[child.tagName]=[]
            d[child.tagName].append(elementtodict(child))
        child = child.nextSibling
    # Unwrap tags that only appeared once.
    for key, val in d.items():
        if isinstance(val, list) and len(val) == 1:
            d[key] = val[0]
    return d

It turns out there are more reasonable options between professional brokers and scraping Yahoo, Google, and Ameritrade pages:
http://www.interactivebrokers.com/ibg/main.php
http://www.tdameritrade.com/tradingtools/partnertools/api_dev.html
I’ve been reading When Genius Failed, and it seems to misrepresent a lot of the theory it summarizes. The most egregious is the notion of equilibrium:
"An efficient market is a less volatile one (it has no Black Mondays) and, from day to day a less risky one. "
Black initially thought noise and information traders played a balancing act. In 1986, Black wrote:
"People who trade on noise are willing to trade even though from an objective point of view they would be better off not trading. Perhaps they think the noise they are trading on is information. Or perhaps they just like to trade.
With a lot of noise traders in the markets, it now pays for those with information to trade….
The information traders will not take large enough positions to eliminate the noise…
The noise that noise traders put into stock prices will be cumulative, in the same sense that a drunk tends to wander farther and farther from his starting point. Offsetting this, though, will be the research and actions taken by information traders. The farther the price of a stock gets from its value, the more aggressive the information traders will become. More of them will come in, and they will take larger positions. "
According to Mehrling’s biography of Black, his position continued to evolve, eventually to the position that noise is an integral part of equilibrium markets:
"…people were adopting trading strategies that increased price volatility by increasing buying pressure when prices rose and increasing selling pressure when prices fell…. Essentially given the increased demand, the ‘cost’ of portfolio insurance had to rise in order to equilibrate markets, and that meant that equilibrium mean reversion of asset prices had to rise… but mean reversion is not something that investors can readily observe, so for a while their behavior continued to reflect the historical lower rate of mean reversion. The result was that, as prices rose, investors miscalculated the degree to which expected return was falling. By October 19, enough investors had become aware of the state of affairs to calculate correctly, and prices fell until expected return was high enough that investors were willing to hold the existing quantities of stock."
The key point being that, despite what numerous crap macroeconomics books might say, equilibrium and efficient markets can incorporate noise, and that noise can build in such a way as to cause all manner of crashes. Given that Scholes and Merton were both at LTCM, I suspect this idea was well incorporated into their thought.
I like Black.
My model just died a flaming horrible death. This has happened before. My first love, BEA, was devoured. Then CFC ate it. I recently got something about the shareholder suit for that one. It's funny, I don't think I ever lost money on CFC, even though I was long 10 or 15 times on the way down. Hurrah for volatility!
Yesterday my bucket was:
symbols=['C','CS','DB','GS','LEH','MER','RY','UBS','WB', 'HBC']
Happily, I'd been short a bunch of WB from Friday. Suffice to say my bills have all been paid today, and I now have a budget surplus, along with a cushion over that day trading limit for once.
Of course, looking at that list, you may notice a slight problem: namely that MER and LEH are for all intents and purposes gone. So, once again, it's time for a new model.

I'd like to avoid the bucket approach, since it requires a bunch of cross validation to pick the bucket, and then the bucket usually dies before too long. The banking one up there I picked something like 2 months ago. Not the most long-lived model...
So, my new plan is to build games on ETFs. We'll see how this goes. First I've got to get rid of the GS I was short on today (whoops).
It's (mostly) good to be short! All my variables are growing inside a loop... too much Python lately.
I don't understand options. I'm looking at the option chain for Lehman Brothers right now. The underlying stock was trading at 16.17 at close on Friday.
The option chain for puts expiring September 8 looks like:
| Symbol | Strike | Bid | Ask |
| +LYHUM | 19 | 3.700 | 3.800 |
| +LYHUL | 18 | 3.050 | 3.100 |
| +LYHUK | 17 | 2.470 | 2.510 |
| +LYHUJ | 16 | 1.990 | 2.030 |
| +LYHUC | 15 | 1.580 | 1.620 |
| +LYHUG | 14 | 1.230 | 1.290 |
| +LYHUS | 13 | 0.970 | 1.000 |
My first problem with this is the symbols. They are supposedly formed something like: optionSymbol = "+" + stockSymbol + monthCode + strikePriceCode
So, let's decipher "+LYHUM." The stock symbol is "LYH." The month code is "U." The strike price code is "M."
Of course, the actual symbol for Lehman Brothers on the NYSE is "LEH." Part of the problem is that NASDAQ ticker symbols are all four characters, while options symbols have three for the stock, so each NASDAQ symbol must lose a character. Of course, this doesn't explain why the venerable Lehman Brothers didn't get to have "LEH."
"U" means this is a put expiring in September. While this doesn't make a lot of sense, at least the convention is consistent across options symbols.
"M" is the code for the strike price. It happens to mean 19. This is based on what the price of the underlying stock was when the option was created. Of course, there are only 26 letters, and for very volatile stocks you run out of letters. This is why some options have multiple stock symbols mapping back to the same underlying stock symbol. For instance, options on the terrifyingly volatile Ambac Financial Group (NYSE:ABK) start with "GIY," "YZL" or "ABK."
For more on this awful naming convention, which is apparently going away next year, Wikipedia may be enlightening:
http://en.wikipedia.org/wiki/Option_symbol
http://en.wikipedia.org/wiki/Option_naming_convention
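The decoding walked through above can be sketched in a few lines. The month codes follow the old convention (A through L for calls, M through X for puts, January to December). The strike table is an assumption: it is just the seven LYH codes from the chain quoted above, since strike letters are assigned per option root and cannot be decoded in general. The function name is made up.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
CALL_CODES = "ABCDEFGHIJKL"  # calls expiring Jan..Dec
PUT_CODES = "MNOPQRSTUVWX"   # puts expiring Jan..Dec

# Strike codes copied from the LYH put chain above; other roots differ.
LYH_STRIKES = {"M": 19, "L": 18, "K": 17, "J": 16, "C": 15, "G": 14, "S": 13}


def decode_option_symbol(symbol, strike_table):
    """Split '+ROOT' + month code + strike code into its parts."""
    if not symbol.startswith("+") or len(symbol) < 4:
        raise ValueError(f"not an option symbol: {symbol!r}")
    body = symbol[1:]
    root, month_code, strike_code = body[:-2], body[-2], body[-1]
    if month_code in CALL_CODES:
        kind, month = "call", MONTHS[CALL_CODES.index(month_code)]
    elif month_code in PUT_CODES:
        kind, month = "put", MONTHS[PUT_CODES.index(month_code)]
    else:
        raise ValueError(f"unknown month code: {month_code!r}")
    return {"root": root, "type": kind, "month": month,
            "strike": strike_table[strike_code]}


print(decode_option_symbol("+LYHUM", LYH_STRIKES))
```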
The bid and ask prices shown above are for 1 share. However, an option contract is actually for 100 shares. So, while I can't buy anything for $3.80, I can buy +LYHUM for $380 (plus miscellaneous fees). Owning one contract of +LYHUM entitles me to sell 100 shares of Lehman for $19 a share ($19*100=$1900 in total).
Because this is an American option, I can exercise it (make that sale described above) at any time up to the expiration date. If it were a European option, I could only exercise it on the expiration date. If it were an Asian option, I could only exercise it on the expiration date, with the payoff depending on the average price over the life of the option.
Say I buy +LYHUM for the current asking price. This costs me $380. The current stock price is below the strike price, so I can exercise the option. That is, I buy 100 shares of Lehman Brothers for $16.17 each: $16.17*100=$1617. I then sell 100 shares of Lehman to someone for $19 each: $19*100=$1900. So, I've made $1900-$1617=$283 from exercising the option. Of course, the option cost me $380. So, I've lost: $380-$283=$97 from this little endeavour.
So, now my question is: why on earth would I want to buy this option? That's more or less answered in the little graph I made, shown below:

Of course, this ignores all the complexities about the time value of money, but I'm still struggling with basics here. Basically, by buying a put for $380, you are betting that the price of Lehman will dip below roughly $15.20 (the $19 strike minus the $3.80 premium) before the thing expires on September 8th.
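To check the arithmetic, here is a small sketch of the profit or loss on one +LYHUM contract if it is bought at the ask and exercised at a given stock price. It ignores fees and the time value of money, and the function names are made up for illustration.

```python
def put_pnl_per_contract(strike, premium, stock_price, shares=100):
    """Net profit from buying one put and exercising at stock_price (no fees)."""
    intrinsic = max(strike - stock_price, 0.0)  # worthless above the strike
    return round((intrinsic - premium) * shares, 2)


def put_break_even(strike, premium):
    """Stock price at which exercising exactly pays back the premium."""
    return round(strike - premium, 2)


strike, ask = 19.0, 3.80
print(put_pnl_per_contract(strike, ask, 16.17))  # Friday's close
print(put_break_even(strike, ask))
for price in (20.0, 17.0, 15.2, 13.0, 10.0):
    print(price, put_pnl_per_contract(strike, ask, price))
```

At Friday's close this gives the $97 loss worked out above, and the position only starts making money once the stock is under the break-even price.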
It looks like I can grab options data from Yahoo. Of course, historical data is unavailable, but the current day might be useful anyway.
Ideally, I'd like to model the option directly, as if it were just an extremely volatile stock. A couple advantages to this:
(1) It's fairly odd, and not the preferred way of doing things.
(2) I wouldn't have to change anything I already have.
But, to do that I would need historic open, high, low, close data. It looks like I can buy that for a couple hundred dollars, but I would really need a server for that to be a viable approach.
So, I suppose I take the other approach: model the underlying stock, then pick options which are well priced according to my odd notion of volatility. This is going to be a pain though, because it involves a major rewrite. I also don't like it because I'm going to end up with something pretty close to Black-Scholes, which everyone is already using to value their options.
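For reference, this is roughly what the standard Black-Scholes price of a put looks like in Python. It is only a sketch of the textbook formula, not my model: it prices a European put with no dividends, whereas the Lehman options above are American, and the inputs in the example call are arbitrary.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_put(spot, strike, rate, vol, years):
    """Textbook Black-Scholes price of a European put (no dividends).

    rate and vol are annualised; years is the time to expiry.
    """
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt(years))
    d2 = d1 - vol * sqrt(years)
    return strike * exp(-rate * years) * norm_cdf(-d2) - spot * norm_cdf(-d1)


# Arbitrary illustrative inputs, not market data.
print(round(black_scholes_put(spot=100, strike=100, rate=0.05, vol=0.2, years=1.0), 4))
```

The only input that isn't directly observable is the volatility, which is where an odd notion of volatility would plug in.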
I tried to do something clever. First I was thinking I'd attach the motherboards to plywood, but Collin's experience said that was not so thrilling an idea. I tried to skewer them like little kebabs over my fan, but the contraption I built from bar stock and threaded rod would have fallen on the fan and been devoured.
Then I tried the Gulliver's travels approach: a thousand pieces of string holding the motherboards to the surface of the fan... but the motherboards started buckling under their load... so I went out and bought some cases... cheating, I know.
My power supplies glow blue Star Wars.
Biologists note that competition for food results in a stable evolutionary equilibrium characterized by multiple strategies. When competitors meet at a food site, they can either fight over the prize and risk injury – the "hawk" strategy – or withdraw and lose the food – the "dove" strategy. If every individual fights, a mutant who withdraws would eventually have a greater probability of procreating than the average fighter because of the risk of injury and the fact that only one fighter can win. (The dove occasionally finds uncontested food.) On the other hand, if every individual followed the dove strategy, a single fighter would gain a lot of food. The evolutionary equilibrium can be shown to involve either (a) part of the population always follows the hawk strategy and the complementary part follows the dove strategy or (b) every individual follows a randomized strategy, sometimes behaving as a hawk and sometimes as a dove. We can definitely rule out a world in which everyone follows the same fixed strategy.
The analogy to market efficiency is immediate: investors compete for the most "undervalued" asset. The hawk strategy is conducting security analysis. The dove strategy is passive investing: expending no effort on information analysis. Clearly, if everyone analyses securities, the benefits will be less than the costs. If everyone is passive, the benefits of analysis will be tremendous. The equilibrium is that some analyze, some don’t. Does it sound familiar? Note that the final equilibrium is characterized by a situation in which it is not worthwhile for the marginal passive investor to begin analyzing nor for the marginal active investor to cease conducting security analysis.
The Borg-Yahoo merger won't work. Here's why. It's like taking the two guys who finished second and third in a 100-yard dash and tying their legs together and asking for a rematch, believing that now they'll run faster.
I'm currently looking at Countrywide (CFC), the largest mortgage lender in the US which, unsurprisingly, has suffered quite a bit from the subprime mess.
According to the google, Countrywide has $14,818.45 million in equity with 576.02 million shares outstanding, which gives a book value of about $25.73 per share.
But, quite a bit of that value is in mortgages which are never going to be collected on. Estimates vary. Countrywide CEO Mozilo claims that only 7% of Countrywide's loans in 2006 were subprime. The Times seems to think things are worse, but because articles go behind a paywall, I can't find the exact figures.
Anyway, Bloomberg thinks 13.4% of mortgages at the end of June were to borrowers likely to default. Let's round that up a bit and assume 20% default for all mortgages.
Then the book value of Countrywide is only $20.58 per share, but it's currently trading at $18.48... so it's trading below a very pessimistic book value. And I'd assume that the stock is going to recover as the market does and business scales back up.
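The back-of-the-envelope numbers can be reproduced in a few lines. This mirrors the shortcut used above of simply knocking the assumed 20% default rate off book value; the variable names are made up.

```python
equity_musd = 14818.45   # shareholder equity, $ millions (figure quoted above)
shares_m = 576.02        # shares outstanding, millions
default_rate = 0.20      # deliberately pessimistic assumption from the text
price = 18.48            # current share price

book_per_share = equity_musd / shares_m
stressed_book = book_per_share * (1 - default_rate)

print(round(book_per_share, 2))   # about 25.73
print(round(stressed_book, 2))    # about 20.58
print(price < stressed_book)      # trading below the pessimistic book value?
```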
Bank of America recently put $2 billion in as a loan at 7.25%, which is convertible to preferred stock at $18 as a safety measure. This makes me think that Bank of America's analysts believe CFC is worth considerably more than $18.
And I read somewhere that Richard Pzena is all mixed up in CFC now too. Though, I can find nothing with the google to substantiate that.
Smer?
It appears that past performance is, in fact, an indicator of future performance.
If there isn't a bug, this is a happy moment. I'm thinking there's a bug.

what is it you do at work exactly?

for that matter, what exactly do I do?
anyone have a linux server with a speedy internet connection and a gig or so of free space that they want to give me a shell account on?
I think at the height it was even more insane than that. I may just get a box at godaddy if no one has anything. Seems weird that none of us are dorky enough to have static IPs and servers any more... or at least a box at some hosting company with shell accounts...
This picture has 7 alone, and I'm pretty sure there were a few in the closet too.
The unprincipled approach to machine learning: use algorithms you've never even heard of in Weka on your 100 MB of stock data...
I'm having another one of those "Unless I've fucked up, I'm going to retire to Kopipi" moments. Let it stay.
Wow... that one only lasted a minute before I realized what I did wrong.
Anyone know how to invoke Matlab from the command line without bringing up the GUI? It seems to have something to do with the -automate flag, but I don't get it.
Right now I'm doing matlab -r computeBeta;exit
This brings up the Matlab window and closes it when done, but is kind of dumb. No one knows, do you? None of you use Matlab... boo.
Update: ...crap. Since it spawns off a new process, my Python keeps running happily while the Matlab runs and bad things happen. bah.
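One way around the runaway-process problem might be to have Python launch Matlab itself and block until it exits. A sketch, assuming a `matlab` executable on the PATH and the `computeBeta` script from above. `-nodesktop`, `-nosplash`, `-r` and (on Windows) `-wait` are Matlab startup flags, but check them against your version; the helper names are made up.

```python
import subprocess
import sys


def build_matlab_command(script, matlab="matlab"):
    """Build the argument list for running one Matlab script without the desktop."""
    cmd = [matlab, "-nodesktop", "-nosplash"]
    if sys.platform.startswith("win"):
        # On Windows the launcher returns immediately unless told to wait.
        cmd.append("-wait")
    # Run the script, then quit so the process actually ends.
    cmd += ["-r", f"{script}; exit"]
    return cmd


def run_matlab(script):
    """Run the script and return Matlab's exit code once it has finished."""
    return subprocess.run(build_matlab_command(script)).returncode


print(build_matlab_command("computeBeta"))
```

Calling `run_matlab("computeBeta")` would then not return until Matlab has exited, so the rest of the Python can safely read whatever the script wrote.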
Weka seems to have gotten a hell of a lot cooler since I last used it (at least 3 years ago it seems). I suggest it for all your ML/data mining needs.
The notation here isn't quite right, but basically I'm asking:
For each time, what symbol maximizes:
p(highPrice(time,symbol)>openPrice(time,symbol)*1.02)
given:
openPrice(0,symbol),... openPrice(time-1,symbol)
closePrice(0,symbol),... closePrice(time-1,symbol)
lowPrice(0,symbol),... lowPrice(time-1,symbol)
highPrice(0,symbol),... highPrice(time-1,symbol)
?
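In code, the question looks something like the sketch below. The labelling (did the high beat the open by 2%?) comes straight from the notation above. The probability estimate is a stand-in: it just uses each symbol's historical hit rate as a naive baseline, where a real attempt would train a classifier on the past bars. The toy data and function names are made up.

```python
def hit(bar, threshold=1.02):
    """True if the day's high exceeded the open by more than 2%."""
    return bar["high"] > bar["open"] * threshold


def hit_rate(bars):
    """Naive estimate of p(hit): the fraction of past days that hit."""
    if not bars:
        return 0.0
    return sum(hit(bar) for bar in bars) / len(bars)


def best_symbol(history):
    """Pick the symbol with the highest estimated probability of a hit."""
    return max(history, key=lambda symbol: hit_rate(history[symbol]))


# Toy history for times 0..time-1; not real prices.
history = {
    "AAA": [{"open": 10.0, "high": 10.5}, {"open": 10.0, "high": 10.1}],
    "BBB": [{"open": 20.0, "high": 20.1}, {"open": 20.0, "high": 20.2}],
}
print(best_symbol(history))
```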
Dear Lazyweb,
I really should have bought a bunch of books on financial modeling when I was in the Cambridge Press bookstore, but I'm dumb. I can't find the books I was looking at on Cambridge's website or on Amazon. I think I need to find a bookstore with a bunch of their books and hopefully great piles of Springer grad texts on math and finance as well so I can sift through and dig out what I need. Foyles was pretty good too, but once again, wrong continent.
These days I'm mostly in New York and Chicago. Surely one of these cities has such a bookstore...
I've already tried: NYU and the big Barnes and Noble textbook store. Columbia is later this afternoon.
Am I allowed to buy books at the UChicago co-op?
Otherwise, I suppose I'm waiting until the next time I make it to Cambridge (for Cambridge Press) or Cambridge (for MIT Press)...
Springer has a big office in or by the Flatiron, but I don't think it has books. Why is it so difficult to get arcane texts on mathematical finance?