Road to PhD: May 2010

After one term of no blogging, I'm still feeling too lazy to write stuff down. Even today, as I gather the will to write something, facing the huge window of my 48th-floor condo with its view of Lake Ontario and the CN Tower, I really don't feel like going through the pain of formatting the following math stuff in LaTeX...

For this work term, I had the choice between being a "Financial Software Developer" at Bloomberg, NY, or being a "Quantitative Financial Engineer" at TD Securities, TO, in the Equity Derivatives group. I ditched Bloomberg. Best choice ever.

The first day at TD Securities was pretty awful. Only one of my two bosses came in to work, and he was the one introducing me around. But since he was kind of socially awkward, I didn't really get to know anybody. Very quickly we sat down at our desks, with me not even knowing where the washrooms or the kitchen were. The environment is pretty nice. Not for everyone, though: it's a completely open space on the trading floor, with phones ringing, traders screaming, and beeping here and there. I sat in front of my dual screens, probably 24 inches each, with two other screens for a Bloomberg terminal.

My boss, W, introduced me to the whole software framework: base code in C++, with wrappers to Excel, Murex, Polaris, and I don't know what else. Without any background in "industrial coding", I looked incredibly stupid listening to W and simply nodding. He showed me various APIs, version control with WinCVS, debugging and all those features in Visual C++, and WinMerge for diffing code. Holy crap, that was information overload... Good thing that, after a few hours of me clicking around the codebase, he talked to me about something slightly more recognizable. He showed me the problem the last coop was working on, maybe three weeks before I came in:

Every valid correlation matrix is positive semidefinite, and the ones we care about are positive definite. But with inevitable errors in the data, or maybe someone fiddling with the data for experimentation, a "correlation matrix" doesn't always turn out to be positive definite. What do we do in those cases?

Since the set of correlations generated a non positive definite matrix, the last coop tried to find the maximal subset of correlations such that the resulting matrix was still positive definite. I didn't go through his code in depth, but it seems that he was running a Cholesky decomposition on a larger and larger set of correlations, and when it didn't work (a symmetric matrix has a Cholesky decomposition iff it is positive semidefinite, and one with strictly positive pivots iff it is positive definite) he would swap out the last stock and try another set, until he couldn't add any more correlations to the matrix.
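For the record, here is roughly what a Cholesky-based check looks like, as a minimal Python sketch rather than the coop's actual C++ (which I didn't read in depth). The names is_positive_definite and greedy_subset and the tolerance tol are made up for illustration, and the greedy loop is a simplification: it just skips a stock that breaks the factorization instead of swapping candidates around.

```python
import math

def is_positive_definite(m, tol=1e-12):
    """Try a Cholesky factorization m = L L^T; fail as soon as a pivot is <= tol."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:          # non-positive pivot: not positive definite
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

def greedy_subset(corr):
    """Add stocks one at a time; keep a stock only if the submatrix stays positive definite."""
    kept = []
    for i in range(len(corr)):
        trial = kept + [i]
        sub = [[corr[a][b] for b in trial] for a in trial]
        if is_positive_definite(sub):
            kept = trial
    return kept

# Hypothetical toy data: three stocks whose pairwise correlations contradict each other.
bad = [[1.0, 0.9, -0.9],
       [0.9, 1.0, 0.9],
       [-0.9, 0.9, 1.0]]
print(is_positive_definite(bad))  # False
print(greedy_subset(bad))         # [0, 1]
```

The result depends on the order in which the stocks are tried, which is exactly why this kind of search says nothing about "why" the matrix fails.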

For the rest of the day I worked on this problem. I kept the idea of trying to find the maximal subset that still generates a positive definite matrix, but I found the way he was searching for such a subset incredibly brute-force-stupid, and it wouldn't give any insight into "why" the matrix was not positive definite.

Second day comes. My other boss D, the one who interviewed me, arrives. He says hi and goes on working at his desk, next to W who is next to me. Great... I continue trying to familiarize myself with the code for another 3-4 hours, have lunch at my desk, and move on to the correlation problem.

After finding inspiration from teh internetz, and fiddling around with Matlab, I came up with an intuitive method.

Given a non positive definite correlation matrix S (symmetric, 1s on the diagonal, off-diagonal entries in (-1,1), but with at least one negative eigenvalue), we diagonalize it as S = QLQ*, where L is the diagonal matrix of real eigenvalues and Q is orthogonal. This is possible because S is real symmetric. Let L' = L, except where L(i,i) is smaller than 0; for those entries we let L'(i,i) = eps, a small positive number. Now, by construction, S' = QL'Q* is positive definite. Also, up to the eps, S' is a closest positive semidefinite matrix to S in the operator norm (and the closest one in the Frobenius norm).

Now we normalize S', which is a covariance matrix, to get the new correlation matrix S'': S''(i,j) = S'(i,j) / sqrt(S'(i,i) S'(j,j)). Because of the normalization, S'' is not the closest valid correlation matrix to S, but it seems to be close enough when S is not too badly behaved. There is a trade-off between mathematical rigour and practicality.

With this new valid correlation matrix S'', we look at the squared entry-wise difference with S, and plot it. This plot seems to show spikes of large differences in the rows/columns of certain stocks. Some correlations stand out by a large amount and are usually the "cause" of the non positive definiteness. So we can take out the stocks associated with those problematic correlations and check whether the resulting matrix is positive definite.

What I have found experimentally is that by removing such stocks, the matrix really does become positive definite. With a 90-stock sample, 5 stocks showed high spikes in the difference plot. Removing those 5 made the matrix positive definite, and re-adding any one of them made it non positive definite. Of course this doesn't show that this is the maximal subset generating a positive definite matrix, but it is a nice step away from being clueless, and it is definitely not a brute-force algorithm.
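The steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the C++ I ended up writing: jacobi_eigh is a textbook cyclic Jacobi rotation scheme standing in for whatever diagonalization routine you have, and the names repair_correlation and stock_scores, the default eps, and the 4-stock toy matrix are all hypothetical. One small deviation from the description: the sketch lifts every eigenvalue below eps (not only the negative ones) up to eps.

```python
import math

def jacobi_eigh(a, sweeps=50, tol=1e-24):
    """Diagonalize a real symmetric matrix with cyclic Jacobi rotations.
    Returns (eigenvalues, q), where column k of q is the k-th eigenvector."""
    n = len(a)
    a = [row[:] for row in a]                      # work on a copy
    q = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:                              # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                diff = a[r][r] - a[p][p]
                sign = 1.0 if diff >= 0 else -1.0
                # Angle with |theta| <= pi/4 that zeroes a[p][r].
                theta = 0.5 * math.atan2(sign * 2.0 * a[p][r], sign * diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                 # A <- A J
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):                 # A <- J^T A
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
                for k in range(n):                 # Q <- Q J
                    qkp, qkr = q[k][p], q[k][r]
                    q[k][p], q[k][r] = c * qkp - s * qkr, s * qkp + c * qkr
    return [a[i][i] for i in range(n)], q

def repair_correlation(s, eps=1e-8):
    """S -> S'': clip small/negative eigenvalues to eps, rebuild, renormalize."""
    n = len(s)
    lam, q = jacobi_eigh(s)
    lam = [max(x, eps) for x in lam]               # L'
    cov = [[sum(q[i][k] * lam[k] * q[j][k] for k in range(n))
            for j in range(n)] for i in range(n)]  # S' = Q L' Q^T
    d = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]

def stock_scores(s, fixed):
    """Row sums of the squared entry-wise differences between S and S''."""
    return [sum((x - y) ** 2 for x, y in zip(r1, r2)) for r1, r2 in zip(s, fixed)]

# Hypothetical toy data: stock 3 is strongly tied to stock 0 and against stock 1.
S = [[1.0, 0.5, 0.5, 0.95],
     [0.5, 1.0, 0.5, -0.95],
     [0.5, 0.5, 1.0, 0.0],
     [0.95, -0.95, 0.0, 1.0]]
fixed = repair_correlation(S, eps=1e-6)
print(min(jacobi_eigh(S)[0]) < 0, min(jacobi_eigh(fixed)[0]) > 0)
print(stock_scores(S, fixed))
```

Plotting stock_scores (or the full matrix of squared differences) for the real data is what produces the spikes described above.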

I got to this point around 4pm on Tuesday and immediately informed D. After my short presentation, he seemed very satisfied and excited. He wanted me to code it up immediately in C++ and make a new Excel function for the traders to use. So he spent the next two and a half hours setting up my workspace and showing me how to compile stuff and test it in Excel (wtf.. if I hadn't come up with those results, how would I even have started working with nothing set up?). It was a good day.

Third day, I start to actually fiddle with the code, modifying stuff and learning how a big project is organized. I start writing my newly designed test and I obviously encounter a myriad of little problems, each of which teaches me something new about programming. Weirdly enough, I had to write my own matrix multiplication function in the matrix class, and I had to find a diagonalization method online and copy it into our code (diagonalization is not as simple as it looks in theory, especially when you try to account for time and memory efficiency, and all the nitty-gritty details).

Thursday was great. Thursday, May 6th, 2010: the greatest intraday drop in the Dow, due to some error, apparently by some unknown trader entering billion instead of million. PG spiked down by 30%, and various other smaller stocks plummeted to less than one measly cent and bounced right back up. That trader has probably lost his job and ended his career by now, as the SEC decided to intervene by cancelling trades that made/lost more than 60%. I continued working on my code in the midst of traders screaming almost as loud as in the movies and telephones ringing endlessly. Magical experience.

Friday I finish coding with all the output formatting, documentation, clean up, etc. At the end of the day I do some proofreading on some methodology documentation. D tells me I'll present this to the traders next Tuesday. Cool.

It's raining outside now.


Saturday, May 8, 2010

First week at TD Securities
