Friday secrets

"Harry, I’m going to let you in on a little secret. Every day, once a day, give yourself a present. Don’t plan it, don’t wait for it, just let it happen. It could be a new shirt at the men's store, a catnap in your office chair or two cups of good hot black coffee."

2013-03-08 18.14.00

The most awesome terminal bash command!

So Matlab has this really nice feature that if you type the start of a command in the terminal window and then press the up button you can browse through previous commands containing that text. This is a really nice feature when changing commands often. The terminal should really have this feature it would make things much less unwieldy. Thankfully it exists! After some search I found this post. It turns all you have to do is add the following lines to your ~/.inputrc file(if it don't exist create it)

## arrow up
## arrow down

Restart the terminal. Now enter for example cd and press the up and down arrows to browse through your command line history on that command. Pretty awesome.

A short explanation / tutorial of the Kalman filter

In my list of methods to explain I have now come to the Kalman filter. This is a short summary of the tutorial of Welch and Bishop which can be found here. I have been reading up on pose recognition and a lot of people seem to be using it to track parts moving across the screen, so I needed to learn a tiny bit about it. The basic idea of the Kalman filter is that \(x\) depends on its previous value, \(x_{t-1}\) plus some control input from the previous step, \(u_{t-1}\) and some noise from the previous step, \(w_{t-1}\) in a linear way. This can be written as:

\(x_t = A x_{t-1} + B u_{t-1} + w_{t-1}\)

where \(A\) and \(B\) describe the dependencies / differences between the new state and the previous state and control input. We also have some measurements of the \(x\) variable which we call \(z\). \(z\) is a linear function of \(x\) plus some noise and we write it as

\(z_{t} = H x_{t} + v_{t}\)

The two noise terms, the process noise and measure noise is assumed to be normally distributed:

\(w\sim N(0,Q)\)
\(v\sim N(0,R)\)

Having determined our variables, and what kind of assumptions that are needed, we need to understand how our measurement, \(z\) can help our estimate of \(x\).

We start out by defining the á priori and posteriori state error. If we let \(\hat{x_{t}^{-}}\) be the á priori estimate(before the measurement) and \(\hat{x_{t}}\) be the posteriori estimate(after the measurement) we can write the error in the estimates as follows

\(e^{-}_{t} = x_{t}-\hat{x_{t}^{-}}\)
\(e_{t} = x_{t}-\hat{x_{t}}\)

The error estimate covariance is now simply

\(P_{t}^{-} = E[e_{t}^{-}e_{t}^{-T}]\)
\(P_{t} = E[e_{t}e_{t}^{T}]\)

We can let our final estimate be a linear combination of the two, the á priori and the posteriori estimate. By doing that we have incorporated the extra knowledge gained by having access to \(z\). Our estimate or update equation is now

\(\hat{x}_t = \hat{x_{t}^{-}} + K (z_{t}-H\hat{x_{t}^{-}} )\)

The subtraction term is apparently called innovation or residual and basically it reflects our belief of the discrepancy between the á priori estimate and the measurement. \(K\) here is the gain or blending factor which influences how much influence each part has. The question is how do we choose \(K\)? The idea here is that we want to choose the \(K\) that minimizes the covariance of the posteriori estimate, by doing so we get the most reliable range estimate. To minimize the covariance we plug the linear combination into the covariance estimate, calculate the expectations and then take the derivative of the trace w.r.t \(K\) and setting to zero and then solving. Doing this we can get an expression for \(K\) that looks like this

\(K_t = P_{t}^{-} H^{T} ( H P_{t}^{-} H^{T} + R )^{-1}\)

By taking the limit of \(R\), the measurement noise covariance, when it goes to zero we get \(K = H^{-1}\), this means that gain weighs the residual more heavily since our measurements practically are noise free. So \(R\) says something about our belief about how good our measurements are. Now if the á priori estimate of the covariance \(P^{-}_{t}\) goes to zero then \(K\) also moves towards zero meaning that our á priori estimate is pretty much perfect.

That being said we can move on to the structure of the Kalman algorithm

Á priori estimates
\(\hat{x_t^{-}} = A \hat{x_{t-1}} + B u_{t-1} + w_{t-1}\)
\(P^{-}_{t} = A P_{t-1} A^{T} + Q\)

Posteriori estimates
\(K_t = P_{t}^{-} H^{T} ( H P_{t}^{-} H^{T} + R )^{-1}\)
\(\hat{x}_t = \hat{x_{t}^{-}} + K (z_{t}-H\hat{x_{t}^{-}} )\)
\(P_{t} = (I - K_{t} H )P^{-}_{t}\)

Here is a simple implementation of a Kalman filter: kalmansim

So to summarize the Kalman filter: the basic idea is that the current state is linearly dependent on the previous state plus some control sequence and that there is a correlation of the new state with a measurement we procured of that state. The estimate of the new state can be written as a linear combination of a á priori estimate and posteriori estimate of the state. That is basically all the assumptions needed to derive the equations above it seems. Thanks Kalman.

On payoff matrices, decision rules, confusion and affinity matrices

Simple post today. Just some definitions and explanations of stuff that is pretty familiar to most people.

Payoff / Loss Matrices
Payoff matrices describe the reward for choosing a particular action and loss matrix just describes the loss related to choosing a particular action given a certain state. The most famous payoff matrix I guess is the prisoners dilemma matrix. Two criminals are caught. If they keep quiet they both get say 5 years, but if one of them tells on the other he gets 3 years and the other 10. If both tell on each other then they both get 8 years. To visualize the possible outcomes one can write them in a matrix

(Criminal1, Criminal2) Keeps quiet Tells
Keep Quiet (5,5) Years (10,3) Years
Tell (3,10) Years (8,8) Years

It is easy to see from this matrix that the best strategy on average is to tell on the companion, this is called a decision rule. There are several other ways of choosing a decision rule depending on the wanted outcome. One problem with the payoff matrix is that it doesn't take into account the possible repercussions that might happen if you snitch on your friend. One other interesting perspective comes when one considers the regret instead of the maximum payoff. All of a sudden the strategy becomes a minimization of the regret of all the outcomes. In the prisoner dilemma that would imply that we would incorporate the dilemma of being labeled a snitch etc. The minimization thus reflects a more conservative strategy which seems more reasonable in the terms of investing decision etc.

In a machine learning setting the payoff matrix is usually extended to a payoff or a loss function that reflects nonlinear effects and that factors other effects as well i.e. sums of non-linear functions. The functions are usually based upon statistics of previously seen data. In Bayesian generative modeling the payoff matrix or function would be the posterior probability and the decision rule would be the maximum or the average of the posterior.

Affinity Matrix / Similarity Matrix
An affinity/similarity matrix is matrix that describes the relation between all objects in a set. The matrix is symmetric and the relation between the objects in the set is described by the affinity function. The affinity function can be any function that takes as input any two objects from the set. Typical functions could be the exponential of the Euclidian distance or the covariance i.e. usually some metric. The affinity can be though of as a complete graph where each matrix entry describe the relation between the nodes in the graph.

Confusion Matrix
From Wikipedia "In the field of artificial intelligence, a confusion matrix is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another)."

What is a Sensory Motor Map - SMM

In robotics one often deal with combining visual input and motor kinematics. A Sensory Motor-Map(SMM) is a map between perception and action such that the robot gets an understanding of how certain motor actions affect the perceived reality. Simply put given some visual input the map gives the kinematics and dynamics to generate the given visual. Here is the Wikipedia page I created, hoping it will not get deleted

This post is part of my explaining concepts that surface in my PhD.

Two neat LaTex tricks that will make life easier

1. Typesetting modularized LaTex files

So you modularized your really long tex file, using includes or input. You find it a bit annoying that every time you change a chapter you have to typeset the main file, you can't just press the typeset button on the chapter file. It turns out that if you just put the following on the first line in your sub chapter file

%!TEX root = ./main.tex

where root equals the relative path to your main.tex file, then when typesetting the sub chapter file TexShop will update the whole file. Pretty neat!

2. Removing all those pesky help files that Latex creates when typesetting

LaTex outputs a lot of annoying extra files when typesetting. Especially if you have many chapters in different files it can quite annoying trying to find the file you want to edit, especially if you are using MacOS Finder... There is a neat solution to this that can be found here. What you do is add the following option when typesetting, which will tell TexShop to output to a sub directory.


The line can be added under the engine tab in TexShop. The directory must already exist for it to work. And the if you are using pdflatex the pdf file will of course be in the subdirectory.

The Trace Trick for Gaussian Log Likelihood

Maybe you have seen something like this when observing the log likelihood derivations for multivariate Gaussians

\( \ln p(X|\mu, \Sigma) = \frac{1}{2}\ln|\Sigma|- \frac{1}{2}X^{T}\Sigma^{-1}X + const = \frac{1}{2}\ln|\Sigma| - \frac{1}{2}Tr(\Sigma^{-1}XX^{T}) + const \)

and you wondered where that \(Tr\) came from. Here you can find a great explanation but I thought I would write it down for myself as well.

The exponential term in the multivariate Gaussian can be rewritten with a trace term i.e.

\(x^{T}\Sigma^{-1}x = tr(\Sigma^{-1}) x x^{T})\)

This comes from two neat properties of the trace

  • \(tr(AB) = tr(BA)\) if all dimensions work out.
  • \(tr(c) = c\) when \(c\) is a constant i.e. a \(1\times1\) matrix.

Looking at the exponential term in the Gaussian we realize that it is just a \(1 \times 1\) matrix meaning that we can write

\( x^{T}\Sigma^{-1}x = tr(x^{T}\Sigma^{-1}x) = tr(\Sigma^{-1} x x^{T})\)

Calculating the indices for n choose k

So if you want to calculate n choose k, or n over k or all k different combinations of n unique items or binomial coefficients in Matlab or in math language

\[\frac{n!}{(n-k)! k!}\]

you can write


But lets say you want to generate a list of these possible combinations? Well then you just write n as a vector

n = [1, 2, 3, 4] = [1:4]

so this would generate using \(k = 2\)

nchoosek(1:4,2)[1 21 31 42 32 43 4]

or why not do

nchoosek([1,4,10,17],2) = [1 41 101 174 104 1710 17]

Awesome discovery at 8.00 pm!

Using cricket connoisseurs and pint drinkers to explain Bayes' Theorem

In a teaching course I am taking I had to do a ten minute talk about anything, but it should be related to my research. The people listening to the talk was a mix of PhD students from different departments at my university. So I had to keep it simple enough so even people with weak mathematical backgrounds could understand what I was talking about. Most of the people did some slides explaining why they are interested in the subject they are doing their PhD in, which is fine but is sort of the easy path out. I thought about doing the same at first but figured a more fun exercise would be to show a proof on the black board. It is not much you can do in ten minutes so I settled for indirectly formulating Bayes' Theorem, so here goes:

Cricket players and pints

Imagine you are an advertising person for the cricket federation and you wanna figure out if a possible advertising campaign for getting people to attend cricket games more could target people at pubs. And so you want to find out the probability of people who like a pint at the local pub also liking a game of cricket. Sure you could just do a survey, but surveys cost money and you just want to find an initial guess as to wether this would be an interesting survey to carry out. So we if we write down the two configurations we like to research/model:

A = I like cricket.
B = I like beer.

Now what we want to find out is:

P(A|B) = The probability of a person liking cricket given that he/she likes beer

We realize that going around asking everyone if they like beer and then in turn asking them if they like cricket would be quite cumbersome. There is lots of pubs in Britain! What if we could ask people that like cricket if they like beer? That is a much smaller subset since we could just ask people at games if they like beer taking for granted that they fancy cricket. We could even figure it out by doing something with the sales figures for beers at the games. So now we have

P(B|A) = The probability of a person liking beer given that he/she likes cricket

Now if we stop and ask ourself what is it that we are trying to calculate here? Lets draw the two possible events as intersecting circles(a Venn diagram) like this:

So actually what we are trying to find out is what subset of the beer circle that likes cricket i.e. the intersection in the Venn diagram. With (Venn) digrams one has to be careful about what they really mean. They are more meant as a visual aid, for example we could also easily equally correct interpret the intersection as the subset of the cricket likers that also like beers. But in our model we already knew who liked beers so we are dealing with the probabilities over the population that likes beers.

Now that we have some idea about what we really want we could try to be a bit more specific. Had we had some numbers we would have written for the probability of the intersection

P(A,B) = Number of people who like beer and cricket / Total population

And for the probability of beer likers we could write

P(B) = Number of people that likes beers / Total population.

Now if we want to formulate the conditional probability P(A|B) we could write it as

P(A|B) = (Number of people who like beer and cricket / Total population) / (Number of people that likes beers / Total population)

So the total population cancels out in the denominator and numerator for P(B|A). So what we need to write is actually just

P(A|B) = P(A,B) / P(B)

So we are almost there, the only question is how do we model P(A,B)? P(A|B) was costly to model so P(A,B) should not be much cheaper. But what about P(B|A), that was less costly. Could we rewrite P(A,B) in terms of P(B|A)? It turns out that we can! If we write it out

Number of people who like beer and cricket / Total population = (Number of people that like beer that also like cricket / Number of people that likes cricket) * ( Number of people that likes cricket / Total population)

So basically what we are saying here is that given our total population the probability that people like both beers and cricket is the same as that of the probability that people who like cricket also likes beer multiplied by the probability that person likes cricket. We could view this as an elimination process - first we pick out the cricket likers from the total population and then from these we pick out the beer likers. Formally we write

P(A,B) = P(B|A) P(A)

And so finally if we put it all together we get

Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)

And we now have a notion of the probability of a person liking cricket given that he/she likes beer and wether not we should carry out that survey we wanted to evaluate.


Debugging in Matlab

Getting in to someones code or just cleaning up your own Matlab code it's good to know how to do break points and other useful to make the process flow a bit smoother. I thought I would do a list of my favorites for making my code work faster and with less tedious work.

The best debug command in Matlab: dbstop if error 

Scenario: Some error happens in some file that is called by the main program which causes it to terminate with some cryptic error about some of the variables.

Since you probably modularized your code your usual procedure would be to to a disp command for the specific variable or set up a break point. This is oh so tedious but what if you could just acces the variable in the code or other variables that are in the scope at the moment!? You can with: dbstop if error. Just run it before executing your code and it will put you in the debug mode. The program will stop with the workspace full of the variables available when the error happened and you can easily tinker around with them to see what was the real problem in your implementation. After you are done just run dbquit in the command window and you will exit debug mode.

The Try-Catch statement

Matlab actually has a try-catch statement extremely useful when reading files or any other type of error prone procedure the statement looks like this

  // Your code here
catch ERROR
  // Catching the error. ERROR contains the error information
  disp(ERROR); // Display the error object information.

Where the ERROR is he ERROR object (you can call it what you want like THEBIGERROR or something else).

Setting breakpoints

In Matlab you can also set break points meaning that the code will stop at the point where you set the break point and let you access the variables available in the workspace at that particular point in time. You can even set conditional breakpoints which could be very useful in loops or value testing.  Here are the most useful commands links go to the Matlab manual.

dbclear Clear breakpoints
dbcont Resume execution
dbdown Reverse workspace shift performed by dbup, while in debug mode
dbquit Quit debug mode
dbstack Function call stack
dbstatus List all breakpoints
dbstep Execute one or more lines from current breakpoint
dbstop Set breakpoints for debugging
dbtype List text file with line numbers
dbup Shift current workspace to workspace of caller, while in debug mode

(dbstep nlines - steps through code execution nlines at a time)

Other useful commands

lasterror - Gives an object with the error message, identifier and the stack from the last error occurred.

smart indent - just clears up the formatting of your code indenting the code in proper Matlab style.