ALA implemented, AOTM eval started
Yay. I implemented Vasconcelos' Asymptotic Likelihood Approximation (ALA) to the KL divergence between GMMs. Then I built a distance matrix for the 414 "tuna.artists" (really the artists in the playola DB after re-ripping) by applying the ALA to GMMs fit to the anchorspace points. One caveat: I only trained the models for 20 EM iterations because I was impatient, so I'll have to retrain them with more iterations later.
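For reference, here is a minimal Python sketch of a matched-component KL approximation between two GMMs, which is the kind of approximation the ALA is built on, together with a symmetrised distance matrix over artists. It assumes diagonal covariances and matches each component of P to the component of Q that minimises KL(P_i || Q_j) - log w_j. Vasconcelos' exact matching criterion may differ. The names `ala_kl` and `distance_matrix` and the symmetrisation are illustrative assumptions, not the code used here.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N_p || N_q) for diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def ala_kl(gmm_p, gmm_q):
    """Matched-component approximation of KL(P || Q) between two GMMs.

    Each GMM is a tuple (weights, means, variances) with shapes
    (K,), (K, D), (K, D), i.e. diagonal covariances (an assumption).
    Component i of P is matched to the component j of Q minimising
    KL(P_i || Q_j) - log w_q[j]; this matching rule is an assumption.
    """
    w_p, mu_p, var_p = gmm_p
    w_q, mu_q, var_q = gmm_q
    total = 0.0
    for i in range(len(w_p)):
        # KL from component i of P to every component of Q.
        kls = np.array([kl_diag_gauss(mu_p[i], var_p[i], mu_q[j], var_q[j])
                        for j in range(len(w_q))])
        j = np.argmin(kls - np.log(w_q))
        total += w_p[i] * (kls[j] + np.log(w_p[i] / w_q[j]))
    return total

def distance_matrix(gmms):
    """Symmetrised (hypothetical choice) ALA distance for every pair of GMMs."""
    n = len(gmms)
    dist = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            d = ala_kl(gmms[a], gmms[b]) + ala_kl(gmms[b], gmms[a])
            dist[a, b] = dist[b, a] = d
    return dist
```

Since KL is asymmetric, some symmetrisation (here a plain sum of both directions) is needed before the result can be used as a distance matrix.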
Then I compared the distance metric against the AOTM data: for each artist, I plotted the conditional co-occurrence density (how likely each other artist is to co-occur with it), with the other artists sorted by their audio-based ALA distance.
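The sorting step can be sketched as follows, assuming a hypothetical (N, N) count matrix `cooc` where `cooc[i, j]` counts AOTM playlists containing both artists i and j, and a precomputed distance matrix `dist`. The function name and data layout are assumptions for illustration.

```python
import numpy as np

def sorted_conditional_density(cooc, dist, artist):
    """Return (order, p_cond) for one conditioning artist.

    cooc   : (N, N) co-occurrence counts (hypothetical layout).
    dist   : (N, N) audio-based distance matrix, e.g. ALA.
    artist : index of the conditioning artist.
    order lists the other artists from nearest to farthest in audio
    distance; p_cond[k] is p(order[k] | artist) from co-occurrence.
    """
    counts = cooc[artist].astype(float)
    counts[artist] = 0.0  # ignore self co-occurrence
    p = counts / counts.sum()
    others = np.array([j for j in range(len(p)) if j != artist])
    # Stable sort so equal distances keep their original index order.
    order = others[np.argsort(dist[artist, others], kind="stable")]
    return order, p[order]
```

Plotting `p_cond` against position then gives the kind of curve described above; the top 5 by ALA distance are `order[:5]`, and the top 5 by co-occurrence are the entries of `order` with the largest `p_cond`.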
Some example results:
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-aguilera.jpg
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-coldplay.jpg
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-abdul.jpg
In the plots, the list of artists in the upper right corner shows the top 5 artists by ALA distance. The top 5 artists by co-occurrence probability are also labeled.
I think I need to normalize these to remove the popularity effect. If we really want to compare the distance-based ranking with the co-occurrence-based ranking, which is essentially what this plot does, then I should normalize the probabilities by popularity. Otherwise, e.g., Radiohead often has a high probability, which doesn't necessarily mean that Radiohead is similar to the conditioning artist.
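One possible normalization, sketched here as an assumption rather than a settled choice, is to divide p(j | i) by the overall popularity p(j), estimated from the column totals of the same hypothetical count matrix (a lift or PMI-style ratio). A very popular artist like Radiohead then scores above 1 only where it co-occurs with artist i more often than its popularity alone would predict.

```python
import numpy as np

def popularity_normalized(cooc):
    """Return lift[i, j] = p(j | i) / p(j) from a co-occurrence count matrix.

    Using the column-sum marginal for p(j) is an assumption. Artists with
    no co-occurrences at all would cause a division by zero (nan/inf).
    """
    counts = cooc.astype(float)
    np.fill_diagonal(counts, 0.0)                          # drop self pairs
    p_cond = counts / counts.sum(axis=1, keepdims=True)    # p(j | i)
    p_marg = counts.sum(axis=0) / counts.sum()             # p(j)
    return p_cond / p_marg
```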
Next:
- musicseer eval on ALA
- try to quantify this AOTM eval. I'm not sure about the fitting-the-exponential idea I had before; there's really no strong reason to believe it should behave exponentially. What I really want, I guess, is a ranking comparison between the distance-based ranking and the conditional-density ranking, normalized as mentioned above.
- retrain the GMMs with more EM iterations
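For the ranking comparison, one option (an assumption, not something decided in this post) is Spearman's rank correlation between the audio-distance ordering and the normalized co-occurrence ordering for each conditioning artist, alongside a simple top-k overlap. Both rows passed in should exclude the conditioning artist itself; the helper names are hypothetical.

```python
import numpy as np

def ranks(x):
    """Rank positions 0..n-1 of x in ascending order (ties broken by position)."""
    r = np.empty(len(x))
    r[np.argsort(x, kind="stable")] = np.arange(len(x))
    return r

def spearman_rho(dist_row, score_row):
    """Spearman correlation between 'nearest in audio' and 'most co-occurring'.

    Distances are ranked ascending and co-occurrence scores descending, so
    rho = 1 means the two orderings agree exactly. Ties are not averaged,
    which is a simplification.
    """
    a = ranks(np.asarray(dist_row, float))
    b = ranks(-np.asarray(score_row, float))
    n = len(a)
    d2 = np.sum((a - b) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def top_k_overlap(dist_row, score_row, k=5):
    """Fraction of the top-k audio neighbours that are also top-k by co-occurrence."""
    near = set(np.argsort(np.asarray(dist_row, float), kind="stable")[:k])
    cooc = set(np.argsort(-np.asarray(score_row, float), kind="stable")[:k])
    return len(near & cooc) / k
```

Averaging either score over all conditioning artists would give a single number per distance measure, which makes it easy to compare, say, the 20-iteration GMMs against retrained ones.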
Posted by: Anonymous | March 21, 2003 11:56 PM