
ALA implemented, AOTM eval started

Yay. I implemented Vasconcelos' Asymptotic Likelihood Approximation (ALA) to the KL divergence between GMMs. Then I created a distance matrix for the 414 "tuna.artists" (really the artists in the playola DB after re-ripping), using the ALA on GMMs fit to the anchorspace points. One caveat: I only trained the models with 20 EM iterations because I was impatient, so I'll have to retrain them with more iterations later.
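For reference, here's a minimal Python sketch of the component-matching idea behind the ALA. It is not necessarily Vasconcelos' exact formulation. It assumes diagonal-covariance GMMs stored as lists of (weight, means, variances) tuples, which is a hypothetical representation. E_f[log g] is approximated by letting each component of f be explained by the single best-matching component of g (the small-overlap assumption), and KL(f||g) = E_f[log f] - E_f[log g] applies the same approximation to both terms. Symmetrizing the distance matrix by summing both KL directions is also my assumption.

```python
import math

# Hypothetical GMM representation: list of (weight, means, variances),
# one tuple per component, diagonal covariances only.


def expected_log_gauss(mu_a, var_a, mu_b, var_b):
    """Exact E_{x ~ N(mu_a, var_a)}[log N(x; mu_b, var_b)] for diagonal Gaussians."""
    total = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        total += math.log(2 * math.pi * vb) + (va + (ma - mb) ** 2) / vb
    return -0.5 * total


def ala_expected_loglik(f, g):
    """Approximate E_f[log g]: each component of f is assigned to the one
    component of g (weight included) that gives it the highest expected log-likelihood."""
    result = 0.0
    for w_a, mu_a, var_a in f:
        best = max(
            math.log(w_b) + expected_log_gauss(mu_a, var_a, mu_b, var_b)
            for w_b, mu_b, var_b in g
        )
        result += w_a * best
    return result


def ala_kl(f, g):
    """Approximate KL(f || g) = E_f[log f] - E_f[log g]."""
    return ala_expected_loglik(f, f) - ala_expected_loglik(f, g)


def ala_distance_matrix(models):
    """Symmetrized ALA-KL distance between every pair of artist models."""
    n = len(models)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = ala_kl(models[i], models[j]) + ala_kl(models[j], models[i])
            d[i][j] = d[j][i] = dist
    return d
```

With one component per model, the approximation reduces to the exact closed-form KL between two Gaussians, which makes a handy sanity check.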

Then I compared the distance metric to the AOTM data by plotting the conditional co-occurrence densities (conditioned on each artist), with the other artists sorted by their audio-based ALA distance from the conditioning artist.

Some example results:
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-aguilera.jpg
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-coldplay.jpg
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-abdul.jpg

In the plots, the list of artists in the upper right corner shows the top 5 artists by ALA distance. The top 5 artists by co-occurrence probability are also labeled.
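A rough sketch of how the sorted plot data and the two top-5 lists could be computed. The data layout (`counts[i][j]` = number of AOTM playlists containing both artists i and j, `dist_row[j]` = ALA distance from artist i to artist j) and the definition of P(j|i) as artist i's co-occurrence counts normalized to sum to one are assumptions for illustration, not necessarily what the plots above used.

```python
# Hypothetical data layout: counts[i][j] = number of AOTM playlists that
# contain both artist i and artist j; dist_row[j] = ALA distance from i to j.


def conditional_cooccurrence(counts, i):
    """P(j | i): artist i's co-occurrence counts normalized to sum to 1."""
    row = {j: c for j, c in counts[i].items() if j != i}
    total = sum(row.values())
    return {j: c / total for j, c in row.items()} if total else {}


def sorted_by_distance(cond, dist_row):
    """(artist, P(j|i)) pairs ordered by increasing audio distance from i.
    Artists that never co-occur with i get probability 0."""
    return sorted(
        ((j, cond.get(j, 0.0)) for j in dist_row),
        key=lambda pair: dist_row[pair[0]],
    )


def top_k(scores, k=5, largest=False):
    """Top-k keys by score: smallest first by default (distances),
    largest first with largest=True (probabilities)."""
    return sorted(scores, key=scores.get, reverse=largest)[:k]
```

Plotting the probabilities from `sorted_by_distance` in order gives the kind of curve shown in the example plots; `top_k(dist_row)` and `top_k(cond, largest=True)` give the two labeled top-5 lists.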

I think I need to normalize these to remove the popularity effect. If we really want to compare the distance-based ranking with the co-occurrence-based ranking, which is essentially what this plot does, then I should normalize the probabilities by popularity. Otherwise, for example, Radiohead often has a high probability, which doesn't necessarily mean that Radiohead is similar to the conditioning artist.
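One possible popularity normalization (an assumption, not a settled choice) is the lift P(j|i) / P(j), where P(j) is artist j's share of all co-occurrences. A popular artist like Radiohead then scores above 1 only if it co-occurs with artist i more often than its overall popularity predicts. Here `counts[i][j]` is a hypothetical symmetric dict of playlist co-occurrence counts with no self-counts stored, and `popularity_normalized` is a hypothetical helper name.

```python
def popularity_normalized(counts, i):
    """Lift P(j|i) / P(j) for every artist j that co-occurs with artist i.

    counts[i][j]: hypothetical symmetric co-occurrence counts, no self-counts.
    P(j|i) = counts[i][j] / (row total of i)
    P(j)   = (column total of j) / (sum of all counts)
    Lift > 1 means j appears with i more often than j's popularity predicts.
    """
    col_totals = {}
    grand_total = 0
    for row in counts.values():
        for j, c in row.items():
            col_totals[j] = col_totals.get(j, 0) + c
            grand_total += c
    row_total = sum(counts[i].values())
    return {
        j: (c / row_total) / (col_totals[j] / grand_total)
        for j, c in counts[i].items()
        if c > 0
    }


# Toy example: "r" plays the role of a very popular artist, "s" a niche one.
toy = {
    "x": {"r": 4, "s": 2},
    "r": {"x": 4, "s": 2, "y": 10},
    "s": {"x": 2, "r": 2},
    "y": {"r": 10},
}
lift = popularity_normalized(toy, "x")
# "r" co-occurs with "x" twice as often as "s" does, but is much more popular
# overall, so "s" ends up with the higher lift (about 3.0 vs 1.5).
```

The lift is the exponential of pointwise mutual information, so taking its log is an equivalent choice if a symmetric, zero-centered score is more convenient.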

next:
- musicseer eval on ALA
- try to quantify this AOTM eval. I'm not sure about the fitting-the-exponential idea I had before; there's really not much reason to believe it should behave exponentially. What I really want, I guess, is a ranking comparison between the distance-based ranking and the conditional-density ranking, normalized by popularity as mentioned above.
- retrain the GMMs with more EM iterations
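For the ranking comparison, one option is Spearman's rank correlation between the distance ranking (ascending) and the popularity-normalized co-occurrence ranking (descending), plus the overlap between the two top-k lists. This is a sketch assuming `dist_row` and `lift` are dicts keyed by artist for a single conditioning artist; all function names are hypothetical.

```python
import math


def ranks(values):
    """Rank positions (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            r[k] = avg
        pos = end + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors.
    Undefined (division by zero) if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def ranking_agreement(dist_row, lift):
    """Spearman's rho between audio similarity and normalized co-occurrence,
    over artists present in both. Distances are negated so that 'more similar'
    means 'larger' in both lists; rho = +1 is perfect agreement."""
    common = sorted(set(dist_row) & set(lift))
    return spearman([-dist_row[j] for j in common], [lift[j] for j in common])


def top_k_overlap(dist_row, lift, k=5):
    """Fraction of the k nearest artists (by distance) that are also among
    the k highest-lift artists."""
    near = set(sorted(dist_row, key=dist_row.get)[:k])
    cooc = set(sorted(lift, key=lift.get, reverse=True)[:k])
    return len(near & cooc) / k
```

Averaging rho (or the top-k overlap) over all 414 conditioning artists would give a single number per distance measure, which could also make it easier to compare GMMs trained with different numbers of EM iterations.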
