another (minor) disaster
But why are the results for the old stuff (erdos, opennap, n2) different? In particular, n2 is now worse than random!?
I figured out the first part. The numbers for erdos etc. changed when I replaced the buggy topset-400 id mapping, and the reason was another bug in the scoring code. When the code saw a response that it didn't have a sim value for, it was returning from the function Response() instead of "next'ing" to the next SIM-type for the same user judgment. So if erdos came after one of the ank14 things, for example, any judgment that the earlier SIM-type had no sim value for got ignored for erdos too. Now that it's fixed, the numbers look like the numbers from the old "Quest for ground truth" paper, which is good.
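To make the return-vs-next difference concrete, here is a minimal sketch in Python. It is not the actual scoring code: the function names, the data layout (a dict of sim values per SIM-type, keyed by item pair) and the toy data are all assumptions made up for illustration. It only counts how many judgments each SIM-type gets scored on.

```python
# Hypothetical reconstruction of the bug; none of these names come from the real code.

def response_buggy(pair, sims, sim_types, counts):
    """Handle one user judgment, with the early-return bug."""
    for sim_type in sim_types:
        if pair not in sims[sim_type]:
            return  # bug: drops every later SIM-type for this judgment
        counts[sim_type] += 1


def response_fixed(pair, sims, sim_types, counts):
    """Handle one user judgment, skipping only the SIM-type with no value."""
    for sim_type in sim_types:
        if pair not in sims[sim_type]:
            continue  # the equivalent of "next" in the loop over SIM-types
        counts[sim_type] += 1


def count_scored(handler, judgments, sims, sim_types):
    """Return how many judgments were actually scored for each SIM-type."""
    counts = {t: 0 for t in sim_types}
    for pair in judgments:
        handler(pair, sims, sim_types, counts)
    return counts


# Toy data: ank14 comes first in the loop and is missing most pairs.
sim_types = ["ank14", "erdos"]
sims = {
    "ank14": {("a", "b"): 0.9},
    "erdos": {("a", "b"): 0.5, ("a", "c"): 0.2, ("b", "c"): 0.7},
}
judgments = [("a", "b"), ("a", "c"), ("b", "c")]

buggy = count_scored(response_buggy, judgments, sims, sim_types)
fixed = count_scored(response_fixed, judgments, sims, sim_types)
print(buggy)  # erdos is scored on 1 judgment
print(fixed)  # erdos is scored on all 3 judgments
```

In the toy data erdos has a sim value for every judgment, but the buggy version still scores it on only one of the three, because ank14 sits earlier in the loop and is missing the other two pairs.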
But what about n2? Looking more closely at the SIM file, I'm not sure that the bad score is wrong after all. The file looks horrible; maybe the real question is how it was getting decent scores before. Under AOTM eval it also does extremely badly: .12, while the other ones do about .36 (random is .07). Maybe I have the wrong SIM file somehow? I'll get Beth to run it and see what she comes up with...
I fixed the ICME paper and resubmitted it.