In this paper we present our work on the Microblog Track of TREC 2011. We tried two methods for tweet retrieval, EMAX and RTB. The first method, EMAX, rests on the intuition that retrieved tweets should not only contain the keywords of the given query but also provide more information; this leads to a ranking method based on self-information. The second method, RTB, incorporates the importance of recency alongside relevance in microblog retrieval; to this end, we adapt portfolio theory to balance the relevance and recency dimensions. However, the evaluation results show no significant improvement from either method, because of the short length of the documents, noisy and spam tweets, and the re-ordering by recency. We also present some observations made during our participation. By closely examining the relevance judgments, we find that most relevant documents contain a link to an external resource and are around 17 words long, which differs from the collection statistics.
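The abstract does not give the exact formulas behind EMAX or RTB, so the following is only a rough illustration under stated assumptions, not the authors' method. It assumes (a) an EMAX-style score that keeps only tweets containing every query term and ranks them by the summed self-information, -log P(w), of their distinct terms under a maximum-likelihood unigram collection model, and (b) an RTB-style score that is a simplified mean-variance trade-off in the spirit of portfolio theory: the mean of a weighted mix of a relevance score and a recency score, minus `b` times the weighted variance between the two. The function names and the parameters `alpha` and `b` are hypothetical.

```python
import math
from collections import Counter


def collection_probs(tweets):
    """Maximum-likelihood unigram probabilities P(w) over the whole collection."""
    counts = Counter(w for t in tweets for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def self_information_score(tweet, query, probs):
    """Hypothetical EMAX-style score (assumption, not the paper's formula).

    Returns 0.0 unless the tweet contains every query term; otherwise returns
    the summed self-information -log P(w) of the tweet's distinct terms, so
    tweets that carry more (and rarer) extra terms score higher.
    """
    terms = set(tweet.lower().split())
    if not set(query.lower().split()) <= terms:
        return 0.0
    return sum(-math.log(probs[w]) for w in terms)


def mean_variance_score(rel, rec, alpha=0.5, b=1.0):
    """Hypothetical RTB-style score (assumption, not the paper's formula).

    Treats relevance and recency as two 'assets' weighted by alpha and
    1 - alpha: the score is their weighted mean minus b times their weighted
    variance, so a tweet strong in one dimension but weak in the other is
    penalised relative to a balanced tweet with the same mean.
    """
    mu = alpha * rel + (1 - alpha) * rec
    var = alpha * (rel - mu) ** 2 + (1 - alpha) * (rec - mu) ** 2
    return mu - b * var


if __name__ == "__main__":
    tweets = ["obama speech today", "obama speech today http link", "cats are cute"]
    probs = collection_probs(tweets)
    ranked = sorted(tweets, key=lambda t: self_information_score(t, "obama speech", probs),
                    reverse=True)
    print(ranked)
    print(mean_variance_score(0.8, 0.8), mean_variance_score(1.0, 0.6))
```

Under these assumptions, the tweet carrying an extra link ranks above the shorter one, and a tweet with balanced relevance and recency (0.8, 0.8) outscores one with the same mean but unbalanced scores (1.0, 0.6). Raising `b` makes the ranking more averse to that imbalance.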
Text REtrieval Conference
Human-Centered Data Analytics

Li, W., de Vries, A., & Eickhoff, C. (2011). DMIR on Microblog Track 2011. In Proceedings of Text REtrieval Conference 2011 (20).