Evaluation of Pseudo-Relevance Feedback using Wikipedia

Murtadha Aljubran, Alex James

ABSTRACT

Users have specific information needs when they search with an information retrieval system. They try to express those needs in unstructured queries, which tend to be short and, in most cases, do not describe the information needs well. Shallow language statistics, such as the probabilistic BM25 model or the Indri language model, can improve retrieval metrics like Mean Average Precision (MAP). However, such methods depend on the presence of the query terms in a retrieved document to define relevance. Query expansion is a technique that overcomes this problem by adding terms to the query that have a high probability of being important in defining the relevance of retrieved documents. In this project, we explore query expansion using the documents returned by the initial query on the same corpus on which we run the final query. Several parameters must be optimized, including the number of documents and terms used in query expansion and internal parameters of the Indri model such as smoothing factors. We also build an index of the Wikipedia corpus and use it to construct the expansion query. The question we try to answer is whether the quality of the corpus used for expansion, along with the basis for expansion, produces a significant improvement in metrics such as MAP and precision at the top 30 retrieved documents. We show that the quality and the selection criteria of expansion documents are important factors in query expansion performance that can improve these metrics.
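The pseudo-relevance feedback loop described above can be sketched as follows. This is a minimal illustration only: the frequency-based term-scoring heuristic, the function name, and the parameters are assumptions for exposition, not the Indri implementation or the scoring used in this project.

```python
from collections import Counter

def expand_query(query_terms, top_docs, num_terms=10, stopwords=frozenset()):
    """Toy pseudo-relevance feedback (assumed scoring, not Indri's):
    take the documents retrieved for the initial query, score candidate
    terms by raw frequency in those documents, and append the top-scoring
    terms to the original query."""
    counts = Counter()
    for doc in top_docs:
        for term in doc.lower().split():
            # Skip stopwords and terms already in the query.
            if term not in stopwords and term not in query_terms:
                counts[term] += 1
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return list(query_terms) + expansion

# Hypothetical example: two "top-ranked" documents feed the expansion.
expanded = expand_query(
    ["fox"],
    ["the quick brown fox", "the brown dog"],
    num_terms=2,
    stopwords={"the"},
)
print(expanded)  # ['fox', 'brown', 'quick']
```

In a real system the expansion terms would also be weighted (e.g., down-weighted relative to the original query terms), and the candidate documents could come from a higher-quality external corpus such as Wikipedia rather than the target collection itself, which is the comparison this project evaluates.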