KDD-Cup 2004: results and analysis

This paper summarizes and analyzes the results of the 2004 KDD-Cup. The competition consisted of two tasks, one from particle physics and one from protein homology detection. It focused on the problem of optimizing supervised learning for different performance measures: accuracy, cross-entropy, ROC area, SLAC-Q, squared error, average precision, top 1, and rank of last. A total of 102 groups participated in the competition, 6 of which received awards or honorable mentions. Their approaches are described in other papers in this issue of SIGKDD Explorations. In this paper we do not analyze any particular approach. Instead, we give insight into the performance of the field of competitors as a whole. We study what fraction of the participants found good solutions, how well participants were able to optimize for different performance measures, how homogeneous their submitted predictions are, and whether the best submissions represent the maximal performance that could reasonably be achieved. We are keeping the KDD-Cup 2004 website open and have added an automatic scoring system for new submissions in order to encourage further research in this area.