Guest editorial: Machine learning for the Internet

Our ability to obtain efficient algorithmic solutions is often limited by factors such as a poor understanding of the underlying problem domain, the intrinsic presence of uncertainty, or, in some cases, the unaffordable computational cost of finding an exact solution. With respect to all of these factors, problem instances of enormous size can be both a curse and a blessing. As problem instances become larger, many of these confounding factors are often magnified; hence size can be a curse. However, enormous problem instances may also yield an unexpected source of power in finding solutions when size can be leveraged in nontrivial ways.

The World Wide Web has been at the center of a revolution in how algorithms are designed with massive amounts of data in mind. The essence of this revolution is conceptually very simple: real-world massive data sets are, more often than not, highly structured and regular. Regularities can be used in two complementary ways. First, systematic regularities within massive data sets can be used to craft algorithms that are potentially suboptimal in the worst case but highly effective in expected cases. Second, nonsystematic regularities (those that are too subtle to be encoded within an algorithm) can be discovered by