The Optimal PAC Algorithm

Assume we are trying to learn a concept class C of VC dimension d with respect to an arbitrary distribution. There is a PAC sample size bound that holds for any algorithm that always predicts with some consistent concept in the class C (BEHW89): \(O\left(\frac{1}{\epsilon}\left(d\log\frac{1}{\epsilon}+\log\frac{1}{\delta}\right)\right)\), where ε and δ are the accuracy and confidence parameters. Thus, after drawing this many examples, any concept in C that is consistent with all of them has, with probability at least 1−δ, error at most ε. Here the examples are drawn with respect to an arbitrary but fixed distribution D, and the error is measured with respect to the same distribution.
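As an illustration, the bound can be evaluated numerically. The sketch below is only a rough guide: the O-notation hides unspecified constants, so the constant factor `c` here is a placeholder assumption, not part of the BEHW89 result.

```python
import math

def pac_sample_bound(d, eps, delta, c=4.0):
    """Illustrative evaluation of the O((1/eps)(d log(1/eps) + log(1/delta)))
    sample size bound for consistent learners. The leading constant c is a
    placeholder; the true constant depends on the proof."""
    return math.ceil((c / eps) * (d * math.log(1 / eps) + math.log(1 / delta)))

# The bound grows with the VC dimension d and shrinks as eps is relaxed:
m1 = pac_sample_bound(d=10, eps=0.1, delta=0.05)
m2 = pac_sample_bound(d=20, eps=0.1, delta=0.05)
m3 = pac_sample_bound(d=10, eps=0.2, delta=0.05)
```

Note that the dependence on the confidence parameter δ is only logarithmic, so demanding a much smaller failure probability costs relatively few extra examples.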