The Elements of Statistical Learning

Model errors do not form obvious clusters unless the model itself has obvious errors. Our clustering strategy was to obtain a meaningful spatio-temporal distribution of the model errors from the distribution of the data across the clusters. In addition, the statistics commonly used so far in model validation can be calculated for each cluster. We wanted to keep the clustering procedure relatively simple and easy to implement for a wide audience that is not expert in unsupervised machine learning. Another reason for choosing K-means over other algorithms, e.g. kernel K-means, was its relatively low computational cost, since ocean model validation usually involves very large datasets.
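To illustrate the idea, the following is a minimal sketch rather than our actual implementation. It runs a plain (Lloyd's) K-means on standardized features and then computes the bias and RMSE for each cluster. The feature set (longitude, latitude, day of year, model error), the number of clusters, the synthetic data and the function names `kmeans` and `cluster_stats` are all assumptions made for this example.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid: (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment, consistent with the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids


def cluster_stats(model, obs, labels, k):
    """Bias and RMSE of (model - obs) for each non-empty cluster."""
    stats = []
    for j in range(k):
        mask = labels == j
        if not mask.any():
            continue
        err = model[mask] - obs[mask]
        stats.append({
            "cluster": j,
            "n": int(mask.sum()),
            "bias": float(err.mean()),
            "rmse": float(np.sqrt((err ** 2).mean())),
        })
    return stats


# Synthetic stand-in for model/observation pairs (hypothetical values).
rng = np.random.default_rng(42)
n, k = 600, 4
lon = rng.uniform(10.0, 30.0, n)
lat = rng.uniform(54.0, 66.0, n)
day = rng.uniform(0.0, 365.0, n)
obs = rng.normal(8.0, 3.0, n)
# Assumed error structure: warm bias in the north, cold bias in the south.
model = obs + np.where(lat > 60.0, 1.0, -0.5) + rng.normal(0.0, 0.3, n)

# Standardize so that no single feature dominates the distance.
X = np.column_stack([lon, lat, day, model - obs])
X = (X - X.mean(axis=0)) / X.std(axis=0)

labels, centroids = kmeans(X, k)
stats = cluster_stats(model, obs, labels, k)
for s in stats:
    print(s)
```

Each iteration costs on the order of n × k × d operations and needs no n × n kernel matrix, which is the main reason the method stays tractable for large validation datasets. Note that the distance computation above builds an (n, k, d) array in memory, so for very large n it would have to be processed in chunks.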