Clustering: probably approximately useless?

Clustering never seems to live up to the hype. To paraphrase the popular saying, clustering looks good in theory, yet often fails to deliver in practice. Why? You would think that something so simple and elegant as finding groups of similar items in data would be incredibly useful. Yet often it isn't. The problem is that clustering rarely finds the groups you want, or expected, or that are most useful for the task at hand. There are so many good ways to cluster a dataset that the odds of coming up with the clustering that is best for what you are doing now are small. How do we fix this and make clustering more useful in practice? How do we make clustering do what you want, while still giving it the freedom to "do its own thing" and surprise us?