Apache Mahout is a scalable machine learning library. It contains clustering, classification and collaborative filtering algorithms which are implemented on top of scalable distributed filesystems.

Currently Mahout supports mainly three use cases: Recommendation, clustering and classification. Recommendation takes users' behavior and from that tries to find items users might like. Clustering groups items into groups without knowing labels of these items in advance, e.g. groups of text documents of topically related documents. Classification learns from existing labeled items and is able to assign unlabelled new items to the (hopefully) correct category).  This can be obtained either by a binary (or multilable) classifier (yes or no) or a chance predictor (the chance the outcome will be yes).


Related news