Machine Learning
ML library
: uses DataFrames
MLlib
: uses RDDs (in maintenance mode since Spark 2.0)
Capabilities
- Feature Extraction
  - TF (Term Frequency)
  - IDF (Inverse Document Frequency)
  - Combined as TF-IDF, useful for ranking search results
- Basic Statistics
  - Chi-squared test
  - Pearson correlation
  - Spearman correlation
  - Min, max, mean, variance
- Linear regression
  - Fit a line to a set of data points to predict a numeric value
- Logistic regression
  - Fit an S-shaped (sigmoid) curve to predict the probability of a class
- Support Vector Machines (SVM)
  - Find the maximum-margin boundary that separates classes
- Naive Bayes classifier
  - Probabilistic classification that assumes features are independent
- Decision trees
- K-Means clustering
  - Cluster data based on attributes
  - Unsupervised learning
- Dimensionality reduction
  - Principal Component Analysis (PCA)
  - Singular Value Decomposition (SVD)
- Recommendations
  - Alternating Least Squares (ALS)
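
The TF and IDF entries in the list can be made concrete with a small sketch. This is plain Python, not the Spark API: the function name `tf_idf` and the sample `docs` are hypothetical and exist only for illustration. It assumes raw counts for TF and the smoothed IDF form log((N + 1) / (df + 1)), which is the form Spark's feature-extraction documentation describes.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Illustrative only: raw-count TF multiplied by a smoothed IDF.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, log((N + 1) / (df + 1)); a term found in every document gets 0.
    idf = {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: count * idf[t] for t, count in tf.items()})
    return weights


# Hypothetical three-document corpus, already tokenized on whitespace.
docs = [
    "spark makes big data simple".split(),
    "spark runs machine learning on big data".split(),
    "search engines index data".split(),
]
weights = tf_idf(docs)
```

In this sample, "data" appears in every document, so its weight is zero everywhere, while "makes" (one document) outweighs "spark" (two documents) within the first document. That is why TF-IDF suits search: rare terms say more about a document than common ones.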