  1. MLlib: Main Guide - Spark 4.1.0 Documentation

    As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package.

  2. MLlib | Apache Spark

    MLlib is Apache Spark's scalable machine learning library, with APIs in Java, Scala, Python, and R.

  3. ML Pipelines - Spark 4.1.0 Documentation

    Machine learning can be applied to a wide variety of data types, such as vectors, text, images, and structured data. This API adopts the DataFrame from Spark SQL in order to support a variety of data …

  4. Classification and regression - Spark 4.1.0 Documentation

    The spark.ml implementation supports decision trees for binary and multiclass classification and for regression, using both continuous and categorical features.

  5. MLlib (DataFrame-based) — PySpark 4.1.0 documentation - Apache …

    Note: From Apache Spark 4.0.0, all built-in algorithms support Spark Connect.

  6. Apache Spark™ - Unified Engine for large-scale data analytics

    Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

  7. MLlib: RDD-based API - Spark 4.1.0 Documentation

    This page documents sections of the MLlib guide for the RDD-based API (the spark.mllib package). Please see the MLlib Main Guide for the DataFrame-based API (the spark.ml package), which is now …

  8. Clustering - Spark 4.1.0 Documentation

    The spark.ml implementation uses the expectation-maximization algorithm to induce the maximum-likelihood model given a set of samples. GaussianMixture is implemented as an Estimator and …

  9. PySpark Overview — PySpark 4.1.0 documentation - Apache Spark

    Dec 11, 2025 · Built on top of Spark, MLlib is a scalable machine learning library that provides a uniform set of high-level APIs that help users create and tune practical machine learning pipelines.

  10. Overview - Spark 4.0.0 Documentation

    It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for …