Spark Machine Learning

apache.org
https://spark.apache.org/docs/latest/ml-guide.html
MLlib: Main Guide - Spark 4.1.0 Documentation
As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package.
apache.org
https://spark.apache.org/mllib
MLlib | Apache Spark
MLlib is Apache Spark's scalable machine learning library, with APIs in Java, Scala, Python, and R.
apache.org
https://spark.apache.org/docs/latest/ml-pipeline.html
ML Pipelines - Spark 4.1.0 Documentation
Machine learning can be applied to a wide variety of data types, such as vectors, text, images, and structured data. This API adopts the DataFrame from Spark SQL in order to support a variety of data …
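The pipeline pattern this page describes chains stages, where Estimators are fit on data to produce Transformers. The following is a toy, single-machine sketch of that idea in plain Python, assuming a list of dicts stands in for a DataFrame. Every class here (`Estimator`, `Transformer`, `MeanCenterer`, `CenteringModel`, `Scale`, `ToyPipeline`, `ToyPipelineModel`) is hypothetical and is not the pyspark.ml API; in spark.ml itself you would construct `pyspark.ml.Pipeline(stages=[...])`, whose `fit` returns a `PipelineModel`.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, float]  # one record; a list of Rows stands in for a DataFrame (assumption)


class Transformer:
    """Toy base class: maps a dataset to a new dataset."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """Toy base class: learns from a dataset and returns a fitted Transformer."""

    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


class CenteringModel(Transformer):
    """Fitted model: subtracts a learned mean from input_col."""

    def __init__(self, input_col: str, output_col: str, mean: float):
        self.input_col, self.output_col, self.mean = input_col, output_col, mean

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.output_col: r[self.input_col] - self.mean} for r in rows]


class MeanCenterer(Estimator):
    """Estimator: learns the mean of input_col and returns a CenteringModel."""

    def __init__(self, input_col: str, output_col: str):
        self.input_col, self.output_col = input_col, output_col

    def fit(self, rows: List[Row]) -> Transformer:
        mean = sum(r[self.input_col] for r in rows) / len(rows)
        return CenteringModel(self.input_col, self.output_col, mean)


class Scale(Transformer):
    """Stateless transformer: multiplies input_col by a constant factor."""

    def __init__(self, input_col: str, output_col: str, factor: float):
        self.input_col, self.output_col, self.factor = input_col, output_col, factor

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.output_col: r[self.input_col] * self.factor} for r in rows]


class ToyPipelineModel(Transformer):
    """Applies each fitted stage in order."""

    def __init__(self, stages: Sequence[Transformer]):
        self.stages = list(stages)

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline(Estimator):
    """Fits Estimator stages in order, feeding each stage the previous stage's output."""

    def __init__(self, stages: Sequence[Union[Estimator, Transformer]]):
        self.stages = list(stages)

    def fit(self, rows: List[Row]) -> ToyPipelineModel:
        fitted: List[Transformer] = []
        current = rows
        for i, stage in enumerate(self.stages):
            model = stage.fit(current) if isinstance(stage, Estimator) else stage
            fitted.append(model)
            if i < len(self.stages) - 1:  # the last stage's output is not needed for fitting
                current = model.transform(current)
        return ToyPipelineModel(fitted)


data = [{"x": 1.0}, {"x": 3.0}]
model = ToyPipeline([MeanCenterer("x", "x_centered"), Scale("x_centered", "feature", 2.0)]).fit(data)
print([r["feature"] for r in model.transform(data)])  # [-2.0, 2.0]
```

In this sketch, `fit` only learns state (here, the mean); producing the final column is left to the returned model's `transform`, which is what lets the same fitted pipeline be reused on new data.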
apache.org
https://spark.apache.org/docs/latest/ml-classification-regression.html
Classification and regression - Spark 4.1.0 Documentation
The spark.ml implementation supports decision trees for binary and multiclass classification and for regression, using both continuous and categorical features.
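The snippet notes that spark.ml trees handle both continuous and categorical features. As a rough single-machine illustration, the sketch below finds the best split of each kind by weighted Gini impurity. It is not Spark's implementation: Spark distributes the computation and discretizes continuous features into a bounded number of bins (the `maxBins` parameter), and the one-vs-rest grouping used here for categorical features is a simplifying assumption. All function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(left: Sequence[int], right: Sequence[int]) -> float:
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_continuous_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, Optional[float]]:
    """Return (impurity, threshold) for the best rule `value <= threshold`."""
    best: Tuple[float, Optional[float]] = (float("inf"), None)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2  # candidate midway between neighboring values
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = split_impurity(left, right)
        if score < best[0]:
            best = (score, threshold)
    return best


def best_categorical_split(categories: Sequence[str], labels: Sequence[int]) -> Tuple[float, Optional[str]]:
    """Return (impurity, category) for the best one-vs-rest rule `category == c`."""
    best: Tuple[float, Optional[str]] = (float("inf"), None)
    for c in sorted(set(categories)):
        left = [y for x, y in zip(categories, labels) if x == c]
        right = [y for x, y in zip(categories, labels) if x != c]
        if not right:  # only one category present: nothing to split
            continue
        score = split_impurity(left, right)
        if score < best[0]:
            best = (score, c)
    return best


print(best_continuous_split([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1]))  # (0.0, 6.5)
print(best_categorical_split(["a", "a", "b", "c", "c", "c"], [0, 0, 0, 1, 1, 1]))  # (0.0, 'c')
```

A full tree would apply these searches recursively to each child node until a stopping rule (for example a maximum depth) is met.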
apache.org
https://spark.apache.org/docs/latest/api/python/reference/pyspark.ml.…
MLlib (DataFrame-based) — PySpark 4.1.0 documentation - Apache …
MLlib (DataFrame-based). Note: from Apache Spark 4.0.0, all built-in algorithms support Spark Connect.
apache.org
https://spark.apache.org
Apache Spark™ - Unified Engine for large-scale data analytics
Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
apache.org
https://spark.apache.org/docs/latest/mllib-guide.html
MLlib: RDD-based API - Spark 4.1.0 Documentation
This page documents sections of the MLlib guide for the RDD-based API (the spark.mllib package). Please see the MLlib Main Guide for the DataFrame-based API (the spark.ml package), which is now …
apache.org
https://spark.apache.org/docs/latest/ml-clustering.html
Clustering - Spark 4.1.0 Documentation
The spark.ml implementation uses the expectation-maximization algorithm to induce the maximum-likelihood model given a set of samples. GaussianMixture is implemented as an Estimator and …
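The snippet describes fitting a Gaussian mixture by expectation-maximization. A minimal one-dimensional sketch of that loop in plain Python follows, assuming scalar data, user-supplied initial means, and a fixed iteration count instead of a convergence test; the function names are hypothetical. Spark's `GaussianMixture` works on multivariate feature vectors across a cluster, so treat this only as an illustration of how the E-step and M-step alternate.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of the normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(
    xs: Sequence[float], init_means: Sequence[float], iters: int = 100, min_var: float = 1e-6
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k, n = len(init_means), len(xs)
    weights = [1.0 / k] * k
    means = list(init_means)
    overall_mean = sum(xs) / n
    # Start every component with the overall variance of the data.
    variances = [max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point (posterior probability).
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # floor guards against a collapsing component
    return weights, means, variances


data = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
w, mu, var = em_gmm_1d(data, init_means=[1.0, 8.0])
print(mu)  # the two means should end up close to 0.0 and 10.0
```

Each iteration cannot decrease the data likelihood, which is why EM is a natural way to search for the maximum-likelihood mixture, although it may stop at a local optimum that depends on the initial means.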
apache.org
https://spark.apache.org/docs/latest/api/python
PySpark Overview — PySpark 4.1.0 documentation - Apache Spark
Dec 11, 2025 · Built on top of Spark, MLlib is a scalable machine learning library that provides a uniform set of high-level APIs that help users create and tune practical machine learning pipelines.
apache.org
https://spark.apache.org/docs
Overview - Spark 4.0.0 Documentation
It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for …