
org.apache.spark.mllib.clustering

KMeans



class KMeans extends Serializable with Logging

K-means clustering with a k-means++-like initialization mode (the k-means|| algorithm by Bahmani et al.).

This is an iterative algorithm that will make multiple passes over the data, so any RDDs given to it should be cached by the user.
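As an illustration of the caching advice above, here is a minimal sketch of training on a cached RDD in local mode. The object name `KMeansUsageSketch`, the helper `train`, the sample points, and the `local[2]` master are assumptions made for the example; only the `KMeans` setters and `run` come from this page.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object KMeansUsageSketch {
  // Hypothetical helper: configures KMeans explicitly and trains on the given points.
  def train(data: RDD[Vector]): KMeansModel =
    new KMeans()
      .setK(2)
      .setMaxIterations(20)
      .setSeed(42L) // fixed seed so that initialization is repeatable
      .run(data)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads, for demonstration only.
    val conf = new SparkConf().setAppName("KMeansUsageSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Two well-separated groups of made-up points.
      val points: Seq[Vector] = Seq(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
        Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1), Vectors.dense(9.2, 9.0)
      )
      // Cache the input: the algorithm makes multiple passes over it.
      val data = sc.parallelize(points).cache()

      val model = train(data)
      model.clusterCenters.foreach(center => println(s"center: $center"))
      println(s"cluster of (0.1, 0.1): ${model.predict(Vectors.dense(0.1, 0.1))}")
    } finally {
      sc.stop()
    }
  }
}
```

Running this requires the Spark core and MLlib artifacts on the classpath. The cluster indices assigned to each group depend on the initialization, so they should not be relied on.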

Annotations
@Since( "0.8.0" )
Linear Supertypes
Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new KMeans()

    Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, initializationMode: "k-means||", initializationSteps: 2, epsilon: 1e-4, seed: random}.

    Annotations
    @Since( "0.8.0" )
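The sketch below shows one way to inspect the constructor defaults and override them by chaining setters. It needs no SparkContext, because configuring a `KMeans` instance does not touch any data. The object name `KMeansConfigSketch`, the helper `tuned`, and the chosen parameter values are assumptions for illustration.

```scala
import org.apache.spark.mllib.clustering.KMeans

object KMeansConfigSketch {
  // Hypothetical helper: returns an instance with every documented parameter overridden.
  def tuned(): KMeans =
    new KMeans()
      .setK(5)
      .setMaxIterations(50)
      .setInitializationMode("random") // the other accepted value is "k-means||"
      .setInitializationSteps(3)       // only used by the k-means|| mode
      .setEpsilon(1e-6)
      .setSeed(7L)

  // Formats the current parameter values using the getters listed on this page.
  def describe(km: KMeans): String =
    s"k=${km.getK}, maxIterations=${km.getMaxIterations}, " +
      s"initializationMode=${km.getInitializationMode}, " +
      s"initializationSteps=${km.getInitializationSteps}, " +
      s"epsilon=${km.getEpsilon}, seed=${km.getSeed}"

  def main(args: Array[String]): Unit = {
    println("defaults: " + describe(new KMeans()))
    println("tuned:    " + describe(tuned()))
  }
}
```

Each setter returns `KMeans.this.type`, which is what makes the chained style possible.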

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def getEpsilon: Double

    The distance threshold within which centers are considered to have converged.

    Annotations
    @Since( "1.4.0" )
  11. def getInitializationMode: String

    The initialization algorithm. This can be either "random" or "k-means||".

    Annotations
    @Since( "1.4.0" )
  12. def getInitializationSteps: Int

    Number of steps for the k-means|| initialization mode.

    Annotations
    @Since( "1.4.0" )
  13. def getK: Int

    Number of clusters to create (k).

    Annotations
    @Since( "1.4.0" )
    Note

    It is possible for fewer than k clusters to be returned, for example, if there are fewer than k distinct points to cluster.

  14. def getMaxIterations: Int

    Maximum number of iterations allowed.

    Annotations
    @Since( "1.4.0" )
  15. def getSeed: Long

    The random seed for cluster initialization.

    Annotations
    @Since( "1.4.0" )
  16. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  17. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  18. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  21. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  22. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  29. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  31. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  32. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  34. final def notify(): Unit

    Definition Classes
    AnyRef
  35. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  36. def run(data: RDD[Vector]): KMeansModel

    Train a K-means model on the given set of points. Because this is an iterative algorithm, data should be cached for high performance.

    Annotations
    @Since( "0.8.0" )
  37. def setEpsilon(epsilon: Double): KMeans.this.type

    Set the distance threshold within which centers are considered to have converged. If all centers move less than this Euclidean distance in an iteration, the run stops iterating.

    Annotations
    @Since( "0.8.0" )
  38. def setInitialModel(model: KMeansModel): KMeans.this.type

    Set the initial starting point, bypassing the random initialization or k-means||. The condition model.k == this.k must be met; otherwise an IllegalArgumentException is thrown.

    Annotations
    @Since( "1.4.0" )
  39. def setInitializationMode(initializationMode: String): KMeans.this.type

    Set the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.

    Annotations
    @Since( "0.8.0" )
  40. def setInitializationSteps(initializationSteps: Int): KMeans.this.type

    Set the number of steps for the k-means|| initialization mode. This is an advanced setting; the default of 2 is almost always enough. Default: 2.

    Annotations
    @Since( "0.8.0" )
  41. def setK(k: Int): KMeans.this.type

    Set the number of clusters to create (k).

    Annotations
    @Since( "0.8.0" )
    Note

    It is possible for fewer than k clusters to be returned, for example, if there are fewer than k distinct points to cluster. Default: 2.

  42. def setMaxIterations(maxIterations: Int): KMeans.this.type

    Set the maximum number of iterations allowed. Default: 20.

    Annotations
    @Since( "0.8.0" )
  43. def setSeed(seed: Long): KMeans.this.type

    Set the random seed for cluster initialization.

    Annotations
    @Since( "1.4.0" )
  44. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  45. def toString(): String

    Definition Classes
    AnyRef → Any
  46. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  47. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  48. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
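
Among the members listed above, `setInitialModel` and `setEpsilon` are easiest to understand together. The following sketch, under stated assumptions, starts training from explicit centers and shows the `IllegalArgumentException` raised when the model's cluster count differs from `k`. The object name `KMeansInitialModelSketch`, its helpers, the sample data, and the `local[2]` master are hypothetical; `KMeansModel` is constructed here from an array of centers.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object KMeansInitialModelSketch {
  // Hypothetical starting centers: one near each expected group.
  val initial: KMeansModel =
    new KMeansModel(Array(Vectors.dense(0.0, 0.0), Vectors.dense(10.0, 10.0)))

  // Trains from the supplied centers instead of random or k-means|| initialization.
  def fit(data: RDD[Vector]): KMeansModel =
    new KMeans()
      .setK(2)                  // must equal initial.k
      .setInitialModel(initial)
      .setEpsilon(1e-6)         // stop once every center moves less than this distance
      .run(data)

  // Returns true if setInitialModel rejects a model whose k differs from the configured k.
  def rejectsMismatch(k: Int): Boolean =
    try {
      new KMeans().setK(k).setInitialModel(initial)
      false
    } catch {
      case _: IllegalArgumentException => true
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("KMeansInitialModelSketch").setMaster("local[2]"))
    try {
      val data = sc.parallelize(Seq(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.2, 0.1),
        Vectors.dense(9.0, 9.0), Vectors.dense(9.2, 9.1)
      )).cache()
      fit(data).clusterCenters.foreach(println)
      println(s"k = 3 rejected: ${rejectsMismatch(3)}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that the check runs when `setInitialModel` is called, so in this sketch `setK` is called first.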

Deprecated Value Members

  1. def getRuns: Int

    This function has no effect since Spark 2.0.0.

    Annotations
    @Since( "1.4.0" ) @deprecated
    Deprecated

    (Since version 2.1.0) This has no effect and always returns 1.

  2. def setRuns(runs: Int): KMeans.this.type

    This function has no effect since Spark 2.0.0.

    Annotations
    @Since( "0.8.0" ) @deprecated
    Deprecated

    (Since version 2.1.0) This has no effect.
