tf.contrib.factorization.KMeans

Creates the graph for k-means clustering.

tf.contrib.factorization.KMeans(
    inputs, num_clusters, initial_clusters=RANDOM_INIT,
    distance_metric=SQUARED_EUCLIDEAN_DISTANCE, use_mini_batch=False,
    mini_batch_steps_per_iteration=1, random_seed=0, kmeans_plus_plus_num_retries=2,
    kmc2_chain_length=200
)

Args
`inputs`	An input tensor or list of input tensors. It is assumed that the data points have been previously randomly permuted.
`num_clusters`	An integer tensor specifying the number of clusters. This argument is ignored if initial_clusters is a tensor or numpy array.
`initial_clusters`	Specifies the clusters used during initialization. One of the following: (1) a tensor or numpy array with the initial cluster centers; (2) a function f(inputs, k) that returns up to k centers from `inputs`; (3) "random": choose centers randomly from `inputs`; (4) "kmeans_plus_plus": use kmeans++ to choose centers from `inputs`; (5) "kmc2": use the fast k-MC2 algorithm to choose centers from `inputs`. In the last three cases, one batch of `inputs` may not yield `num_clusters` centers, in which case initialization will require multiple batches until enough centers are chosen. In the case of "random" or "kmeans_plus_plus", if the input size is <= `num_clusters` then the entire batch is chosen to be cluster centers.
`distance_metric`	Distance metric used for clustering. Supported options: "squared_euclidean", "cosine".
`use_mini_batch`	If true, use the mini-batch k-means algorithm. Otherwise, assume full-batch updates.
`mini_batch_steps_per_iteration`	Number of steps after which the updated cluster centers are synced back to a master copy.
`random_seed`	Seed for the PRNG used to choose the initial cluster centers (the "seeds").
`kmeans_plus_plus_num_retries`	For each point that is sampled during kmeans++ initialization, this parameter specifies the number of additional points to draw from the current distribution before selecting the best. If a negative value is specified, a heuristic is used to sample O(log(num_to_sample)) additional points.
`kmc2_chain_length`	Determines how many candidate points are used by the k-MC2 algorithm to produce one new cluster center. If a (mini-)batch contains fewer points, one new cluster center is generated from the (mini-)batch.
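
The role of `kmeans_plus_plus_num_retries` may be easier to see in code. The following is a NumPy sketch of one common reading of kmeans++ seeding with extra candidates, not the TensorFlow kernel itself. The function name `kmeans_plus_plus`, the use of squared Euclidean distance, and the "lowest total cost wins" rule for selecting the best candidate are assumptions made for illustration. The negative-value heuristic is not implemented, so `num_retries` is assumed to be non-negative.

```python
import numpy as np

def kmeans_plus_plus(points, k, num_retries=2, seed=0):
    """Pick k centers from `points` by D^2 sampling with extra candidates.

    Hypothetical helper for illustration; assumes num_retries >= 0.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    if n <= k:
        # As documented for "kmeans_plus_plus": a batch that is too small
        # is used as cluster centers in its entirety.
        return points.copy()

    # First center: chosen uniformly at random.
    centers = [points[rng.integers(n)]]
    # min_sq[i] is the squared distance from point i to its nearest center.
    min_sq = np.sum((points - centers[0]) ** 2, axis=1)

    while len(centers) < k:
        total = min_sq.sum()
        # Fall back to uniform sampling if every point coincides with a center.
        probs = min_sq / total if total > 0 else np.full(n, 1.0 / n)
        # Draw one point plus `num_retries` additional points from the
        # current distribution.
        candidates = rng.choice(n, size=1 + num_retries, p=probs)

        best_idx, best_cost, best_min_sq = None, np.inf, None
        for idx in candidates:
            cand_sq = np.sum((points - points[idx]) ** 2, axis=1)
            new_min_sq = np.minimum(min_sq, cand_sq)
            cost = new_min_sq.sum()
            # Keep the candidate that lowers the total cost the most.
            if cost < best_cost:
                best_idx, best_cost, best_min_sq = idx, cost, new_min_sq

        centers.append(points[best_idx])
        min_sq = best_min_sq

    return np.stack(centers)
```

With `num_retries=0` this reduces to plain kmeans++ sampling; larger values trade extra distance computations for a better-spread set of initial centers.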

Raises
`ValueError`	An invalid argument was passed to initial_clusters or distance_metric.

Methods

`training_graph`

training_graph()

Generates a training graph for the k-means algorithm.

This returns, among other things, an op that chooses initial centers (init_op), a boolean variable that is set to True when the initial centers are chosen (cluster_centers_initialized), and an op to perform either an entire Lloyd iteration or a mini-batch of a Lloyd iteration (training_op). The caller should use these components as follows. A single worker should execute init_op multiple times until cluster_centers_initialized becomes True. Then multiple workers may execute training_op any number of times.
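
The calling protocol described above (initialize until the flag is set, then train repeatedly) can be sketched without TensorFlow. The class `TinyKMeans` and its methods `init_step` and `train_step` are hypothetical stand-ins for `init_op`, `training_op` and `cluster_centers_initialized`; they assume "random" initialization, squared Euclidean distance, full-batch Lloyd iterations and floating-point inputs, and they are not the library's implementation.

```python
import numpy as np

class TinyKMeans:
    """Hypothetical stand-in mirroring the init_op / training_op protocol."""

    def __init__(self, num_clusters, seed=0):
        self.num_clusters = num_clusters
        self.rng = np.random.default_rng(seed)
        self.centers = np.empty((0, 0))
        # Plays the role of cluster_centers_initialized.
        self.initialized = False

    def init_step(self, batch):
        """Pick random rows of `batch` as centers until enough are collected."""
        if self.initialized:
            return
        missing = self.num_clusters - len(self.centers)
        take = min(missing, len(batch))
        picked = batch[self.rng.choice(len(batch), size=take, replace=False)]
        if len(self.centers) == 0:
            self.centers = picked.copy()
        else:
            self.centers = np.vstack([self.centers, picked])
        self.initialized = len(self.centers) == self.num_clusters

    def train_step(self, batch):
        """One Lloyd iteration on `batch`: assign points, then move centers."""
        diff = batch[:, None, :] - self.centers[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)
        assignment = sq_dist.argmin(axis=1)
        for c in range(self.num_clusters):
            members = batch[assignment == c]
            if len(members) > 0:  # an empty cluster keeps its old center
                self.centers[c] = members.mean(axis=0)
        # Mean squared distance to the assigned center, before the update.
        return sq_dist.min(axis=1).mean()


data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
model = TinyKMeans(num_clusters=2)

# The first batch is too small to supply both centers, so initialization
# has to be run more than once, as the description above allows.
batches = [data[:1], data[1:]]
step = 0
while not model.initialized:
    model.init_step(batches[step])
    step += 1

# Once initialized, the training step may be run any number of times.
for _ in range(5):
    cost = model.train_step(data)
```

In the real API the initialization loop is meant to run on a single worker, while the training step may then be executed by several workers; this single-process sketch only shows the ordering.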

Returns
A tuple consisting of:
`all_scores`	A matrix (or list of matrices) of dimensions (num_input, num_clusters) where each value is the distance between an input vector and a cluster center.
`cluster_idx`	A vector (or list of vectors). Each element in the vector corresponds to an input row in `inputs` and specifies the cluster id assigned to that input.
`scores`	Similar to cluster_idx but specifies the distance to the assigned cluster instead.
`cluster_centers_initialized`	A scalar indicating whether clusters have been initialized.
`init_op`	An op to initialize the clusters.
`training_op`	An op that runs an iteration of training.
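
The relationship between `all_scores`, `cluster_idx` and `scores` can be illustrated with a short NumPy sketch. The helper `score_points` is hypothetical, and the sketch assumes that "cosine" distance means one minus the cosine similarity of the normalized vectors; it only mirrors the shapes and meanings listed above, not the graph that `training_graph` builds.

```python
import numpy as np

def score_points(inputs, centers, distance_metric="squared_euclidean"):
    """Return (all_scores, cluster_idx, scores) for fixed cluster centers.

    Hypothetical helper for illustration only.
    """
    if distance_metric == "squared_euclidean":
        diff = inputs[:, None, :] - centers[None, :, :]
        all_scores = (diff ** 2).sum(axis=2)
    elif distance_metric == "cosine":
        # Assumption: cosine distance = 1 - cosine similarity.
        unit_in = inputs / np.linalg.norm(inputs, axis=1, keepdims=True)
        unit_c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
        all_scores = 1.0 - unit_in @ unit_c.T
    else:
        # Mirrors the documented ValueError for an invalid distance_metric.
        raise ValueError("unsupported distance_metric: %r" % (distance_metric,))

    # all_scores has shape (num_input, num_clusters).
    cluster_idx = all_scores.argmin(axis=1)
    # scores[i] is the distance from input i to its assigned center.
    scores = all_scores[np.arange(len(inputs)), cluster_idx]
    return all_scores, cluster_idx, scores


inputs = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
all_scores, cluster_idx, scores = score_points(inputs, centers)
```

Here `scores` is simply the row-wise minimum of `all_scores`, and `cluster_idx` records which column that minimum came from.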
