Thursday, January 3rd, 2019


Sentence clustering with the K-means algorithm

This is an unsupervised learning project from the SJSU data mining course. Students need to group sentences (already converted to numeric form by the course professor) into clusters using the K-means algorithm. Below is the performance of the students in the class. It seems not so bad.

The whole project includes 3 parts: data preprocessing, bisecting K-means, and the basic K-means algorithm.

1. Data preprocessing

The documents to be clustered have 27673 terms in total; that is, the number of features is 27673. Here is an excerpt of the sample data.

First, we will build ...
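
For orientation, the two clustering parts named in the outline above (basic K-means and bisecting K-means) could be sketched roughly as follows. This is only a minimal illustration, not the course's or this post's implementation. It assumes small dense vectors stored as Python lists and squared Euclidean distance, whereas the real data has 27673 features and would likely call for a sparse representation. The function names `kmeans`, `bisecting_kmeans`, and the helpers are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(squared_distance(p, center) for p in points)


def kmeans(points, k, iterations=100, seed=0):
    """Basic K-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k points sampled from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels, centroids


def bisecting_kmeans(points, k, seed=0):
    """Bisecting K-means: repeatedly split the cluster with the largest SSE."""
    clusters = [list(range(len(points)))]  # each cluster is a list of indices
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break  # nothing left that can be split
        worst = max(candidates, key=lambda idx: sse([points[i] for i in idx]))
        clusters.remove(worst)
        labels, _ = kmeans([points[i] for i in worst], 2, seed=seed)
        left = [i for i, lab in zip(worst, labels) if lab == 0]
        right = [i for i, lab in zip(worst, labels) if lab == 1]
        if not left or not right:
            # Degenerate split (e.g. identical points): cut the list in half.
            half = len(worst) // 2
            left, right = worst[:half], worst[half:]
        clusters.extend([left, right])
    labels = [0] * len(points)
    for cluster_id, idx in enumerate(clusters):
        for i in idx:
            labels[i] = cluster_id
    return labels


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(kmeans(data, 2)[0])
    print(bisecting_kmeans(data, 3))
```

Bisecting K-means reuses the basic algorithm with k = 2 at every step, which is why the two appear together here. Splitting the cluster with the largest SSE is one common choice; splitting the largest cluster by size is another.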