Clustering with Mahout, This Time with Source Code
To read the Mahout series from the beginning, start here: Apache Mahoutで機械学習してみるべ - 都元ダイスケ IT-PRESS. The previous installment is here: 今度はMahoutでクラスタリング - 都元ダイスケ IT-PRESS.
Preparation
First, the Maven dependency settings. As before, mahout-core is required. On top of that, add slf4j and logback, plus commons-io (jcl-over-slf4j is there so that Hadoop's commons-logging output gets routed through slf4j).
pom.xml
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.4</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>${lib.slf4j.version}</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>${lib.slf4j.version}</version>
</dependency>
<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-core</artifactId>
  <version>${lib.logback.version}</version>
</dependency>
<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>${lib.logback.version}</version>
</dependency>
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>2.0</version>
</dependency>
...
<properties>
  <lib.slf4j.version>1.6.0</lib.slf4j.version>
  <lib.logback.version>0.9.21</lib.logback.version>
</properties>
logback.xml
Then place a log configuration file like the following directly under src/main/resources.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <Target>System.out</Target>
    <layout class="ch.qos.logback.classic.PatternLayout">
      <Pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</Pattern>
    </layout>
  </appender>
  <root>
    <level value="INFO" />
    <appender-ref ref="STDOUT" />
  </root>
</configuration>
Java Source
Now we finally get to the essential part. In the sample code, let's try clustering the three-dimensional vectors from the last post.
First, prepare the set of vectors to be clustered. Here we use the nine 3D vectors from the previous post.
static final double[][] points = {
    {8, 8, 8},
    {8, 7.5, 9},
    {7.7, 7.5, 9.8},
    {0, 7.5, 9},
    {0.1, 8, 8},
    {-1, 9, 7.5},
    {9, -1, -0.8},
    {7.7, -1.2, -0.1},
    {8.2, 0.2, 0.2},
};
For this round of clustering we use a technique called k-means clustering. With this method you must decide in advance the value of k, that is, how many clusters to end up with. Here we go with k = 3, on the assumption that we will build three clusters.
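As an aside, the core idea of k-means itself is simple: assign each point to its nearest center, move each center to the mean of the points assigned to it, and repeat. Mahout does this as a series of MapReduce jobs, but just to show the idea, here is a toy sketch of mine in plain Java (NOT Mahout's implementation; it also skips the convergence check that Mahout performs against the 0.001 threshold we pass below, and simply loops a fixed number of times):

// Toy k-means sketch for illustration only -- not Mahout's implementation.
// Seeds the centers with the first k points, same as the Mahout code below.
static double[][] kMeansSketch(double[][] points, int k, int maxIterations) {
    int dim = points[0].length;
    double[][] centers = new double[k][];
    for (int i = 0; i < k; i++) {
        centers[i] = points[i].clone();
    }
    int[] assignment = new int[points.length];
    for (int iter = 0; iter < maxIterations; iter++) {
        // Step 1: assign each point to the nearest center (squared Euclidean distance)
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < k; c++) {
                double dist = 0;
                for (int d = 0; d < dim; d++) {
                    double diff = points[p][d] - centers[c][d];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            assignment[p] = best;
        }
        // Step 2: move each center to the mean of its assigned points
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (int p = 0; p < points.length; p++) {
            counts[assignment[p]]++;
            for (int d = 0; d < dim; d++) {
                sums[assignment[p]][d] += points[p][d];
            }
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] > 0) {
                for (int d = 0; d < dim; d++) {
                    centers[c][d] = sums[c][d] / counts[c];
                }
            }
        }
    }
    return centers;
}

In Mahout's real implementation, step 1 runs in the mappers and step 2 in the combiners/reducers, which is why each iteration shows up as one Hadoop job in the log at the end of this post.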
With Mahout's clustering, Hadoop appears right away. That said, there is no need to build an actual Hadoop cluster; you can run everything standalone. You do, however, need to place the nine vectors to be clustered and the three initial clusters on HDFS as files beforehand. That is what the writePointsToFile and writeClustersToFile methods do.
Then we run the clustering itself. The computation reads its data from HDFS and writes its results back to HDFS, so after it finishes we call readClusteredPointsFromFile to read the results back out.
public static void main(String[] args) throws Exception {
    int k = 3;
    List<Vector> vectors = getPoints(points);

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Write the vectors and the initial clusters to HDFS
    writePointsToFile(vectors, "target/input/points/file1", fs, conf);
    writeClustersToFile(k, vectors, "target/input/clusters/part-00000", fs, conf);

    // Run the clustering
    Path pointsPath = new Path("target/input/points");
    Path clustersPath = new Path("target/input/clusters");
    Path outputPath = new Path("target/output");
    KMeansDriver.run(conf, pointsPath, clustersPath, outputPath,
            new EuclideanDistanceMeasure(), 0.001, 10, true, false);

    // Read the clustering results from HDFS and print them to the console
    readClusteredPointsFromFile(fs, conf);
}

// Convert the raw double arrays into Mahout Vectors
static List<Vector> getPoints(double[][] raw) {
    List<Vector> points = new ArrayList<Vector>();
    for (double[] fr : raw) {
        Vector vec = new RandomAccessSparseVector(fr.length);
        vec.assign(fr);
        points.add(vec);
    }
    return points;
}

// Write the input vectors to a SequenceFile of <LongWritable, VectorWritable>
static void writePointsToFile(List<Vector> points, String fileName, FileSystem fs, Configuration conf) throws IOException {
    Path path = new Path(fileName);
    SequenceFile.Writer writer = null;
    try {
        writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class, VectorWritable.class);
        long recNum = 0;
        VectorWritable vec = new VectorWritable();
        for (Vector point : points) {
            vec.set(point);
            writer.append(new LongWritable(recNum++), vec);
        }
    } finally {
        IOUtils.closeQuietly(writer);
    }
}

// Write the k initial clusters, seeded with the first k vectors, to a SequenceFile
static void writeClustersToFile(int k, List<Vector> vectors, String fileName, FileSystem fs, Configuration conf) throws IOException {
    Path path = new Path(fileName);
    SequenceFile.Writer writer = null;
    try {
        writer = new SequenceFile.Writer(fs, conf, path, Text.class, Cluster.class);
        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
    } finally {
        IOUtils.closeQuietly(writer);
    }
}

// Read <cluster id, weighted vector> pairs from the clustering output and print them
static void readClusteredPointsFromFile(FileSystem fs, Configuration conf) throws IOException {
    Path path = new Path("target/output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000");
    SequenceFile.Reader reader = null;
    try {
        reader = new SequenceFile.Reader(fs, path, conf);
        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster " + key.toString());
        }
    } finally {
        IOUtils.closeQuietly(reader);
    }
}
For reference, here are the imports. There are surprisingly many classes sharing the same simple name, so be careful which one you pick.
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.Cluster;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
Results
The clustering results are as follows. You can see that each vector has been assigned to one of cluster 0 through cluster 2. (The fourth vector, {0, 7.5, 9}, prints oddly as [1:7.500, 2:9.000]; this appears to be because we stored the points as RandomAccessSparseVector, so the zero component is dropped and the remaining elements are shown in index:value form.)
1.0: [8.000, 8.000, 8.000] belongs to cluster 1
1.0: [8.000, 7.500, 9.000] belongs to cluster 1
1.0: [7.700, 7.500, 9.800] belongs to cluster 1
1.0: [1:7.500, 2:9.000] belongs to cluster 2
1.0: [0.100, 8.000, 8.000] belongs to cluster 2
1.0: [-1.000, 9.000, 7.500] belongs to cluster 2
1.0: [9.000, -1.000, -0.800] belongs to cluster 0
1.0: [7.700, -1.200, -0.100] belongs to cluster 0
1.0: [8.200, 0.200, 0.200] belongs to cluster 0
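Incidentally, the final cluster centers themselves are also sitting on HDFS, in the clusters-N directory of the last iteration. Something like the following should read them back; this is an untested sketch of mine, assuming the run converged at iteration 3 as in the log below (so the directory is target/output/clusters-3), that the reducer output file is named part-r-00000, and that Cluster's getCenter() gives you the centroid:

static void readClusterCentersFromFile(FileSystem fs, Configuration conf) throws IOException {
    // Assumption: the last k-means iteration (3, per the log below) wrote its
    // clusters to target/output/clusters-3 as a single reducer output file.
    Path path = new Path("target/output/clusters-3/part-r-00000");
    SequenceFile.Reader reader = null;
    try {
        reader = new SequenceFile.Reader(fs, path, conf);
        Text key = new Text();
        Cluster cluster = new Cluster(); // Cluster is a Writable, so it can be deserialized here
        while (reader.next(key, cluster)) {
            // getCenter() returns the centroid vector of the cluster
            System.out.println(key.toString() + " center: " + cluster.getCenter().asFormatString());
        }
    } finally {
        IOUtils.closeQuietly(reader);
    }
}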
For reference, here is the log that scrolls by before the results appear. You can see it running as a series of Hadoop jobs.
22:16:11.733 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - Input: target/input/points Clusters In: target/input/clusters Out: target/output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
22:16:11.738 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - convergence: 0.0010 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
22:16:11.739 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - K-Means Iteration 1
22:16:11.768 [main] INFO o.a.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
22:16:11.878 [main] INFO org.apache.mahout.common.HadoopUtil - Deleting target/output/clusters-1
22:16:11.885 [main] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
22:16:12.497 [main] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:12.781 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
22:16:12.787 [Thread-14] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:12.880 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
22:16:17.532 [main] INFO org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
22:16:17.534 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
22:16:17.535 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
22:16:17.646 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
22:16:18.047 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
22:16:18.051 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
22:16:18.055 [Thread-14] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:18.055 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
22:16:18.064 [Thread-14] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:18.072 [Thread-14] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
22:16:18.087 [Thread-14] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 239 bytes
22:16:18.087 [Thread-14] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:18.184 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
22:16:18.185 [Thread-14] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:18.186 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_r_000000_0 is allowed to commit now
22:16:18.190 [Thread-14] INFO o.a.h.m.l.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_r_000000_0' to target/output/clusters-1
22:16:18.191 [Thread-14] INFO o.a.hadoop.mapred.LocalJobRunner - reduce > reduce
22:16:18.192 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_r_000000_0' done.
22:16:18.535 [main] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 100%
22:16:18.535 [main] INFO org.apache.hadoop.mapred.JobClient - Job complete: job_local_0001
22:16:18.537 [main] INFO org.apache.hadoop.mapred.JobClient - Counters: 13
22:16:18.537 [main] INFO org.apache.hadoop.mapred.JobClient - Clustering
22:16:18.538 [main] INFO org.apache.hadoop.mapred.JobClient - Converged Clusters=1
22:16:18.538 [main] INFO org.apache.hadoop.mapred.JobClient - FileSystemCounters
22:16:18.538 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_READ=2741232
22:16:18.539 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_WRITTEN=2792502
22:16:18.539 [main] INFO org.apache.hadoop.mapred.JobClient - Map-Reduce Framework
22:16:18.539 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce input groups=3
22:16:18.540 [main] INFO org.apache.hadoop.mapred.JobClient - Combine output records=3
22:16:18.540 [main] INFO org.apache.hadoop.mapred.JobClient - Map input records=9
22:16:18.541 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce shuffle bytes=0
22:16:18.541 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce output records=3
22:16:18.541 [main] INFO org.apache.hadoop.mapred.JobClient - Spilled Records=6
22:16:18.542 [main] INFO org.apache.hadoop.mapred.JobClient - Map output bytes=675
22:16:18.542 [main] INFO org.apache.hadoop.mapred.JobClient - Combine input records=9
22:16:18.543 [main] INFO org.apache.hadoop.mapred.JobClient - Map output records=9
22:16:18.543 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce input records=3
22:16:18.547 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - K-Means Iteration 2
22:16:18.548 [main] INFO o.a.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
22:16:18.576 [main] INFO org.apache.mahout.common.HadoopUtil - Deleting target/output/clusters-2
22:16:18.578 [main] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
22:16:19.072 [main] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:19.630 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_local_0002
22:16:19.632 [Thread-28] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:20.622 [Thread-28] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
22:16:20.719 [main] INFO org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
22:16:22.272 [Thread-28] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
22:16:22.273 [Thread-28] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
22:16:22.321 [Thread-28] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
22:16:22.323 [Thread-28] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
22:16:22.326 [Thread-28] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
22:16:22.327 [Thread-28] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:22.327 [Thread-28] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
22:16:22.358 [Thread-28] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:22.360 [Thread-28] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
22:16:22.360 [Thread-28] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 239 bytes
22:16:22.361 [Thread-28] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:22.428 [Thread-28] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
22:16:22.429 [Thread-28] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:22.430 [Thread-28] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
22:16:22.434 [Thread-28] INFO o.a.h.m.l.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to target/output/clusters-2
22:16:22.435 [Thread-28] INFO o.a.hadoop.mapred.LocalJobRunner - reduce > reduce
22:16:22.436 [Thread-28] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
22:16:23.265 [main] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 100%
22:16:23.266 [main] INFO org.apache.hadoop.mapred.JobClient - Job complete: job_local_0002
22:16:23.266 [main] INFO org.apache.hadoop.mapred.JobClient - Counters: 12
22:16:23.267 [main] INFO org.apache.hadoop.mapred.JobClient - FileSystemCounters
22:16:23.267 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_READ=5484503
22:16:23.267 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_WRITTEN=5583630
22:16:23.267 [main] INFO org.apache.hadoop.mapred.JobClient - Map-Reduce Framework
22:16:23.268 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce input groups=3
22:16:23.268 [main] INFO org.apache.hadoop.mapred.JobClient - Combine output records=3
22:16:23.268 [main] INFO org.apache.hadoop.mapred.JobClient - Map input records=9
22:16:23.269 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce shuffle bytes=0
22:16:23.269 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce output records=3
22:16:23.269 [main] INFO org.apache.hadoop.mapred.JobClient - Spilled Records=6
22:16:23.269 [main] INFO org.apache.hadoop.mapred.JobClient - Map output bytes=675
22:16:23.269 [main] INFO org.apache.hadoop.mapred.JobClient - Combine input records=9
22:16:23.269 [main] INFO org.apache.hadoop.mapred.JobClient - Map output records=9
22:16:23.270 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce input records=3
22:16:23.273 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - K-Means Iteration 3
22:16:23.274 [main] INFO o.a.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
22:16:23.289 [main] INFO org.apache.mahout.common.HadoopUtil - Deleting target/output/clusters-3
22:16:23.291 [main] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
22:16:23.496 [main] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:24.679 [Thread-41] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:24.690 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_local_0003
22:16:24.729 [Thread-41] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
22:16:25.043 [Thread-41] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
22:16:25.044 [Thread-41] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
22:16:25.101 [Thread-41] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
22:16:25.103 [Thread-41] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
22:16:25.106 [Thread-41] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
22:16:25.107 [Thread-41] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:25.107 [Thread-41] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0003_m_000000_0' done.
22:16:25.113 [Thread-41] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:25.114 [Thread-41] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
22:16:25.115 [Thread-41] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 239 bytes
22:16:25.115 [Thread-41] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:25.190 [Thread-41] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
22:16:25.191 [Thread-41] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:25.191 [Thread-41] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0003_r_000000_0 is allowed to commit now
22:16:25.195 [Thread-41] INFO o.a.h.m.l.output.FileOutputCommitter - Saved output of task 'attempt_local_0003_r_000000_0' to target/output/clusters-3
22:16:25.196 [Thread-41] INFO o.a.hadoop.mapred.LocalJobRunner - reduce > reduce
22:16:25.196 [Thread-41] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0003_r_000000_0' done.
22:16:25.702 [main] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 100%
22:16:25.703 [main] INFO org.apache.hadoop.mapred.JobClient - Job complete: job_local_0003
22:16:25.704 [main] INFO org.apache.hadoop.mapred.JobClient - Counters: 13
22:16:25.704 [main] INFO org.apache.hadoop.mapred.JobClient - Clustering
22:16:25.705 [main] INFO org.apache.hadoop.mapred.JobClient - Converged Clusters=3
22:16:25.705 [main] INFO org.apache.hadoop.mapred.JobClient - FileSystemCounters
22:16:25.705 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_READ=8227859
22:16:25.706 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_WRITTEN=8374758
22:16:25.706 [main] INFO org.apache.hadoop.mapred.JobClient - Map-Reduce Framework
22:16:25.706 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce input groups=3
22:16:25.706 [main] INFO org.apache.hadoop.mapred.JobClient - Combine output records=3
22:16:25.707 [main] INFO org.apache.hadoop.mapred.JobClient - Map input records=9
22:16:25.707 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce shuffle bytes=0
22:16:25.707 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce output records=3
22:16:25.708 [main] INFO org.apache.hadoop.mapred.JobClient - Spilled Records=6
22:16:25.709 [main] INFO org.apache.hadoop.mapred.JobClient - Map output bytes=675
22:16:25.709 [main] INFO org.apache.hadoop.mapred.JobClient - Combine input records=9
22:16:25.710 [main] INFO org.apache.hadoop.mapred.JobClient - Map output records=9
22:16:25.710 [main] INFO org.apache.hadoop.mapred.JobClient - Reduce input records=3
22:16:25.713 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - Clustering data
22:16:25.714 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - Running Clustering
22:16:25.714 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - Input: target/input/points Clusters In: target/output/clusters-3 Out: target/output/clusteredPoints Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure@343a9d95
22:16:25.714 [main] INFO o.a.m.clustering.kmeans.KMeansDriver - convergence: 0.0010 Input Vectors: org.apache.mahout.math.VectorWritable
22:16:25.714 [main] INFO o.a.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
22:16:25.730 [main] INFO org.apache.mahout.common.HadoopUtil - Deleting target/output/clusteredPoints
22:16:25.732 [main] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
22:16:25.932 [main] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:26.259 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_local_0004
22:16:26.270 [Thread-54] INFO o.a.h.m.lib.input.FileInputFormat - Total input paths to process : 1
22:16:26.404 [Thread-54] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
22:16:26.405 [Thread-54] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:26.405 [Thread-54] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0004_m_000000_0 is allowed to commit now
22:16:26.410 [Thread-54] INFO o.a.h.m.l.output.FileOutputCommitter - Saved output of task 'attempt_local_0004_m_000000_0' to target/output/clusteredPoints
22:16:26.411 [Thread-54] INFO o.a.hadoop.mapred.LocalJobRunner -
22:16:26.411 [Thread-54] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0004_m_000000_0' done.
22:16:27.261 [main] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 0%
22:16:27.261 [main] INFO org.apache.hadoop.mapred.JobClient - Job complete: job_local_0004
22:16:27.262 [main] INFO org.apache.hadoop.mapred.JobClient - Counters: 5
22:16:27.262 [main] INFO org.apache.hadoop.mapred.JobClient - FileSystemCounters
22:16:27.262 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_READ=5484682
22:16:27.262 [main] INFO org.apache.hadoop.mapred.JobClient - FILE_BYTES_WRITTEN=5581897
22:16:27.262 [main] INFO org.apache.hadoop.mapred.JobClient - Map-Reduce Framework
22:16:27.263 [main] INFO org.apache.hadoop.mapred.JobClient - Map input records=9
22:16:27.263 [main] INFO org.apache.hadoop.mapred.JobClient - Spilled Records=0
22:16:27.263 [main] INFO org.apache.hadoop.mapred.JobClient - Map output records=9