An improved FastMap algorithm
Taiguo Qu, Zixing Cai
DOI: https://doi.org/10.13232/j.cnki.jnju.2016.04.013
2016-01-01
Abstract: Classical multidimensional scaling (CMDS) is a very common method for dimensionality reduction, data visualization, machine learning, pattern recognition, etc. When the distance is Euclidean, the CMDS solution can be viewed as the projections of the samples onto the principal axes of the sample set. The problem with CMDS is that its running time increases rapidly with data size. As one of the fast variants of CMDS, the FastMap algorithm has been widely used in many fields. It consists of a sequence of recursive projections, each of which has three steps. First, a pivot is obtained as the line passing through two far-apart points (referred to as pivot points from now on); then the samples are projected onto the pivot and the coordinates of the samples in a low-dimensional Euclidean space are obtained; and finally, the distances between all samples are modified. The shortcoming of the FastMap algorithm is that it can only find an approximate solution of CMDS. In this paper, the FastMap algorithm is analyzed in detail. It is found that the essence of the FastMap algorithm is to project the samples onto a set of mutually orthogonal directions determined by the pivots. Since these directions are usually different from the principal axes of the sample set, the FastMap algorithm can only obtain an approximate solution of CMDS. It is also found that the pivots can be selected from a subset of the samples, provided that the intrinsic dimension of the subset is equal to that of the whole sample set. Last but not least, it is found that only the distances between the pivot points and the samples are used to obtain the FastMap coordinates. That is to say, it is unnecessary to modify the distances between all the samples. Based on this theoretical analysis, an improved algorithm called iFastMap (improved FastMap) is put forward in this paper. By introducing principal component analysis, the FastMap coordinates can be aligned with the CMDS coordinates. As a result, the iFastMap algorithm can find exactly the same solution as CMDS. In addition, by selecting a subset with the same intrinsic dimension as the whole sample set, choosing the pivots only from this subset, and modifying only the distances between the pivot points and all the samples after each projection, the speed of the iFastMap algorithm is further improved. The experimental results verify that the solutions of the iFastMap algorithm and CMDS are fully consistent and that the iFastMap algorithm is highly efficient.
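The three-step projection described in the abstract can be illustrated with a minimal sketch of classic FastMap. It is not the authors' iFastMap: it includes neither the principal component analysis alignment nor the subset-based pivot selection. The function names `fastmap`, `sq_dist`, and `residual_sq` are hypothetical. Taking raw Euclidean points as input (rather than a distance matrix) and choosing pivots with the "farthest from sample 0, then farthest from that" heuristic are both assumptions made to keep the example self-contained. Residual distances are computed on demand, and only between a pivot point and a sample, which reflects the observation that the full distance matrix never needs to be modified.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fastmap(points, k):
    """Sketch of classic FastMap: returns k coordinates per sample.

    `points` is a list of equal-length numeric tuples (an assumption;
    FastMap itself only needs pairwise distances).
    """
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, level):
        # Squared distance left after removing the first `level` projections.
        d2 = sq_dist(points[i], points[j])
        for c in range(level):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # clamp tiny negative values from rounding

    for level in range(k):
        # Step 1: pick two far-apart pivot points a and b (heuristic).
        a = max(range(n), key=lambda i: residual_sq(0, i, level))
        b = max(range(n), key=lambda i: residual_sq(a, i, level))
        dab2 = residual_sq(a, b, level)
        if dab2 < 1e-12:
            break  # all remaining residual distances are effectively zero
        dab = math.sqrt(dab2)
        # Step 2: project every sample onto the line through a and b
        # using the law of cosines; only pivot-to-sample distances appear.
        for i in range(n):
            dai2 = residual_sq(a, i, level)
            dbi2 = residual_sq(b, i, level)
            coords[i][level] = (dai2 + dab2 - dbi2) / (2.0 * dab)
        # Step 3 (distance update) is implicit: residual_sq subtracts the
        # new coordinate the next time a pivot-to-sample distance is needed.
    return coords
```

For example, `fastmap([(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0), (1.0, 1.0)], 2)` returns a two-dimensional embedding. Because the input has intrinsic dimension 2, the pairwise distances should be preserved up to rounding. The axes follow the pivot directions rather than the principal axes, and this is the gap that the paper's alignment step addresses.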