Nonlinear dimensionality reduction
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Nonlinear_dimensionality_reduction"
.

High-dimensional data, meaning data which requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lies on an embedded non-linear manifold within the higher dimensional space. If the manifold is of low enough dimension then the data can be visualised in the low dimensional space.

Below are summarized some important algorithms from the history of manifold learning and nonlinear dimensionality reduction. Many of these non-linear dimensionality reduction methods are related to linear methods which are listed below. The non-linear methods can be broadly classified into two groups: those which actually provide a mapping (either from the high dimensional space to the low dimensional embedding or vice versa), and those that just give a visualisation. Typically those that just give a visualisation are based on proximity data - that is, distance measurements.

content

Contents

Linear methods

Non-linear mappings

Perhaps the principal method amongst those that provide a mapping from the high dimensional space to the embedded space is kernel PCA1. This method provides a non-linear principal components analysis (PCA) by applying the kernel trick. Kernel PCA first (implicitly) construct a higher dimensional space, in which there are a large number of linear relations between the dimensions. Subsequently, the low-dimensional data representation is obtained by applying traditional PCA.

Principal curves and manifolds give the natural geometric framework for nonlinear dimensionality reduction and extend the geometric interpretation of PCA by explicitly constructing an embedded manifold, and by encoding using standard geometric projection onto the manifold. How to define the "simplicity" of the manifold is problem-dependent, however, it is commonly measured by the intrinsic dimensionality and/or the smoothness of the manifold.2

Gaussian process latent variable models (GPLVM)3 are a probabilistic non-linear PCA. Like kernel PCA they use a kernel function to form the mapping (in the form of a Gaussian process). However in the GPLVM the mapping is from the embedded space to the data space (like density networks and GTM) whereas in kernel PCA it is in the opposite direction.

Other nonlinear techniques include techniques for locally linear embedding (such as Locally Linear Embedding (LLE)4, Hessian LLE, Laplacian Eigenmaps, and LTSA5). These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data (actually these techniques can be viewed upon as defining a graph-based kernel for Kernel PCA). In this way, the techniques are capable of unfolding datasets such as the Swiss roll. Techniques that employ neighborhood graphs in order to retain global properties of the data include Isomap6 and Maximum Variance Unfolding.

A completely different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special kind of feed-forward neural networks. Although the idea of autoencoders is quite old, training of the encoders has only recently become possible through the use of Restricted Boltzmann machines. Related to autoencoders is the NeuroScale algorithm, which uses stress functions inspired by multidimensional scaling and Sammon mappings (see below) to learn a non-linear mapping from the high dimensional to the embedded space. The mappings in NeuroScale are based on radial basis function networks.

Kohonen maps (also called Self-Organizing Maps or SOM) and its probabilistic variant generative topographic mapping (GTM) use a point representation in the embedded space to form a latent variable model which is based around a non-linear mapping from the embedded space to the high dimensional space. These techniques are related to work on density networks, which also are based around the same probabilistic model.

Methods based on proximity matrices

A method based on proximity matrices is one where the data is presented to the algorithm in the form of a similarity matrix or a distance matrix. These methods all fall under the broader class of metric multidimensional scaling. The variations tend to be differences in how the proximity data is computed; for example, Isomap, locally linear embeddings, maximum variance unfolding, and Sammon mapping (which is not in fact a mapping) are examples of metric multidimensional scaling methods.

See also

References

  1. ^ B. Schölkopf, A. Smola, K.-R. Muller, Kernel Principal Component Analysis, In: Bernhard Schölkopf, Christopher J. C. Burges, Alexander J. Smola (Eds.), Advances in Kernel Methods-Support Vector Learning, 1999, MIT Press Cambridge, MA, USA, 327-352. ISBN 0-262-19416-3
  2. ^ A. Gorban, B. Kegl, D. Wunsch, A. Zinovyev (Eds.), Principal Manifolds for Data Visualisation and Dimension Reduction, LNCSE 58, Springer, Berlin - Heidelberg - New York, 2007. ISBN 978-3-540-73749-0
  3. ^ The Gaussian Processes Web Site
  4. ^ S. T. Roweis and L. K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science Vol 290, 22 December 2000, 2323-2326.
  5. ^ Zhenyue Zhang and Hongyuan Zha, Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment, SIAM Journal on Scientific Computing 26 (1) (2005), 313 - 338.
  6. ^ J. B. Tenenbaum, V. de Silva, J. C. Langford, A Global Geometric Framework for Nonlinear Dimensionality Reduction, Science 290, (2000), 2319-2323.
  7. ^ ELastic MAPs

External links

© jGames.co.uk 2007 (some content from Wikipedia under GDL ) !-- ValueClick Media 468x60 and 728x90 Banner CODE for jgames.co.uk -->
Your Ad Here