
Bayesian interpretation of kernel regularization

Within Bayesian statistics for machine learning, kernel methods arise from the assumption of an inner product space or similarity structure on inputs. For some such methods, such as support vector machines (SVMs), the original formulation and its regularization were not Bayesian in nature. It is helpful to understand them from a Bayesian perspective. Because the kernels are not necessarily positive semidefinite, the underlying structure may not be an inner product space, but instead a more general reproducing kernel Hilbert space. In Bayesian probability, kernel methods are a key component of Gaussian processes, where the kernel function is known as the covariance function. Kernel methods have traditionally been used in supervised learning problems where the input space is usually a space of vectors while the output space is a space of scalars. More recently these methods have been extended to problems that deal with multiple outputs such as in multi-task learning.[1]

A mathematical equivalence between the regularization and the Bayesian point of view is easily proved in cases where the reproducing kernel Hilbert space is finite-dimensional. The infinite-dimensional case raises subtle mathematical issues; we will consider here the finite-dimensional case. We start with a brief review of the main ideas underlying kernel methods for scalar learning, and briefly introduce the concepts of regularization and Gaussian processes. We then show how both points of view arrive at essentially equivalent estimators, and show the connection that ties them together.

The supervised learning problem

The classical supervised learning problem requires estimating the output for some new input point $\mathbf{x}$ by learning a scalar-valued estimator $\hat{f}(\mathbf{x})$ on the basis of a training set $S$ consisting of $n$ input-output pairs, $S = (\mathbf{X}, \mathbf{Y}) = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)$.[2] Given a symmetric and positive bivariate function $k(\cdot, \cdot)$ called a kernel, one of the most popular estimators in machine learning is given by

$\hat{f}(\mathbf{x}) = \mathbf{k}^\top (\mathbf{K} + \lambda n \mathbf{I})^{-1} \mathbf{Y}$     (1)

where $\mathbf{K} \equiv k(\mathbf{X}, \mathbf{X})$ is the kernel matrix with entries $\mathbf{K}_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, $\mathbf{k} = [k(\mathbf{x}_1, \mathbf{x}), \ldots, k(\mathbf{x}_n, \mathbf{x})]^\top$, and $\mathbf{Y} = [y_1, \ldots, y_n]^\top$. We will see how this estimator can be derived both from a regularization and a Bayesian perspective.
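For concreteness, the estimator (1) can be computed with a few lines of linear algebra. The following sketch (in Python with NumPy; the Gaussian kernel, the synthetic data, and the function names are illustrative choices, not part of the formulation above) builds the kernel matrix $\mathbf{K}$ and the vector $\mathbf{k}$ and evaluates $\hat{f}(\mathbf{x})$ at a new input.

```python
import numpy as np


def gaussian_kernel(x, y, length_scale=1.0):
    """Illustrative kernel k(x, y); any symmetric positive-definite kernel works."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * length_scale ** 2))


def kernel_estimator(X, Y, x_new, lam, kernel=gaussian_kernel):
    """Evaluate equation (1): f_hat(x) = k^T (K + lambda * n * I)^{-1} Y."""
    n = X.shape[0]
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # K_ij = k(x_i, x_j)
    k_vec = np.array([kernel(xi, x_new) for xi in X])         # k = [k(x_1, x), ..., k(x_n, x)]
    c = np.linalg.solve(K + lam * n * np.eye(n), Y)           # c = (K + lambda * n * I)^{-1} Y
    return k_vec @ c


# Synthetic one-dimensional data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(20, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
print(kernel_estimator(X, Y, np.array([0.5]), lam=0.1))
```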

A regularization perspective

The main assumption in the regularization perspective is that the set of functions $\mathcal{F}$ belongs to a reproducing kernel Hilbert space $\mathcal{H}_k$.[2][3][4][5]

Reproducing kernel Hilbert space

A reproducing kernel Hilbert space (RKHS) $\mathcal{H}_k$ is a Hilbert space of functions defined by a symmetric, positive-definite function $k : \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$ called the reproducing kernel, such that the function $k(\mathbf{x}, \cdot)$ belongs to $\mathcal{H}_k$ for all $\mathbf{x} \in \mathcal{X}$.[6][7][8] Three main properties make an RKHS appealing:

1. The reproducing property, which gives the space its name,

$f(\mathbf{x}) = \langle f, k(\mathbf{x}, \cdot) \rangle_k, \quad \forall f \in \mathcal{H}_k,$

where $\langle \cdot, \cdot \rangle_k$ is the inner product in $\mathcal{H}_k$.

2. Functions in an RKHS are in the closure of linear combinations of the kernel at given points,

$f(\mathbf{x}) = \sum_i k(\mathbf{x}_i, \mathbf{x}) c_i.$

This allows the construction of both linear and generalized linear models within a unified framework.

3. The squared norm in an RKHS can be written as

$\|f\|_k^2 = \sum_{i,j} k(\mathbf{x}_i, \mathbf{x}_j) c_i c_j$

and can be viewed as a measure of the complexity of the function.

The regularized functional

The estimator is derived as the minimizer of the regularized functional

$\frac{1}{n} \sum_{i=1}^n (f(\mathbf{x}_i) - y_i)^2 + \lambda \|f\|_k^2$     (2)

where $f \in \mathcal{H}_k$ and $\|\cdot\|_k$ is the norm in $\mathcal{H}_k$. The first term in this functional, which measures the average of the squared errors between the $f(\mathbf{x}_i)$ and the $y_i$, is called the empirical risk and represents the cost we pay by predicting $f(\mathbf{x}_i)$ for the true value $y_i$. The second term is the squared norm in an RKHS multiplied by a weight $\lambda$; it stabilizes the problem[3][5] and introduces a trade-off between fitting the data and the complexity of the estimator.[2] The weight $\lambda$, called the regularizer, determines the degree to which instability and complexity of the estimator should be penalized (higher penalty for increasing values of $\lambda$).
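As a rough illustration of the trade-off controlled by $\lambda$, the following sketch (Python/NumPy, with synthetic data and a squared-exponential kernel; it anticipates the kernel-expansion form $f = \sum_i c_i k(\mathbf{x}_i, \cdot)$ of the next subsection, for which $\|f\|_k^2 = \mathbf{c}^\top \mathbf{K} \mathbf{c}$) evaluates the empirical risk and the penalty term of (2) separately.

```python
import numpy as np


def tikhonov_terms(K, Y, c, lam):
    """Split the functional (2), for f = sum_i c_i k(x_i, .), into its two terms:
    empirical risk (1/n) * ||Y - K c||^2 and penalty lam * c^T K c."""
    n = len(Y)
    residual = Y - K @ c
    return residual @ residual / n, lam * (c @ K @ c)


# Synthetic data and a squared-exponential kernel matrix (illustrative choices)
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, 12)
Y = np.sin(X) + 0.1 * rng.standard_normal(12)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)
c = rng.standard_normal(12)
for lam in (0.01, 0.1, 1.0):
    risk, penalty = tikhonov_terms(K, Y, c, lam)
    print(lam, risk, penalty)  # the same function is penalized more heavily as lam grows
```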

Derivation of the estimator

The explicit form of the estimator in equation (1) is derived in two steps. First, the representer theorem[9][10][11] states that the minimizer of the functional (2) can always be written as a linear combination of the kernels centered at the training-set points,

$\hat{f}(\mathbf{x}) = \sum_{i=1}^n c_i k(\mathbf{x}_i, \mathbf{x}) = \mathbf{k}^\top \mathbf{c}$     (3)

for some $\mathbf{c} \in \mathbb{R}^n$. The explicit form of the coefficients $\mathbf{c} = [c_1, \ldots, c_n]^\top$ can be found by substituting for $f(\cdot)$ in the functional (2). For a function of the form in equation (3), we have that

$\|f\|_k^2 = \langle f, f \rangle_k = \left\langle \sum_{i=1}^n c_i k(\mathbf{x}_i, \cdot), \sum_{j=1}^n c_j k(\mathbf{x}_j, \cdot) \right\rangle_k = \sum_{i=1}^n \sum_{j=1}^n c_i c_j \langle k(\mathbf{x}_i, \cdot), k(\mathbf{x}_j, \cdot) \rangle_k = \sum_{i=1}^n \sum_{j=1}^n c_i c_j k(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{c}^\top \mathbf{K} \mathbf{c}.$

We can rewrite the functional (2) as

$\frac{1}{n} \|\mathbf{Y} - \mathbf{K} \mathbf{c}\|^2 + \lambda \mathbf{c}^\top \mathbf{K} \mathbf{c}.$

This functional is convex in $\mathbf{c}$ and therefore we can find its minimum by setting the gradient with respect to $\mathbf{c}$ to zero,

$-\frac{1}{n} \mathbf{K} (\mathbf{Y} - \mathbf{K} \mathbf{c}) + \lambda \mathbf{K} \mathbf{c} = 0,$
$(\mathbf{K} + \lambda n \mathbf{I}) \mathbf{c} = \mathbf{Y},$
$\mathbf{c} = (\mathbf{K} + \lambda n \mathbf{I})^{-1} \mathbf{Y}.$

Substituting this expression for the coefficients in equation (3), we obtain the estimator stated previously in equation (1),

$\hat{f}(\mathbf{x}) = \mathbf{k}^\top (\mathbf{K} + \lambda n \mathbf{I})^{-1} \mathbf{Y}.$
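As a quick numerical sanity check (Python/NumPy, with an illustrative squared-exponential kernel and synthetic data), one can verify that the closed-form coefficients make the gradient of the rewritten functional vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, 15)
Y = np.sin(X) + 0.1 * rng.standard_normal(15)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)  # K_ij = k(x_i, x_j)
n, lam = len(Y), 0.1

# Closed-form minimizer: c = (K + lambda * n * I)^{-1} Y
c = np.linalg.solve(K + lam * n * np.eye(n), Y)

# Gradient of (1/n) * ||Y - K c||^2 + lam * c^T K c with respect to c
# (twice the stationarity condition stated in the text)
grad = -2.0 / n * K @ (Y - K @ c) + 2.0 * lam * K @ c
print(np.max(np.abs(grad)))  # approximately zero, up to floating-point error
```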

A Bayesian perspective

The notion of a kernel plays a crucial role in Bayesian probability as the covariance function of a stochastic process called the Gaussian process.

A review of Bayesian probability

As part of the Bayesian framework, the Gaussian process specifies the prior distribution that describes the prior beliefs about the properties of the function being modeled. These beliefs are updated after taking into account observational data by means of a likelihood function that relates the prior beliefs to the observations. Taken together, the prior and likelihood lead to an updated distribution called the posterior distribution that is customarily used for predicting test cases.

The Gaussian process

A Gaussian process (GP) is a stochastic process in which any finite number of random variables that are sampled follow a joint Normal distribution.[12] The mean vector and covariance matrix of the Gaussian distribution completely specify the GP. GPs are usually used as prior distributions for functions, and as such the mean vector and covariance matrix can be viewed as functions, where the covariance function is also called the kernel of the GP. Let a function $f$ follow a Gaussian process with mean function $m$ and kernel function $k$,

$f \sim \mathcal{GP}(m, k).$

In terms of the underlying Gaussian distribution, we have that for any finite set $\mathbf{X} = \{\mathbf{x}_i\}_{i=1}^n$, if we let $f(\mathbf{X}) = [f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n)]^\top$, then

$f(\mathbf{X}) \sim \mathcal{N}(\mathbf{m}, \mathbf{K}),$

where $\mathbf{m} = m(\mathbf{X}) = [m(\mathbf{x}_1), \ldots, m(\mathbf{x}_n)]^\top$ is the mean vector and $\mathbf{K} = k(\mathbf{X}, \mathbf{X})$ is the covariance matrix of the multivariate Gaussian distribution.
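The finite-dimensional characterization above suggests a simple way to sample from a GP prior at a finite set of inputs: draw from $\mathcal{N}(\mathbf{m}, \mathbf{K})$. The sketch below (Python/NumPy; the zero mean, the squared-exponential covariance, and the jitter term are illustrative choices) does exactly this.

```python
import numpy as np


def gp_prior_samples(X, mean_fn, kernel_fn, n_samples=3, jitter=1e-8):
    """Draw samples of f(X) ~ N(m, K) for a GP evaluated at finitely many inputs X."""
    m = np.array([mean_fn(x) for x in X])
    K = np.array([[kernel_fn(xi, xj) for xj in X] for xi in X])
    K = K + jitter * np.eye(len(X))  # small jitter keeps the covariance numerically PSD
    return np.random.default_rng(0).multivariate_normal(m, K, size=n_samples)


# Illustrative zero mean and squared-exponential covariance
X = np.linspace(-3.0, 3.0, 50)
samples = gp_prior_samples(
    X,
    mean_fn=lambda x: 0.0,
    kernel_fn=lambda a, b: np.exp(-0.5 * (a - b) ** 2),
)
print(samples.shape)  # (3, 50): three sampled functions, each evaluated at the 50 inputs
```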

Derivation of the estimator

In a regression context, the likelihood function is usually assumed to be a Gaussian distribution and the observations to be independent and identically distributed (iid),

$p(y \mid f, \mathbf{x}, \sigma^2) = \mathcal{N}(f(\mathbf{x}), \sigma^2).$

This assumption corresponds to the observations being corrupted with zero-mean Gaussian noise with variance $\sigma^2$. The iid assumption makes it possible to factorize the likelihood function over the data points given the set of inputs $\mathbf{X}$ and the variance of the noise $\sigma^2$, and thus the posterior distribution can be computed analytically. For a test input vector $\mathbf{x}$, given the training data $S = \{\mathbf{X}, \mathbf{Y}\}$, the posterior distribution is given by

$p(f(\mathbf{x}) \mid S, \mathbf{x}, \boldsymbol{\phi}) = \mathcal{N}(m(\mathbf{x}), \sigma^2(\mathbf{x})),$

where $\boldsymbol{\phi}$ denotes the set of parameters, which include the variance of the noise $\sigma^2$ and any parameters of the covariance function $k$, and where

$m(\mathbf{x}) = \mathbf{k}^\top (\mathbf{K} + \sigma^2 \mathbf{I})^{-1} \mathbf{Y},$
$\sigma^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}^\top (\mathbf{K} + \sigma^2 \mathbf{I})^{-1} \mathbf{k}.$
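These two formulas translate directly into code. The following sketch (Python/NumPy; the kernel and data are illustrative) computes the posterior mean and variance at a test input; note that the posterior mean coincides with the regularized estimator (1) when $\sigma^2 = \lambda n$.

```python
import numpy as np


def gp_posterior(X, Y, x_new, kernel, sigma2):
    """Posterior mean and variance at a test input x:
    m(x)       = k^T (K + sigma^2 I)^{-1} Y
    sigma^2(x) = k(x, x) - k^T (K + sigma^2 I)^{-1} k
    """
    K = np.array([[kernel(a, b) for b in X] for a in X])
    k = np.array([kernel(a, x_new) for a in X])
    A = K + sigma2 * np.eye(len(X))
    mean = k @ np.linalg.solve(A, Y)
    var = kernel(x_new, x_new) - k @ np.linalg.solve(A, k)
    return mean, var


# Illustrative data and squared-exponential kernel
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, 15)
Y = np.sin(X) + 0.1 * rng.standard_normal(15)
kern = lambda a, b: np.exp(-0.5 * (a - b) ** 2)
print(gp_posterior(X, Y, 0.5, kern, sigma2=0.01))
# The posterior mean equals the regularized estimator (1) when sigma^2 = lambda * n.
```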

The connection between regularization and Bayes

A connection between regularization theory and Bayesian theory can only be achieved in the case of a finite-dimensional RKHS. Under this assumption, regularization theory and Bayesian theory are connected through Gaussian process prediction.[3][12]

In the finite-dimensional case, every RKHS can be described in terms of a feature map $\Phi : \mathcal{X} \rightarrow \mathbb{R}^p$ such that[2]

$k(\mathbf{x}, \mathbf{x}') = \sum_{i=1}^p \Phi_i(\mathbf{x}) \Phi_i(\mathbf{x}').$

Functions in the RKHS with kernel $k$ can then be written as

$f_{\mathbf{w}}(\mathbf{x}) = \sum_{i=1}^p w_i \Phi_i(\mathbf{x}) = \langle \mathbf{w}, \Phi(\mathbf{x}) \rangle,$

and we also have that

$\|f_{\mathbf{w}}\|_k = \|\mathbf{w}\|.$
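As a small check of this finite-dimensional picture, the sketch below (Python/NumPy) uses a hypothetical polynomial feature map $\Phi(x) = (1, \sqrt{2}\,x, x^2)$, for which the induced kernel is $k(x, x') = \langle \Phi(x), \Phi(x') \rangle = (1 + x x')^2$, and verifies the kernel identity and the norm identity $\|f_{\mathbf{w}}\|_k = \|\mathbf{w}\|$ numerically.

```python
import numpy as np


def phi(x):
    """Hypothetical finite-dimensional feature map (p = 3)."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])


def k(x, y):
    """Kernel induced by the feature map: k(x, y) = <phi(x), phi(y)> = (1 + x*y)^2."""
    return phi(x) @ phi(y)


x, y = 0.7, -1.3
assert np.isclose(k(x, y), (1.0 + x * y) ** 2)  # kernel as an inner product of features

w = np.array([0.5, -2.0, 1.0])
f_w = lambda t: w @ phi(t)                      # f_w(x) = <w, phi(x)>
print(f_w(x), np.linalg.norm(w))                # ||f_w||_k equals ||w|| in this parameterization
```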

We can now build a Gaussian process by assuming $\mathbf{w} = [w_1, \ldots, w_p]^\top$ to be distributed according to a multivariate Gaussian distribution with zero mean and identity covariance matrix,

$\mathbf{w} \sim \mathcal{N}(0, \mathbf{I}) \propto \exp(-\|\mathbf{w}\|^2).$

If we assume a Gaussian likelihood we have

$P(\mathbf{Y} \mid \mathbf{X}, f) = \mathcal{N}(f(\mathbf{X}), \sigma^2 \mathbf{I}) \propto \exp\left(-\frac{1}{\sigma^2} \|f_{\mathbf{w}}(\mathbf{X}) - \mathbf{Y}\|^2\right),$

where $f_{\mathbf{w}}(\mathbf{X}) = [\langle \mathbf{w}, \Phi(\mathbf{x}_1) \rangle, \ldots, \langle \mathbf{w}, \Phi(\mathbf{x}_n) \rangle]^\top$. The resulting posterior distribution is then given by

$P(f \mid \mathbf{X}, \mathbf{Y}) \propto \exp\left(-\frac{1}{\sigma^2} \|f_{\mathbf{w}}(\mathbf{X}) - \mathbf{Y}\|_n^2 - \|\mathbf{w}\|^2\right).$

We can see that a maximum a posteriori (MAP) estimate is equivalent to the minimization problem defining Tikhonov regularization, where in the Bayesian case the regularization parameter is related to the noise variance.
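A brief numerical illustration of this equivalence (Python/NumPy; the feature map, data, and noise level are illustrative assumptions) computes the MAP weights in the feature-space model and checks that the resulting prediction matches the kernel-form estimator with regularization $\sigma^2 = \lambda n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 30, 3, 0.05


def phi(x):
    """Illustrative feature map; any finite-dimensional map would do."""
    return np.array([1.0, x, x ** 2])


X = rng.uniform(-2.0, 2.0, n)
Y = 1.0 - 0.5 * X + 0.3 * X ** 2 + np.sqrt(sigma2) * rng.standard_normal(n)
Phi = np.vstack([phi(x) for x in X])  # n x p design matrix

# MAP estimate under w ~ N(0, I) and a Gaussian likelihood: maximizing the posterior
# is minimizing (1/sigma^2) * ||Phi w - Y||^2 + ||w||^2, i.e. Tikhonov regularization.
w_map = np.linalg.solve(Phi.T @ Phi + sigma2 * np.eye(p), Phi.T @ Y)

# Equivalent kernel form with k(x, x') = <phi(x), phi(x')> and lambda * n = sigma^2.
K = Phi @ Phi.T
x_test = 0.4
k_vec = Phi @ phi(x_test)
f_kernel = k_vec @ np.linalg.solve(K + sigma2 * np.eye(n), Y)
print(w_map @ phi(x_test), f_kernel)  # the two predictions agree up to rounding
```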

From a philosophical perspective, the loss function in a regularization setting plays a different role than the likelihood function in the Bayesian setting. Whereas the loss function measures the error that is incurred when predicting $f(\mathbf{x})$ in place of $y$, the likelihood function measures how likely the observations are under the model that was assumed to be true in the generative process. From a mathematical perspective, however, the formulations of the regularization and Bayesian frameworks give the loss function and the likelihood function the same mathematical role: promoting the inference of functions $f$ that approximate the labels $y$ as closely as possible.

See also

Regularized least squares
Bayesian linear regression
Bayesian interpretation of Tikhonov regularization

References

  1. ^ Álvarez, Mauricio A.; Rosasco, Lorenzo; Lawrence, Neil D. (June 2011). "Kernels for Vector-Valued Functions: A Review". arXiv:1106.6251 [stat.ML].
  2. ^ a b c d Vapnik, Vladimir (1998). Statistical learning theory. Wiley. ISBN 9780471030034.
  3. ^ a b c Wahba, Grace (1990). Spline models for observational data. SIAM.
  4. ^ Schölkopf, Bernhard; Smola, Alexander J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press. ISBN 9780262194754.
  5. ^ a b Girosi, F.; Poggio, T. (1990). "Networks and the best approximation property" (PDF). Biological Cybernetics. 63 (3). Springer: 169–176. doi:10.1007/bf00195855. hdl:1721.1/6017. S2CID 18824241.
  6. ^ Aronszajn, N (May 1950). "Theory of Reproducing Kernels". Transactions of the American Mathematical Society. 68 (3): 337–404. doi:10.2307/1990404. JSTOR 1990404.
  7. ^ Schwartz, Laurent (1964). "Sous-espaces hilbertiens d'espaces vectoriels topologiques et noyaux associés (noyaux reproduisants)". Journal d'Analyse Mathématique. 13 (1). Springer: 115–256. doi:10.1007/bf02786620. S2CID 117202393.
  8. ^ Cucker, Felipe; Smale, Steve (October 5, 2001). "On the mathematical foundations of learning". Bulletin of the American Mathematical Society. 39 (1): 1–49. doi:10.1090/s0273-0979-01-00923-5.
  9. ^ Kimeldorf, George S.; Wahba, Grace (1970). "A correspondence between Bayesian estimation on stochastic processes and smoothing by splines". The Annals of Mathematical Statistics. 41 (2): 495–502. doi:10.1214/aoms/1177697089.
  10. ^ Schölkopf, Bernhard; Herbrich, Ralf; Smola, Alex J. (2001). "A Generalized Representer Theorem". Computational Learning Theory. Lecture Notes in Computer Science. Vol. 2111/2001. pp. 416–426. doi:10.1007/3-540-44581-1_27. ISBN 978-3-540-42343-0.
  11. ^ De Vito, Ernesto; Rosasco, Lorenzo; Caponnetto, Andrea; Piana, Michele; Verri, Alessandro (October 2004). "Some Properties of Regularized Kernel Methods". Journal of Machine Learning Research. 5: 1363–1390.
  12. ^ a b Rasmussen, Carl Edward; Williams, Christopher K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press. ISBN 0-262-18253-X.
