We often know something about the process generating the data. For example, DNA sequences have been produced by evolution through a series of modifications of ancestor sequences, text can be viewed as generated by a source of words perhaps reflecting the topic of the document, a time series may have been generated by a dynamical system of a certain type, 2-dimensional images by projections of a 3-dimensional scene, and so on.
For all of these data sources we have some, albeit imperfect, knowledge about the source generating the data, and hence of the type of invariances, features and similarities (in a word, patterns) that we can expect it to contain. Even simple and approximate models of the data can be used to create kernels that take account of the insight thus afforded.
Models of data can be either deterministic or probabilistic, and there are several different ways of turning them into kernels. In fact, some of the kernels we have already encountered can be regarded as derived from generative data models.
In this chapter, however, we put the emphasis on generative models of the data, which in many cases already exist. We aim to show how these models can be exploited to provide features for an embedding function whose corresponding kernel can be efficiently evaluated.
Although the emphasis will mainly be on two classes of kernels induced by probabilistic models, namely P-kernels and Fisher kernels, other methods also exist for incorporating generative information into kernel design.
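To make the second of these ideas concrete before the formal treatment, the following sketch shows a Fisher kernel for a deliberately simple generative model. The choice of a one-dimensional Gaussian as the model, the parameter values, and the common simplification of replacing the Fisher information matrix by the identity are all illustrative assumptions, not part of the general definition.

```python
import numpy as np

def fisher_score(x, mu, sigma2):
    """Fisher score: gradient of log N(x; mu, sigma2) with respect
    to the parameters (mu, sigma2) of the generative model."""
    d_mu = (x - mu) / sigma2
    d_sigma2 = ((x - mu) ** 2 - sigma2) / (2.0 * sigma2 ** 2)
    return np.array([d_mu, d_sigma2])

def fisher_kernel(x, z, mu=0.0, sigma2=1.0):
    """Inner product of Fisher scores. The Fisher information
    matrix is approximated by the identity here, a simplification
    often used in practice (an assumption of this sketch)."""
    return float(fisher_score(x, mu, sigma2) @ fisher_score(z, mu, sigma2))
```

The kernel measures how similarly two data points would pull the model's parameters during maximum-likelihood training; points that suggest the same parameter updates are deemed similar. The same recipe applies to richer generative models by differentiating their log-likelihoods instead.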