Latent Dirichlet Allocation (LDA) is a generative probabilistic model that is widely used as a topic model for text, among other applications.
The random variables have the following meanings:
- θ : document-topic distribution
- φ : topic-word distribution
- Z : topic assigned to each word
- W : observed word
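To make the roles of θ, φ, Z and W concrete, here is a minimal sketch of the generative process in plain Python. It assumes symmetric Dirichlet priors with hyperparameters `alpha` (on θ) and `beta` (on φ) and a fixed document length; these names, and the helper functions `sample_dirichlet` and `generate_corpus`, are illustrative assumptions and not part of the original post.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab_size, alpha, beta, seed=0):
    """Generate word ids (W) and their topics (Z) following the LDA story."""
    rng = random.Random(seed)
    # phi[z] is the topic-word distribution of topic z.
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(n_topics)]
    docs, topics = [], []
    for _ in range(n_docs):
        # theta is the document-topic distribution of this document.
        theta = sample_dirichlet(rng, alpha, n_topics)
        # Z: one topic per word position, drawn from theta.
        z_m = rng.choices(range(n_topics), weights=theta, k=doc_len)
        # W: each word is drawn from the word distribution of its topic.
        w_m = [rng.choices(range(vocab_size), weights=phi[z])[0] for z in z_m]
        docs.append(w_m)
        topics.append(z_m)
    return docs, topics


docs, topics = generate_corpus(n_docs=3, doc_len=5, n_topics=2,
                               vocab_size=4, alpha=0.5, beta=0.5)
```

In estimation only W is observed; θ, φ and Z have to be inferred.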
There are several popular estimation methods for LDA, and collapsed Gibbs sampling (CGS) is one of them.
This method integrates out all random variables except the word topics {z_mn} and draws each z_mn from its posterior given all the other topic assignments.
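The posterior of z_mn is the following, assuming symmetric Dirichlet priors with hyperparameters α (on θ) and β (on φ) and a vocabulary of V distinct words:

$$ p(z_{mn} = z \mid \mathbf{z}_{-mn}, \mathbf{w}) \propto \left(n_{mz}^{-mn} + \alpha\right) \frac{n_{tz}^{-mn} + \beta}{n_{z}^{-mn} + V\beta} $$

where t = w_mn is the word at position n of document m, n_mz is the number of words in document m assigned to topic z, n_tz is the number of times word t is assigned to topic z, n_z is the total number of words assigned to topic z, and -mn means “excluding z_mn.”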
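The sampling step can be sketched as follows. This is a minimal, unoptimized illustration in plain Python and not the code used for the experiments; it assumes symmetric priors `alpha` and `beta`, documents given as lists of integer word ids, and the hypothetical helper names `init_counts` and `gibbs_sweep`.

```python
import random


def init_counts(docs, n_topics, vocab_size, rng):
    """Assign a random topic to every word and build the count tables."""
    n_mz = [[0] * n_topics for _ in docs]              # document-topic counts
    n_tz = [[0] * n_topics for _ in range(vocab_size)]  # word-topic counts
    n_z = [0] * n_topics                                # words per topic
    z_mn = []
    for m, doc in enumerate(docs):
        z_doc = []
        for t in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            n_mz[m][z] += 1
            n_tz[t][z] += 1
            n_z[z] += 1
        z_mn.append(z_doc)
    return z_mn, n_mz, n_tz, n_z


def gibbs_sweep(docs, z_mn, n_mz, n_tz, n_z, alpha, beta, rng):
    """Resample every z_mn once from its collapsed conditional posterior."""
    n_topics = len(n_z)
    vocab_size = len(n_tz)
    for m, doc in enumerate(docs):
        for n, t in enumerate(doc):
            # Remove the current assignment: this yields the "-mn" counts.
            z = z_mn[m][n]
            n_mz[m][z] -= 1
            n_tz[t][z] -= 1
            n_z[z] -= 1
            # Unnormalized posterior weight for each candidate topic.
            weights = [
                (n_mz[m][k] + alpha) * (n_tz[t][k] + beta)
                / (n_z[k] + vocab_size * beta)
                for k in range(n_topics)
            ]
            z = rng.choices(range(n_topics), weights=weights)[0]
            # Record the new assignment and restore the counts.
            z_mn[m][n] = z
            n_mz[m][z] += 1
            n_tz[t][z] += 1
            n_z[z] += 1


rng = random.Random(0)
docs = [[0, 1, 0, 1], [2, 3, 2, 3]]  # toy corpus, vocabulary size 4
z_mn, n_mz, n_tz, n_z = init_counts(docs, n_topics=2, vocab_size=4, rng=rng)
for _ in range(10):
    gibbs_sweep(docs, z_mn, n_mz, n_tz, n_z, alpha=0.5, beta=0.5, rng=rng)
```

`random.choices` normalizes the weights internally, so the proportionality constant of the posterior never has to be computed.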
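Sampling is repeated until the perplexity converges or for a fixed number of iterations. The perplexity is

$$ \mathrm{perplexity}(\mathbf{w}) = \exp\left(-\frac{1}{N} \sum_{m} \sum_{n} \log \sum_{z} \theta_{mz}\, \phi_{z w_{mn}}\right) $$

where N is the total number of words in the corpus and

$$ \theta_{mz} = \frac{n_{mz} + \alpha}{n_{m} + K\alpha}, \qquad \phi_{zt} = \frac{n_{tz} + \beta}{n_{z} + V\beta} $$

with α and β the symmetric Dirichlet hyperparameters on θ and φ, K the number of topics, V the vocabulary size,
and n_m is the word count of document m.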
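A minimal sketch of the perplexity computation from the count tables, again assuming symmetric priors `alpha` and `beta` and documents given as lists of integer word ids. The function name `perplexity` and the layout of the count tables (`n_mz[m][z]`, `n_tz[t][z]`, `n_z[z]`) are assumptions made for this illustration.

```python
import math


def perplexity(docs, n_mz, n_tz, n_z, alpha, beta):
    """Perplexity of the corpus under point estimates of theta and phi."""
    n_topics = len(n_z)
    vocab_size = len(n_tz)
    log_lik = 0.0
    n_words = 0
    for m, doc in enumerate(docs):
        n_m = len(doc)  # word count of document m
        theta = [(n_mz[m][z] + alpha) / (n_m + n_topics * alpha)
                 for z in range(n_topics)]
        for t in doc:
            # Probability of word t in document m, summed over topics.
            p = sum(theta[z] * (n_tz[t][z] + beta) / (n_z[z] + vocab_size * beta)
                    for z in range(n_topics))
            log_lik += math.log(p)
        n_words += n_m
    return math.exp(-log_lik / n_words)


# One topic and two distinct words seen once each: every word has
# probability (1 + 1) / (2 + 2 * 1) = 0.5, so the perplexity is 2.
value = perplexity(docs=[[0, 1]], n_mz=[[2]], n_tz=[[1], [1]], n_z=[2],
                   alpha=1.0, beta=1.0)
```

A perplexity equal to the vocabulary size corresponds to guessing words uniformly at random; lower values mean the model predicts the corpus better.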
Although perplexity usually decreases as learning progresses, my experiments showed a somewhat different tendency.
Continued in the next post.
