Collapsed Gibbs Sampling Estimation for Latent Dirichlet Allocation (1)

Latent Dirichlet Allocation (LDA) is a generative probabilistic model used, among other things, as a topic model for text.

[Figure: Graphical model of LDA (figure from Wikipedia)]

Each random variable has the following meaning:

  • θ : document-topic distribution, a multinomial over topics for each document, drawn from a Dirichlet distribution with parameter α
  • φ : topic-word distribution, a multinomial over words for each topic, drawn from a Dirichlet distribution with parameter β
  • Z : the topic of each word, drawn from the document's topic multinomial θ
  • W : each word, drawn from its topic's word multinomial φ
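The generative process these variables describe can be sketched as follows. This is a minimal illustration with numpy, not the post's own code; the sizes K, V, M and the hyperparameter values are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 3, 10, 5        # topics, vocabulary size, documents (example values)
alpha, beta = 0.5, 0.1    # Dirichlet hyperparameters (example values)

# phi: one topic-word multinomial per topic, drawn from Dirichlet(beta)
phi = rng.dirichlet([beta] * V, size=K)

docs = []
for m in range(M):
    # theta: document-topic multinomial, drawn from Dirichlet(alpha)
    theta = rng.dirichlet([alpha] * K)
    n_m = rng.integers(20, 40)            # document length
    z = rng.choice(K, size=n_m, p=theta)  # topic of each word
    w = np.array([rng.choice(V, p=phi[k]) for k in z])  # the words
    docs.append(w)
```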

There are several popular estimation methods for LDA, and Collapsed Gibbs sampling (CGS) is one of them.
This method integrates out all random variables except the word topics {z_mn} and draws each z_mn from its posterior.
The posterior of z_mn is the following:

$$P(z_{mn}=z \mid \mathbf{z}^{-mn}, \mathbf{w}) \propto \left(n_{mz}^{-mn} + \alpha\right)\frac{n_{tz}^{-mn} + \beta}{n_z^{-mn} + V\beta}, \qquad t = w_{mn}$$

where n_mz is the count of words in document m assigned to topic z, n_tz is the count of occurrences of word t assigned to topic z, n_z is the total count of words assigned to topic z, V is the vocabulary size, and the superscript -mn means "counted excluding z_mn."
The estimation iterates until the perplexity converges or for a fixed number of iterations.
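A single CGS sweep over all z_mn can be sketched as follows. This is a minimal numpy sketch rather than the post's own implementation; the count arrays n_mz, n_tz, n_z follow the notation above, and their initialization from a random topic assignment is assumed to happen before the first sweep:

```python
import numpy as np

def gibbs_sweep(docs, z, n_mz, n_tz, n_z, alpha, beta, rng):
    """One Collapsed Gibbs Sampling sweep: resample every z_mn in turn."""
    V, K = n_tz.shape
    for m, doc in enumerate(docs):
        for n, t in enumerate(doc):
            k = z[m][n]
            # remove z_mn from the counts to obtain the "-mn" statistics
            n_mz[m, k] -= 1
            n_tz[t, k] -= 1
            n_z[k] -= 1
            # unnormalized posterior of z_mn over all K topics
            p = (n_mz[m] + alpha) * (n_tz[t] + beta) / (n_z + V * beta)
            k = rng.choice(K, p=p / p.sum())
            # record the new assignment and restore the counts
            z[m][n] = k
            n_mz[m, k] += 1
            n_tz[t, k] += 1
            n_z[k] += 1
```

After each sweep the counts stay consistent with the assignments, so perplexity can be evaluated from them directly.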

The perplexity of LDA is

$$\mathrm{perplexity}(\mathbf{w}) = \exp\left(-\frac{\sum_{m}\sum_{n}\log \sum_{z}\theta_{mz}\,\phi_{z\,w_{mn}}}{\sum_{m} n_m}\right)$$

where the topic-word and document-topic distributions are estimated as

$$\phi_{zt} = \frac{n_{tz}+\beta}{n_z+V\beta}, \qquad \theta_{mz} = \frac{n_{mz}+\alpha}{n_m+K\alpha}$$

and n_m is the word count of document m and K is the number of topics.
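The parameter estimates and the perplexity can be computed from the count statistics roughly as follows. This is a sketch, not the post's code; the array shapes n_mz:(M, K), n_tz:(V, K) are assumptions of the example:

```python
import numpy as np

def estimate_params(n_mz, n_tz, n_z, n_m, alpha, beta):
    """Point estimates of theta_mz and phi_zt from the count statistics."""
    V, K = n_tz.shape
    theta = (n_mz + alpha) / (n_m[:, None] + K * alpha)  # shape (M, K)
    phi = ((n_tz + beta) / (n_z + V * beta)).T           # shape (K, V)
    return theta, phi

def perplexity(docs, theta, phi):
    """exp of the negative mean log-likelihood per word."""
    log_lik, n_words = 0.0, 0
    for m, doc in enumerate(docs):
        for t in doc:
            log_lik += np.log(theta[m] @ phi[:, t])
            n_words += 1
    return np.exp(-log_lik / n_words)
```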
Although perplexity usually decreases as learning progresses, my experiments showed some different tendencies.

Continued in the next post.

This entry was posted in LDA, Machine Learning.

7 Responses to Collapsed Gibbs Sampling Estimation for Latent Dirichlet Allocation (1)

  1. Pingback: Quora

  2. B says:

Hi, thanks for your help. Why don't you have a minus before the exp in the perplexity?

  3. nikhil says:

Hi shuyo,
Your code is good, although I haven't tried it yet. But before coding, I need to understand how the equations of LDA are actually derived. I'm having trouble understanding these equations; all the papers I have been through talk of messy mathematics, and somewhere in between I get stuck. It would be very helpful to me and many others if you could write up a small concrete example, explaining how topics are actually inferred for one or two iterations, alongside the equations. Since you have written the code, it would be easy for you.
Thank you 🙂

    • shuyo says:

I cannot explain it plainly without an understanding of probabilistic models and optimization.
I suppose you should try "a small concrete example" yourself, as you mentioned.

  4. Cherries says:

I am wondering why we don't need p(θ|α) in the perplexity(w) equation.


  5. W says:

Hi, thanks for explaining LDA. I'm wondering why you don't normalize the prior term (θ_mz) in the posterior formula.

The term with β has both a numerator and a denominator, whereas the term with α only has a numerator.
