Hierarchical Dirichlet Processes (Teh+ 2006) are a nonparametric Bayesian topic model that can handle an unbounded (infinite) number of topics.

In particular, HDP-LDA is interesting as an extension of LDA.

(Teh+ 2006) gives the collapsed Gibbs sampling updates for the general HDP framework, but not specifically for HDP-LDA.

To obtain the updates for HDP-LDA, we need to plug its base measure $H$ and emission distribution $F(\phi)$ into the following equation:

$$f_k^{-x_{ji}}(x_{ji}) = \frac{\int f(x_{ji}\mid\phi_k) \prod_{j'i' \neq ji,\, z_{j'i'}=k} f(x_{j'i'}\mid\phi_k)\, h(\phi_k)\, d\phi_k}{\int \prod_{j'i' \neq ji,\, z_{j'i'}=k} f(x_{j'i'}\mid\phi_k)\, h(\phi_k)\, d\phi_k}, \qquad \text{(eq. 30 in [Teh+ 2006])}$$

where $h$ is the probability density function of $H$ and $f$ is that of $F$.

In the case of HDP-LDA, $H$ is a symmetric Dirichlet distribution over the vocabulary (of size $V$) and $F$ is the topic-word multinomial distribution, that is,

$$h(\phi_k) = \frac{\Gamma(V\beta)}{\Gamma(\beta)^V} \prod_{w=1}^{V} \phi_{kw}^{\beta-1},$$

$$f(x_{ji} = w \mid \phi_k) = \phi_{kw}.$$

Substituting these into equation (30), we obtain

$$f_k^{-x_{ji}}(x_{ji} = w) = \frac{n_{kw}^{-x_{ji}} + \beta}{n_{k\cdot}^{-x_{ji}} + V\beta},$$

where $n_{kw}^{-x_{ji}}$ is the number of tokens of word $w$ assigned to topic $k$ excluding $x_{ji}$, and $n_{k\cdot}^{-x_{ji}} = \sum_w n_{kw}^{-x_{ji}}$.
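As a concrete illustration, this predictive probability takes only a few lines to compute. A minimal sketch in Python (the array names such as `n_kw` are my assumptions, not from the original implementation):

```python
import numpy as np

# n_kw: (K, V) array of topic-word counts, with the current word x_ji
# already removed from its old topic's counts.
def word_topic_predictive(n_kw, w, beta):
    # f_k^{-x_ji}(x_ji = w) = (n_kw + beta) / (n_k. + V * beta), for all k at once
    V = n_kw.shape[1]
    return (n_kw[:, w] + beta) / (n_kw.sum(axis=1) + V * beta)
```

The returned vector gives the likelihood term for every existing topic, which is then combined with the table/topic counts when sampling.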

We also need $f_{k^{\mathrm{new}}}$ for the case where $t$ takes a new table. It is obtained as follows:

$$f_{k^{\mathrm{new}}}(x_{ji}) = \int f(x_{ji}\mid\phi)\, h(\phi)\, d\phi = \frac{1}{V}.$$

It is also necessary to write down $f_k(\mathbf{x}_{jt})$ for sampling $k$.

Let $n_{kw}$ be the term count of word $w$ with topic $k$, let $n_{kw}^{-x_{jt}}$ be the same count excluding the words at table $t$ (with $n_{k\cdot}^{-x_{jt}} = \sum_w n_{kw}^{-x_{jt}}$), and let $n_{jtw}$ be the count of word $w$ at table $t$ of document $j$ (with $n_{jt\cdot} = \sum_w n_{jtw}$). Then

$$f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) = \frac{\Gamma\left(n_{k\cdot}^{-x_{jt}} + V\beta\right)}{\Gamma\left(n_{k\cdot}^{-x_{jt}} + n_{jt\cdot} + V\beta\right)} \prod_w \frac{\Gamma\left(n_{kw}^{-x_{jt}} + n_{jtw} + \beta\right)}{\Gamma\left(n_{kw}^{-x_{jt}} + \beta\right)}.$$

When implementing this in Python, it is faster to keep the ratios of Gamma functions as they are rather than unfolding them into products. In either case it is necessary to work with their logarithms, or $f_k(\mathbf{x}_{jt})$ will overflow the floating-point range.
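The log-space computation can be sketched as follows, using `scipy.special.gammaln` (the log-Gamma function); variable names are my assumptions, not the post's actual code:

```python
import numpy as np
from scipy.special import gammaln  # log Gamma, to avoid overflow

# n_kw: (K, V) topic-word count matrix with the words at table t removed.
# n_jtw: length-V vector of word counts at table t of document j.
def log_f_k_x_jt(n_kw, n_jtw, beta):
    # log f_k(x_jt) for every existing topic k, computed entirely in log space
    V = n_kw.shape[1]
    n_k = n_kw.sum(axis=1)   # n_k. for each topic
    n_jt = n_jtw.sum()       # number of words at the table
    return (gammaln(n_k + V * beta) - gammaln(n_k + n_jt + V * beta)
            + (gammaln(n_kw + n_jtw + beta) - gammaln(n_kw + beta)).sum(axis=1))
```

To sample $k$, exponentiate after subtracting the maximum of the log values, so the normalization itself stays inside the float range.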

Finally,

$$f_{k^{\mathrm{new}}}(\mathbf{x}_{jt}) = \frac{\Gamma(V\beta)}{\Gamma(n_{jt\cdot} + V\beta)} \prod_w \frac{\Gamma(n_{jtw} + \beta)}{\Gamma(\beta)}.$$
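The new-topic term $f_{k^{\mathrm{new}}}(\mathbf{x}_{jt})$ is the same formula with all topic-word counts set to zero, so it can be computed in log space the same way. A sketch under the same assumed variable names:

```python
import numpy as np
from scipy.special import gammaln

# n_jtw: length-V vector of word counts at table t of document j.
def log_f_knew_x_jt(n_jtw, beta):
    # log f_{k^new}(x_jt): the general formula with every topic count zero
    V = len(n_jtw)
    n_jt = n_jtw.sum()
    return (gammaln(V * beta) - gammaln(n_jt + V * beta)
            + (gammaln(n_jtw + beta) - gammaln(beta)).sum())
```

As a sanity check, a table holding a single word gives $1/V$, matching $f_{k^{\mathrm{new}}}(x_{ji})$ above.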


Hi, thank you so much for your explanation here. I have a question about this process. I found that in Chong Wang's code, he calls `sample_tables(d_state, q, f)` for each document after sampling all the words in that document. I am curious why he does this. Do you have any idea?

Though I cannot say for certain, I think that is to speed up learning.

I’m trying to fill in the steps in your derivation of (30). Do you have any insight on the missing steps here: http://mathb.in/34749?key=f1b1b8e9c8ef6386abf89eb81f6a23347485e887. It is something to do with the conjugacy, right?

I think I got it: http://i.imgur.com/tKAe9Yr.png. Needed to see that there are two normalizing coefficients of a Dirichlet distribution hidden in there.