Hierarchical Dirichlet Processes (Teh+ 2006) are a nonparametric Bayesian model that can handle an unbounded number of topics.
In particular, HDP-LDA is interesting as an extension of LDA.

(Teh+ 2006) introduced Collapsed Gibbs sampling updates for the general HDP framework, but not for HDP-LDA specifically.
To obtain the updates for HDP-LDA, it is necessary to substitute HDP-LDA's base measure H and emission F(φ) into the following equation (eq. 30 in [Teh+ 2006]):

$$f_k^{-x_{ji}}(x_{ji}) = \frac{\int f(x_{ji}|\phi_k) \prod_{j'i' \neq ji,\, z_{j'i'}=k} f(x_{j'i'}|\phi_k)\, h(\phi_k)\, d\phi_k}{\int \prod_{j'i' \neq ji,\, z_{j'i'}=k} f(x_{j'i'}|\phi_k)\, h(\phi_k)\, d\phi_k},$$

where h is the probability density function of H and f is that of F.
In the case of HDP-LDA, H is a symmetric Dirichlet distribution over the vocabulary and F is a topic-word multinomial distribution, that is,

$$h(\phi) = \frac{\Gamma(V\beta)}{\Gamma(\beta)^V} \prod_{w=1}^{V} \phi_w^{\beta-1}, \qquad f(x_{ji}=w \mid \phi) = \phi_w,$$

where V is the vocabulary size and β is the parameter of the Dirichlet base measure.
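With this conjugate Dirichlet–multinomial pair, the integrals in eq. (30) have closed forms: the collapsed predictive probability of word w under topic k is the posterior mean $(n_{kw}+\beta)/(n_{k\cdot}+V\beta)$, where $n_{kw}$ is the count of word w assigned to topic k. As a quick sanity check (a sketch, not part of the original post; the counts and variable names are illustrative assumptions), this can be verified by Monte Carlo:

```python
import numpy as np

# Illustrative assumption: vocabulary of V = 4 words, with word counts
# n_kw already assigned to topic k, and symmetric Dirichlet parameter beta.
rng = np.random.default_rng(0)
beta = 0.5
n_kw = np.array([3.0, 1.0, 0.0, 2.0])
V = len(n_kw)

# By conjugacy, the posterior over phi_k is Dirichlet(n_kw + beta);
# the predictive probability of word w is the posterior mean of phi_w.
samples = rng.dirichlet(n_kw + beta, size=200_000)
mc_estimate = samples[:, 0].mean()

closed_form = (n_kw[0] + beta) / (n_kw.sum() + V * beta)
print(mc_estimate, closed_form)  # both close to 0.4375
```

The two values agree up to Monte Carlo error, which confirms that integrating φ out of eq. (30) leaves a simple count ratio.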

Substituting these into equation (30), we obtain

$$f_k^{-x_{ji}}(x_{ji}=w) = \frac{n_{kw}^{-ji} + \beta}{n_{k\cdot}^{-ji} + V\beta},$$

where $n_{kw}^{-ji}$ is the count of word w assigned to topic k excluding $x_{ji}$, and $n_{k\cdot}^{-ji} = \sum_w n_{kw}^{-ji}$.

We also need $f_{k^{\mathrm{new}}}$ when t takes a new table. It is obtained as follows:

$$f_{k^{\mathrm{new}}}(x_{ji}) = \int f(x_{ji}|\phi)\, h(\phi)\, d\phi = \frac{1}{V}.$$

It is also necessary to write down $f_k(\mathbf{x}_{jt})$ for sampling k. With $n_{jtw}$ the count of word w at table t of document j, and $n_{kw}^{-jt}$ the term count of word w with topic k (excluding $\mathbf{x}_{jt}$),

$$f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) = \frac{\Gamma(n_{k\cdot}^{-jt} + V\beta)}{\Gamma(n_{k\cdot}^{-jt} + n_{jt\cdot} + V\beta)} \prod_w \frac{\Gamma(n_{kw}^{-jt} + n_{jtw} + \beta)}{\Gamma(n_{kw}^{-jt} + \beta)}.$$

When implementing this in Python, it is faster to keep the Gamma functions than to unfold them into products. In either case the computation must be done with logarithms, or $f_k(\mathbf{x}_{jt})$ will overflow the floating-point range.
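The log-space evaluation can be sketched as follows. This is a minimal illustration, not the post's actual implementation; the function and array names are assumptions. It evaluates $\log f_k(\mathbf{x}_{jt})$ with `scipy.special.gammaln`, which stays finite where the raw Gamma products would overflow:

```python
import numpy as np
from scipy.special import gammaln

def log_f_k_x_jt(n_kw, n_jtw, beta):
    """log f_k^{-x_jt}(x_jt) for a single topic k (illustrative sketch).

    n_kw  : length-V array, term counts of each word with topic k,
            excluding the words at table t (i.e. excluding x_jt)
    n_jtw : length-V array, term counts of each word at table t of doc j
    beta  : symmetric Dirichlet parameter of the base measure H
    """
    V = len(n_kw)
    n_k = n_kw.sum()
    n_jt = n_jtw.sum()
    # log of: Gamma(n_k + V*beta) / Gamma(n_k + n_jt + V*beta)
    #         * prod_w Gamma(n_kw + n_jtw + beta) / Gamma(n_kw + beta)
    return (gammaln(n_k + V * beta) - gammaln(n_k + n_jt + V * beta)
            + np.sum(gammaln(n_kw + n_jtw + beta) - gammaln(n_kw + beta)))

# Consistency check: when the table holds exactly one occurrence of one
# word w, the table likelihood reduces to the word-level update
# (n_kw + beta) / (n_k + V*beta).
beta = 0.5
n_kw = np.array([3.0, 1.0, 0.0, 2.0])
n_jtw = np.array([0.0, 1.0, 0.0, 0.0])
single = log_f_k_x_jt(n_kw, n_jtw, beta)
word_level = np.log((n_kw[1] + beta) / (n_kw.sum() + len(n_kw) * beta))
```

Because everything stays in log space, even tables or topics with tens of thousands of words yield a finite value, whereas exponentiating the Gamma ratios directly would overflow a double.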

## 9 thoughts on “HDP-LDA updates”

1. ming says:

Hi, thank you so much for your explanation here. I have a question about this process. I found that in Chong Wang’s code, he calls “sample_tables(d_state, q, f)” for each document after he has sampled all the words in that doc. I am curious why he did this. Do you have any idea?

1. shuyo says:

Though I cannot say anything for certain, that is to shorten the learning time, isn’t it?

1. Tim Hopper says:

I think I got it: http://i.imgur.com/tKAe9Yr.png. Needed to see that there are two normalizing coefficients of a Dirichlet distribution hidden in there.

2. ljessons says:

Dear Shuyo:
Thank you very much for deriving the formulas and implementing HDP-LDA in Python. I’m trying to learn HDP-LDA these days and have learned a lot from your code, but I am still confused about your code in the implementation of the word distribution, the document distribution, and the perplexity. Could you write another article to derive these formulas? It would help me a lot to understand HDP.
Actually, I have been learning HDP-LDA for half a month, and still don’t know how to draw the graphical model of HDP-LDA that contains the indicator variables z and t. I don’t know where to ask for help; nobody around me knows about graphical models, nor HDP. Any help would be very appreciated.
I hope you can send an email at your earliest convenience; this is my Gmail: ljessons93@gmail.com.
Best wishes to you, ljessons.

1. ljessons says:

Finally, by forcing myself to read Teh’s paper over and over again, I know how to draw the graphical model of HDP-LDA. In fact, I knew how to draw the graphical model of the hierarchical DP mixture model; it is just that I didn’t know why Teh first samples t and next samples k, and that made me not confident enough to draw the graphical model of HDP-LDA (containing the indicator variables of table t and dish k). The reason why Teh first samples t and then samples k is the collapsed Gibbs sampling rule, nothing else.
Anyway, Shuyo, thank you very much for deriving the above formulations. It’s very good work.

3. gqgq says:

Thank you so much for your kind sharing. I am confused about the code to get the “topic distribution for document” when calculating the perplexity. Could you share with me the mathematical equation for how to get that? I think there are two Dirichlet distributions for (t&k) to multiple a Multinomial distribution for Nm. I have some problems in solving this step.
Thanks so much!

1. gqgq says:

Sorry for the typing error. It should be “for (t&k) to multiply a Multinomial distribution”.

4. YinYou says:

Many thanks for your excellent detailed derivation!