Author Archives: shuyo

Python implementation of Labeled LDA (Ramage+ EMNLP2009)

Labeled LDA (D. Ramage, D. Hall, R. Nallapati and C.D. Manning; EMNLP2009) is a supervised topic model derived from LDA (Blei+ 2003). Because LDA is unsupervised, its estimated topics often do not match human expectations, while Labeled LDA is to … Continue reading

Posted in Uncategorized | 6 Comments

HDP-LDA updates

Hierarchical Dirichlet Processes (Teh+ 2006) are a nonparametric Bayesian topic model that can handle an unbounded number of topics. In particular, HDP-LDA is interesting as an extension of LDA. (Teh+ 2006) introduced collapsed Gibbs sampling updates for a general framework of HDP, … Continue reading

Posted in LDA, Machine Learning, Nonparametric Bayesian | 4 Comments

[Kim+ ICML12] Dirichlet Process with Mixed Random Measures

We held a private reading meeting for ICML 2012. I chose and presented [Kim+ ICML12] “Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data.” These are my slides for it. DP-MRM [Kim+ ICML12] is a … Continue reading

Posted in LDA, Machine Learning, Nonparametric Bayesian | 6 Comments

Short Text Language Detection with Infinity-Gram

I gave a talk about language detection (language identification) for Twitter at NAIST (Nara Institute of Science and Technology). These are the slides. Tweets are too short for their languages to be detected precisely. I guess that one reason is that features extracted from a … Continue reading

Posted in Language Detection, NLP | 24 Comments

[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems

In April 2012, we held a private reading meeting for NIPS 2011. I read “Iterative Learning for Reliable Crowdsourcing Systems” [Karger+ NIPS11]. This paper targets Amazon … Continue reading

Posted in NLP | Leave a comment

[Freedman+ EMNLP11] Extreme Extraction – Machine Reading in a Week

In December 2011, we held a private reading meeting for EMNLP 2011. I read “Extreme Extraction – Machine Reading in a Week” [Freedman+ EMNLP11]. This paper … Continue reading

Posted in NLP | Leave a comment

Why is Norwegian and Danish identification difficult?

Here again is the evaluation table for ldig (Twitter language detection).

lang  size   detected  correct  precision  recall
cs    5329   5330      5319     0.9979     0.9981
da    5478   5483      5311     0.9686     0.9695
de    10065  10076     10014    0.9938     0.9949
en    9701   9670      9569     0.9896     0.9864
… Continue reading
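The precision and recall columns can be recomputed from the three count columns. The sketch below assumes that "size" is the number of tweets truly in that language, "detected" is the number of tweets the detector assigned to it, and "correct" is the overlap of the two; these readings are my assumption, not something the excerpt states. Under that reading, the four rows shown here reproduce the listed values to four decimal places. The names `ROWS`, `precision_recall` and `scores` are made up for illustration and are not part of ldig.

```python
# Rows copied from the excerpt above: (lang, size, detected, correct).
ROWS = [
    ("cs", 5329, 5330, 5319),
    ("da", 5478, 5483, 5311),
    ("de", 10065, 10076, 10014),
    ("en", 9701, 9670, 9569),
]


def precision_recall(size, detected, correct):
    """Assumed definitions: precision = correct / detected, recall = correct / size."""
    return correct / detected, correct / size


# Map each language code to its (precision, recall) pair.
scores = {
    lang: precision_recall(size, detected, correct)
    for lang, size, detected, correct in ROWS
}

for lang, (precision, recall) in scores.items():
    print(f"{lang}  precision={precision:.4f}  recall={recall:.4f}")
```

Among these four rows, Danish has clearly lower precision and recall than the others, which is the gap the post title asks about.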

Posted in Language Detection | 8 Comments