Evaluation of ldig (Twitter Language Detection) on the LIGA dataset

LIGA [Tromp+ 11] is a Twitter language detection method for 6 languages (German, English, Spanish, French, Italian and Dutch). It uses a graph of 3-grams to capture long-distance features and achieves 95-98% accuracy. The authors have published their dataset here, which has 9066 tweets,

Twitter replaces the string ‘\u2028’ with ‘\u2070’

I posted a tweet about Unicode's line break characters that included the string '\u2028', and it was replaced with '\u2070'! Since this happens not only to '\u2028' (LINE SEPARATOR) but also to '\u2029' (PARAGRAPH SEPARATOR), Twitter apparently intends to do something (awful? :P) with line breaks.
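To see which characters are involved, here is a small Python sketch that prints the Unicode names of the code points mentioned above and locates the two separators in a string. The helper `separator_positions` and the sample text are made up for illustration; nothing here reproduces what Twitter does internally.

```python
import unicodedata

# Code points mentioned in the post, with their official Unicode names.
for ch in ("\u2028", "\u2029", "\u2070"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def separator_positions(text):
    """Return (index, name) pairs for every U+2028 / U+2029 in text.

    Hypothetical helper, only meant for inspecting a tweet's text.
    """
    return [
        (i, unicodedata.name(c))
        for i, c in enumerate(text)
        if c in ("\u2028", "\u2029")
    ]


# Illustrative sample, not a real tweet.
sample = "first line\u2028second line"
print(separator_positions(sample))

# Python's str.splitlines() treats both separators as line boundaries.
print(sample.splitlines())
```

Running a check like this on the text before posting and on the text returned afterwards would show whether a separator survived or was swapped for another character.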

