    Language Detection 1.0


    In premium products, Gnip analyzes each activity individually and attempts to classify the language of its text. The following languages are currently supported: Arabic (ar), Chinese (zh), Danish (da), Dutch (nl), English (en), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (he), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Norwegian (no), Persian (fa), Polish (pl), Portuguese (pt), Russian (ru), Spanish (es), Swedish (sv), Thai (th), Turkish (tr), and Ukrainian (uk).
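For client code that needs to turn a classification value into a readable name, the list above can be kept as a small lookup table. The following is a minimal sketch in Python, not part of any Gnip library: the names `SUPPORTED_LANGUAGES` and `language_name` are hypothetical, and the codes are written in lowercase on the assumption that payload values are lowercase, as in the sample payload.

```python
# Codes and names copied from the supported-language list in this document.
# The table and the helper below are illustrative, not a Gnip API.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "zh": "Chinese", "da": "Danish", "nl": "Dutch",
    "en": "English", "fi": "Finnish", "fr": "French", "de": "German",
    "el": "Greek", "he": "Hebrew", "id": "Indonesian", "it": "Italian",
    "ja": "Japanese", "ko": "Korean", "no": "Norwegian", "fa": "Persian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "es": "Spanish",
    "sv": "Swedish", "th": "Thai", "tr": "Turkish", "uk": "Ukrainian",
}


def language_name(code):
    """Hypothetical helper: return the language name for a code, or None.

    None is returned for unknown codes and for non-string input, so callers
    can treat "unsupported" and "unclassified" the same way if they wish.
    """
    if not isinstance(code, str):
        return None
    # Normalize case and whitespace before the lookup.
    return SUPPORTED_LANGUAGES.get(code.strip().lower())
```

A call such as `language_name("en")` returns `"English"`, while an unknown code or a missing value returns `None`.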

    Limitations

    • The Language Detection enrichment attempts to determine the language of a Tweet from the limited text available. Results may be inaccurate, for example when the text is very short or when a single activity mixes multiple languages.
    • If an activity is too short or contains insufficient content to make a definitive decision, we will not make a language classification.
    • Gnip offers a PowerTrack operator to filter on this field. Note that an activity that is not classified as any language will not match any rule containing a language filter. You may, however, use this operator as a negation to exclude activities with the Gnip language classification values you specify.
    • Language classification is not available on the Data Collector or on data sources that have no text to classify (e.g., Foursquare).

    Sample Payload

    "gnip":
    {
        "language":
        {
            "value": "en"
        }
    }
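
Because some activities receive no classification, consumer code should not assume the language field is always present. The sketch below, in Python, reads the value from a parsed activity and returns `None` when it is missing. It rests on assumptions not stated in this document: that an unclassified activity simply lacks `gnip.language` (rather than carrying a null or empty value, which the helper also tolerates), and that the activity has a `body` field, used here only to make the sample data look realistic. The function name `detected_language` is hypothetical.

```python
import json


def detected_language(activity):
    """Return the value of gnip.language.value, or None if it is absent.

    Assumption: an unclassified activity lacks the "language" object.
    A null or empty value is also treated as "no classification".
    """
    gnip = activity.get("gnip")
    if not isinstance(gnip, dict):
        return None
    language = gnip.get("language")
    if not isinstance(language, dict):
        return None
    value = language.get("value")
    return value if isinstance(value, str) and value else None


# Illustrative activities; only the "gnip" structure comes from the sample payload.
classified = json.loads(
    '{"body": "hello world", "gnip": {"language": {"value": "en"}}}'
)
unclassified = json.loads('{"body": "ok", "gnip": {}}')

print(detected_language(classified))    # en
print(detected_language(unclassified))  # None
```

Returning `None` instead of raising keeps unclassified activities in the processing pipeline, so the caller can decide whether to drop them or count them separately.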