Firehose streams provide realtime streaming access to unfiltered data. The content delivered in a Firehose stream is predefined and is not based on rules or keywords defined by the customer. Gnip customers can choose among several variations of the Firehose stream, some of which are available only for specific sources.
Full firehose streams provide 100% of the publisher’s realtime firehose to your app, with no additional limitations. Full firehose streams are available for the following sources. Note that in some cases, different types of data are split into separate Firehose streams (e.g. Tumblr Posts Firehose vs. Tumblr Likes Firehose).
Decahose provides a 10% random sample of the realtime Twitter Firehose through a streaming connection. A realtime sampling algorithm randomly selects the data while preserving the expected low-latency delivery as Twitter sends it through the firehose.
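To make the idea of realtime random sampling concrete, here is a minimal sketch that keeps each item with a fixed probability as it arrives, so nothing has to be buffered and latency stays low. This is only an illustration of per-item sampling; the actual algorithm Gnip uses is not described here, and the function name `sample_stream` and its parameters are hypothetical.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_stream(
    activities: Iterable[T],
    rate: float = 0.10,
    rng: Optional[random.Random] = None,
) -> Iterator[T]:
    """Yield each activity independently with probability `rate`.

    Hypothetical helper: every item is decided on the spot, so the
    sampled stream preserves order and adds no buffering delay.
    """
    rng = rng or random.Random()
    for activity in activities:
        # random() returns a float in [0.0, 1.0), so rate=0.10 keeps
        # roughly one item in ten over a long enough stream.
        if rng.random() < rate:
            yield activity
```

Because each decision is independent, the sampled fraction only approaches 10% over a large volume; any short window may contain noticeably more or fewer items.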
User Mention Stream provides a realtime stream of every Tweet in the Twitter firehose that contains a “mention” of a Twitter user, such as @replies and retweets. Data is delivered in bulk, and the stream does not support additional filtering (e.g. by keyword).
Compliance Stream provides a realtime stream of every Twitter compliance event in the firehose, including Tweet Deletes, User Deletes and Undeletes, and Scrub Geo activities. This data is used for keeping stored Twitter data in compliance with Twitter’s terms of service.
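As a rough sketch of how compliance events might be applied to stored Twitter data, the example below updates a small in-memory store for each of the event kinds listed above. The event shape (`type`, `tweet_id`, `user_id`, `up_to_tweet_id`), the store layout, and the function names are all assumptions made for illustration; they are not the Compliance Stream's actual payload format, and the handling of user deletes (hiding rather than erasing) is one possible policy, not a stated requirement.

```python
from typing import Any, Dict


def new_store() -> Dict[str, Any]:
    """Hypothetical local store: tweets keyed by integer ID, plus deleted users."""
    return {"tweets": {}, "deleted_users": set()}


def apply_compliance_event(store: Dict[str, Any], event: Dict[str, Any]) -> None:
    """Apply one compliance event (assumed shape) to the local store."""
    kind = event["type"]
    if kind == "tweet_delete":
        # Remove the Tweet if we have it; ignore it otherwise.
        store["tweets"].pop(event["tweet_id"], None)
    elif kind == "user_delete":
        # Hide the user's content without erasing it, so an undelete can restore it.
        store["deleted_users"].add(event["user_id"])
    elif kind == "user_undelete":
        store["deleted_users"].discard(event["user_id"])
    elif kind == "scrub_geo":
        # Assumption: strip location from the user's Tweets up to a given Tweet ID.
        for tweet_id, tweet in store["tweets"].items():
            if tweet["user_id"] == event["user_id"] and tweet_id <= event["up_to_tweet_id"]:
                tweet["geo"] = None
    else:
        raise ValueError(f"unknown compliance event type: {kind!r}")


def visible_tweets(store: Dict[str, Any]) -> Dict[int, Dict[str, Any]]:
    """Return stored Tweets whose author is not currently marked deleted."""
    return {
        tweet_id: tweet
        for tweet_id, tweet in store["tweets"].items()
        if tweet["user_id"] not in store["deleted_users"]
    }
```

Keeping user deletes reversible is a design choice in this sketch: because the stream also carries undeletes, marking a user as hidden avoids having to re-fetch content that was erased too early.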
Firehose streams deliver realtime data only while you are connected to the stream. They do not provide access to recent data that was sent before you connected. The data is delivered in Gnip’s Activity Streams format in all cases, and in some cases, the source’s native data format may also be available. See the specific source pages linked above for details on the respective activity types and formats available.
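The following sketch shows one way a client might turn the lines of an open streaming connection into parsed activities. It assumes that activities arrive as newline-delimited JSON and that blank lines act as keep-alive signals; neither detail is specified on this page, and the function name `iter_activities` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_activities(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Parse an iterable of raw stream lines into activity dictionaries.

    Assumes one JSON document per line; blank lines (assumed to be
    keep-alives) are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            # Nothing but whitespace: treat as a keep-alive and move on.
            continue
        yield json.loads(line)
```

In practice `lines` would be the line iterator of a streaming HTTP response. Since the stream delivers data only while connected, a client built this way sees nothing that was sent before the connection opened.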