PowerTrack lets customers filter a data source’s full firehose and receive only the data that they or their customers are interested in. This is accomplished by applying Gnip’s PowerTrack filtering language to match Tweets based on a wide variety of attributes, including user attributes, geo-location, language, and many others. Using PowerTrack rules to filter a data source ensures that customers receive all of the data they need for their app, and only that data.
Notably, PowerTrack filtering is available in realtime streams (as described here), as well as Replay, Historical PowerTrack, and the Search API. However, this overview focuses solely on the realtime PowerTrack product.
Rules & Filtering
As described, customers add filtering rules to the PowerTrack stream to determine which activities will be sent through the connection. The PowerTrack stream can support thousands of these individual rules, and deliver the combined set of matching activities through the single stream connection.
The set of PowerTrack rules used to filter a customer’s stream is highly flexible. If a customer needs to add a new filtering rule to capture a different type of content, or remove an existing rule, their app can send a request to the PowerTrack API to make it happen. When that request is sent, the filtering rules are automatically modified and the changes take effect in the data stream with no need to reconnect. This allows a customer to serve many of their own customers at scale, while supporting distinct filtering requirements for each of them.
Data is delivered to the customer’s app through a constant stream as it is created. The realtime stream does not provide recent data – rather, it begins filtering for and delivering results based on the time a filtering rule is added to the stream. If, in addition to realtime data, your product also requires instant access to recent data, we recommend using our Search API.
Data is delivered in Gzip-compressed JSON format, using an implementation of the Activity Streams schema, although some sources may also be available in the native format provided by the data source. For details on the types of activities provided for a given source, as well as the data format, see our Sources documentation, noting that only complete-access sources are supported in PowerTrack.
When an activity is delivered through your PowerTrack stream, Gnip adds metadata in the “matching rules” portion of that activity to indicate which rule or rules caused that specific activity to be delivered. If multiple rules match a single activity, the activity is delivered a single time with each of the matching rules included in this metadata. The matching rules provide an easy way to associate a specific activity with specific rules and customers in your product, even where you have many customers with lots of distinct rules. Since the data is delivered through a single stream in this manner, scaling up as your product gains additional customers is simple.
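As a rough illustration of using the “matching rules” metadata, the sketch below routes one delivered activity to every customer whose rule matched it. The field layout (`gnip.matching_rules` holding objects with `value` and `tag`) and the `tag_to_customer` mapping are assumptions made for this example; check them against the payloads you actually receive.

```python
import json


def route_by_matching_rules(raw_line, tag_to_customer):
    """Return {customer: activity} for every customer whose rule matched."""
    activity = json.loads(raw_line)
    # Assumed layout: activity["gnip"]["matching_rules"] is a list of
    # {"value": ..., "tag": ...} objects. Verify against real payloads.
    matching = activity.get("gnip", {}).get("matching_rules", [])
    routed = {}
    for rule in matching:
        customer = tag_to_customer.get(rule.get("tag"))
        if customer is not None:
            routed[customer] = activity
    return routed


# Hypothetical activity matched by two rules, each owned by a different customer.
sample = json.dumps({
    "id": "activity-1",
    "gnip": {"matching_rules": [
        {"value": "coffee", "tag": "rule-001"},
        {"value": "tea", "tag": "rule-002"},
    ]},
})
owners = {"rule-001": "customer-a", "rule-002": "customer-b"}
print(sorted(route_by_matching_rules(sample, owners)))
```

Because the activity arrives once with all of its matching rules attached, the fan-out to individual customers happens in your own application rather than in the stream.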
PowerTrack filtering rules should each be created with a tag. Rule tags have no special meaning to the Gnip system; they are simply treated as opaque strings carried along with the actual rule value. They are included in the “matching rules” metadata in the returned activity payload. Tags provide an easy way to identify matching rules and manage rule sets. For example, you may generate a unique ID tag for each rule, and allow your app to reference that ID within activities it processes. These unique rule IDs can then be used by your application to associate a result with your business logic, such as specific customers, campaigns, categories, or other related groups.
Note that rules and their tags cannot be updated. To “update” either a rule or its tag, you must first delete the rule, then add it again with the desired value and tag. The recommended best practice is to use only unique IDs as your tags, then create business logic in your own application for these unique identifiers. This way, you can adjust what a rule means internally without needing to constantly delete and re-create rules with Gnip.
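A minimal sketch of this practice follows, assuming a rules request body shaped like `{"rules": [{"value": ..., "tag": ...}]}`. The `RuleRegistry` class and `rules_body` helper are hypothetical names for this example; they only build request bodies and do not call the PowerTrack API. See the API reference for the actual request format.

```python
import json
import uuid


def rules_body(rules):
    # Assumed request body shape: {"rules": [{"value": ..., "tag": ...}]}
    return json.dumps({"rules": rules})


class RuleRegistry:
    """Keeps business meaning locally, keyed by an opaque unique tag."""

    def __init__(self):
        self._by_tag = {}

    def new_rule(self, value, **business_info):
        tag = uuid.uuid4().hex  # opaque ID; carries no meaning by itself
        self._by_tag[tag] = {"value": value, **business_info}
        return {"value": value, "tag": tag}

    def relabel(self, tag, **business_info):
        # Changing business meaning is purely local; nothing is sent to Gnip.
        self._by_tag[tag].update(business_info)

    def replace_value(self, tag, new_value):
        """Changing the rule text requires a delete followed by an add.

        Returns (delete_body, add_body) for the two requests.
        """
        old = {"value": self._by_tag[tag]["value"], "tag": tag}
        self._by_tag[tag]["value"] = new_value
        new = {"value": new_value, "tag": tag}
        return rules_body([old]), rules_body([new])

    def lookup(self, tag):
        return self._by_tag[tag]
```

With this arrangement, moving a rule from one campaign to another is a call to `relabel` and involves no rule changes on the stream. Only a change to the rule text itself produces the delete and add bodies.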
Integrating with PowerTrack
To integrate PowerTrack into your product, you will need to build an integration that can do the following:
- Establish a streaming connection to the PowerTrack stream API
- Asynchronously send POST and DELETE requests to the PowerTrack rules API to add and delete rules from the stream
- Handle low data volumes – maintain the streaming connection, and ensure buffers are flushed regularly
- Handle high data volumes – decouple stream ingestion from additional processing using asynchronous processes
- Reconnect to the stream automatically when disconnected for any reason
For details on the types of requests needed for the first two tasks above, and important considerations in implementing them, see the API reference.
For information on consuming a realtime data stream, see here.
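The consumption tasks listed above (decoupling ingestion from processing, and reconnecting automatically) can be sketched as follows. This is an illustration under stated assumptions, not a reference client: `open_stream` is a hypothetical callable that opens the connection and yields raw lines, blank lines are assumed to be keep-alives, and the backoff values are arbitrary.

```python
import queue
import threading
import time


def consume_forever(open_stream, out_queue, stop, max_delay=60.0, sleep=time.sleep):
    """Read lines from the stream and reconnect with exponential backoff."""
    delay = 1.0
    while not stop.is_set():
        try:
            for line in open_stream():   # hypothetical: yields raw lines
                if stop.is_set():
                    return
                if line.strip():         # skip blank keep-alive lines (assumed)
                    out_queue.put(line)  # hand off; no parsing on this thread
                delay = 1.0              # connection is alive, so reset the backoff
        except OSError:
            pass                         # treat I/O errors as a disconnect
        if stop.is_set():
            return
        sleep(delay)                     # wait before reconnecting
        delay = min(delay * 2, max_delay)


def process(out_queue, handle, stop):
    """Worker loop: pull lines off the queue and do the slow work here."""
    while not (stop.is_set() and out_queue.empty()):
        try:
            line = out_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        handle(line)


def start(open_stream, handle):
    """Run ingestion and processing on separate threads; returns the stop event."""
    stop = threading.Event()
    q = queue.Queue()
    threading.Thread(target=consume_forever, args=(open_stream, q, stop), daemon=True).start()
    threading.Thread(target=process, args=(q, handle, stop), daemon=True).start()
    return stop
```

Keeping the reading thread free of parsing and storage work means that a burst of activities fills the queue instead of stalling the connection, and any disconnect leads to a reconnect attempt after a growing delay.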
In addition to the above, your app will likely want to take advantage of reliability features offered by Gnip, including:
- Redundant Streams - A second connection to provide a hot failover
- Backfill - Reconnect from the point you left off
- Replay Stream - Data recovery tool for the recent past