When using Gnip’s Activity Streams format on a filtered stream (i.e. PowerTrack), Gnip adds metadata to indicate which rule or rules matched the specific results.
Matching is performed as an exact match on the terms contained in a rule, scanning the content of the activity both with and without punctuation. Matching is not case sensitive. When the content contains all terms defined in the rule, Gnip inserts a gnip:matching_rule element indicating that the rule matched the activity. Where a given result matches more than one rule, all matching rules are included in the gnip:matching_rules element.
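The matching behavior described above can be illustrated with a short Python sketch. This is not Gnip's implementation: the function names are hypothetical, and splitting rules and content on whitespace is an assumption made only to keep the example small. It shows the three properties stated in the text: comparison is case insensitive, content is checked both with and without punctuation, and a rule matches only when all of its terms are found.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def rule_matches(rule, content):
    """Return True if every term in `rule` appears in `content`.

    Assumption: terms and content are tokenized on whitespace.
    """
    terms = rule.lower().split()
    lowered = content.lower()
    # Token sets for the content as written and with punctuation removed.
    with_punct = set(lowered.split())
    without_punct = set(lowered.translate(_PUNCT_TABLE).split())
    return all(t in with_punct or t in without_punct for t in terms)


def find_matching_rules(rules, content):
    """Return every rule that matches, preserving the input order."""
    return [rule for rule in rules if rule_matches(rule, content)]


if __name__ == "__main__":
    rules = ["quake", "dc", "nyc quake", "tokyo"]
    print(find_matching_rules(rules, "Quake! felt in DC, and NYC."))
```

Checking both token sets means a term such as "#quake" can still match content that contains the punctuation, while a plain term such as "dc" matches "DC," once the comma is removed.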
- If you are using ‘standard’ rules (with a maximum length of 1,024 characters), both the value and the tag of each matching rule are included.
- If you are using ‘long’ rules (with a maximum length of 2,048 characters), only the rule tag is included. For this reason, tagging long rules is critical: the tag is the only metadata returned in the gnip:matching_rules payload.
See the next section for examples.
Activities delivered through PowerTrack (realtime, Replay, and Historical) will contain the gnip:matching_rules object in JSON, similar to the following examples:
With standard rules:
With long rules:
In PowerTrack, the matching_rules object reflects all rules that matched the given result. In other words, if more than one rule matches a specific activity, the activity will only be delivered once, but the matching_rules element will contain all the rules that matched.
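As a rough illustration of consuming this metadata, the Python sketch below reads the matching rules from a parsed activity. The field layout is an assumption, not something confirmed by this page: it supposes a top-level "gnip" object holding a "matching_rules" list whose entries carry a "tag" and, for standard rules, a "value". The sample payloads and helper names are hypothetical.

```python
import json


def matched_rules(activity):
    """Return the matching-rule entries of an activity, or [] if absent.

    Assumed layout: activity["gnip"]["matching_rules"] is a list of objects.
    """
    return activity.get("gnip", {}).get("matching_rules", [])


def rule_tags(activity):
    """Tags of all matched rules (None where a rule has no tag)."""
    return [rule.get("tag") for rule in matched_rules(activity)]


def rule_values(activity):
    """Rule values, available only when the payload includes them."""
    return [rule["value"] for rule in matched_rules(activity) if "value" in rule]


# Hypothetical activity matched by two standard rules: value and tag present.
STANDARD = json.loads("""
{"id": "example-1",
 "gnip": {"matching_rules": [
     {"value": "quake", "tag": "earthquakes"},
     {"value": "nyc", "tag": "cities"}]}}
""")

# Hypothetical activity matched by one long rule: only the tag is present.
LONG = json.loads("""
{"id": "example-2",
 "gnip": {"matching_rules": [{"tag": "earthquakes"}]}}
""")

if __name__ == "__main__":
    print(rule_tags(STANDARD), rule_values(STANDARD))
    print(rule_tags(LONG), rule_values(LONG))
```

Because an activity is delivered once no matter how many rules it matched, a consumer that routes by rule would iterate over every entry rather than read only the first.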
Data streams served via the Gnip Data Collector in Activity Streams format will contain a gnip:matching_rules object in XML, similar to the following example:
<gnip:matching_rules>
  <gnip:matching_rule rel="source">quake</gnip:matching_rule>
  <gnip:matching_rule rel="inferred">dc</gnip:matching_rule>
  <gnip:matching_rule rel="inferred">nyc</gnip:matching_rule>
</gnip:matching_rules>
For the rel="source" case, the rule is the actual rule that Gnip was using to poll the service when the activity was first found. The rel="inferred" elements represent rules that also match the activity but were not the rule being polled when the activity was first seen. There is at most one "source" rule, but there may be many "inferred" rules.
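The source/inferred distinction can be read out of the XML with Python's standard library, as in the sketch below. The sample above does not show a declaration for the gnip prefix, so the namespace URI here is a placeholder assumption; substitute the URI declared in your actual feed. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): use the one declared in your feed.
GNIP_NS = "http://example.com/gnip-placeholder"

FRAGMENT = f"""<gnip:matching_rules xmlns:gnip="{GNIP_NS}">
  <gnip:matching_rule rel="source">quake</gnip:matching_rule>
  <gnip:matching_rule rel="inferred">dc</gnip:matching_rule>
  <gnip:matching_rule rel="inferred">nyc</gnip:matching_rule>
</gnip:matching_rules>"""


def split_rules(xml_text, ns=GNIP_NS):
    """Return (source_rule, inferred_rules) from a matching_rules element."""
    root = ET.fromstring(xml_text)
    source = None
    inferred = []
    # ElementTree addresses namespaced tags as "{uri}localname".
    for element in root.findall(f"{{{ns}}}matching_rule"):
        rule = (element.text or "").strip()
        if element.get("rel") == "source":
            source = rule  # at most one source rule is expected
        elif element.get("rel") == "inferred":
            inferred.append(rule)
    return source, inferred


if __name__ == "__main__":
    print(split_rules(FRAGMENT))
```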
The content used in the search is the value of the <content> element within the <activity:object> element. Activities retrieved with rules that match on data other than the <content> element (e.g. Twitter’s Stream - Follow endpoint, which matches on the user ID of the author) will not contain matching_rule data.
Note that Gnip performs this processing after the activity has been retrieved from the data source, solely to associate a rule with an activity that was found. It therefore does not affect the actual search for the activity. Rules must still be entered in the format supported by the data source for the purposes of the search.