Capturing Media on Twitter
posted on 15 April 2014 by Leah Barren
Gnip’s PowerTrack filtering language gives you the ability to filter the Twitter Firehose, and the firehoses of a number of other sources, for data that is relevant to you and your brand. PowerTrack’s operators allow you to specify what is delivered to you in near real time. Increasingly, users include brand-related media in their social media activity.
The has:media and has:links operators are two PowerTrack options that can be useful for tracking activities that contain links to media. However, they differ significantly in how they function and in what they return. The has:media operator has a far narrower scope than has:links: it only looks for Tweets with content in the twitter_entities.media field, which, at the time of writing, only ever includes pic.twitter.com links for images uploaded directly through Twitter. This could change if Twitter begins including more types of content in the “media” entity, but because photos are currently the only media that Twitter lets users upload directly, the field contains no references to other types or sources of media.
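To make the distinction concrete, here is a minimal Python sketch of what has:media checks. It is not Gnip’s implementation: it assumes each activity has been parsed into a dict with a twitter_entities object whose media list mirrors the field described above, and the sample activities (and the urls/expanded_url field names inside them) are hypothetical.

```python
def has_media(activity: dict) -> bool:
    """Approximate has:media: True if twitter_entities.media is non-empty.

    Illustration only; assumes an activity parsed from Gnip's JSON into a dict.
    """
    entities = activity.get("twitter_entities") or {}
    return bool(entities.get("media"))


# Hypothetical, trimmed-down activities for illustration only.
photo_tweet = {
    "body": "New swag! pic.twitter.com/abc123",
    "twitter_entities": {
        "urls": [],
        "media": [{"type": "photo", "display_url": "pic.twitter.com/abc123"}],
    },
}
flickr_tweet = {
    "body": "My photo: http://flickr.com/photos/example/1",
    "twitter_entities": {
        "urls": [{"expanded_url": "http://flickr.com/photos/example/1"}],
        "media": [],
    },
}

print(has_media(photo_tweet))   # True: photo uploaded through Twitter
print(has_media(flickr_tweet))  # False: a Flickr link is not in the media entity
```

The Flickr Tweet is the case has:media cannot see: the photo exists, but it only shows up as an ordinary link.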
The has:links operator, on the other hand, returns any activity that has a link in the Tweet body, regardless of what it links to. This includes media uploaded to Twitter, because a pic.twitter.com URL is generated whenever a user uploads a photo, but it is certainly not limited to photos. Used by itself, has:links simply returns every activity that includes a URL, which can mean a large volume of poorly targeted data if you only care about Tweets with images or videos. For that reason, has:links should only be used in combination with keywords or other operators that more specifically target the content you want.
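A similar sketch for has:links, under the same assumptions (hypothetical activities, and assumed field names twitter_entities.urls and expanded_url), shows why the operator is so broad on its own and how pairing it with a keyword narrows it. The keyword check here is a case-insensitive substring test, which is cruder than PowerTrack’s real keyword matching.

```python
def has_links(activity: dict) -> bool:
    """Approximate has:links: True if the Tweet carries any URL at all.

    Counts ordinary links (twitter_entities.urls) and uploaded photos
    (twitter_entities.media), since an upload also yields a pic.twitter.com URL.
    Field names are assumed; illustration only.
    """
    entities = activity.get("twitter_entities") or {}
    return bool(entities.get("urls")) or bool(entities.get("media"))


def matches_keyword_and_links(activity: dict, keyword: str) -> bool:
    """Roughly the rule `<keyword> has:links` (terms side by side are ANDed)."""
    # Simplified keyword test: case-insensitive substring of the Tweet body.
    body = activity.get("body", "").lower()
    return keyword.lower() in body and has_links(activity)


news = {
    "body": "Read this: http://example.com/story",
    "twitter_entities": {"urls": [{"expanded_url": "http://example.com/story"}], "media": []},
}
gnip_photo = {
    "body": "Gnip swag pic.twitter.com/abc123",
    "twitter_entities": {"urls": [], "media": [{"type": "photo"}]},
}

print(has_links(news))                                # True: any URL qualifies
print(matches_keyword_and_links(news, "gnip"))        # False
print(matches_keyword_and_links(gnip_photo, "gnip"))  # True
```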
But what if you and your brand are interested in knowing every time a customer Tweets a photo about your company or product, regardless of whether it was uploaded directly to Twitter or to another popular social platform? For example, what if a Twitter user uploaded a photo to Flickr and then shared the link on Twitter? A rule using only has:media would miss this Tweet, while a rule using has:links would deliver it but would also flood you with large volumes of irrelevant content. This is where the url_contains: operator helps.
The url_contains: operator is the most useful way to filter for media that is not covered by has:media. It matches on URL substrings, and its value can be enclosed in quotes so that the top-level domain can be included in the query. For example, you could filter on:
url_contains:"flickr.com"
This particular rule would return activities that contain a link from flickr.com. On the other hand, if you’re simply interested in any time your product or company appears in a URL in a Tweet, you could use:
url_contains:gnip
This use of url_contains: would return any activity where “Gnip” appears anywhere in a URL, whether the link points to gnip.com or to a page on another site whose address happens to contain “gnip”.
Going back to the scenario presented above, suppose you want Tweets that mention your company or product (in the text or in a link) and also include a photo, whether it was uploaded directly to Twitter or shared from Flickr. You could use the following rule:
(gnip OR url_contains:gnip) (url_contains:"flickr.com" OR has:media)
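The following Python sketch spells out how that rule combines its terms: groups written side by side are ANDed, and OR joins the alternatives inside each group. It is an illustration only, not Gnip’s matcher. It assumes url_contains: is tested case-insensitively against each link’s expanded URL, approximates keyword matching with a whole-word regex, and uses hypothetical field names and sample Tweets.

```python
import re


def mentions(activity: dict, keyword: str) -> bool:
    # Whole-word, case-insensitive match on the Tweet body (a simplification
    # of PowerTrack keyword matching).
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, activity.get("body", ""), re.IGNORECASE) is not None


def url_contains(activity: dict, substring: str) -> bool:
    # Assumed behaviour: substring test against each link's expanded URL.
    entities = activity.get("twitter_entities") or {}
    for link in entities.get("urls") or []:
        url = link.get("expanded_url") or link.get("url") or ""
        if substring.lower() in url.lower():
            return True
    return False


def has_media(activity: dict) -> bool:
    # True if the (assumed) twitter_entities.media list is non-empty.
    entities = activity.get("twitter_entities") or {}
    return bool(entities.get("media"))


def matches_rule(activity: dict) -> bool:
    # (gnip OR url_contains:gnip) (url_contains:"flickr.com" OR has:media)
    brand = mentions(activity, "gnip") or url_contains(activity, "gnip")
    photo = url_contains(activity, "flickr.com") or has_media(activity)
    return brand and photo  # side-by-side groups are ANDed


flickr_share = {
    "body": "Loving my new Gnip mug http://flickr.com/photos/example/1",
    "twitter_entities": {
        "urls": [{"expanded_url": "http://flickr.com/photos/example/1"}],
        "media": [],
    },
}
plain_link = {
    "body": "Gnip blog post http://example.com/post",
    "twitter_entities": {"urls": [{"expanded_url": "http://example.com/post"}], "media": []},
}
unbranded_photo = {
    "body": "Sunset pic.twitter.com/xyz789",
    "twitter_entities": {"urls": [], "media": [{"type": "photo"}]},
}

for name, tweet in [("flickr_share", flickr_share),
                    ("plain_link", plain_link),
                    ("unbranded_photo", unbranded_photo)]:
    print(name, matches_rule(tweet))
# flickr_share True
# plain_link False
# unbranded_photo False
```

The plain link fails the second group, and the unbranded photo fails the first, which is exactly the narrowing that has:links alone could not provide.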
You could then add further url_contains terms to the second group for any other image-hosting services you want to capture. The same approach works for video-hosting services: identify the structure of the links that service produces and incorporate it into an additional url_contains term.
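If you track many hosting services, generating the rule text can keep it consistent. The helper below is hypothetical and not part of any Gnip tool. The host list is only an example, and you would still need to confirm what link structure each service actually uses (YouTube, for instance, also shares links on the short youtu.be domain).

```python
def build_rule(brand: str, hosts: list[str]) -> str:
    """Build a rule shaped like the one above, with one url_contains per host.

    Hypothetical helper; quoting each host lets the dot and TLD be included.
    """
    brand_group = f"({brand} OR url_contains:{brand})"
    host_terms = [f'url_contains:"{host}"' for host in hosts]
    media_group = "(" + " OR ".join(host_terms + ["has:media"]) + ")"
    return f"{brand_group} {media_group}"


print(build_rule("gnip", ["flickr.com"]))
# (gnip OR url_contains:gnip) (url_contains:"flickr.com" OR has:media)

print(build_rule("gnip", ["flickr.com", "youtube.com", "youtu.be"]))
# (gnip OR url_contains:gnip) (url_contains:"flickr.com" OR url_contains:"youtube.com" OR url_contains:"youtu.be" OR has:media)
```

With a single host, the helper reproduces the rule shown earlier, so extending coverage is just a matter of growing the list.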