Identifying and Understanding Retweets & Quote Tweets

posted on 07 February 2014 by stephen compston


Gnip customers often want to know the specifics around identifying and integrating Retweets into their products, but can run into a few common roadblocks. If you’re looking for the best way to incorporate Retweets into your product, this article will provide everything you need to know about identifying them, and best practices for extracting the information you need from them.

What is a Retweet?

A Retweet is an action taken by a Twitter user to share another user’s Tweet without alteration, using Twitter’s explicit Retweet functionality.

A Retweet retains information about the user who posted the original Tweet, as well as the user who Retweeted them.
Retweets are an important part of Twitter’s platform – they permit content to be shared rapidly and with attribution, and are the most easily measured form of content engagement on the platform. Many social analytics tools use the number of Retweets a particular Tweet receives in calculating its impact or reach (i.e. its importance). However, to do so, your app must be able to accurately identify Retweets.

How to identify a Retweet

The proper method for identifying a Retweet depends on the data format you are using in your app, as the fields differ between Twitter’s native format, and Gnip’s normalized Activity Streams format.

In the Activity Streams format from Gnip, your app should look at the root-level ‘verb’ field. If this field’s value is ‘share’, then the activity is a Retweet. If it is a ‘post’, then it is an original Tweet, or one of the variations.

For example, here is an excerpt from the root-level of a Retweet:

{
  "id": "tag:search.twitter.com,2005:299935329132105728",
  "objectType": "activity",
  "actor": {...},
  "verb": "share",
  ...
}

And here is an original Tweet:

{
  "id": "tag:search.twitter.com,2005:403224522679009280",
  "objectType": "activity",
  "actor": {...},
  "verb": "post",
  ...
}

Note that the method above requires you to look at the root-level (outermost) ‘verb’ field in the payload. In the case of Retweets, there will also be a verb field within the object – this relates to the original Tweet (the one being shared), and is not relevant for this determination.

In Twitter’s native data format, Retweets can be identified by the presence of data in the ‘retweeted_status’ field.
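As a minimal sketch of the two checks described above, assuming each payload has already been parsed from JSON into a Python dict (the function names are illustrative and not part of any Gnip or Twitter library):

```python
def is_retweet_activity_streams(activity: dict) -> bool:
    """Return True if a Gnip Activity Streams payload is a Retweet."""
    # Only the root-level verb matters here; object.verb describes
    # the original Tweet and is ignored for this determination.
    return activity.get("verb") == "share"


def is_retweet_native(tweet: dict) -> bool:
    """Return True if a payload in Twitter's native format is a Retweet."""
    # A Retweet carries data in the 'retweeted_status' field;
    # an absent, null, or empty field is treated as "not a Retweet".
    return bool(tweet.get("retweeted_status"))
```

Both helpers treat a missing field as "not a Retweet", which is a design choice for this sketch rather than documented behavior.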

Integrating Retweets

Once your app is able to properly identify Retweets, it needs to know which fields to use to retrieve the various pieces of data it needs. This can be confusing due to the multi-layered structure of a Retweet object.

Each Retweet contains two layers: 1) an outer layer, which holds data related to the Retweet action itself and the user who performed the Retweet, and 2) an inner layer, which holds data about the original Tweet, including data about the user who posted it. The outer layer exists at the root level of the JSON Tweet object, while the inner layer is contained within the root-level “object” field. Put another way, the Retweet is a Tweet object that contains another whole Tweet object within the “object” field.

In the example below, this is represented by some excerpted fields, with Tweet ID 299935329132105728 being the “Retweet”, and 299935121384034304 being the original (the one being shared).

{
  "id": "tag:search.twitter.com,2005:299935329132105728",
  "actor": {...},
  "verb": "share",
  "body": "RT @DJ1NDY: #NowPlaying ...",
  "object": {
    "id": "tag:search.twitter.com,2005:299935121384034304",
    "actor": {...},
    "verb": "post",
    "body": "#NowPlaying ...",
    ...
  },
  ...
}
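To work with the two Tweet IDs in a payload like this one, the numeric part has to be separated from the "tag:" prefix. The sketch below assumes every id follows the "tag:search.twitter.com,2005:<digits>" pattern seen in the excerpts; the helper names are hypothetical.

```python
def numeric_tweet_id(activity_id: str) -> str:
    """Return the part of an Activity Streams id after the last colon."""
    # Assumption: ids look like "tag:search.twitter.com,2005:<digits>".
    return activity_id.rsplit(":", 1)[-1]


def retweet_and_original_ids(retweet: dict) -> tuple:
    """Return (Retweet ID, original Tweet ID) for a Retweet activity."""
    # The outer layer identifies the Retweet; the inner "object"
    # layer identifies the Tweet being shared.
    return (
        numeric_tweet_id(retweet["id"]),
        numeric_tweet_id(retweet["object"]["id"]),
    )
```

The IDs are kept as strings rather than converted to integers, which avoids precision problems if they are later passed to systems that handle large numbers poorly.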

These layers contain essentially the same attributes – a field holding the text of the Tweet, an object to represent the user, a field for the time the Tweet was posted, etc. In Retweets, some of these are obviously different – e.g. the user who posted the original Tweet (represented in object.actor) is different than the user who Retweeted them (represented in the root-level actor).

In other cases, the content is similar. The best example of this is the ‘body’ or ‘text’ of the Tweet since Retweets are just sharing the unmodified text of the original. However, these fields are not identical due to Twitter’s handling of Retweets.

Specifically, the object.body field contains the original unmodified text of the Tweet which is being shared. In contrast, the root-level body field contains a version of this text which has been slightly modified by Twitter. For this, Twitter takes the original text and prepends ‘RT @username:’ to it, using the username of the user who posted the original Tweet.

Additionally, in cases where the additional characters added by Twitter cause the length of the Tweet to exceed 140 characters, Twitter then truncates the text of the Tweet to make it fit, adding an ellipsis at the end. For example, the original text in the Tweet being shared in the Tweet above was:

 #NowPlaying on PanjabRadio, @TwinBeatsUK & @Saini_Surinder_  - 'Lok Boliyan' - Sky Channel 0130/DAB/iPhone App/panjabradio.co.uk #bhangra

However, in the Retweet, Twitter modifies the root-level “body” to look like the following.

 RT @DJ1NDY: #NowPlaying on PanjabRadio, @TwinBeatsUK & @Saini_Surinder_  - 'Lok Boliyan' - Sky Channel 0130/DAB/iPhone App/panjabrad ...
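One way to detect this truncation in code is to check whether the root-level body still ends with the full original text. This is a rough sketch that assumes an untruncated root-level body is exactly the ‘RT @username:’ prefix followed by object.body; the function name is illustrative.

```python
def root_body_is_truncated(retweet: dict) -> bool:
    """Guess whether the root-level body of a Retweet lost part of the text."""
    root_body = retweet.get("body", "")
    original_body = retweet.get("object", {}).get("body", "")
    # Assumption: when nothing was cut, the root-level body is the
    # "RT @username: " prefix followed by the complete original text.
    return not root_body.endswith(original_body)
```

The check relies only on the two body fields, so it does not need to know the username or the exact truncation rule Twitter applies.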

Prior to November 2013, this body truncation affected usernames, URLs, hashtags, and stock symbols extracted by Twitter into the twitter_entities fields, truncating them and preventing matching. Historical data from before this date will also include truncated entities. Today, entities are extracted into the payload before the Tweet body is truncated, which will allow for matching on the included entities even if they’re not visible in the RT body.

However, the correct original values are still included – you just need to know where to look. To properly get data from a Retweet, your app should use the following:

  • for Tweet text: object.body
  • for Twitter entities: object.twitter_entities
  • for Retweet Counts: root-level retweetCount
  • for Favorite Counts: object.favoritesCount
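The field choices listed above can be gathered into a single helper. This is a minimal sketch, assuming the Retweet is an Activity Streams payload already parsed into a dict; the function name and the keys of the returned dict are illustrative.

```python
def extract_retweet_fields(retweet: dict) -> dict:
    """Pull the recommended fields out of an Activity Streams Retweet."""
    original = retweet.get("object", {})
    return {
        # Untruncated text and entities come from the inner (original) Tweet.
        "text": original.get("body"),
        "entities": original.get("twitter_entities"),
        # The Retweet count is read from the root level of the Retweet.
        "retweet_count": retweet.get("retweetCount"),
        # The Favorite count is read from the inner (original) Tweet.
        "favorite_count": original.get("favoritesCount"),
    }
```

Missing fields come back as None rather than raising an error, which keeps the helper usable on partial payloads.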

Retweet Counts and Favorite Counts

Twitter provides a retweetCount field in Tweets and Retweets. In Tweets (not Retweets), this count is always (or almost always) zero because Twitter sends them through the firehose at or near the time of creation, when there hasn’t been enough time for someone to Retweet them yet.

For Retweets, this represents the current retweet count for the Tweet being shared, at the time of that given Retweet. Note that this is the Retweet count provided by Twitter. Gnip doesn’t have insight into how this number is calculated, and it may be the case that it includes things like Tweets from protected accounts in the calculation, which is not possible with the public firehose provided to Gnip. The public firehose used by Gnip is limited to Tweets and Retweets from public Twitter accounts.

Favorite Counts also appear as described above, with the same qualifiers as Retweets. However, note that there is no favorite ‘activity’ sent through the firehose by Twitter, so you will not receive an update every time a user favorites a Tweet. In fact, you will only be able to get an updated favorite count for a Tweet each time that it is Retweeted. An alternative would be to update numbers separately from Gnip by sending requests to Twitter’s public REST API directly.
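Because updated counts only arrive when a Tweet is Retweeted, an app can keep the most recent values it has seen for each original Tweet. The sketch below assumes activities are processed in arrival order and stores counts in a plain dict keyed by the original Tweet's id; the names are hypothetical.

```python
def update_counts(latest: dict, activity: dict) -> None:
    """Record the newest counts for the original Tweet behind a Retweet."""
    if activity.get("verb") != "share":
        return  # Only Retweets carry meaningful, updated counts.
    original = activity.get("object", {})
    original_id = original.get("id")
    if original_id is None:
        return
    # Later Retweets overwrite earlier values for the same original Tweet.
    latest[original_id] = {
        "retweets": activity.get("retweetCount"),
        "favorites": original.get("favoritesCount"),
    }
```

Calling `update_counts(latest, activity)` for each incoming activity leaves `latest` holding the last counts observed per original Tweet; these will lag behind the true numbers between Retweets.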

Quote Tweets

A Quote Tweet lets a user quote another Tweet and add a comment of their own, which is sent to their followers. Quote Tweets are convenient for users because they can share a Tweet and still add their own comments with a full 140 characters.

When a Tweet is quoted, the quote Tweet includes the full payload of the quoted status (original Tweet) within the quote Tweet payload. This looks like the following for the example Tweet above (this example is in Activity Streams):

{
    "id": "tag:search.twitter.com,2005:600699303225466880",
    "objectType": "activity",
    "actor": {
        "objectType": "person",
        "id": "id:twitter.com:63046977",
        "link": "http://www.twitter.com/happycamper",
        .
        .
    },
    "verb": "post",
    .
    .
    "twitter_quoted_status": {
        "id": "tag:search.twitter.com,2005:600436537998897153",
        "objectType": "activity",
        "actor": {
            "objectType": "person",
            "id": "id:twitter.com:373471064",
            "link": "http://www.twitter.com/TwitterMusic",
            "displayName": "Twitter Music",
            .
            .
          }
          .
          .
    }
    .
    .
}
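Based on the payload structure above, a quote Tweet in Activity Streams can be recognized by the presence of the twitter_quoted_status field on an activity. The following sketch combines that with the Retweet check; the function name and return labels are illustrative, and checking for a Retweet first is an assumption made so that a shared Tweet is always reported as a Retweet.

```python
def classify_activity(activity: dict) -> str:
    """Label an Activity Streams payload as 'retweet', 'quote', or 'post'."""
    # Assumption: Retweets are checked first, so a Retweet is never
    # reported as a quote even if quoted content appears in the payload.
    if activity.get("verb") == "share":
        return "retweet"
    # A quote Tweet embeds the quoted Tweet under this root-level field.
    if "twitter_quoted_status" in activity:
        return "quote"
    return "post"
```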

In terms of filtering, for quote Tweets, content from both the original quoted Tweet and the new “comment” Tweet can be matched. This filtering change is focused on operators that match on the Tweet body, including keywords, contains:, phrases, proximity, @mentions, #hashtags, $cashtags, url_contains:, url:, has:links, has:mentions, has:media, has:hashtags, and has:symbols.

Quote Tweet filtering and the fully hydrated Quoted Tweet payload are available for PowerTrack 2.0, Replay 2.0 and Historical PowerTrack 2.0.

Note: Please visit the Sample Payloads section of the site to view a Quote Tweet in both Original and Activity Streams format.

For additional information on quoted Tweets, please see Twitter’s description of them as well.

Additional Considerations

Note, however, that the Retweet and Favorite counts for these will not be instructive – they only relate to this ‘new’ Tweet, and will almost always be zero.

For any other questions related to identifying and integrating Retweets or Quote Tweets, contact the Gnip support team.


Tags: twitter, sources