Expanded URLs 2.0


With the Expanded URLs enrichment, Gnip automatically expands shortened URLs included in the original payload of the activity and adds the resulting URL to the payload as an additional piece of data. Gnip also extracts key HTML data (page title and description) from the resulting URL and includes it in the payload.

Expanded URL data will be included in Gnip’s PowerTrack, Replay, Volume Stream, Search, and Historical PowerTrack APIs.


Expanded URL Data


Original Format field name | Activity Streams field name | Example value | Description
entities.urls.url | gnip.urls.url | https://t.co/b9ZdzRxzFK | The shortened t.co URL that wraps the link when the Tweet is created.
entities.urls.unwound.url | gnip.urls.expanded_url | http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276 | The expanded URL, i.e. the final destination that our URL resolution process was able to reach.
entities.urls.unwound.status | gnip.urls.expanded_status | 200 | The HTTP status code returned by the final destination that our URL resolution process was able to reach.
entities.urls.unwound.title | gnip.urls.expanded_url_title | The joke's on you, kid: 11 family-friendly April Fools pranks | The HTML title of the final destination that our URL resolution process was able to reach.
entities.urls.unwound.description | gnip.urls.expanded_url_description | If your kids are practical jokers, turn this April Fools' Day into a family affair. | The HTML description of the final destination that our URL resolution process was able to reach.

The Historical PowerTrack, Search, and PowerTrack APIs support filtering on Expanded URL data. See the documentation for each product for details on which operators are available for filtering on Expanded URL data.


HTTP Status Codes

The Expanded URLs enrichment also provides the HTTP status code for the final URL we attempt to unwind. Normally this is 200; 400-series values indicate problems resolving the URL.

Various status codes may be returned when attempting to unwind a URL. While unwinding a URL, we follow every redirect we receive until we either:

  • Hit a 200-series code (success)
  • Hit a non-redirect code (failure)
  • Time out because the final URL could not be resolved in a reasonable amount of time (returns 408, Timeout)
  • Hit an exception of some sort
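The stopping conditions above can be sketched as a small loop. This is a minimal illustration, not Gnip's implementation: the `head` callable (one HEAD request, no automatic redirect following, returning the status and an absolute Location value), the `REDIRECT_CODES` set, and the 10-second `time_budget` are all assumptions, since the documentation only says "a reasonable amount of time". Exceptions raised by `head` are not handled here.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical callable: sends one HEAD request to a URL without following
# redirects and returns (status_code, absolute Location header or None).
HeadFn = Callable[[str], Tuple[int, Optional[str]]]

# Assumed set of status codes treated as redirects.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def unwind(url: str, head: HeadFn, time_budget: float = 10.0,
           clock: Callable[[], float] = time.monotonic) -> Tuple[str, int]:
    """Follow redirects starting at `url` and return (final_url, status)."""
    deadline = clock() + time_budget
    current = url
    while True:
        if clock() > deadline:
            # Final URL not reached in a reasonable amount of time.
            return current, 408
        status, location = head(current)
        if status in REDIRECT_CODES and location:
            current = location  # keep following the redirect chain
            continue
        # A 2xx code (success) or any non-redirect code (failure) stops the loop.
        return current, status
```

Injecting `head` and `clock` keeps the loop testable without network access; a redirect cycle simply runs until the time budget is exhausted and yields 408.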

If an exception is hit, we use the following mapping between reasons and status codes returned:

Reason | Status code returned
SSL exceptions | 403 (Forbidden)
Unwinding not allowed by URL | 405 (Method Not Allowed)
Socket timeout | 408 (Timeout)
Unknown host exception | 404 (Not Found)
Unsupported operation | 404 (Not Found)
Connect exception | 404 (Not Found)
Illegal argument | 400 (Bad Request)
Everything else | 400 (Bad Request)
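The reasons in the table are named after Java exceptions. As a rough illustration only, the sketch below maps assumed Python analogues (for example `socket.gaierror` for an unknown host) to the documented codes; the class choices are ours, not Gnip's. The "unwinding not allowed by URL" (405) case is not an exception and is not modeled.

```python
import socket
import ssl

# Assumed Python analogues of the documented reasons, checked in order.
# ssl.SSLError comes first because some SSL errors also subclass ValueError.
_EXCEPTION_STATUS = [
    (ssl.SSLError, 403),           # SSL exceptions
    (socket.timeout, 408),         # Socket timeout
    (socket.gaierror, 404),        # Unknown host exception
    (NotImplementedError, 404),    # Unsupported operation
    (ConnectionError, 404),        # Connect exception
    (ValueError, 400),             # Illegal argument
]


def status_for_exception(exc: BaseException) -> int:
    """Return the status code to report for an exception hit while unwinding."""
    for exc_type, status in _EXCEPTION_STATUS:
        if isinstance(exc, exc_type):
            return status
    return 400  # Everything else
```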

Sample Payload

In Original Format, expanded URL data will be included in the entities.urls.unwound section of the payload.

{
    "entities": {
        "urls": [
            {
                "url": "https://t.co/b9ZdzRxzFK",
                "expanded_url": "http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276",
                "display_url": "today.com/parents/joke-s…",
                "unwound": {
                    "url": "http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276",
                    "status": 200,
                    "title": "The joke's on you, kid: 11 family-friendly April Fools pranks",
                    "description": "If your kids are practical jokers, turn this April Fools' Day into a family affair."
                },
                "indices": [
                    43,
                    66
                ]
            }
        ]
    }
}
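A consumer reading Original Format payloads might pull out only the links that resolved successfully. The sketch below is a hypothetical helper (`resolved_links` is our name), assuming the activity has already been parsed into a dict (for example with `json.loads`) and assuming a link may arrive without an `unwound` object.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolved_links(activity: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Collect (unwound URL, page title) for links whose final status was 200."""
    results = []
    for entry in activity.get("entities", {}).get("urls", []):
        unwound = entry.get("unwound")
        if not unwound:
            # Assumption: a link may lack unwound data; skip it.
            continue
        if unwound.get("status") == 200:
            results.append((unwound["url"], unwound.get("title")))
    return results
```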

In Activity Streams Format, expanded URL data will be included in the gnip.urls section of the payload.

{
    "gnip": {
        "urls": [
            {
                "url": "https://t.co/b9ZdzRxzFK",
                "expanded_url": "http://www.today.com/parents/joke-s-you-kid-11-family-friendly-april-fools-pranks-t83276",
                "expanded_status": 200,
                "expanded_url_title": "The joke's on you, kid: 11 family-friendly April Fools pranks",
                "expanded_url_description": "If your kids are practical jokers, turn this April Fools' Day into a family affair."
            }
        ]
    }
}
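The two samples carry the same data under different names. Following the field mapping in the Expanded URL Data table, a hypothetical converter (not an official Gnip utility) can turn one `entities.urls` entry into the `gnip.urls` shape. Note that, per the table, `gnip.urls.expanded_url` comes from `unwound.url`, not from the Original Format's top-level `expanded_url`.

```python
from typing import Any, Dict

# Original Format "unwound" keys -> Activity Streams "gnip.urls" keys,
# taken from the Expanded URL Data table.
UNWOUND_TO_GNIP = {
    "url": "expanded_url",
    "status": "expanded_status",
    "title": "expanded_url_title",
    "description": "expanded_url_description",
}


def to_gnip_url(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one entities.urls entry into a gnip.urls entry."""
    out: Dict[str, Any] = {"url": entry["url"]}
    unwound = entry.get("unwound", {})
    for src, dst in UNWOUND_TO_GNIP.items():
        if src in unwound:  # copy only the fields that are present
            out[dst] = unwound[src]
    return out
```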

FAQ 

To resolve a shortened link as described above, our system sends HTTP HEAD requests to the URL provided, and follows any redirects until it arrives at the final URL. This URL (NOT the content of the page itself) is then included in the data payloads we send to our customers.

For requests made to the Full Archive Search API, we currently only support expanded URL data for Tweets 13 months old or newer.