  • Expanded URLs 1.0


    With the Expanded URLs enrichment, Gnip will automatically expand shortened URLs that are included in the original payload of the activity, and include the resulting URL as an additional piece of data within the payload.

    For premium products such as PowerTrack and the Search API, Gnip attempts to resolve each URL down to its final destination where possible. All expanded URLs are included in the gnip.urls portion of the payload, alongside the original URL. Additionally, PowerTrack products can use the url_contains: operator to match activities based on their expanded URLs, which would not be possible without such expansion.

    The gnip.urls.expanded_status field provides the HTTP status code for the final URL we were attempting to unwind. In normal cases, this will be 200. Values in the 400 series indicate a problem resolving the URL.

    Various status codes may be returned when attempting to unwind a URL. If we receive a redirect while unwinding, we keep following redirects until we either:

    • Hit a 200-series code (success)
    • Hit any other non-redirect code (failure)
    • Time out because the final URL could not be resolved in a reasonable amount of time (returns a 408 - timeout)
    • Hit an exception of some sort
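
    The stopping conditions above can be sketched as a small loop. This is an illustration only, not Gnip's implementation: the `unwind` function, the injected `head` callable, and the `max_hops` limit (standing in here for the "reasonable amount of time" budget) are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical signature: given a URL, return (status_code, Location header or None).
HeadFn = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def unwind(url: str, head: HeadFn, max_hops: int = 10) -> Tuple[str, int]:
    """Follow redirects starting at url and return (final_url, status)."""
    current = url
    for _ in range(max_hops):
        try:
            status, location = head(current)
        except TimeoutError:
            # Timed out while resolving: reported as 408.
            return current, 408
        except Exception:
            # Any other exception: simplified here to 400 (Bad Request).
            return current, 400
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            current = urljoin(current, location)
            continue
        # A 200-series code (success) or any other non-redirect code (failure).
        return current, status
    # Assumption: exhausting the hop budget is treated like a timeout.
    return current, 408
```

    Passing the request function in as an argument keeps the loop independent of any particular HTTP library, so it can be exercised with a simple dictionary lookup instead of live network calls.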

    If an exception is hit, we use the following mapping between reasons and status codes returned:

    Reason                          Status Code Returned
    SSL Exceptions                  403 (Forbidden)
    Unwinding not allowed by URL    405
    Socket Timeout                  408 (Timeout)
    Unknown Host Exception          404 (Not Found)
    Unsupported Operation           404 (Not Found)
    Connect Exception               404 (Not Found)
    Illegal Argument                400 (Bad Request)
    Everything else                 400 (Bad Request)
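
    As a rough illustration, the mapping above could be expressed as a lookup from exception type to status code. The Python exception classes chosen here are assumed analogues of the reasons in the table, not the exceptions Gnip's system actually raises, and `UnwindingNotAllowed` and `status_for_exception` are hypothetical names.

```python
import socket
import ssl


class UnwindingNotAllowed(Exception):
    """Hypothetical exception for 'Unwinding not allowed by URL'."""


# Checked in order; the first matching type wins. SSL errors come first
# because some of them also subclass ValueError.
_EXCEPTION_STATUS = [
    (ssl.SSLError, 403),          # SSL Exceptions
    (UnwindingNotAllowed, 405),   # Unwinding not allowed by URL
    (socket.timeout, 408),        # Socket Timeout
    (socket.gaierror, 404),       # Unknown Host Exception (assumed analogue)
    (NotImplementedError, 404),   # Unsupported Operation (assumed analogue)
    (ConnectionError, 404),       # Connect Exception (assumed analogue)
    (ValueError, 400),            # Illegal Argument (assumed analogue)
]


def status_for_exception(exc: BaseException) -> int:
    """Return the status code reported for an exception hit while unwinding."""
    for exc_type, status in _EXCEPTION_STATUS:
        if isinstance(exc, exc_type):
            return status
    return 400  # Everything else
```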

    Expanded URLs in premium products resemble the following:

    Sample Payload

    "gnip":
    {
        "urls":
        [
            {
                "url":"http://t.co/dg48ARkkZ7",
                "expanded_url":"http://blog.gnip.com/social-data-enterprise/",
                "expanded_status":200
            }
        ]
    }
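
    A consumer might read these fields roughly as follows. This is a minimal sketch: the sample fragment is wrapped in an outer object so that it parses as JSON, and the helper name `resolved_urls` is an assumption made for the example.

```python
import json

# The sample fragment above, wrapped in braces so it is a complete JSON object.
activity_json = """
{
  "gnip": {
    "urls": [
      {
        "url": "http://t.co/dg48ARkkZ7",
        "expanded_url": "http://blog.gnip.com/social-data-enterprise/",
        "expanded_status": 200
      }
    ]
  }
}
"""


def resolved_urls(activity: dict) -> list:
    """Return (original, expanded) pairs whose expansion reported status 200."""
    pairs = []
    for entry in activity.get("gnip", {}).get("urls", []):
        if entry.get("expanded_status") == 200 and "expanded_url" in entry:
            pairs.append((entry["url"], entry["expanded_url"]))
    return pairs


activity = json.loads(activity_json)
for original, expanded in resolved_urls(activity):
    print(original, "->", expanded)
```

    Filtering on expanded_status keeps only URLs that resolved successfully; entries with 400-series values can be handled separately if failed expansions matter to the application.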
    

    On streams served via the Data Collector, the URL expansion processing is performed after the activity has been retrieved from the data source, and only resolves the URL through one level of redirection. Expanding URLs on these streams does not allow the data source to match a greater number of activities than it would have otherwise, unlike what is available for premium products like PowerTrack.

    FAQ 

    To resolve a shortened link as described above, our system sends HTTP HEAD requests to the URL provided, and follows any redirects until it arrives at the final URL. This URL (NOT the content of the page itself) is then included in the data payloads we send to our customers.
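
    To make the distinction between a HEAD request and a full page fetch concrete, here is a minimal sketch of a single HEAD request using Python's standard library. It is not Gnip's implementation; the function name `head_once` and the five-second timeout are assumptions made for the example.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_once(url: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request; return (status, Location header or None).

    Only the response headers are read, never the page body.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

    A redirect shows up as a 300-series status with a Location header naming the next URL to try; repeating the request against that URL until a non-redirect status comes back yields the final URL.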