This page is designed to help you migrate from version 1.0 to 2.0 of our 30-Day Search API. Below you’ll find a summary of the changes, a list of new features, and the version differences to help with the transition.
- New Endpoint URLs
- Operator Deprecation
- Select operators that are supported in 30-Day Search 1.0 are no longer supported in 2.0
- Enrichments and Associated Filtering
- Language Classification (“lang:” operator)
- Utilize Twitter language classification as opposed to the legacy Gnip language classification
- Migrate use of bio_lang operator to lang operator
- Enhanced URL Expansion (url title and meta description)
- Migrate use of “url_contains:” operator to “url:” operator
- Publisher parameter is no longer required for either counts or data endpoint
Below is a comprehensive list of the new operators and features available in 2.0.
|Feature / Operator:||Details:|
|2048 character query support||Search now supports 2048 character queries without positive or negative clause limits|
|has:images operator||Query for only tweets with native images|
|has:videos operator||Query for only tweets with native videos|
|has:symbols operator||Query for only tweets that contain symbols (also known as cashtags)|
|$ operator (e.g. $AAPL)||Query for tweets with a given symbol / cashtag|
|Elimination of the “high velocity ‘next’ token issue”||In Gnip’s 30-Day Search API version 1.0, there's an issue where the pagination can get stuck in an infinite loop if the volume of activities in a given bucket exceeds the maxResults parameter. Most commonly, this occurs when more than 500 tweets occurred in a minute. This issue does not exist in Search 2.0.|
|Enhanced URL Expansion Enrichment||Tweets returned by the Search API now include the enhanced URL Expansion enrichment including the URL’s Title and Meta Description|
|totalResults field||Counts queries will now include an aggregate total count for the response|
|Request Parameters included in Response||For ease of use, we now include the request parameters in the response body|
|Expanded language tokenization||Support for tokenization of additional languages besides Latin and Japanese, including Korean, Chinese, Arabic and more!|
|Enrichments and Matching rules in Original Format||Original format streams can include any Gnip enrichment, including the “matching rules” element|
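The pagination behaviour described in the table can be illustrated with a small client-side loop. This is a minimal sketch, not an official client: `fetch_page` is a hypothetical callable that you supply to perform the HTTP request, and the response field names `results` and `next` are assumptions about the payload shape. The repeated-token guard is only a defensive check of the kind a 1.0 client might have needed.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_results(fetch_page: Callable[[Optional[str]], Dict[str, Any]]) -> Iterator[Any]:
    """Yield every activity across pages by following the 'next' token.

    fetch_page(token) is a hypothetical helper: it should request one page
    (token is None for the first page) and return the decoded JSON body.
    """
    token: Optional[str] = None
    seen = set()
    while True:
        page = fetch_page(token)
        # Assumed field name: 'results' holds the activities of this page.
        yield from page.get("results", [])
        # Assumed field name: 'next' is absent on the last page.
        token = page.get("next")
        if token is None:
            return
        if token in seen:
            # Defensive guard against a pagination loop.
            raise RuntimeError("pagination returned a repeated 'next' token")
        seen.add(token)
```

Passing the request function in as an argument keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned pages.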
New Endpoint URLs
- Data: https://gnip-api.twitter.com/search/30day/accounts/:accountName/:label.json
- Counts: https://gnip-api.twitter.com/search/30day/accounts/:accountName/:label/counts.json
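As a rough illustration, the two URL templates can be filled in programmatically. The sketch below assumes the counts path is `:label/counts.json`; the function name `search_urls` and the example account and label values are hypothetical.

```python
from urllib.parse import quote

BASE = "https://gnip-api.twitter.com/search/30day/accounts"


def search_urls(account_name: str, label: str) -> dict:
    """Build the 2.0 data and counts URLs for an account name and label."""
    # Percent-encode each path segment so unusual characters cannot
    # change the structure of the URL.
    account = quote(account_name, safe="")
    lbl = quote(label, safe="")
    return {
        "data": f"{BASE}/{account}/{lbl}.json",
        "counts": f"{BASE}/{account}/{lbl}/counts.json",
    }


# Hypothetical account name and label, for illustration only.
urls = search_urls("my-account", "prod")
print(urls["data"])
print(urls["counts"])
```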
The table below contains operators that have been deprecated in version 2.0 and the suggested replacement operator (if available).
|Current Operator||Replacement Operator (if exists)||Details and Rationale|
|url_contains:||url:||url_contains: was originally built as a substring match on the urls of a tweet. We are implementing a “url” operator that performs a tokenized match|
|twitter_lang:||lang:||As in PowerTrack 2.0, Gnip’s language classification is being deprecated and replaced with Twitter’s language classification. With this change, the more commonly used “lang” operator now applies to the Twitter language classification, making the twitter_lang operator redundant.|
|has:lang||-lang:und||As in PowerTrack 2.0, the Twitter language classification field is present for all tweets, and unclassified tweets have a value of “und” (undefined). A query for only tweets with a language classification is therefore -lang:und (i.e. NOT language=undefined).|
|bio_lang:||lang:||Migrate use of bio_lang operator to lang operator. The "bio_lang" operator had very low adoption and can be replaced by the "lang" operator, which now uses Twitter's language classification.|
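The replacements in the table above can be applied to existing 1.0 queries mechanically. The following is a minimal sketch under stated assumptions, not an official migration tool: `migrate_query` is a hypothetical helper, and it rewrites operator names textually, including inside quoted phrases. The replacements are also not exact equivalents (url: is a tokenized match rather than a substring match, and lang: classifies the tweet rather than the bio), so migrated queries should still be reviewed by hand.

```python
import re

# Deprecated 1.0 operator -> suggested 2.0 replacement (from the table above).
_RENAMED = {"url_contains": "url", "twitter_lang": "lang", "bio_lang": "lang"}
_RENAMED_RE = re.compile(r"\b(url_contains|twitter_lang|bio_lang):")
# Capture an optional leading '-' so that a negated has:lang is handled too.
_HAS_LANG_RE = re.compile(r"(-?)\bhas:lang\b")


def migrate_query(query: str) -> str:
    """Rewrite deprecated 1.0 operators in a query to their 2.0 replacements."""
    query = _RENAMED_RE.sub(lambda m: _RENAMED[m.group(1)] + ":", query)
    # has:lang -> -lang:und, and the negation -has:lang -> lang:und
    return _HAS_LANG_RE.sub(
        lambda m: "lang:und" if m.group(1) else "-lang:und", query
    )


print(migrate_query("gnip url_contains:twitter twitter_lang:en has:lang"))
```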
The following 1.0 operators are not currently supported in Search 2.0.
- url_contains: (being replaced by url:)
Unlike version 1.0 of Gnip’s 30-Day Search API and other historical products (Historical PowerTrack, Replay), some of the data within a Tweet returned by Search 2.0 is mutable, i.e. it can be updated or changed after initial archival.
Mutable data falls into two categories:
- Metadata around a user/actor object.
- user’s @handle
- bio description
- followers count, friends count, listed count, statuses count
- Tweet statistics (i.e. anything that can be changed on the platform by user action), for example:
- retweet count
- favorites count
In most of these cases, the Search API will return data as it exists on the platform at query time, rather than at Tweet generation time. However, in the case of queries using select operators (e.g. from, to, @, is:verified), this may not be the case. Data is updated in our index on a regular basis, with increased frequency for the most recent timeframes. As a result, in some cases the data returned may not exactly match the data that was queried for, but instead matches the data as it was when last indexed.
Note: this issue of inconsistency only applies to queries where the operator applies to mutable data (e.g. user and user-bio related operators and counts-based operators). We will not be supporting many such operators initially, but will offer more in the future. The most problematic example is indeed filtering for usernames, and the best workaround would be to use userIDs rather than @handles for queries including the from, to or @ operators.
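To find queries affected by this, one option is to scan them for user operators whose value is not numeric. This is a rough sketch: `username_based_terms` is a hypothetical helper, it also checks the retweets_of: operator, and it assumes that an all-digit value is a user ID (a handle made up only of digits would be misclassified).

```python
import re
from typing import List

# Match from:, to:, retweets_of: or @ when not preceded by a word character,
# followed by the handle or user ID.
_USER_OP_RE = re.compile(r"(?<!\w)(from:|to:|retweets_of:|@)(\w+)")


def username_based_terms(query: str) -> List[str]:
    """Return the user operators in a query that use a handle, not a user ID."""
    return [
        op + value
        for op, value in _USER_OP_RE.findall(query)
        if not value.isdigit()  # assumption: all-digit values are user IDs
    ]


print(username_based_terms("from:twitterdev OR to:2244994945 @gnip"))
```

Any term this returns is a candidate for rewriting with the account's numeric user ID.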
The table below contains version differences that we plan to address in the near future. As updates occur, we will update the Change Log on the Search API 2.0 home page.
|Truncated Retweets||In the Retweet payload, there is a form of the tweet body that gets "RT @username" added to the beginning. Despite the addition of this text, the tweet body text is truncated to 140 characters, causing the last several characters of a full 140-character Tweet to be cut off. Therefore, some queries won't match against the text that was cut off. This most commonly manifests itself with hashtags and URLs.||Plans to resolve; Timing TBD|
|Accented and special characters (Diacritics)||Accented and special characters are normalized||TBD|
|"quoted phrase" / punctuation matching||Punctuation is not tokenized and is instead treated as whitespace.||TBD|
|from:, to:, retweets_of:, @ operators||Usernames can change on the platform. We recommend using from:userid, to:userid, retweets_of:userid and @userid where possible.||n/a|
Below are sample payload responses from the data and counts endpoints (including the new “totalResults” field).