Replay is a data recovery tool that provides streaming access to a rolling window of recent Twitter historical data. Use it to recover data your app misses in the realtime stream, whether because of a brief disconnection or any other situation in which you fail to ingest realtime data for a period of time.
Replay streams come in several varieties, each complementing a different type of realtime stream. PowerTrack Replay streams let customers using realtime PowerTrack recover data they miss, using the same types of rules they use in realtime.
If your account is configured with a Replay stream, your app can make requests to it that operate in the same manner as requests to Gnip’s realtime streams. However, your app must specify parameters in the URL that indicate the time window you are requesting. In other words, a Replay request asks Gnip’s API for “Tweets from time A to time B.” These Tweets are then delivered through your streaming connection in a manner that mimics the realtime stream, but at a faster-than-realtime rate.
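As a minimal sketch of such a request, the Python helper below appends `fromDate` and `toDate` query parameters (the parameter names described later in this section) to a Replay URL. The base URL is a hypothetical placeholder, and the `YYYYMMDDhhmm` UTC timestamp format is an assumption to check against the API reference for your stream.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the Replay URL configured for your account.
REPLAY_BASE_URL = "https://stream.example.com/replay/powertrack/ACCOUNT/STREAM.json"


def gnip_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as YYYYMMDDhhmm in UTC (assumed format)."""
    if dt.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid local-time ambiguity")
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


def build_replay_url(base_url: str, start: datetime, end: datetime) -> str:
    """Return base_url with fromDate/toDate parameters for the window start..end."""
    params = {"fromDate": gnip_timestamp(start), "toDate": gnip_timestamp(end)}
    return f"{base_url}?{urlencode(params)}"
```

For example, a window from 00:00 to 06:00 UTC on 1 January 2013 yields `...?fromDate=201301010000&toDate=201301010600`.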
Tweets are delivered beginning with the first (oldest) minute of the specified time period and continue chronologically until the final minute is delivered. At that point, a “Replay Request Completed” message is sent through the connection, and Gnip then closes the connection. If your request begins at a time when few or no matching Tweets were posted, there will likely be a delay before the first results arrive: data is delivered only when Replay encounters matches in the portion of the archive it is processing at that moment. While no results are available, Gnip continues sending carriage-return “heartbeats” through the connection so that it does not time out.
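A client reading a Replay stream therefore has to skip heartbeats and stop at the completion message. The sketch below does this for any iterable of decoded lines. It assumes one JSON object per line and assumes the completion notice has the shape `{"info": {"message": "Replay Request Completed", ...}}`; verify the exact payload before relying on it.

```python
import json
from typing import Iterable, Iterator

# Assumed shape of the completion notice: {"info": {"message": "Replay Request Completed", ...}}
COMPLETION_MESSAGE = "Replay Request Completed"


def replay_activities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed activities from a Replay stream, stopping at the completion notice."""
    for line in lines:
        if not line.strip():
            # Carriage-return heartbeats carry no data; they only keep the connection alive.
            continue
        payload = json.loads(line)
        info = payload.get("info") if isinstance(payload, dict) else None
        if isinstance(info, dict) and info.get("message") == COMPLETION_MESSAGE:
            # Gnip closes the connection after this message, so stop reading.
            return
        yield payload
```

Matching on the parsed `info` object rather than on raw text avoids stopping early if a Tweet happens to contain the phrase “Replay Request Completed”.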
Replay is intended for quickly recovering data missed during short disconnects, not for very long periods such as entire days. If you need to recover data for a long period, we recommend breaking the request into shorter time windows (e.g., two hours). This reduces the chance of being disconnected mid-request because of network instability or other causes, and gives you more visibility into the progress of the overall recovery.
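The splitting recommended above can be sketched as a generator that yields consecutive sub-windows of at most two hours each, which can then be requested one after another. The helper name is hypothetical. Adjacent windows share a boundary timestamp; if `toDate` turns out to be inclusive for your stream, deduplicate by Tweet ID.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def split_window(start: datetime, end: datetime,
                 chunk: timedelta = timedelta(hours=2)) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (from, to) sub-windows of at most `chunk` covering start..end."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive duration")
    cursor = start
    while cursor < end:
        upper = min(cursor + chunk, end)  # the last window may be shorter
        yield cursor, upper
        cursor = upper
```

A five-hour gap starting at 00:00 UTC, for instance, becomes three requests: 00:00 to 02:00, 02:00 to 04:00, and 04:00 to 05:00.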
Data from Replay streams is available in a rolling 5-day window, and new data becomes available 30 minutes after a given activity is created. You specify a timeframe within this window using the fromDate and toDate parameters of the request. However, toDate cannot fall within the 30 minutes preceding the time of your request; in other words, it must be at least 30 minutes in the past.
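To catch invalid requests before opening a connection, a client can check a window against these limits locally. The sketch below encodes the 5-day retention and 30-minute availability lag stated above; the function name is hypothetical, and treating a window starting exactly 5 days ago or ending exactly 30 minutes ago as valid is an assumption about boundary handling.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=5)             # rolling window of available data
AVAILABILITY_LAG = timedelta(minutes=30)  # delay before new data can be replayed


def validate_window(from_date: datetime, to_date: datetime, now: datetime) -> None:
    """Raise ValueError if the requested window falls outside Replay's stated limits."""
    if from_date >= to_date:
        raise ValueError("fromDate must be earlier than toDate")
    if from_date < now - RETENTION:
        raise ValueError("fromDate is older than the 5-day Replay window")
    if to_date > now - AVAILABILITY_LAG:
        raise ValueError("toDate must be at least 30 minutes before the request time")
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test; in practice a caller would pass `datetime.now(timezone.utc)`.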
All data delivered through Replay is compliant with Tweet deletions and other compliance events as of the time of delivery.