Data Collector Dashboard
The following video provides an overview of the various portions of the console.gnip.com dashboard.
Upon logging into your Data Collector at the URL provided by your account representative, you will land on the main dashboard. The first time you log in, there will not be any feeds yet, and your dashboard will look like the following:
The info box at the bottom of the main page displays some general information about your machine and its performance, including the machine name, IP address, number of active feeds, the machine load, and capacity usage.
To get started with your Data Collector, click "Start Adding Feeds" and follow the steps in our trial walkthrough. Once you have added some feeds, your main dashboard will look more like this:
The general information provided for each stream on this page includes the stream name, health, number of new activities collected in the last 24 hours, the delay or latency being observed from the source API (how long it is taking to return results), and a chart of the volume collected in the last 24 hours.
Also note that the "machine status" box moves to the bottom-right portion of the page, and a new box appears next to it showing samples of the incoming data as it is collected.
Clicking on the stream name for a particular stream in the main dashboard will take you to that stream's interface. The Overview tab includes a chart of your stream's volume, an events ticker with details on recent queries, the current health of the stream, and some overall metrics on the stream's performance.
Event ticker data for recent queries includes 1) the query time, 2) the rule being queried for, 3) the API's HTTP response code (indicating whether it was successful, or whether there were errors), 4) the raw number of activities returned by the API for the given query, and 5) how many of those activities were "new."
A "new" activity is one that has not been collected previously for this rule or any other – deduplication is based on the activity's ID. If the source API returns an activity ID that the stream has collected previously, it will be recognized as a duplicate and excluded from storage in the cache to prevent duplicates from being delivered to you. Where duplicates appear in query results, the event ticker will display a "d" on the far left side of the event, as shown in the first two queries on the screenshot above.
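The ID-based deduplication described above can be illustrated with a minimal sketch. This is not the Data Collector's actual implementation. The function name, the `seen_ids` set, and the assumption that each activity is a dictionary with an `"id"` key are all hypothetical and chosen only for illustration.

```python
def split_new_and_duplicates(activities, seen_ids):
    """Separate activities into new ones and duplicates, keyed on activity ID.

    seen_ids is shared across all rules, so an ID collected for one rule
    counts as a duplicate when another rule returns it later.
    """
    new, duplicates = [], []
    for activity in activities:
        activity_id = activity["id"]  # assumed field name
        if activity_id in seen_ids:
            duplicates.append(activity)
        else:
            seen_ids.add(activity_id)
            new.append(activity)
    return new, duplicates


seen_ids = set()
first_new, first_dupes = split_new_and_duplicates(
    [{"id": "a1"}, {"id": "a2"}], seen_ids
)
second_new, second_dupes = split_new_and_duplicates(
    [{"id": "a2"}, {"id": "a3"}], seen_ids
)
print(len(first_new), len(first_dupes))    # 2 0
print(len(second_new), len(second_dupes))  # 1 1
```

In this sketch, the second query would be the kind of event that the ticker marks with a "d", because one of its two returned activities had already been collected.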
The Data page provides insight into a sampling of queries being executed as they happen, including the actual URL being queried in the source API, as well as some of the raw data being returned. This is an easy way to get a quick look at the raw payloads being collected, as well as error responses being returned by the source API, if errors are occurring.
The Rules page provides helpful metrics regarding the current ruleset being utilized in your stream. Specifically, the page provides details on the rules currently in place that have the highest activity volume, as well as those with the lowest volume. Additionally, the "Metrics" box provides the current number of active rules for the stream, how long it takes the Data Collector to query through the entire list of rules (the "Cycle Time"), and the average time required for each query (Time/poll).
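As a rough way to reason about how these metrics relate, the sketch below estimates a cycle time from the rule count and the time per poll. It assumes that rules are queried one after another with no added delay between queries, which is an assumption made for illustration and not a documented behavior of the Data Collector. The function name and the sample numbers are hypothetical.

```python
def estimate_cycle_time(active_rules, seconds_per_poll):
    """Estimate seconds needed to query every rule once.

    Assumes strictly sequential polling with no pauses between queries.
    """
    return active_rules * seconds_per_poll


# Hypothetical stream: 500 active rules at 1.5 seconds per poll.
print(estimate_cycle_time(500, 1.5))  # 750.0
```

Under that assumption, each rule would be revisited roughly every 750 seconds, so removing unused rules is one way to shorten the cycle.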
Additionally, if you have rules that have returned errors every time that the Data Collector has queried for them, a third section will be displayed at the bottom of the page with a list of these rules. This is common for feeds such as the Facebook Fan Page Feed, which requires valid page IDs as rules, rather than keywords, and returns an error for invalid page IDs, or where pages have restricted API access.
The Highest and Lowest Activity Volume boxes provide 1) the rule being described, 2) the total number of activities collected for that rule since it was created, 3) the average number of activities collected per query, 4) how many times the Data Collector has queried for the given rule, and 5) how many queries for this rule have returned "full polls."
A "full poll" occurs when a given query returns the most activities possible from the source API, and all of those activities are "new" (i.e. they have not been collected before from this rule or any others). If you are receiving full polls, there is likely extra data that you have been unable to retrieve due to the length of your cycle time or the breadth of your rule. Creating more specific rules or reducing your cycle time can help reduce full polls and improve coverage.
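The full-poll condition can be expressed as a small check. This is a sketch of the definition above, not the Data Collector's own logic. The function name and the `max_results` parameter (standing in for the most activities the source API can return per query) are hypothetical.

```python
def is_full_poll(returned_ids, seen_ids, max_results):
    """Return True when a query hit the API's result limit with only new IDs.

    returned_ids: activity IDs returned by one query
    seen_ids:     IDs already collected for this rule or any other
    max_results:  assumed per-query limit of the source API
    """
    hit_limit = len(returned_ids) >= max_results
    all_new = all(activity_id not in seen_ids for activity_id in returned_ids)
    return hit_limit and all_new


seen = {"a1"}
print(is_full_poll(["a2", "a3", "a4"], seen, max_results=3))  # True
print(is_full_poll(["a1", "a2", "a3"], seen, max_results=3))  # False
print(is_full_poll(["a2", "a3"], seen, max_results=3))        # False
```

The second call is not a full poll because one activity was already collected, and the third is not because the query returned fewer activities than the limit.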
The API Help tab on your stream's interface provides the API endpoint URLs for your stream, as well as the Rules API endpoint for the specific stream. In addition, it includes sample curl commands and instructions on how to connect to the stream endpoint, and how to programmatically add, delete, and list rules from your stream's Rules API endpoint.
The Edit tab allows you to manually enter rules for your stream (for filtered streams) and switch the output format. Note that the interface only supports adding up to 1000 rules via this manual method – if you need to use more than 1000 rules, you should manage them programmatically via the API.
Last, you can view your account information and change your password on the My Account page of the dashboard.
For more in-depth integration information, see the Data Collector documentation.