Profile Geo 2.0


Millions of users provide information in their social networking profiles about where they live. The Gnip Profile Geo Enrichment significantly increases the amount of geodata in the Twitter Firehose by geocoding and normalizing users’ profile locations where possible and including the [longitude, latitude] coordinates and related place metadata in the Gnip payload. This data is available in the gnip.profileLocations property within the Tweet payload and can be filtered with a combination of PowerTrack rules.

Notes:

  • All Profile Geo coordinates are provided in the [Longitude, Latitude] order.
  • Some of the supporting geodata used to create the Profile Geo enrichment comes from GeoNames.org and is used by Gnip under the Creative Commons Attribution 3.0 License.

Profile Geo data will be included in Gnip’s PowerTrack, Replay, Volume Stream, Search, and Historical PowerTrack APIs.


Profile Geo Data


Original Format Field Name Activity Streams Field Name Example Value Description
user.derived.locations.county gnip.profileLocations.address.country United States The country location for where the user that created the Tweet is from.
user.derived.locations.country_code gnip.profileLocations.address.countryCode US A two letter ISO-3166 country code that corresponds to the country location for where the user that created the Tweet is from.
user.derived.locations.locality gnip.profileLocations.address.locality Birmingham The locality location (generally city) for where the user that created the Tweet is from.
user.derived.locations.region gnip.profileLocations.address.region Alabama The region location (generally state/province) for where the user that created the Tweet is from.
user.derived.locations.sub_region gnip.profileLocations.address.subRegion Jefferson County The sub-region location (generally county) for where the user that created the Tweet is from.
user.derived.locations.full_name gnip.profileLocations.displayName Birmingham, Alabama, United States The full name (excluding sub-region) for where the user that created the Tweet is from.
User.derived.locations.geo gnip.profileLocations.geo See Below An array that includes a lat/long value for a coordinate that corresponds to the lowers granularity location for where the user that created the Tweet is from.

The Historical PowerTrack, Search, and PowerTrack APIs supports filtering based on Profile Geo data. See the appropriate product documentation for more details on what operators are available for filtering on Profile Geo data.


Sample Payload 

Note that in both formats, the coordinates are provided in the [Longitude, Latitude] order.

Original Format:

{
    "user": {
        "derived": {
            "locations": [
                {
                    "country": "United States",
                    "country_code": "US",
                    "locality": "Birmingham",
                    "region": "Alabama",
                    "sub_region": "Jefferson County",
                    "full_name": "Birmingham, Alabama, United States",
                    "geo": {
                        "coordinates": [
                            -86.80249,
                            33.52066
                        ],
                        "type": "point"
                    }
                }
            ]
        }
    }
}

Activity Streams Format:

{
    "gnip": {
        "profileLocations": [
            {
                "address": {
                    "country": "United States",
                    "countryCode": "US",
                    "locality": "Birmingham",
                    "region": "Alabama",
                    "subRegion": "Jefferson County"
                },
                "displayName": "Birmingham, Alabama, United States",
                "geo": {
                    "coordinates": [
                        -86.80249,
                        33.52066
                    ],
                    "type": "point"
                },
                "objectType": "place"
            }
        ]
    }
}

Limitations

  • The Profile Geo enrichment attempts to determine the best choice for the geographic place described in the profile location string. The result may not be accurate in all cases due to factors such as multiple places with similar names or ambiguous names.
  • If a value is not provided in a user’s profile location field (actor.location), we will not attempt to make a classification.
  • Precision Level: If a Profile Geo Enrichment can only be determined with confidence at the country or region level, lower-level geographies such as subRegion and locality will be omitted from the payload.
  • The Profile Geo enrichment provides lat/long coordinates (a single point) that corresponds to the Precision Level of the enrichment’s results. These cooordinates represent the geographic center of the enrichment location results. For example, if the Precision Level is at the Country level, then those coordinates are set to the geographic center of that Country.
  • Combining profileLocations.address operators: The PowerTrack operators provided for address properties (locality/region/country/country code) are intentionally granular to allow for many rule combinations. When attempting to target a specific location that shares a name with another location, consider combining address rules. For instance, the following would avoid matches for “San Francisco, Philippines”: profile_locality:”San Francisco” profile_region:California When building rules that target individual Profile Geo fields, keep in mind that each increased level of granularity will limit the results you see. In some cases when attempting to look at data from a city, you may wish to only rely on a region rule where the region offers significant overlap with the city, e.g. the city of Zurich, Switzerland can be effectively targeted along with surrounding areas with profile_region:”Zurich”.
  • Use with Native Geo Tweets: The Profile Geo enrichment provides an alternative type of geography for a Tweet, different from the native geo value in the payload. These two types of geography can be combined together to capture all of the possible tweets related to a given area (based on available geodata), though they are not conceptually equivalent.