Twitter Geographical Metadata
posted on 12 February 2014 by Jim Moffitt
Gnip’s customers often build products that need to use the location of a Tweet, or the user who posted it. For example, a customer may be interested in public opinion on health care legislation in a specific part of the country, or want to track customer satisfaction in different regions. Or, like me, perhaps you want to research social media communications during extreme weather events.
Customers looking to use or integrate location data into their product face challenges in determining which type of data is best for their use. Factors in this determination include the level of precision and accuracy provided for the different kinds of data, as well as the ease-of-use in filtering for the different types of data.
What geographical metadata comes with a tweet?
Twitter provides an option to ‘geo-tag’ tweets. A geo-tag can be an exact location, an assigned Twitter Place (see HERE and HERE for more information), or both. An exact location is a single point given as a longitude/latitude pair. A Twitter Place can be thought of as roughly neighborhood-level: it comes with a “bounding box” of latitude and longitude coordinates that define the area. This type of geographic metadata, referred to as an “Activity Location,” provides the highest level of precision, and it requires no language parsing or processing to access. The main drawback to relying on Activity Locations is that only 1-2% of tweets are geo-tagged. Additionally, when only a “point” is available, targeting a very large area (e.g. an entire country) requires a significant array of PowerTrack rules to cover the whole area. Places afford nicer options, including filtering by country code or place name.
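To see why covering a large area takes many rules, here is a minimal Python sketch that splits one big box into smaller tiles, assuming each rule can cover only a limited area. The function name, the max_side_deg limit and the Colorado coordinates are illustrative assumptions, not PowerTrack values; check the PowerTrack documentation for the real limits and rule syntax.

```python
import math

def tile_bounding_box(west, south, east, north, max_side_deg):
    """Split a lon/lat box into tiles no larger than max_side_deg per side.

    Returns a list of (west, south, east, north) tuples. max_side_deg is a
    hypothetical per-rule size limit, not a documented PowerTrack value.
    """
    if not (west < east and south < north):
        raise ValueError("expected west < east and south < north")
    cols = math.ceil((east - west) / max_side_deg)
    rows = math.ceil((north - south) / max_side_deg)
    lon_step = (east - west) / cols
    lat_step = (north - south) / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                west + c * lon_step,
                south + r * lat_step,
                west + (c + 1) * lon_step,
                south + (r + 1) * lat_step,
            ))
    return tiles

# Rough box around Colorado (approximate values, for illustration only).
tiles = tile_bounding_box(-109.05, 37.0, -102.05, 41.0, max_side_deg=0.25)
print(len(tiles), "tiles, i.e. one rule clause per tile")
```

Even a single state turns into hundreds of small boxes at this tile size, which is why filtering by Place attributes such as country code is often the more practical choice for large regions.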
A second source of geographic metadata is the mention of locations in the tweet content. This type of “Mentioned Location” metadata requires parsing the tweet message for location names of interest, including nicknames. One tweet may mention Manhattan, while another may mention the Big Apple. Ease-of-use is fairly high for these types of tweets, provided you know how people on Twitter refer to the place you care about: you can simply add keywords or phrases that match those terms. On the other hand, accuracy is likely lower, since a mention is a less reliable indicator of the user’s actual location.
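As a rough illustration of keyword matching on Mentioned Locations, the Python sketch below looks for a place name and its nicknames in the tweet text. The alias table and the mentioned_locations function are hypothetical; in practice the same idea would be expressed as keyword or phrase rules.

```python
import re

# Hypothetical alias table; a real one comes from knowing how people on
# Twitter actually refer to the place of interest.
ALIASES = {
    "New York": ["manhattan", "big apple"],
}

def mentioned_locations(text, aliases=ALIASES):
    """Return the canonical place names whose aliases appear in the text."""
    found = []
    for place, names in aliases.items():
        # Word boundaries keep "big apple" from matching "big applesauce".
        pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(place)
    return found

print(mentioned_locations("Loving the Big Apple this weekend"))
```

A match only says that the place was mentioned, not that the author is there, which is the accuracy trade-off described above.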
Finally, every Twitter Profile has a “Location” setting that can be filled out by the account owner. These Profile Locations provide the largest source of geographic metadata. Not everyone provides this information, and it can contain any phrase the user wants. One Twitter account could have its location set to “Living in the Colorado foothills”, while another could be set to a less helpful “My parents’ basement.” This type of reference is a middle ground: it isn’t a definite geo-point validated by GPS, but the user has designated it as their location, which gives an extra boost to its expected reliability. The options for filtering on this type of data are abundant, and are discussed below.
In summary, there are three metadata sources for geo-referencing tweets:
1. Activity Location: tweets that are geo-tagged with an exact location or Twitter Place.
- Exact location with long/lat coordinates: -85.7629, 38.2267
- Twitter Place with a name (“Louisville Central”) and four pairs of lat/long coordinates that define a “bounding box.”
2. Mentioned Location: parsing the tweet message for geographic location.
- “If you are in Louisville, check out the pizza place off main”
- “I’m in Louisville and it is raining cats and dogs”
3. Profile Location: parsing the account-level location for locations of interest.
- “I live in Louisville, home of the Derby!”
- “I live in Louisville, the one in beautiful Colorado.”
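The three sources can be compared side by side with a small Python sketch. The tweet dict here is a simplified, hypothetical structure (not the real payload format), and the Louisville bounding box is a rough illustrative value.

```python
def geo_sources(tweet, place_name, bbox):
    """List which of the three metadata sources tie a tweet to a place.

    tweet is a simplified, hypothetical dict, not the real payload format.
    bbox is (west, south, east, north) in degrees.
    """
    sources = []
    needle = place_name.lower()

    # 1. Activity Location: exact point inside the box, or a matching Place name.
    point = tweet.get("point")  # (longitude, latitude) or None
    west, south, east, north = bbox
    in_box = (
        point is not None
        and west <= point[0] <= east
        and south <= point[1] <= north
    )
    if in_box or needle in tweet.get("place_name", "").lower():
        sources.append("activity")

    # 2. Mentioned Location: the place name appears in the tweet message.
    if needle in tweet.get("body", "").lower():
        sources.append("mentioned")

    # 3. Profile Location: the place name appears in the account's location.
    if needle in tweet.get("profile_location", "").lower():
        sources.append("profile")

    return sources

louisville_box = (-85.95, 38.00, -85.40, 38.40)  # rough, illustrative box

tweet = {
    "point": (-85.7629, 38.2267),
    "body": "I'm in Louisville and it is raining cats and dogs",
    "profile_location": "I live in Louisville, the one in beautiful Colorado.",
}
print(geo_sources(tweet, "Louisville", louisville_box))
# ['activity', 'mentioned', 'profile']
```

All three sources match here, yet the profile points to Louisville, Colorado, while the coordinates fall in Kentucky. Plain text matching cannot tell the two apart, which is why the sources differ in reliability.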
For sample JSON that illustrates how this metadata is delivered in the tweet payload, along with details on how to filter on it, see this article.
How can I use this metadata to geo-reference tweets?
Gnip PowerTrack provides many ways to filter on these types of geographic metadata. These filters, or rules, are built using the more than fifty PowerTrack Operators (see complete documentation HERE).
See our Filtering Twitter by Location article for an introduction to the PowerTrack Operators that can be used to filter on Activity Locations and Profile Locations. Since Profile Locations are by far the largest source of Twitter geographic metadata, Gnip provides the Profile Geo enrichment.
Since Profile Geo vastly increases the amount of geographic data, there has been quick and wide adoption of this enrichment. For an introduction to the Gnip Profile Geo data enrichment, see our documentation HERE.