
  Visualizing Twitter Geo Data

    posted on 03 December 2014 by Brian Lehman


    This article describes the process of plotting geotagged Tweets on a map built with D3, a JavaScript library well suited to web-based visualization.

    Twitter Geo Data Types

    For an in-depth overview of the Gnip+Twitter geo data types, please see Gnip's geo data article. Three types of geographic data can exist in the payload of a Tweet:

    1. Gnip enrichment: Profile Geo
      Since 2013, Gnip has enriched some Tweets with formal geographic information based on the user’s profile location – i.e. the place the user may call “home”. For example, a user whose profile lists their location as “Boulder, CO” might get country, state, and city information added, as well as a central lat/long coordinate for that location.

    2. Place-name
      In 2010, Twitter introduced the ability to select a place (venue, neighborhood, city, etc.) to indicate a Tweet’s location.

    3. Geotagged
      Since 2009, users can opt in (it is off by default) to tagging each Tweet with the exact latitude and longitude from which it was sent.
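
    The three variants above live in different parts of a Tweet payload. A rough sketch of telling them apart, assuming field names from Gnip's Activity Streams format (`geo` for the exact point, `location` for a Twitter Place, `gnip.profileLocations` for the Profile Geo enrichment; the sample record is invented):

```python
def geo_types(activity):
    """Return which kinds of geo data are present in one Activity Streams record."""
    kinds = []
    if activity.get("gnip", {}).get("profileLocations"):
        kinds.append("profile_geo")   # 1. Profile Geo enrichment
    if activity.get("location"):
        kinds.append("place")         # 2. Place-name
    if activity.get("geo"):
        kinds.append("geotag")        # 3. exact geotag
    return kinds

tweet = {
    "body": "I freaking love maps!",
    "location": {"displayName": "Hayward, CA"},
    "geo": {"type": "Point", "coordinates": [37.64, -122.09]},
}
print(geo_types(tweet))  # ['place', 'geotag']
```

    Note that the three types are independent: a single Tweet can carry any combination of them, which is why the check accumulates a list rather than returning a single label.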

    To create a map, we’ll use the following process:
    1. Create geoData.json by transforming tweetData.json into GeoJSON format.
    2. Create world.json by transforming and combining shapefiles into TopoJSON format.
    3. Visualize the data with the JavaScript library D3.

    Tweet Data to GeoJSON

    Map-based visualizations often use the GeoJSON data format. In this format, each set of geotagged Tweet coordinates is represented as a feature with any number of properties.

    An example of valid GeoJSON is below:

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-122.09,37.64 ]
      },
      "properties": {
        "name": "@BrianLehman",
        "tweet_body": "I freaking love maps!"
      }
    }
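
    Because GeoJSON puts longitude first, a quick structural check can catch accidentally swapped coordinates before they reach the map. A minimal sketch; the `valid_point_feature` helper is hypothetical, not part of any library:

```python
def valid_point_feature(f):
    """Minimal structural check for one GeoJSON Point Feature."""
    if f.get("type") != "Feature":
        return False
    geom = f.get("geometry", {})
    if geom.get("type") != "Point":
        return False
    coords = geom.get("coordinates", [])
    if len(coords) != 2:
        return False
    lon, lat = coords  # GeoJSON order: [longitude, latitude]
    return -180 <= lon <= 180 and -90 <= lat <= 90

sample = {"type": "Feature",
          "geometry": {"type": "Point", "coordinates": [101.6, 3.18]},
          "properties": {"handle": "uncle_smarmy"}}
swapped = {"type": "Feature",
           "geometry": {"type": "Point", "coordinates": [3.18, 101.6]},
           "properties": {}}
print(valid_point_feature(sample))   # True
print(valid_point_feature(swapped))  # False: 101.6 is not a valid latitude
```

    The range check only flags swaps where the latitude magnitude exceeds 90, but that covers the most common symptom of reversed coordinates.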
    

    To create a valid GeoJSON collection of points, we need to parse the relevant information from each Tweet’s payload. The jq parser can extract the relevant sections of JSON records. See this tutorial and support article for further information on jq.

    Using an example data set tweetData.json, we can select only those tweets that are geotagged:

    $ cat tweetData.json | jq 'select(.geo)'
    

    We can also use jq to extract only the coordinates:

    $ cat tweetData.json | jq 'select(.geo).geo.coordinates'
    

    Viewing the output of the above command, notice that the geo data contains individual arrays of coordinates in [latitude, longitude] order:

    [
      37.64,
      -122.09
    ]
    [
      3.18,
      101.60
    ]
    

    We swap the coordinates into [longitude, latitude] order and append each feature into a valid FeatureCollection:

    echo '{"type":"FeatureCollection","features":'"$(cat tweetData.json \
      | jq 'select(.geo) | {type: "Feature",
            geometry: {type: "Point", coordinates: [.geo.coordinates[1], .geo.coordinates[0]]},
            properties: {tweet_body: .body, handle: .actor.displayName}}' \
      | jq -s .)"'}' > geoData.json
    

    The result of the above code is geoData.json, which is in valid GeoJSON format where each point is in a feature:

    {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [
                        -122.09,
                        37.64
                    ]
                },
                "properties": {
                    "tweet_body": "I freaking love maps!",
                    "handle": "BrianLehman"
                }
            },
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [
                        101.6,
                        3.18
                    ]
                },
                "properties": {
                    "tweet_body": "I love you @grandma_smarmy",
                    "handle": "uncle_smarmy"
                }
            }
        ]
    }
    
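
    The same transformation can be sketched in Python for readers who prefer it to jq. The two sample records below mirror the output above, and the field paths (`body`, `actor.displayName`, `geo.coordinates`) follow Gnip's Activity Streams format:

```python
import json

def to_feature(activity):
    """Build a GeoJSON Feature, swapping the payload's [lat, lon]
    into GeoJSON's [lon, lat] order."""
    lat, lon = activity["geo"]["coordinates"]
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "tweet_body": activity["body"],
            "handle": activity["actor"]["displayName"],
        },
    }

activities = [
    {"body": "I freaking love maps!",
     "actor": {"displayName": "BrianLehman"},
     "geo": {"type": "Point", "coordinates": [37.64, -122.09]}},
    {"body": "I love you @grandma_smarmy",
     "actor": {"displayName": "uncle_smarmy"},
     "geo": {"type": "Point", "coordinates": [3.18, 101.6]}},
    {"body": "no geo on this one",
     "actor": {"displayName": "anon"}},  # skipped: not geotagged
]

geo_data = {
    "type": "FeatureCollection",
    "features": [to_feature(a) for a in activities if a.get("geo")],
}
with open("geoData.json", "w") as f:
    json.dump(geo_data, f, indent=4)
```

    As with the jq pipeline, the filter on `a.get("geo")` drops Tweets that are not geotagged, so the resulting FeatureCollection contains only mappable points.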

    Shapefile to Topojson

    A common obstacle when building D3 maps is transforming shapefiles into smaller, more manageable formats.

    The first step is to download the land and country files from Natural Earth. Next, we use the GDAL and TopoJSON packages to transform the shapefiles into GeoJSON and then into TopoJSON. (See Mike Bostock’s “Let’s Make a Map” for a superb introduction and details on installing these tools.)

    # unzip both files in main/

    # cd main/ne_110m_admin_0_countries/
    ogr2ogr -f GeoJSON countries.json ne_110m_admin_0_countries.shp

    # cd main/ne_110m_land/
    ogr2ogr -f GeoJSON land.json ne_110m_land.shp

    # cd main/
    topojson -o world.json -- ne_110m_land/land.json ne_110m_admin_0_countries/countries.json
    

    We have just built world.json, a valid TopoJSON file containing two feature sets, land and countries.
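
    In TopoJSON, those feature sets live under an `objects` key, and the topojson command names each object after its input filename. A minimal sketch of the shape world.json should have, with the real geometries and arcs elided:

```python
# Skeleton of the TopoJSON structure produced above; the "land" and
# "countries" object names come from the input filenames land.json
# and countries.json. Geometries and arcs are elided.
world = {
    "type": "Topology",
    "objects": {
        "land": {"type": "GeometryCollection", "geometries": []},
        "countries": {"type": "GeometryCollection", "geometries": []},
    },
    "arcs": [],
}
print(sorted(world["objects"]))  # ['countries', 'land']
```

    These object names are what the D3 code will reference when pulling the land and country features back out of world.json.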

    Visualize it!

    The complete D3 code consists of two files (index.html, map_points_basic.js) that should be downloaded and saved in the same directory as world.json and geoData.json. The JavaScript is heavily commented to explain the code. Once these files are in place, just run python -m SimpleHTTPServer 8080 in the working directory and point a browser to http://localhost:8080/index.html to view the map!

    Mapbox and Gnip
    Geography of Tweets
    Topography of Tweets
    NYC Geotagged Tweets