How Far Does a Tweet Travel?

UPDATE: Blog post too long? Have a look at the animation!

Twitter is unmistakably the poster boy of social media in 2009. It became a household name this year when celebrities started jumping on the bandwagon. Meanwhile, the world watched Twitter play an important role in helping the people of Iran organize demonstrations against the unfair presidential elections.

But Twitter is more than social media: it’s about how people share stories and build communities. Above all, it’s a gigantic real-time playground for studying phenomena such as information diffusion, fads and hypes.

One thing that intrigues me about Twitter is whether people use it to organize and communicate locally or more globally. In this blog post I report on an ongoing research project that I started a couple of weeks ago and focus on how far tweets travel.

An important use of Twitter is the act of retweeting someone else’s tweet. This retweeting can be seen as voting: important information and stories are being retweeted to make sure the information disseminates quickly. But where do the people live who retweet?

To answer this question I collected over 10,000 retweets and their original tweets, along with all users involved. For each of these users, I parsed their location information (if available) and geocoded it to latitude and longitude using the publicly available geonames.org website. Most people provide their location at the city level, although increasingly people are giving their latitude and longitude coordinates using their cell phones.

To create variance in distance between people from the same city, I randomly select latitude and longitude coordinates within that city. For example, many of the Twitter users are from San Francisco. To make sure that the distance between two people from SF is not zero, I query all geocodes that cover SF and then randomly select one pair of coordinates for each user.
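A minimal Python sketch of this random selection, assuming the candidate geocodes for a city are already available as a list of (latitude, longitude) pairs. The coordinates below and the name pick_coordinates are made up for illustration; the real candidates would come from geonames.org.

```python
import random

# Hypothetical candidate geocodes covering San Francisco (illustrative values only).
SF_GEOCODES = [
    (37.7749, -122.4194),
    (37.7599, -122.4148),
    (37.8025, -122.4382),
]

def pick_coordinates(candidates, rng=random):
    """Return one (lat, lon) pair chosen uniformly at random from candidates."""
    if not candidates:
        raise ValueError("no geocodes available for this location")
    return rng.choice(candidates)

# A seeded generator keeps the assignment reproducible between runs.
rng = random.Random(42)
user_a = pick_coordinates(SF_GEOCODES, rng)
user_b = pick_coordinates(SF_GEOCODES, rng)
```

Two users can still draw the same pair by chance, so a zero distance remains possible when a city has only a few candidate geocodes.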

I excluded all users who have set their city to Tehran / Iran or Quito to prevent bias in the results. During the demonstrations there were indications / rumors that the Iranian government had started tracking people in Iran who were using Twitter. To obfuscate these tracking attempts, many people on Twitter changed their location to Tehran, and they might not have reset it since. In addition, for a while there was a bug in Twitter’s user registration process that assigned Quito as the time zone.

To identify all retweets of a particular tweet I used the following procedure:

  • I only include retweets that contain either the ‘RT’ or ‘via’ indication.
  • I only include retweets that contain a link to a website, but I exclude social networking sites such as MySpace and Facebook because I cannot retrieve the actual page without logging in.
  • I visit each page mentioned in the (re)tweet and parse the title of the story.
  • Then, I search for all tweets and retweets containing the original link of the story.
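The first two rules could look roughly like this in Python. This is an illustrative sketch, not the code behind the numbers in this post: the regular expressions, the domain list and the function names are assumptions, and in particular it assumes that ‘RT’ or ‘via’ is followed by an @username. Fetching page titles and searching for the link need network access and are left out.

```python
import re
from urllib.parse import urlparse

# Assumption: a retweet marker is 'RT' or 'via' followed by an @username.
RETWEET_MARKER = re.compile(r"\b(?:RT|via)\s*@\w+", re.IGNORECASE)
URL_PATTERN = re.compile(r"https?://\S+")
# Sites that cannot be fetched without logging in.
EXCLUDED_DOMAINS = {"myspace.com", "facebook.com"}

def extract_links(text):
    """Return all http(s) links in the text, minus trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]

def is_excluded(url):
    """True if the link points at an excluded domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)

def usable_links(text):
    """Links that qualify a tweet as a usable retweet, or [] if none do."""
    if not RETWEET_MARKER.search(text):
        return []
    return [url for url in extract_links(text) if not is_excluded(url)]
```

A tweet is kept when usable_links returns at least one link; each of those links is what the later steps would then visit and search for.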

Of course, as soon as the geotagging of tweets goes live, I won’t have to make any assumptions or parse strings and geocode manually.

Of the 10,228 users, I was able to geocode 6,424. I assume that the missing geo-coordinate data is randomly distributed among Twitter users. My database consists of 13,399 (re)tweets comprising 285 original stories. I am able to create 1,758 dyads where I know the geocoordinates of both the original sender and the person who retweeted it.

The first graph shows the histogram of elapsed time between the original tweet and the retweets. More than 60% of all retweets happen within the first hour. After that, the probability of a tweet being retweeted quickly fades to almost zero, and in my limited sample nothing is retweeted after 24 hours.

[Figure: histogram of elapsed time between the original tweet and its retweets]
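A histogram like this can be built by counting retweets per whole hour elapsed since the original tweet. The sketch below is only an illustration: the timestamps are invented and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def hourly_histogram(original_time, retweet_times):
    """Count retweets per whole hour elapsed since the original tweet."""
    bins = Counter()
    for t in retweet_times:
        elapsed_hours = int((t - original_time).total_seconds() // 3600)
        bins[elapsed_hours] += 1
    return bins

def share_within_first_hour(bins):
    """Fraction of all retweets that fall in the first hourly bin."""
    total = sum(bins.values())
    return bins[0] / total if total else 0.0

# Invented example: one original tweet and three retweets.
original = datetime(2009, 10, 1, 12, 0)
retweets = [
    datetime(2009, 10, 1, 12, 5),
    datetime(2009, 10, 1, 12, 40),
    datetime(2009, 10, 1, 14, 10),
]
bins = hourly_histogram(original, retweets)
share = share_within_first_hour(bins)
```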

The second graph shows the histogram of distance between the original sender and the person who retweets the story. To improve the readability of the histogram, I take the base-10 log of the distance. An average retweet travels 10^2.98 ≈ 955 kilometers, while the median distance is 10^3.23 ≈ 1,698 kilometers.

[Figure: histogram of log distance between the original sender and the retweeter]
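The post does not say which distance formula was used. As a sketch, the great-circle (haversine) distance is a common choice for pairs of latitude and longitude coordinates, and the log-scale summary can then be computed per dyad. The Earth radius constant, the function names and the example coordinates are assumptions for illustration.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def log_distance_summary(dyads):
    """Mean and median of log10 distance over (sender, retweeter) pairs.

    Pairs with identical coordinates are skipped, since log10(0) is undefined.
    """
    logs = [math.log10(haversine_km(s, r)) for s, r in dyads if s != r]
    return statistics.mean(logs), statistics.median(logs)

# Invented example dyads: (sender, retweeter) coordinates.
dyads = [
    ((37.7749, -122.4194), (40.7128, -74.0060)),
    ((52.3676, 4.9041), (51.5074, -0.1278)),
    ((37.7749, -122.4194), (37.8025, -122.4382)),
]
mean_log, median_log = log_distance_summary(dyads)
mean_km, median_km = 10 ** mean_log, 10 ** median_log
```

Note that raising 10 to the mean of the logs gives a geometric mean, so if 2.98 is the mean of the log distances, the 955 kilometers is a geometric rather than an arithmetic average.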

These data suggest that Twitter makes the world smaller, but not hugely so. On the other hand, the average and median distances are too large to speak of local communication. A random retweet connects different pockets of the Twitterverse, possibly in a similar way as in small-world networks: people who bridge different regions of a network make the network smaller. The same seems to apply to Twitter, where people who are part of different conversations re-broadcast information to regions of Twitter that haven’t heard it yet. This suggests that there are opportunities for information arbitrage: people can capitalize on early access to information, re-broadcast it and thereby develop a reputation as an expert on a particular topic. Through this mechanism we start developing a common pulse, sharing information and joining conversations that ultimately will make the world smaller.

I am not quite sure if this is the true story behind the data, but I am curious to figure it out.

Looking for the real data? Take a look at my first attempt at building a Processing applet that visualizes the distance traveled by retweets.


Links for FSOSS 2009 Conference Attendees

Today, I presented some of my work on bug fixing in the Firefox community at the FSOSS 2009 Open Source Symposium. I mentioned a number of tools / extensions in my presentation and here are the links:

  1. Jetpack Add-On to Predict Likelihood of Bug Fix
  2. Jetpack Add-On to Crowdsource Flamy Bug Reports
  3. Crowdsourcing Mozilla Developer Handles

Some people asked me to put the slides online, so they are available at SlideShare.


Compiling Cairo 1.8.8 for iGraph Visualization Support

I have been working with iGraph 0.5.2 and I am really happy with the speed and diversity of its algorithms. Of course, networks need to be visualized as well. iGraph does offer visualization capabilities, but you need Cairo installed. Unfortunately, installing the Python bindings for Cairo requires a little bit of hacking, especially if you do not want to upgrade to Python 2.6.

So, here is how to enable Cairo support for iGraph using Python 2.5. (It’s probably way easier to use MacPorts or Fink, but I like to compile by hand.) Grab the following libraries:

and for each library run the sequence:

  • ./configure
  • make
  • make test (not required, and not every library supports this)
  • (sudo) make install

Cairo also supports PDF and SVG output, but that requires additional libraries and compiling. This is the bare minimum to get Cairo to run. If you run make test on the Cairo package, a bunch of tests are likely to fail. As far as I can tell that doesn’t really matter for iGraph, but I am sure that some features of Cairo won’t work.

Now, let’s fix pycairo-1.8.8. There are two issues:

  1. Pycairo-1.8.8 requires Python 2.6 or higher
  2. Pycairo might look for the PPC shared libraries which it can’t find.

First, open configure in a text editor that does not mess with the line breaks. I use TextWrangler for this; I tried nano first, but that gave me this error:

./configure: bad interpreter: No such file or directory

Open configure and go to lines 11116 and 11150, which read:

minver = list(map(int, '2.6'.split('.'))) + [0, 0, 0]

and replace 2.6 with 2.5. Save the file and close it. Now we need to fix setup.py, so open it in a text editor and do the following:

  1. Add at the top of the file:
    from __future__ import with_statement
  2. Comment out import io by adding # in front of it
  3. Go to line 76, which reads:
    if sys.version_info < (2,6):
    and replace (2,6) with (2,5)
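To see why changing the version string is enough, here is a small sketch of the kind of comparison such a check performs. It only mirrors the minver line quoted above; the function name meets_minimum and the rest of the logic are assumptions, not the actual code in configure or setup.py.

```python
import sys

def meets_minimum(version_info, minimum="2.5"):
    """Compare (major, minor, micro) against a dotted minimum version string."""
    # Same construction as the quoted line: '2.5' becomes [2, 5, 0, 0, 0].
    minver = list(map(int, minimum.split("."))) + [0, 0, 0]
    return list(version_info[:3]) >= minver[:3]

# With the original '2.6' a Python 2.5 interpreter is rejected;
# after the edit to '2.5' it passes.
rejected_before_edit = not meets_minimum((2, 5, 4), "2.6")
accepted_after_edit = meets_minimum((2, 5, 4), "2.5")
running_interpreter_ok = meets_minimum(sys.version_info)
```

Relaxing the check only removes the gate; it does not make Python 2.6 features available, which is why the with_statement import and the commented-out import io are needed as well.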

Save the file and close it. Now, we need to compile pycairo:

  1. ./configure LDFLAGS="-arch i386" (this will disable PPC support)
  2. make
  3. (sudo) make install
  4. (sudo) python setup.py install

If everything went smoothly, fire up Python and enter:

import cairo

If you don’t get any errors then you have succeeded!
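If you prefer a check you can rerun later, a small script along these lines reports whether the module imports and why not if it fails. The helper name module_status is made up for this sketch, and the version attribute is read defensively in case the installed bindings do not expose it. It avoids newer syntax so that it should also run on an older interpreter.

```python
import sys

def module_status(name):
    """Return (True, version) if the module imports, else (False, error text)."""
    try:
        module = __import__(name)
    except ImportError:
        # sys.exc_info() avoids the 'except ... as' syntax, which older Pythons lack.
        return False, str(sys.exc_info()[1])
    return True, getattr(module, "version", "unknown")

ok, detail = module_status("cairo")
print("cairo importable: %s (%s)" % (ok, detail))
```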
