Where do followers of Australian sport on Twitter come from?

This entry was posted by Laura on Wednesday, 20 October, 2010 at

For the past two, almost three months, I’ve been focused on Twitter to the exclusion of almost all other social networks. This is because of the size of the Twitter audience and because of the complexities of trying to determine the location of those Twitter followers. I’d really like to move on but I’ve ended up in a perfectionist loop where I can’t seem to find the command for breaking it.

My overriding goal, with regard to Australian sport on Twitter, is to determine the size and location of the Australian sport community on Twitter as expressed by following Australian sport athletes, clubs, leagues and organizations. The methodology is relatively simple:

1) Develop a list of the major Australian sport related Twitter accounts.
2) Get a list of all the followers for those accounts. The list should include as much profile information as possible, including the user inputted location and time zone.
3) Translate user inputted locations into actual locations. When no user inputted location exists, attempt to use the time zone field to get this location.

The first step is a manual step. I’ve got to look around at other people’s lists, check follow lists and develop that list. My current list is around 450 accounts. It is constantly in a state of flux as people and organizations occasionally delete their accounts or get new ones with new names. (This leads to massive drop-offs that I can’t always explain or notice in the moment. Sources to document the disappearance of accounts don’t always exist. Asking a month or two after the fact means that people can’t always recall what happened.)

The second step is automated. I’ve had a friend (@Hawkeye7) develop a tool that pulls that information for me. It is relatively simple once executed, but its success largely depends on the input from step one. From October 14 to October 18, we got the follower list for every single account on that list of 450. It took four days because there are API limits on Twitter. We’d run into them, wait an hour and then get the next page of followers or the next person’s followers. In the case of @warne888, this took at least 8 hours.
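
The wait-and-retry loop described above could look something like the Python sketch below. It is not @Hawkeye7’s tool and it does not call the real Twitter API: `fetch_page`, its cursor argument and the `RateLimited` exception are hypothetical stand-ins for whatever the actual client provides. The one-hour wait is the only detail taken from the description above.

```python
import time
from typing import Callable, List, Optional, Tuple


class RateLimited(Exception):
    """Hypothetical error raised by fetch_page when the API limit is hit."""


# A page is (follower IDs on this page, cursor for the next page or None).
Page = Tuple[List[int], Optional[str]]


def collect_followers(
    account: str,
    fetch_page: Callable[[str, Optional[str]], Page],
    wait_seconds: int = 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Page through one account's followers, pausing whenever the limit is hit."""
    followers: List[int] = []
    cursor: Optional[str] = None
    while True:
        try:
            ids, next_cursor = fetch_page(account, cursor)
        except RateLimited:
            # Wait out the limit, then retry the same page.
            sleep(wait_seconds)
            continue
        followers.extend(ids)
        if next_cursor is None:
            return followers
        cursor = next_cursor
```

Passing `sleep` in as a parameter is only there so the loop can be tried out without actually waiting an hour.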

The third step relies on both manual input and automation. I’ve spent somewhere in the neighborhood of 60 hours creating a list of about 85,000 variants of user inputted locations. The tasks involved included reverse geocoding, creating lists of city/state/country patterns for Australia based on patterns that I saw repeating, and individually evaluating user inputted locations to make an accurate guess as to what the user meant. My focus was on getting as much of the city/state/country information as possible for Australia, New Zealand, Canada, the United States and Ireland. For other countries, I just tried to identify which country the account came from. The selection of countries for the most specific location data was based on where I saw Australian sport followers coming from and on my own understanding of a country’s geography. I don’t understand the geography of South Africa and Costa Rica as well as I could, and I didn’t want to spend the time to understand it completely just to better label those accounts. The time it would take to overcome that learning curve felt better spent on continuing to move locations from unknown to relatively known. Despite the huge amount of time spent on this, I have a list of over 50,000 user inputted locations that I haven’t begun to look at. As a perfectionist, this drives me absolutely nutters. I can’t possibly translate every user inputted location into a usable location. (I’m still working on improving it anyway.) The more accurate this list, the better my results. (And given the size of the data sets involved, I have my own user input errors that I don’t always find until I run the locations.)
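
As a rough illustration of how a variant list like this gets applied, here is a minimal Python sketch. The two lookup tables are tiny made-up samples, not the real 85,000-entry list, and the function names and the normalization rule (lowercase, commas treated as spaces) are assumptions for the example. Falling back to the time zone when a location is present but unmatched is also an assumption; the methodology above only states the fallback for accounts with no location at all.

```python
from typing import Dict, Optional

# Illustrative samples only; the real lists are far larger.
SAMPLE_VARIANTS: Dict[str, str] = {
    "melbourne": "Australia",
    "melbourne vic": "Australia",
    "new delhi": "India",
}
SAMPLE_ZONES: Dict[str, str] = {
    "Sydney": "Australia",
    "Wellington": "New Zealand",
}


def normalize(text: str) -> str:
    """Lowercase, treat commas as spaces and collapse repeated whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


def resolve_country(
    location: Optional[str],
    time_zone: Optional[str],
    variants: Dict[str, str] = SAMPLE_VARIANTS,
    zones: Dict[str, str] = SAMPLE_ZONES,
) -> str:
    """Return a country for one follower, or "Unknown" if nothing matches."""
    if location and location.strip():
        country = variants.get(normalize(location))
        if country:
            return country
    # Assumed fallback: use the time zone if the location gave no answer.
    if time_zone and time_zone.strip():
        country = zones.get(time_zone.strip())
        if country:
            return country
    return "Unknown"
```

With these samples, `resolve_country("Melbourne, VIC", None)` resolves through the variant list, while an account with an empty location and a time zone of "Wellington" resolves through the time zone table.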

To put this in perspective, let’s look at @warne888. He had over 198,000 followers. I found over 50,000 user inputted locations for which I had no location appended to that variant, and that was while pulling from a list of 65,000 variants. India turned out to be the big problem, in that I didn’t have many variants for cities in that country. Out of a list of 50,000 followers for @warne888, 17,000 listed neither a user inputted location nor a useable time zone for determining their location. That is roughly 34% of those followers whose location I’m never going to be able to determine.
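
The 34% figure is simply the 17,000 accounts with no usable field divided by the 50,000 in that list. A quick check in Python, with variable names chosen just for this example:

```python
followers_in_list = 50_000  # the @warne888 list referred to above
no_usable_field = 17_000    # neither a location nor a usable time zone

share = no_usable_field / followers_in_list
print(f"{share:.0%}")
```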

That said, with the conditions of this data set understood, let me get to the actual results. These were found by combining all the follower results from all 450+ different accounts. By combining, I mean that only the UserID (a unique number Twitter assigns to every account) and the country were put in the new file. The duplicate rows were then removed to ensure that people who followed multiple accounts were only counted once. This should begin to give an idea of the comparative global distribution of Australian sport fans.
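
For anyone wanting to repeat the combine-and-deduplicate step, a minimal Python sketch follows. The sample rows are invented, and the function name is an assumption for the example; the original work was done on files rather than in code like this. Note that removing duplicate (UserID, country) rows only counts a person once if they were given the same country every time they appeared.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_unique_followers(rows: Iterable[Tuple[int, str]]) -> Counter:
    """Drop duplicate (UserID, country) rows, then count rows per country."""
    unique_rows = set(rows)
    return Counter(country for _user_id, country in unique_rows)


# Invented sample: users 101 and 202 each follow two accounts on the list.
sample_rows = [
    (101, "Australia"),
    (101, "Australia"),
    (202, "India"),
    (303, "Australia"),
    (202, "India"),
]

counts = count_unique_followers(sample_rows)
for country, count in counts.most_common():
    print(country, count)
```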

Country Count
Australia 77682
India 46174
United States 26515
Unknown 19717
United Kingdom 17547
South Africa 8456
New Zealand 4806
Canada 2004
Ecuador 1819
United Arab Emirates 1325
Ireland 1318
Brazil 1139
Chile 1101
Bangladesh 1085
Pakistan 1062
Argentina 909
Singapore 850
Germany 760
France 712
Japan 670
Indonesia 579
Philippines 535
Spain 430
Netherlands 413
China 310
Sri Lanka 280
Thailand 275
Malaysia 257
Hong Kong 230
Mexico 214
Italy 190
Portugal 159
Qatar 156
Belgium 152
Sweden 140
Kuwait 139
Saudi Arabia 136
Nepal 132
Switzerland 124
Iran 121
Bahrain 119
Norway 119
Venezuela 119
Turkey 108
Colombia 105
Israel 96
Denmark 90
Peru 90
Egypt 73
Costa Rica 72
Romania 67
Russia 62
Greece 61
Puerto Rico 59
Poland 56
Austria 54
Uruguay 48
Finland 44
Jersey 41
Vietnam 38
Serbia 33
Oman 32
South Korea 32
Barbados 30
Cyprus 29
Taiwan 28
Malta 26
Fiji 24
Kenya 24
Mauritius 22
Paraguay 22
Slovenia 22
Panama 21
Vatican City State 21
Nigeria 20
Maldives 19
Morocco 19
Croatia 18
Ghana 18
Guatemala 18
Ukraine 18
Czech Republic 17
Jamaica 17
Jordan 17
Latvia 17
Papua New Guinea 16
Gibraltar 15
Guernsey 14
Guyana 14
Bermuda 13
Botswana 13
Brunei 13
El Salvador 13
Hungary 13
Namibia 12
Dominican Republic 11
Bulgaria 10
Macedonia 10
Mongolia 10
Lebanon 9
Lithuania 9
Luxembourg 9
Honduras 8
Iceland 8
Monaco 8
Tonga 8
Afghanistan 7
Slovakia 7
Tanzania 7
Tunisia 7
Zimbabwe 7
Bolivia 6
Estonia 6
Iraq 6
Kazakhstan 6
Mozambique 6
Nicaragua 6
Trinidad and Tobago 6
Uganda 6
Algeria 5
Cayman Islands 5
New Caledonia 5
Swaziland 5
Antarctica 4
Aruba 4
Bosnia and Herzegovi 4
Cambodia 4
Haiti 4
Isle of Man 4
Samoa 4
Sudan 4
Zambia 4
Andorra 3
Armenia 3
Cameroon 3
French Polynesia 3
Guam 3
Laos 3
London 3
Seychelles 3
Albania 2
Azerbaijan 2
Belarus 2
Bhutan 2
Djibouti 2
Guadeloupe 2
Malawi 2
Marshall Islands 2
Solomon Islands 2
Suriname 2
Angola 1
Ballarat 1
Belize 1
Bendigo 1
Benin 1
Burkina Faso 1
Chad 1
Christmas Island 1
Congo 1
Cook Islands 1
Cuba 1
Ethiopia 1
Faroe Islands 1
Georgia 1
Greenland 1
Grenada 1
Ivory Coast 1
Kyrgyzstan 1
Liberia 1
Libya 1
Liechtenstein 1
Madagascar 1
Moldova 1
Nauru 1
Norfolk Island 1
Northcote 1
Reunion 1
Rwanda 1
Senegal 1
Sierra Leone 1
Somalia 1
Syria 1
Turks and Caicos Isl 1
United Kindom 1
Vanuatu 1
Virgin Islands 1
Western Sahara 1

  • http://profiles.yahoo.com/u/ANYPWFYQMNG7NRB55Q7C3PR6C4 Adelaide La Blanche-Dupont

    South African geography has been slippery for me in the recent past.

    (certainly at the province/region level).

    India probably would have a LOT of city variants: especially when you factor in the (about) 12 commonly spoken and written languages.

  • http://www.fanhistory.com LauraH

    The other complicating factor is knowing how things are organized regionally. (And even then, it doesn’t always help.) In Australia, most people do a variant of the following schema:

    CITY
    CITY, STATE
    CITY, STATE, COUNTRY
    CITY, POSTCODE
    CITY, COUNTRY
    STATE, COUNTRY
    CITY, MAJOR CITY/URBAN HUB
    CITY, MAJOR CITY/URBAN HUB, COUNTRY
    POSTCODE

    The United States and Canada largely follow that model. Ireland is pretty close: substitute county for state, where you have to know the counties and take a guess whether they mean County Dublin or the City of Dublin… and I know enough Irish geography to get that one. England, Scotland and Wales? Forget it. I get lost with all the shires, etc.

    India is pretty messy. I didn’t even want to figure it out: Delhi, New Delhi, Bombay, Calcutta and all the English spelling variants… The whole thing is almost a mini train wreck. My local world geography is really limited at times.

  • http://profiles.yahoo.com/u/ANYPWFYQMNG7NRB55Q7C3PR6C4 Adelaide La Blanche-Dupont

    Hopefully you’ll have some friendly Indian followers.

    These variations for Australia are terrific. Painful for collators though!

  • http://twitter.com/clustrmaps ClustrMaps

    Neat stuff, Laura – we use automated IP-to-geolocation conversion to allow any ClustrMaps user to get their blog or site visitor locations, but that has its own tradeoffs as you know: It’s very good for preserving privacy, in fact, as it only yields city-level info, and with a granularity that people are comfortable with. User-provided info is great, but, as you say, there are gaps to be filled! Drop us a line via the contact page on www.clustrmaps.com or DM on twitter – happy to discuss this in more depth!
