Archive for August 19th, 2010

Twitter #hashtags and the Australian election

Posted by Laura on Thursday, 19 August, 2010

Can you learn about an election from its Twitter #hashtags? I’m curious to see whether #hashtag use on Twitter matches the major issues that newspapers and other news media covered. (I’m also interested in seeing which #hashtags were ReTweeted, but that will likely have to wait for another post.) There are a couple of challenges when doing something like this, and be wary of anyone who does not spell out their methodology: if you don’t know their methods, you can’t fairly evaluate their results and subsequent conclusions. Social media research is very much creative research that takes place in the moment. While you may never get the exact same results when you repeat a person’s methodology, deviations should be explainable. Anyway, the challenges and assumptions with doing a #hashtag analysis of the Australian elections include:

  • Incomplete tweet set for keywords: I only get what is available from Searchtastic.
  • Incomplete tweet set based on keyword limitations: Even supposing I could get all the tweets related to a keyword, I can’t find every reference to the Australian elections, as there are too many possible words that could reference the elections and all the candidates running nationwide.
  • Incomplete tweet set because of time: Some keywords were searched for earlier and some later. Not all keywords were looked at over the same period.
  • Irrelevant tweets: Keywords like #greens may refer to the Green Party in the United States. Liberals may refer to American or British liberals. Unless every tweet is examined, irrelevant tweets will remain in the data set.

To offset some of these problems, a large data set was acquired. (Because a lot of people are tweeting about the elections. I see way more election tweets than footy tweets.) These tweets were acquired using Searchtastic and the following keyword set: #Abbott, #alp, #alp vote, #arbib, #aus2010, #AusLabor, #ausvotes, #ausvotes abbott, #ausvotes gillard, #boatphone, #GILLARDTINED, #greens, #laborfail, #masterchef abbott, #masterchef debate, #masterchef election, #masterchef gillard, #myliberal, #nocleanfeed, #NPC, #ozvote, #qanda, #spill, #tcot, #workchoices, @AustralianLabor, @LiberalAus, @TonyAbbottMHR, abbott math, Abbott-Gillard, asylum boat, australia vote, canberra election, debate abbott, debate gillard, debate winner, Following: @JuliaGillard, Following: @SenatorBobBrown, Following: @TonyAbbottMHR, Gillard, greens brown, Greens Canberra, greens election, Gruen Report, Gruen transfer, hockey liberals, immigration australia, Julia Gillard, JuliaGillard, Krudd Labor, KRudd Liberals, Labor darwin, labor leaks, Labor Liberals, Labor Tasmania, Liberals Tasmania, libs abbott, libs darwin, Libs Howard, marginal electorate, marginal seats, NSWLabor, perth vote, preferences vote, Rudd Liberals, Sex Party, stimulus liberals, sydney elections, Tony Abbott, tony boat phone, tonyabbottmhr, Truss LNP, Truss nationals, Warren Truss, Work Choices Australia, WorkChoices.

These keywords represent the major parties and candidates, some of the major issues, #hashtags that I saw on my Twitter feed, and different geographic areas around the country. Searches were run between July 19 and August 19. A total of 57,977 tweets were collected. The elections were called on July 17, but by July 1 everyone knew the elections had to be called and conversation about them had started. To make sure that the collected tweets pertain to the elections, all tweets made before July 1 were removed from the data set. (Methodology: Sort the tweets by date in Excel and remove those that fall outside July 1 to August 19.) This takes the total down to 21,071 tweets. That’s still a fairly large collection to work from. The next step is to remove duplicate tweets from the data set. (In Excel, Data -> Filter -> Advanced Filter -> Unique records only.) This brings the total down to 18,462 tweets. That’s still a lot of tweets.
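For anyone who would rather script the date filter and the duplicate removal than do them in Excel, here is a minimal Python sketch. It is not the method used for this post. The (text, date) layout, the function name and the sample tweets are all made up for illustration, and it treats two tweets as duplicates when their text is identical, which may not match exactly what Excel's unique-records filter compares.

```python
from datetime import date


def filter_and_dedupe(tweets, start, end):
    """Keep tweets dated within [start, end], then drop repeated texts.

    `tweets` is an iterable of (text, posted_date) pairs. That layout is an
    assumption made for this sketch, not the format of the real export.
    """
    seen = set()
    kept = []
    for text, posted in tweets:
        if not (start <= posted <= end):
            continue  # outside the election window
        if text in seen:
            continue  # exact duplicate of a tweet already kept
        seen.add(text)
        kept.append((text, posted))
    return kept


# Invented sample rows, only to show the behaviour.
sample = [
    ("Who won the debate? #ausvotes", date(2010, 7, 25)),
    ("Who won the debate? #ausvotes", date(2010, 7, 26)),
    ("Old news #spill", date(2010, 6, 24)),
    ("Counting marginal seats #ausvotes", date(2010, 8, 18)),
]
cleaned = filter_and_dedupe(sample, date(2010, 7, 1), date(2010, 8, 19))
print(len(cleaned))
```

The first occurrence of a duplicated tweet is the one that is kept, so the order of the input matters if you care about which date survives.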

The next step is to extract the #hashtags. To do this, I copied and pasted all the tweets into Notepad. I ran a find and replace for [space]# and replaced it with [tab]#. I pasted the result back into Excel and removed all cells that did not start with a #. After this was done, the data set was copied and pasted back into Notepad. Another find and replace was done; this time, [space] was replaced with [tab]. This was pasted back into Excel and all cells that did not start with # were deleted. When this was done, 33,857 total hashtags were found. Symbols like , ! . – ? were removed from those #hashtags. This was done to make sure that #labor. and #labor were treated the same for counting purposes.
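The Notepad and Excel round trip above amounts to splitting each tweet on whitespace, keeping the pieces that start with #, and stripping the listed symbols. A rough Python equivalent, offered as a sketch rather than a record of what was actually run, might look like this. The function name and the sample tweet are invented for illustration.

```python
# The symbols the post says were removed from hashtags: , ! . – ?
STRIP_CHARS = ",!.–?"
_REMOVE = str.maketrans("", "", STRIP_CHARS)


def extract_hashtags(text):
    """Return the whitespace-separated tokens that start with '#'.

    The listed symbols are removed so that '#labor.' and '#labor'
    end up as the same string.
    """
    tags = []
    for token in text.split():
        if token.startswith("#"):
            tag = token.translate(_REMOVE)
            if len(tag) > 1:  # skip a bare '#'
                tags.append(tag)
    return tags


print(extract_hashtags("Vote #labor. or #labor? It is all #ausvotes!"))
```

Like the find and replace approach, this only sees hashtags that begin a whitespace-separated token, so something like word#tag is ignored.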

A list of unique hashtags was then obtained, of which there were 2,678. The following table includes all #hashtags that appeared 25 or more times on the list:

Hashtag Count
#tcot 5157
#ausvotes 2449
#tlot 1816
#p2 1761
#teaparty 1704
#GOP 1172
#ocra 950
#Libertarian 920
#News 867
#ucot 705
#politics 687
#Israel 536
#sgp 536
#iamthemob 367
#jcot 308
#roft 304
#qanda 303
#cdnpoli 293
#Twisters 261
#USA 209
#Obama 194
#MyLiberal 187
#debate 183
#flotilla 171
#Gaza 155
#masterchef 154
#energy 149
#hhrs 147
#green 134
#rpn 121
#912 111
#aus2010 111
#NPC 108
#rootyq 101
#AUSlabor 95
#oilspill 94
#topprog 94
#nocleanfeed 92
#fb 88
#rightriot 80
#laborfail 79
#spill 76
#US 76
#jews 75
#ALP 74
#terrorism 74
#Greens 72
#cspj 70
#tpp 70
#BP 69
#antisemitism 68
#Gillard 67
#p21 67
#openinternet 66
#FF 65
#FollowFriday 65
#oil 65
#Muslim 64
#ireland 63
#Abbott 62
#UK 61
#Australia 58
#Iran 58
#Palin 57
#Blog 55
#Europe 54
#quote 54
#dnc 52
#fail 51
#iranelection 51
#jlot 51
#hcr 47
#MentalHealth 47
#justsayin 46
#rs 46
#eco 44
#boatphone 43
#glennbeck 43
#islam 42
#zionism 42
#NBN 41
#palestine 41
#jobs 40
#environment 39
#Autism 36
#Hamas 36
#Health 36
#military 36
#tiot 36
#dem 35
#AZ 34
#Lebanon 34
#vote2010 34
#dems 33
#ausdebate 32
#bonjovi 32
#climate 32
#judaism 32
#lateline 32
#MoFo 32
#videos 32
#foxnews 31
#theview 31
#730report 30
#hasbara 30
#nz 30
#travel 30
#tweetcongress 30
#YWC 30
#conservative 29
#Gulf 29
#patriottweets 29
#ampat 28
#dublin 28
#property 28
#beck 27
#Free 27
#IDF 27
#ronpaul 27
#acon 26
#jewish 26
#twibbon 26
#bds 25
#CNN 25
#Obamacare 25
#rush 25
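
A table like the one above can be produced by counting each extracted hashtag and keeping those at or above a cut-off. The sketch below uses Python's collections.Counter. The function name and the tiny sample list are made up, and it counts exact strings, so #ALP and #alp would be tallied separately (the post does not say how case was handled).

```python
from collections import Counter


def frequent_hashtags(hashtags, minimum):
    """Return (hashtag, count) pairs with count >= minimum, largest first."""
    counts = Counter(hashtags)
    return [(tag, n) for tag, n in counts.most_common() if n >= minimum]


# Invented sample, only to show the shape of the output.
sample_tags = ["#ausvotes", "#tcot", "#ausvotes", "#qanda", "#ausvotes", "#tcot"]
for tag, n in frequent_hashtags(sample_tags, minimum=2):
    print(tag, n)
```

To merge variants that differ only by case, lowercase each hashtag before passing the list in.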

Looking at this list, there are some #hashtags that are likely not Australian or not uniquely Australian. This includes #tcot, which stands for Top Conservatives On Twitter. The term was amongst those searched for because it appeared in a few tweets that also included the #ausvotes #hashtag. A Google search for #tcot #ausvotes only brings up 8,120 results, which further supports the idea that this isn’t really an Australian election term. #tlot was not deliberately searched for. If you put #tlot #ausvotes into a Google search, you get 244,000 results, which suggests heavy Australian usage. You could probably remove #teaparty, #GOP, #Libertarian, #Israel, #Twisters, #USA, #Obama, #flotilla, #Gaza, #912, #oilspill, #US, #jews, #BP, #antisemitism, #oil, #Muslim, #ireland, #UK, #Iran, #Palin, #Europe, #dnc, #cdnpoli, and #iranelection. They are unlikely to have anything to do with the Australian elections.
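If you wanted to apply that removal to the counts, one simple option is an exclusion list. The sketch below hard-codes the hashtags named above (lowercased) and filters a list of (hashtag, count) pairs. The function and variable names are made up, and the list is a judgment call rather than a verified classification.

```python
# Hashtags named in the post as unlikely to relate to the Australian
# elections, lowercased so that matching ignores case.
OFF_TOPIC = {
    "#teaparty", "#gop", "#libertarian", "#israel", "#twisters", "#usa",
    "#obama", "#flotilla", "#gaza", "#912", "#oilspill", "#us", "#jews",
    "#bp", "#antisemitism", "#oil", "#muslim", "#ireland", "#uk", "#iran",
    "#palin", "#europe", "#dnc", "#cdnpoli", "#iranelection",
}


def drop_off_topic(counts, excluded=OFF_TOPIC):
    """Remove (hashtag, count) pairs whose hashtag is in the exclusion set."""
    return [(tag, n) for tag, n in counts if tag.lower() not in excluded]


# A few rows taken from the table above.
rows = [("#tcot", 5157), ("#ausvotes", 2449), ("#teaparty", 1704),
        ("#GOP", 1172), ("#qanda", 303)]
print(drop_off_topic(rows))
```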

If that’s agreed upon, then it looks like the top issue based on #hashtags is … the internet and its openness? It doesn’t look like there was any large scale usage of #hashtags around issues. Instead, it appears that #hashtags were used to label tweets that discussed the election, specific candidates and specific parties. Issue-based discussion may have been secondary on Twitter.

And if that’s true, and going further with that idea, it could validate the messaging used by Labor and the Liberals to largely mount attacks on each other. People on Twitter are heavily engaged in discussing politics but not the issues. It may also justify the work of GetUp!, which strives to bring attention to specific issues in Australia.

If you want access to the Excel file with all the tweets, please comment or send me an e-mail. The file is about 24 MB, so I didn’t upload it.


What domains in #afl related tweets get the most RT @ ReTweets?

Posted by Laura on Thursday, 19 August, 2010

This post is a follow up to What #afl related hashtags get the most RT @ ReTweets? and Want to be Retweeted? Add URLs to Your Tweets!. This post looks at what domains get the most retweets in tweets related to the AFL. It borrows the methodology (and assumptions) from the previous post with two major differences. First, the tweet collection had 10,412 total tweets instead of 8,523 at the start. (More recent tweets and more tweets.) Second, a new tool was needed to lengthen short urls. The tool that was used was Long URL please, a Firefox extension. Short links were copied and pasted from Excel to a draft document, the urls were lengthened, and then they were pasted back. (When URLs were not lengthened, they were manually visited to get the real url.)
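
Lengthening short urls can also be scripted by following each link's redirects. The sketch below uses Python's standard urllib and is an alternative to the Firefox extension, not a description of how it works. The function name and the opener parameter are made up for illustration. It sends a HEAD request, which some shorteners may not answer, and in that case the short url is returned unchanged.

```python
import urllib.request


def expand_url(short_url, opener=urllib.request.urlopen, timeout=10):
    """Follow redirects and return the final URL.

    Returns the input unchanged if the request fails. `opener` can be
    swapped for a stand-in so the function can be tried without a network.
    """
    try:
        request = urllib.request.Request(short_url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.geturl()
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers
        # strings that are not usable URLs.
        return short_url
```

Calling expand_url on each short link in a list and writing the results back next to the originals replaces the copy, lengthen and paste loop.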

That out of the way, there were 10,412 total tweets that could be examined. The list of keywords used can be found in my raw data. Duplicate Tweets were removed, bringing the total to 6,313 unique tweets. Tweets that did not include a url were removed. This brought the Tweet total down to 2,892 tweets. (About 46% of AFL related Tweets include a URL. This compares to 52% of AFL related Tweets with #hashtags.) There were 2,941 urls in these Tweets, of which 2,582 were unique. Of these, there were 623 unique domains. For all AFL related Tweets with urls, the following domains were the most popular:

Count (domain names missing): 248, 140, 115, 112, 101, 67, 67, 65, 64, 60, 51, 49, 47, 41, 37, 36, 34, 32, 27, 26, 26, 25, 24, 23, 21, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 15, 14, 14, 14, 13, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 10, 10, 10, 10, 10
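
Pulling the domains out of the tweets and counting them is easy to script. The following Python sketch is one way to do it under a few assumptions: urls start with http:// or https://, a leading www. is ignored, and trailing punctuation is trimmed. The names and the sample tweets are invented, and it assumes short urls have already been lengthened.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")


def domains_in(text):
    """Return the domain of every http(s) URL found in one tweet."""
    domains = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,!?)")  # trim punctuation stuck to the link
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    return domains


def count_domains(tweets):
    """Tally domains across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(domains_in(text))
    return counts


# Invented sample tweets using reserved example domains.
sample_tweets = [
    "Great win http://www.example.com/news/1 #afl",
    "Photo http://pics.example.org/abc",
    "More http://example.com/news/2 and http://example.com/x.",
    "No link here",
]
print(count_domains(sample_tweets).most_common())
```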

Of the Tweets with urls, only 342 were ReTweets. That means about 5% of the 6,313 unique AFL related Tweets are ReTweets that contain a URL. (Compare that to ReTweets containing #hashtags: 8%.) There were 150 domains mentioned. Amongst those, the following domains were the most popular:

Count (domain names missing): 23, 17, 15, 13, 10, 10, 9, 8, 8, 7, 7, 7, 6, 6, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
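
The ReTweet share can be checked with a few lines of Python. This sketch assumes a ReTweet is any tweet containing the old-style "RT @" marker, which is a guess at how they were identified since that detail comes from the earlier post. The function names and sample tweets are invented. The last line repeats the arithmetic on the figures quoted here: 342 out of 6,313 unique tweets.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def is_retweet(text):
    # Assumption: a ReTweet is any tweet containing the "RT @" marker.
    return "RT @" in text


def has_url(text):
    return URL_PATTERN.search(text) is not None


def retweets_with_urls(tweets):
    """Return (count, share) of tweets that are ReTweets and contain a URL."""
    tweets = list(tweets)
    matches = [t for t in tweets if is_retweet(t) and has_url(t)]
    share = len(matches) / len(tweets) if tweets else 0.0
    return len(matches), share


# Invented sample tweets.
sample = [
    "RT @someclub: Match report http://example.com/report",
    "RT @someclub: What a game!",
    "Photos from the ground http://example.org/pics",
    "Go Bombers #afl",
]
count, share = retweets_with_urls(sample)
print(count, f"{share:.0%}")

# Figures quoted in the post: 342 ReTweets with urls, 6,313 unique tweets.
print(f"{342 / 6313:.1%}")
```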

The domains that are ReTweeted are interesting. It is really easy to link spam on Twitter and some people and organizations do. If you look at some accounts, they post via TwitterFeed. Beyond that, you can tell when looking at a URL list that automated link posting is happening because the account may be using a URL shortener that isn’t as popular and the URLs tend to follow a sequence. The AFL and various news services put out URLs and they just aren’t getting ReTweeted at the same rate that they are getting posted. What is getting ReTweeted? People’s pictures and club related links.

Looking at both the domain list and the Domain ReTweet list, one thing that stands out for me is the lack of Wikipedia links. Wikipedia ranks really high on Google and other search engines. The AFL related content is highly ranked. The AFL articles are often really, really, really good and they are updated frequently. They can be really useful when looking for historical information. Thus, I’m surprised that there were only 3 references to Wikipedia. What gives? Why isn’t Wikipedia being cited more?

Moving on, 488 accounts were mentioned in the 343 ReTweets with URLs. The following table gives an idea as to who the most popular mentions were:

Mentioned Count
@essendon_fc 22
@sydneyswans 20
@hawthornfc 12
@stkildafc 12
@AFL 11
@superfooty 8
@Footyfree 7
@mashable 6
@3AW693 5
@adelaide_fc 5
@blackpolitics 5
@Collingwood_FC 5
@StevenBaker10 5
@GoldCoastFC 4
@heraldsun 4
@PAFCNews 4
@PatDollard 4
@watoday 4
@antonKart 3
@Carlton_FC 3
@DemonsHQ 3
@iamdiddy 3
@mydogateart 3
@myfabolouslife 3
@northkangaroos 3
@redcafesd 3
@reggie_bush 3
@ruanji 3
@Sportsnewsfirst 3
@TeamHippo 3
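
A mention tally like the one above could be scripted as follows. This is a sketch with made-up names and sample tweets. It counts exact strings, so @AFL and @afl would be counted separately, and it skips an @ that follows a letter or digit so that e-mail addresses are not picked up as mentions.

```python
import re
from collections import Counter

# '@' followed by letters, digits or underscores, not preceded by a word
# character (which would indicate an e-mail address).
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def count_mentions(tweets):
    """Tally @mentions across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(MENTION_PATTERN.findall(text))
    return counts


# Invented sample tweets.
sample_retweets = [
    "RT @essendon_fc: Match report http://example.com/1",
    "RT @AFL: thanks @essendon_fc http://example.com/2",
    "Mail me at someone@example.com",
]
print(count_mentions(sample_retweets).most_common())
```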

That Essendon and Sydney were amongst the most mentioned doesn’t come as a surprise: Their domains were highly ranked. What is a bit surprising is that the Blues aren’t highly ranked; they ranked highly for mentions with #hashtags. St. Kilda ranked highly on both lists. I half want to attribute this to the idea that their audience may be more Twitter savvy than other clubs’ fans. I’d have thought players might have been higher on this list but they really don’t appear to be a factor. This may suggest that players are not Tweeting urls or that such content isn’t what their fans are interested in.

The lesson of these two posts is that if you want your AFL related tweets reposted, be partisan in your support and link to official club content that the news services haven’t written about yet.
