Archive for June 28th, 2010

Problematic gathering of Foursquare data

Posted by Laura on Monday, 28 June, 2010

One of the challenges of social media metrics is identifying what numbers matter, how to get those numbers, how to organize that data so you can retrieve it quickly and make it useful, and making sure your data set is complete.  The last of these can be problematic, as people can always create new communities, groups, hashtags, accounts, etc. If you don’t organize your data in a useful way and regularly update it, you can create such a huge mess that your data becomes almost unusable.

This is a problem I’ve run into with my AFL and NRL Foursquare data.  When I first started gathering this data in late April, I spent a day or two looking for all the venues.  A slightly problematic issue arose: not all of the venues had been created on Foursquare yet.  I never went back to check regularly whether these venues had since been created.  What this means is that my NRL data has several huge holes in it, either because a venue didn’t exist or because the venue I tracked was not the most popular of the ones created for that location.

Another problem was that the data was not collected in a way that I found entirely logical when I revisited it to try to create a table showing the average number of checkins at home and away matches for the NRL.  (I wanted to do the NRL first, before I tackled the AFL, because I’m focusing on the AFL and doing my trial and error on the NRL seemed wiser.)  I gathered the total checkins and unique visitors every Thursday through Monday night for all the venues I had identified as hosting NRL and AFL games.  In hindsight, this wasn’t the best way to go about it.  I should have organized everything by the games being played, as that would have made processing the data much, much easier and I wouldn’t have as much “garbage” data to wade through.  I’ve spent most of the morning correcting this mistake by identifying games and venue locations so I can more easily and efficiently track total checkins for AFL games and some NRL games.  (Later, I can try to do the same when the A-League, W-League and NBL start up.)
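To make the game-by-game idea concrete, here is a minimal sketch in Python of how per-venue snapshots could be tied back to games.  Everything in it is made up for illustration: the venue and team names, the numbers, and the function names are hypothetical, not my actual data or process.  It also assumes the total checkins figure for a venue is a running total, so the checkins for a game can be estimated as the first snapshot on or after the game minus the last snapshot before it.  That attributes every checkin between the two snapshots to the game, which is only an approximation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical snapshots: (venue, snapshot date, running total of checkins).
snapshots = [
    ("Venue A", date(2010, 6, 17), 100),
    ("Venue A", date(2010, 6, 21), 160),
    ("Venue B", date(2010, 6, 17), 40),
    ("Venue B", date(2010, 6, 21), 70),
    ("Venue A", date(2010, 6, 24), 165),
    ("Venue A", date(2010, 6, 28), 245),
]

# Hypothetical fixtures: (game date, venue, home team, away team).
games = [
    (date(2010, 6, 19), "Venue A", "Team X", "Team Y"),
    (date(2010, 6, 20), "Venue B", "Team Y", "Team Z"),
    (date(2010, 6, 26), "Venue A", "Team X", "Team Z"),
]


def checkins_for_game(game_date, venue, snapshots):
    """Estimate checkins for one game from the snapshots around it.

    Returns None when a snapshot is missing on either side of the game,
    so holes in the data stay visible instead of becoming zeros.
    """
    before = [(d, t) for v, d, t in snapshots if v == venue and d < game_date]
    after = [(d, t) for v, d, t in snapshots if v == venue and d >= game_date]
    if not before or not after:
        return None
    # Latest snapshot before the game, earliest snapshot on or after it.
    return min(after)[1] - max(before)[1]


def average_by_team(games, snapshots):
    """Average estimated checkins per team, split into home and away."""
    totals = defaultdict(list)
    for game_date, venue, home, away in games:
        count = checkins_for_game(game_date, venue, snapshots)
        if count is None:
            continue  # skip games without usable snapshots
        totals[(home, "home")].append(count)
        totals[(away, "away")].append(count)
    return {key: sum(vals) / len(vals) for key, vals in totals.items()}


averages = average_by_team(games, snapshots)
for (team, side), avg in sorted(averages.items()):
    print(f"{team} {side}: {avg:.1f}")
```

Keying everything on the game rather than on the collection day means a venue that hosts no game on a given weekend never produces a row at all, which is where most of the “garbage” comes from.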

Looking through the existing Foursquare data, though, I really don’t know if I will want to process it.  I’d almost rather work only with the last quarter of the season, where I know I have a complete data set, than try to piece together the data dating back to late April.  I probably won’t do that, but I’ll have to figure out what to do.  It isn’t pretty, and I’m really kicking myself for what could turn out to be a lot of time wasted each day gathering data that I can’t use.

My Gowalla data faces similar issues.  The big difference there is that I’ve always known I didn’t have a complete data set, and it was through processing World Cup data on Gowalla that I realized the issues with my Foursquare data collection.  I’d love to use Gowalla for AFL/NRL analysis but it isn’t going to happen.  I mean, it really isn’t going to happen: Foursquare is the bigger priority because it has greater penetration in Australia, and I never had a complete venue list for Gowalla.
