There has been a lot of discussion recently about the effect fake Twitter accounts have on brands trying to keep track of social media engagement. A recent tweet spam attack offers an instructive example.
On the morning of October 1, delegates attending the Strata Conference in London started to notice that a considerable number of spam tweets were being sent using the #strataconf hashtag. Using a tool developed by Bloom Agency, with data from DataSift, we carried out an analysis that sheds light on the spam attack directed at the conference.
The following diagram shows a snapshot of the Twitter conversation after the first few tweets containing the #strataconf hashtag had been received. Each red or blue line represents a connection between two Twitter accounts and shows how information flowed as a result of a tweet being sent. By 11 a.m., distinct communities talking to each other about the conference had started to emerge, and these are clearly visible in the diagram.
The diagram below shows a further visualisation, this time after 30 minutes of listening to the conversation. In an organic conversation, developing of its own accord, you would expect to see lots of random connections and a number of communities spread across the network.
If we zoom into the network to seek out the spammers the tool has identified, we start to see some different patterns, as shown in the diagram below.
The spammers are not involved in the conversation; they exist on its fringe. They can’t get a message directly to the people tweeting about #strataconf, because those accounts don’t follow the spammers. But by putting #strataconf at the beginning of their tweets, the spammers clearly intend for anyone searching for tweets about the conference to pick up their content.
If we pull out just the accounts we identify as spammers, a far-from-random pattern emerges. These patterns are well known to the researchers at Bloom Agency and are used to train the tool to spot potential spammers. The spammers’ network is too highly organised and shows too much structure. There isn’t enough randomness in it: it has clearly been generated for a purpose, most likely by a computer.
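The article does not describe how Bloom Agency’s tool measures “too much structure”. One simple proxy, shown here purely as an assumption, is to compare a network’s average clustering coefficient with its edge density: in a random graph with the same number of accounts and connections, the expected clustering coefficient is roughly equal to the density, so a ratio well above 1 signals more structure than chance would produce. The function names and the example graphs below are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets built from (account, account) pairs."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count how many pairs of this node's neighbours are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def edge_density(adj):
    """Fraction of all possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def structure_ratio(adj):
    """Clustering relative to a random graph of the same density (about 1.0 if random)."""
    p = edge_density(adj)
    return average_clustering(adj) / p if p else 0.0


# Hypothetical example: a ring lattice, each account linked to its 2 nearest neighbours per side.
ring = build_graph([(i, (i + d) % 20) for i in range(20) for d in (1, 2)])
path = build_graph([(0, 1), (1, 2), (2, 3)])
print(f"ring lattice ratio: {structure_ratio(ring):.2f}")
print(f"path ratio: {structure_ratio(path):.2f}")
```

The ring lattice scores about 2.4 on this measure, while a random graph of the same size would hover around 1. Real spam networks would need a properly calibrated threshold, which this sketch does not attempt to provide.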
By identifying spamming accounts through how much structure they bring to the network, the tool can produce a list of true influencers or true followers, rather than one padded with fake accounts.
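As a hedged illustration of the filtering step, the sketch below ranks accounts by how many distinct accounts interact with them (for example by retweeting or mentioning them), after dropping every interaction that involves a flagged spammer. The interaction format, the function name, the account names and the flagged set are all hypothetical; in practice the flagged set would come from a detector like the one the article describes.

```python
from collections import Counter


def top_influencers(interactions, flagged, n=3):
    """Rank accounts by the number of distinct non-flagged accounts interacting with them.

    interactions: iterable of (from_account, to_account) pairs, e.g. retweets or mentions.
    flagged: set of accounts already identified as likely spammers (hypothetical input).
    """
    counts = Counter()
    for src, dst in set(interactions):  # de-duplicate repeated interactions
        if src in flagged or dst in flagged or src == dst:
            continue  # drop anything touching a suspected spammer, and self-mentions
        counts[dst] += 1
    return [account for account, _ in counts.most_common(n)]


# Hypothetical accounts: three flagged spammers all interact with "dave".
interactions = [
    ("alice", "bob"), ("carol", "bob"), ("alice", "bob"),
    ("alice", "erin"),
    ("spam1", "dave"), ("spam2", "dave"), ("spam3", "dave"),
]
flagged = {"spam1", "spam2", "spam3"}
print(top_influencers(interactions, flagged=set()))  # unfiltered ranking
print(top_influencers(interactions, flagged))        # spam-filtered ranking
```

Without filtering, “dave” tops the list purely because of the fake accounts; with the flagged accounts removed, he drops out entirely.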
For example, at 11:15 on the Monday morning during the conference, a tweet from @MarieBoyd14 was flagged as suspicious. It said:
“#strataconf Can not believe I ran across this kind of http://t.co/79fGWudr”
If you search for @MarieBoyd14 right now, you’ll find the account has been suspended. The account was seemingly suspended within minutes of posting the tweet.
The same shortened URL was posted six times in quick succession, between 11:15:43 and 11:17, before the account was suspended.
The first tweet picked up here, at 11:15:43, was shown as the user’s 78th tweet: the account had not been active for very long. By the time the sixth tweet featuring this shortened URL was observed, the tweet count was up to 93. Even the most prolific of conference tweeters couldn’t manage 15 tweets in less than two minutes, unless their finger got stuck on the “tweet” button.
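The arithmetic in this paragraph can be expressed as a small rate check: take two sightings of the same account, each with its running tweet count (the Twitter API reports this as the user’s statuses_count), and divide the difference by the elapsed time. This sketch assumes the sixth tweet arrived at exactly 11:17:00, since the article gives only the minute, and the five-tweets-per-minute threshold is an arbitrary, hypothetical value.

```python
from datetime import datetime

MAX_HUMAN_RATE = 5  # tweets per minute; hypothetical threshold, not from the article


def tweets_per_minute(first_seen, first_count, last_seen, last_count):
    """Average posting rate between two sightings of the same account."""
    minutes = (last_seen - first_seen).total_seconds() / 60
    if minutes <= 0:
        raise ValueError("last sighting must come after the first")
    return (last_count - first_count) / minutes


# @MarieBoyd14 sightings from the article; 11:17:00 is an assumed exact time.
first = datetime.strptime("11:15:43", "%H:%M:%S")
last = datetime.strptime("11:17:00", "%H:%M:%S")
rate = tweets_per_minute(first, 78, last, 93)
print(f"{rate:.1f} tweets per minute, suspicious: {rate > MAX_HUMAN_RATE}")
```

Under those assumptions the account averaged roughly 11.7 tweets per minute between the two sightings. Because statuses_count includes every tweet the account sent, not just those carrying #strataconf, the check captures the account’s overall burst of activity.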
Another tweet that began with the hashtag was received five seconds after @MarieBoyd14’s, at 11:15:48. The tweet was from @RosalindaKline8. Again, if you search for this account, you’ll find it’s been suspended. The tweet said:
“#strataconf I can’t believe this… Is the real deal? http://t.co/GKc4rnr5”
Although the format of the t.co link is different, this link directs the user to the same domain: the
@RosalindaKline8 tweeted this link, with different text, seven times between 11:15 and 11:19. The account fits the same profile as @MarieBoyd14: it is relatively new, posts up to 100 tweets in quick succession, and is then suspended.
Two clear patterns emerged. First, the accounts used to generate the messages had female names followed by a number. Second, every message started with the conference hashtag.
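The two patterns translate naturally into a simple rule-based flag. The sketch below treats a screen name as suspicious when it looks like a capitalised first name and surname followed by digits, and a tweet as suspicious when its first word is the hashtag. Checking that the first name is female would need a name list, which is omitted here; the regular expression and function name are assumptions, and a rule this crude will also match some genuine users, so it suits flagging accounts for review better than deciding on its own.

```python
import re

# Hypothetical pattern: capitalised first name + capitalised surname + trailing digits.
NAME_PLUS_DIGITS = re.compile(r"[A-Z][a-z]+[A-Z][a-z]+\d+")


def looks_like_campaign_tweet(screen_name: str, text: str, hashtag: str = "#strataconf") -> bool:
    """Flag a tweet only when both patterns seen in the attack are present."""
    name_fits = NAME_PLUS_DIGITS.fullmatch(screen_name) is not None
    tokens = text.split()
    # Compare the first word with the hashtag, ignoring case.
    opens_with_tag = bool(tokens) and tokens[0].lower() == hashtag.lower()
    return name_fits and opens_with_tag


samples = [
    ("MarieBoyd14", "#strataconf Can not believe I ran across this kind of http://t.co/79fGWudr"),
    ("RosalindaKline8", "#strataconf I can’t believe this… Is the real deal? http://t.co/GKc4rnr5"),
    ("JaneDoe7", "Great keynote this morning at #strataconf"),  # hypothetical genuine tweet
]
for name, text in samples:
    print(name, looks_like_campaign_tweet(name, text))
```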
In a 30-minute period, 424 spam tweets were recorded from 140 different spammer accounts, a rate of about 14 tweets per minute. On deeper investigation, all the spammer accounts turned out to have IDs starting with 85613, suggesting they had all been created around the same time. The accounts all seem to have been suspended within a few minutes of their last tweet being sent.
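Grouping accounts by the leading digits of their numeric IDs is one way to surface a batch like this one. The sketch assumes, as the article suggests, that IDs were issued roughly in order of account creation; the IDs in the example are hypothetical, since the real ones are not given. Prefixes are only comparable between IDs with the same number of digits, so the function groups by ID length as well.

```python
from collections import Counter


def dominant_id_prefix(user_ids, prefix_len=5):
    """Return the most common leading-digit prefix, its ID length, and its share of accounts."""
    if not user_ids:
        raise ValueError("no account IDs supplied")
    # Group by ID length too, so IDs of different lengths never share a prefix by accident.
    counts = Counter((len(str(uid)), str(uid)[:prefix_len]) for uid in user_ids)
    (digits, prefix), hits = counts.most_common(1)[0]
    return prefix, digits, hits / len(user_ids)


# Hypothetical IDs for illustration only; the article does not publish the real ones.
suspects = [856130011, 856131234, 856139999, 123456789]
print(dominant_id_prefix(suspects))  # ('85613', 9, 0.75)
```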
In the same 30-minute period, 750 tweets in total were recorded from 306 different accounts, a rate of 25 tweets per minute. More than half of those tweets came from spammers: discounting them, the rate would have been around 11 tweets per minute.
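The headline figures can be reproduced with a little arithmetic, which also shows how far the spam skews each metric. The function below simply restates the article’s numbers; its name and output keys are illustrative.

```python
def conversation_stats(total_tweets, spam_tweets, total_accounts, spam_accounts, minutes):
    """Compare headline metrics with and without the spam accounts."""
    return {
        "rate_all": total_tweets / minutes,
        "rate_spam": spam_tweets / minutes,
        "rate_genuine": (total_tweets - spam_tweets) / minutes,
        "spam_tweet_share": spam_tweets / total_tweets,
        "spam_account_share": spam_accounts / total_accounts,
    }


# Figures from the article: 750 tweets from 306 accounts, of which 424 tweets from 140 spammers.
stats = conversation_stats(750, 424, 306, 140, 30)
for name, value in stats.items():
    print(f"{name}: {value:.3g}")
```

With the article’s figures, the genuine rate works out at 326 tweets over 30 minutes, or roughly 10.9 tweets per minute, against a headline rate of 25; spammers sent about 57% of the tweets from about 46% of the accounts.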
Another link propagated by these accounts pointed to http://yourson999.tk/rivers.php. On investigation, we found that this site was returning an HTTP 203 (Non-Authoritative Information) status rather than the 200 (OK) or 301 (Moved Permanently) response we expected, which suggested something unusual was going on. Further inspection showed that the URL was directing traffic to different endpoints, seemingly at random. Each time, the visitor was sent to a third-party ecommerce site with an affiliate referrer attached. This was likely an attempt to drive traffic to ecommerce websites while collecting affiliate referral fees for the organisation or individual behind the attack.
On the surface, a tweet spam attack may seem like a minor nuisance, but it has an important repercussion. The spammers had a big impact on the basic metrics used to measure the spread of the #strataconf hashtag. Without spam filtering built into a social media listening tool, the tool risks reporting inflated figures to the organisation using it. If brands use these figures to make decisions about future campaigns, the spammers can distort the numbers enough that the wrong decisions get made.
Peter Laflin is Head of Data Insight at Bloom Agency, an integrated marketing agency based in Leeds, UK. Peter is interested in using big data to predict how consumers behave and how predictive modelling can be used to gain a commercial edge.