Snapchat - Into the Breach
NOTE: This blog post has been updated with more information on whether non-US accounts could be affected.
For those of us who track mobile security developments, the recent claims of hacks against Snapchat, the popular picture-sending app, have been of interest. However, those claims have now become potentially much more damaging. An unwelcome New Year's guest is snapchatdb.info, a website that has published the usernames and partial phone numbers of about 4.6 million North American Snapchat users.
While the method used and the potential motive are covered in depth elsewhere, in the mobile security industry we are concerned with the specific negative impact this disclosure could have now. The answer is considerable, especially if the full dataset is made available.
First, the full numbers themselves, if made available, could be used as targets for mobile spam attacks. As we have discussed before, lists of active mobile phone numbers are worth money within the spam industry and are sold between mobile spammers. In addition, any personal information attached to a number allows spammers to tailor their messages to make them more meaningful to the target, turning a blunt ‘standard’ spam message into a phishing attack and so increasing the spam’s conversion rate. In this case there is the obvious tactic of using the Snapchat username, but there are also more indirect sources of information available just from appearing in this database, such as a likely demographic profile, that could further refine any future attack.
Second, even the partial dataset already published has the potential to do harm. Although the authors obscured the last 2 digits, the remaining digits can be used to ‘guide’ spammers to active mobile numbers, which are not always easy to find. From our analysis of the dataset, the vast majority of the mobile numbers (over 92%) come from just 6 states, namely California, Texas, Illinois, Colorado, Florida and Massachusetts, with just a smattering from other states. The lopsided geographic spread of the published dataset is of some interest: the scarcity of Snapchat numbers from high-population states such as Texas means that I judge it very unlikely that this truly is 'the vast majority of the Snapchat users', as the snapchatdb.info authors claim (previous estimates of the userbase made this unlikely anyway). Its practical effect, however, is to give spammers a guide to potential targets within those areas. In this case it would have been far better to obscure the last 4 digits of the numbers, if the authors truly wanted to minimise spam and abuse.
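To put the 2-digit versus 4-digit point in numbers, here is a minimal Python sketch that expands a masked number into every full number it could be. The masking format (one 'X' per hidden digit) and the example number are illustrative assumptions, not the actual SnapchatDB format.

```python
from itertools import product


def expand_masked(masked: str) -> list[str]:
    """Return every full number consistent with a masked one, e.g. '41555512XX'.

    Assumption: each 'X' stands for exactly one hidden digit.
    """
    hidden = masked.count("X")
    results = []
    for digits in product("0123456789", repeat=hidden):
        it = iter(digits)
        # Replace each 'X', left to right, with the next digit of this combination.
        results.append("".join(next(it) if ch == "X" else ch for ch in masked))
    return results


two_hidden = expand_masked("41555512XX")   # last 2 digits masked
four_hidden = expand_masked("415555XXXX")  # last 4 digits masked
print(len(two_hidden), len(four_hidden))
```

With 2 digits hidden, each leaked entry narrows a known-active number down to 100 candidates; with 4 hidden, the candidate set is the entire 10,000-number exchange, so the entry says far less about any individual subscriber.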
For a county-by-county view of the affected accounts, click here. (Note that the above maps are derived from North American phone numbering allocations, not from the location field in the SnapchatDB dataset, which is of uncertain origin.)
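The state breakdown comes from mapping numbers onto numbering allocations rather than trusting the dataset's location field. A toy version of that idea is sketched below, assuming a small hand-picked area-code table (real area codes, but nowhere near a complete allocation list) and a made-up sample of masked numbers; the maps themselves work at the finer exchange level.

```python
from collections import Counter

# Hypothetical, hand-picked lookup from area code (NPA) to state. These are real
# area codes, but a real analysis would use the full NANP allocation tables.
NPA_TO_STATE = {
    "415": "California",     # San Francisco
    "213": "California",     # Los Angeles
    "303": "Colorado",       # Denver
    "312": "Illinois",       # Chicago
    "617": "Massachusetts",  # Boston
    "305": "Florida",        # Miami
    "713": "Texas",          # Houston
}


def state_shares(masked_numbers: list[str]) -> dict[str, float]:
    """Return each state's share of a list of masked 10-digit numbers."""
    # Only the first three digits (the NPA) are needed, so masking the
    # last digits does nothing to hide where a number is allocated.
    counts = Counter(NPA_TO_STATE.get(n[:3], "Other") for n in masked_numbers)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


# Made-up sample for illustration only; this is not SnapchatDB data.
sample = ["41555512XX", "21355534XX", "30355501XX", "31255577XX", "20255599XX"]
print(state_shares(sample))
```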
Finally, as with any other data breach, the other obvious implication is that hackers may reuse the usernames and numbers for other purposes, such as attempting to log in to other types of accounts. It remains to be seen what could emerge from this.
We greatly hope that the database is not made public or shared with mobile spammers (the snapchatdb.info website referred to making the full database available to those who request it), but while we hope for the best, we plan for the worst. Since the database went public we haven’t seen an anomalous increase in spam in these states, or any Snapchat-themed spam, but it is still early, and this is something we will monitor with our North American mobile operator customers.
As always, protect your data: never post your phone number on a forum or group where you think it will be made public, and report any spam you receive to your operator.
I have been getting some questions, and reading some interesting comments, on why this breach only includes North American numbers (actually, 99.8% are US numbers). The exploit will work for non-US numbers, but in principle it is much easier to find valid Snapchat accounts in the US than in the rest of the world. This is because the US numbering plan gave the researchers a head start, and equivalent information isn’t commonly available elsewhere.
The breach works via the find_friends exploit: you submit a group of phone numbers, and if user details are returned for a number, that number belongs to a Snapchat user. However, you have to start with a batch of numbers from somewhere, and this is where the North American numbering plan really helps. Gibson Security, the researchers who originally identified this vulnerability, suggest iterating through a phone number sequence like (XXX) YYY-ZZZZ, cycling through the Z’s. Technically, the (XXX) YYY part of the number is called an NPA-NXX exchange. From AdaptiveMobile’s own research, there are roughly 160 thousand NPA-NXX exchanges in the US. However, a certain percentage of these are landline-only or unallocated. The landline share can be very hard to pin down, but one very rough estimate (see below) is that 32% are landline, so these can be ignored. Since the four Z digits give 10,000 possible line numbers (0000 to 9999) per exchange, this leaves about 1.088 billion potential cell numbers to query, i.e.
160,000 exchanges × 0.68 wireless share × 10,000 line numbers per exchange ≈ 1.088 billion
These 1.088 billion numbers are shared among a US population of 314 million, giving a ratio of about 0.29 people per potential number. That means that a brute-force approach of cycling through the Z’s is possible and, as the breach has shown, feasible. Once you identify the local areas you want, you can cycle through the numbers with a fairly good expectation of success.
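The search-space arithmetic can be checked with a short Python sketch. It uses the post's rough estimates (160,000 exchanges, a 68% wireless share, 314 million people) and counts all 10,000 four-digit line numbers per exchange. The `exchange_numbers` helper is hypothetical: it only shows what cycling through the Z's means for a single exchange and does not query anything.

```python
from collections.abc import Iterator

# Rough size of the US mobile search space, using the post's estimates.
EXCHANGES = 160_000          # approximate number of US NPA-NXX exchanges
WIRELESS_SHARE = 0.68        # estimated share of exchanges that are wireless
LINES_PER_EXCHANGE = 10_000  # four-digit line numbers 0000 to 9999
US_POPULATION = 314_000_000


def exchange_numbers(npa: str, nxx: str) -> Iterator[str]:
    """Hypothetical helper: yield every number in one NPA-NXX exchange."""
    for line in range(LINES_PER_EXCHANGE):
        yield f"({npa}) {nxx}-{line:04d}"


candidate_numbers = EXCHANGES * WIRELESS_SHARE * LINES_PER_EXCHANGE
us_people_per_number = US_POPULATION / candidate_numbers

print(f"{candidate_numbers / 1e9:.3f} billion candidate numbers")  # 1.088
print(f"{us_people_per_number:.2f} people per candidate number")   # 0.29
```

A ratio of about 0.29 means that, very roughly, one in every three or four guessed numbers in a wireless exchange belongs to someone, which is why blind iteration is practical here.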
Compare that with countries like the UK or Ireland. Ireland has 5 dedicated mobile codes (083, 085, 086, 087, 089), each followed by 7 digits. That gives up to 50 million phone numbers for a population of 4.6 million, a ratio of 0.092 people per potential mobile number. The UK has 5 dedicated codes (074, 075, 077, 078, 079), each followed by 8 digits, giving potentially 500 million phone numbers for a population of 63 million, a ratio of 0.126. Of course, many of those numbers are likely to be invalid, but knowing which ranges are valid is much more difficult in these countries. Not only are the odds against you, but it’s harder to know where to start when you don’t have a guide like the North American Numbering Plan.
So on the numbers alone, it is roughly 2 to 3 times easier to find a Snapchat user in the US than in the UK or Ireland. In addition, the value of using known active area codes and exchanges as a practical guide can’t be overstated; without it, in other countries you are simply querying in the dark, and this is what really makes things harder. Finally, the popularity of Snapchat varies by country, so it may be harder again to find Snapchat users outside the US. Altogether, it is many times harder to find a Snapchat user in the UK or Ireland, but still possible.
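The country comparison follows the same pattern: divide the population by the size of the dedicated mobile number space. This sketch reuses the post's figures; the US value assumes 10,000 line numbers per exchange, and treating every number in a mobile range as potentially valid is a simplification.

```python
# People per potential mobile number, using the figures quoted in the post.
COUNTRIES = {
    # name: (dedicated mobile prefixes, digits after the prefix, population)
    "Ireland": (5, 7, 4_600_000),
    "UK": (5, 8, 63_000_000),
}
# US value from the exchange estimate: 160,000 x 0.68 x 10,000 numbers.
US_PEOPLE_PER_NUMBER = 314_000_000 / (160_000 * 0.68 * 10_000)


def people_per_number(prefixes: int, digits: int, population: int) -> float:
    """Population divided by the size of the dedicated mobile number space."""
    return population / (prefixes * 10 ** digits)


for name, (prefixes, digits, population) in COUNTRIES.items():
    ratio = people_per_number(prefixes, digits, population)
    # A higher ratio means a random guess is, very roughly, more likely to hit a real subscriber.
    advantage = US_PEOPLE_PER_NUMBER / ratio
    print(f"{name}: {ratio:.3f} people per number, US ratio {advantage:.1f}x higher")
```

This prints ratios of 0.092 for Ireland and 0.126 for the UK, with the US ratio about 3.1 and 2.3 times higher respectively.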
Incidentally, I think you can see this logic in action in the county-level distribution of affected users. Some states clearly received a heavy volume of queries (such as California and Colorado), but whereas Colorado has relatively few exchanges, allowing the whole state to be queried, in California the researchers seem to have queried only the Bay Area and Los Angeles, given the sheer number of exchanges that would need to be queried statewide.
Let’s hope that Snapchat deals with this issue quickly, so that no more friends are queried!
UPDATE 3-1-2014 13:15 GMT
The state map of affected Snapchat accounts has been updated with slightly increased counts. The previous map was generated with exchange information that did not include recently allocated exchanges, so some counts were slightly low. Apologies.
ITU statistics give 303,052,000 mobile cellular subscriptions and 140,989,000 fixed landlines for the US in 2012, a (very) rough 2:1 ratio. Assuming that all numbers within assigned exchanges are allocated in the same proportion, about 68% of them are wireless. Feel free to query this or suggest better values.
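For completeness, the 68% figure follows directly from the ITU numbers, under the same assumption that allocated exchanges split in the same proportion as subscriptions:

```python
# Wireless share of US numbers from the 2012 ITU figures quoted in the post.
MOBILE_SUBSCRIPTIONS = 303_052_000
FIXED_LINES = 140_989_000

# Assumption (as in the post): exchanges split between wireless and landline
# in the same proportion as subscriptions do.
wireless_share = MOBILE_SUBSCRIPTIONS / (MOBILE_SUBSCRIPTIONS + FIXED_LINES)
mobile_to_fixed = MOBILE_SUBSCRIPTIONS / FIXED_LINES

print(f"mobile:fixed = {mobile_to_fixed:.2f}:1, wireless share = {wireless_share:.0%}")
```

This prints a mobile-to-fixed ratio of 2.15:1 and a wireless share of 68%.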