The role of big data in securing online identities


A lot is being said about the benefits of big data and analytics. Everywhere, companies are using big data and analytics to better understand customer behavior, improve products and offer better customer service (and to display ads that are more likely to influence viewers and produce better click-through rates). Analytics are being used to make better market predictions, improve efficiency and utility, and decrease costs and consumption levels…

Big data is also riddled with privacy issues, which are being talked about as well. The collection of habits, personal information and sensitive health information by service providers without consumers’ consent has become a common source of controversy. Just recently, Oculus Rift came under scrutiny for its murky data collection policy. And don’t get me started on the privacy issues introduced by the Internet of Things.

But little is being said about how big data can transform, and is already transforming, the authentication landscape. In fact, in my opinion, the security benefits of big data are significant enough to dwarf the privacy issues (although that doesn’t mean those concerns shouldn’t be addressed). Big data is achieving something that no other authentication mechanism has done before: defining the digital identity of the user in a way that can’t be stolen or replicated by malicious hackers. In this post, I’ll describe how analytics-based authentication systems are helping improve online identity security without causing unnecessary friction and frustration.

The problem with password-based authentication

Since the dawn of computers, we’ve been protecting our digital data and identities with passwords. And for just as long, malicious hackers have been stealing or cracking our passwords to gain access to that information.

So we learned to use complex passwords.

And hackers developed tools to break complex passwords.

So we had to create even more complex passwords. Nowadays, there are so many criteria involved in creating safe passwords that many people ignore them altogether, hoping against hope that their accounts won’t catch the attention of hackers. At present, a good password has to be at least eight characters long and mix uppercase and lowercase letters, numbers and symbols. It must not be a variation of any single word (such as p@$$w0Rd), because those can be discovered as well (you should use something like “H0w+oTr@!nY0urDra9on”, a variation of “HowToTrainYourDragon”; that’s not my password, by the way). It must not be based on information that can be found publicly on the internet, such as your birthday or your favorite band (ironically, in this regard, big data is adding to the complexity of passwords).
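To make the rules above concrete, here is a minimal sketch of a checker for them, assuming a tiny hypothetical word list (`COMMON_WORDS`) and a hypothetical table of common “leetspeak” substitutions (`LEET_MAP`). The function names are illustrative, not from any particular library, and the checker does not cover the “publicly available information” rule, which would need data about the user.

```python
import re

# Hypothetical, tiny dictionary for illustration; a real checker would use a
# large list of common passwords and dictionary words.
COMMON_WORDS = {"password", "dragon", "letmein", "welcome"}

# Hypothetical table of common character substitutions that cracking tools undo.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s",
                          "7": "t", "9": "g", "@": "a", "$": "s", "!": "i", "+": "t"})


def undo_leet(password: str) -> str:
    """Map common substitutions back to plain lowercase letters."""
    return password.lower().translate(LEET_MAP)


def check_password(password: str) -> list[str]:
    """Return the list of rule violations; an empty list means the password passes."""
    problems = []
    if len(password) < 8:
        problems.append("shorter than 8 characters")
    if not re.search(r"[A-Z]", password):
        problems.append("no uppercase letter")
    if not re.search(r"[a-z]", password):
        problems.append("no lowercase letter")
    if not re.search(r"[0-9]", password):
        problems.append("no digit")
    if not re.search(r"[^A-Za-z0-9]", password):
        problems.append("no symbol")
    # A whole-password match against the word list after undoing substitutions.
    if undo_leet(password) in COMMON_WORDS:
        problems.append("variation of a single dictionary word")
    return problems
```

With these rules, “p@$$w0Rd” satisfies every character-class rule but fails the dictionary check (it normalizes to “password”), while “H0w+oTr@!nY0urDra9on” passes every check because it normalizes to a phrase rather than a single listed word.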

And oh, you shouldn’t reuse your password across different accounts, because if one of them gets hacked, the others can be compromised with the same password. So if you have ten online accounts (which most people do), you should have ten completely different passwords. Memorizing so many complex passphrases is frustrating for the average user, so you’ll either give up on the rule or have to keep a list of your passwords somewhere.
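Producing ten independent passwords is the easy part; remembering them is what breaks down. As a sketch of the easy part, the snippet below uses Python’s standard `secrets` module to draw a random password per account and retries until all four character classes are present. The account names and the 16-character default are hypothetical choices for illustration.

```python
import secrets
import string

# Character pool covering the four classes a "good" password needs.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Return a random password containing at least one character of each class."""
    if length < 4:
        raise ValueError("length must be at least 4 to include every character class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Rejection sampling: keep drawing until every required class appears.
        if (any(c.isupper() for c in candidate)
                and any(c.islower() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


# One independent password per account (hypothetical account names).
accounts = ["email", "bank", "social", "shopping"]
passwords = {name: generate_password() for name in accounts}
```

The output is exactly the kind of string nobody can memorize ten of, which is the problem the next paragraph describes.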

Did you store them in a file? Big mistake if you did. You’ve just created a juicy target for hackers to go after. If you wrote them on a piece of paper, that isn’t much better, because paper tends to get lost and found by the wrong person.

I can go on for days, but I think you get the point: passwords per se aren’t enough to protect you.

The problem with traditional two-factor authentication

One of the earliest remedies for the password problem was two-factor or multifactor authentication (2FA/MFA). The basic idea behind two-factor authentication is to authenticate users with something they know (the password) plus a second factor: something they have or something they are. The second factor can be a physical key, a fingerprint, an eye scan, or a one-time passcode sent by SMS to a mobile device registered with the account.
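The SMS passcode flow can be sketched on the server side as: generate a short random code, remember only its hash and an expiry time, and accept it once if it matches before it expires. This is a minimal sketch under stated assumptions; `PendingCode`, `issue_code` and `verify_code` are hypothetical names, the 5-minute lifetime is an arbitrary choice, and actually sending the SMS is left out.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass


@dataclass
class PendingCode:
    digest: bytes       # SHA-256 of the code, so the plain code isn't kept server-side
    expires_at: float   # Unix timestamp after which the code is rejected
    used: bool = False  # a one-time code must not be accepted twice


def issue_code(ttl_seconds: int = 300) -> tuple[str, PendingCode]:
    """Create a random 6-digit code; the plain code is what would be sent by SMS."""
    code = f"{secrets.randbelow(10**6):06d}"
    pending = PendingCode(hashlib.sha256(code.encode()).digest(), time.time() + ttl_seconds)
    return code, pending


def verify_code(submitted: str, pending: PendingCode) -> bool:
    """Accept the code only once, only before expiry, and only if it matches."""
    if pending.used or time.time() > pending.expires_at:
        return False
    digest = hashlib.sha256(submitted.encode()).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if hmac.compare_digest(digest, pending.digest):
        pending.used = True
        return True
    return False
```

A production system would also cap the number of wrong attempts, since a 6-digit code has only a million possible values.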

The advent of 2FA/MFA improved the security of accounts dramatically and made account hacking a lot more difficult. But it still has its own problems. For one thing, each of the methods has weaknesses: physical keys can be lost or stolen, fingerprints can be lifted off a glass the user has touched, eye scans can be reproduced from high-resolution pictures, phones can be stolen, SMS messages can be forwarded to other phones, etc.

Above all, 2FA introduced a lot of friction and unwanted complexity to the user experience. Instead of just typing their passwords and accessing their accounts, users were required to go through two or more steps to prove their identity. While they might find 2FA fun and James-Bondish the first few times, most users soon get frustrated and annoyed when every login to their email account requires them to enter their password, pull out their phone, read the code in the SMS and type it into their browser.

That’s why a considerable number of users skip the 2FA options in order to keep the experience fast and smooth (and consequently less secure).

How big data solves the problem

Where big data shines is in improving account security without disrupting or frustrating the user.

Authentication through data analysis is also known as Adaptive Authentication or Risk-based Authentication. The basic idea is to collect data about the user’s habits and environment in order to establish reliable criteria for normal behavior, spot risky or potentially malicious activity and prevent the compromise or hijacking of accounts, all without requiring any intervention from the user in the normal case.

Once integrated into an application or online service, adaptive authentication systems sit in the background and collect information without bothering the user. The collected data comprises user behavior and device profiles. User behavior includes keystroke habits, mouse dynamics, user interface interaction patterns, the times of day and intervals at which accounts are accessed, etc. Device profiles include the IP addresses of devices that access the account, device types, operating systems, geographical locations, the times of day or days of the week when each device accesses the account, etc.
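Keystroke habits are usually described with two timings: dwell time (how long a key is held down) and flight time (the gap between releasing one key and pressing the next). The sketch below computes the average of each for a typing session and measures how far it is from a stored profile. It is a simplified illustration under assumptions: `Keystroke`, `timing_features` and `deviation` are hypothetical names, and real systems typically keep per-key and per-key-pair statistics rather than two averages.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Keystroke:
    key: str
    pressed: float   # time the key went down, in seconds
    released: float  # time the key came up, in seconds


def timing_features(strokes: list[Keystroke]) -> dict[str, float]:
    """Average dwell and flight times for a session (needs at least two keystrokes)."""
    dwell = [s.released - s.pressed for s in strokes]
    # Flight time can be negative when a typist presses the next key early.
    flight = [b.pressed - a.released for a, b in zip(strokes, strokes[1:])]
    return {"dwell": mean(dwell), "flight": mean(flight)}


def deviation(observed: dict[str, float], profile: dict[str, float]) -> float:
    """Mean relative difference between a session's features and the stored profile."""
    return mean(abs(observed[k] - profile[k]) / profile[k] for k in profile)


session = [Keystroke("h", 0.00, 0.10), Keystroke("i", 0.25, 0.35), Keystroke("!", 0.50, 0.60)]
features = timing_features(session)
```

Here `features` works out to a dwell of about 0.10 s and a flight of about 0.15 s. Against a stored profile of 0.2 s and 0.3 s, `deviation` returns 0.5, meaning the session differs from the owner’s usual rhythm by about 50% on average.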

The gathered information enables the platform to create a unique profile, or digital fingerprint, that defines the owner of the account. Once the profile is defined, every interaction with the account is recorded, evaluated against the defined boundaries and criteria, and given a risk score according to how well it conforms to the established rules. A low risk score indicates that the actual owner is accessing the account, and the user experience is left unobstructed. A high risk score indicates that the account might have been compromised and triggers an appropriate response, such as challenging the user with 2FA or blocking access to the account altogether.
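The score-then-act step described above can be sketched as a weighted sum of anomaly signals followed by two thresholds. Everything specific here is an assumption for illustration: the signal names, the weights in `WEIGHTS` (chosen to sum to 1) and the `challenge_at`/`block_at` thresholds are hypothetical, and real products tune or learn these values.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"          # low risk: no friction for the user
    CHALLENGE = "challenge"  # elevated risk: ask for a second factor
    BLOCK = "block"          # very high risk: deny access


# Hypothetical weights: how much each anomaly contributes to the overall score.
WEIGHTS = {"new_device": 0.3, "new_location": 0.3, "unusual_time": 0.2, "typing_mismatch": 0.2}


def risk_score(signals: dict[str, float]) -> float:
    """Combine per-signal anomaly values in [0, 1] into one score in [0, 1].

    Each value is clamped to [0, 1]; signals not listed in WEIGHTS raise KeyError.
    """
    return sum(WEIGHTS[name] * min(max(value, 0.0), 1.0) for name, value in signals.items())


def decide(score: float, challenge_at: float = 0.3, block_at: float = 0.9) -> Action:
    """Map a risk score to a response using two hypothetical thresholds."""
    if score >= block_at:
        return Action.BLOCK
    if score >= challenge_at:
        return Action.CHALLENGE
    return Action.ALLOW
```

For example, a login that is fully anomalous in location and time but not in device or typing scores 0.5 and is challenged, while a login with no anomalies scores 0 and goes straight through.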

For instance, let’s say an account is usually accessed from the U.S. at noon on weekdays. If an attempt is made to log into the account from China on a Sunday at 1:00 am, the attempt is judged high-risk, and the user is challenged with a fingerprint scan, a physical token or whatever other method the owner or organization has defined. It may be the actual owner accessing the account during a business trip, in which case they’ll have no problem responding to the challenge. But if it’s a hacker who has somehow obtained the password from thousands of miles away, they’ll have a hard time producing the physical key that sits in the actual owner’s pocket, and their illicit access ends there.
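The scenario above can be sketched as a small profile learned from past logins (countries, weekdays, hours) plus a check that lists which aspects of a new attempt fall outside it. This is a hedged illustration, not any vendor’s method: `LoginProfile`, `anomalies` and `needs_challenge` are hypothetical names, the 2-hour tolerance and the “two or more anomalies” rule are arbitrary, and all timestamps are assumed to be in the account’s usual time zone.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LoginProfile:
    """What the system has learned about an account's normal logins."""
    countries: set[str] = field(default_factory=set)
    weekdays: set[int] = field(default_factory=set)  # 0 = Monday ... 6 = Sunday
    hours: set[int] = field(default_factory=set)     # 0-23

    def learn(self, country: str, when: datetime) -> None:
        self.countries.add(country)
        self.weekdays.add(when.weekday())
        self.hours.add(when.hour)


def hour_gap(a: int, b: int) -> int:
    """Distance between two hours on a 24-hour clock (23 and 0 are 1 hour apart)."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def anomalies(profile: LoginProfile, country: str, when: datetime) -> list[str]:
    found = []
    if country not in profile.countries:
        found.append("unfamiliar country")
    if when.weekday() not in profile.weekdays:
        found.append("unusual day")
    if all(hour_gap(when.hour, h) > 2 for h in profile.hours):  # 2-hour tolerance
        found.append("unusual hour")
    return found


def needs_challenge(profile: LoginProfile, country: str, when: datetime) -> bool:
    # Hypothetical policy: two or more anomalies trigger a second-factor challenge.
    return len(anomalies(profile, country, when)) >= 2


usual = LoginProfile()
for day in range(1, 6):  # Monday 1 to Friday 5 January 2024, always at noon
    usual.learn("US", datetime(2024, 1, day, 12, 0))

suspicious = anomalies(usual, "CN", datetime(2024, 1, 7, 1, 0))  # Sunday, 1:00 am
```

The Sunday 1:00 am login from China trips all three checks and would be challenged, whereas a Monday 1:00 pm login from the U.S. produces no anomalies and passes silently.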

The best thing about authentication through big data and analytics is that it’s extremely hard to compromise. There’s virtually no way a hacker can impersonate you by replicating your habits, IP addresses, geographical location, devices… all at the same time.

There’s no such thing as absolute security, but for the moment, this is as safe as it gets. Adaptive authentication is already being offered in several flavors and is being used in one way or another by several major online service providers. Hopefully, its evolution and widespread adoption will help prevent a repeat of the countless account thefts and data breaches we’ve witnessed in previous years.

