CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. We are going to describe CAPTCHA through its versions, from V1 (oldest) to V3 (latest): how it got here and how the arms race began.
Back to the history
We are all familiar with these images. They annoy us when we see them, but they are a necessary evil, because they help keep us secure by distinguishing between bots and humans.
The idea goes back to the 1980s, when people deliberately obscured text to keep content from being easily searchable and to get around obstacles like profanity filters, a use that is still common to this day. But the modern CAPTCHA didn’t come around until the late 1990s.
That was when the popular search engine AltaVista was trying to find a way to stop bots, or automated computer programs, from adding tons of spam and malicious URLs to its link database. The engineers decided to put a barrier in place. They approached the problem by taking something both humans and computers were good at, namely optical character recognition (OCR), and introducing elements that made the task much more difficult for computers while keeping it fairly easy for humans.
Since computers of the day could only recognize clear, easy-to-read text, AltaVista’s engineers forced users (or bots) to read a puzzle of distorted, misaligned text crossed with random lines before they could submit a URL to the database.
This kind of text CAPTCHA continues to be quite popular, along with audio CAPTCHAs for the visually impaired. Those work in a similar way and typically include spoken letters that are somewhat garbled to defeat automated sound analysis. As many website administrators are aware, VPNs are a popular tool that scammers can use to conceal their IP address, so browsing through one is more likely to trigger a CAPTCHA prompt. But there is more to stopping a bot than a confusing image. CAPTCHA scripts also need to be written securely, so that the correct answer isn’t available to the bot through a backdoor.
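To make the "no backdoor" point concrete, here is a minimal sketch of one way a server could check an answer without ever sending that answer to the browser. It is only an illustration under stated assumptions: the names `SERVER_KEY`, `new_challenge` and `check_answer` are made up for this example, and rendering the distorted image is left out. The page receives a random nonce and a keyed hash, never the plain text.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical per-deployment secret. It stays on the server and is never sent to clients.
SERVER_KEY = secrets.token_bytes(32)
ALPHABET = string.ascii_uppercase + string.digits


def _sign(nonce: str, answer: str) -> str:
    """Keyed hash of the nonce plus the normalized (trimmed, upper-cased) answer."""
    message = f"{nonce}:{answer.strip().upper()}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def new_challenge(length: int = 6) -> tuple[str, dict]:
    """Return (answer, public_data).

    The answer is only used server-side to draw the distorted image.
    public_data is what may be embedded in the page.
    """
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    nonce = secrets.token_hex(8)
    return answer, {"nonce": nonce, "token": _sign(nonce, answer)}


def check_answer(public_data: dict, typed: str) -> bool:
    """Recompute the keyed hash for what the user typed and compare in constant time."""
    expected = _sign(public_data["nonce"], typed)
    return hmac.compare_digest(expected, public_data["token"])
```

Without the server key, a bot cannot test guesses against the token offline. A real deployment would also need to expire each nonce after one use so that a solved challenge cannot be replayed.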
A few years later a version named reCAPTCHA came along. It is now owned by Google, and it is free to use on our websites.
There are three versions of reCAPTCHA. Version 2 is the one most sites use, and we often find ourselves using it without even knowing. Some sites now use version 3, which works in a completely different way and is explained below along with the other versions.
This version of reCAPTCHA took many forms, but here is the most common one. It was used to help digitize old books and newspapers. The team would scan text from books, and each test would show two words: one that they already knew, with some distortion added to it, and another that the scanning systems weren’t sure about.
The user had to type in both the known word and the unknown word. The known word was simply checked to make sure a human was answering the captcha. The unknown word was then verified from the log of answers, once a dozen or so people agreed on what the word was. Google ended up buying reCAPTCHA v1.
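The known-word and unknown-word scheme described above can be sketched roughly as follows. This is not reCAPTCHA's actual implementation, only an illustration of the idea: the class name `WordPairChecker`, its fields, and the exact agreement threshold of twelve (taken from "a dozen people") are assumptions.

```python
from collections import Counter, defaultdict

REQUIRED_AGREEMENT = 12  # assumption: "a dozen people" must type the same word


class WordPairChecker:
    """Checks the control word and collects votes for the unknown word."""

    def __init__(self, required: int = REQUIRED_AGREEMENT):
        self.required = required
        self.votes = defaultdict(Counter)  # unknown word id -> tally of typed guesses
        self.resolved = {}                 # unknown word id -> agreed transcription

    def submit(self, known_answer: str, typed_known: str,
               unknown_id: str, typed_unknown: str) -> bool:
        """Return True if the user passes, judged by the known word only."""
        if typed_known.strip().lower() != known_answer.strip().lower():
            return False  # failed the control word, so the guess is discarded
        guess = typed_unknown.strip().lower()
        if guess and unknown_id not in self.resolved:
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= self.required:
                self.resolved[unknown_id] = guess
        return True
```

For example, with `required=3`, three users who pass the control word and all type "harbour" for the same scanned snippet would settle that snippet's transcription, while a guess from someone who fails the control word is never counted.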
With reCAPTCHA, the arms race was well under way. The bot makers took reCAPTCHA as a challenge. First, they could train computers to read those messed-up words. Even a fairly low-end system could solve reCAPTCHA well enough.
Bot makers were creating fake accounts and sending spam. Sometimes the test was still too difficult for a machine, but then they could just pass it on to humans to solve. The bot makers set up automated systems in which bots would fill in all the details, ready to send spam. When the captcha appeared, the bots would show it to human operators hired from lower-wage countries, who sat there solving captchas for some income. Captcha solving could even be outsourced.
Dozens of companies competed on price and accuracy. Alternatively, the bot makers could get unsuspecting members of the public to solve the captchas. They could set up a website with some images that people might want to see. Before visitors could view them, they had to prove they were human by solving a captcha, which was copied straight from whatever site the bots were trying to get into. So Google then released reCAPTCHA version 2.
Version 2 presents a single checkbox that we have to click to prove we are not a robot.
When we complete one of these new CAPTCHAs, extra data is sent, and Google is very cautious about saying what that data is, because everything they reveal is a clue for the people trying to break it. But that box is loaded into our browser from google.com, which means it can look at any login cookies that Google already has in our browser. Certainly, if we clear our cookies, we are far more likely to get the secondary check that asks us to identify buses or fire hydrants.
It may also look at how the cursor moves in the moments before clicking the box, the exact position of the click, the length of time taken, and a bunch of other things that Google feeds into its giant machine learning system. The only people who know for sure are the designers, and they aren’t telling.
The captcha solving services, of course, are already offering a cost per thousand to solve these. It may be harder, but it’s not unbreakable. Using machine learning, bots can be trained to pass those secondary checks themselves, passing as human by identifying the correct sections of the presented images.
Sometimes we have to fight fire with fire, and reCAPTCHA v3 does just that.
In late 2018, Google released reCAPTCHA version 3, and we may already have passed or failed one of these without knowing it. There is no box to tick and no puzzle to solve when we browse around a site with reCAPTCHA version 3.
It works in the background and watches what we do. By the time we post a comment or sign up, it has already assigned us a score based on how likely we are to be human. Again, Google is being very careful about saying how they’re working that out.
The answer is very likely that it’s a machine learning system: they’re throwing everything into it, and they don’t know exactly how it works either. But from the part that is known, it gives the site a score based on our behavior. The higher we score, the more likely we are to be human as far as reCAPTCHA is concerned.
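On the site's side, acting on that score might look something like the sketch below. It assumes the site has already sent the visitor's token to Google's verification endpoint and parsed the JSON reply into a dict with `success`, `score` (between 0.0 and 1.0, higher meaning more likely human) and `action` fields. The function name `decide` and both thresholds are illustrative choices, not values Google prescribes.

```python
def decide(result: dict, expected_action: str,
           allow_at: float = 0.7, block_below: float = 0.3) -> str:
    """Map a parsed verification reply to 'allow', 'challenge' or 'block'.

    allow_at and block_below are hypothetical thresholds that a site
    would tune against its own traffic.
    """
    if not result.get("success"):
        return "block"       # token missing, expired or invalid
    if result.get("action") != expected_action:
        return "block"       # token was issued for a different page action
    score = float(result.get("score", 0.0))
    if score >= allow_at:
        return "allow"
    if score < block_below:
        return "block"
    return "challenge"       # uncertain: fall back to an extra check


# Example with a made-up reply
reply = {"success": True, "score": 0.9, "action": "comment"}
print(decide(reply, expected_action="comment"))
```

The middle band is a design choice: rather than rejecting borderline visitors outright, the site can fall back to something like an email confirmation or a version 2 style puzzle.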