Thieves who want to scrape websites illegally can overcome many a company’s defenses.
Take CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), those annoying challenge-response tests that ask you to recognize number/letter combinations on many websites. The sites want to be sure that you are a human and not an automated machine or bot, on the theory that bots can’t read the distorted characters. This has not stopped clever techies, who can get around the CAPTCHA problem and carry on scraping websites.
The answer? One is the creation of CAPTCHA farms, in which humans answer the CAPTCHA and let the bots in to scrape the website. Another is Optical Character Recognition (OCR) software, which unfortunately lets the bot read the CAPTCHA itself: the software analyses the image and picks out anything that looks like writing, and if a shape looks similar to a letter in a font installed on the computer, it outputs that letter. OCR works by tracking and noting typical features of letters and characters.
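To make the "looks similar to a letter in a font" idea concrete, here is a minimal sketch of template matching, the simplest form of that comparison. Everything in it is an assumption made for illustration: the 3x5 bitmaps are an invented toy font, and the names TEMPLATES, similarity and recognize are hypothetical. Real OCR software is far more elaborate.

```python
# Toy illustration of template matching, the idea behind simple OCR.
# The 3x5 glyph bitmaps below are invented for this sketch, not a real font.
TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
    "O": ["###",
          "#.#",
          "#.#",
          "#.#",
          "###"],
}


def similarity(glyph, template):
    """Count the pixels on which two equally sized bitmaps agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the known letter whose bitmap agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda letter: similarity(glyph, TEMPLATES[letter]))


# A "T" with one stray pixel, standing in for a slightly noisy scan.
noisy_t = ["###",
           ".#.",
           ".#.",
           "##.",
           ".#."]

print(recognize(noisy_t))
```

The stray pixel does not change the result, because the glyph still agrees with the "T" template on more pixels than with any other. This is also why CAPTCHA images warp, overlap and clutter their characters: the aim is to push each shape far enough from every known template that this kind of comparison fails.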
The point? CAPTCHA cannot prevent website scraping and the theft of your precious data.