As the commenter said, they rotate IPs. It is not that easy. I've also been on the other side of a sophisticated attack like this. The really savvy adversaries do the following, at least:
1. Rotate through several thousand to several hundred thousand noncontiguous, geographically distributed, residential IP addresses,
2. Associate each IP address with a single user agent and suite of cookies,
3. Associate each IP address with a particular target username,
4. Only attempt a few incorrect logins at a time, and a somewhat random (albeit realistic) number at that, within a given time interval,
5. Use random, apparently human delays between successive requests,
6. Issue requests using extremely high fidelity simulacra of web browsers, customized to the sequence and structure of HTTP requests on the website.
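To make concrete why that list defeats naive per-IP rate limiting, here is a rough sketch. Everything in it is an assumption for illustration: the `blocked_attempts` helper, the threshold of 5 failures, and the made-up traffic of 10,000 IPs with 3 guesses each. It is not how any particular site or product does it.

```python
from collections import Counter

def blocked_attempts(attempts, max_failures_per_ip):
    """Count how many attempts a naive per-IP failure limiter would reject.

    `attempts` is a sequence of (ip, username) pairs; every attempt is
    treated as a failed login (a simplifying assumption).
    """
    failures = Counter()
    blocked = 0
    for ip, _username in attempts:
        if failures[ip] >= max_failures_per_ip:
            blocked += 1          # this IP already hit the limit
        else:
            failures[ip] += 1     # let it through and record the failure
    return blocked

# Hypothetical distributed attack: one IP per target username, 3 guesses each.
distributed = [(f"ip-{i}", f"user-{i}") for i in range(10_000) for _ in range(3)]

# The same 30,000 guesses sent from a single IP, for comparison.
single_source = [("ip-0", f"user-{i}") for i in range(10_000) for _ in range(3)]

blocked_distributed = blocked_attempts(distributed, max_failures_per_ip=5)
blocked_single = blocked_attempts(single_source, max_failures_per_ip=5)

print(blocked_distributed)  # 0: no IP ever reaches the threshold
print(blocked_single)       # 29995: everything after the first 5 is rejected
```

Same number of guesses either way; the per-IP counter only catches the attacker who does not rotate.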
When the stakes are high this is the kind of opposition you'll get. Bank account takeover, social media account takeover, ticket scalping, automated sneaker buying, financial research, market research, etc.
Recaptcha introduces unpleasant user friction, but it usually works well. To invert a popular turn of phrase, it makes stopping simple attackers easy and hard attackers possible. The most sophisticated attackers will still lease reputable Google accounts and mechanical turk time to bypass Recaptcha challenges, but it will be expensive for them.
Technical sophistication is only one dimension of this game. The other is making adversaries spend more money than they can gain from being successful.
Most likely related to high-demand, limited-run sneaker "drops", which people then resell on the secondary market. Sneaker-scalpers, if you will. It's a problem because it prevents legit buyers from getting in on the sale.
Edit: OK, I know, limited editions. Like numbered and signed prints. But it's arguable that people who want them the most will get them. Even if it's just for resale. Doesn't seem like the seller's responsibility.
The real problem is supply. Popular tickets are scalped because there's only so many tickets. Then unpopular tickets are scalped because it was so easy to scalp the popular ones.
There's only so many sneakers that can be made: making more chews up the supply chain for something which isn't _truly_ being consumed.
To be clear, I don’t have a horse in this particular race. I’m neither condemning nor condoning the market dynamics of sneaker arbitrage here. It’s just an example I’m very familiar with because I used to write scrapers and I’ve been offered silly amounts of money to make them for sneaker trading groups. Not as much as hedge funds will pay for writing crawlers for market research, but still more than you’d probably expect just so they can flip Supreme shirts and Yeezys faster than competitors. It’s ridiculous, but this is the world we live in. The point is simply that when the stakes are high (particularly when there is money to be made), stopping adversaries will be really, really difficult.
Back to the point at hand, I don’t like recaptcha in principle. But given my view from both sides of the table, it’s one of very few things that consistently works against sophisticated adversaries. It’s about as close to a silver bullet as they come, with the additional upside that it’s the easiest thing to implement, both in an absolute sense and relative to the return. And once you have it in place, most of what you can implement beyond recaptcha has diminishing returns in comparison.
All of that being said, I would be inclined to agree that most websites and apps don’t need recaptcha, simply because most of them aren’t worthwhile targets for the types of attacks recaptcha is singularly effective against.
Extremely common. Most hedge funds buy what's called "alternative data" from vendors who aggregate it, like 7Park. The underlying data comes from providers who gather it via location telemetry, web scraping, satellite imagery, etc. Scraping from web applications is one of the more common forms.
The more successful quant funds will often build out internal research teams to do this. For example, both Two Sigma and Millennium have (not so well advertised) research teams devoted to this kind of data collection internally.
Other commenters have basically answered already, but to be clear Luminati is not the only provider, just the most infamous. It’s very easy to find others of greater or lesser reliability. Search “residential IPs proxy” and you’ll find many vendors.
But yes, the whole cottage industry is sketchy. Almost all providers are leasing users’ computers through outright malware or a shady TOS. The savvy play is to release a free game, app or even SDK which will then opportunistically route requests from the control server through the user’s device.
Recaptcha solving APIs are frequently bundled with the more reliable and premium services of this kind. They introduce a lot of latency since there’s a real mechanical turk across the world solving it for you, but they basically work.
> My name is Lior and I'd like to offer you a new way to make money off your software. The Luminati SDK provides your users the option to use your software for free by contributing to the Luminati proxy network.
> We will pay you $3,000 USD a month for every 100K daily active users.
> No collection of users' data, no disruption of user experience.
> I'd like to schedule a 15 minute call to let you know how we can start. Are you available tomorrow at 12:30pm your local time?
Users of free mobile apps (mostly games) are offered the option of allowing use of their devices as proxies as an alternative to being interrupted by ads.
Sounds like the answer is to increase the response size for failed login requests. At $12.50/GB, if you blow up your response to a megabyte, they'll spend about a cent per try, which is close to the rate they'll need to pay to have recaptchas solved by humans.
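The arithmetic behind "about a cent per try" can be checked with a quick sketch. The $12.50/GB price and the 1 MB padded response come from the comment above; the `cost_per_attempt` helper, the decimal GB (10^9 bytes), and the 2 KB "normal" response size are assumptions added for comparison.

```python
BYTES_PER_GB = 1_000_000_000  # decimal gigabyte, assumed billing unit

def cost_per_attempt(price_per_gb_usd: float, response_bytes: int) -> float:
    """Bandwidth cost (USD) the proxy customer pays to download one response."""
    return price_per_gb_usd * response_bytes / BYTES_PER_GB

PROXY_PRICE_PER_GB = 12.5  # USD, figure quoted in the thread

# Failed-login response padded to 1 MB, as proposed above.
padded_cost = cost_per_attempt(PROXY_PRICE_PER_GB, 1_000_000)

# Assumed unpadded failed-login response of roughly 2 KB.
normal_cost = cost_per_attempt(PROXY_PRICE_PER_GB, 2_000)

print(f"padded: ${padded_cost:.4f} per attempt")
print(f"normal: ${normal_cost:.6f} per attempt")
```

Under those assumptions the padded response costs the attacker roughly 1.25 cents per guess, several hundred times more than the unpadded one.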
Most criminals are not using services like Luminati - they are using actual botnets made up of compromised computers. In that case, their bandwidth costs are far cheaper than yours.
I don't know how expensive it is with Google/AWS, but I'm paying about $1.50 per TB at my non-cloud host (vs their $12,500/TB), so if it comes down to it, they need to outspend me by several orders of magnitude. Sucks, but still cheaper than losing customers due to hyper-annoying Recaptchas, and I doubt that somebody is willing to stomach a $12,500 cost to make me suffer $1.50 ... I'm sure there would be more efficient attacks ;)
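For a sense of scale, here is a small sketch of that asymmetry. The two per-TB prices are the ones quoted in this comment; the 1 MB padded response and the one million attempts are assumed numbers, and `bandwidth_cost` is just an illustrative helper. It only applies to attackers paying commercial proxy rates, not to the botnet case raised above.

```python
BYTES_PER_TB = 1_000_000_000_000  # decimal terabyte, assumed billing unit

def bandwidth_cost(price_per_tb_usd: float, attempts: int, response_bytes: int) -> float:
    """Total bandwidth cost (USD) for serving or downloading `attempts` responses."""
    return price_per_tb_usd * attempts * response_bytes / BYTES_PER_TB

HOST_PRICE_PER_TB = 1.50       # USD, defender's rate quoted above
PROXY_PRICE_PER_TB = 12_500.0  # USD, attacker's proxy rate quoted above

ATTEMPTS = 1_000_000           # assumed attack volume
RESPONSE_BYTES = 1_000_000     # assumed 1 MB padded failed-login response

defender_cost = bandwidth_cost(HOST_PRICE_PER_TB, ATTEMPTS, RESPONSE_BYTES)
attacker_cost = bandwidth_cost(PROXY_PRICE_PER_TB, ATTEMPTS, RESPONSE_BYTES)
ratio = attacker_cost / defender_cost

print(f"defender pays ${defender_cost:,.2f}, attacker pays ${attacker_cost:,.2f}")
print(f"attacker outspends defender by about {ratio:,.0f}x")
```

A million padded failures is exactly one terabyte, so the defender is out $1.50 while the attacker is out $12,500, a ratio of a bit over 8,000 to 1.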
>The most sophisticated attackers will still lease reputable Google accounts and mechanical turk time to bypass Recaptcha challenges, but it will be expensive for them.
You can also just pay people to solve recaptchas all day.
However, that still contributes to security. If you make the expense of bypassing it greater than the worth of accessing the site, then you have stopped them.
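That break-even idea can be written down as a back-of-the-envelope model. This is only a sketch: the `expected_profit_per_attempt` helper, the 1-in-1,000 success rate, the account values, and the 1.25 cent bypass cost are all hypothetical numbers picked for illustration.

```python
def expected_profit_per_attempt(bypass_cost: float,
                                success_rate: float,
                                value_per_success: float) -> float:
    """Attacker's expected profit (USD) for one login attempt.

    bypass_cost: what the attacker pays per attempt (captcha solving, proxies, ...)
    success_rate: fraction of attempts that yield a working account
    value_per_success: what one compromised account is worth to the attacker
    """
    return success_rate * value_per_success - bypass_cost

BYPASS_COST = 0.0125   # assumed cost per attempt, USD
SUCCESS_RATE = 0.001   # assumed: 1 in 1,000 guesses succeeds

# Low-value target: the attack loses money on every attempt.
low_value = expected_profit_per_attempt(BYPASS_COST, SUCCESS_RATE, 5.0)

# High-value target: the same defense no longer makes the attack unprofitable.
high_value = expected_profit_per_attempt(BYPASS_COST, SUCCESS_RATE, 50.0)

print("low-value target profitable: ", low_value > 0)
print("high-value target profitable:", high_value > 0)
```

With these made-up figures the defense stops the attacker chasing $5 accounts but not the one chasing $50 accounts, which is the "when the stakes are high" caveat in numbers.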