
All comes down to uptime.

One box can’t be distributed across multiple racks in the data center to guard against downtime if a switch crashes. Never mind that—one box can’t be deployed across multiple data centers. If you deploy to multiple DCs you can fail over if one DC starts having issues.

Then there’s deploys. Do you canary your deploys? Deploy the next release to a subset of production nodes, watch for regressions and let it ramp up from there? Okay, I’ll give you that one, it could be done on one big box.
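The ramp-up described above can be sketched as weighted routing, where a growing fraction of requests lands on the new release. A minimal Python sketch; the version names and ramp schedule are made up for illustration:

```python
import random

def pick_backend(canary_weight: float, stable: str = "v1", canary: str = "v2") -> str:
    """Route a request to the canary release with probability canary_weight."""
    return canary if random.random() < canary_weight else stable

# Ramp the canary up in stages, watching for regressions at each step
# before increasing the weight.
ramp = [0.01, 0.05, 0.25, 0.50, 1.00]
```

At weight 0.0 everything goes to stable; at 1.0 everything goes to the canary, which is the final cutover step.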

In any case, payments aren’t CPU intensive but it’s a prime case of hurry-up-and-wait. Lots of network IO, so while you won’t saturate the CPU with millions of transactions on the same box, I could easily imagine saturating a NIC. Deploying to shared infrastructure? Better hope none of your neighbors need that bandwidth too.

One transaction likely involves checking account and payment method status, writing audit logs, checking in with anti-fraud systems and a number of other business requirements.
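Since each of those steps is mostly waiting on the network, they are typically issued concurrently rather than one after another. A toy asyncio sketch; every service name and latency here is hypothetical:

```python
import asyncio

# Hypothetical checks; each sleep stands in for a network call to
# another service (account service, fraud service, etc.).
async def check_account(tx):
    await asyncio.sleep(0.01)
    return True

async def check_payment_method(tx):
    await asyncio.sleep(0.01)
    return True

async def check_fraud(tx):
    await asyncio.sleep(0.01)
    return True

async def write_audit_log(tx):
    await asyncio.sleep(0.01)

async def process(tx):
    # The CPU is idle while all three checks are in flight: the
    # bottleneck is network IO, not compute.
    ok = all(await asyncio.gather(
        check_account(tx),
        check_payment_method(tx),
        check_fraud(tx),
    ))
    await write_audit_log(tx)
    return ok
```

The gather runs the checks in parallel, so wall-clock time per transaction is roughly the slowest dependency plus the audit write, not the sum of all of them.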

(I lead a payments team, not at Uber but another major tech company)



> One box can’t be distributed across multiple racks in the data center to guard against downtime if a switch crashes. Never mind that—one box can’t be deployed across multiple data centers. If you deploy to multiple DCs you can fail over if one DC starts having issues.

Wouldn't you just have multiple NICs on one box for redundancy there? With any standby boxes being sent the database write log for replication?

> In any case, payments aren’t CPU intensive but it’s a prime case of hurry-up-and-wait. Lots of network IO, so while you won’t saturate the CPU with millions of transactions on the same box, I could easily imagine saturating a NIC.

If you're vertically scaling, wouldn't you just have the main database server host the database files locally, using fast NVMe SSDs (or Optane) in the box itself, instead of going over the network?

Enterprise NVMe drives can perform 500,000-2,000,000 IOPS, with about 60us latency. And Optane is about 4x faster. Why would a database server need to saturate network bandwidth?
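Rough arithmetic supports that: even the low end of that IOPS range outruns a common 10 GbE NIC. The drive figure comes from the range quoted above; the NIC speed and IO size are assumptions for illustration:

```python
# Back-of-envelope: local NVMe throughput vs. a 10 GbE NIC.
iops = 500_000                     # low end of the quoted enterprise NVMe range
block = 4 * 1024                   # assumed 4 KiB per random IO
nvme_bytes_per_s = iops * block    # ~2.0 GB/s of local disk throughput

nic_bytes_per_s = 10e9 / 8         # 10 Gb/s NIC ~= 1.25 GB/s

# Even the slow end of the NVMe range beats the NIC, which is the point
# of keeping the data files local instead of fetching them over the wire.
print(nvme_bytes_per_s > nic_bytes_per_s)
```

With the fast end of the range (2M IOPS) or Optane, the gap only widens.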

Anyways, I'd love to see the actual SQL query for one of their transactions...


> Wouldn't you just have multiple NICs on one box for redundancy there?

What happens when the FBI raids the DC to confiscate the servers of another person, and also takes yours? https://blog.pinboard.in/2011/06/faq_about_the_recent_fbi_ra...


I'm largely referring to RPC calls, not DB queries. Many of those calls won't even be to services you control and may well be HTTP calls to other companies.


All comes down to uptime.

20 years ago we had 1000+ days of uptime on DEC kit; no one was even impressed by 500 days. Nowadays people build all sorts of elaborate contraptions to do what used to be entirely ordinary.


By uptime people usually mean availability to the end users, not literal machine uptime. That includes the availability of the entire datacenter infrastructure, its connectivity, and the internet paths to it, which makes high availability in a single datacenter pretty much impossible.
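The usual back-of-envelope for why a second datacenter helps, assuming independent failures (optimistic in practice, since correlated outages happen):

```python
# A single DC that is up 99.9% of the time is down ~8.8 hours a year.
single = 0.999

# With failover to a second, independent DC, users only see downtime
# when both are down at once.
both_down = (1 - single) ** 2      # 1e-6
multi_dc = 1 - both_down           # 0.999999 -- "six nines" on paper

print(round(multi_dc, 6))
```

The independence assumption is exactly what shared upstream providers, DNS, and correlated software bugs tend to break, so real-world numbers are worse, but the direction of the argument holds.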


Heh, I guess. In my scenario the users actually got that uptime too, ‘cos they were connected over LAT...


Doesn't do much good if you have to fail out of an entire data center.


You can with VMScluster. There are multi-site clusters with 15+ years uptime.



