
GameTorch
Published: 2025-07-04 12:29:10
How We Serve Millions of Requests on a Single VM

Here's how we serve millions of requests each month on a single, standard VM on Google Cloud. First, a concise table of contents; then we'll go in depth on each point and how it applies concretely to the practice of software engineering.

Tried and True Technologies
I/O, I/O, I/O: Aggressive Caching
>500 ms Necessitates Async
Flat Data Models

Tried and True Technologies

This part of the post is probably the most boring. You can skip it if you already know what it's about. The point here is simple: save a ton of time and headache by using technologies that have already been battle-tested in the most extreme rivers of internet traffic.

You are highly unlikely to encounter legitimate bugs in these pieces of software. If you do stumble upon an error message or confusing behaviour, chances are thousands of other developers have faced the same thing and posted about it online. Getting stuck is virtually impossible with this set of tools. And if you ever reach the point where you do need to look under the hood and do some real engineering work, you are already serving truly Herculean amounts of traffic and your company should have more than enough revenue to cover those engineering costs.

Our stack is composed almost entirely of tried and true technologies: Nginx, Redis, PostgreSQL, Debian, and Flask. Only two components in our stack break this theme: the web servers behind Nginx are Rust Rocket servers, and all the HTML is produced with the server-side rendering framework Maud. We only use Flask for internal services.

I/O, I/O, I/O: Aggressive Caching

The first time anyone formally learns about performance measurement and optimization, they are almost always told that input/output (I/O) is, generally speaking, the most important thing to consider. This is true. While you should always take an empirical approach and measure first, optimize second, "I/O is the bottleneck" is a very useful guess to have in your back pocket.

But what exactly do we mean by I/O in this context? Starting at the highest, most zoomed-out level, consider the flow of data in your application. Maybe it starts as a request on your user's mobile device, makes its way to your server, then your server hits a database somewhere to authenticate the user, and then your server fires off a request to an AWS or GCP bucket to fetch some data and return it to the user. What's the I/O here? At the highest level, it's the network requests. This is intensive input and output: data leaves some computer, be it the user's mobile phone or your server, and travels over the internet to another server somewhere, either yours or the blackbox of AWS or GCP. These network requests take time. The fastest ones can be quick, maybe 15 milliseconds, which is essentially unnoticeable by users. But the slowest ones can take 500 or more milliseconds, easily noticeable by the majority of people.

So what can we do here? We can use aggressive caching to avoid making certain network requests altogether. Cloud buckets can be slow, especially depending on their configuration. Most of the time, the data stored in these buckets is static and immutable — it doesn't change. This means that if we run the request to fetch data from the bucket once, then we could store it in memory on our server and just read it again from memory later instead of having to go over the network again. We just saved 500+ milliseconds in a lot of cases! Fantastic!
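To make this concrete, here's a minimal sketch of the idea in Python (our internal services are Python, though the production servers are Rust). `fetch_from_bucket` is a hypothetical stand-in for a cloud bucket read; the sleep simulates network latency:

```python
import time

# Hypothetical slow fetch standing in for a GCS/S3 bucket read.
def fetch_from_bucket(key):
    time.sleep(0.5)  # simulate ~500 ms of network latency
    return f"contents of {key}"

_cache = {}

def cached_fetch(key):
    # Serve from memory if we've fetched this object before.
    if key not in _cache:
        _cache[key] = fetch_from_bucket(key)
    return _cache[key]

start = time.monotonic()
cached_fetch("sprites/hero.png")   # slow: goes over the "network"
first = time.monotonic() - start

start = time.monotonic()
cached_fetch("sprites/hero.png")   # fast: served from memory
second = time.monotonic() - start

print(first > 0.4, second < 0.1)   # → True True
```

The second call never touches the network at all, which is exactly the 500+ millisecond win described above.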

There are two final points I want to make here. First, programmers should be aware of cache invalidation bugs, which are notoriously destructive and all too easy to let creep into your programs. Remember this quote:

There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors.

— Phil Karlton
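One hedge against invalidation bugs, for data that is allowed to be slightly stale, is to attach a time-to-live (TTL) to each entry so stale data expires on its own instead of being invalidated by hand. A minimal Python sketch (the class and timings are illustrative, not our production code):

```python
import time

# A minimal TTL cache: entries expire automatically, sidestepping a whole
# class of manual-invalidation bugs for data that may be slightly stale.
class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=0.1)
cache.set("user:42", "alice")
print(cache.get("user:42"))   # → alice (hit)
time.sleep(0.15)
print(cache.get("user:42"))   # → None (expired, treated as a miss)
```

TTLs trade a bounded amount of staleness for never having to reason about every code path that could mutate the underlying data.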

Second, programmers should be aware of the really cool fact that this idea applies to all levels of I/O, not just network requests. Most classical computers are composed of CPUs, RAM, and hard disks. Those pieces of hardware are listed in increasing order of latency: it takes longer to hit RAM than the registers of your CPU, and it takes longer to hit a hard disk than RAM. That's a hierarchy of access latencies to which we can apply a hierarchy of caching. Furthermore, each of those pieces of hardware has a hierarchy of caches within it! Isn't that cool?!

>500 ms Necessitates Async

I follow the general rule of thumb that users should not have to wait more than 500 milliseconds (half a second) for a response from a web server. Sometimes, though, things take longer than that. Even when they do, the rule still applies. So what's the solution? Tell the user that their request is being worked on, and deliver that message within the 500 millisecond budget.

To implement this in practice, you need to use some form of asynchronous programming. What we do is push things into queues. For example, we generate a lot of spritesheets:

→ walking to the left

When a user generates an animation from a sprite and a prompt, they can expect to see that their animation has started generating within 500 ms. In practice, this number is usually as low as 50 ms, but it depends on their network connection, of course. We just push the job into a redis queue and then, on the other side of the redis queue, a separate worker binary pulls out the job and begins working on it. When that job is done, we update the database to mark the job as done and the next polling request updates the web page for the user!
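The flow above can be sketched in a few lines of Python. Here `queue.Queue` and a plain dict stand in for the Redis queue and the database row that the polling request reads; `enqueue_animation_job` is a hypothetical request handler:

```python
import queue
import threading
import time

# Stand-ins for the real pieces: queue.Queue for the Redis queue, a dict
# for the database row that the user's polling request would read.
job_queue = queue.Queue()
job_status = {}

def enqueue_animation_job(job_id, prompt):
    # The request handler does only this, so it responds well under 500 ms.
    job_status[job_id] = "pending"
    job_queue.put((job_id, prompt))
    return {"job_id": job_id, "status": "pending"}

def worker():
    # A separate worker process pulls jobs off the queue and does the
    # slow work, then marks the job as done in the "database".
    while True:
        job_id, prompt = job_queue.get()
        time.sleep(0.2)  # simulate slow generation work
        job_status[job_id] = "done"
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

response = enqueue_animation_job("job-1", "walking to the left")
print(response["status"])   # → pending (what the user sees immediately)
job_queue.join()            # in reality the user polls; here we just wait
print(job_status["job-1"])  # → done
```

The handler's latency is decoupled from the job's duration: the slow work can take as long as it needs on the other side of the queue.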

Flat Data Models

How you model your data affects your engineering speed as much as it affects your application's speed. Keeping your data model simple, flat, and reproducible means you can write new code and debug and fix old code at a faster pace. It also often translates to performance gains, for numerous reasons. Your database queries are less complicated. Your code lends itself more to free hardware gains: you can take advantage of caching boosts enabled by memory alignment, and more of your code can implicitly avoid copies and sometimes even leverage "zero copy": structs can be read directly from memory (for example, from a network buffer, a file, or some IPC buffer) instead of being copied into the format your program uses, because the two are one and the same.
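Here's a small Python sketch of the zero-copy idea: a flat, fixed-layout record can be decoded straight out of a bytes buffer at an offset, with no intermediate objects to build field by field. The record layout is invented for illustration:

```python
import struct

# A flat, fixed-layout record: user id (u32), score (u32), 8-byte name.
# Because the layout is flat, a record can be read directly out of any
# bytes-like buffer (network buffer, mmap'd file, IPC buffer, ...).
RECORD = struct.Struct("<II8s")

buffer = bytearray(RECORD.size * 2)
RECORD.pack_into(buffer, 0, 1, 9000, b"alice\x00\x00\x00")
RECORD.pack_into(buffer, RECORD.size, 2, 4200, b"bob\x00\x00\x00\x00\x00")

# Decode the second record directly at its offset; memoryview avoids
# copying the underlying buffer.
view = memoryview(buffer)
user_id, score, name = RECORD.unpack_from(view, RECORD.size)
print(user_id, score, name.rstrip(b"\x00").decode())   # → 2 4200 bob
```

A nested, pointer-rich model can't be read this way: it forces an allocation-and-copy step for every record, which is exactly the cost a flat model avoids.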

This is where my most controversial opinion enters the article: ORMs are harmful and should be avoided nowadays. The main purpose of an ORM is to let you define all of your types in your program and avoid writing any database queries yourself. This can help the pace of development because it enforces a single source of truth for your data models. But it can lead to disaster because it obfuscates away complexity that ought to be looked at by a competent programmer. You can easily write code that an ORM will translate into hideously complex, sometimes bug-ridden database queries. Often this code doesn't have a correctness bug, but it does have a performance bug. You then have to write some other incantation to reveal the hideous monster of a query whose performance is not satisfactory, and manually intervene. Nowadays, we have the technology to check your SQL queries at compile time. We also have the technology to instantly write all the boilerplate involved in adding a new column to a database table, adding the new field to the corresponding struct, and updating all the database calls in your program. Because we have compile-time guarantees about your database queries, even if the boilerplate-writer screws up, that's okay; your code won't ship if it's broken. For an example of such a setup, see the sqlx Rust crate.
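As a sketch of the ORM-free style (in Python, with the stdlib sqlite3 module standing in for PostgreSQL; in our actual stack this role is played by sqlx in Rust), the point is that the SQL you write is exactly the SQL that runs:

```python
import sqlite3

# Plain, explicit, parameterized SQL instead of an ORM: nothing is hidden,
# so there is no generated monster query to discover later.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        prompt TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO jobs (status, prompt) VALUES (?, ?)",
    ("pending", "walking to the left"),
)
conn.commit()

# The query is visible and auditable; a tool in the spirit of sqlx could
# check it against the schema before the code ever ships.
row = conn.execute(
    "SELECT id, status FROM jobs WHERE status = ?", ("pending",)
).fetchone()
print(row)   # → (1, 'pending')
```

Every query is something you can read, explain, and run through EXPLAIN, rather than something an ORM synthesizes behind your back.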

Flat data models categorically avoid mistakes that cost engineering time and program wall time. They also implicitly take advantage of decades of hardware boosts engineered by PhDs and leading members of industry. They enable us to serve millions of requests each month on a single, standard VM.

Review & Conclusion

These four standard techniques—battle-tested tech, aggressive caching, asynchronous work for anything >500 ms, and flat data models—let us serve tens of millions of requests per month on a single, standard VM without breaking a sweat.

— Tom, Creator of GameTorch
