A Novel Design Approach for Online Game Server Engines

Blogs

2024-05-08 03:15

Is there a game server architecture that is both distributed and as simple to write as a single-threaded program; where data is automatically pushed, unnecessary APIs are hidden, and you can focus solely on business logic?

Origin

Early game servers were mostly developed in C, with slow cycles and painful debugging. Later, a shift to LuaJIT began, and we were quite aggressive, switching in 2009. After all, early online games prioritized performance, and it was only later that technological advancements made performance less critical.

During continuous feature improvements and refactoring, we realized the code workload was still heavy, unable to keep up with increasingly fast iteration demands. Many features couldn’t be experimented with and tested immediately. To improve, we needed to conceptually redesign the game server from the ground up.

Server as Database (Schema as API)

The traditional structure is generally: the game server receives client commands, server logic manipulates the database, and finally returns updated data to the client. This process itself introduces complexity, so why not directly hide the database, or rather, merge them into one—the game server is the database.

This idea is actually quite natural. Online games have never been fond of relational databases: either using simple MemoryMapping for peak performance or NoSQL like Redis. There’s a saying, “the best database is always a custom database.” So why not design the server as a specialized database interface serving game clients? Of course, the backend is still responsible for persistence via a real database, but database operations are completely hidden and handled internally.

Let’s call the game server db and the real backend database backend.

First, game clients can directly perform subscription queries (select ... where ...) on the db. Any data changes within the subscription scope are automatically pushed by the db, including events like on_insert / on_update / on_delete.
Tables should have simple row-level permissions, allowing clients to only query data relevant to them. But some tables should be set to guest permissions, like server online player status, allowing even unlogged-in clients to query.
Clients can only subscribe, with no write permissions; writes must go through db functions, which is the traditional game server logic. Since it’s a db, you could also call them stored procedures. However, these stored procedures focus on stateful in-memory computation, something database stored procedures typically lack.
Read/write operations in the db logic code are wrapped in a transaction and automatically committed to the backend. If a transaction conflict occurs, it’s automatically retried. This way, you don’t need to consider race conditions when writing code, and deployment is more flexible.

    graph TD;
    subgraph "Game Server(DB)"
        DB1["Subscription"];
        DB2["Logic"];
    end
    Client["Game Client"]<--Subscription Data Stream-->DB1;
    Client--Call-->DB2;
    subgraph "Backend"
        DB1<--Read-Only Subscription-->Replica;
        DB2--Transactional Read/Write-->Master;
        DB2--Transactional Read-->Replica;
    end

In this form, writing code, whether on the client or server side, becomes very comfortable. The client only needs to subscribe to data, and through reactive programming, it can automatically handle UI or objects in the scene, greatly decoupling from server logic. The server side can focus solely on writing real logic, no longer needing to worry about data writing, updating, and pushing.

Trade-offs in Characteristics

We also need to balance the impossible triangle (variant): performance, flexibility, and data consistency. Improving one often lowers the others.

Performance

A load-balanced distributed structure common in Web Apps can be adopted, making performance primarily constrained by the backend database. Using a high-performance option like Redis has the following pros and cons:

Has built-in MQ, eliminating the need for an extra layer.
Vertical scaling can improve performance but involves trade-offs with data consistency.
Data maintenance and presentation are inconvenient, but now AI assistance can help solve this.

Flexibility

Flexibility includes aspects of fault tolerance and maintenance:

Near high availability can be achieved, with seamless client reconnection upon server errors.
One-click server deployment, facilitating dynamic scaling.
A single server failure should not affect the cluster, requiring good data consistency so that data rollback/repair isn’t needed during failures.
Dynamic scaling of the backend also needs consideration.

Data Consistency

Games have high requirements for data consistency. How to solve distributed data race conditions? How to handle data written halfway during a program crash / outage? You can’t deduct a player’s money without giving them the item; this is mandatory.

Since performance dictates a distributed choice, in this architecture, data consistency essentially boils down to transactions. However, WATCH + MULTI transactions significantly reduce Redis performance and also reduce flexibility, as they cannot be used under Cluster or Redis proxy setups.

The chosen trade-off is the “version optimistic lock + Lua transaction” method: only when the version numbers completely match and all checks pass does the Lua script execute batch writes, ensuring multiple data writes either all complete or all fail. This method supports various Redis forks and architectures. Although Lua is used, actual tests show it’s faster than WATCH + MULTI.

For flexibility, locking on indexes was abandoned, meaning phantom reads cannot be prevented (e.g., querying if a user exists, and if not, inserting a new user). However, such issues are typically solved by adding Unique indexes anyway. Sacrificing index query consistency brings enormous gains in flexibility and performance.

    ---
title: Architecture Diagram
---
graph TD;
    Client1["Client"]-->DNS;
    Client2@{ shape: processes, label: "Client..."}-->DNS;
    DNS["Load Balancer DNS"]-->Node_A;
    DNS-->Node_B;
    subgraph "Node1"
    Node_A["Game Server"]-->Worker_A1["Worker"];
    Node_A-->Worker_A2@{ shape: processes, label: "Worker..."};
    end
    subgraph "Nodes..."
    Node_B["Game Server..."]-->Worker_B1["Worker"];
    Node_B-->Worker_B2@{ shape: processes, label: "Worker..."};
    end
    Worker_A1-->Backend["Backend<br/>(Redis Cluster/Replica)"];
    Worker_A2-->Backend;
    Worker_B1-->Backend;
    Worker_B2-->Backend;

Language and Further Performance Considerations

Actually, compared to modern languages, Lua isn’t that convenient; even many features of C++11 are more comfortable than Lua. Coupled with Lua’s limited libraries and weak maintenance, phasing out Lua for game servers is already a necessity.

Rust is a good choice: safety, performance, and modernity are all covered. But I lean more towards dynamic languages, considering flexibility, and ideally, something easy for both humans and AI to write. Yes, the answer is obvious: Python.

Actually, as long as distribution is supported, the bottleneck is entirely on Redis, making language performance less critical. Nowadays, CPU is cheaper than people; if server maintenance is also convenient, using spot instances can make it another 50% cheaper.

Python has many libraries. For example, table reading/writing can be completely wrapped with NumPy arrays, making array processing and filtering very handy. Here’s an example of cross-indexing:

array = money_table.range("last_update", left=now - 3600, right=now)
poor = array[array.money < 999]
poor.money += poor.money.mean()

Performing secondary filtering locally, selecting all rows where money < 999, and finally processing them vectorially.

This Fortran-style broadcasting/vectorization is very suitable for data processing patterns in games. Vectorization automatically enables SIMD, making it many times faster than simple for loop calculations in C. Such wrapping also solves the aforementioned inconvenience of Redis data maintenance; after all, data processing and presentation are Python’s strengths.

For server NPC AI, we previously hand-wrote behavior trees, state machines, etc. With Python, possibilities expand; real AI model approaches can be fully adopted, such as reinforcement learning Q-learning, etc.: inferring the value of past behavioral timings through future rewards, brute-forcing the optimal NPC behavior logic. Of course, Boss fights with high gameplay requirements need more tuning, as reasonable behavior doesn’t necessarily mean fun.

Redis Performance Bottlenecks and Data Structure Design

Redis performance constrains the game server’s processing capacity.

First, separate read and write loads by splitting client (subscription) and server (transaction) traffic. All client subscriptions go through read-only Redis replicas, not affecting the master Redis handling transactions; all server transaction write code runs on the master. This solution is sufficient for most games.

If considering larger-scale games (like the old single-server MMO type), the scalability of the master needs consideration. The master can be scaled using multiple Redis servers forming a Cluster, but there’s a key limitation: transactions cannot span Redis servers. Therefore, the engine must calculate correlations between tables and place related tables on the same Redis server. The large, often interrelated tables common in relational databases are difficult to decouple, necessitating the introduction of an ECS structure.

Simply put, attributes are split into small tables. For example, make the money attribute into a table named money, called a Component, storing only money and owner fields, then associate (attach) it via owner to the relevant player id, similar to Unity script components. This eliminates large tables, allowing correlations to be broken down further. Additionally, player (just an id, doesn’t physically exist) is the Entity in ECS, and the logic code is the System.

    ---
title: Component Cluster Example
---
graph TD;
    subgraph "Cluster 1"
    System_A-->Component1;
    System_B-->Component1;
    end
    subgraph "Cluster 2"
    System_D-->Component3;
    System_D-->Component2;
    System_C-->Component2;
    end

Summary

Alright, you no longer need to write any nasty locks, consider data conflicts, and can focus efficiently as if writing single-threaded code; the C in client-side MVC is also no longer needed.

These designs are actually quite lightweight; the required code probably isn’t much, just more mentally taxing. I’m currently developing and have open-sourced it: Heerozh/hetu. It will be used in the next SLG game, and contributions are welcome.

Last edited

hugo-builder · 2026-02-25 19:18 · 自动翻译 2024-05-08... · 7d577a3

Other contributors

...

Intelligent RGB LED Strip Control Solution 2.0 (¥13 + WIFI)Unity Burst Direct Call Code Example for Acceleration