Technical Details

Resiliency

Federation servers in Honse Farm maintain a distributed architecture where each server holds remote copies of critical data including user data, permissions, and syncshells. This distributed model enables the federation to operate without a central authority, but it introduces significant challenges for data consistency. When a server goes offline or becomes unreachable, it misses federation messages that contain essential commands for keeping its data synchronized with the rest of the network.

These missed messages can include user permission changes and syncshell updates, among others. Without a mechanism to handle these gaps, servers would gradually drift out of sync whenever they go offline for maintenance or upgrades, leading to inconsistent permission enforcement, outdated user information, and unexpected behavior in players' user interfaces. A server that has been offline for hours or days could return with a stale view of the federation, confusing both server owners and players who see unexpected results. This is why a resiliency mechanism is essential for maintaining the integrity and security of the entire federation.

Queueing Mechanism

To keep the federation consistent, Honse Farm implements a message queueing system that ensures critical commands are eventually delivered to all servers, even after temporary outages.

Deferrable Commands

Certain federation messages, called FederatedCommands, are marked as "Deferrable", meaning they should always be delivered eventually. These commands are critical for maintaining data consistency across the federation. When delivery to a target server fails, they are not simply discarded. Instead, the PendingMessageService captures and persists the failed commands along with their current status and error information.
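The capture step can be illustrated with a minimal Python sketch. It is not the project's implementation: the field names, the in-memory store, and the `send_command` helper are all hypothetical, chosen only to show a failed deferrable command being recorded with its status and error instead of being dropped.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class FederatedCommand:
    """Hypothetical command shape; the field names are assumptions."""
    command_id: str
    target_server: str
    payload: dict
    deferrable: bool = False


@dataclass
class PendingMessage:
    """A failed command plus the status and error recorded for it."""
    command: FederatedCommand
    status: str = "pending"
    last_error: Optional[str] = None
    attempts: int = 0
    queued_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class PendingMessageStore:
    """In-memory stand-in for whatever persistence the real service uses."""

    def __init__(self) -> None:
        self._items = {}

    def save(self, message: PendingMessage) -> None:
        self._items[message.command.command_id] = message

    def for_server(self, server: str) -> list:
        return [m for m in self._items.values() if m.command.target_server == server]


def send_command(
    command: FederatedCommand,
    transport: Callable[[FederatedCommand], None],
    store: PendingMessageStore,
) -> bool:
    """Try to deliver a command; queue it on failure only if it is deferrable."""
    try:
        transport(command)
        return True
    except Exception as exc:
        if command.deferrable:
            store.save(PendingMessage(command=command, last_error=str(exc), attempts=1))
        return False
```

In this sketch a non-deferrable command that fails is simply reported as failed, while a deferrable one is kept for a later retry.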

Automatic Retry Logic

The system includes a dedicated IHostedBackgroundService implementation called the PendingMessageRetryService. This background service continuously monitors the queue of pending messages and attempts to redeliver them using an exponential backoff strategy. This means that retry attempts start with short delays and gradually increase the time between attempts, preventing the system from overwhelming a recovering server or wasting resources on a server that remains offline.
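As a rough illustration of exponential backoff, the Python sketch below doubles the delay after each failed attempt and caps it at a maximum. The 30-second base delay and one-hour cap are assumptions made for the example; the actual values used by the PendingMessageRetryService are not stated here.

```python
def retry_delay_seconds(
    attempt: int,
    base_seconds: float = 30.0,   # assumed starting delay, not a documented value
    cap_seconds: float = 3600.0,  # assumed upper bound, not a documented value
) -> float:
    """Return the wait before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Limit the exponent so very high attempt counts cannot overflow.
    exponent = min(attempt - 1, 32)
    return min(cap_seconds, base_seconds * (2 ** exponent))


if __name__ == "__main__":
    for attempt in range(1, 9):
        print(attempt, retry_delay_seconds(attempt))
```

With these assumed values the delays run 30, 60, 120, 240 seconds and so on until they reach the one-hour cap, so a server that stays offline is probed less and less often.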

Server Announcement Protocol

When a previously offline server comes back online, it sends an announcement to other servers in the federation. Upon receiving this announcement, a server will automatically attempt to send all queued, unprocessed commands to the newly available server. This ensures that returning servers can quickly catch up on missed federation events without manual intervention.
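The catch-up step triggered by an announcement might look like the following Python sketch. The `PendingQueue` class and `handle_server_announcement` function are hypothetical names, and delivering commands in the order they were queued, stopping at the first failure, is an assumption rather than documented behavior.

```python
from collections import defaultdict
from typing import Callable


class PendingQueue:
    """Hypothetical per-server queue of undelivered commands."""

    def __init__(self) -> None:
        self._by_server = defaultdict(list)

    def enqueue(self, server: str, command: dict) -> None:
        self._by_server[server].append(command)

    def pending_for(self, server: str) -> list:
        # Return a copy so callers can iterate while entries are removed.
        return list(self._by_server.get(server, []))

    def mark_delivered(self, server: str, command: dict) -> None:
        self._by_server[server].remove(command)


def handle_server_announcement(
    server: str,
    queue: PendingQueue,
    send: Callable[[str, dict], None],
) -> int:
    """Flush queued commands to a server that just announced itself.

    Returns the number of commands delivered.
    """
    delivered = 0
    for command in queue.pending_for(server):
        try:
            send(server, command)
        except Exception:
            # The server dropped again; keep the remainder queued for retry.
            break
        queue.mark_delivered(server, command)
        delivered += 1
    return delivered
```

Anything left in the queue after a failed flush stays available to the regular retry loop.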

Message Expiration

To prevent the queue from growing indefinitely for servers that may never return, the system implements a 72-hour expiration policy. If a server has not announced itself after 72 hours and messages still cannot be delivered, those messages are removed from the queue. This balances the need for eventual consistency with practical resource constraints.
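A minimal Python sketch of the expiration check follows. It assumes the 72-hour window is measured from the time each message was queued, which the description above does not specify, and the `queued_at` key is a hypothetical field name.

```python
from datetime import datetime, timedelta, timezone

MESSAGE_TTL = timedelta(hours=72)  # expiration window described above


def is_expired(queued_at: datetime, now: datetime, ttl: timedelta = MESSAGE_TTL) -> bool:
    """True once a message has been waiting for the full TTL (assumed: since queueing)."""
    return now - queued_at >= ttl


def purge_expired(messages: list, now: datetime) -> list:
    """Return only the messages that are still within the expiration window."""
    return [m for m in messages if not is_expired(m["queued_at"], now)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    queue = [
        {"id": "recent", "queued_at": now - timedelta(hours=1)},
        {"id": "stale", "queued_at": now - timedelta(hours=80)},
    ]
    print([m["id"] for m in purge_expired(queue, now)])
```

A periodic cleanup pass of this kind keeps the queue bounded even when a target server never comes back.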

Eventually Consistent Model

The overall design philosophy follows the "eventual consistency" model from distributed database theory. While the federation may not be perfectly synchronized at every moment, the queueing mechanism is designed so that all servers converge to the same state given enough time and network stability, provided an offline server returns before its queued messages expire. This approach aims to balance system reliability, performance, and operational simplicity for a decentralized federation architecture.
