Worker Resiliency & Debouncing

Struktural uses a fleet of asynchronous background workers (derived from .NET's BackgroundService) to handle high-throughput operations. These workers guarantee At-Least-Once Delivery and Eventual Consistency while protecting the primary SQL database from connection pool exhaustion.

1. The Error Flush Service (Circuit Breaking)

When a massive scheduled workflow triggers a Fan-Out (e.g., processing 100,000 invoices), a poorly written C# script or a downed external API could cause all 100,000 executions to fail simultaneously.

If the engine attempted to log 100,000 failure traces synchronously to the Struktural_Sys_WorkflowInstance table, the burst of individual writes could exhaust the connection pool, cause heavy lock contention or deadlocks, and take down the entire application.

The Channel Debouncer

Struktural mitigates this using the ErrorFlushService:

  1. Buffering: When a workflow node fails, instead of writing to the DB directly, the thread pushes the error payload into an unbounded, thread-safe System.Threading.Channels.Channel.
  2. Debounce Window: The ErrorFlushService reads from this channel, waiting for either 100 errors to accumulate or a 2-second time window to pass.
  3. Bulk Upsert: It groups the errors by Tenant (AppId) and performs a single bulk upsert into the database, so a batch costs one write operation instead of one per error.
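
The three steps above can be sketched as follows. This is a minimal illustration, not the ErrorFlushService itself: it is written in Python with asyncio.Queue standing in for the .NET channel, and the names collect_batch, group_by_tenant, flush_once, bulk_upsert, and the app_id key are all hypothetical. It also assumes the 2-second window starts when the first error of a batch arrives, which the description above does not specify.

```python
import asyncio
from collections import defaultdict

BATCH_SIZE = 100      # flush once this many errors have accumulated
WINDOW_SECONDS = 2.0  # or once this much time has passed (assumed: since the first error)


async def collect_batch(queue, batch_size=BATCH_SIZE, window=WINDOW_SECONDS):
    """Wait for one error, then keep reading until the batch is full or the window closes."""
    batch = [await queue.get()]  # block until there is something to flush
    loop = asyncio.get_running_loop()
    deadline = loop.time() + window
    while len(batch) < batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break  # window elapsed with a partial batch
    return batch


def group_by_tenant(batch):
    """Group error payloads by tenant (the hypothetical 'app_id' key)."""
    groups = defaultdict(list)
    for error in batch:
        groups[error["app_id"]].append(error)
    return dict(groups)


async def flush_once(queue, bulk_upsert):
    """One debounce cycle: buffer, group, then hand everything to a single bulk write."""
    batch = await collect_batch(queue)
    await bulk_upsert(group_by_tenant(batch))  # bulk_upsert is a caller-supplied coroutine
    return len(batch)
```

With 250 buffered errors, three cycles would flush 100, 100, and then the remaining 50 once the window expires, so the database sees three writes rather than 250.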

Deferred Acknowledgments (ACKs)

To guarantee that no error logs are lost if the pod crashes before the buffer flushes, the worker passes a TaskCompletionSource into the channel. The original Kafka/DB polling thread awaits this completion source. The message is only acknowledged (and the Kafka offset advanced) after the ErrorFlushService successfully commits the batch to the database. If the pod crashes first, the message is never acknowledged, so it is redelivered and processed again; this is what makes the delivery at-least-once rather than at-most-once.
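
The handshake can be illustrated with a small sketch. It is written in Python, where an asyncio Future plays the role of the TaskCompletionSource; the functions report_failure, flush_batch, handle_message, and the commit callback are hypothetical names, and the acked list is only a stand-in for advancing the Kafka offset.

```python
import asyncio


async def report_failure(queue, payload):
    """Called by the consuming worker: enqueue the error and wait until it is durable."""
    done = asyncio.get_running_loop().create_future()  # stands in for TaskCompletionSource
    await queue.put((payload, done))
    await done  # raises if the flush failed, so the caller never acknowledges


async def flush_batch(queue, commit):
    """Drain what is currently buffered, commit it, then release the waiting workers."""
    items = [await queue.get()]
    while not queue.empty():
        items.append(queue.get_nowait())
    payloads = [payload for payload, _ in items]
    try:
        await commit(payloads)  # hypothetical stand-in for the bulk upsert
    except Exception as exc:
        for _, done in items:
            if not done.done():
                done.set_exception(exc)  # propagate the failure to every waiter
        return 0
    for _, done in items:
        if not done.done():
            done.set_result(None)  # the batch is committed; waiters may now acknowledge
    return len(items)


async def handle_message(queue, message, acked):
    """Acknowledge (advance the offset) only after the error has been committed."""
    await report_failure(queue, {"offset": message, "error": "node failed"})
    acked.append(message)  # stand-in for committing the Kafka offset
```

The design choice worth noting is that the acknowledgment sits after the await: a crash or a failed commit leaves the message unacknowledged, at the cost of holding the consumer until the next flush.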

2. Workflow Orchestration Workers

Workflow processing is split into two specialized services to balance immediate execution with long-term wait states.

Workflow Worker Service (PUSH Model)

The WorkflowWorkerService acts as the primary consumer.

Workflow Resume Service (PULL Model)

Workflows that hit a WaitByDuration or WaitUntilDate node are suspended to disk (Status = "Suspended").

3. ACL Propagation Worker (Eventual Consistency)

Struktural supports Hierarchical Row-Level Security (Materialized ACLs). For example, if an Administrator revokes a user's access to a Root Folder, that revocation must cascade down to the thousands of Documents inside that folder.

Performing this recalculation synchronously during the HTTP PUT request could keep the request open, and the user's browser waiting, for minutes.

Instead, the system relies on the AclPropagationWorker:

  1. Event Publishing: When the parent entity is modified, the WorkflowEventProcessor detects the change in the Foreign Key and publishes a RecalculateAcl event to the struktural-acl-{appId} topic.
  2. Fan-Out: The worker picks up the event, recalculates the effective security string for the parent, and then recursively queues update events for all child records.
  3. Eventual Consistency: The UI returns immediately. The child records update asynchronously in the background. Until the background process reaches a given child record, the old security rules remain in effect for that record.
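
The fan-out and its consistency window can be sketched as below. This is an illustration in Python under simplifying assumptions: an in-memory deque stands in for the struktural-acl-{appId} topic, each record simply inherits its parent's security string (the real recalculation is not described here), and process_next, drain, and the sample folder names are hypothetical.

```python
from collections import deque


def process_next(events, acls, children):
    """Handle one RecalculateAcl event and queue follow-up events for the children."""
    record_id, acl = events.popleft()
    acls[record_id] = acl  # assumption: a record simply inherits its parent's ACL
    for child_id in children.get(record_id, []):
        events.append((child_id, acl))  # fan out one event per child record
    return record_id


def drain(events, acls, children):
    """Run the worker until the queue is empty; returns the ids in processing order."""
    processed = []
    while events:
        processed.append(process_next(events, acls, children))
    return processed


if __name__ == "__main__":
    children = {"root": ["docs"], "docs": ["a.pdf", "b.pdf"]}
    acls = {key: "alice,bob" for key in ["root", "docs", "a.pdf", "b.pdf"]}
    events = deque([("root", "alice")])  # bob's access to the root folder was revoked

    process_next(events, acls, children)
    print(acls["root"], "|", acls["a.pdf"])  # the child is still stale at this point
    drain(events, acls, children)
    print(acls["a.pdf"])  # updated once its own event has been processed
```

After the first event only the root folder carries the new rule; the documents below it keep the old one until their own events are handled, which is the window described in step 3.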