Worker Resiliency & Debouncing
Struktural uses a fleet of asynchronous background workers (each derived from .NET's BackgroundService) to handle high-throughput operations. These workers provide At-Least-Once Delivery and Eventual Consistency while protecting the primary SQL database from connection pool exhaustion.
1. The Error Flush Service (Circuit Breaking)
When a massive scheduled workflow triggers a Fan-Out (e.g., processing 100,000 invoices), a poorly written C# script or a downed external API could cause all 100,000 executions to fail simultaneously.
If the engine attempted to log 100,000 failure traces synchronously to the Struktural_Sys_WorkflowInstance table, the burst of concurrent writes could exhaust the connection pool, cause lock contention or deadlocks, and take down the entire application.
The Channel Debouncer
Struktural mitigates this using the ErrorFlushService:
- Buffering: When a workflow node fails, instead of writing to the DB directly, the thread pushes the error payload into an unbounded, thread-safe System.Threading.Channels.Channel.
- Debounce Window: The ErrorFlushService reads from this channel, waiting for either 100 errors to accumulate or a 2-second time window to pass.
- Bulk Upsert: It groups the errors by Tenant (AppId) and performs a single, highly optimized bulk upsert into the database.
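The buffering and debounce steps above can be sketched roughly as follows. This is a minimal illustration, not the actual ErrorFlushService: the ReadBatchAsync helper, the (AppId, Message) payload shape, and the choice to start the 2-second window when the first error arrives are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Unbounded, thread-safe buffer. The (AppId, Message) payload shape is an assumption.
var errors = Channel.CreateUnbounded<(string AppId, string Message)>();

// Simulate three failing workflow nodes across two tenants.
errors.Writer.TryWrite(("tenant-a", "HTTP 503 from billing API"));
errors.Writer.TryWrite(("tenant-b", "Script threw NullReferenceException"));
errors.Writer.TryWrite(("tenant-a", "HTTP 503 from billing API"));

// Fewer than 100 errors are buffered, so this returns once the 2-second window elapses.
var batch = await ReadBatchAsync(errors.Reader, 100, TimeSpan.FromSeconds(2), CancellationToken.None);

// One bulk upsert per tenant; the database call itself is left out of this sketch.
foreach (var tenantGroup in batch.GroupBy(e => e.AppId))
    Console.WriteLine($"{tenantGroup.Key}: {tenantGroup.Count()} error(s) in one bulk upsert");

// Hypothetical helper: returns when maxBatch items are collected or the window elapses.
static async Task<List<(string AppId, string Message)>> ReadBatchAsync(
    ChannelReader<(string AppId, string Message)> reader,
    int maxBatch, TimeSpan window, CancellationToken stoppingToken)
{
    var items = new List<(string AppId, string Message)>();

    // Idle until the first error arrives; false means the channel was completed.
    if (!await reader.WaitToReadAsync(stoppingToken)) return items;

    // The debounce window starts once the first error is available.
    using var windowCts = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
    windowCts.CancelAfter(window);
    try
    {
        do
        {
            // Drain everything that is already buffered, up to the batch limit.
            while (items.Count < maxBatch && reader.TryRead(out var entry))
                items.Add(entry);
        }
        while (items.Count < maxBatch && await reader.WaitToReadAsync(windowCts.Token));
    }
    catch (OperationCanceledException) when (!stoppingToken.IsCancellationRequested)
    {
        // Window elapsed: fall through and flush what has accumulated.
    }
    return items;
}
```

In a long-running worker this helper would be called in a loop until the stopping token is cancelled. Cancellation from the stopping token is deliberately not swallowed, so a shutdown surfaces as an exception rather than a silent partial flush.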
Deferred Acknowledgments (ACKs)
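To guarantee that no error logs are lost if the pod crashes before the buffer flushes, the worker passes a TaskCompletionSource into the channel alongside the error payload. The original Kafka/DB polling thread awaits the task exposed by this completion source. The message is only acknowledged (and the Kafka offset advanced) after the ErrorFlushService successfully commits the batch to the database. If the pod dies first, the message is never acknowledged and is redelivered, which is what gives the pipeline its At-Least-Once guarantee.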
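A minimal sketch of the deferred acknowledgment pattern is shown below. The ReportErrorAsync and FlushLoopAsync names are hypothetical, an in-memory list stands in for the database, and the flusher commits one entry at a time instead of a batch to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var pending = Channel.CreateUnbounded<(string Message, TaskCompletionSource Ack)>();
var committedRows = new List<string>(); // stand-in for the database

// Flusher side: start draining the channel in the background.
var flusher = FlushLoopAsync(pending.Reader, committedRows);

// Consumer side: the await returns only after the error has been committed.
await ReportErrorAsync(pending.Writer, "node 7 failed: timeout");
Console.WriteLine($"Committed {committedRows.Count} row(s); safe to advance the offset.");

pending.Writer.Complete();
await flusher;

// Hypothetical consumer-side helper: enqueue the error, then wait for the commit.
static async Task ReportErrorAsync(
    ChannelWriter<(string Message, TaskCompletionSource Ack)> writer, string message)
{
    // RunContinuationsAsynchronously keeps the consumer's continuation
    // from running inline on the flusher's thread.
    var ack = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
    await writer.WriteAsync((message, ack));
    await ack.Task; // completes only after the flusher has committed this error
}

// Hypothetical flusher: commit, then release (or fail) the waiting consumer.
static async Task FlushLoopAsync(
    ChannelReader<(string Message, TaskCompletionSource Ack)> reader, List<string> store)
{
    await foreach (var entry in reader.ReadAllAsync())
    {
        try
        {
            store.Add(entry.Message);      // placeholder for the bulk upsert
            entry.Ack.TrySetResult();      // commit succeeded: release the consumer
        }
        catch (Exception ex)
        {
            entry.Ack.TrySetException(ex); // consumer sees the failure and does not ack
        }
    }
}
```

Because the consumer never acknowledges a message whose task has not completed, a crash between enqueue and commit leads to redelivery rather than a lost error log.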
2. Workflow Orchestration Workers
Workflow processing is split into two specialized services to balance immediate execution with long-term wait states.
Workflow Worker Service (PUSH Model)
The WorkflowWorkerService acts as the primary consumer.
- It subscribes reactively to the Tenant's Event Bus (struktural-wf-{appId}).
- It continuously monitors the EngineBootstrapper to see if new tenants are created or hot-reloaded. If a tenant comes online, it instantly spins up a subscription. If a tenant is deleted or degraded, it cancels the subscription cleanly to free up resources.
- It handles execution routing, dispatching the payloads into the WorkflowDispatcher.
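The subscribe and unsubscribe behavior can be pictured as a reconciliation step between the set of tenants currently online and the set of active subscriptions. The sketch below assumes the worker can obtain a snapshot of online tenant IDs; how the EngineBootstrapper actually exposes that information is not described here, and the Reconcile helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// One CancellationTokenSource per active tenant subscription.
var subscriptions = new Dictionary<string, CancellationTokenSource>();

Reconcile(subscriptions, new[] { "acme", "globex" });    // both tenants come online
Reconcile(subscriptions, new[] { "globex", "initech" }); // acme removed, initech added

Console.WriteLine("active: " + string.Join(", ", subscriptions.Keys.OrderBy(k => k)));

// Hypothetical helper: align the active subscriptions with the online tenants.
static void Reconcile(
    Dictionary<string, CancellationTokenSource> active, IEnumerable<string> onlineTenants)
{
    var online = new HashSet<string>(onlineTenants);

    // Cancel subscriptions for tenants that were deleted or degraded.
    foreach (var appId in active.Keys.Where(id => !online.Contains(id)).ToList())
    {
        active[appId].Cancel();
        active[appId].Dispose();
        active.Remove(appId);
        Console.WriteLine($"unsubscribed struktural-wf-{appId}");
    }

    // Start subscriptions for tenants that just came online.
    foreach (var appId in online.Where(id => !active.ContainsKey(id)))
    {
        active[appId] = new CancellationTokenSource();
        // A real worker would start its consume loop here, passing active[appId].Token.
        Console.WriteLine($"subscribed struktural-wf-{appId}");
    }
}
```

Giving each tenant its own token means one tenant's subscription can be torn down without disturbing the others.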
Workflow Resume Service (PULL Model)
Workflows that hit a WaitByDuration or WaitUntilDate node are suspended to disk (Status = "Suspended").
- The WorkflowResumeService wakes up every 30 seconds to poll the Struktural_Sys_WorkflowInstance table.
- It looks for instances where ResumeAt <= DateTime.UtcNow.
- Rather than executing them directly, it simply pushes their IDs back into the Event Bus, allowing the WorkflowWorkerService to pick them up securely and distribute the load across multiple cluster nodes.
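One polling pass can be sketched as a filter followed by a re-queue. The example below runs against an in-memory list; in the real service the filter would presumably be a SQL WHERE clause on the Struktural_Sys_WorkflowInstance table. The row shape, the FindDueInstances helper, and the list standing in for the Event Bus are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var now = new DateTime(2024, 1, 1, 12, 0, 0, DateTimeKind.Utc);

// Hypothetical rows; only Status and ResumeAt matter for the poll.
var instances = new List<(Guid Id, string Status, DateTime? ResumeAt)>
{
    (Guid.NewGuid(), "Suspended", now.AddMinutes(-5)), // due
    (Guid.NewGuid(), "Suspended", now.AddHours(1)),    // not yet due
    (Guid.NewGuid(), "Running",   (DateTime?)null),    // not suspended
};

var published = new List<Guid>(); // stand-in for the Event Bus
foreach (var id in FindDueInstances(instances, now))
    published.Add(id); // the resume service only re-queues; it never executes

Console.WriteLine($"Re-queued {published.Count} instance(s)");

// Hypothetical helper: IDs of suspended instances whose ResumeAt has passed.
static IEnumerable<Guid> FindDueInstances(
    IEnumerable<(Guid Id, string Status, DateTime? ResumeAt)> rows, DateTime utcNow) =>
    rows.Where(r => r.Status == "Suspended" && r.ResumeAt <= utcNow) // null ResumeAt never matches
        .Select(r => r.Id);
```

Publishing only the IDs keeps the poller cheap and lets whichever cluster node consumes the event load the instance and run it.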
3. ACL Propagation Worker (Eventual Consistency)
Struktural supports Hierarchical Row-Level Security (Materialized ACLs). For example, if an Administrator revokes a user's access to a Root Folder, that revocation must cascade down to the thousands of Documents inside that folder.
Performing this recalculation synchronously during the HTTP PUT request would block the user's browser for minutes.
Instead, the system relies on the AclPropagationWorker:
- Event Publishing: When the parent entity is modified, the WorkflowEventProcessor detects the change in the Foreign Key and publishes a RecalculateAcl event to the struktural-acl-{appId} topic.
- Fan-Out: The worker picks up the event, recalculates the effective security string for the parent, and then recursively queues update events for all child records.
- Eventual Consistency: The UI returns immediately. The child records update asynchronously in the background. Until the background process finishes, the old security rules remain in effect.
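The fan-out step can be illustrated with an in-memory queue standing in for the struktural-acl-{appId} topic. This sketch assumes the simplest possible inheritance rule (a child takes on its parent's effective security string); the real recalculation logic, the event shape, and the Propagate helper are not taken from Struktural and are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hierarchy: parent record ID -> child record IDs.
var children = new Dictionary<string, string[]>
{
    ["root"] = new[] { "folder-1", "doc-a" },
    ["folder-1"] = new[] { "doc-b", "doc-c" },
};

// Effective ACLs before the change: everything still grants access to alice.
var effectiveAcl = new Dictionary<string, string>
{
    ["root"] = "admins;alice", ["folder-1"] = "admins;alice",
    ["doc-a"] = "admins;alice", ["doc-b"] = "admins;alice", ["doc-c"] = "admins;alice",
};

// The administrator revokes alice on the root folder: one initial event is queued.
var topic = new Queue<(string RecordId, string Acl)>();
topic.Enqueue(("root", "admins"));

// Until this loop runs, the child records still carry the old rules.
var handledCount = Propagate(topic, children, effectiveAcl);
Console.WriteLine($"{handledCount} record(s) updated; doc-c is now '{effectiveAcl["doc-c"]}'");

// Hypothetical worker loop: apply the ACL to one record, then queue its children.
static int Propagate(
    Queue<(string RecordId, string Acl)> events,
    IReadOnlyDictionary<string, string[]> tree,
    IDictionary<string, string> effective)
{
    var handled = 0;
    while (events.Count > 0)
    {
        var (recordId, newAcl) = events.Dequeue();
        effective[recordId] = newAcl; // recalculate this record
        handled++;

        if (!tree.TryGetValue(recordId, out var kids)) continue; // leaf record
        foreach (var kid in kids)
            events.Enqueue((kid, newAcl)); // fan out one event per child
    }
    return handled;
}
```

Queuing one event per child, rather than recursing inside a single handler, keeps each unit of work small and lets several workers share a deep hierarchy.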