Experimental

Swarm

Distributed Multi-Agent Task Orchestration

Submit a natural-language goal. Swarm builds a dependency-aware task plan, fans work out across your existing agents in parallel, and adapts in real time when things go wrong.

What Swarm Does

FabrCore Swarm is a hierarchical multi-agent orchestration system built by the Vulcan365 AI team. It manages three things:

Planning

Decomposes a goal into a directed acyclic graph of tasks, each assigned to the best available agent.

Execution

Dispatches tasks in dependency-resolved waves, passing results forward as context for downstream work.

Recovery

Detects failures, consults experts, retries, skips, or replans mid-execution when the plan hits a wall.

Swarm is not a standalone product. It is a library that sits on top of FabrCore and treats your pre-existing agents as the workforce. You bring the domain experts; Swarm brings the coordination. Client agents do not need to know they are part of a swarm — no SDK to implement, no interface to adopt, no protocol to follow.

The Swarm Constellation

Six permanent agent roles form the coordination layer. None performs domain work; they plan, delegate, monitor, and recover.

Agent | Role | Uses LLM?
Orchestrator | Owns the full plan lifecycle. Receives the user's goal, gates approval, drives execution, handles failure decisions, and enforces termination policy. | Yes, for failure reasoning
Planner | Builds and revises the task plan. Discovers available agents, creates tasks, links dependencies, and optionally consults subject-matter experts before committing. | Yes, for plan construction
Supervisor | Manages task dispatch. Receives batches of ready tasks, pushes them to the correct workers, collects results, and escalates roadblocks. | No, pure state machine
Workers | One per client agent, permanently paired with a domain agent. Delegates tasks, validates completion, and manages shared-state subscriptions. | Yes, for delegation and validation
Blackboard | Per-plan shared key-value store with push-event subscriptions. Workers read and write results here; downstream tasks consume them as input context. | No, pure state machine
Factory | Registry discovery. Exposes the live agent registry so the planner can see which agents are available and what they can do. | No

How It Works

Phase 1: Planning

When a user submits a goal, the planner queries the live agent registry to discover which agents are online, what capabilities they advertise, what tools they have loaded, and what model they run. Capabilities are projected in real time from class-level metadata and runtime health state, so the planner works from the agents' current state rather than a stale snapshot.
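One way to picture assigning each task to "the best available agent" is a filter over a registry snapshot: keep agents that are healthy and advertise the needed capability, then prefer the one whose loaded tools best match the task. The sketch below is hypothetical throughout; `AgentInfo`, `pick_agent`, the field names, and the tool-overlap ranking are illustrative assumptions, not Swarm's registry schema or selection logic.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Optional


@dataclass
class AgentInfo:
    # Hypothetical projection of one registry entry.
    alias: str
    capabilities: set[str]
    tools: set[str] = field(default_factory=set)
    model: str = ""
    healthy: bool = True  # runtime health state


def pick_agent(registry: list[AgentInfo], capability: str,
               wanted_tools: AbstractSet[str] = frozenset()) -> Optional[AgentInfo]:
    """Return the healthy agent that advertises `capability` and has the most
    of `wanted_tools` loaded, or None if no agent qualifies."""
    candidates = [a for a in registry
                  if a.healthy and capability in a.capabilities]
    if not candidates:
        return None
    # max() returns the first of several equal scores, so registry order
    # breaks ties.
    return max(candidates, key=lambda a: len(a.tools & wanted_tools))


# Illustrative registry snapshot (aliases are made up).
registry = [
    AgentInfo("sql-agent", {"query-data"}, {"run_sql"}),
    AgentInfo("sql-backup", {"query-data"}, {"run_sql", "export_csv"}, healthy=False),
    AgentInfo("report-agent", {"write-report"}, {"render_pdf"}),
]
print(pick_agent(registry, "query-data", {"run_sql", "export_csv"}).alias)  # sql-agent
```

The unhealthy backup agent has the better tool match but is excluded before ranking, which is the point of folding health state into discovery.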

With that information, the planner constructs a task graph: each task has a description, an assigned agent, and optional dependencies. Dependencies are data flows — completed results pass forward as context. Tasks with no unmet dependencies form a wave and execute in parallel.
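The wave rule described here is a level-by-level topological sort: a task joins the first wave after all of its dependencies have finished. A minimal Python sketch, assuming a hypothetical `Task` record with a list of dependency ids (not Swarm's actual plan model):

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task record: description, assigned agent alias, and the
    # ids of upstream tasks whose results this task consumes.
    id: str
    description: str
    agent: str
    depends_on: list[str] = field(default_factory=list)


def partition_into_waves(tasks: list[Task]) -> list[list[str]]:
    """Group task ids into waves. Raises ValueError on an unknown
    dependency or a cycle (i.e. the graph is not a DAG)."""
    by_id = {t.id: t for t in tasks}
    for t in tasks:
        for dep in t.depends_on:
            if dep not in by_id:
                raise ValueError(f"{t.id} depends on unknown task {dep}")
    done: set[str] = set()
    waves: list[list[str]] = []
    remaining = [t.id for t in tasks]
    while remaining:
        # Ready = every dependency finished in an earlier wave.
        wave = [tid for tid in remaining
                if all(d in done for d in by_id[tid].depends_on)]
        if not wave:
            raise ValueError(f"cycle among tasks: {remaining}")
        waves.append(wave)
        done.update(wave)
        remaining = [tid for tid in remaining if tid not in done]
    return waves


plan = [
    Task("fetch", "Pull Q3 sales rows", "sql-agent"),
    Task("chart", "Chart sales by region", "viz-agent", ["fetch"]),
    Task("summary", "Summarize the numbers", "writer-agent", ["fetch"]),
    Task("report", "Assemble the report", "writer-agent", ["chart", "summary"]),
]
print(partition_into_waves(plan))
# [['fetch'], ['chart', 'summary'], ['report']]
```

The chart and summary tasks share the middle wave because both depend only on the fetched data, so they can run in parallel.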

Once the plan is built, it goes to the user for approval. Nothing executes until the user says go.

Phase 2: Execution

After approval, the orchestrator provisions a per-plan blackboard and starts a recurring drive loop. Each cycle identifies ready tasks, propagates context from upstream results, dispatches through the supervisor to workers, and collects results.
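The drive loop can be sketched as a bounded loop over a dependency map. In this hypothetical version, `dispatch` stands in for the supervisor, worker, and client chain, ready tasks are dispatched one after another for simplicity, and `max_iterations` plays the role of the drive-loop iteration limit. None of these names come from Swarm itself.

```python
from typing import Callable

# Hypothetical: dispatch(task_id, upstream_results) -> result.
Dispatch = Callable[[str, dict[str, str]], str]


def drive(deps: dict[str, list[str]], dispatch: Dispatch,
          max_iterations: int = 100) -> dict[str, str]:
    """Run every task in `deps` (task id -> upstream ids) to completion."""
    results: dict[str, str] = {}
    iterations = 0
    while len(results) < len(deps):
        if iterations >= max_iterations:
            raise RuntimeError("drive-loop iteration limit reached")
        iterations += 1
        # 1. Identify ready tasks: not done yet, all upstream results present.
        ready = [t for t, ups in deps.items()
                 if t not in results and all(u in results for u in ups)]
        if not ready:
            raise RuntimeError("no task is ready; the plan is stuck")
        for t in ready:
            # 2. Propagate upstream results as context.
            context = {u: results[u] for u in deps[t]}
            # 3-4. Dispatch and collect the result.
            results[t] = dispatch(t, context)
    return results


def fake_dispatch(task: str, context: dict[str, str]) -> str:
    # Echo which upstream results arrived as context.
    return f"{task}({', '.join(sorted(context))})"


print(drive({"fetch": [], "chart": ["fetch"], "report": ["chart"]}, fake_dispatch))
# {'fetch': 'fetch()', 'chart': 'chart(fetch)', 'report': 'report(chart)'}
```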

Workers delegate to their paired client agents. The client runs its own tools and logic — completely unaware that it is part of a swarm. Workers then validate that the response actually accomplishes the task, re-delegating if the result is partial or analysis-only.
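A worker's validate-then-re-delegate step might look like the following sketch. The function name, the fixed wording of the clarified instructions, and the `max_attempts` cap (standing in for the worker delegation guard) are assumptions; in Swarm the completeness check is made by the worker's LLM, which here is replaced by any predicate you pass in.

```python
from typing import Callable


def delegate_with_validation(task: str,
                             client: Callable[[str], str],
                             is_complete: Callable[[str, str], bool],
                             max_attempts: int = 3) -> tuple[bool, str]:
    """Send `task` to the paired client, re-delegating with clarified
    instructions until the response passes `is_complete` or attempts run out."""
    instructions = task
    response = ""
    for _ in range(max_attempts):
        response = client(instructions)
        if is_complete(task, response):
            return True, response
        # Re-delegate: restate the goal and say why the last answer fell short.
        instructions = (task + "\n\nYour previous answer was partial or analysis "
                        "only. Carry out the task itself and return the result.")
    return False, response  # attempts exhausted; the caller escalates


answers = iter(["Plan: I would query the sales table.", "rows: 42"])
ok, result = delegate_with_validation(
    "Count Q3 sales rows",
    client=lambda instructions: next(answers),
    is_complete=lambda task, response: response.startswith("rows:"),
)
print(ok, result)  # True rows: 42
```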

All of this happens across parallel waves. While one set of tasks executes, the orchestrator is already identifying the next wave.

Phase 3: Recovery

Recovery happens at five levels, escalating only when cheaper options are exhausted:

  1. Worker re-delegation — if a client returns a partial result, the worker tries again with clearer instructions.
  2. SME consultation — before escalating, workers and the orchestrator can consult subject-matter expert agents.
  3. Supervisor escalation — tracks roadblock fingerprints to prevent retrying the same failing approach.
  4. Orchestrator reasoning — the orchestrator consults its own LLM to decide: retry, skip, replan, or ask the user.
  5. Human escalation — when automated recovery is exhausted, the user receives a specific, contextualized question.

The orchestrator prefers retries and skips over replanning, and replanning over asking the user — minimizing human interruption while guaranteeing stuck situations get resolved.
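That preference order, combined with roadblock fingerprinting, can be approximated by a deterministic policy. This is a hypothetical stand-in for the orchestrator's LLM-based decision: the fingerprint format (task id plus the error text with digits masked), the `optional` flag used to decide whether skipping is acceptable, and the retry and replan caps are illustrative assumptions.

```python
import hashlib
import re


def fingerprint(task_id: str, error: str) -> str:
    # Mask digits so "timeout after 30s" and "timeout after 31s" match.
    normalized = re.sub(r"\d+", "#", error.strip().lower())
    return hashlib.sha256(f"{task_id}|{normalized}".encode()).hexdigest()[:16]


class RecoveryPolicy:
    """Try the cheapest option first; never retry an identical roadblock."""

    def __init__(self, max_retries: int = 2, max_replans: int = 1) -> None:
        self.seen: set[str] = set()
        self.retries: dict[str, int] = {}
        self.replans = 0
        self.max_retries = max_retries
        self.max_replans = max_replans

    def decide(self, task_id: str, error: str, optional: bool) -> str:
        fp = fingerprint(task_id, error)
        repeated = fp in self.seen
        self.seen.add(fp)
        if not repeated and self.retries.get(task_id, 0) < self.max_retries:
            self.retries[task_id] = self.retries.get(task_id, 0) + 1
            return "retry"
        if optional:
            return "skip"
        if self.replans < self.max_replans:
            self.replans += 1
            return "replan"
        return "ask_user"  # automated recovery exhausted
```

A second timeout on the same task produces the same fingerprint, so the policy moves straight past "retry" to the next option instead of repeating a failing approach.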

Key Capabilities

Zero-Registration Discovery

Client agents register with FabrCore normally. Add the alias to a swarm definition and you’re done. The planner discovers capabilities from live metadata and health state automatically.

DAG Parallel Execution

Tasks form a directed acyclic graph. The dependency resolver partitions the plan into waves that execute in parallel with automatic context propagation between them.
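Because every task in a wave depends only on earlier waves, a wave can be awaited as one concurrent batch. A sketch with `asyncio`, assuming a hypothetical async `worker(task_id, context)` callable and precomputed waves; it is not Swarm's dispatcher.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical async worker: (task id, upstream results) -> result.
Worker = Callable[[str, dict[str, str]], Awaitable[str]]


async def run_waves(waves: list[list[str]], deps: dict[str, list[str]],
                    worker: Worker) -> dict[str, str]:
    results: dict[str, str] = {}
    for wave in waves:
        # Everything this wave needs came from earlier waves, so each task's
        # context can be built now and the whole wave run concurrently.
        outputs = await asyncio.gather(
            *(worker(t, {d: results[d] for d in deps[t]}) for t in wave))
        # gather() returns outputs in the same order as its arguments.
        results.update(zip(wave, outputs))
    return results


async def fake_worker(task: str, context: dict[str, str]) -> str:
    await asyncio.sleep(0.01)  # simulate remote work
    return f"{task}<-{','.join(sorted(context))}"


deps = {"fetch": [], "chart": ["fetch"], "summary": ["fetch"],
        "report": ["chart", "summary"]}
waves = [["fetch"], ["chart", "summary"], ["report"]]
print(asyncio.run(run_waves(waves, deps, fake_worker))["report"])
# report<-chart,summary
```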

Mid-Execution Replanning

When the original plan fails, the planner receives full context — what succeeded, what failed, why — and revises the task graph. Execution resumes from where it left off.
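Resuming from where execution left off amounts to carrying completed results into the revised graph. A minimal sketch, assuming a hypothetical `ReplanRequest` payload and a revised plan expressed as a dependency map; how the planner actually produces the revision is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ReplanRequest:
    # Hypothetical payload the orchestrator hands to the planner.
    goal: str
    succeeded: dict[str, str] = field(default_factory=dict)  # task id -> result
    failed: dict[str, str] = field(default_factory=dict)     # task id -> reason


def resume_after_replan(request: ReplanRequest,
                        revised_deps: dict[str, list[str]]
                        ) -> tuple[dict[str, str], list[str]]:
    """Carry over results for completed tasks that the revised plan keeps,
    and list the tasks that still have to run."""
    kept = {t: r for t, r in request.succeeded.items() if t in revised_deps}
    pending = [t for t in revised_deps if t not in kept]
    return kept, pending


request = ReplanRequest(
    goal="Quarterly sales report",
    succeeded={"fetch": "rows: 42"},
    failed={"chart": "charting tool unavailable"},
)
# Suppose the planner swaps the failed chart for a plain table.
revised = {"fetch": [], "table": ["fetch"], "report": ["table"]}
print(resume_after_replan(request, revised))
# ({'fetch': 'rows: 42'}, ['table', 'report'])
```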

Completion Validation

Workers validate client responses before completing a task. Partial or analysis-only responses trigger re-delegation with clarified instructions.

Blackboard Coordination

Per-plan shared key-value store with push-event subscriptions. Workers get notified immediately when upstream results are available — no polling.
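A blackboard with push-event subscriptions reduces to a dictionary plus per-key listener lists. The `Blackboard` class below is a single-process sketch with hypothetical method names; Swarm's blackboard is a persisted per-plan component, which this does not attempt to model.

```python
from collections import defaultdict
from typing import Any, Callable

Listener = Callable[[str, Any], None]


class Blackboard:
    """Key-value store that pushes each write to the key's subscribers."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._subs: defaultdict[str, list[Listener]] = defaultdict(list)

    def subscribe(self, key: str, listener: Listener) -> None:
        self._subs[key].append(listener)
        if key in self._data:  # value already written: deliver immediately
            listener(key, self._data[key])

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        for listener in list(self._subs[key]):  # copy: listeners may subscribe
            listener(key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


received = []
bb = Blackboard()
bb.subscribe("q3_rows", lambda key, value: received.append((key, value)))
bb.put("q3_rows", 42)
print(received)  # [('q3_rows', 42)]
```

Delivering an existing value on subscribe covers the case where a downstream worker subscribes after its upstream task has already written its result.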

Full State Persistence

All swarm state persists through Orleans grain persistence. If the host restarts mid-execution, the swarm resumes from where it left off.

Termination Guarantees

Autonomous systems must terminate. Nine configurable guards ensure that every plan terminates, whether it succeeds or fails.

Guard | What It Limits
Wall-clock timeout | Total plan execution time
Per-task timeout | Individual task duration
Task retry limit | Maximum attempts per task
Drive-loop iteration limit | Maximum execution cycles per plan
Roadblock limit | Maximum total roadblocks before failure
Replan attempt limit | Maximum plan revisions
Worker delegation guard | Maximum re-delegation attempts per task
Completion validation guard | Maximum validation attempts
Roadblock fingerprinting | Repeated identical roadblocks are escalated, not retried
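The plan-wide guards in the table can be expressed as limits checked against running counters at the start of each drive cycle. The sketch covers four of the nine guards; the field and guard names are hypothetical, and the numbers in the example are placeholders rather than Swarm defaults.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanGuards:
    # Hypothetical names for four of the plan-wide limits.
    wall_clock_seconds: float
    max_drive_iterations: int
    max_roadblocks: int
    max_replans: int


@dataclass
class PlanCounters:
    started_at: float  # time.monotonic() when execution began
    iterations: int = 0
    roadblocks: int = 0
    replans: int = 0


def tripped_guard(g: PlanGuards, c: PlanCounters,
                  now: Optional[float] = None) -> Optional[str]:
    """Return the name of the first guard whose limit has been exceeded,
    or None if every counter is still within its limit."""
    now = time.monotonic() if now is None else now
    if now - c.started_at > g.wall_clock_seconds:
        return "wall_clock_timeout"
    if c.iterations > g.max_drive_iterations:
        return "drive_loop_iteration_limit"
    if c.roadblocks > g.max_roadblocks:
        return "roadblock_limit"
    if c.replans > g.max_replans:
        return "replan_attempt_limit"
    return None


# Placeholder values for illustration only.
guards = PlanGuards(wall_clock_seconds=1800, max_drive_iterations=200,
                    max_roadblocks=10, max_replans=3)
counters = PlanCounters(started_at=0.0, iterations=12, roadblocks=11)
print(tripped_guard(guards, counters, now=60.0))  # roadblock_limit
```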

Human-in-the-Loop by Design

Swarm is not fully autonomous. It is designed with explicit human checkpoints.

Plan Approval

No tasks execute until the user reviews and approves the plan.

Task-Level Gates

Individual tasks can be flagged as requiring approval before dispatch.
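A task-level gate can be applied as a filter between "ready" and "dispatched". In this hypothetical sketch, each task carries a `requires_approval` flag and an `approved` flag that is set when the user signs off.

```python
from dataclasses import dataclass


@dataclass
class GatedTask:
    id: str
    requires_approval: bool = False  # hypothetical per-task gate flag
    approved: bool = False           # set when the user signs off


def split_for_dispatch(ready: list[GatedTask]) -> tuple[list[str], list[str]]:
    """Split ready tasks into those that may be dispatched now and those
    held until the user approves them."""
    dispatch = [t.id for t in ready if not t.requires_approval or t.approved]
    held = [t.id for t in ready if t.requires_approval and not t.approved]
    return dispatch, held


ready = [GatedTask("fetch"), GatedTask("email-customers", requires_approval=True)]
print(split_for_dispatch(ready))  # (['fetch'], ['email-customers'])
```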

Roadblock Escalation

When automated recovery fails, the user gets a specific, contextualized question — not a generic error.

Progress Notifications

Throttled status updates keep the user informed without overwhelming them.
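One common throttling strategy is to send at most one update per interval and keep only the newest suppressed message for a final flush. The sketch below takes that approach; the class name, the interval, and the coalescing behavior are assumptions, not a description of how Swarm throttles.

```python
import time
from typing import Callable, Optional


class ThrottledNotifier:
    """Forward at most one status update per `min_interval` seconds."""

    def __init__(self, send: Callable[[str], None], min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._last_sent: Optional[float] = None
        self._pending: Optional[str] = None

    def notify(self, message: str) -> None:
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._min_interval:
            self._send(message)
            self._last_sent = now
            self._pending = None
        else:
            self._pending = message  # keep only the newest suppressed update

    def flush(self) -> None:
        """Send the newest suppressed update, e.g. when the plan finishes."""
        if self._pending is not None:
            self._send(self._pending)
            self._last_sent = self._clock()
            self._pending = None
```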


Interested in Swarm?

Swarm is experimental and actively evolving. If you’re building multi-agent systems on FabrCore and want to coordinate work across agents at scale, we’d love to talk.