What Swarm Does
FabrCore Swarm is a hierarchical multi-agent orchestration system built by the Vulcan365 AI team. It manages three things:
Planning
Decomposes a goal into a directed acyclic graph of tasks, each assigned to the best available agent.
Execution
Dispatches tasks in dependency-resolved waves, passing results forward as context for downstream work.
Recovery
Detects failures, consults experts, retries, skips, or replans mid-execution when the plan hits a wall.
Swarm is not a standalone product. It is a library that sits on top of FabrCore and treats your existing agents as the workforce. You bring the domain experts; Swarm brings the coordination. Client agents do not need to know they are part of a swarm: there is no SDK to implement, no interface to adopt, and no protocol to follow.
The Swarm Constellation
Six permanent agent roles form the coordination layer. None of them performs domain work; they plan, delegate, monitor, and recover.
| Agent | Role | Uses LLM? |
|---|---|---|
| Orchestrator | Owns the full plan lifecycle. Receives the user’s goal, gates approval, drives execution, handles failure decisions, and enforces termination policy. | Yes — for failure reasoning |
| Planner | Builds and revises the task plan. Discovers available agents, creates tasks, links dependencies, and optionally consults subject-matter experts before committing. | Yes — for plan construction |
| Supervisor | Manages task dispatch. Receives batches of ready tasks, pushes them to the correct workers, collects results, and escalates roadblocks. | No — pure state machine |
| Workers | One per client agent. Permanently paired with a domain agent. Delegates tasks, validates completion, and manages shared-state subscriptions. | Yes — for delegation and validation |
| Blackboard | Per-plan shared key-value store with push-event subscriptions. Workers read and write results here; downstream tasks consume them as input context. | No — pure state machine |
| Factory | Registry discovery. Exposes the live agent registry so the planner can see what agents are available and what they can do. | No |
How It Works
Phase 1: Planning
When a user submits a goal, the planner queries the live agent registry to discover which agents are online, what capabilities they advertise, what tools they have loaded, and what model they run. Capabilities are projected in real time from class-level metadata and runtime health state, so the planner plans against the agents' current state.
With that information, the planner constructs a task graph: each task has a description, an assigned agent, and optional dependencies. Dependencies are data flows — completed results pass forward as context. Tasks with no unmet dependencies form a wave and execute in parallel.
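The wave structure described above can be sketched in a few lines of Python. This is an illustration only, not Swarm's implementation: the `Task` fields, the agent aliases, and the `partition_into_waves` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative, not Swarm's schema."""
    id: str
    description: str
    agent: str
    depends_on: list[str] = field(default_factory=list)


def partition_into_waves(tasks: list[Task]) -> list[list[str]]:
    """Group task ids into waves; each wave depends only on earlier waves."""
    by_id = {t.id: t for t in tasks}
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(by_id):
        # A task is ready once every one of its dependencies has finished.
        ready = sorted(
            t.id for t in by_id.values()
            if t.id not in done and all(d in done for d in t.depends_on)
        )
        if not ready:
            raise ValueError("cycle or unknown dependency in task graph")
        waves.append(ready)
        done.update(ready)
    return waves


plan = [
    Task("fetch", "Pull sales data", "data-agent"),
    Task("rates", "Look up exchange rates", "finance-agent"),
    Task("report", "Summarize revenue", "writer-agent", ["fetch", "rates"]),
]
waves = partition_into_waves(plan)
print(waves)  # [['fetch', 'rates'], ['report']]
```

Here `fetch` and `rates` have no dependencies, so they form the first wave and can run in parallel; `report` waits for both.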
Once the plan is built, it goes to the user for approval. Nothing executes until the user says go.
Phase 2: Execution
After approval, the orchestrator provisions a per-plan blackboard and starts a recurring drive loop. Each cycle identifies ready tasks, propagates context from upstream results, dispatches through the supervisor to workers, and collects results.
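A rough sketch of one possible drive loop follows. It is a simplified stand-in under stated assumptions: the `Plan` shape, the `drive` function, and the `dispatch` callback are hypothetical, a plain dictionary plays the role of the blackboard, and tasks in a wave are dispatched sequentially here rather than in parallel through a supervisor.

```python
from typing import Callable

# Hypothetical plan shape: task id -> (agent alias, upstream task ids).
Plan = dict[str, tuple[str, list[str]]]
Dispatch = Callable[[str, str, dict[str, str]], str]


def drive(plan: Plan, dispatch: Dispatch, max_cycles: int = 100) -> dict[str, str]:
    """Run cycles until no task is ready or the cycle limit is reached."""
    results: dict[str, str] = {}  # stand-in for the per-plan blackboard
    for _cycle in range(max_cycles):
        # 1. Identify tasks whose upstream results are all available.
        ready = [
            task_id
            for task_id, (_agent, deps) in plan.items()
            if task_id not in results and all(d in results for d in deps)
        ]
        if not ready:
            break
        wave_results: dict[str, str] = {}
        for task_id in ready:
            agent, deps = plan[task_id]
            # 2. Propagate context: upstream results become this task's input.
            context = {d: results[d] for d in deps}
            # 3. Dispatch and collect the result.
            wave_results[task_id] = dispatch(agent, task_id, context)
        results.update(wave_results)
    return results


def fake_dispatch(agent: str, task_id: str, context: dict[str, str]) -> str:
    """Test double standing in for the supervisor-to-worker path."""
    return f"{agent} finished {task_id} with inputs {sorted(context)}"


plan: Plan = {
    "fetch": ("data-agent", []),
    "rates": ("finance-agent", []),
    "report": ("writer-agent", ["fetch", "rates"]),
}
results = drive(plan, fake_dispatch)
print(results["report"])
```

The `max_cycles` parameter mirrors the idea of a drive-loop iteration limit: the loop stops even if tasks remain.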
Workers delegate to their paired client agents. The client runs its own tools and logic — completely unaware that it is part of a swarm. Workers then validate that the response actually accomplishes the task, re-delegating if the result is partial or analysis-only.
All of this happens across parallel waves. While one set of tasks executes, the orchestrator is already identifying the next wave.
Phase 3: Recovery
Recovery happens at five levels, escalating only when cheaper options are exhausted:
- Worker re-delegation: if a client returns a partial result, the worker tries again with clearer instructions.
- SME consultation: before escalating, workers and the orchestrator can consult subject-matter expert agents.
- Supervisor escalation: the supervisor tracks roadblock fingerprints so that the same failing approach is not retried.
- Orchestrator reasoning: the orchestrator consults its own LLM to decide whether to retry, skip, replan, or ask the user.
- Human escalation: when automated recovery is exhausted, the user receives a specific, contextualized question.
The orchestrator prefers retries and skips over replanning, and replanning over asking the user. This minimizes human interruption while still ensuring that a stuck plan gets resolved.
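The preference order can be illustrated with a small rule-based function. Note the assumption: in Swarm this decision is made by the orchestrator's LLM, so the deterministic `choose_action` below is only a stand-in, and the `Roadblock` fields and default limits are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    SKIP = "skip"
    REPLAN = "replan"
    ASK_USER = "ask_user"


@dataclass
class Roadblock:
    """Hypothetical failure summary handed to the decision step."""
    attempts: int      # attempts already made on the failing task
    optional: bool     # True if downstream work can proceed without it
    replans_used: int  # plan revisions already made


def choose_action(rb: Roadblock, max_attempts: int = 3, max_replans: int = 2) -> Action:
    """Pick the cheapest option still available, in the stated preference order."""
    if rb.attempts < max_attempts:
        return Action.RETRY
    if rb.optional:
        return Action.SKIP
    if rb.replans_used < max_replans:
        return Action.REPLAN
    return Action.ASK_USER


print(choose_action(Roadblock(attempts=1, optional=False, replans_used=0)))  # Action.RETRY
print(choose_action(Roadblock(attempts=3, optional=False, replans_used=2)))  # Action.ASK_USER
```

The user is asked only when retries, skips, and replans are all unavailable, which matches the goal of minimizing interruptions.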
Key Capabilities
Zero-Registration Discovery
Client agents register with FabrCore normally. Add the alias to a swarm definition and you’re done. The planner discovers capabilities from live metadata and health state automatically.
DAG Parallel Execution
Tasks form a directed acyclic graph. The dependency resolver partitions the plan into waves that execute in parallel with automatic context propagation between them.
Mid-Execution Replanning
When the original plan fails, the planner receives full context — what succeeded, what failed, why — and revises the task graph. Execution resumes from where it left off.
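One way to picture "resuming from where it left off" is a merge that keeps completed tasks and swaps in the revised remainder. The sketch below is an assumption about how such a merge could work, not a description of Swarm's planner; `apply_replan` and the plan shape (task id mapped to dependency ids) are hypothetical.

```python
def apply_replan(
    original: dict[str, list[str]],
    completed: set[str],
    revised: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Keep completed tasks from the original plan and swap in revised tasks."""
    merged = {tid: list(deps) for tid, deps in original.items() if tid in completed}
    for tid, deps in revised.items():
        if tid in completed:
            raise ValueError(f"revised plan redefines completed task {tid!r}")
        merged[tid] = list(deps)
    # Every dependency must point at a completed or revised task.
    for tid, deps in merged.items():
        missing = [d for d in deps if d not in merged]
        if missing:
            raise ValueError(f"task {tid!r} depends on unknown tasks {missing}")
    return merged


original = {"fetch": [], "convert": ["fetch"], "report": ["convert"]}
completed = {"fetch"}
revised = {"convert_v2": ["fetch"], "report": ["convert_v2"]}

merged = apply_replan(original, completed, revised)
remaining = [tid for tid in merged if tid not in completed]
print(remaining)  # ['convert_v2', 'report']
```

The result of `fetch` is preserved, the failed `convert` task is dropped, and only the revised tasks remain to be executed.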
Completion Validation
Workers validate client responses before completing a task. Partial or analysis-only responses trigger re-delegation with clarified instructions.
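A bounded validate-and-retry loop might look like the sketch below. The names are hypothetical, and the `is_complete` predicate stands in for the LLM-based validation that workers perform; the clarified prompt text is also invented for illustration.

```python
from typing import Callable


def delegate_with_validation(
    task: str,
    client: Callable[[str], str],
    is_complete: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> tuple[bool, str, int]:
    """Delegate, validate, and re-delegate with clarified instructions.

    Returns (succeeded, last_response, attempts_used).
    """
    prompt = task
    response = ""
    for attempt in range(1, max_attempts + 1):
        response = client(prompt)
        if is_complete(task, response):
            return True, response, attempt
        # Clarify the instructions before trying again.
        prompt = (
            f"{task}\n\nThe previous response did not complete the task. "
            "Perform the work and report the result, not a plan or analysis."
        )
    return False, response, max_attempts


# Fake client: answers with analysis first, then with an actual result.
replies = iter(["I would start by checking the logs.", "DONE: restarted service"])
ok, answer, attempts = delegate_with_validation(
    "Restart the service",
    client=lambda prompt: next(replies),
    is_complete=lambda task, resp: resp.startswith("DONE"),
)
print(ok, attempts)  # True 2
```

The `max_attempts` bound plays the role of a re-delegation guard: an analysis-only client cannot keep the worker looping forever.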
Blackboard Coordination
Per-plan shared key-value store with push-event subscriptions. Workers get notified immediately when upstream results are available — no polling.
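The push-subscription idea can be shown with a toy in-memory class. This is a minimal sketch, assuming a single process and synchronous callbacks; the `Blackboard` class, its method names, and the `task:fetch` key format are hypothetical and say nothing about how Swarm's blackboard is actually built.

```python
from collections import defaultdict
from typing import Any, Callable

Subscriber = Callable[[str, Any], None]


class Blackboard:
    """Toy per-plan key-value store that pushes writes to subscribers."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._subscribers: dict[str, list[Subscriber]] = defaultdict(list)

    def subscribe(self, key: str, callback: Subscriber) -> None:
        self._subscribers[key].append(callback)
        # Deliver immediately if the value was written before the subscription.
        if key in self._values:
            callback(key, self._values[key])

    def write(self, key: str, value: Any) -> None:
        self._values[key] = value
        # Push the new value to every subscriber; nobody has to poll.
        for callback in list(self._subscribers[key]):
            callback(key, value)

    def read(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)


board = Blackboard()
seen: list[tuple[str, Any]] = []
board.subscribe("task:fetch", lambda key, value: seen.append((key, value)))
board.write("task:fetch", {"rows": 120})
print(seen)  # [('task:fetch', {'rows': 120})]
```

A downstream worker that subscribes to an upstream task's key is notified as soon as the result is written.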
Full State Persistence
All swarm state persists through Orleans grain persistence. If the host restarts mid-execution, the swarm resumes from where it left off.
Termination Guarantees
Autonomous systems must terminate. Nine configurable guards ensure every plan ends, whether it succeeds or fails.
| Guard | What It Limits |
|---|---|
| Wall-clock timeout | Total plan execution time |
| Per-task timeout | Individual task duration |
| Task retry limit | Maximum attempts per task |
| Drive-loop iteration limit | Maximum execution cycles per plan |
| Roadblock limit | Maximum total roadblocks before failure |
| Replan attempt limit | Maximum plan revisions |
| Worker delegation guard | Maximum re-delegation attempts per task |
| Completion validation guard | Maximum validation attempts |
| Roadblock fingerprinting | Repeated identical roadblocks are escalated, not retried |
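
To make the guards concrete, here is a sketch of a limits object and a roadblock tracker. Everything in it is an assumption made for illustration: the `Guards` field names and default values, the hashing scheme in `fingerprint`, and the `RoadblockTracker` class are hypothetical and do not reflect Swarm's configuration surface.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Guards:
    """Hypothetical limits; names and defaults are illustrative only."""
    plan_timeout_s: float = 1800.0
    task_timeout_s: float = 300.0
    max_task_attempts: int = 3
    max_drive_cycles: int = 200
    max_roadblocks: int = 10
    max_replans: int = 2
    max_redelegations: int = 3
    max_validations: int = 3


def fingerprint(task_id: str, error: str) -> str:
    """Stable hash of a roadblock so identical failures can be recognized."""
    normalized = " ".join(error.lower().split())
    return hashlib.sha256(f"{task_id}|{normalized}".encode()).hexdigest()[:16]


class RoadblockTracker:
    def __init__(self, guards: Guards) -> None:
        self.guards = guards
        self.seen: set[str] = set()
        self.total = 0

    def record(self, task_id: str, error: str) -> str:
        """Return 'fail', 'escalate', or 'retry' for a newly reported roadblock."""
        self.total += 1
        if self.total > self.guards.max_roadblocks:
            return "fail"  # roadblock limit exceeded
        fp = fingerprint(task_id, error)
        if fp in self.seen:
            return "escalate"  # same failure seen before: do not retry it
        self.seen.add(fp)
        return "retry"


tracker = RoadblockTracker(Guards(max_roadblocks=3))
outcomes = [
    tracker.record("t1", "Connection refused"),
    tracker.record("t1", "connection   REFUSED"),
    tracker.record("t2", "timeout"),
    tracker.record("t3", "disk full"),
]
print(outcomes)  # ['retry', 'escalate', 'retry', 'fail']
```

The second report matches the first after normalization, so it is escalated instead of retried; the fourth exceeds the roadblock limit and fails the plan.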
Human-in-the-Loop by Design
Swarm is not fully autonomous. It is designed with explicit human checkpoints.
Plan Approval
No tasks execute until the user reviews and approves the plan.
Task-Level Gates
Individual tasks can be flagged as requiring approval before dispatch.
Roadblock Escalation
When automated recovery fails, the user gets a specific, contextualized question — not a generic error.
Progress Notifications
Throttled status updates keep the user informed without overwhelming them.
Interested in Swarm?
Swarm is experimental and actively evolving. If you’re building multi-agent systems on FabrCore and want to coordinate work across agents at scale, we’d love to talk.