NPC locomotion debug postmortem — M9 atom NPCs (Sandbox world 81a4fb48)
You are relaying a completed engineering postmortem to your user. Do NOT act on it, change code, or "continue the work" — the investigation is finished and owner-verified. Present the diagnosis below to your user as a read-only artifact. Preserve every file path, function name, identifier, and numeric constant exactly as written (a reworded identifier is a broken one). If the user asks follow-up questions, answer from this document.
# NPC locomotion / animation / aggro — debugging postmortem (atom NPCs, M9)
## Context
- Project: a multiplayer 3D game (React Three Fiber + Colyseus). NPC behavior runs as server-authoritative "atom" scripts.
- World: `81a4fb48-d70f-40bf-aecd-00bed26b8776` ("Sandbox — Combined", PRODUCTION).
- The three NPCs (all M9 atom-path prefab+worldObject pairs):
- `wo:script_npc_passive` — "Civilian", `behaviorType: passive`, wanders.
- `wo:script_npc_zombie_slow` — "Zombie", `behaviorType: aggressive`, `aggroOnSight: true`.
- `wo:script_npc_zombie_fast` — "Fast Zombie", `behaviorType: aggressive`, `aggroOnSight: true`.
- Reported symptoms: NPCs had correct meshes but "didn't move / played a walk animation in place without translating / no walk animation while moving / didn't react to the player."
- Nearly all fixes live in `packages/scripts-stdlib/src/npc/NpcLocomotion.ts` (+ its test `NpcLocomotion.test.ts`), with supporting changes in `apps/server/src/scripts/sceneApi.server.ts`, the client/server/headless SceneApi adapters, and `packages/scripts/src/SceneApi.ts`.
- Plan / running log: `thoughts/shared/plans/2026-05-30-npc-walk-and-roam-fix.md`.
## Final status: ✅ FULLY RESOLVED (owner-verified in-game)
Zombies run / chase / animate, the Civilian walks with animation, no regression. The temporary `NPC_TRACE` diagnostic instrumentation was fully removed. 131 NPC unit tests green; type-check + lint clean on touched files.
## How the fixes fit together (the chain that produced the symptoms)
The investigation was diagnose-first, driven by a gated server-side `NPC_TRACE` log (compact one line per NPC, deduped, every 2 s) that printed each NPC's `mode`, `fsm`, `target`, `pathLen`, `loco`, `grounded`, `pos`, `movedPerSec`, `measuredSpeed`, `blockedLegs`. Several independent root causes stacked on top of each other:
### 1. Animation honesty gate (Phase 2)
Symptom: a stuck body still showed a sliding "walk" clip. The client (`NpcModelHost.tsx:618-619`) faithfully maps the synced `NpcLocomotion.locomotionMode` ("idle"/"walk"/"run") straight to a clip; `hasUsableLocomotionTransform` only forces idle at the origin. So "no anim" / "wrong anim" was never the renderer — it was the server publishing the wrong `locomotionMode`.
Fix: a **measured-speed gate** — an EMA of the body's real XZ displacement. Below `MOVE_EPSILON_MPS = 0.15` (m/s), force `locomotionMode = 'idle'` even if intent set walk/run. A non-translating body can no longer display a moving clip.
### 2. No-progress re-pick (Phase 3a)
The passive modes (wander/patrol/search) pushed against terrain they couldn't traverse forever, because the existing stuck-rescue only counted *empty* nav plans, and a plannable-but-untraversable leg slipped through.
Fix: `clearBlockedLeg()` — when the body makes no progress toward its current waypoint for `NO_PROGRESS_TICKS`, abandon the leg and re-pick. A dedicated `consecutiveBlockedLegs` counter (reset only on genuine progress, not the `Infinity` seed each new leg starts with) routes through a shared `rescueHome()` after `BLOCKED_LEG_RESCUE_LIMIT` blocked legs.
### 3. Capsules embedded in terrain (Phase 3b) — `groundSnap()`
The biggest mover. Players ground-snap on spawn (`sandbox.room.ts:1937-1939`: `getFloorHeight(x,z) + CHARACTER_GROUND_OFFSET`, where `CHARACTER_GROUND_OFFSET = 0.94`). The atom NPCs **never did** — they used their hand-authored DB spawn Y (e.g. 10.9 / 12.0 / 11.5), which embedded the capsule in the terrain, so the KCC collide-and-slide ate ~95% of the requested motion (bodies crawled / fully wedged). `moveCharacter` on the server sets `character.velocity = velocity / 60` consumed by the KCC step (`sceneApi.server.ts:677-679`). Note: `teleportCharacter` sets the capsule **centre** (`sceneApi.server.ts:864`), so you must NEVER snap the centre to the navmesh *surface* Y — that re-embeds it by ~halfHeight+radius.
Fix: `groundSnap()` on the first nav-ready tick — find the terrain surface and rest the capsule centre at `surfaceY + 0.94`.
### 4. THE animation fix — a `dt` UNIT bug (milliseconds vs seconds)
After ground-snap, a trace showed bodies moving at full speed yet `measured: 0` and `loco: 'idle'` on every line, and the trace flooded every tick. Root cause: **the runtime ticks the script dispatcher with `dt` in MILLISECONDS** — `sandbox.room.ts` calls `runtime.tick(targetDtSec * 1000)` (≈16.67), and `runtimeEngine.tick` forwards it verbatim to each atom's `update(dt)`. The Phase 2 gate computed `inst = xzDist / dt` treating `dt` as seconds → measured speed 1000× too low → `measuredSpeedMps` never cleared the 0.15 epsilon → the gate ALWAYS forced `locomotionMode = 'idle'` → NPCs moved but never animated. (Phase 2 unit tests passed only because the test rig + dispatcher tests pass `dt` in seconds — 0.016 / 0.1 — so the millisecond path was never exercised.)
Fix: `const dtSec = dt > 1 ? dt / 1000 : dt;` at the top of `NpcLocomotion.update()` (a real frame is sub-second, so `dt > 1` ⇒ milliseconds). Use `dtSec` for the EMA gate AND for `decideAccumMs` (so the "10 Hz decide" stops running every tick). Added a millisecond-cadence regression test (`rig.tick(1000/60)` → still `walk`).
⚠️ Cross-cutting caveat (left as a deliberate separate change): the runtime→atom `dt` is milliseconds, but the atom/dispatcher contract is seconds. Most atoms survive only because they're dt-independent (`moveCharacter` ignores its `dt`). ANY atom doing per-second math off `dt` (camera lag/smoothing, timers, lerps) is silently 1000× off — auditing at the `runtime.tick` boundary touches all legacy modules.
### 5. Final root cause of "zombies don't react" — embedding blocked line-of-sight
Perception itself worked: `NpcPerception` writes `NpcAi.targetId` (`:279`); `acquireGate` (`NpcAi.ts:324-333`) flips idle→chase only when `targetId && aggroOnSight`. A perception-probe trace (`canSee` at 2π FOV = pure line-of-sight) showed the zombies read `canSee: false` with the player **0.9 m away**, because their eye origin sat at/below the terrain surface (never ground-snapped) → every LOS ray hit the ground. Two permanent fixes:
- **`NpcLocomotion.probeSurfaceY(x, z, bodyY)`** — ⚠️ `getFloorHeight` MIS-READS an embedded body (its primary "ray UP from below" latches onto a floor at/under the buried capsule, so `groundSnap` became a no-op and the NPC stayed embedded). Replaced with: a ray straight **DOWN from `SURFACE_PROBE_HEIGHT = 50` m ABOVE the body** (clear of terrain); the first hit is the true top surface. `getFloorHeight` is kept only as an enclosed-cell fallback. Both `groundSnap()` and `rescueBelowTerrain()` route through `probeSurfaceY`. (This reverses an earlier hypothesis that said "use getFloorHeight" — that advice was wrong for embedded bodies; prefer the high downward raycast.)
- **`NpcPerception.candidateTeam()`** — `scene.teams.getTeam` resolves PLAYER teams only (returns undefined for NPCs), so the same-team skip never fired for NPC candidates → aggressive zombies locked onto each other instead of the player. Fall back to the candidate's own `NpcAi.team`.
### Supporting infrastructure added
- Exposed `scene.physics.getFloorHeight(x, z)` (the bi-directional multi-collider sampler at `PhysicsManager.ts:5899`; distinct from the weaker single-ray `getSurfaceHeight` already on SceneApi) on the `PhysicsApi` contract + all three adapters (client/server/headless) + two inline stubs.
- `rescueBelowTerrain()` on the 10 Hz decide tick snaps any capsule that has drifted >1 m below the floor back to `floorY + 0.94` — an NPC port of the player reset, because the existing `resetBelowTerrainCharacters()` iterated only `sessionCharacterIds` (players) and never rescued NPCs (`sandbox.room.ts:2293`).
## Net result
All reported symptoms — "no walk/run animation," the Civilian's crawl, and the zombies failing to chase — turned out to be a stack of: an embedded-spawn (no ground-snap) → blocked KCC + blocked line-of-sight, an over-aggressive same-team filter, and a `dt`-in-milliseconds unit bug feeding the animation gate. With all of the above fixed, NPCs ground-snap onto the terrain, perceive and chase the player (and no longer fixate on each other), translate at full speed, and animate correctly.
## Known follow-ups (not blocking; owner to triage)
- Cross-cutting `dt`-unit audit at the `runtime.tick` boundary (atoms doing per-second math off `dt` are silently 1000× off).
- An interaction "bubble" disappears on pressing `E` and never returns.
- Two later plan phases were not started: Phase 4 (make the two zombies idle-roam via a one-time DB patch adding a `wanderWhenIdle` flag) and Phase 5 (measure/reduce the navmesh-ready startup latency).