The Man Who Kissed Everything

What Three Hours of Chaos Taught Me About My Own Game

6 min read
playtesting
episode-system
ai-narrator
post-mortem

The first external tester got access to Chronicles of Terros before the full debugged build was deployed. He was supposed to test the opening scenario, Midnight Cargo, set in Port Harmony’s docks. Instead he played adversarially for three hours across two sittings, kissed most of the NPCs he met and several objects, accused arriving elves of jewel theft with zero evidence, and asked his shadow companion to kiss both a magical crystal and an elf leader.

He kissed a dwarf foreman on the cheek as a greeting. He kissed a halfling rogue on the cheek to seal a business partnership. He kissed a desperate mother on the cheek to get her to talk. He kissed a magical Crystalbloom and whispered for it to tell him its secrets. He kissed the inside of a crate. He kissed the void.

It was the most valuable testing session I have had.

The Episode System

Chronicles of Terros is designed around episodes. A play session runs for roughly sixty minutes, the narrative wraps to a natural stopping point, and the game compresses what happened into a structured summary before the next episode begins. The AI narrator operates within a short memory window during gameplay because it does not need the entire campaign history. It only needs the current episode. Between episodes, code handles the compression and continuity.

The episode pacing system tracks both wall-clock time and turn count. Around the fifty-minute mark, the AI receives instructions to start steering toward a natural stopping point. At sixty minutes, it gets firmer direction. At sixty-five, it is told to wrap up immediately. At seventy, the system forces the episode to end.
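
The escalation schedule can be sketched as a small lookup. This is a minimal illustration, not the game's actual code: the stage names and the `pacing_stage` function are hypothetical, and the sketch looks only at elapsed minutes, while the real system also tracks turn count.

```python
from enum import Enum


class PacingStage(Enum):
    NORMAL = "normal"            # no pacing instruction yet
    STEER = "steer"              # start steering toward a stopping point
    FIRM = "firm"                # firmer direction to the narrator
    WRAP_UP_NOW = "wrap_up_now"  # wrap up immediately
    FORCE_END = "force_end"      # the system ends the episode itself


# Thresholds in minutes, taken from the schedule described in the post.
# Ordered from latest to earliest so the first match wins.
_THRESHOLDS = [
    (70, PacingStage.FORCE_END),
    (65, PacingStage.WRAP_UP_NOW),
    (60, PacingStage.FIRM),
    (50, PacingStage.STEER),
]


def pacing_stage(elapsed_minutes: float) -> PacingStage:
    """Map elapsed wall-clock minutes to a (hypothetical) pacing stage."""
    for threshold, stage in _THRESHOLDS:
        if elapsed_minutes >= threshold:
            return stage
    return PacingStage.NORMAL
```

Under this sketch, a thirty-four-minute sitting is still in the normal stage, and anything past seventy minutes is a forced end.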

A D&D session at a real table does not run for five hours without a break. The DM reads the room, finds a moment of rest or a cliffhanger, and calls it. The episode system does the same thing, except it cannot read the room, so it uses time and turn count as proxies.

What Happened

The first sitting lasted about thirty-four minutes. The tester came back roughly five hours later for a second sitting that ran for two and a half hours. By the time the second sitting started, the episode pacing system had already escalated to its final stage: four deferrals used, wrap-up urgency set to critical, the episode end event fired.

The game kept going.

The episode end was a one-shot signal. The server sent the “episode is over” payload on a single API response. If the frontend missed that response for any reason, the signal was gone. Every subsequent request was processed as a normal turn because the server only checked whether to fire the episode end. It never checked whether the episode had already ended.

The tester’s second sitting generated roughly two hundred and fifty more turns after the episode was supposed to be over.

The Cascade

Three hundred turns through a system designed for thirty-five produced exactly the failures you would expect.

The AI narrator sees the last few turns and generates the next response. For a sixty-minute episode, this works. The AI does not need to remember turn five at turn thirty because the episode is almost over and the compression system handles continuity.

At turn two hundred, the narrator had no memory of anything before turn one ninety-seven. A character named Mira arrived at turn twenty-eight carrying her unconscious son Rowan in her arms, his skin covered in crystalline growths. Over a hundred turns later, when the tester asked about Rowan, the narrator treated the boy as being at a clinic somewhere else entirely. The tester caught the contradiction and challenged the narrator through the out-of-character system across five separate exchanges, quoting the original text back at it. The narrator defended the wrong answer every time because it could not access the original scene. It was generating from a three-turn window and filling gaps with plausible invention.

The session recap had its own problems. During the second sitting, the tester interrogated an NPC about “the night of August 3rd” and “last Thursday at 8pm.” The game’s world has its own calendar. Neither August nor Thursday exists in it. The AI accepted the real-world dates without translating them into the game’s calendar, and the recap reproduced them. The prose quality had also drifted into stacked metaphors and overwrought descriptions, because the recap was running through its own generation pipeline, separate from the live narrative system that enforces voice rules.

The Work

Three separate problems traced back to one missing boundary.

The server now checks whether the episode has already ended before processing any action. If it has, every subsequent request returns the episode summary instead of generating a new turn. The frontend receives the signal regardless of whether it caught the original one.
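
The shape of that guard, as a rough sketch rather than the real server code: `Episode`, `handle_action`, `summarize`, and `narrate` are all hypothetical names, and the two helpers are placeholders for the compression step and the narrator call. The point is only the ordering. The ended check comes first, so the end state is sticky instead of a one-shot signal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Episode:
    ended: bool = False
    summary: Optional[str] = None
    turns: list[str] = field(default_factory=list)


def summarize(episode: Episode) -> str:
    # Placeholder for the real compression step.
    return f"Episode summary covering {len(episode.turns)} turns."


def narrate(action: str) -> str:
    # Placeholder for the AI narrator call.
    return f"The narrator responds to: {action}"


def handle_action(episode: Episode, action: str, pacing_says_end: bool = False) -> dict:
    # Guard first: once an episode has ended, no request generates a new turn.
    # Every later request gets the same end payload, so a missed response
    # on the frontend is recovered on the next request.
    if episode.ended:
        return {"type": "episode_end", "summary": episode.summary}

    episode.turns.append(action)

    if pacing_says_end:
        episode.ended = True
        episode.summary = summarize(episode)
        return {"type": "episode_end", "summary": episode.summary}

    return {"type": "turn", "narration": narrate(action)}
```

With the guard in place, a request sent after the end event returns the stored summary and leaves the turn log untouched.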

The recap pipeline now runs through the same voice constraints as live narrative. The game world’s calendar is injected into the prompt with actual in-world dates computed from the game state, so the AI has canonical dates to work with instead of accepting whatever the player types.
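
A rough sketch of the calendar injection, under loud assumptions: the function names, the day-counter representation of game state, and the wording of the instruction are all mine, and the calendar itself is passed in as parameters because the post does not describe Terros's actual months.

```python
def in_world_date(day_index: int, month_names: list[str], days_per_month: int) -> str:
    """Convert a zero-based day counter from game state into an in-world date.

    Assumes a simple calendar where every month has the same length.
    """
    months_elapsed, day = divmod(day_index, days_per_month)
    year, month = divmod(months_elapsed, len(month_names))
    return f"Day {day + 1} of {month_names[month]}, Year {year + 1}"


def calendar_prompt(day_index: int, month_names: list[str], days_per_month: int) -> str:
    """Build a prompt fragment giving the model a canonical date to work from."""
    today = in_world_date(day_index, month_names, days_per_month)
    # Hypothetical instruction wording, not the game's real prompt.
    return (
        f"Today's in-world date is {today}. "
        "Use only this calendar. If the player mentions a real-world month or "
        "weekday, do not repeat it; restate it in in-world terms."
    )
```

Computing the date in code and handing it to the model keeps the canonical value out of the model's hands entirely, which is the same division of labour the episode system uses for continuity.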

The game already had a session compression system that summarized older turns into structured memory blocks. The compression was generating good summaries and storing them correctly. Nothing was reading them back. A function existed to format the compressed memories for the AI’s context window. Nothing called it. That function is now connected to both the narrative prompt and the out-of-character response handler.
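
In outline, the missing connection looks something like this. It is a sketch with hypothetical names and a made-up memory block shape (`first_turn`, `last_turn`, `summary`), not the game's real schema. The formatter is the kind of function that already existed; `build_context` is the call that was never made.

```python
def format_memories(blocks: list[dict]) -> str:
    """Render compressed memory blocks as short lines for the context window."""
    lines = [
        f"- Turns {b['first_turn']}-{b['last_turn']}: {b['summary']}"
        for b in blocks
    ]
    return "\n".join(lines)


def build_context(blocks: list[dict], recent_turns: list[str], window: int = 3) -> str:
    """Combine compressed memories with the last few raw turns."""
    parts = []
    if blocks:
        # The step that was missing: actually reading the summaries back.
        parts.append("Earlier in this session:\n" + format_memories(blocks))
    # Only the most recent turns are included verbatim.
    parts.append("Most recent turns:\n" + "\n".join(recent_turns[-window:]))
    return "\n\n".join(parts)
```

With a block that records Mira arriving with Rowan, the narrator at turn two hundred still sees that fact even though the raw turn fell out of the window long ago.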

The Kisser

The tester told me afterwards he was “playing unhinged to see how it handled it.” Every kiss surfaced something. Kissing Thorgrim the dwarf foreman produced a reaction that felt right: bewildered revulsion, wiping his cheek, muttering about “Valenhall decency.” Kissing Finnick produced warmth with boundaries: a faint blush, then a clarification that this was an equal partnership, not a pledge of servitude. Kissing the Crystalbloom produced atmospheric narration that sounded good but resolved nothing mechanically. Asking his shadow companion Murk to kiss the elf leader produced a flat refusal from Murk, who instead watched Finnick and tracked a sound from nearby crates. Murk had his own priorities.

Where the AI handled the chaos well, it handled it because the companion system gave NPCs and companions individual responses grounded in their personalities. Where it handled it badly, it handled it badly because it had lost the thread of what had already happened. The companion reactions held up. The world continuity did not.

The episode pacing system was designed, built, and working. It stopped one function call short of being enforced. Every system behind it degraded on schedule because each one was sized for sixty-minute episodes. When that assumption broke, everything downstream broke with it.

Polite sixty-minute playtesting by someone following the intended path would not have found any of this. Sometimes the most useful tester is the one kissing the inside of a crate.