Track

Memory

Window · Summarize · Retrieve · Evict

LLMs have a hard limit on how much text they can process at once — the context window. Once the conversation grows beyond that window, older messages get dropped. This is fine for a single Q&A, but agents often need to maintain context across many turns, or remember facts from earlier in a long session.

Memory systems solve this by selectively deciding what to keep, what to compress, and what to discard. Instead of one fixed window, you build a tiered system: recent messages stay intact, older ones get summarized, and important facts get pinned so they're never evicted.

The Sliding Window

The simplest memory strategy is a sliding window: keep the most recent messages within a token budget, always preserve the system prompt, and drop the oldest non-system messages when the budget is exceeded.

When the budget fills, the oldest messages are evicted. The agent can still see the recent ones plus the system prompt. This keeps memory bounded and predictable.

Beyond Recency

Sliding windows work by recency, but sometimes the most relevant piece of information isn't the most recent. Query-based retrieval lets the agent search memory by meaning rather than time.

The agent stores facts as they accumulate, and when it needs something specific, it queries the store. This turns memory from a simple FIFO queue into a searchable knowledge base.

Concrete Example

Sliding Window Render

class SlidingMemory:
    def __init__(self, max_tokens, count_tokens):
        self.max_tokens = max_tokens
        self.count_tokens = count_tokens
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def render(self):
        system = [m for m in self.messages
                  if m["role"] == "system"]
        others = [m for m in self.messages
                  if m["role"] != "system"]

        budget = self.max_tokens
        for m in system:
            budget -= self.count_tokens(m["content"])

        kept = []
        for msg in reversed(others):
            tokens = self.count_tokens(msg["content"])
            if budget - tokens >= 0:
                budget -= tokens
                kept.append(msg)

        return system + list(reversed(kept))

System messages are pinned — they're never evicted. Non-system messages are scanned newest-first; each fits within the remaining budget or is dropped. The result is always within max_tokens while retaining the most recent context.

Key Ideas

Sliding Window

Keep the newest messages within a token budget; drop oldest first.

System Pinning

System prompts are always preserved and never evicted.

Summarization

Compress dropped context into a compact summary before removal.

Query Retrieval

Search memory by semantic relevance, not just recency.

TTL Eviction

Automatically expire facts that are too old, keeping the store fresh.

Problems in this track

6 problems. Sign in to start solving.

TitleDifficultyAcceptanceEst.

Sliding Window Conversation Memory

Keep the last N turns plus a pinned system prompt within a token budget.

Medium62%25m

Summarize Dropped Context

Replace evicted older turns with a summary while keeping the active conversation intact.

Medium52%25m

Deduplicate Repeated Memory Facts

Remove repeated facts keeping the latest occurrence, normalizing before comparison.

Medium58%20m

Retrieve Relevant Memories by Query

Score memories against a query and return top-k matches with deterministic tie-breaking.

Hard40%30m

Pin Important Messages

Support pinned messages which survive pruning and eviction.

Medium56%20m

Evict Old Memories with TTL

Remove items older than a time-to-live threshold using timestamps.

Hard38%30m

The Sliding Window

When the budget fills, the oldest messages are evicted. The agent can still see the recent ones plus the system prompt. This keeps memory bounded and predictable.

Beyond Recency

Sliding windows work by recency, but sometimes the most relevant piece of information isn't the most recent. Query-based retrieval lets the agent search memory by meaning rather than time.

The agent stores facts as they accumulate, and when it needs something specific, it queries the store. This turns memory from a simple FIFO queue into a searchable knowledge base.

Sliding Window Render

class SlidingMemory: def __init__(self, max_tokens, count_tokens): self.max_tokens = max_tokens self.count_tokens = count_tokens self.messages = [] def add(self, role, content): self.messages.append({"role": role, "content": content}) def render(self): system = [m for m in self.messages if m["role"] == "system"] others = [m for m in self.messages if m["role"] != "system"] budget = self.max_tokens for m in system: budget -= self.count_tokens(m["content"]) kept = [] for msg in reversed(others): tokens = self.count_tokens(msg["content"]) if budget - tokens >= 0: budget -= tokens kept.append(msg) return system + list(reversed(kept))

Key Ideas

Sliding Window

Keep the newest messages within a token budget; drop oldest first.

System Pinning

System prompts are always preserved and never evicted.

Summarization

Compress dropped context into a compact summary before removal.

Query Retrieval

Search memory by semantic relevance, not just recency.

TTL Eviction

Automatically expire facts that are too old, keeping the store fresh.