Track

RAG

Chunk · Embed · Rerank · Ground

An LLM knows only what it was trained on. Ask it about recent events, proprietary documents, or your own codebase, and it will either guess, often wrongly, or say it doesn't know. Retrieval-Augmented Generation (RAG) addresses this by giving the LLM access to external data at query time.

The idea is simple: when a question comes in, search a knowledge base for relevant documents, stuff them into the LLM's context window, and let the LLM answer from that material. The LLM doesn't need to know the answer — it just needs to read.

The Retrieval Pipeline

RAG follows a series of stages: Query → Retrieve → Rerank → Ground → Answer.

The user's question is optionally rewritten first to improve search recall. Next, keyword search, vector search, or a hybrid of the two pulls candidate chunks from a document store. A second-stage relevance model then reranks the candidates so the most relevant come first. The top chunks are formatted with citations so the LLM can reference them. Finally, the LLM reads the grounded context and produces an answer with source attribution.
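The stages can be sketched end to end on toy data. Everything below is an illustrative assumption rather than a reference implementation: the `DOCS` list, the `text`/`source` chunk shape, the word-overlap scoring, and the function names are all made up for the example, and the final LLM call is left out so the sketch stops at the grounded prompt.

```python
# Toy knowledge base; the "text"/"source" keys are an assumed chunk shape.
DOCS = [
    {"source": "handbook.md", "text": "Refunds are issued within 14 days of purchase."},
    {"source": "faq.md", "text": "Shipping takes 3 to 5 business days."},
    {"source": "policy.md", "text": "Refunds require the original receipt."},
]

def tokens(s):
    # Lowercase and strip basic punctuation so "issued?" matches "issued".
    return {w.strip(".,?!").lower() for w in s.split()}

def rewrite_query(question):
    # Query stage: a no-op here; a real system might expand or rephrase.
    return question

def retrieve_candidates(query, top_k):
    # Retrieve stage: keep chunks that share at least one word with the query.
    q = tokens(query)
    hits = [d for d in DOCS if q & tokens(d["text"])]
    return hits[:top_k]

def rerank_candidates(query, candidates, top_n):
    # Rerank stage: order by number of shared words, highest first.
    q = tokens(query)
    ranked = sorted(candidates, key=lambda d: len(q & tokens(d["text"])), reverse=True)
    return ranked[:top_n]

def ground_context(chunks):
    # Ground stage: number each chunk and attach its source.
    return "\n".join(f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, 1))

def build_prompt(question, top_k=10, top_n=2):
    query = rewrite_query(question)
    best = rerank_candidates(query, retrieve_candidates(query, top_k), top_n)
    context = ground_context(best)
    return (
        "Answer from the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("When are refunds issued?")
print(prompt)
```

The Answer stage would pass `prompt` to whichever LLM client is in use. Keeping each stage as its own function makes it easy to swap the toy scoring for a real search index or reranker later.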

Reranking for Precision

Initial retrieval (especially from vector search) returns candidates ranked by similarity, but the top results aren't always the most relevant. A reranker applies a more expensive, more accurate model to re-score the top-K candidates.

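The procedure is: score every candidate, sort by score in descending order, and take the top N. Embedding search encodes the query and each chunk separately, whereas the reranker reads the query and chunk together as one input, which usually gives more accurate relevance judgments than embedding similarity alone.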
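To make the pairwise interface concrete, here is a minimal sketch with a toy scorer. `overlap_score` is a hypothetical stand-in for a real second-stage model; it only illustrates the (query, chunk) to score shape, and the candidate strings are invented.

```python
def overlap_score(query, text):
    """Toy pairwise scorer: fraction of query words that appear in the text.

    A stand-in for a real relevance model; it sees both inputs at once.
    """
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

# Candidates in the order a first-stage search might have returned them.
candidates = [
    "python is a popular language",
    "how to reverse a list in python",
    "a list of popular snakes",
]
query = "reverse a list in python"

# Score every candidate, sort descending, take the top N (here N = 2).
ranked = sorted(candidates, key=lambda c: overlap_score(query, c), reverse=True)
top_2 = ranked[:2]
print(top_2)
```

The second candidate moves to the front because it contains every query word, even though the first-stage order placed it second.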

Concrete Example

Chunking and Reranking

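def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks; neighbors share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would stop `start` from advancing (infinite loop)
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break  # this window reached the end; skip a redundant tail chunk
        start = end - overlap
    return chunks

def rerank(query, candidates, reranker, top_n):
    """Return the top_n candidates, ordered by reranker(query, text) score."""
    # Score each (query, chunk) pair; candidates are dicts with a "text" key.
    scored = [(reranker(query, c["text"]), c)
              for c in candidates]
    # Sort on the score only, highest first.
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for _, c in scored[:top_n]]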

Chunking splits documents into overlapping pieces so that content straddling a chunk boundary, up to the overlap length, still appears whole in one chunk. Reranking scores each candidate with a second-stage model (a cross-encoder, for example), sorts by relevance, and returns the top N. Together they help the LLM get the most relevant content within its context window.
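The overlap arithmetic is easiest to see as word offsets. Each window starts `chunk_size - overlap` words after the previous one, so neighboring windows share `overlap` words. `window_bounds` below is a hypothetical helper written only for this illustration; it assumes the windowing stops as soon as a window reaches the end of the text.

```python
def window_bounds(n_words, chunk_size, overlap):
    """Return (start, end) word offsets for overlapping windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    bounds = []
    start = 0
    while start < n_words:
        end = min(start + chunk_size, n_words)
        bounds.append((start, end))
        if end == n_words:
            break  # the last window already covers the tail
        start += step
    return bounds

bounds = window_bounds(n_words=12, chunk_size=5, overlap=2)
print(bounds)  # [(0, 5), (3, 8), (6, 11), (9, 12)]
```

Words 3 and 4 fall in both the first and second windows, so a short phrase that crosses the boundary at word 5 is still readable in one piece.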

Key Ideas

Chunking

Split documents into overlapping pieces small enough to embed and retrieve individually, so that several fit in the LLM's context window at once.

Embedding Search

Convert chunks and queries to vectors, find nearest neighbors by similarity.
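A minimal sketch of nearest-neighbor search by cosine similarity, in plain Python. The three-dimensional vectors and chunk ids are hand-written placeholders, not real embeddings, and the brute-force scan is an assumption that only suits small collections.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query_vec, index, k):
    """index is a list of (chunk_id, vector); return the k most similar ids."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]

# Hand-written vectors standing in for embedding-model output.
index = [
    ("refund-policy", [0.9, 0.1, 0.0]),
    ("shipping-times", [0.1, 0.9, 0.0]),
    ("returns-howto", [0.7, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]

top = nearest(query_vec, index, k=2)
print(top)  # ['refund-policy', 'returns-howto']
```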

Reranking

Apply a second-stage relevance model to sharpen precision at the top of results.

Query Rewriting

Reformulate vague queries for better retrieval recall.
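One simple form of rewriting is term expansion. The sketch below assumes a small hand-made `EXPANSIONS` table; a real system might generate related terms with an LLM or a thesaurus instead. The original query text is kept first so its intent is preserved.

```python
# Hypothetical expansion table, invented for this example.
EXPANSIONS = {
    "login": ["sign-in", "authentication"],
    "error": ["failure", "exception"],
}

def expand_query(query, expansions=EXPANSIONS):
    """Append related terms to the query, skipping duplicates."""
    words = query.lower().split()
    seen = set(words)
    extra = []
    for word in words:
        for term in expansions.get(word, []):
            if term not in seen:
                seen.add(term)
                extra.append(term)
    return " ".join([query] + extra)

expanded = expand_query("login error")
print(expanded)  # login error sign-in authentication failure exception
```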

Grounding

Include retrieved chunks with source citations for attributable answers.
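As a rough illustration, the context block can carry numbered markers that map back to sources. `build_context` and the `text`/`source` chunk keys are assumptions made for this sketch; chunks from the same source share one citation number so the answer can cite `[1]` or `[2]` unambiguously.

```python
def build_context(chunks):
    """Return (context_block, citations), where citations maps number -> source."""
    numbers = {}  # source -> citation number, assigned in first-seen order
    lines = []
    for chunk in chunks:
        n = numbers.setdefault(chunk["source"], len(numbers) + 1)
        lines.append(f"[{n}] {chunk['text']}")
    citations = {n: source for source, n in numbers.items()}
    return "\n".join(lines), citations

# Invented chunks for the example.
chunks = [
    {"source": "handbook.md", "text": "Refunds are issued within 14 days."},
    {"source": "policy.md", "text": "Refunds require the original receipt."},
    {"source": "handbook.md", "text": "Store credit is offered after 14 days."},
]

context, citations = build_context(chunks)
print(context)
print(citations)  # {1: 'handbook.md', 2: 'policy.md'}
```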

Problems in this track

6 problems.

Title · Difficulty · Acceptance · Est. time

Rerank Retrieved Chunks · Medium · 54% · 30m
Given top-k vector hits and a reranker, return the top-n most relevant.

Split Documents into Overlapping Chunks · Medium · 59% · 20m
Split long text into overlapping chunks for retrieval-friendly document processing.

Build Retrieval Context with Citations · Medium · 61% · 20m
Combine retrieved chunks into a context block with source citations.

Rewrite Queries for Better Recall · Hard · 42% · 30m
Expand a user query with retrieval-friendly terms while keeping the original intent.

Merge Hybrid Search Results · Hard · 44% · 30m
Merge ranked results from lexical and vector search into a single deduplicated list.

Filter Retrieved Chunks for Relevance · Medium · 57% · 20m
Keep only the most relevant chunks by applying a minimum score threshold.