MIT licensed · OKF v0.1 conformant · zero LLM required

Stop feeding your AI agent
the whole codebase.

okf-generator scans any repo and writes one markdown file per function, class, module, and dependency. Your agent runs okf lookup <Name> instead of re-reading 600 lines to find one signature. No LLM is required to build the bundle.

pip install okf-generator

The problem

Reading a whole file to find one function is expensive

Cloud models with huge context windows hide this cost. Local models on a laptop run out of memory immediately.

Before — entire file as context

~14,000

tokens to find one 12-line method

→

After — exact concept only

CLASS: WorldBankConnector
Description: Fetches World Bank development indicators.
Methods: get_indicator, search
Signature: class WorldBankConnector

~140

tokens. exact answer, zero guessing.
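The before/after numbers above work out as follows. This is a minimal sketch: the 14,000 and 140 figures come from the comparison itself, while estimate_tokens uses the common "about four characters per token" rule of thumb, which is an assumption and not a real tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. ~4 chars/token is a heuristic, not a tokenizer."""
    return max(1, round(len(text) / chars_per_token))


def savings(before_tokens: int, after_tokens: int) -> tuple[float, float]:
    """Return (reduction factor, percent of tokens saved)."""
    factor = before_tokens / after_tokens
    percent = 100.0 * (1 - after_tokens / before_tokens)
    return factor, percent


# Figures quoted above: whole file vs. one concept card.
factor, percent = savings(14_000, 140)
print(f"{factor:.0f}x fewer tokens, {percent:.0f}% saved")
```

Swap in your own file and card sizes to see what the trade looks like for your repo.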

The pipeline

Three commands, one workflow

Generate once. Look up forever. Regenerate whenever the code changes.

01 — scan

okf generate

Tree-sitter and native AST parsers walk the repo and extract every function, class, module, and manifest dependency, with cross-referenced calls and imports.
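To make "native AST parsing" concrete, here is a minimal sketch using Python's standard ast module. It is not okf's implementation: the record fields, the extract_concepts helper, and the sample method parameters are all hypothetical, and the real tool may collect more (or different) data.

```python
import ast

# Hypothetical sample source; only the class and method names come from the card above.
SOURCE = '''
class WorldBankConnector:
    """Fetches World Bank development indicators."""

    def get_indicator(self, code, country="all"):
        """Return one indicator series."""
        return []

    def search(self, query):
        return []
'''


def extract_concepts(source: str) -> list[dict]:
    """Walk the AST and collect one record per class and function."""
    concepts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            concepts.append({
                "type": "class",
                "name": node.name,
                "doc": ast.get_docstring(node),
                "methods": [n.name for n in node.body
                            if isinstance(n, ast.FunctionDef)],
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            concepts.append({
                "type": "function",
                "name": node.name,
                "doc": ast.get_docstring(node),
                # ast.unparse needs Python 3.9+
                "signature": f"def {node.name}({ast.unparse(node.args)})",
            })
    return concepts


concepts = extract_concepts(SOURCE)
for c in concepts:
    print(c["type"], c["name"])
```

No model is involved at any point: the same source always yields the same records.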

02 — retrieve

okf lookup

Instant, zero-LLM concept search by name, type, tag, or file. Returns signature, docstring, params, callers, and callees in milliseconds.
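Why can a lookup be this fast? An exact-name search is a dictionary hit, not a similarity ranking. The sketch below illustrates the idea only; Card, build_index, and lookup are hypothetical names, and case-insensitive matching is an assumption rather than documented okf behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical in-memory stand-in for one concept file."""
    name: str
    kind: str        # e.g. "class", "function", "module"
    source: str      # source file the concept came from
    signature: str


def build_index(cards):
    """Map lowercased name -> list of cards (names can repeat across files)."""
    index = {}
    for card in cards:
        index.setdefault(card.name.lower(), []).append(card)
    return index


def lookup(index, name, kind=None):
    """Exact name match, optionally filtered by concept type."""
    hits = index.get(name.lower(), [])
    return [c for c in hits if kind is None or c.kind == kind]


cards = [
    Card("WorldBankConnector", "class",
         "src/connectors/economic_data.py", "class WorldBankConnector"),
    Card("get_indicator", "function",
         "src/connectors/economic_data.py", "def get_indicator(self, code)"),
]
index = build_index(cards)
print(lookup(index, "WorldBankConnector")[0].signature)
```

A miss returns an empty list instead of a "closest" guess, which is the point of exact lookup.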

03 — integrate

okf install

Wires the bundle into Claude Code, Cursor, Copilot, Windsurf, Cline, or OpenCode so the agent checks the bundle before touching source.

Agent integration

Works with the agent you already use

One command wires the bundle into your agent's rules. One more exposes it over MCP.

MODEL CONTEXT PROTOCOL

Speak MCP? So does the bundle.

Run okf mcp ./okf_bundle and any MCP-compatible client — Claude Desktop, Claude Code, or a custom agent — can query the knowledge graph directly, no CLI wrapper needed.

$ okf mcp ./okf_bundle
MCP server listening on stdio…
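Since the server speaks MCP over stdio, a client only needs to know which command to launch. Claude Desktop, for example, registers stdio servers under an mcpServers key in its JSON config. The sketch below prints such an entry; the server name "okf" is a hypothetical label, and it assumes the okf executable is on the client's PATH.

```python
import json
from pathlib import Path


def mcp_server_entry(bundle_dir: str, name: str = "okf") -> dict:
    """Build an mcpServers entry that launches `okf mcp <bundle>` over stdio.

    The bundle path is made absolute because the client may start the
    server from a different working directory.
    """
    bundle = str(Path(bundle_dir).resolve())
    return {"mcpServers": {name: {"command": "okf", "args": ["mcp", bundle]}}}


entry = mcp_server_entry("./okf_bundle")
print(json.dumps(entry, indent=2))
```

Merge the printed entry into your client's existing config rather than overwriting the file.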

okf install claude	Claude Code skill
okf install cursor	.cursorrules
okf install opencode	/lookup command
okf install copilot	Copilot instructions
okf install windsurf	.windsurfrules
okf install cline	.clinerules

Or set up every detected agent at once: okf install all

Runs on-device too

Built for local SLMs, not just the cloud

Cloud models mask the cost of huge context windows. Local models — Gemma, Llama, Phi — running on a laptop don't have that luxury; feed one the whole repo and it runs out of memory. okf lookup sends a ~50-token query and gets back a ~200-token concept card. No embeddings, no vector DB, no RAG pipeline.

local llama.cpp

$ OKF_ENRICH=1 \
  OKF_BASE_URL="http://localhost:8080/v1" \
  OKF_MODEL="gemma-3-4b-it-qat-GGUF:Q4_0" \
  okf generate ./my_project ./okf_bundle

Coverage

11 languages, 17 manifest formats

Deterministic extraction — no LLM call needed to index a codebase.

Python JavaScript TypeScript Go Java Rust Ruby C C++ C# SQL

pip / Python	requirements.txt, pyproject.toml, poetry.lock
npm / JS	package.json, yarn.lock, pnpm-lock.yaml
cargo / Rust	Cargo.toml, Cargo.lock
go / Go	go.mod, go.sum
maven & gradle / Java	pom.xml, build.gradle
bundler / Ruby	Gemfile
composer / PHP	composer.json
swiftpm / Swift	Package.swift
other	project.clj, mix.exs

What's inside

Built for agent workflows, not just documentation

okf_bundle/
├── SUMMARY.md                      ← bird's-eye view
├── index.md
├── _dependencies/
│   └── pip/npm/cargo/…
└── src/connectors/
    ├── economic_data.md            ← Module
    └── economic_data/
        ├── WorldBankConnector.md
        └── get_indicator.md

Layout mirrors your source tree

No flat functions/ / classes/ buckets. Every concept file sits where its source file sits, plus a domain-organized _dependencies/ tree. Diff-friendly, git-friendly, and safe to commit alongside the code it describes.
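The mirroring rule can be read straight off the tree above: a module card replaces the source file's extension with .md, and each concept in that module lives in a same-named folder. A minimal sketch of that mapping follows. The helper names are hypothetical, and the .py extension on the sample source path is an assumption.

```python
from pathlib import PurePosixPath

BUNDLE_ROOT = PurePosixPath("okf_bundle")


def module_card(source_file: str) -> PurePosixPath:
    """src/pkg/mod.py -> okf_bundle/src/pkg/mod.md"""
    return BUNDLE_ROOT / PurePosixPath(source_file).with_suffix(".md")


def concept_card(source_file: str, concept: str) -> PurePosixPath:
    """src/pkg/mod.py + Name -> okf_bundle/src/pkg/mod/Name.md"""
    return BUNDLE_ROOT / PurePosixPath(source_file).with_suffix("") / f"{concept}.md"


src = "src/connectors/economic_data.py"  # assumed source path
print(module_card(src))
print(concept_card(src, "WorldBankConnector"))
```

Because the mapping is predictable, an agent that knows a source path can often open the right card without searching at all.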

Zero-LLM extraction

Tree-sitter + AST parsing. Nothing calls an API unless you turn on enrichment.

Cross-reference linker

Imports → dependencies, calls → callers/callees. Resolved across every supported language.
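Here is what "calls → callers/callees" means in practice, as a single-file Python sketch built on the standard ast module. It is an illustration under simplifying assumptions, not okf's linker: it only resolves direct calls to functions defined in the same source, and the sample functions are hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical sample module.
SOURCE = '''
def fetch(url):
    return parse(download(url))

def download(url):
    return ""

def parse(text):
    return text.strip()
'''


def link_calls(source: str):
    """Return (callees, callers) maps for functions defined in `source`."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    callees = defaultdict(set)
    callers = defaultdict(set)
    for func in functions:
        for node in ast.walk(func):
            # Only plain-name calls like parse(...); method calls are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    callees[func.name].add(node.func.id)
                    callers[node.func.id].add(func.name)
    return dict(callees), dict(callers)


callees, callers = link_calls(SOURCE)
print(sorted(callees["fetch"]))
print(sorted(callers["parse"]))
```

The callers map is just the callees map inverted, which is why both can be produced in one pass.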

Interactive visualizer

One HTML file. Tree nav + local graphs. Opens in a browser, no server.

Bundle diff

Added / removed / changed concepts between two bundle versions, by content hash.
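A content-hash diff is simple enough to sketch in a few lines. The version below treats each bundle as a mapping of card path to markdown text; the choice of SHA-256 and the function names are assumptions for illustration, since the hash okf actually uses is not stated here.

```python
import hashlib


def content_hashes(bundle: dict[str, str]) -> dict[str, str]:
    """Map each card path to a digest of its text (SHA-256 is an assumption)."""
    return {path: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for path, text in bundle.items()}


def diff_bundles(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify card paths as added, removed, or changed between two bundles."""
    old_h, new_h = content_hashes(old), content_hashes(new)
    shared = old_h.keys() & new_h.keys()
    return {
        "added": sorted(new_h.keys() - old_h.keys()),
        "removed": sorted(old_h.keys() - new_h.keys()),
        "changed": sorted(p for p in shared if old_h[p] != new_h[p]),
    }


old = {"a/Foo.md": "class Foo", "a/bar.md": "def bar()"}
new = {"a/Foo.md": "class Foo(Base)", "a/baz.md": "def baz()"}
result = diff_bundles(old, new)
print(result)
```

Comparing digests instead of full text keeps the diff cheap even when cards are long.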

MCP server

okf mcp — bundle over Model Context Protocol, any client.

Training pairs

Bundle → JSONL. codegen, QA, doc, summarize, crosslink pair types.
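To picture the JSONL output, here is a sketch that turns one concept card into a "doc" pair, one JSON object per line. Only the pair type name comes from the list above; the prompt/completion field names, the prompt wording, and the helper functions are hypothetical and may not match what okf pairs emits.

```python
import json


def doc_pair(card: dict) -> dict:
    """Build one hypothetical 'doc' pair: signature in, description out."""
    return {
        "type": "doc",
        "prompt": f"Describe `{card['signature']}`.",
        "completion": card["description"],
    }


def to_jsonl(pairs) -> str:
    """Serialize pairs as JSON Lines: one compact JSON object per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


card = {
    "signature": "class WorldBankConnector",
    "description": "Fetches World Bank development indicators.",
}
jsonl = to_jsonl([doc_pair(card)])
print(jsonl)
```

Each line parses on its own, so the file can be streamed into a fine-tuning job without loading it whole.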

Reference

CLI at a glance

okf generate	Scan a codebase and write an OKF v0.1 bundle
okf lookup	Search the bundle — by name, type, tag, or source file
okf init	Interactive bundle setup wizard
okf diff	Compare two bundles: added, removed, changed concepts
okf summarize	Regenerate the bundle's SUMMARY.md map
okf visualize	Generate a self-contained interactive HTML explorer
okf serve	Launch a local server and auto-open the visualization
okf mcp	Expose the bundle over Model Context Protocol
okf pairs	Convert a bundle into JSONL fine-tuning pairs
okf install	Wire up Claude Code, Cursor, Copilot, Windsurf, Cline, or OpenCode

vs. the alternatives

Not RAG. Not embeddings. Exact lookup.

RAG retrieves by semantic similarity — approximate, and it can miss exact symbols. okf indexes real functions, classes, and dependencies by name.

	okf-generator	RAG / vector search	Read whole file
Zero-LLM required	Yes	No — needs embeddings	Yes
Exact symbol match	Yes	Approximate	Yes, if you find it
Vector DB / infra	None needed	Required	None needed
Token cost per query	~140 tokens	Chunk-dependent	Whole file
Works fully offline	Yes	Depends on embedder	Yes
Git-diffable output	Plain markdown	Opaque vectors	N/A

FAQ

Common questions

Does this require an API key or internet connection?

No. Core extraction (okf generate) is fully offline and deterministic — no LLM call is made unless you explicitly enable OKF_ENRICH=1.

What happens if my language isn't supported?

Unsupported files are skipped, not dropped silently — log.md records what was scanned. Adding a language is a self-contained tree-sitter grammar mapping; it's a listed good-first-issue.

Does this work on monorepos or very large codebases?

Yes. Scanning is linear in file count, and the bundle mirrors your source tree. For very large repos, scope okf generate to a subdirectory if you only need part of the repo indexed.

Is the bundle safe to commit to git?

Yes — that's the intended workflow. Bundles are plain markdown, diff cleanly, and version alongside the code they describe.

Get started

One install, works with any agent

pip install okf-generator

macOS / Linux one-liner	curl -fsSL raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash

With LLM enrichment	pip install "okf-generator[llm]"
