Stop feeding your AI agent the whole codebase.
okf scans any repo and outputs one markdown file per function, class, module, and dependency. Your agent runs okf lookup <Name> instead of re-reading 600 lines to find one signature. No LLM required to build it.
Reading a whole file to find one function is expensive
Cloud models with huge context windows hide this cost. Local models on a laptop run out of memory immediately.
Signature: class WorldBankConnector
Description: Fetches World Bank development indicators.
Methods: get_indicator, search
Three commands, one workflow
Generate once. Look up forever. Regenerate whenever the code changes.
okf generate
Tree-sitter and native AST parsers walk the repo and extract every function, class, module, and manifest dependency, with cross-referenced calls and imports.
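To make the extraction step concrete, here is a minimal sketch of AST-based extraction using only Python's standard ast module (Python 3.9+). It is not okf's implementation, and it covers Python source only. The record fields and the method parameters on the sample class are assumptions made for illustration.

```python
import ast

# Sample source to index. The class name and method names echo the concept
# card shown earlier; the parameters are invented for this example.
SOURCE = '''
class WorldBankConnector:
    """Fetches World Bank development indicators."""

    def get_indicator(self, code, country="all"):
        """Return one indicator series."""
        return []

    def search(self, query):
        return []


def main():
    connector = WorldBankConnector()
    return connector.search("gdp")
'''


def extract_concepts(source: str) -> list[dict]:
    """Walk a module's AST and return one record per class and function."""
    concepts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body if isinstance(n, ast.FunctionDef)]
            concepts.append({
                "type": "class",
                "name": node.name,
                "signature": f"class {node.name}",
                "docstring": ast.get_docstring(node),
                "methods": methods,
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            concepts.append({
                "type": "function",
                "name": node.name,
                # ast.unparse renders the argument list back to source text.
                "signature": f"def {node.name}({ast.unparse(node.args)})",
                "docstring": ast.get_docstring(node),
            })
    return concepts


concepts = extract_concepts(SOURCE)
for concept in concepts:
    print(concept["type"], concept["signature"])
```

Nothing here calls a model: the same input always yields the same records, which is what makes this kind of index cheap to regenerate.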
okf lookup
Instant, zero-LLM concept search by name, type, tag, or file. Returns signature, docstring, params, callers, and callees in milliseconds.
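The idea behind exact lookup can be sketched as a plain dictionary keyed by symbol name. This is an illustration only, not the okf lookup implementation. The ConceptCard fields, the file paths, and the method parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptCard:
    name: str
    kind: str          # "function", "class", "module", or "dependency"
    source_file: str
    signature: str


# Hypothetical cards; the file paths and parameters are invented for this example.
CARDS = [
    ConceptCard("WorldBankConnector", "class", "connectors/worldbank.py", "class WorldBankConnector"),
    ConceptCard("get_indicator", "function", "connectors/worldbank.py", "def get_indicator(self, code)"),
    ConceptCard("search", "function", "connectors/worldbank.py", "def search(self, query)"),
    ConceptCard("search", "function", "connectors/imf.py", "def search(self, term)"),
]


def build_index(cards):
    """Group cards by exact name so a lookup is a single dictionary access."""
    index = {}
    for card in cards:
        index.setdefault(card.name, []).append(card)
    return index


def lookup(index, name, kind=None, source_file=None):
    """Return cards whose name matches exactly, optionally narrowed by kind or file."""
    hits = list(index.get(name, []))
    if kind is not None:
        hits = [c for c in hits if c.kind == kind]
    if source_file is not None:
        hits = [c for c in hits if c.source_file == source_file]
    return hits


INDEX = build_index(CARDS)
for card in lookup(INDEX, "search", source_file="connectors/imf.py"):
    print(card.signature)
```

A name that is not in the index returns an empty list rather than a nearest neighbor, which is the behavior an agent wants when it asks for one specific symbol.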
okf install
Wires the bundle into Claude Code, Cursor, Copilot, Windsurf, Cline, or OpenCode so the agent checks the bundle before touching source.
Works with the agent you already use
One command wires the bundle into your agent's rules. One more exposes it over MCP.
Speak MCP? So does the bundle.
Run okf mcp ./okf_bundle and any MCP-compatible client — Claude Desktop, Claude Code, or a custom agent — can query the knowledge graph directly, no CLI wrapper needed.
MCP server listening on stdio…
| Command | What it sets up |
|---|---|
| okf install claude | Claude Code skill |
| okf install cursor | .cursorrules |
| okf install opencode | /lookup command |
| okf install copilot | Copilot instructions |
| okf install windsurf | .windsurfrules |
| okf install cline | .clinerules |
Or set up every detected agent at once: okf install all
Built for local SLMs, not just the cloud
Cloud models mask the cost of huge context windows. Local models such as Gemma, Llama, and Phi, running on a laptop, don't have that luxury; feed one the whole repo and it runs out of memory.
okf lookup sends a ~50-token query and gets back a ~200-token concept card. No embeddings, no vector DB, no RAG pipeline.
10 languages, 17 manifest formats
Deterministic extraction — no LLM call needed to index a codebase.
requirements.txt, pyproject.toml, poetry.lock
package.json, yarn.lock, pnpm-lock.yaml
Cargo.toml, Cargo.lock
go.mod, go.sum
pom.xml, build.gradle
Gemfile
composer.json
Package.swift
project.clj, mix.exs
Built for agent workflows, not just documentation
Layout mirrors your source tree
No flat functions/ or classes/ buckets. Every concept file sits where its source file sits, plus a domain-organized _dependencies/ tree.
Diff-friendly, git-friendly, and safe to commit alongside the code it describes.
Zero-LLM extraction
Tree-sitter + AST parsing. Nothing calls an API unless you turn on enrichment.
Cross-reference linker
Imports → dependencies, calls → callers/callees. Resolved across every supported language.
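As a rough picture of what linking calls to callers and callees involves, the sketch below builds both maps for a single Python module with the standard ast module. It is an assumption-laden illustration, not okf's linker: it only resolves direct calls by bare name within one file, and the sample functions are invented.

```python
import ast
from collections import defaultdict

# Invented sample module: load() calls parse() and fetch().
SOURCE = '''
def fetch(url):
    return url

def parse(raw):
    return raw.strip()

def load(url):
    return parse(fetch(url))
'''


def link_calls(source: str):
    """Return (callees, callers) maps for functions defined in one module."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}

    callees = defaultdict(set)   # function -> functions it calls
    callers = defaultdict(set)   # function -> functions that call it
    for func in functions:
        for node in ast.walk(func):
            # Only direct calls by bare name; method calls such as
            # raw.strip() are ast.Attribute nodes and are skipped here.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    callees[func.name].add(node.func.id)
                    callers[node.func.id].add(func.name)

    return (
        {name: sorted(targets) for name, targets in callees.items()},
        {name: sorted(sources) for name, sources in callers.items()},
    )


callees, callers = link_calls(SOURCE)
print("load calls:", callees["load"])
print("parse is called by:", callers["parse"])
```

Resolving calls across files and languages needs import resolution on top of this, which is the part the sketch leaves out.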
Interactive visualizer
One HTML file. Tree nav + local graphs. Opens in a browser, no server.
Bundle diff
Added / removed / changed concepts between two bundle versions, by content hash.
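A content-hash diff of this kind can be sketched in a few lines. This is a minimal illustration, assuming a bundle can be represented as a mapping from concept file path to markdown text and that SHA-256 is the hash. Neither detail is confirmed by okf, and the sample paths are hypothetical.

```python
import hashlib


def content_hashes(bundle: dict[str, str]) -> dict[str, str]:
    """Map each concept path to the SHA-256 digest of its text."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in bundle.items()
    }


def diff_bundles(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify concept paths as added, removed, or changed between two bundles."""
    old_hashes, new_hashes = content_hashes(old), content_hashes(new)
    shared = old_hashes.keys() & new_hashes.keys()
    return {
        "added": sorted(new_hashes.keys() - old_hashes.keys()),
        "removed": sorted(old_hashes.keys() - new_hashes.keys()),
        "changed": sorted(p for p in shared if old_hashes[p] != new_hashes[p]),
    }


# Hypothetical bundles: path -> concept card text.
OLD = {
    "connectors/worldbank/search.md": "Signature: def search(self, query)",
    "connectors/worldbank/legacy_fetch.md": "Signature: def legacy_fetch(self)",
    "connectors/worldbank/get_indicator.md": "Signature: def get_indicator(self, code)",
}
NEW = {
    "connectors/worldbank/search.md": "Signature: def search(self, query, limit)",
    "connectors/worldbank/get_indicator.md": "Signature: def get_indicator(self, code)",
    "connectors/imf/search.md": "Signature: def search(self, term)",
}

result = diff_bundles(OLD, NEW)
for status, paths in result.items():
    print(status, paths)
```

Comparing hashes instead of timestamps means a regenerated bundle with identical content reports no changes.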
MCP server
okf mcp — bundle over Model Context Protocol, any client.
Training pairs
Bundle → JSONL. codegen, QA, doc, summarize, crosslink pair types.
CLI at a glance
| Command | Description |
|---|---|
| okf generate | Scan a codebase and write an OKF v0.1 bundle |
| okf lookup | Search the bundle — by name, type, tag, or source file |
| okf init | Interactive bundle setup wizard |
| okf diff | Compare two bundles: added, removed, changed concepts |
| okf summarize | Regenerate the bundle's SUMMARY.md map |
| okf visualize | Generate a self-contained interactive HTML explorer |
| okf serve | Launch a local server and auto-open the visualization |
| okf mcp | Expose the bundle over Model Context Protocol |
| okf pairs | Convert a bundle into JSONL fine-tuning pairs |
| okf install | Wire up Claude Code, Cursor, Copilot, Windsurf, Cline, or OpenCode |
Not RAG. Not embeddings. Exact lookup.
RAG retrieves by semantic similarity — approximate, and it can miss exact symbols. okf indexes real functions, classes, and dependencies by name.
| | okf-generator | RAG / vector search | Read whole file |
|---|---|---|---|
| Zero-LLM required | Yes | No — needs embeddings | Yes |
| Exact symbol match | Yes | Approximate | Yes, if you find it |
| Vector DB / infra | None needed | Required | None needed |
| Token cost per query | ~140 tokens | Chunk-dependent | Whole file |
| Works fully offline | Yes | Depends on embedder | Yes |
| Git-diffable output | Plain markdown | Opaque vectors | N/A |
Common questions
Does this require an API key or internet connection?
No. Core extraction (okf generate) is fully offline and deterministic — no LLM call is made unless you explicitly enable OKF_ENRICH=1.
What happens if my language isn't supported?
Unsupported files are skipped, but not silently: log.md records what was scanned. Adding a language is a self-contained tree-sitter grammar mapping, and it's listed as a good-first-issue.
Does this work on monorepos or very large codebases?
Yes — the bundle mirrors your source tree, so scanning is linear in file count. For very large repos, scope okf generate to a subdirectory if you only need part indexed.
Is the bundle safe to commit to git?
Yes — that's the intended workflow. Bundles are plain markdown, diff cleanly, and version alongside the code they describe.
One install, works with any agent
pip install okf-generator
curl -fsSL raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash
pip install "okf-generator[llm]"