The desktop inference engine that runs LLMs on your hardware. OpenAI-compatible API. MCP tools. Ollama model reuse. Free forever.
Built for developers and power users who want full control over their AI stack.
Auto-detects GPU, VRAM, CPU cores, and RAM. Optimizes gpu_layers, threads, and context size automatically.
Drop-in replacement. /v1/chat/completions, /v1/completions, /v1/models, /v1/embeddings. SSE streaming built in.
Already have Ollama models? Darksol finds and runs them directly — no re-download, no daemon required.
Browse, search, and pull GGUF models. Hardware-aware fit indicators tell you what runs before you download.
Connect external tools via Model Context Protocol. CoinGecko, DexScreener, Etherscan, DefiLlama — pre-configured.
Access large cloud models through Bankr LLM Gateway when you need them. Manage API keys securely from the app.
Every local inference is $0.00. Track your usage, tokens processed, and savings vs cloud providers in real time.
Real-time GPU/CPU temperature tracking. Know when your hardware is hot before it throttles your inference.
Enable models to call functions, execute code, and access files. Configurable per-session from the app settings panel.
Free. No account required. Your data never leaves your machine.
Same goal, different approach.
Everything works from the command line too.
No cloud. No subscriptions. No data leaving your machine.