Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihf4a3bgw5opcm7aapaetmekkvmwwp3bkujbutrw3qd6brgepd6k4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mehrqu6caxk2"
  },
  "path": "/t/anubis-oss-native-macos-app-for-benchmarking-local-llms-with-real-time-hardware-telemetry-free-open-source/173250#post_1",
  "publishedAt": "2026-02-10T00:09:27.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "issue",
    "discussion",
    "buying me a coffee"
  ],
  "textContent": "# Native, Open Source macOS app for benchmarking local LLMs with real-time hardware telemetry - Anubis OSS\n\nI built a free, open-source macOS app for benchmarking local LLMs on Apple Silicon. It correlates real-time hardware telemetry (GPU/CPU power, frequency, memory, thermals via IOReport) with inference performance, with exportable and stored benchmark results - something I couldn’t find in any existing tool. Extra are was taken to make the app and inferences light-weight- the performance hit with metrics on and off was negligible after extensive tuning - but I only have my lowly 24GB M4 Air to test on. Help me make it better for the community!\n\n\n\n\n\n## What it does\n\n  * **Real-time metrics** - tok/s, GPU/CPU utilization, power consumption (watts), GPU frequency, and memory - all charted live during inference\n  * **Any backend** - works with Ollama, `mlx_lm.server`, LM Studio, vLLM, LocalAI, or any OpenAI-compatible endpoint\n  * **A/B Arena** - compare two models side-by-side with the same prompt and vote on a winner\n  * **History & Export** - session history with full replay, CSV export, and one-click image export for sharing results\n  * **Process monitoring** - auto-detects backend processes and tracks their actual memory footprint (including Metal/GPU allocations)\n\n\n\n## Why I hope this might be useful for the HF community\n\n  * **Quantization comparisons** - comparing Q4_K_M vs Q8_0 vs FP16 on your hardware? Anubis shows the actual power/performance tradeoff - not just tok/s but **watts-per-token**\n  * **MLX users** - works with `mlx_lm.server` out of the box. Just start the server and add it as an OpenAI-compatible backend\n  * **Model cards & benchmarks** - if you’re publishing benchmarks for the community, the image export gives you shareable, branded results with one click\n  * **Apple Silicon insights** - per-core CPU utilization, GPU frequency, ANE/DRAM power - hardware data that no chat wrapper or CLI tool surfaces\n\n\n\n## Links\n\n---\n**GitHub**\n**Download**\n**Requirements**\n**License**\n\n## Looking for feedback\n\nI’d especially love to hear from anyone running MLX or GGUF models locally:\n\n  * What metrics matter most to you?\n  * What backends should I prioritize?\n  * What would make this useful for your workflow?\n\n\n\nOpen an issue or start a discussion on the repo. If Anubis is useful to you, consider buying me a coffee",
  "title": "Anubis OSS — native macOS app for benchmarking local LLMs with real-time hardware   telemetry (free, open source)"
}