#Inference Endpoint

RLHF (Reinforcement Learning from Human Feedback)

RLHF is the training recipe that turned raw language models (good at predicting text) into aligned assistants (good at following instructions helpfully and safely). It was popularized by InstructGPT (…

Sahil Kapoor's Playbook·May 17·3 min read

OpenRouter

A unified API gateway for large language models that lets you call 100+ LLMs from different providers through a single OpenAI-compatible endpoint with automatic fallback and cost routing.

Sahil Kapoor's Playbook·May 17·2 min read

Ollama, vLLM, Inference Endpoint, LangChain

vLLM

vLLM (Virtual LLM) is an open-source inference engine from UC Berkeley that dramatically increases the throughput of serving large language models on GPU hardware. It was introduced in 2023 with Paged…

Sahil Kapoor's Playbook·May 17·3 min read

Helm, Argo CD, Traefik, Nginx

Ollama

Ollama makes running open-source LLMs as straightforward as running a Docker container. You pull a model and it starts serving a local REST API that your code can call: no cloud, no API key, no per-t…

Sahil Kapoor's Playbook·May 17·3 min read

vLLM, LangChain, Cursor, OpenHands