Inference Endpoint
An inference endpoint is the serving layer for a trained model. After training (or downloading) an LLM, you need infrastructure to accept requests, run the forward pass, and return outputs at scale. T…
Sahil Kapoor's Playbook · May 17 · 3 min read
Vllm, Tokenization, Ollama, Openrouter
Prompt Engineering
Prompt engineering is the discipline of communicating effectively with large language models. Because LLMs are trained to predict plausible continuations of text, how you frame a request has an enormo…
Sahil Kapoor's Playbook · May 17 · 3 min read
System Prompt, Cursor, Windsurf, Langchain
vLLM
vLLM (Virtual LLM) is an open-source inference engine from UC Berkeley that dramatically increases the throughput of serving large language models on GPU hardware. It was introduced in 2023 with Paged…
Sahil Kapoor's Playbook · May 17 · 3 min read
Helm, Argocd, Traefik, Nginx
Ollama
Ollama makes running open-source LLMs as straightforward as running a Docker container. You pull a model, and it starts serving a local REST API that your code can call, no cloud, no API key, no per-t…
Sahil Kapoor's Playbook · May 17 · 3 min read
Vllm, Langchain, Cursor, Openhands
Logan Paul Accused of Scamming After Selling Pokemon Card for $16.5M
Logan Paul completed the sale of his 1998 Pikachu Illustrator Pokemon card during a live auction that captured widespread attention across collectibles circles. The rare Pokemon card, bundled with a d…
Bitcoin · Feb 17 · 4 min read
pursued, announced, post, blocked