#Tokenization

Inference Endpoint

An inference endpoint is the serving layer for a trained model. After training (or downloading) an LLM, you need infrastructure to accept requests, run the forward pass, and return outputs at scale. T…

Sahil Kapoor's Playbook·May 17·3 min read

Prompt Engineering

Prompt engineering is the discipline of communicating effectively with large language models. Because LLMs are trained to predict plausible continuations of text, how you frame a request has an enormo…

Sahil Kapoor's Playbook·May 17·3 min read

System Prompt·Cursor·Windsurf·Langchain

vLLM

vLLM (Virtual LLM) is an open-source inference engine from UC Berkeley that dramatically increases the throughput of serving large language models on GPU hardware. It was introduced in 2023 with Paged…

Sahil Kapoor's Playbook·May 17·3 min read

Helm·Argocd·Traefik·Nginx

Ollama

Ollama makes running open-source LLMs as straightforward as running a Docker container. You pull a model, and it starts serving a local REST API that your code can call: no cloud, no API key, no per-t…

Sahil Kapoor's Playbook·May 17·3 min read

Vllm·Langchain·Cursor·Openhands
