Longform — Long-form writing on the AT Protocol

RLHF (Reinforcement Learning from Human Feedback)

RLHF is the training recipe that turned raw language models (good at predicting text) into aligned assistants (good at following instructions helpfully and safely). It was popularized by InstructGPT (…

Sahil Kapoor's Playbook·May 17·3 min read

#Lora Low Rank Adaptation

RLHF (Reinforcement Learning from Human Feedback)