Training Processes
Extending the context window
  PyTorch Fully Sharded Data Parallel (FSDP)
  Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  YaRN: Efficient Context Window Extension of Large Language Models
  Sliding Window Attention
  LongRoPE
Reinforcement Learning
  An introduction to reinforcement learning
  Reinforcement Learning from Human Feedback (RLHF)
  Direct Preference Optimization: Your Language Model is Secretly a Reward Model