[Endorsement Request] Halfway Speculative Decoding: Direct Acceptance Rate Optimization with Joint Drafter-Target Training
Hi everyone,
I’m looking for an arXiv endorsement for cs.CL to submit my paper on speculative decoding.
Paper: Halfway Speculative Decoding: Direct Acceptance Rate Optimization with Joint Drafter-Target Training
Abstract: In speculative decoding, directly optimizing the acceptance rate through LK losses has been shown to be more effective than the standard approach of training the draft head by minimizing KL divergence to the target's output distribution, because the acceptance rate is what ultimately governs inference speedup. However, prior approaches have either trained the draft head and target model jointly without directly targeting the acceptance rate, or optimized the acceptance rate for the draft head alone while keeping the target frozen. We propose Halfway Speculative Decoding, which extends LK loss optimization to both models jointly and uses a cross-entropy loss to regularize the target model and prevent quality degradation. We study how speculation metrics and target quality scale with training budget by training separate model instances on increasing numbers of training examples and evaluating them across several benchmarks.
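The sketch below illustrates the quantity the abstract refers to, for readers less familiar with it. It assumes standard speculative sampling, where a drafted token x is accepted with probability min(1, p(x)/q(x)). Under that rule the expected per-token acceptance rate is alpha = sum_x min(p(x), q(x)) = 1 - TV(p, q). The toy distributions and all function names are illustrative assumptions. `joint_objective` is hypothetical: it uses 1 - alpha as a stand-in for the LK loss, and its `ce_weight` is arbitrary. None of this is the paper's implementation.

```python
import math

def acceptance_rate(p_target, q_draft):
    """Expected probability that one drafted token is accepted under
    standard speculative sampling: alpha = sum_x min(p(x), q(x))."""
    if len(p_target) != len(q_draft):
        raise ValueError("distributions must cover the same vocabulary")
    return sum(min(p, q) for p, q in zip(p_target, q_draft))

def total_variation(p_target, q_draft):
    # TV(p, q) = 0.5 * sum_x |p(x) - q(x)|; for normalized p, q: alpha = 1 - TV.
    return 0.5 * sum(abs(p - q) for p, q in zip(p_target, q_draft))

def kl_divergence(p_target, q_draft):
    # KL(p || q), the usual distillation objective for a draft head.
    # Terms with p(x) = 0 contribute nothing; q(x) must be > 0 wherever p(x) > 0.
    return sum(p * math.log(p / q) for p, q in zip(p_target, q_draft) if p > 0)

def joint_objective(p_target, q_draft, next_token, ce_weight):
    # HYPOTHETICAL joint loss: (1 - alpha) stands in for the LK loss, plus a
    # weighted cross-entropy term on the target's probability of the true
    # next token. In real training, p and q are model outputs and gradients
    # would flow into both the draft head and the target model.
    alpha = acceptance_rate(p_target, q_draft)
    return (1.0 - alpha) - ce_weight * math.log(p_target[next_token])

# Toy 3-token vocabulary (made-up numbers).
p = [0.8, 0.1, 0.1]          # target distribution
draft_a = [0.6, 0.2, 0.2]    # spreads mass evenly
draft_b = [0.8, 0.19, 0.01]  # matches the mode, nearly misses token 2

for name, q in [("A", draft_a), ("B", draft_b)]:
    print(name, round(acceptance_rate(p, q), 3), round(kl_divergence(p, q), 3))
```

In this toy case, draft B has the higher acceptance rate (about 0.91 vs 0.80) but also the larger KL divergence (about 0.17 vs 0.09). So the two objectives rank the drafters differently, which is the motivation for optimizing the acceptance rate directly.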
PDF: https://sergiu-nistor.com/assets/publications/Halfway_Speculative_Decoding.pdf
Endorsement link: https://arxiv.org/auth/endorse?x=TTNLCR
Happy to share more details or discuss the work. Thank you!