Breaking Changes
- T5Gemma2 model structure (#43633): Attention implementation is now properly set on all sub-configs; the modeling code manually calls adjust_attn_implementation.
- Generation cache preparation (#43679): Sliding window configurations are now enforced during generation; models with sliding window attention will limit sequence lengths.
- Backbone utils refactor (#43323): ...