9.9.0
This release includes several new reductions including contextual bandits with graph feedback as well as a completely new interaction grounded learning reduction. There are also WASM bindings available for Vowpal Wabbit now.
Contextual Bandits with Graph Feedback
Contextual Bandits (CB) with graph feedback can be used for scenarios where some actions, when taken, reveal information other actions (not taken), or maybe don't reveal any information at all. If there exists prior knowledge of this relationship between actions then that knowledge can be used to make exploration and learning more efficient.
See here for more details.
Contextual Bandits with interaction grounded learning
Interaction grounded learning (IGL) can be used for the scenario where user doesn't have a reward function. It will automatically learn a personalized reward function from user's feedback and optimize directly for the latent user satisfaction
See for more details