Synthesizing Programmatic Reinforcement Learning Policies with Memory-Based Decision Trees

Published in the Programmatic Reinforcement Learning Workshop at the RL Conference, 2025

Programmatic reinforcement learning (PRL) represents policies as programs, often involving higher-order constructs such as control loops, as a means to achieve interpretability and generalization. Despite promising results, current state-of-the-art PRL methods suffer from sample inefficiency, and very little is known about programmatic RL on the theoretical front. A pressing question is the trade-off between the size of a programmatic policy and its performance. This work is therefore motivated by constructing programmatic policies that reach the target region along shortest paths, ensuring near-optimal behavior. Alongside this, we investigate how to reason about programmatic policies using two learning systems: Reward Prediction Error (RPE) and Action Prediction Error (APE). How can we learn programmatic policies that generalize better? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL. Our main contributions are the construction of a near-optimal policy using memory-based decision trees and a study of its generalizability and size-performance trade-off.
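To make the central object concrete, below is a minimal illustrative sketch of what a memory-based decision-tree policy can look like: a decision tree whose internal nodes may test both the current observation and a small persistent memory, and whose leaves emit an action and may update that memory. This is only an assumed, simplified reading of the idea, not the paper's actual construction; all names (`MemoryTreePolicy`, `Node`, `Leaf`) are hypothetical.

```python
# Minimal sketch (assumption, not the paper's construction): a decision-tree
# policy whose predicates read the observation and a small memory, and whose
# leaves output an action and may write back to memory.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class Leaf:
    action: int                                   # action emitted at this leaf
    memory_write: Optional[Tuple[int, float]] = None  # optional (cell, value) update


@dataclass
class Node:
    predicate: Callable[[List[float], List[float]], bool]  # test on (obs, memory)
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


class MemoryTreePolicy:
    """Decision-tree policy with a small read/write memory carried across steps."""

    def __init__(self, root: Union[Node, Leaf], memory_size: int = 1):
        self.root = root
        self.memory = [0.0] * memory_size

    def act(self, obs: List[float]) -> int:
        node = self.root
        while isinstance(node, Node):
            node = node.if_true if node.predicate(obs, self.memory) else node.if_false
        if node.memory_write is not None:          # leaves may persist information
            cell, value = node.memory_write
            self.memory[cell] = value
        return node.action


# Toy usage: keep moving (action 1) until a goal flag appears in the observation,
# then remember it in memory cell 0 and stop (action 0) on all later steps.
policy = MemoryTreePolicy(
    root=Node(
        predicate=lambda obs, mem: mem[0] > 0.5,            # goal already seen?
        if_true=Leaf(action=0),                             # yes: stop
        if_false=Node(
            predicate=lambda obs, mem: obs[0] > 0.9,        # goal flag observed now
            if_true=Leaf(action=0, memory_write=(0, 1.0)),  # remember and stop
            if_false=Leaf(action=1),                        # otherwise keep moving
        ),
    ),
    memory_size=1,
)

print(policy.act([0.2]))   # 1: keep moving
print(policy.act([0.95]))  # 0: goal seen, memory set
print(policy.act([0.2]))   # 0: memory persists the decision
```

The memory is what distinguishes this from a plain decision-tree policy: it lets the tree express behaviors that depend on past events, which a memoryless tree over the current observation cannot.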

Recommended citation: Shabbar, Aya. (2025). Synthesizing Programmatic Reinforcement Learning Policies with Memory-Based Decision Trees