Encoding policies that solve sequential decision-making problems as programs offers advantages over neural representations, such as interpretability and modifiability of the policies. On the downside, programmatic policies are difficult to synthesize because their generation requires one to search in spaces of programs that are often discontinuous. In this paper, we leverage the ability of large language models (LLMs) to write computer programs to speed up the synthesis of programmatic policies. We use an LLM to provide initial candidates for the policy, which are then improved by local search. Empirical results in three problems that are challenging for programmatic representations show that LLMs can speed up local search and facilitate the synthesis of policies. We conjecture that LLMs are effective in this setting because we give them access to the outcomes of the policies' rollouts. That way, once they observe what a previous policy has accomplished, LLMs can try policies encoding different behaviors. This process forces the search to explore different parts of the space through "exploratory initial programs". Experiments also show that much of the knowledge LLMs leverage comes from the domain-specific language that defines the search space: the overall performance of the system drops sharply if we rename the functions used in the language to meaningless ones. Since our system only queries the LLM in the first step of the search, it offers an economical method for using LLMs to guide the synthesis of policies.
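The pipeline described above can be summarized as: query the LLM for a few initial candidate programs, feeding back the rollout outcome of each candidate, and then refine the best candidate with local search. The following is a minimal sketch of that loop under stated assumptions; the helpers `query_llm`, `evaluate_rollouts`, and `mutate` are hypothetical stubs, not the authors' implementation, and the prompt format is invented for illustration.

```python
def query_llm(prompt: str) -> str:
    """Ask the LLM for a candidate policy program written in the DSL (stub)."""
    raise NotImplementedError

def evaluate_rollouts(program: str) -> float:
    """Run policy rollouts in the environment; return the average return (stub)."""
    raise NotImplementedError

def mutate(program: str) -> str:
    """Return a syntactically valid neighbor of `program` in the DSL (stub)."""
    raise NotImplementedError

def synthesize_policy(task: str, llm_candidates: int = 5, search_budget: int = 1000) -> str:
    # Step 1: the LLM proposes a few initial programs. Each new prompt includes
    # the rollout outcome of the previous candidate, nudging the LLM toward
    # exploratory initial programs that encode different behaviors.
    prompt = f"Write a policy in the DSL for the task: {task}"
    best, best_score = None, float("-inf")
    for _ in range(llm_candidates):
        candidate = query_llm(prompt)
        score = evaluate_rollouts(candidate)
        if score > best_score:
            best, best_score = candidate, score
        prompt = (f"The previous program achieved a return of {score}:\n{candidate}\n"
                  f"Write a different policy in the DSL for the task: {task}")

    # Step 2: local search (hill climbing over DSL programs) refines the best
    # LLM-provided candidate; the LLM is not queried again after this point.
    for _ in range(search_budget):
        neighbor = mutate(best)
        score = evaluate_rollouts(neighbor)
        if score > best_score:
            best, best_score = neighbor, score
    return best
```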

Sadmine, Q. A., Baier, H., & Lelis, L. (2024). Language models speed up local search for finding programmatic policies. Transactions on Machine Learning Research.