Hacker News

by naminon 11/11/2023, 10:52 PMwith 15 comments

This is a weekend hack that I'd like to further develop as it's working surprisingly well.

Using MCTS, we can explore a space of possible verified programs with an LLM. We check the partial programs at each step, and so steer towards programs that pass the verifier.

https://github.com/namin/llm-verified-with-monte-carlo-tree-...

by ianbutleron 11/12/2023, 6:35 AM

Nice! I had a similar idea to use MCTS to explore various AST generations through an incremental AST parser[0], so I could generate runnable code via LLM at a higher success rate than what we were seeing initially with them. I never got around to it, but this makes me curious to circle back around.

[0] There are few parsers that consider "failed, but could run if the next symbol is valid" and "failed and no way to continue" as separate statuses so you can make progress by enumerating (hopefully intelligently so) the space of valid states. Surprisingly TreeSitter wasn't one of them though, for that it's just "fail".

by wokwokwokon 11/12/2023, 7:01 AM

Anyone who's game able to explain what this actually does?

The main loop looks simple:

    def generate_complete(text, montecarlo):
        text = llm.generate(text, 1)[0]
       score = score_func(text)
        if score is not None:
            if score < 0:
                return None
            else:
                if can_be_solution(text, min_lines, check_fun):
                    montecarlo.solution = text
                return text
        else:
            return generate_complete(text, montecarlo)

...but, the heart of it (1) just basically checks if the dafny syntax is valid by posting to https://dafny.livecode.ch/check

How is 'syntax is valid' a valid scoring mechanism here?

If you look at the output examples (2), all I can see if generating functions and lemmas; this is equivalent to generating a function a bunch of tests for it.

I'm not sure I see what value the MCTS is bringing here.

Anyone get this and care to explain?

[1] - https://github.com/namin/llm-verified-with-monte-carlo-tree-...

[2] - https://github.com/namin/llm-verified-with-monte-carlo-tree-...

by will_byrdon 11/12/2023, 5:57 AM

I love that the code is so short and understandable. Could be turned into a Pearl.

by outlier99on 11/12/2023, 1:54 PM

Could this be combined with something like llama.cpp's constraint-based grammar (https://github.com/ggerganov/llama.cpp/blob/master/grammars/...) to always enforce syntactically correct code output?

by ilakshon 11/12/2023, 2:28 PM

Here is a dumb question: how does it distinguish between a partial solution that is just missing some characters/lines and nonsense?

Show HN: LLM Verified with Monte Carlo Tree Search