After receiving our recent $24M funding round—and being an AI-native company, after all—we might have been tempted to “AI all the things!” However, using AI everywhere is rarely a good strategy. Cost and latency issues aside, the main problem is reliability. AI is wonderful, even necessary, for many tasks, but suboptimal for others. The key to maximizing deliverable value is to interleave AI and traditional techniques. I call this the “AI Sandwich.”
One of our current goals on the Mechanical Orchard R&D team is developing tools to help with mainframe comprehension. As part of this, we wanted to parse and extract the steps that make up a JCL (Job Control Language) file. We built a simple parser using traditional techniques, and it met all of our requirements except one: including step summaries.
JCL is an extremely procedural and generic language. To understand what a given step does, you often need to refer to the comments written in the code. Summaries would mostly come from these comments, but because the comments are often inconsistent, traditional parsing falls short. The proposed solution was to use AI to extract a summary for each step from the comments, then merge the AI-generated summaries into the traditionally parsed steps.
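For instance, a hypothetical step might look like this (step and dataset names invented for illustration):

```
//* SORT THE DAILY TRANSACTIONS BEFORE POSTING
//SORTTRAN EXEC PGM=SORT
//SORTIN   DD DSN=PROD.TRANS.DAILY,DISP=SHR
//SORTOUT  DD DSN=PROD.TRANS.SORTED,DISP=(NEW,CATLG)
```

The EXEC statement only tells you that a generic sort utility runs; the //* comment is what tells you why.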
When LLM-generated results feed into an overall pipeline, you need to be able to trust the output. This frequently involves specifying your desired output format in your prompt, then parsing, validating, coercing, and potentially retrying the results, ideally with error hinting. This can get messy and involves a lot of boilerplate code. Code readability and simplicity are very important to me, so I want to share a few tools that can help keep LLM code nice and streamlined.
Instructor, Guardrails, and Marvin are all similar tools that lean heavily on Pydantic’s data validation to hide away the aforementioned boilerplate mess. All of them guarantee that the output of an LLM call conforms to the schema you declare. Here is an example:
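The sketch below uses Marvin’s @ai_fn decorator (the Marvin 1.x API); the StepSummary fields are illustrative assumptions, not our exact model:

```python
from typing import List

from marvin import ai_fn  # Marvin 1.x
from pydantic import BaseModel


class StepSummary(BaseModel):
    # Illustrative fields; the real model can carry whatever you need.
    step_name: str
    summary: str


@ai_fn
def parse_jcl_step_summaries(jcl_source: str) -> List[StepSummary]:
    """
    Given the source of a JCL file, identify each step and produce a short
    plain-English summary of what it does, drawing on the inline comments.
    """
```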
Running parse_jcl_step_summaries will return a list of StepSummary objects that you can pass directly into your other code. You might be thinking that I forgot to write the body of the parse_jcl_step_summaries function. You’d be wrong.
The @ai_fn decorator uses the docstring to construct a prompt that includes the templated values passed into the function (by introspecting the function parameters), as well as output-formatting instructions based on a schema generated from the function’s return type (a Pydantic model in this case). It calls the LLM with this generated prompt, parses the results with Pydantic, and retries with the Pydantic validation errors included if necessary.
Congratulations, you just wrote a prompt and called an LLM without actually writing a prompt or calling an LLM – pretty slick!
With all of this functionality at your disposal, interleaving the traditional and AI parts of the code is trivial:
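Here is a sketch of what that looks like; Step, parse_jcl_steps, and the merge-by-name logic are simplified stand-ins for our actual parser:

```python
import re
from typing import List, Optional

from pydantic import BaseModel


class Step(BaseModel):
    # Hypothetical result type for the traditional parser.
    name: str
    summary: Optional[str] = None


def parse_jcl_steps(jcl_source: str) -> List[Step]:
    # Toy stand-in for the traditional parser; the real one does much more.
    return [
        Step(name=m.group(1))
        for m in re.finditer(r"^//(\S+)\s+EXEC\b", jcl_source, re.MULTILINE)
    ]


def parse_jcl(jcl_source: str) -> List[Step]:
    steps = parse_jcl_steps(jcl_source)               # traditional: complete
    summaries = parse_jcl_step_summaries(jcl_source)  # AI: fuzzy summaries
    by_name = {s.step_name: s.summary for s in summaries}
    for step in steps:
        step.summary = by_name.get(step.name)
    return steps
```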
Super clean. This is the AI sandwich. You almost don’t realize AI is involved.
We first tried doing it all with AI, and it almost worked, but not quite. Sometimes it would leave a step out, or identify only two of a step’s three dependencies. The worst part was that we had no way of knowing that information was missing, so we couldn’t trust our output. Fine-tuning might have helped, but we still couldn’t be 100% sure. By using traditional techniques to parse the steps, we guarantee that we get everything, and then enhance the results with the AI summaries.
Even so, the nondeterministic nature of AI still caused a difficult-to-catch bug: the traditional and AI passes sometimes identified different numbers of steps, leaving the merge phase mismatched or incomplete. Luckily, the tools described above also offer a clean way to solve this problem: add a Pydantic validator with context to a StepSummaries model and return that instead of List[StepSummary]:
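A Pydantic v2 sketch of the idea; the expected_step_count context key is an invented name, and exactly how context is threaded through a given tool’s retry loop varies by library:

```python
from typing import List

from pydantic import BaseModel, ValidationInfo, model_validator


class StepSummaries(BaseModel):
    summaries: List[StepSummary]

    @model_validator(mode="after")
    def check_step_count(self, info: ValidationInfo) -> "StepSummaries":
        # The caller supplies the count from the traditional parse as context.
        expected = (info.context or {}).get("expected_step_count")
        if expected is not None and len(self.summaries) != expected:
            raise ValueError(
                f"expected {expected} step summaries, got {len(self.summaries)}"
            )
        return self
```

The context is supplied at validation time, e.g. StepSummaries.model_validate(raw, context={"expected_step_count": len(steps)}), and a validation failure here is what triggers the retry, with the error message fed back to the LLM.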
Now if the AI misses steps it will try again, and the code remains clean, simple and declarative.
The strength of AI comes from using it in exactly the right places to elevate traditional techniques beyond what would otherwise be possible. My rule of thumb for when to use AI: if traditional techniques can do the job deterministically, use them; reach for AI only for the fuzzy parts they can’t handle, and validate its output before trusting it.
With these tools and techniques, writing good AI code is the same as writing good code.