💭 seems kind of simple, but: input -> very cheap llm/query system that recursively defers to a more expensive llm/query system -> output seems good. It's like rag with more knobs; under the right parameters it could minimize cost while still ensuring good answers over large contexts. Not sure what shapes the recursive querying could take, or under what conditions deferring upward would actually be needed, and there's a built-in tradeoff: the cheaper systems are also worse at knowing when to defer.
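
a minimal sketch of what that defer chain might look like, assuming each tier can attach some confidence score to its answer (logprobs, self-rating, a verifier, whatever). everything here — `Tier`, `ask`, `cascade`, the 0.8 threshold — is a made-up name/value for illustration, not a real API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    """One rung of the cascade: a model/query system plus its relative cost.

    `ask` is a stand-in for a real LLM call; it returns (answer, confidence),
    where confidence is some self-estimate in [0, 1]. Note the weak link: the
    confidence comes from the same cheap tier whose judgment we distrust.
    """
    name: str
    cost: float
    ask: Callable[[str], tuple[str, float]]

def cascade(
    query: str,
    tiers: list[Tier],
    threshold: float = 0.8,  # arbitrary; could be per-tier
    spent: float = 0.0,
) -> tuple[str, float]:
    """Try the cheapest tier first; recursively defer upward when unsure.

    Returns (answer, total cost so far). The last tier's answer is accepted
    unconditionally since there is nothing more expensive left to defer to.
    """
    tier, *rest = tiers
    answer, confidence = tier.ask(query)
    spent += tier.cost
    if confidence >= threshold or not rest:
        return answer, spent
    return cascade(query, rest, threshold, spent)

if __name__ == "__main__":
    # stub tiers: the cheap one is unsure (0.4 < 0.8), so it defers upward
    cheap = Tier("cheap", cost=1.0, ask=lambda q: ("maybe 42", 0.4))
    expensive = Tier("expensive", cost=20.0, ask=lambda q: ("42", 0.95))
    print(cascade("what is the answer?", [cheap, expensive]))  # ('42', 21.0)
```

the knobs would be the threshold(s), the tier pricing, and whatever the confidence signal is — and that last one is exactly where the "cheap models don't know when to defer" tradeoff bites.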