💭 Anchorhead requires a strong LLM. It's not an easy game. Testing the long-term memory requires long sessions. Long sessions * strong LLM = too much money! I don't believe there is an adequate alternative testbed that doesn't risk being a bad proxy and wasting my time (though maybe my standards are too high). This indicates I should give up on trying to test the system automatically and just dogfood it instead (which is unfortunately demoralizing when it fails partway into something hopeful).