💭 Should benchmark how well LLMs handle basic natural-language tasks, to see whether performance and cost are good enough. E.g., could you give a cheap LLM a query plus an indexed list of sentences and get back a list of the indices of the relevant sentences?
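A rough sketch of what one benchmark case for that sentence-index task might look like. Everything here is an assumption for illustration: the prompt wording, the JSON-list reply format, and the helper names (`build_prompt`, `parse_indices`, `score`, `run_case`) are all hypothetical, and `llm` is just any callable that takes a prompt string and returns the model's reply, so no particular provider API is implied.

```python
import json
import re
from typing import Callable, List, Set, Tuple


def build_prompt(query: str, sentences: List[str]) -> str:
    """Number the sentences and ask for the relevant indices as a JSON list."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences))
    return (
        "Return a JSON list of the indices of the sentences relevant to the query.\n"
        f"Query: {query}\nSentences:\n{numbered}\nAnswer:"
    )


def parse_indices(reply: str, n: int) -> Set[int]:
    """Pull the first flat [...] list out of the reply; drop anything out of range."""
    match = re.search(r"\[[^\[\]]*\]", reply)
    if not match:
        return set()
    try:
        values = json.loads(match.group(0))
    except json.JSONDecodeError:
        return set()
    return {
        v for v in values
        if isinstance(v, int) and not isinstance(v, bool) and 0 <= v < n
    }


def score(predicted: Set[int], expected: Set[int]) -> Tuple[float, float]:
    """Precision and recall of the predicted index set against hand labels."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


def run_case(
    llm: Callable[[str], str],
    query: str,
    sentences: List[str],
    expected: Set[int],
) -> Tuple[float, float]:
    """Run one labeled case through the model and score the result."""
    reply = llm(build_prompt(query, sentences))
    return score(parse_indices(reply, len(sentences)), expected)


if __name__ == "__main__":
    # Stub standing in for a real model call (assumption: swap in a real client).
    def fake_llm(prompt: str) -> str:
        return "Relevant sentences: [0, 2]"

    sentences = [
        "The cache is invalidated on every write.",
        "Lunch is at noon.",
        "Reads hit the cache first.",
    ]
    print(run_case(fake_llm, "How does caching work?", sentences, {0, 2}))
```

Averaging precision and recall over a set of hand-labeled cases would cover the performance side. The cost side would need per-call token counts or latency from whichever provider is used, which this sketch leaves out.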