💭 are LLM tokens case sensitive (yes?), and if so, could you make a model dramatically cheaper to train by lowercasing all training data and input prompts? Imagine you're just working with plaintext
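For what it's worth, yes: common subword tokenizers (BPE, WordPiece) keep case, so "The" and "the" map to different token IDs, and lowercasing does shrink the effective vocabulary. A minimal sketch of that effect, using whitespace-split words as a stand-in for a real tokenizer (the sample text is made up for illustration):

```python
from collections import Counter

# Toy illustration, NOT a real BPE tokenizer: treat whitespace-split
# words as "tokens" and compare vocabulary sizes with and without
# case folding.
text = "The cat sat. the cat sat. THE CAT SAT."

cased_vocab = set(text.split())          # case kept, as real tokenizers do
lowered_vocab = set(text.lower().split())  # after case folding

print(len(cased_vocab))    # → 7 distinct "tokens" with case kept
print(len(lowered_vocab))  # → 3 after lowercasing
```

A smaller vocabulary means a smaller embedding table and softmax, but that's usually a minor slice of total training compute, and the model loses the ability to represent casing at all (names, acronyms, code), so the savings are far from "dramatic" in practice.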