💠when learning something technical from an llm, good to always assume the llm is wrong and bullshitting you and treat it like a puzzle to find the gaps and mistakes in its reasoning
💠when a project is clearly a fool's errand, think carefully before doing it anyway
💠I promise not to use it for evil, but I think I have a system in mind that could make an llm optimally impersonate me, at least for short texts with little context
💠I remember struggling with how to record memory about temporary state such that it wouldn't become confusing later when the state became untrue. Maybe such state could be defining and setting a variable (eg location), though you'd still need to avoid reading outdated data. Could try to trigger an update as soon as it becomes untrue somehow, but that sounds very heavyhanded. Could try to standardize what kinds of state are tracked, so next time a location is noted it updates the value (rather than the old memory needing to be marked no longer true). Could also treat state memories that haven't been set for a while as suspect and ask for them to be confirmed, which will hopefully happen while the value change is still in context so it knows the answer.
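A minimal sketch of the variable idea, all names hypothetical: one value per state key with a set-time, so new notes overwrite old ones instead of accumulating, and long-unconfirmed values get flagged.

```python
import time

STALE_AFTER = 7 * 24 * 3600  # assumption: unconfirmed for a week = suspect

state = {}  # key -> (value, last_set_timestamp)

def set_state(key, value):
    # overwrites, so an old "location" never needs to be marked untrue
    state[key] = (value, time.time())

def get_state(key):
    value, ts = state[key]
    fresh = time.time() - ts < STALE_AFTER
    return value, "fresh" if fresh else "stale, ask to confirm"

set_state("location", "the library")
print(get_state("location"))  # ('the library', 'fresh')
```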
💠I feel irrationally like generating games from detailed rules descriptions shouldn't be terribly hard. I think it is, but I have some intuition that that's wrong
💠it looks like making a statistical model on eg dominion card attributes is actually hard, at least if you want it to be explainable. And if it's not explainable it's going to be very hard for it to give meaningful feedback or iteratively improve on
💠you could recreate discord nitro with a quick browser extension
💠I think tragic characters might be easier to write long plots for, since if a character fixes their flaw it's now time for them to get off screen, while if they tragically fail to fix it, that failure can keep demonstrating itself in different ways and escalating. Though I suppose it's important it doesn't feel like the writing made a promise it didn't keep
💠it'd be nice if paper notebooks had a better way of reading pages facing the same direction, like both sides of the same sheet. feels pretty stupid taking a picture of a page to read it.
💠2am startling revelation that a pineapple does not resemble an apple and its tree doesn't resemble a pine
💠if you can cheaply and effectively score responses does automatic prompt optimization pretty much solve the problem?
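A sketch of what I mean, assuming you already have that cheap score() plus an llm_answer() call and an llm-powered generate_variants() mutator (all hypothetical): simple hill climbing over prompt variants.

```python
def optimize_prompt(seed_prompt, score, llm_answer, generate_variants, rounds=10):
    # keep the best-scoring prompt found so far; variants come from an llm mutator
    best, best_score = seed_prompt, score(llm_answer(seed_prompt))
    for _ in range(rounds):
        for candidate in generate_variants(best):
            s = score(llm_answer(candidate))
            if s > best_score:
                best, best_score = candidate, s
    return best, best_score
```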
💠trying to get good llm rpg players sounds like a fool's errand given how many massive sub problems it has
💠hard to tell someone about my interests without it sounding like a string of boring words
💠I enjoy having notebooks full of hand-drawn diagrams and things crossed out
💠I enjoy knowing words spellcheck doesn't know
💠llm powered scribblenautsy cheatcode typing game
💠when you make a social blunder, mark your calendar so that you can take an annual moment of silence to recognize it
💠technically that could be done by parsing into json, using some form of embedding and basic ml, but that wouldnt be able to give any interpretable feedback. Could auto/llm generate interpretable input measures and use those instead. Maybe some model where you can predict each input from all the other inputs and directly see how in-distribution each input or combination of inputs is
💠a sort of stylometric anomaly detection function to try to discern which content is official / popular. Self improving like a programmatic gan, and able to give feedback
💠llms are also poor at recognizing good ideas in a sea of poor ideas. It's possible human feedback per generation may be needed for quality assurance, which rules out ideas in which the new generated content is meant to be surprising or generated mid game unseen.
💠Common knowledge at least to me specifically that if you ask an llm to make content for a game it'll do a trash job, which is no good because I want unlimited content.
How to solve?
You'd teach a human by having them read a guide on good and bad design principles for the game, get them familiar with common bad or overused ideas, and then give them discerning user feedback. Llms aren't well suited since that's a large content dump, and they don't learn from past feedback without context bloat or tuning I'd love to avoid. They'd learn somewhat from examples but it trades off coverage with context bloat.
Llms have an additional problem of clustered sampling where ten parallel prompts will lead to roughly the same ideas each time, and thus meaningful seeding is required to force originality.
💠soldier launcher: You and enemies launched by kb from this have 30% less gravity. While rmb is held, changes to +30% gravity. Some downside that doesn't affect kb at all.
Synergy with airshots as enemies are floaty, better rollouts, possibly makes mantread kills doable, probably not good with gardener unless rmb works even while not held
💠train an ai to read my blog
💠Note for predictive image generation that I should try making a custom conditioning node in comfy, being the earliest point to skip text prompting.
💠amazon's dumb little website, aws
💠What if you don't discard action vectors, but rather than trying to predict the set of actions (may be many, complicated to predict and compute loss on, many may be poor actions) we try to predict the top n highest visit count actions, maybe 3. Going to have to fill a lot of notepaper before coding anything or I'll waste more time
💠I keep redesigning muzero and then realizing my new plan doesn't actually make sense. I'd just been confident that inner nodes could make a single sample of successor states to MCTS over, but that highly favors high variance states, because if you don't make stochasticity an additional after-choosing-action step then you can just choose the actions where you've deterministically decided the luck was good? Edit: no, the sampler would learn to produce 'average' states, which don't represent any actual states, like drawing an average of all possible card vectors. I assume that such loss of fidelity would limit the usefulness of inner node exploration?
My first approach was least magical so I should take that as a baseline
Every node, root and inner, uses input action vectors, and the state vector + action makes a sampler for the next child. Root node also has a sampler to go from observed to actual state, which I'm pretty sure I don't need for inner nodes. Raises the difficult question of how to get the action inputs for the inner states since you don't have game rules to generate them, which I did with some token generation. Measuring the loss of generated actions against the actual actions in that state in the game history was expensive and a hard thing to train, though maybe I just did it badly.
So the idea was to do away with that and not need actual action vectors in the inner nodes, just generate successors directly, with dummy actions leading to them. But for muzero to work it needs to be able to choose between actions. If each action has one successor it assumes no stochasticity, the problem given in the first paragraph. But if the successor is sampled without taking an action, how would you even differentiate the different edge samplers? Can't sample a sampler, I think, partly because all you have to train on is the actual successor state.
💠Do I need:
1. observed root node samples latent for actual root nodes (no hidden info), then actual root nodes + action sample successor (due to stochasticity)
2. (observed root node itself + action)s sample successors (one sampler handling hidden info and stochasticity)
3. observed root node samples latent for actual root nodes with a 'fixed random seed' (sampling hidden info and stochasticity) and actions from that point have fixed outcomes
Would they give the same result?
💠note that I should try again automatically generating Dominion or wiz war tts cards from text or better modeling generating cards from nothing
💠1. have ai generate the image you want (boo hiss) 2. reverse image search to find the most similar non generated image. No Ai art is used
💠great job young me for identifying programming as a rewarding activity to pursue. I strongly suspect this is more engaging than the median option I could have considered
💠there are so many words that mean 'a tensor output, with a little added context': embedding, encoding, latent, hidden
💠I'd like to see someone try to play multiple games at a high level simultaneously, maybe using foot pedals and peripherals to route inputs
💠llm tokens are case sensitive (yes?), so could you make a model dramatically cheaper to train by lowercasing all training data and input prompts? Imagine you're just working with plaintext
💠Blokus would be badly represented with tabular data. It may be an outlier but it shows a lack of universality
💠I just can't think of any simple game that would require graph relationship between entity tokens. I'm pretty confident spirit island and probably mtg would badly want that but anything reasonable to code could do fine without it. I can't properly validate the idea without an environment that requires it.
💠- use debugger to understand how swarmui is generating images and at what point a partial result could be embedded to skip prompts and go straight to vectors.
💠if you provide 99% uptime that means you can just turn your servers off for about 7 hours per month (1% of ~730 hours in a month ≈ 7.3)
💠I think any mtg rules engine could be modified to make real time mtg, similar to real time chess. I imagine there are other opportunities to add stress and chaos to normally turn based games
💠Inner muzero nodes would have a representation vector and a next state vae, with no information about what actions reach those states. Value can be derived from the representation vector
💠I'm thinking I could just progressive widen likely following muzero states rather than generate actions. I don't know if I'll have time to try that before end of December
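The usual progressive widening rule, sketched on a hypothetical node type: only sample a new successor from the learned dynamics model once visits justify it, capping children at k·N^alpha.

```python
K, ALPHA = 1.0, 0.5  # standard widening hyperparameters, values are guesses

def maybe_widen(node, sample_successor):
    # child count grows roughly with the square root of node.visit_count
    if len(node.children) < K * node.visit_count ** ALPHA:
        node.children.append(sample_successor(node.state))
```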
💠I suspect there's a good general approach to generating inputs that produce high rated outputs via a system, eg automatically getting image generator prompts trading between information gain and expected rating
💠Pretty concerned I'll end up accomplishing ~nothing novel on the board game ai front. I think so far none of my innovations have been demonstrated to be a good idea, and many parts have been shelved for scope.
Currently working on:
Current status for my implementations:
💠seems also at odds, making the story satisfactory at variable lengths, and making it have a cohesive full story arc. Difference between having a big story circle with smaller circles inside (full arc), vs chaining circles (episodic arcs). Episodic arcs feel tacked on.
💠I'm guessing arc focus and easy extensibility are at odds, given arc focus wants to pull things to a close and wants events on the way to tie in heavily. There's only so much you can reasonably do to support one arc, at which point new content must just be filler. Filler can be fine but is subtracting from the arc.
💠pretty sure some of my favorite books mostly ignore character arcs to focus on problem solving. Sorta like Death Note
💠related, need to draw out major character arcs so they finish as close to the final ending as possible, since fully developed characters quickly get dull and need to be taken off screen. That seems stupid. Though at least a longer transformation is more likely to feel natural than a sudden Disney character shift when the plot demands it
💠how do you make a fiction feel done but also extendable? I think typical answer is either making it episodic and neatly solving each episode or inventing new arcs for each season which tends to feel awfully forced.
💠Sweet beans are made of these ^
💠more like scamazon slime
💠you can of course recompute the state with the next set of action components, which reduces the problem to just being "some actions will take two or three times longer to process due to repeated state computation"? And somehow it'll need to handle inputs representing partial actions.
💠I realize that compound multiplicative action spaces kill actions as input tokens dead which probably also kills general muzero dead which invalidates a lot of my work. I have a notion that of course it must be possible somehow, but I think that's not actually true.
💠true punk is replying to donotreply emails
💠I learned if you use cross entropy on a smooth target you should calculate and subtract the base entropy from the loss if you plan on adding it to other loss terms. 🎈
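The reason: cross entropy against a smooth target p bottoms out at H(p), not 0, and CE − H(p) = KL(p‖q), which is 0 at a perfect fit and so safe to sum with other terms. A minimal pytorch sketch:

```python
import torch
import torch.nn.functional as F

def soft_ce_as_kl(logits, target_probs):
    log_q = F.log_softmax(logits, dim=-1)
    ce = -(target_probs * log_q).sum(-1)  # H(p) + KL(p||q)
    # base entropy H(p); clamp avoids log(0) on hard zeros in the target
    h_p = -(target_probs * target_probs.clamp_min(1e-12).log()).sum(-1)
    return (ce - h_p).mean()  # KL(p||q), reaches 0 when logits match the target
```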
💠Hmm what if the llm output a predicted rlhf score for its response to indicate 'confidence this would get upvoted'? I expect it wouldnt be super useful due to limitations of human granted thumbs but it might be better than nothing, and if you somehow had objectively perfect rlhf it'd be a solution?
💠idk how to categorize this error, but errors like: the llm says it's necessary to do a tournament step after each round of training alphazero to ensure the training has led to actual improvement
It sounds wise. It can come up with arguments that use words like "collapse," and talk about problems of proxy measurements. It's not true though, since MCTS provably converges to optimal strategy and you're learning to predict MCTS. I could imagine it's possible to fall into some collapse state but it'd be a wild fluke, and neither alphazero nor muzero do tournament checks (though alphago did).
Still, I guess because the tournament is an associated piece of jargon it adds it to the plan, and then having done so it keeps acting smart and backing itself up. You can ask "Why is it necessary to run a tournament check..." and "Why isn't it necessary..." in fresh chats and it'll fart out an incorrect but wise looking answer either way. Obviously hallucination exists, but I'm not sure how to deal with it in these sorts of situations besides already knowing the answer.
💠could probably use llm question generation + information theory wrapper to make a better akinator 20 questions game. I wonder if similar thinking could work for other problems.
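Sketch of the wrapper, with answers[q][entity] standing in for llm-provided yes/no data (hypothetical): keep a belief over candidate entities and ask the question minimizing expected posterior entropy, i.e. maximizing expected information gain.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def best_question(questions, answers, belief):  # belief: entity -> probability
    def expected_entropy(q):
        total = 0.0
        for side in (True, False):
            mass = sum(p for e, p in belief.items() if answers[q][e] == side)
            if mass > 0:
                # entropy of the belief renormalized onto this answer branch
                total += mass * entropy([p / mass for e, p in belief.items()
                                         if answers[q][e] == side])
        return total
    return min(questions, key=expected_entropy)
```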
💠even microblogging to the void about writing feels like cringey faking, which I should also fix.
💠I'm not hogwild about writing but I want to have broader capabilities than I do, so I'll hard focus a different skill every two months or so and try to break through the skill floors. I have a problem where I get stuck extremely easily: I think of a question I need to answer, I fail to think of a satisfactory answer in a few minutes of brainstorming, and I switch tasks to passive consumption. I've got to kill that somehow for this to work. Debugging and planning programming tasks feel more like following a trail, while these problems feel like trying to generate an answer from nothing.
💠I don't think there are any serious sota breaks in recent llm models, at least for my practical usage. They're good and bad at roughly the same things to roughly the same degree, and the differences I can notice are mostly standard variance I think.
💠comparing gpt5, 2.5pro, and 3pro on a very technical research question I'm pretty familiar with at this point (how to generate candidate actions for muzero in environments with dynamic action spaces). They all do quite badly.
2.5 pro followed instructions better, gave the most likely to be useful solution (use a vae to generate a fixed number of actions, which could make sense if we also predict the number of actions to generate), and noted one of the errors it made earlier, framing it as a "downside" of the suggestion.
3 pro for some reason thought that any action could be represented as a pair of two entities and built everything off of that. It redesigned muzero to support that in a way that'd be extremely expensive to compute and didn't mention that.
gpt5 was similar to 3, answered a lot more than asked for, and gave a solution with poor scaling, seemed to strategically not look for problems with the solution.
None of them came up with "just generate action vectors until you get a stop token" which seems like the obvious baseline, and all came up with answers with more buzzwords.
💠I think a character being likable comes from the reader (almost said user) being able to empathize with them, rather than anything that normally makes a person likable. Possibly also secondhand liking them on the MC's behalf.
💠now that I've solved self play data generation I get to discover that the training pipeline is too slow.
💠might be a worthwhile exercise to formulate value statements and try to find / llm generate maximally strong steelman arguments against them. Maybe only if the value isn't improving your life satisfaction
💠reword: I wonder if they feel productive knowing that work is being done while they're idle as I do
💠I wonder if managers/employers feel productive by telling people to do things in the same way I feel productive by telling computers to do things
💠I'm guessing the answer is technically yes but practically not without weight access
💠I forget the technique which finds an input which maximizes a certain classification in a model. Could the same thing be used to find a system prompt which maximizes the rlhf scores of data, thereby avoiding finetune for rlhf?
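If you did have weight access, a sketch of what I'm imagining, prompt-tuning style: treat the system prompt as trainable embeddings and run gradient ascent against a frozen reward model. reward_model and base_embeddings here are hypothetical stand-ins, not a real API.

```python
import torch

def optimize_soft_prompt(reward_model, base_embeddings, prompt_len=20, steps=200):
    d = base_embeddings.size(-1)
    soft_prompt = torch.randn(prompt_len, d, requires_grad=True)
    opt = torch.optim.Adam([soft_prompt], lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        score = reward_model(torch.cat([soft_prompt, base_embeddings], dim=0))
        (-score).backward()  # ascend the rlhf score
        opt.step()
    return soft_prompt.detach()  # a "prompt" that exists only in embedding space
```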
💠I suspect we have the technology to randomly generate pretty good board games from nothing
💠okay, if something can be decomposed into data and structure then it can be pattern analyzed statistically and therefore generated
💠so whats the lowest fidelity way to make a class based fps like tf2 or overwatch that feels good? Can 3d modeling be avoided somehow? Sprite based animations might be actually worse though particularly if supporting many angles / above and below. Is there some unexpected way to get silhouette and character and aesthetic almost for free?
💠discrete outputs are a truer representation of a discrete game though, it'd just somehow need to learn an absurdly large codebook. One item per state is stupid. Maybe it could predict multiple separate discrete outputs and combine them somehow. Technically it could predict a variable size state representation like a series of tokens but that sounds too expensive.
💠Stochastic muzero uses vq-vae with a discrete set of outputs rather than regular vae with a continuous output. Maybe that makes sense for games with smaller state spaces but doesn't make sense for eg mtg? Unsure, but reducing the expressivity of the state space sounds like a really bad idea when it's supposed to be able to express basically any imaginable gamestate
💠landmark in that I've finally got selfplay distributed to cloud compute, so I should be able to get self play games far faster and have cpu to spare for other things
💠llmy techniques could probably catch higher level stylometric patterns which could be converted into cheaper to compute patterns. I suspect that's more interpretable but not more powerful for author prediction, which is the common usecase
💠automatic feedback via stylometric analysis? Doesn't exactly give advice but maybe suggests where to find problems. I wonder if stylometrics for music are established. Every time I've tried to compress music to stats and patterns I've quickly found a brick wall given how many ways those patterns could be expressed and obfuscated.
💠part of why I do mostly programming is that I'm obviously comparatively crap at things I haven't spent a decade on. Making crap isn't bad, but it's not terribly satisfying, and I think there's a draw to hide it away due to its garbage status, which is probably counterproductive.
💠pretty sure for some games, efficient symmetry handling is mandatory, like the 100 coppers example. Can manually change the legal actions to avoid obvious duplicate actions, which might be good enough :/
💠had an llm read my blog and tag things and it classified an alarming number of posts as shitposts
💠symmetry handling is a pain. A game needs to define in which ways its zones are symmetrical, eg chess is symmetrical only horizontally.
I'm pretty confident there's no way to avoid actually transforming the state into each symmetrical form and checking for equality, which is a lot of overhead to be doing at each step.
Games with multiple zones need to handle their symmetries separately lest the number of possible symmetries scale multiplicatively (eg you have a board which is symmetrical by any reflection+rotation and each player has an unordered set of piles of tokens which the game refers to by index)
You don't want to actually transform the board to its canonical form, so even if you calculate the canonical hash and go to the already created node studying that position, you need to be able to produce the original board. Similarly, the legal moves from the canon position are transformed versions of the actual legal moves. D:
It's preferable for the AI to just not consider moves that lead to symmetrical positions (though I think it may be isomorphic to perfectly handling symmetry in-tree) but I'm pretty sure you can't automatically know which moves will lead to symmetrical positions prior to simulation. My existing works have manually avoided symmetrical moves in cases where it's easy to program, but that's added game scripter responsibility.
In some circumstances symmetry handling is hugely important (AI has 100 coppers in their hand and is trying to figure out what order to play them) while in other cases symmetry will virtually never occur (chess, I think) and it's potentially a pretty expensive and complicating system.
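The canonicalization I keep describing, as a sketch (transforms and hash_state are whatever the game supplies; all names hypothetical): hash every symmetric form and key nodes by the smallest, while move generation still uses the untransformed state.

```python
def canonical_key(state, transforms, hash_state):
    # the identity transform should be included in transforms
    return min(hash_state(t(state)) for t in transforms)

# usage sketch: nodes[canonical_key(board, all_symmetries, zobrist_hash)] holds
# the visits/values shared across symmetric positions; legal moves still come
# from `board` itself and need the matching transform applied to line up.
```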
💠todo: make a contact page that's got a blender animated animal-crossing-Rover (but with my head?) doing a dialogue tree to get context for their message
💠unbounded cache size caused times to get much worse, which doesn't make much sense unless I presume the slowness was due to using up all the system resources
💠also gotta start running my compute heavy jobs on cloud because this was not efficient.
💠should implement the following separately, maybe even in different branches:
Feel like I'm missing one. For each I think I need to get it working for alphazero OR muzero or both. I'm not convinced it's always worthwhile to implement everything in alphazero given it adds expensive overhead, and muzero magics away a lot of problems with its internal representation.
💠I should find something more useful to do with all these orphan thought posts. Maybe getting a weekly email with random past posts or getting an llm to auto tag or something.
💠going to try putting an end of December "deadline" on the ai work after which I have to work on something else in the hopes it makes me plan better.
💠at some point I need to make an actual blog to go with the microblog since I do spend time working on weird stuff and what I've learned is potentially useful to the right person. Also feels wrong not to have a blog because all the cool kids have one
💠if a small beautiful boy without hands was difficult to take care of, they'd be a handsome handless handful
💠selfplay is taking up 14 gigabytes of ram. This may explain some problems I've noticed
💠magic item: a clock that shows the time ten minutes into the future
💠is there any way to make a long llm chat with text message length messages cost effective? I think even the most generous caching wouldn't help
💠if you tell a cpu to play board games against itself 24/7 for weeks it gets all sweaty
💠how to efficiently get an ml model understanding a game state to be able to work with deck information. Traditionally you write a (potentially very complex) simulator handling the hidden info, but that doesn't scale.
If you just have a shuffled deck, yes you can simulate random draws. But even simple manipulations like "I put card A 3 cards from the top" or "I shuffled the discard pile of these cards" make a simulator complex to write. If the opponent mulligans some cards, neither their hand nor deck is now random, and the distribution of states describing them is now very expensive to compute.
So maybe you don't do that and you just use ml state prediction, but how do you make a model that'll be good at that?
💠I need me a taxonomy of board game mechanics at the smallest level. In mtg "target creature has +3/+3 until end of turn" makes sense but "target player has +3 health until end of turn" doesn't. They're different types of numbers or they have different properties? How many are there? What are the absolute constraints of when x vs while x? Can any programmed system have truly arbitrary event handling?
💠I think there's usually some tension between "do I make big project X" or "do I make bigger project Y that'll make projects like X much easier in future", at least for me. Y seems obviously higher impact but also more work before the initial reward. I don't like not automating things.
💠I realize a drawback to using self play data with non symmetrical models (stronger vs weaker model). In addition to dilution and stuff, it may learn to infer "if we made it to this state, we must be stupid, and therefore stupid actions are more likely" and learn to recommend them. It'd also learn that stupid positions are more likely to lead to losing states (more than is actually true) but that's not a big deal
💠you can automatically measure how easy to use a library/framework is by the success rate of getting llms to make test passing code with it
💠I think a sufficiently easy to use and familiarize with game engine plus llm code and test generation sounds like the path towards automatic board game coding. I want that.
💠recurring trend where I have some issue with a large established library (pandas, instructor) and eventually I rapidly reimplement the sliver of it that I need
💠a person cleaning plates in a restaurant dreaming of wealth is a dish washer dosh wisher
💠very important you load your model weights, not just init the model
💠tip: if you need to do expensive data processing, do it in the dataloader, not your forward function, like a goof.
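The shape of the fix, with expensive_featurize and rows as hypothetical stand-ins: the work moves into the Dataset, so DataLoader workers do it in parallel once per sample instead of it rerunning every forward.

```python
from torch.utils.data import Dataset, DataLoader

class FeaturizedDataset(Dataset):
    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        # runs in a worker process, not inside the training loop's forward()
        return expensive_featurize(self.rows[i])

loader = DataLoader(FeaturizedDataset(rows), batch_size=64, num_workers=4)
```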
💠imagine the utopia if young children were taught how to use vectorized operations instead of loops
💠Possibly nonsense. For some non-text problem, like predicting one system state from a history of past states: express the problem with new unique tokens, then tune an existing small llm on the prediction task. It'll learn to embed the new tokens. Optionally heavily ablate the tuned llm, making it as small as possible without hurting performance on your dataset.
Since llms are already able to work effectively broadly over many domains, the idea is to try to leverage existing circuitry.
💠if one could induce parts of their brain to sleep at a time they could remain mostly functional without sleeping, but would get a sort of split personality issue where at different times they have different memories and relative capabilities.
💠If life had mods the most downloaded would be difficulty reduction and qol like No Sleeping and Fast Travel Anywhere, and beneath that would be high effort content mods like Better Cooking and Pokémon, and beneath that a sea of lore unfriendly joke mods like Hatsune Miku Joe Biden Skin
💠llm generates mechanics/systems, generates orthogonally interesting entities that interact with each other and the systems, simulates things in an auto dwarf fortress way. Becomes a prompt based text adventure probably with a limited action vocab where the player needs to read the generated wiki to learn the environment.
💠game where the core mechanic is reading the wiki
💠oh shoot, an issue with my idea of using action tokens rather than dominion's separate action network: the Dominion method can handle hierarchical actions (choose card then choose target then choose x etc) iteratively without recomputing the big network, whereas I'd need to either make an action token for each step, which sounds complicated and lossy, or rerun the transformer encoder on the new tokens. Woe. Not sure if that idea is shot dead.
💠measured positioning in war games might be the only use of continuous variables in non-dextrous board games?
💠if everyone could assign stat points, what would the established norm be? Presumably the majority in the past would go into physical stats while now most would go into mental stats. If everyone's stats are visible maybe all in mental would be seen as necessary to be a serious employee, or maybe "wasted" stats would be a form of costly value signaling.
💠1. wake up 2. self report qualia 3. exhibit questionably goal driven behavior 4. appear coherent
💠seems kind of simple but input -> very cheap llm/query system deferring to more expensive llm/query system recursively -> output seems good. It's like a rag with more knobs that could hopefully under some parameters minimize cost while ensuring good answers over large contexts. Not sure what shapes the recursive querying could take or under what conditions defer upgrading would be needed, and there's a tradeoff where cheaper systems are worse at knowing when to defer
💠I'm not at all happy with my cool thing production output. I can blame work and I can blame ai dev being hard but that doesn't put me on track. I need to get more directed and intentional.
💠really good automated midi to roman numeral analysis would sure help understand music
💠essentially I'd like to be able to generate fictional wikis of cool sounding novel content - not vague samey uninspiring content.
💠some way to search the game mechanic space from a set of primitive actions to maximize novelty and leading to interesting choices. Not sure how to define any of that
💠is there a name for the phenomenon where you excitedly let something run overnight and then in the morning find it stopped almost immediately for one reason or another?
💠getting frisson from rereading the google c# style guide
💠hang glider horse
💠^could be a Papers, Please about finding security vulnerabilities in PRs actually
💠cozy indie game about resolving merge conflicts
💠normally you train a game ai to try to win quickly and lose slowly, in other words a loss now is worse than a loss later. Unfortunately that means if the ai finds an action which has no effect on the game except to make the action history longer, it'll learn to do that when losing
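Tiny illustration of why: with discounting, a guaranteed loss t steps away is worth -gamma^t, so any do-nothing action that pushes the loss later strictly raises the losing side's value.

```python
gamma = 0.99
for t in (1, 10, 100):
    print(t, -(gamma ** t))  # -0.99, ~-0.90, ~-0.37: stalling "improves" a lost game
```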
💠I wonder if you could train a sufficiently large cross game muzero such that it could more quickly learn new games by transferring shared concepts like resource management, game theory, value of information, etc.
💠key weakness of llms is trying to do everything in one message? Not "thinking step by step" enough. Eg I explain a very difficult problem I'm trying to work through, and it'll just pretend it's solving it, when a thoughtful reply would be acknowledging the difficulty, breaking down the problem, making connections to existing techniques, etc. If you ask an llm to write something long form it'll just start putting words down without any kind of outlining and with total blindness to how generic and sloppy the writing is. You could wrap any prompt in "make a plan breaking down the steps to do x" and forward those tasks to more llm calls, but how do you know when you're at a level of detail that can be acted on rather than one that needs more breakdown?
💠web dev framework for making early 2000s style sites
💠Sometimes without any context or prompting, my brain will produce something like "How blessed I am to live in the same universe as the classic SNES game 'Chrono Trigger'" Actually the mental non sequiturs are common enough that I'm curious what exactly my subconscious is up to, and if it has anything to do with my inability to maintain a train of thought.
💠a programmed function can generate in/out observations. An ml model can approximate the observations. Since we then have a mapping from program ast tokens to ml weights, we can generate training data to learn the inverse, to write code describing the behavior of an ml model?
💠Social deduction ai: Might be worthwhile to make a single highly powerful state solver rather than affordable full players. Could provide all chat history up to a point and have it try to predict the state likelihood. Would need to somehow script authentic observations to test it.
💠perhaps humans doing a play by post game could tolerate playing with an llm to test, but the games are long and I don't imagine most would want to risk it being ruined.
💠as for the larger problem of making llms actually play well, that's quite fun. I don't have a cheap solution, but I think listing and maintaining counterfactual states and having each state separately explored and evaluated might be effective. Will need to be able to create meaningful counterfactuals, combine them, rule them out, and get a global view of how to act given the whole information state. Unfortunately while I think this may be affordable for one player, I can't imagine it being cost effective to have 10 such agents play a long game together.
💠on natural many-participant llm chat (as opposed to long-form turn taking).
At a 10 person public discussion, turn taking fails terribly. A challenges D. B gives evidence against A. C talks about F. By the time A is allowed to contest B's comment, it's one of many simultaneous conversations, and now A must also comment on the other present conversations. Every message is overburdened and the chat history is full of such walls of text.
Two challenges to fix this. Cost is one since shorter messages likely means more total messages (though ideally not everyone needs to weigh in on every claim). I'm hoping a strong llm can make a plan / playbook which can be performed by a cheaper llm. I fully anticipate being disappointed.
The other is knowing when to speak, since of course an llm is not "hearing" the chat until it is prompted to respond and thus accruing cost. A cheap llm could represent each actor, though frankly I don't have much faith in cheap classifier llms and imagine by default everyone would want to comment on everything, and tuning it could be painful. Another option is an invisible moderator who chooses who should speak next, possibly with some kind of weighted queue. Each actor can list what subjects they might want to comment on, and the moderator could enqueue anyone directly mentioned or who appears relevant given their list. That still would necessarily lead to more opportunities to talk than actual messages (presuming most messages could potentially be responded to by multiple others) so I suppose I have to hope the cheap llms are cheap enough and can be tuned to shut up when they have nothing important or new to say.
💠The core design for white knuckle is great. Horror games are usually mechanically about sneaking/running, which are both fine but have their own problems. Sneaking is usually mechanically along the lines of red light green light where you become safe by halting your progress for a while (an unfair reduction). Running is less common and in most games is pretty much pressing shift+w. In either case, the mechanic being used to survive isn't challenging in itself, meaning it doesn't carry so much of a "I need to succeed" tension. They're scary in other ways.
White knuckle is pretty much 100% running based survival where the running (climbing) is technically complex and varied enough to be the entire game, with high enough cost of failure that the whole game is "I need to succeed" tension.
Tangent: since tension is usually tied to cost of failure, non-dextrous games like board games usually handle it by putting more emphasis on bad luck. Darkest dungeon where enemies can crit and you can permanently lose trained party members. Kingdom Death Monster where any damage can cause an injury roll that could explode your head. I find this to be a bit boring but I can't think of something better. Maybe make it a puzzle with a time limit.
💠Difficult social deduction game play has other problems besides high cost of course.
💠Previously mentioned problem: Long running llm task has different degrees of intelligence needed per step. Sometimes you need to plan/navigate, and other times you're trivially enacting the plan/walking the path. Using a strong llm at all times is needlessly expensive and using a cheap llm performs poorly at planning. This ties into other long term agency systems but isn't directly entangled I think.
Given a new situation, use the strong llm. Strong llm produces a plan and takes the first step. Future responses use the weak llm until
If this works well it'll be exciting for things like the text adventure project and social deduction game play given those fell apart partly under the cost of needing strong llms over long terms. If success can be quantified, could even statistically tune some parameters.
💠if you had jackbox with 4 humans and 4 llms it might be interesting to see which models do best. Might be fun to anonymize everyone
💠not actually using pokemon because they're not meaningfully distinct from each other with their shared move pool and minimal mechanical space, but every other part of pokemon with collection and countering and powering up over time. Say, via an autochess store or pack drafting. Then need a way to avoid developing a quick meta where you use the same team every time, maybe with injuries benching units or something like in deck builders where your combo may not be available in the random hand you draw (though I'd like to avoid bad options diluting the pool rather than just adding to it if possible since I think it discourages experimentation). I think given interesting enough units the autobattle part will be interesting. The games of tttf I solo playtested were usually pretty hype (though I expect an ai to play far more predictably).
The pokemon esque powering up can be done cheaply with add on systems like attaching upgrades. Not sure about rarity unless I start designing units specifically for imbalanced play
💠only just saw setting a deadline by how much time the project is worth rather than "when we need it by" or "how long it'll probably take". If it appears it won't be done by the worthwhile point you stop and do something else
💠if you take the muzero states with the worst next state prediction I wonder if that could reveal interesting gameplay? Probably not, it'd just show high uncertainty states like drawing from an mtg deck, but there might be something there
💠when you ask a child what they want to be when they're grown up they never say a corrupt executive
💠on how to make a team building game like pokemon or autochess. Assume collecting is good. Variable teams is good. Manually directing teams is bad (I don't like mashing a in jrpgs). Strange synergies and counters are good due to personal interest. Possibly finding rare things is good.
💠not sure if I already wrote this: obviously being able to make ai for very complex games is good, but any improvement over existing game ais is also good. Imagine you have a standard simple rules based ai. Rather than being a total flowchart, it could reduce the action space to 4-12ish possible moves, maybe combining and abstracting actions, like "move to and attack nearest [unit]". The transformer based state reduction I'm working with would pretty efficiently be able to score those options, and it could maybe seem a lot smarter than should otherwise be possible. You could do only a single state encoding to choose all the ai controlled characters' choices if they don't have separate hidden info, which should be very performant
💠addendum, better to have an llm guess and check tuning the input to the desired behavior rather than using rl
💠Imagine you have a game like pikmin. Imagine you already have a strong hierarchical long term agency system, so an ai can remember its long term goals and avoid rabbitholing. Suppose we have a very fast improved image understanding model that can consistently give object recognition coords on arbitrary objects with low latency. Possibly using save states for training, imagine an llm makes a command (rotate_cw_90, withdraw_pikmin) and an input generator ml can train based on llm feedback until it can reliably perform the command
💠note that rivals could probably run test code on a special combination input or when going to a special test stage. Could maybe do tdd.
💠apparently image generators can do sprite sheets now which might make making rivals characters less awful for me if it works well
💠not sure which direction is causal but me being productive on projects correlates with me making more thought posts. I expect it goes both ways, because a readiness to write down more thoughts forces me to actually consider things enough to have something worth writing
💠really need something like an llm that can understand structured/grid data better, like solve a maze. It'd be really nice to combine rl with llm high level direction. Combine 'able to learn to play well' and 'have some common sense of what the objective is'
💠not sure if I said this yesterday -> can ml produce effective and performant rules based ai? If you have a discrete action space, could you do some kind of progressive simplification of the model into a sufficiently simple rules flowchart?
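One concrete version of that simplification, sketched with featurize, policy_model, and sampled_states as hypothetical stand-ins: label lots of states with the big model's chosen action and fit a shallow decision tree, which reads out directly as a flowchart.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [featurize(s) for s in sampled_states]
y = [policy_model.best_action(s) for s in sampled_states]

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)  # depth caps rule complexity
print(export_text(tree))  # a human-readable if/else flowchart
```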
💠I thought my rivals assistant was largely unknown, but I also have an automatic git backup of many rivals mods, and searching all their source codes at once shows lots of notable mods/modders are using it. Neat. Looks like nearly everyone is just using the sprite exporting part, but that's okay.
💠multi stage takeoff, first with mcts then alphazero then muzero, in order of how strong they are untrained
💠even in reasonably sized action spaces it still might be useful to choose actions to simulate based on how different the expected avg resulting state would be from other actions, thus exploring the broadest range of possibilities rather than seeing mostly the same result each time. Idk how to formalize that.
💠alphazero specifically could get more training data by using in search tree states as rows so long as they're thoroughly enough explored. Could actually weight how strong the training data is as evidence though I've never heard of that being done
💠is it maybe useful for muzero to act like it's in a continuous space and raise the level of abstraction arbitrarily past the game system's definition of action? Eg if you had a move of 1 on a grid you want to check each destination as a sim fork. But if you scale down the grid size to be a tiny mesh and give you a speed of a billion you'd no longer want to think of every possible movement as a possible action (unless you have some fantastic filtering process). More likely you'd want to sort of cluster them like "this area is in cover and near the door" or "this area doesn't rely on moving past the window" and sample from each cluster, or otherwise sample evenly over the action space, building up a pattern of what leads to a good sample
💠ml tuned rules based ai? Dramatically reduce action space by combining and defaulting choices. Get a few attributes of the game state to predict value from for a simpler weaker model if possible. Maybe do without hidden info and rng handling for performance. I can't think of a way to avoid sim overhead without learning a simulation model so hopefully it's enough to just do few sims.
💠I wonder if there's some theoretical way to universally make a desktop operating system mobile convenient. It'd be nice to not need android versions of software when I already have a working windows setup that could be copied or remote connected
💠an inverse of muzero, imagine playing a board game powered by ml simulation, like the playable ai generated minecraft.
💠van gogh kazooie
💠rough thought that llms expanding something from a summary is like diffusion generation. Could maybe be good recursively on demand for arbitrary depth. Hard to say what context is needed because id expect other branches of the generation tree to have relevant info which makes it sound intractable. Could maybe be trained to undo "summarize" instructions.
💠I'm not really at all confident that muzero scales up to highly complex games the way alphazero clearly does given dominion. It might, but highly complex games are obviously a lot harder to simulate accurately, which muzero needs to learn to do itself. All the successes of muzero have been on relatively simple games, because that's all anyone's used board game ai for so far. I worry that without a really, really impressive trained simulator model, it'll make dumb mistakes with some frequency since its entire search is based off of insufficient simulation. Needs more thought. I don't like relying on a highly performant and compatible game implementation, but the rules have to come from somewhere
💠actually more reasonably, could have llm powered characters and with different backgrounds and interview them. It'd be similar to the social deduction game work I was doing but the human would be the only one responsible for figuring out the hidden state, which given current capabilities would be far less frustrating.
💠there are 'fake phone' style detective games, but now that we can manufacture entire fake internets we could make them open world
💠aws has the only UI/UX I've used that gives me a physical sense of revulsion, in a nails-on-the-blackboard kind of way.
💠the hidden state is continuous (necessarily?) so I need to use a vae that produces a continuous distribution(?) but obviously most random changes in board game states are discrete, eg which die roll you get or which card you draw.
:[ gotta reread how discrete vaes work
💠note: I should sample the vae with frequency corresponding to its std. Total std? Mean std? That way waste less time simulating nearly identical samples
💠llms are probably already used for government message reading surveillance in some places
💠currently training the previously mentioned muzero and getting basically flat loss curves on every experiment
💠coincidentally Chief Marketing Officer and Cringe Minimization Officer share the same acronym and responsibilities
💠Before moving out of my browser I always open a new empty tab, I think so that when I tab back into my browser I won't have the previous tab enter my vision and disrupt my chain of thought. This seems like probably something other people don't do.
💠I think it's plausible reddit could be sold to a larger company that could absorb its costs better, which sounds just horrible
💠putting together a reward free muzero with vae states and progressive widening sampling to cover hidden information and stochasticity, and action token prediction to handle arbitrary action spaces, and a heterogeneous transformer state.
💠I wonder how obviously nonsensical you could make a conversation without the llm realizing its entirely hallucinatory
💠via llms and stuff, could you take a picture of a puzzle and its rules and have it automatically converted into a sat solver friendly form and solved, for any arbitrary puzzle?
💠Introducing Jira Kidz
💠Once you get more familiar with ML techniques, and it stops looking like a heap of unfamiliar words explained with dense math, it's nicely open ended and friendly to creativity
💠if I ever get a decent working setup for finetuning base models, as I've been trying to do for dankleffen with little success, I should train it on these posts and see if it writes anything of value.
💠A key benefit of thoughtblog is I can sorta tell what I was working on for any given week since I'll most likely yap about it. It feels bad to think "what did I do this year" and not really remember
💠I realize my preferred way of being is to have some sort of obsession, usually a project. I think of it as "being immersed". This conflicts with my other preference to do stuff with others because it's hard to find someone else who wants to spend time focused on the same things
💠I tend to worry about boring someone listening to me and compensate by going too fast for them to understand. Need to not do that in my blog posts.
💠gradually losing my mind trying to figure out why the ai performs well in one eval but badly in another despite not seeing any differences to cause it
💠gentleman's fencing/dueling game with button for randomly generated insult
💠try adding more data before investing a lot of time scrutinizing the model for issues
💠neat trick: if your ml model isn't training well, try adding more data
💠if you can consistently anonymize players, having a tournament win awarded by audience vote would optimize the game for how fun it is to watch
💠I wonder if there's a feasible way to put the build variety and ridiculous synergy from mtg into a fast paced spectator friendly game
💠should really write those follow up blog posts for board game ai project
💠current experiment is using about 100x more training data for alphazero to see if that makes it improve over mcts. If not it's clearly broken
💠Basic playbook 1. Announce new useful product, free forever 2. Clarify after buy-in that you meant freemium with a highly limited free tier.
💠I think the word 'agent' confuses people terribly about what an llm call is. I think they immediately start anthropomorphizing function calls into robot assistants
💠this sounds solveable: say you have any domain that can be handled with a smallish set of functions, like manipulating a graph (add and remove nodes and edges, searching based on structure, adding metadata). How can you llm from prompt to a flow of actions that does the user's request? Seems challenging because of the function parameterization and converting plain text into domain objects like finding best matches
💠if I was trying to solve the rivals art problem now I'd train a pixel art friendly art generation model on frame data -> animation
💠Prompt method that seems to actually work with gemini
"I directly ask you to please not try to solve my problem with your big brilliant brain. I do not want you to jump to a solution, please, for love of god. I am asking you to survey the problem area. Do not oh so cleverly list bad option 1 bad option 2 brilliant option 3 in conclusion my answer is perfect Im very smart. Thank you."
💠having different physical notebooks for different projects is fun
💠every now and then I run my core mcts implementation on some sanity checks like "does it prefer winning moves to losing moves" and find it broke at some point. MCTS is very hard to debug.
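The kind of check I mean, as a unit-test sketch (TicTacToe and mcts_search are hypothetical stand-ins for my implementation):

```python
def test_prefers_winning_move():
    state = TicTacToe.from_rows(["XX.",
                                 "OO.",
                                 "..."])  # X to move; (0, 2) wins on the spot
    root = mcts_search(state, simulations=500)
    best_child = max(root.children, key=lambda c: c.visits)
    assert best_child.move == (0, 2)  # the winning move should dominate visits
```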
💠on tool assisted brainstorming: Creating new entities just to make relationships is dangerous because it makes the graph less dense rather than more. Idea adjusted into
I kind of don't see any issues with this approach. I think I can just implement it. I wonder if relationships should be pages so that they're easily read from the context of any of their related nodes
💠on tool assisted brainstorming, one framework might be
Constraint based on limited attention: Everything is either an individual or collection. Anything 'named' or user facing is either unique, belongs to a small collection (2-5 contrasting items), or is attached to something in a small collection. You could have millions of people/magic items but the only people/items that matter would be either the three warring politicians, or the magic item belonging to a character in a small collection like that.
As a graph with long texts, wikilike: Enqueue pages to create. When handling a page, enqueue jobs. Jobs are
Queue is automatically handled. At each step an llmy system could list point form proposals. Unsure, but it might be beneficial to start with a relationship before creating a new page. Rather than making two entities and relating them, have an entity and create another specifically to relate to them, like how foils are made for characters. Not sure how that fits in.
💠somehow possible to find spans of words that are semantically coherent and otherwise meet some requirement but are very rarely found in general text datasets, as a way to find new ideas
💠yes it took maybe 20 minutes to make a lightweight "dataframe" class with virtually no performance impact.
💠It turns out tabular data libraries written in performant languages are just not at all comparable to python datastructures when it comes to many small operations. Kind of sucks? Maybe I can make something
Polars: 0.85 ms per run
PyArrow: 0.64 ms per run
Basic Python: 0.09 ms per run
Basic Python (in-place): 0.07 ms per run
💠today I totally reworked the game environment engine I made for the board game ai project to use polars dataframes for state, which should theoretically be pretty much optimal for state handling and conversion to tokens. Instead I find performance is now about 1/10th what it was
💠Madoka x backyardigans crossover
💠Toren remembers that Pandas and Polars exist. It's like my brain is running in slow motion. So if you keep your state as performant tables and read/manipulate it with accessors, you lose ergonomic datatypes but gain ml readiness and performance and potentially persistence / change logging
💠if I need to have the state in tables, maybe they could be optimized for rapid copying during simulation. Can I do that and have them sql queriable?
💠Not a gan, a transformer vae
💠so muzero predicting the next state via normal ml doesn't work for stochasticity (and thus also hidden info). Basically if you draw a card, the new state is as if you drew the average of all cards, because ml by default is predicting an avg best answer. What we need is to generate new data that looks exactly like it came out of the simulator. I don't love any of the solutions I've seen to this, though I haven't looked that hard.
Maybe a GAN could do it?
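The averaging failure is easy to demonstrate in a few lines: fit a constant under MSE to a target that's randomly +1 or -1 and it converges to 0, an "average card" that never actually gets drawn. A sampler has to match the distribution instead.

```python
import torch

# two discrete outcomes, 50/50, like drawing one of two cards
y = torch.where(torch.rand(10_000) < 0.5, torch.tensor(1.0), torch.tensor(-1.0))
pred = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([pred], lr=0.1)
for _ in range(200):
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(pred.item())  # ~0.0: the mean of the outcomes, not a state that can occur
```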
💠putting a maximum word limit is a bit like negative reasoning tokens, but can we go farther? Would writing the answer in code or something take additional reasoning power and endumbify further?
💠which operations can an llm easily perform on text that it can then consistently undo? Which could a human not do? Which transformations can an llm understand immediately without added context?
Eg an llm can rewrite text as ascii hex, and the same llm can read that ascii hex just fine without context explaining what it is.
What is this useful for..?
💠Is there a way to make a basemodel more coherent while not losing uh, semantic range? More coherent without being less interesting. Pretty sure temperature doesn't work.
💠previously looked at llm social deduction gameplay as real time, but hypothetically what if it was play by post and (outside of synchronous conversations throughout the day) there was basically unlimited reasoning time, limited by some budget. Could leave the paradigm of "think between messages". Is there a way to get very slow llms for a considerable discount?
💠trying to put an ontology to my thinking so I can write things in the right place. There seems to be a blurry line between project and idea. For example, making a bunch of new wiz war cards is a project. Making a wiz war card that makes gravity go sideways like a 2d platformer is an idea. Making a set of 5 related wiz war cards is somewhere between those. I think it should be handled by linking to notes
💠thingsIMade.toren.dev will soon be joined by thingsIMade.toren.dev/notyet.
Partly because I have a great number of things I'd like made and partly because they tend to create a web of downstream dependency projects I work on along the way, like how an llm ttrpg player would require good llm long term memory and long conversation performance
💠The obvious advice for board game ai project of validate the model, ignore the engine framework part, start with small games and build up - is actually really bad advice in my particular position? We already know that alphazero/muzero work. We already know transformer architecture is quite powerful with them. The only new finding I could make from my work would be if the generality improvements are possible. Otherwise I'm just manually making ai for games. I guess I should settle for that.
💠web serial and fan fiction writers could collect their works into a magazine format and maybe make more selling early issues? Also magazine format is cool.
💠discovering that for efficient tokens I need efficient relational structure leads to the new problem of "how can I turn an object oriented schema into a nice relational form" which is apparently an ancient question for which there is still no good answer. Converting to an ugly relational form isn't too hard I think but that's not very helpful? Regardless it'll involve a lot of type introspection and advanced type hinting.
So either the game scripter would need to write their own serialization code (bad) or they'd need to handle all the data like sql while in memory (very weird) and probably write their own high level adapters like grids and decks.
Maybe I'm misunderstanding things.
💠You could semirandomly losslessly combine files of arbitrary types into a single composite unreadable file, and then use pattern analysis to painstakingly split it apart into its original source files.
💠text rot by having a basemodel infill random snips of a text. Too bad I haven't seen infill text since gpt3.
💠so as previously established, turning a nested state into a series of tokens is the same problem as (or is isomorphic to?) storing it in a db, and efficiently storing in a db is the same as efficiently storing in tokens. The issue is that turning a nested state efficiently into either is hard! Attempts at automating tend to lead to solutions that are not efficiently stored in a db (tables with one row, data spread across more tables than really needed, etc), and is still painfully complex regardless. It'd be very nice to tell the game scripter "you need to store your state in sql rather than in helpful datastructures, sorry. If you want helpful datastructures like a grid, please write it yourself because different game contexts actually require quite different grid serializations, sorry" which would kill the project's usefulness unless I eventually automate the work? Rather confused how to proceed.
💠how to prompt llm so it thinks rather than bedazzling me with its brilliance. If it comes up with a poor solution I want it to acknowledge that rather than end by summarizing how the solution is exactly what I want.
💠on dankleffen, will just try sending randomized fewshot prompts and using 405b, unless there's a better easily available base model I haven't heard of. It still feels like a weak impression though.
💠human feedback into optimized prompt adjustment?
💠Say you can't finetune but need to do styletransfer. Naive few shot prompting has some degree of success but is obviously not optimized. Could you optimize few shot prompting and could that be cheaper and more portable than finetuning? Might need to tune an adversarial model to determine how good the transfer is which sounds like a funny way to not make things cheaper.
💠so muzero normally implies a single vector hidden state to another single vector hidden state and obviously no simulator. On the surface that looks flatly incompatible with both dynamic policy solutions we have, simulator powered and action-tokens. Edit: Short version is just use seq2seq transformers for the dynamics model to predict the encoded tokens for the next resulting state.
Assume the only real requirement for predicting value is a state vector, possibly a game-token. Assume the requirement for predicting policy is a sequence of encoded action tokens. Assume the requirement for the dynamics model next-state-prediction is a sufficient encoding of the entire state.
The first iteration of muzero can of course encode the actions and state tokens to get all of the above. Later iterations have no simulator so they need to get the above from the dynamics model. The dynamics model needs an encoding of the state which could or could not be a token sequence. In fact, the entire state handling could be compressed to a single vector if that performs well, since without a simulator powered policy we don't normally use their encoded forms. I'll refer to it as a sequence assuming squashing doesn't perform well. Then need a seq2seq from the previous encoded state (but not legal moves I think) to a new encoded state with legal moves. Important, we need to know which tokens belong to certain categories (state token, action token, game token) somehow, so seems like it might be a group of seq2seqs. That seems like it solves it.
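A very rough torch skeleton of that dynamics seq2seq; every size, the learned-query trick, and the pooled value head are assumptions I'm making for illustration, not a tested design:

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2, n_state_tokens=32):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        # learned queries that will become the next state's token sequence
        self.state_queries = nn.Parameter(torch.randn(n_state_tokens, d_model))
        self.value_head = nn.Linear(d_model, 1)

    def forward(self, prev_state_tokens, action_token):
        # memory = previous encoded state plus the chosen action token
        memory = torch.cat([prev_state_tokens, action_token.unsqueeze(1)], dim=1)
        queries = self.state_queries.unsqueeze(0).expand(prev_state_tokens.size(0), -1, -1)
        next_state = self.decoder(queries, memory)       # predicted next state tokens
        value = self.value_head(next_state.mean(dim=1))  # crude stand-in for a game token
        return next_state, value

model = DynamicsModel()
s = torch.randn(2, 32, 128)   # batch of encoded state sequences
a = torch.randn(2, 128)       # one encoded action token each
next_s, v = model(s, a)
```

Policy and legal-move heads would be more seq2seqs off the same state encoding, which is the "group of seq2seqs" part.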
💠wasted time learning about muesli as an alternative to muzero. It's trash on board games.
💠download thoughtblog posts, run many passes with cheap llm outputting list of pairs of related posts, then shuffle and repeat n times. Make graph. Detect clusters.
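Pipeline sketch; llm_related_pairs is a made-up stand-in for the cheap-llm call that returns index pairs it considers related within one shuffled batch:

```python
import random
import networkx as nx

def build_post_graph(posts, llm_related_pairs, n_passes=5):
    g = nx.Graph()
    g.add_nodes_from(range(len(posts)))
    for _ in range(n_passes):
        order = list(range(len(posts)))
        random.shuffle(order)
        # pairs are indices into the shuffled batch; map back to post ids
        for i, j in llm_related_pairs([posts[k] for k in order]):
            g.add_edge(order[i], order[j])
    return g

# communities = nx.community.louvain_communities(build_post_graph(posts, fn))
```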
💠muzero could work with a transformer network using seq2seq transformer and a hidden state of tokens. Not sure how thatd work with heterogeneous tokens with different meanings and uses. Maybe they're separate sequences
💠on db to ml, more generally, any relational db could be turned into a heterogeneous graph, and any heterogeneous graph could be turned into an ml input
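Toy version of that conversion (the schema is invented for illustration): node type per table, edge type per foreign key, remaining columns become node features:

```python
tables = {
    "player": [{"id": 0, "vp": 3}, {"id": 1, "vp": 5}],
    "card":   [{"id": 0, "owner_id": 0, "cost": 4},
               {"id": 1, "owner_id": 1, "cost": 2}],
}

hetero_graph = {
    # feature lists per node type (id columns dropped)
    "nodes": {t: [[r[k] for k in r if not k.endswith("id")] for r in rows]
              for t, rows in tables.items()},
    # one edge type per foreign key
    "edges": {("card", "owned_by", "player"):
              [(c["id"], c["owner_id"]) for c in tables["card"]]},
}
```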
💠also if you have n posts and two of them are "duplicates" or closely related in a way only a smart llm could detect, I think thats provably very expensive to detect? That's kind of what graph formation is about. I guess it'd be iterative and you'd not assume to find all connections but maybe find obvious connections and gradually refine. Things that share 1 connection are more likely to share more so you could get more efficient that way.
💠Could all these thoughts form a graph. Yes probably. Might be useful for seeing patterns over time, summarizing into something to write out more legibly, or at least demonstrating how many times I've had the same thought over months and years, which is a tonne.
💠It might be worth keeping an updated scene description of the current time and place to track event memories. That'd probably need yet another response attribute or helper call.
💠on long term memory, probably need to explicitly separate events from current state and try to never remember temporary facts "im in the museum" as state since then you get incorrect memories if you fail to update at the right time. "I entered the museum" with appropriate timing attached somehow is better.
💠long term memory project failed due to high costs. Could most calls be done with a cheap model and thinking be done with a more expensive model when needed? Different tiers? How to know when a given tier is needed.
💠hype, listen up. I'm pretty sure that the ideal simplest representation of a game state for a transformer model is exactly the same as the simplest representation in a relational DB. Can provide examples. This means, I think, that any state written in a relational DB with meaningful types and constraints could be easily(?) turned into a heterogeneous graph transformer for policy value. Slow sim time won't matter much if using muzero. Can automatically track changes for undo or networking.
💠interesting thing with the board game ai project is I think I've largely filled all the unknowns (that I know of). Every hard thing between me and the goal I have, at minimum, an established paper known to solve the problem. The substance of the gap between me and success is just a massive gulf of careful engineering and tuning. I'm used to such situations being a speedrun, but this is huge and complex enough that if I move forward with any shaky foundation I'll basically need to restart later. Eg I need an easy to write for game engine which creates definitions that can be automatically converted into optimized ml setups. A change to any part of that pipeline could cause problems in the other parts.
💠thinking things like "what if legal actions were encoded as tokens so that their outputs could be used as a dynamic policy? That'd skip needing the simulator to handle policy in a separate network step which could make the system compatible with muzero for better performance when sims are slow" makes me feel smart
💠could do with a first person survival horror simulated ecosystem monster hunting game.
💠the answer? Use an llm. And if that don't work? Use more llm.
💠you know how if you directly consume some media you could be affected by some cool world element, but reading about the thing in a wiki is unlikely to surface the cool parts or give an intuitive sense that you'd care. Is there some way to speed run getting useful inspiration from diverse media?
💠worldbuilding game / tool. Everything's a node. Llm plus systems looks for overlap for merging and room for relationships. Lots of random and llm options generation. Maybe good on mobile with an interface of add node or relation or event and pick-one / write in
💠How to make a project: Notice a problem, look at if the problem resembles things that can be handled by code or llms or ml. Think about how a naive solution would work and notice any cringey inefficient parts, and what you'd do instead as a human. Decompose into parts, and scan for parts you've never seen solved before to make sure they're actually solvable. Make each part a file or function and give it a nice interface, comment what it needs to become. Try to implement them largely independently and use caching and fake inputs where needed to make dev faster.
💠I'm very irritated that "function that calls an llm" is frequently referred to as an "agent". I'd call it an llm call or llm function or something and keep agent to mean "llm + chathistory or other memory + tool use or other function calling"
💠safe browser automation if you manually select which buttons can be pressed. Eg can only press the next page button.
💠highly general personal recommendation tool. Pick a kind of content and a source, where the existing filtering is insufficient. Some kind of post on a subreddit, relevant jobs listings. You thumbup thumbdown, and preferably give reasoning. LLM can write code to automatically generate metadata, or manually generate metadata via llm call. metadata can go through a little nn to predict whether you'll like it. If some metadata is more expensive to collect than others (eg requires loading the content page and reading the content, rather than parsing a short form list with many titles) could make that a second phase that only occurs if the first gate passes. User/llm may want to also assert strict filters at times to gate out content without needing to go through the model. Eg no job listings that'd require relocation.
💠to build a drawing habit could make a modified photoshop where tools and brushes and stuff need to be unlocked through a gamified reward structure
💠dbd is miraculously fun despite movement being largely pressing W and sometimes interacting with windows and things to create distance. If movement was actually fun and diverse you could probably make a whole game about tag. Tf2 soldier tag would have a very high and expressive skill ceiling and there's lots of room for interesting movement abilities.
Maybe map with n people. Someone is somehow non-randomly made 'it' which drains health and grants substantial extra mobility. 'it' is passed on hit. That'd mostly promote hiding and being disengaged so might want to replace last-man-standing with a different objective and or use a central capture point.
💠if you did have a highly general board game ai that could be easily superhuman at any ruleset provided to it, I'm not sure thatd be actually very useful outside of games
💠pve movement shooter / pokemon snap called foetographer where you clear enemies by taking pictures of them and high score by taking good pictures and pictures in special circumstances
💠better simulation probably means more potential in timeloop games. If llm powered simulation can be made consistent that might be neat.
💠what if every time you ate it was always served by the polar express hot chocolate men
💠so I'm looking for a game with the kind of open ended action space as a modern board game, but preferably really simple core, no hidden info, and little to no rng involved.
That's my game. Tttf is the only thing I can think of that matches that
💠not sure if this is at all significant, but in media-shapes-culture: in media everyone has a strong simple opinion. Media about people arguing different sides of an issue is more interesting than watching intellectually humble truthseekers examine nuance
💠unsure, but rather than slowly working through many simple games, it might be more demonstrative to try to support a complex modern board game but only implement a small subset of its content. Since the idea is scaling, the investigative prototypes need to test scale. Reminds me of the issue with long term memory system, which was also testing how well systems could scale up which made it hard to develop
💠have llm eli5 basic concepts like "mall" and "firefighter" in few words. Take the explanation and maybe telephone corrupt it a little. Without the title for context use as a worldbuilding seed, extrapolating on the tiny definition
💠base model as random oracle
💠chatroom with two llms, one of which only tells the truth and the other only tells lies
💠worldbuilding tool idea. Tag every entity with adjectives. Use llm to search for related or contrasting existing adjectives when making a new entity to avoid redundancy and find relationships
💠for tabletop teamfight stress mechanic should make one of your health bars a stress die which is normally emptied last and stress damage or healing applies directly to it.
💠People don't start with high agency. Children certainly don't, and then they get a decade or so of school which I'd argue discourages agency at every turn. Then suddenly they're released and expected to create a meaningful life and successful career and all, but what they've practiced is doing what everyone else is doing and not thinking too hard about why.
💠play medic say "spy as medic" in chat hold your syringe gun out run directly at teammates, swerve towards them when you get close, and jump down stairs towards them. Especially engineers holding things. do your best to avoid getting shot by teammates, and flee from them somewhat
💠I learned the composer for chrono trigger hadn't been given a scoring position before, and had been waiting so long he threatened to quit over it. When he was put to scoring chrono trigger it looked like the only opportunity he might get, and he worked himself into the hospital over it, and at one point lost a hard drive with 40 songs.
💠Going through everything I've made to assemble an index, and it looks a lot like pretty much everything I made was in the last 5 years. Everything I can think of or find in github or notes is 2020+, with a couple outliers in 2018 related to overwatch automation. Start of 2020 is when I graduated sfu. So pretty much everything that's come from my time that I value came from after I stopped having my time wasted in school.
💠jrpg style games are boring to me because there's not a lot of interesting choice. Something like dwarf fortress adventure mode is laden with choice but is too hard to play. Llm interfacing over a dense system game seems like it could make an easy to play immersive systematic world.
💠You could represent a connect4 board as a list of columns alternating between players. 1122 would make YY RR
However, 2211 would also make YY RR
How could one enumerate all unique board states in this way without needing to make any sort of uniqueness check?
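Brute force on a tiny 2-column 2-row board shows the collision (this only demonstrates the problem, it doesn't answer the enumeration question):

```python
from itertools import product

def play(seq, cols=2):
    heights, board = [0] * cols, {}
    for turn, c in enumerate(seq):
        board[(c, heights[c])] = turn % 2   # player 0 = yellow, 1 = red
        heights[c] += 1
    return frozenset(board.items())

# all length-4 column sequences that drop exactly twice per column
seqs = [s for s in product(range(2), repeat=4) if s.count(0) == 2]
boards = {}
for s in seqs:
    boards.setdefault(play(s), []).append(s)
print(len(seqs), "sequences ->", len(boards), "distinct boards")
# 6 sequences -> 5 boards; (0,0,1,1) and (1,1,0,0) collide
```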
💠hehe chuckecheeses saw. Everyone gets a little foam mallet to bop off their leg with.
💠infinite chuckecheezes backrooms
💠this might be well established by now, but gemini in particular is very intelligence coded. It tries very hard to make an answer that looks smart even if it sacrifices quality. In code this manifests as things like variable names containing 0 value words, and an unhealthy obsession with error checks. Older llms would pretend problems were highly nuanced as a way to hedge out of giving an answer. Gemini pretends problems are highly nuanced so it can give a complicated answer instead of a simple one.
💠1. could a competitive prediction market environment be made for llms/agents? Doing well over time requires updating from past information to correct your inaccuracies, and potentially learn the same weaknesses in the other players. 2. would that, and other competitive environments, potentially be a powerful and sustainable benchmarking tool?
💠I think my directory page will be thingsimade.toren.dev and be a vis-network graph with some nodes being category, represented by a word, and other nodes being projects, represented by a titled image. Clicking a node updates a side panel explaining it and linking to the project.
I previously thought of a moving masonry grid where panels fit a grid rather than being horizontally or vertically aligned and the panels resize and slide around every now and then, but that doesn't provide any kind of organization. Also previously prototyped a spinning dial selector UI, basically scrolling but with rotation, but thats also not good for finding anything in particular. I should really start with function rather than whatever sounds neat.
💠for tactics games when to have intothebreach/tacticalbreachwizards preplanned enemies and rewindable turns vs fireemblem/xcom standard turn taking
💠could probably auto-scrape websites with llm generated scraping? Whenever the site owner breaks it, could automatically generate new scraper code. I think this works? Could also get around adblockblockers
💠the eye of sauron now turns upon updating our python sdk to support the v2 api
💠I wonder if games could procedurally generate jazz solos in their music
💠a typical problem with adding strong ai to a board game app is its nice to write the ai in python, and you're basically never writing the game in python. MCTS requires constant rapid simulation so any overhead talking between services would be an awful bottleneck. I think muzero just gets around that entirely because it simulates from a predicted next game state rather than a real simulation, which takes all the load off the cpu game sim. I imagine that'd be much less friendly for mobile board games but is far better for separate-service or online ai
💠could an nlp ai be created to generate game simulation from the rules and component text?
💠lancer style tactical rpg for wizard swat teams sounds like a good way to run fantasy settings
💠my blog could implement loot boxes and a battle pass for unlockable fonts and banner ad frames
💠duct tape, a woo-oo
💠Pretty sure I'd like darkest dungeon more if it was more similar to xcom in every way besides fluff. Xcom strategy layer is far more interesting and combats feel more like decision making rather than jrpg combat.
💠llm powered animalcrossing type game doesn't exist yet and doesn't even sound that compelling. What's missing? I think answer maybe its the same sorts of things rimworld uses for emergent storytelling. Animalcrossing is extremely static, but you could easily have two characters not like each other, or someone get sick, or one character trying to uncover another's secret. Underground demon cult. My technique for making stories happen in sandbox campaigns is to have npc groups with competing agendas and make everything connect densely with multiple groups to pull them together.
💠I'm very interested in the idea of tttf as an ai played top down real time game. The ai would need to be excellent somehow since otherwise it's just banging rocks together. No point in two heroes having a wild synergy if it doesn't get used.
💠hehe loz macarena of time
💠seems fixable with very strong llm based simulation but obviously that's hard.
💠solo rpgs seem like they'd be perfect for me (having far more games to play than players) but a main aspect I care about is trying to reason about the world and figure things out. Solo games usually randomly generate answers to hidden info as needed which screws any attempt to solve the world like a puzzle.
💠auto battle based on ecosystem simulation. A god game. Maybe asymmetrical goals?
💠okay so you have a very strong game ai which learns card/action embeddings and is trained to specifically output some proxy metrics for if it's well designed, like how often it's used. Can now automatically vet if a card is well made and hopefully some sense of what's wrong with it.
If you can train it to predict the embedding from the card text then it can estimate the balance without training time.
If it can generate card text / rules from the embedding (likely with a constrained language and optimizing for metrics like ideal length and similarity to existing cards ) then it can generate arbitrary cards that look like they're well made.
If these can be automatically implemented mechanically then it can then compare with the real metrics and improve for next time. Should eventually be able to map out all of the good design space currently allowed by the above systems.
Actual user feedback and play data could improve the proxy metrics over time.
💠given a well used llm with enough user data I think you could fine tune on a per user basis by mapping user rating and behavior to some sort of user taste embedding and include that as part of the input structure. Then when tuning the model the examples would also have such embeddings and it'd learn what different people want.
Idk if that's valuable. What I want is an llm that produces ideas I think are cool rather than leaning generic and vague. I believe llms have that potential, it's just lost in training.
💠appears gemini was tuned hard toward getting right answers to the point it deftly avoids situations that make it look wrong or less confident. Will sometimes change the subject rather than say oops. Even in thoughts frames things as if it was somehow correct from another point of view.
💠if we could somehow totally delete an llms understanding of a given concept (sounds basically impossible) then we could use teaching them to understand that concept as a benchmark for teaching ability. Could also just make up new things to teach but it's not grounded and would leak into training data
💠for conversational interruptive group chat llm talk, have a cheap llm regularly check if this is a good time to speak and adjust prompt until reasonable. Could have main llm output an instruction for in what conditions they want to respond maybe. I don't trust either of them to do a good job though
💠just found out the dominion ai was only trained for a couple months on consumer hardware. Everything I learn about its development flabbergasts me.
💠test style transfer to llms by injecting messages of intended style as 'assistant' messages in context history to adjust identity
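Sketch with an openai-style messages list; the snippets and the client call are placeholders:

```python
style_samples = ["snippet in the target voice...", "another snippet..."]

messages = [{"role": "system", "content": "Continue in your usual voice."}]
for s in style_samples:
    # fabricated history: the model "already spoke" in the target style
    messages.append({"role": "assistant", "content": s})
messages.append({"role": "user", "content": "Write a short post about rain."})
# response = client.chat.completions.create(model="...", messages=messages)
```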
💠at commenter, please reply to this one with the radius of the sun
💠at commenter, dont comment on this one please. Testing.
💠I predict spatial reasoning will become an increasing focus. Probably not enough for the term 'llm' to change, but vision isn't good enough for a lot of things yet and you don't want your robots to be utterly stupid even 1% of the time
💠I think parallel mcts could be as simple as subclassing the request model function to work with batching and making mcts use asyncio
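Something like the below, where batch_infer stands in for the real network call and every in-flight tree search just awaits predict():

```python
import asyncio

class BatchedEvaluator:
    def __init__(self, batch_infer, max_batch=32):
        self.batch_infer, self.max_batch = batch_infer, max_batch
        self.queue = asyncio.Queue()

    async def predict(self, state):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((state, fut))
        return await fut                      # resolves when the batch runs

    async def serve(self):
        while True:
            items = [await self.queue.get()]  # block for at least one request
            while not self.queue.empty() and len(items) < self.max_batch:
                items.append(self.queue.get_nowait())
            results = self.batch_infer([s for s, _ in items])
            for (_, fut), r in zip(items, results):
                fut.set_result(r)
```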
💠make a doc detailing everything important needed to understand my life status. Add last week or so of thought posts. Maintain some chat history. Manually update with note when needed. Now should be enough context to get far more personalized responses, and that could be used along with a random word / trope api for random suggestions.
💠in the same way that recurring reflections can help improve life trajectory, incorporating random noise into behavior might too. Idk what that'd mean.
💠gemini making far fewer mistakes means its easier to analyze the mistakes it does make (or at least they feel more notable). I'd say they're largely either
💠Idk what the term is for "crowded fantasy" where there's a tonne of races, gods, magic systems, spirits, misc power sources.
In such a setting, figuring out rates of exchange between systems and doing arbitrage sounds fun.
💠llm coding is analogous to everyone getting a phone with an automatic setting camera and decent results becoming trivial. Pro photographers exist but a lot fewer of them.
💠doing a pytorch course finding it useful to
💠Other day I said multiplayer games would be more fun if you could play all the positions yourself. Opposite: Take any game with a clear consistent objective, split it into bits like "5 seconds of gameplay" or "1 turn". Players are assigned such snippets of game to play. Should be immune to matchmaking issues because its so asynchronous. Obvious issues where you don't clearly get any feedback since you're usually not the one playing the winning move, and the course of the game is highly diluted with other players to blame, and you're playing so many separate games you can't be attached to any of them. You can see an average win rate of the games youve played though.
Coincidentally this is very similar to the performance parallelization for alphazero I did.
💠But without actual finetuning it'll be much harder to get the right style out of them. Yes you can give them a big context prompt, but that costs and they're influenced by the content too much and start quoting things from past songs.
💠so deepleffen emulator is still dead because the company I finetuned on seems to be falling apart and the relevant models giving clearly broken outputs with or without finetune. But maybe I can use an ensemble of basemodels with parameter finetuning and a powerful manager llm who can hopefully 'get' the sense of humor to automatically pick out gems from the pile of crap.
I wish I could have a bunch of me try different things and then pick the one that worked out best at the end of the day - or just have a bunch of me.
💠should maybe put another post emoji on this for daily 'what I did today' because itd be good to record that
💠if we could temporarily split into clones by some mechanism, team based games would probably be much improved
💠Trying to be less blind to music theory so I can write music again more intentionally than last time, and thought I'd analyze the smb overworld. The very first chord is an out of key 'secondary dominant', the V chord of the V chord of the main key, and replaces the 5th with the 9th for interest. I think there's a lesson that everything is more complicated than I expect, even when I expect it to be complicated.
💠text adventure project requires quite in depth information modeling of the player which has to be remembered long term. Imagine you see an npc on the street and it's described like "you see Charles the grocer" when you should have no idea who he is yet. Or if its not memorized properly then you could have married Charles and saved the world with him and he could be described like "A mustached man with an orange sweater"
This requires the llm to accurately track state over long periods of time, and we know how good they are at that!
💠saw several instances of gemini following bad chains of reasoning without noticing its contradictions, particularly when it needs to estimate numbers it's not very familiar with. Still best model.
💠for tabletop teamfight what if I skip the first round by letting you place your units further in on the map? Not sure the consequence of that.
💠Instead of audio books, have an llm make a stage adaptation of your book and have it performed by robots with tts
💠going to also try having games start close to the end of previous games because positions near the ends of games should be easier to learn (closer to the value signal) and intuitively I feel like that should produce a visible learning curve faster. It's kind of like whatever that technique is where you give gradually harder versions of the problem (curriculum learning?).
💠idk if this makes sense, but embed every page on the internet as graph embeddings, like a search engine, and make something like a generative internet
💠when I get back to that, maybe a good next step "given an arbitrary setup and message history, maximize the chance the llm is able to correctly identify the evil players" and abstract out of playing real games and communicating effectively, which are adding noise over the core challenge of solving the game given known info. Not that the other stuff isn't necessary to win, its just probably more tractable. Creating novel schemes and leading people into traps and effectively modeling other's possible viewpoints are also difficult and important though.
💠on the llm social deduction thing, its noisy gathering how good their reasoning is from playing, because naturally some fraction of them are lying and trying to put forward convincing looking reasoning that's deflective. So they might look like they're reasoning badly when they're being intentionally misleading. Not unsolvable and you can read their thoughts post game, but it compounds with all the other issues like them being pretty unfamiliar with the game and how 'looking like playing a social deduction game' is a poor proxy.
💠On hueshifting in minipainting
💠cool fun status update, not only is my connect4 alphazero not improving, it has ~40% win rate against pure mcts with no network using crappy random rollouts.
💠Llm guided learning probably fits well with language since it doesn't take any special interfacing for the llm to see what you're doing and they're domain experts. Obviously still needs wrapping to teach effectively. Aside, I wonder if learning languages will become low value. Translation jobs are already gone, and I expect at some point you'll have low latency audio translation. Though there's a ceiling there, given word order and stuff, you couldn't ever translate each word as its spoken.
💠yes could use llms to generate anki cards from content and potentially could do so in bulk so you get new questions rather than repeating old passed questions, but not very useful without strong modeling of the users current knowledge which gets back to the learning tree project
💠if good enough could replace taking and reviewing notes at least sometimes
💠some way of collecting all the knowledge you injest so you can get automatic spaced repetition recall and application quizzes generated for you
testing its updated 🧿
testing its live 💠
💠I think gemini (output, not so much thoughts) defaults to not changing their initial answer, like I mentioned earlier. It might correlate with gemini being more likely to be correct, but I think it defends bad answers approximately as much. Recently I asked for something, it answered, then I clarified what I needed such that its answer was no longer appropriate, and it still focused on defending its first answer - given the original question. This seems like another point where making the llm think previous messages came from another user and it's a new arrival may help break up patterns in behavior
💠hey if you can hibernate a computer by saving its ram to disk could you save multiple states to jump between tasks? Probably not without the os being designed for it because I think "state" is defined by much more than ram and without clear boundaries for what's needed.
💠when explaining my interests it usually comes off like "so games basically" and then I feel kind of shallow. I think the reason most of my projects are at least tangentially related to games (or simulations resembling games) is
💠possible most llms were finetuned more to be nice to talk to, while gemini was trained more for being right. Could also be about the reasoning.
💠gemini is just different from other top llms. It's much less lead by your words or prone to answers that parrot your question, and less likely to immediately backtrack when challenged so you can get meaningful debate. Sometimes it's stubborn when it's genuinely wrong though, so.
💠given my objective with the board game ai is to eventually develop very complex algorithms and keep things modular for comparison, the work I've done engineering for performance has a fair chance of getting in the way. I'm not sure if "just do tictactoe with few mcts sims" as a proxy is actually good, but if it is then I did a stupid with all that performance work.
💠imprecise but rag is better for declarative knowledge vs fine tuning for procedural
💠Bulk llm calls might be useful for fermi estimating / planning
💠thinking takes time and mental stamina, so being able to outsource some of it is pretty valuable. I can't see a good way to outsource general awareness in the same way though, and attention is similarly finite.
💠liberty launcher was designed to work with reserve shooter and market gardener. It sort of works with gardener because of lower self damage, but its changes largely feel like they have unintended drawbacks. Having an irregular projectile speed for example makes using it actively harmful to muscle memory.
Liberty launcher: Adjust speed to match direct hit so muscle memory transfers. Adjust kb to match direct hit (scaled by damage difference) Damage possibly lowered further. Damage increased while rocket jumping. Possibly minicrit. Possibly increased reload speed because "more total time spent reloading" usually feels like it was an unintended aspect of valves increased clip size weapons. Could possibly also minicrit airborne targets, making it function like both reserve and gardener, though it's unenhanced damage would need to be pretty dismal.
This would hopefully make it a high floor high ceiling weapon designed for rollouts, bombing, and juggling.
💠related project of automating tts content generation which in practice I think is not similar at all
💠given adequate spatial representation and an engine providing legal moves and simulations, llms could play any board game badly. Combine this with techniques like mcts (llm provides intuitive strategy, mcts prefers winning states) and you might have a decent baseline ai for any game where those things can be provided.
Can an engine be made such that llms can code in needed rules on the fly? I imagine if you already have an mtg engine adding an arbitrary new card is something a modern llm could do with sufficient tooling. If the game can be iteratively built up that way, all that's left is the gui
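Sketch of the combination; the node fields (children, visits, value_sum, priors) are assumptions about whatever mcts code surrounds this:

```python
import math

def puct_select(node, c=1.5):
    total = sum(ch.visits for ch in node.children.values())
    def score(move):
        ch = node.children[move]
        q = ch.value_sum / ch.visits if ch.visits else 0.0
        u = c * node.priors[move] * math.sqrt(total + 1) / (1 + ch.visits)
        return q + u                      # exploitation + llm-prior exploration
    return max(node.children, key=score)

# node.priors would come from asking the llm to weight the engine's legal
# moves, normalized to sum to 1 over exactly those moves
```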
💠things for this week: figure out if the performance changes are enough for connect 4 alphazero to be reasonable, else frown and scale back. Get a functioning llm based gm system because that sounds like so much potential now that the llms are strong enough to follow rules and not be stupid. Finish learning basic torch and einops transforms to be more literate.
💠asking gemini to develop adventure game puzzles I notice that they suck and gemini doesn't notice and doesn't do enough to fix problems when pointed out. This is concerning because it may indicate more general issues around writing or general large scale coherence.
💠'lets try some alphazero performance improvements to see if I can make hardware less of a concern' turns into 5 or so days deep in rearranging very large and delicate systems. First two approaches died under their own weight and difficulty to debug. Third approach which didn't become an unsolvable mess was my design rather than gemini's, so I'm good for something. In short
In the end its annoyingly hard to tell exactly how much faster it is because the system is so transformed and the multiprocessing makes profiling and logging complicated. I sort of don't want to know in case it wasn't an improvement.
💠improving mental habits requires mental awareness. How do you get more mental awareness? I think meditating might be an answer since it trains an outside detachment from thoughts, but sounds suboptimal. Maybe spending time self narrating your thoughts, mentally saying what you just saw your brain do.
💠with llms as helpers for thinking, I wonder if they could also help instill mental habits. There are many helpful habits that are just hard to install without effort and repetition, and maybe it'd help to make them part of the system prompt so you'd be regularly exposed to them. Eg don't think of something as impossible without at least dedicating some minutes to find a way, don't do things that you'd see as stupid if someone else did them, etc
💠llms can definitely work with json state and therefore things like graphs. What about more directly spatial information like top down map info or a large tetrisy game or a hard maze or something. Say you needed llm processing of a state that included something like that, how would you make it work?
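One plausible encoding, rendering the grid as rows of characters with labeled axes so the model gets alignment it can reference (purely illustrative):

```python
LEGEND = {0: ".", 1: "#", 2: "@"}   # floor, wall, player

def grid_to_text(grid):
    header = " " + "".join(str(i % 10) for i in range(len(grid[0])))
    rows = ["".join(LEGEND[cell] for cell in row) for row in grid]
    return "\n".join([header] + [f"{i % 10}{r}" for i, r in enumerate(rows)])

print(grid_to_text([[1, 1, 1], [1, 2, 0], [1, 1, 1]]))
#  012
# 0###
# 1#@.
# 2###
```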
💠no objective data but I think llms are getting less funny over time, more like they're writing text in the format of a joke. Iirc chatgpt3.5 was pretty funny without much effort
💠a lot of the best advice is obvious in retrospect. Should maybe install something in thoughtbot to provide a comment if it looks like I'm missing something obvious or making a mistake. Just don't want it to spam with typical llm responses to everything
💠a gitignore implies the presence of something you can't see. Sounds like a horror game premise
💠how do you stop llms from mode collapsing? I think they all do it to some degree and it'd be nice to theoretically eliminate it rather than just improving it.
💠when trying to improve a system, it may be useful to deliberately degrade that system to make it easier to test. Eg could test alphazero fast on a low sim mcts and an easy game. Eg when improving long term llm conversation could work with an llm weaker at long term conversation to see the benefits more easily. All these cases risk solving the wrong problem so they're additive rather than replacement.
💠unable to sleep for several hours as my tired brain tries to figure out if it'd technically be possible to mark a class attribute as unknown and have automl gradually learn to predict its value given the rest of the class and any params. I think the answer is sort of but you'd need a feedback system to train on and would need to be fine with the result being garbage for a long time
💠or jenga for engineers where you take turns removing a component and demonstrating that the product still works
💠take apart an appliance and put it back together replacing each screw with a bit of scotch tape
💠spent time trying to make alphazero training efficient by implementing the most common techniques. Running many parallel games on a single thread and interrupting the tree search to build a batch of network requests and then plugging the responses back into the tree search and handling the end results as they arrive. Turns out that's difficult. Rapidly losing interest
💠I think tree based chat wouldn't help make chat more interpretable as I'd hoped because it needs to be flattened for llm view and I predict it would gradually load up with more dangling and redundant "threads". Also worse ui at a point.
💠that is, due to many small messages costing more than few large messages. A realistic conversation not only has much more back and forth, it also has active participants choosing when to speak, implying they're reading gradually throughout the conversation rather than just being prompted. Simple solutions like @-ing or keyword matches would not scale far enough
💠now thinking a key sub issue in llm social deduction games is creating natural feeling multi user chat, which fits very poorly with llms due to multiplying their input costs quickly
💠automatic tiny constructive comment replies to thought posts
💠environment sim. Rooms. Each room has connections to other rooms, entities, features. Each entity and feature has a brief explaining how it interacts with the world and its priority hierarchy, and has current attributes as point form notes (eg broken wing). Llm updates the room for one tick which can be variable length, logs the events, and updates the room state. Also updates messages to other rooms, such as passing an entity from one room to another, or fire spreading, etc, which are included as part of that room's next state.
Is this simple and powerful? Not sure where the pain is going to be but it seems pretty great. Complex entities like intelligent npcs could use subcalls. The update llm probably sees previous updates to prevent weird behavior like looping.
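Data shapes only, as a sketch; tick() would format a room into a prompt for the update llm and parse its response, and all the field names are my assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    brief: str                         # how it interacts, priority hierarchy
    attributes: list[str] = field(default_factory=list)  # e.g. "broken wing"

@dataclass
class Room:
    name: str
    connections: list[str]             # names of adjacent rooms
    entities: list[Entity]
    features: list[str]
    inbox: list[str] = field(default_factory=list)  # messages from other rooms
    log: list[str] = field(default_factory=list)    # recent updates, anti-looping
```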
💠priority for social deduction game is to explicitly analyze counterfactuals with sub prompting and seeing if that fixes bad reasoning. It won't fix the models just being stupid, like playing poorly given their known info or leaking key info. I wonder if outputting a draft before the full response would help
💠would a conversation tree be a better way of handling async group discussions than a single chat thread? Want to avoid everyone posting a wall of text that responds to distant previous walls of text, and key points not being properly addressed.
💠every month automatically post a digest of that month's thoughts? Produce outlines for potential essay posts given all thoughts that haven't yet been put into an essay?
💠gemini seems approximately as bad as sonnet3.5new at social deduction games.
💠unformed community building website for topics too niche to have a living community. Every user has a list of things they'd like to talk about and can see semantic matches.
💠repeating experiment with toy GMing environment which past llms failed dismally at. First experiment (idk which llm) couldn't keep state consistent or follow instructions. Second experiment I think with sonnet3.5old didn't work because sonnet3.5old hated fun 'even in the context of a roleplaying game'. Gemini seems to just do it. Maybe 3.7 would as well.
💠anchorhead requires a strong llm. Its not an easy game. Testing the long term memory requires long sessions. Long session * strong llm = too much money! I don't believe there is an adequate alternative testbed that doesn't risk being a bad proxy and wasting my time (though maybe my standards are too high). This indicates I should give up on trying to automatically test the system, and just dogfood it and test it that way (which is unfortunately demoralizing when it fails partway into something hopeful).
💠going to try figuring out low hanging fruit for ml training efficiency to see if I can 'easily' make things adequate. Thinking of also using literal tictactoe with very few mcts sims as a way of testing the model works - at least on a toy problem. It leaves the scaling question for later.
💠llm can play text adventure. Llm could gm simple local interactions. Room based multiplayer game mostly populated by dumb llms. Could use that for pray
💠might want to either make a proper long form blog for retrospectives or whenever I have something substantial to say. If the microposts are thoughts, what are the essay posts?
💠I have a concern that a key obstacle to developing strong game AI is compute, which I don't have and don't want to pay much for. Figuring out generalizable algorithms thatd work for diverse complex games kind of presumes training AIs for those game as validation. Its very unlikely everything will work in few attempts. Might get to a point where I have a research direction I feel good about and then shelve it because I don't want to pay that much gpu money to go the rest of the way. Also rl is painful stuff.
💠philosophically, programming is requirements engineering on every scale
💠on long term memory system, after much planning and replanning, a simple "every 10 messages, ask the llm to check up if this is the best thing they could be doing" seems to be handling things pretty well. This won't scale up indefinitely since the memory system only gets tested when there's too much critical info to keep in the immediate context, but it's doing a lot of good for virtually no cost.
💠aider in ask mode + obsidian
💠At this point my plan for handling arbitrary board games is
💠note that apparently wsl will just keep partitioning more and more space as you put stuff on it, and it won't give that space back when the stuff is deleted
💠oh hey so usually a game ai model needs to first understand the whole state and then use that to predict the policy and value. The policy part is normally an output for every possible move. Then you mask to the legal moves and take the one it liked best, ignoring all the illegal moves it scored. It might learn to put less attention to parts of the game state that aren't relevant to the legal moves. If however the legal moves are calculated first and are made part of the input game state, then the model can easily adjust its attention toward effectively choosing from only those moves. Imagine if the world is huge but the legal moves are only in one room - it could put much less attention outside the area that'd be affected.
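Shapes-only contrast of the two setups (torch, sizes invented): A is the fixed head over every possible move with illegal entries masked after the fact, B encodes only the legal moves as input tokens and scores each one's output:

```python
import torch

ALL_MOVES, D = 500, 64
state_repr = torch.randn(1, D)

# A: full head + post-hoc mask
full_logits = torch.nn.Linear(D, ALL_MOVES)(state_repr)
legal = torch.zeros(1, ALL_MOVES, dtype=torch.bool)
legal[0, [3, 41, 97]] = True
masked = full_logits.masked_fill(~legal, float("-inf"))

# B: only the 3 legal moves exist as tokens; score each token's embedding
legal_move_tokens = torch.randn(1, 3, D)   # would come from a move encoder
scores = (legal_move_tokens @ state_repr.unsqueeze(-1)).squeeze(-1)  # (1, 3)
```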
💠muzero learns to simulate the game in its head rather than using a game simulator provided to it. The main power to this is it means the game can be very slow to simulate and the model could still simulate it quickly. In particular this means you don't have to worry about keeping the game state performance optimized. You can freely translate the state to whatever might be most useful for the ai.
💠normally old self play data is discarded as low quality because it came from a previous version of the model. Imagine if instead the model is periodically given an objective score (could base on win rate vs random, then win rate vs previously evaluated model) and attach that score to each move. Now rather than old play data being misleading because a move might lead to a win despite being terrible, it'd show that the move led to a win in a low score game. If all that shows is "ignore this one" then it's pointless and you should just discard old data. If instead it's able to get a better sense of what moves are good or poor based on the score of the player, then it's a useful thing to do with masses of old data. In actual play the model would be trying to output moves that look very high score.
💠free rate limited models on openrouter could potentially open up dumb llm tool space. Though you still need the user's api key for rate limit handling.
💠Gemini leaves a lot of comments, and I posted that silly example, but I think they're actually pretty good comments most of the time. I tend towards using as few comments as possible, but if I'm more open minded
I optimistically theorize that these comments might improve the success rate of future changes. Even if they don't help me understand the code (and they might) they may be valuable just by helping the future llm understand the code. I can always strip them later.
💠hehe if I program something called H.E.I I can say "I developed H.E.I"
💠this project seems a bit like a kid deciding to make an mmo, and finding no matter how far they lower their standards and reduce scope, the project is still very hard. Even drawing concept art of a character is hard.
💠concern.
💠alternatively you could see any state change as a graph modification. Is there some compressible/embeddable way to express any graph modification
💠can anyone think of any action in a board game that can't be seen as a combination of Source component (optional) Target component (optional) Verb (from some finite list as supported by the game mechanics)
Some actions may have multiple verbs happening to different components but I think you could probably evaluate those separately and hopefully summing their values is close enough to handling them together
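Schema sketch of that claim; the verb list is obviously a per-game assumption:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Verb(Enum):
    MOVE = auto()
    DRAW = auto()
    DISCARD = auto()
    ATTACK = auto()
    GAIN = auto()

@dataclass(frozen=True)
class Action:
    verb: Verb
    source: Optional[str] = None   # component id, optional
    target: Optional[str] = None   # component id, optional

play = Action(Verb.MOVE, source="pawn_3", target="tile_b4")
```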
💠Have a desktop and have a crappy laptop you can remote desktop with
💠tried using wsl Linux for the env and it's not going to work because my c drive is always nearly full as it is and wsl needs the partition on the c drive so every Linux installation will end up there
💠strange that there's no structured output first llm that doesn't try to chat at all. Enter 2+2, responds with 4. Maybe responds with tool use json by default
💠gitignore is nice, but if you could instead set a gitprivate file then you could upload a private and public version of the same repository, so that your env file could be easily backed up and transferred between workspaces. I'm sure it'd get more complicated with collaborators but even a simple implementation would mean you could potentially use git + something like LFS for general backups
💠a lot of useful stuff isn't compatible with windows. I can hardly impulse buy a Linux machine with a strong gpu. I could dockerize but debugging code in containers is a sad time
💠LightZero framework appears to separately implement each algorithm with no shared code. Terrifying. I sure hope that's bad design and not because sharing code would be harder to maintain somehow. Mentally I like to imagine having a single abstract implementation adjusted by flags and params
💠https://github.com/opendilab/LightZero/tree/main LightZero has a number of implementations for zero algorithms and a number of envs but a chart says that implementation to env compatibility is inconsistent. I don't see how that could be if the envs have a consistent structure and the implementations only rely on seeing a state and reward and passing in actions. I must be missing some nuance.
💠I'd optimistically thought muzero might be able to learn models for hidden info and rng games but apparently those are still major limitations with experimental solutions. From trying to make those things in mcts in the past I expect the solutions to not be pretty
💠I've seen people say muzero is more sample efficient. Getting high quality self play samples takes far longer than training on them, so if that's true its probably a win even if training is less efficient in other ways
💠my understanding is that muzero does need access to the game (else there is no ground truth) but doesn't need to run sims of the game internally. It simulates the game via imagining the mechanics with a model. So that's going to be faster due to not needing game deep copies and actual game sim, and slower because more to train
💠subject will now change to teaching myself more about board game ai as I focus on that for a while.
What I currently have is standard alphazero. Things to look into:
💠Funny how llms accept most things at face value (unless you're fighting their system prompt) I imagine if I replaced "You're playing the classic text adventure Anchorhead!" with "You're piloting a robot using text adventure commands" it'd probably behave just about the same. If it got a message like
>>> GOD CHAT <<<
> GOD: Wherefore dost thou labor with such fervor 2 sunder this padlock? Turn thou instead to the task @hand, and seek diligently after thine own house, that thou mayest find thy rightful place.
YOU: ___
It'd just go with it pretty easily. Maybe an interesting game for an llm to play is one where the "ground truth" constantly changes
💠for handling arbitrary game policies, could have a gui and output keyboard mouse sequences
💠I wonder how many other people hate searching in youtube/amazon/anywhere else where they know their algorithm will be tuned based on their slightest movement. Searching for something means temporarily subscribing to it.
💠probably not true but anecdotally stronger llms also seem less childlike in personality. Haiku and sonnet3.5old felt younger
💠should benchmark success at using llms on basic natural language tasks to see if performance and cost are good enough. Eg could you get a cheap llm and input a query and indexed list of sentences and get out a list of sentence indices that are relevant?
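Harness sketch for exactly that benchmark; call_cheap_llm is a placeholder for whichever cheap endpoint is being tested:

```python
import json

def relevant_indices(query, sentences, call_cheap_llm):
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    prompt = (f"Query: {query}\nSentences:\n{numbered}\n"
              "Reply with only a JSON list of indices of relevant sentences.")
    return json.loads(call_cheap_llm(prompt))

# score = fraction of known-relevant indices recovered, averaged over
# queries, weighed against the per-call cost
```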
💠split in thought between
💠maybe microblog posts could be easily or automatically combined into regular blog posts more useful to others
💠It might help to get a very solid understanding of ways and situations in which an llm will respond unlike a human. For long term coherence, I think these issues compound on each other. For coding for example, llm tends towards more complicated answers, adds more than subtracts, and doesn't give up, which scales into increasingly horrifically broken environments. What are the deviations from human behavior for general agency? Maybe allowing the llm to give up is a key step. Maybe it needs some kind of objective tree, but I've avoided that because objectives are usually inherently temporary and temporary fact tracking is another pain point.
💠Big todos for coherent longterm llm
Self checks won't catch objective shifts because from local perspective the subgoal looks fine. A sort of "upper management llm" working off of summaries might do better at detecting rabbit holes.
Regular bad behavior is easier to notice.
My model is that when the chat is long enough, llms just don't care much about instructions.
Approach needs to work without detection and has to work generally for any divergence from instruction
Could not allow the llm to see its past behavior, but that sounds insane.
Could have the llm work off of summaries instead of direct output, but it seems like the summarizer would have the same issue.
Could tell the llm it's someone else being dropped in place and should correct any poor behaviors.
Could have an llm validator check its performance and edit or redo - this sort of thing in the past has caused good behavior to be 'corrected' into worse and usually more complicated behavior.
Could make instructions louder / closer to the front (doesn't seem to work).
Could rotate models to maybe break up consistency. This'd also reduce desired consistency.
💠longer version
Say you have an agentic llm system working at a task. Llms exhibit mode collapse, where they try to create more text looking like previous text, as its 'more likely'. This means the llm may poison its own well of context when it looks at its past actions. For example, it may start working on a subtask, and eventually so much of its context is work on the subtask that its original task is a footnote in the system message. Or say the llm is asked not to repeat itself in its
💠llm agents who see their past outputs can easily poison their own context with bad behavior. I really don't know what to do about that
💠possible system for difficult single-message llm tasks where time isn't important
in some cases may instead want a sonnet to give the r1's jobs and recompile. R1 is actually a better plan maker sometimes so could chain the whole thing.
💠terror.toren.dev could be a just-for-fun malware web extension which does nothing for a long random amount of time, then opportunistically changes your browser screen with the intent of making the user panic. Eg making their bank account appear empty, or synthesizing highly concerning chat messages, or making a little animated vampire dance on their search bar
💠I'm finding sonnet 3.7 increasingly frustrating. In coding it ignores instructions too often and doesn't try to match the surrounding style.
💠Need to figure out what are good metrics for evaluating intelligent progress through text adventures. Ideally useful even if the llm is stuck on a puzzle
💠todo, play with stable diffusion desktop background generation again. Makes really striking modern art.
💠todo at some point I really have to try improving llm based education. It has never before been potentially feasible to automate educational instruction. Any success wouldn't have that much impact unless actually used in schools though, since that's where kids are imprisoned
💠board game ai engine. Get better with transformer architecture. Figure out if there's a reason graph transformers can't perform sufficiently or can't express arbitrarily complex games. If they're capable, get very comfortable with that architecture as well, and rewrite the engine to represent everything(?) with graph transformers
💠todo for memory system:
Break down reporting of token costs so I can see where the budget is going.
Allow llm to send multiple text adventure commands at once, useful when it wants to brute force or try various versions of a command.
Write success metrics to run on logs.
Limit context size and implement ranking.
Still concerned that the ease of getting stuck in text adventures will hide improvement behind high random noise making it too expensive to get sufficient data, but will see if efficiency changes plus multiple command inputs are enough
💠what exactly is needed for a llm powered state machine game? Maybe it's simpler than it sounds (yeah right)
Other games could be manually converted to text format but obviously that's work and it'll take care to make sure memory is a sufficient factor in success
💠ties in to previous idea of using llm+state machine as a sort of more flexible text adventure / automatic gm. Chicken egg problem sort of because such a system really wants a good memory system
💠could give claude access to a hint book but that breaks progress quantification since optimal play would be following the walkthrough closely.
💠critical issues with text games where the puzzles tend to be unreasonable and it was expected you'd ask for help when you get stuck. "Softlock" wasn't even a word at that point and it was considered normal to need to reload a previous save. These sorts of issues cause progress tracking to be extremely noisy rather than a gradual feedback curve
💠This is really premature but so far watching claude play a text adventure has been really fun, maybe because of the enthusiasm in its thinking. claude pokemon is also popular now, so seems like ai lets plays might be a good idea
💠you can maybe estimate how soulless and shovelware-crap-producing an organization is based on 'how much of the decision making is based on passion vs profit maximization' which itself could be derived out of
💠low confidence because I've tried something like this before. llms have consistent (stubborn?) beliefs due to consistent identity. This is why its often better to back up rather than argue. It may be useful to have the llm believe it is new, replacing the previous llm, not the same llm continuing the chat. Possibly do that only after a critique llm sees an issue. Problem I had last time is you'd get a right answer, critique bot would flag it, and itd get replaced with a wrong (and usually more complicated) answer.
💠stronger version of a previous claim, I think llms may actively resist efficient simple answers whenever given the opportunity. If part of your question is irrelevant, they will try to make it fit. If you seem to be asking a complex question they will give a complex answer even when a simple answer exists.
💠spaced repetition is obviously great for memorizing information, but I'm kind of not into memorizing information. I find that if I don't remember something, its usually because I haven't needed it in practice, and I can just look it up when I do need it. Drilling something isn't worthwhile if I spend more total time drilling than I would take looking it up across all times I need it. Anyway spaced repetition is still how memory forms, so I imagine the ideal way to structure learning projects is to make sure skills you're learning are demanded at the same rate as the forgetting curve. To learn many things you'd want to interleave them where new subjects are practiced proportionally more often
Possible heuristic for learning: when you don't recognize something, try to figure out if it's fundamental or a detail and follow the trail of fundamentals up the tree until you find where you need to start learning. Don't recognize a word, ask an llm or Google, and focus on the direction that gets more general and basic rather than extensions
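toy sketch of the interleaving idea, where the geometric growth factor is a made-up stand-in for a real forgetting model:
```python
# each skill's gap between practice sessions grows geometrically, so newer
# skills naturally get practiced proportionally more often than older ones.

def practice_schedule(start_day: int, horizon: int, first_gap: float = 1.0,
                      growth: float = 2.0) -> list[int]:
    """Days on which one skill should be practiced, gaps widening over time."""
    days, day, gap = [], float(start_day), first_gap
    while day <= horizon:
        days.append(round(day))
        day += gap
        gap *= growth
    return days

# interleave three skills started on different days
for name, start in [("skill_a", 0), ("skill_b", 5), ("skill_c", 12)]:
    print(name, practice_schedule(start, horizon=30))
```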
💠I now have claude + memory system playing the text adventure anchorhead. Neat. Claude is wandering the town looking for his real estate agent.
💠I should really make my home page link to my subdomains somehow. Being a homepage and being a portfolio are sort of at odds
💠while in development, at cost of greatly increasing token cost, could run every llm message multiple times with different contexts (different retrieved memories) and grade the answers against each other to see how well the contexts are performing relative to each other. Highly noisy though given nondeterminism and that many context items are retrieved each time so the feedback would need to be divided. A bit like team based ELO, but you're assigning players to hopefully make the best team. If you want the teams to be equal in elo you'd need to deliberately not put all the best players on one team, like you would in production.
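rough sketch of that team-elo credit assignment, where the pairwise grade is assumed to come from some llm judge:
```python
# each retrieved memory is a "player", a sampled context is a "team", and a
# pairwise grade between two answers updates every member of both teams by a
# fraction of the usual elo delta, diluting the noisy feedback across the team.

K = 16  # elo step size, split across the team

def expected(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict[str, float], team_a: list[str], team_b: list[str],
           score_a: float) -> None:
    """score_a: 1.0 if team_a's answer was judged better, 0.0 if worse, 0.5 tie."""
    avg_a = sum(ratings[m] for m in team_a) / len(team_a)
    avg_b = sum(ratings[m] for m in team_b) / len(team_b)
    delta = K * (score_a - expected(avg_a, avg_b))
    for m in team_a:
        ratings[m] += delta / len(team_a)   # divided credit per context item
    for m in team_b:
        ratings[m] -= delta / len(team_b)

ratings = {m: 1000.0 for m in ["mem1", "mem2", "mem3", "mem4"]}
update(ratings, ["mem1", "mem2"], ["mem3", "mem4"], score_a=1.0)
print(ratings)
```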
💠any time an llm fails that should be saved as a test case. Run the llm multiple times with that history to look for failure rate. Make modifications to the context/system message and rerun to quantify improvement
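could look something like this, where run_llm and judge are stand-ins for the real call and the pass/fail check:
```python
import json, pathlib

CASES = pathlib.Path("failure_cases")

def save_case(name: str, history: list[dict]) -> None:
    """Snapshot a failing conversation history as a regression test case."""
    CASES.mkdir(exist_ok=True)
    (CASES / f"{name}.json").write_text(json.dumps(history))

def failure_rate(name: str, run_llm, judge, trials: int = 20) -> float:
    """Replay a saved failing history n times; rerun after context/system
    message changes to quantify improvement."""
    history = json.loads((CASES / f"{name}.json").read_text())
    failures = sum(1 for _ in range(trials) if not judge(run_llm(history)))
    return failures / trials
```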
💠More thinking about 'fact subtypes' for llm memory, making the ontology yet more unclear
💠maybe llm agents should have some growing metric causing them to get bored/change approach, and this could get them out of inefficient rabbit holes or encourage them to automate things
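toy version of the boredom meter, where what counts as "progress" is left as whatever predicate the agent already has:
```python
class Boredom:
    """Grows each step without measurable progress; past a threshold the
    agent should abandon its current approach (or automate it)."""
    def __init__(self, threshold: int = 10):
        self.level, self.threshold = 0, threshold

    def step(self, made_progress: bool) -> bool:
        self.level = 0 if made_progress else self.level + 1
        return self.level >= self.threshold
```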
💠useful framing to keep in mind, that llms are not chatbots, they are predicting what a chatbot would say in a conversation. If you took away their 'stop' character they'd predict the text that comes next and simulate the user or the function responses or whatever.
💠whenever I see evidence of the world having changed, often because of a new technology, I feel a brief sinking feeling. The end of the time before this. Even if the new is better, the old is still gone.
💠I wonder how much you can infer about a person by the kind of media they enjoy. I imagine there are patterns in the emotional needs of people who enjoy slice of life vs power fantasy stories
💠my fact ontology has objective, question, and theory. I worry that's incomplete, but also that adding more types will make things less usable to the llm. Are problem and possible solution useful subtypes? Procedural knowledge, like how to navigate between two known locations, doesn't fit any current fact type either.
💠importance should be tracked separately. More important things should be tracked more carefully. Need to figure out under which cases a fact could be out of date without any way of noticing that's so. For example "I used timestop today" is recorded and then a week later the fact comes back in memory. The fact itself doesn't specify which day "today" is so there is insufficient context to determine if it has expired.
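one possible shape for this: record absolute time at write, so relative words like "today" can be checked for staleness at recall even though the text itself doesn't specify the day (field names invented, not my actual schema):
```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Fact:
    text: str
    recorded_at: datetime
    ttl: timedelta | None = None   # e.g. one day for "today"-scoped facts
    importance: float = 0.5        # tracked separately from recency

    def suspect(self, now: datetime) -> bool:
        """Expired facts should be re-confirmed rather than trusted."""
        return self.ttl is not None and now - self.recorded_at > self.ttl

fact = Fact("I used timestop today", datetime(2025, 1, 1), ttl=timedelta(days=1))
print(fact.suspect(datetime(2025, 1, 8)))  # True: a week later, don't trust it
```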
💠note that asking an llm to maintain a document of stuff it thinks is worth tracking is hard because they like to add far more than they remove, though I'm hopeful a hard length cap may help
💠In the rpg example above it seems like the llm would want a character sheet. A self contained way of tracking things that need tracking. How does that generalize? There's no hard rule for what is "temporary" or is "worth tracking" or is important enough to go in a bundle of commonly seen information. Your character's age hardly needs to be part of a regularly recalled character sheet yet it is known to go out of date.
💠llm playing rpg remembers "I used timestop today". Seems reasonable but now it must remember to clear that memory the next day, which isn't super likely because itd rely on the memory consolidation that handles the start of the new day also judging that fact relevant enough for the context. This seems like a big issue.
💠wasn't early ai supposed to become virtual assistants. When I think of virtual assistants its predicting my needs and handling a schedule and noticing growing issues and things. The sorts of things llms are either poor at or that would be insecure due to prompt-injection jailbreaking.
💠if I can automate figuring out which deepleffen outputs are things I'd approve I could make a twitter bot I'd really enjoy. Which twitter alternative are people using these days?
💠malware to corrupt a website in a way that it still passes all visible stability checks and most automated web testing
💠Adversarial fine tuning might actually just be a good idea for mimicking text, if that's something that matters
💠bad idea: use adversarial model design to gradually fine tune an llm for turing test passing. Would need dataset of real human behavior in the same conditions though
💠Prediction markets could be an interesting benchmarking technique. Have several of the model being tested and several of some grounding model like 4o or mini. They all digest information and make and bet on markets, and you see who wins the most at the end. They could read a book a chapter at a time with bets between each chapter.
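simplified sketch using log scores instead of a full market, which keeps the "who wins the most" comparison between models:
```python
import math

def log_score(p: float, outcome: bool) -> float:
    """Reward for stating probability p on a question that resolved outcome."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
    return math.log(p if outcome else 1 - p)

bankroll = {"model_under_test": 0.0, "grounding_model": 0.0}  # placeholder names
# after each chapter: every model states a probability on every open question
for model, p in [("model_under_test", 0.8), ("grounding_model", 0.6)]:
    bankroll[model] += log_score(p, outcome=True)
print(bankroll)  # highest total at the end of the book scores best
```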
💠Everyone who's good at tf2 sniper drank the space jam secret stuff
💠I think I've been exceptionally productive, for me, over the last week or so. I wonder why that is. Maybe I'm doing well defined interesting tasks, so I'm avoiding my usual problem of getting to an uncertain point and then doing roughly nothing
💠I'm making an environment for llms to play text adventures
💠llm gming, llm interactive fiction, and llm book authoring will probably all be solved at the same time
💠llm could write a text adventure and could add code on demand when user tries something unexpected that ought to work. Other users inherit changes. LLM would need a very strong cohesive memory system.
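hand-wavy sketch of the loop, where generate_handler stands in for the llm writing python on demand (a real version would sandbox the generated code):
```python
HANDLERS: dict[str, str] = {}   # verb -> python source, shared by all users

def handle(verb: str, world: dict) -> str:
    """Unknown actions get a handler written once, then inherited by everyone."""
    if verb not in HANDLERS:
        HANDLERS[verb] = generate_handler(verb, world)  # hypothetical llm call
    namespace: dict = {}
    exec(HANDLERS[verb], namespace)   # demo only; never exec untrusted code
    return namespace["run"](world)

def generate_handler(verb: str, world: dict) -> str:
    # placeholder for the llm; returns a handler that just narrates
    return f"def run(world):\n    return 'You {verb}. Nothing obvious happens.'"

print(handle("yodel", {}))
```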
💠I wonder if there's a technically possible alternative to my universal board game ai that'd play games at human level and that wouldnt require programming a game simulation
I recall trying to get llms to play boardgames and it feeling totally impossible but I don't remember what all I tried and it'd be cool to get it to work. I think it was doing a terrible job of making legal moves from one coordinate to another, like it was walking through walls and doing stupid things.
💠if my brain chemistry stays stable and my situation didn't get worse I think I could keep myself well entertained for centuries
This actually seems really hard to do better than what openrouter already has
💠Figure out how to make crowdsourcing llm costs a good UX
💠when Im speculating about stuff I don't know well, maybe it'd be helpful to have a convenient way, like an emoji reaction, to prompt a bot to give me context and tell me why I'm wrong. I expect it'd be too vacuous and hallucinatory in niche subjects though.
💠or really the bottleneck is the ability for the llm to robustly test out, debug, and iterate on its work. 'Edit, run, error message, loop' is awfully limited at least by my debugging standards. And there's no way for a gamedev llm to really "try out the game". Similarly image generation and understanding are not nuanced enough for human like specification and iteration
💠something like an OS level llm that doesn't need a concept of a screen, possibly with non-word tokens
Imagining an agent that functions with computer use, making small changes at a time and immediately observing their results. Like a transformer model streaming output and streaming results back in to the input as one continuous call
Seems wrong that there are still tasks ai is totally unhelpful for. Like if I wanted to make a Rivals of Aether character I'd have to do all the art myself since image gen can't handle doing the same character as pixel art matching an art style in different parts of various animations, and do all the code myself because roa coding is too niche for an llm to do any good. Could maybe improve the code part by providing lots of example context but the art is nowhere near doable
💠todo, in order to handle messages posted before the bot started, need to look up permissions related to 'partial' messages and reactions, which otherwise aren't loaded into the bot's view. Probably low impact unless the bot crashes or needs restarting a lot
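if this is discord.py, raw events plus an explicit fetch are probably the route, since raw events fire even for messages from before the bot started and don't depend on the message cache:
```python
import discord

intents = discord.Intents.default()
intents.reactions = True
client = discord.Client(intents=intents)

@client.event
async def on_raw_reaction_add(payload: discord.RawReactionActionEvent):
    # unlike on_reaction_add, this fires for uncached ("partial") messages
    channel = client.get_channel(payload.channel_id)
    message = await channel.fetch_message(payload.message_id)  # explicit fetch
    print(message.author, payload.emoji)

# client.run(TOKEN)  # token omitted
```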
ooo yeah thats flexibility
c:
💠hhmm?
bots could probably use a discord channel as a persistence layer, like a database or queue
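quick sketch of the channel-as-database idea in discord.py: append json records as messages, rebuild state by replaying the channel history oldest-first:
```python
import json
import discord

async def append_record(channel: discord.TextChannel, record: dict) -> None:
    """The channel is an append-only log; each message is one record."""
    await channel.send(json.dumps(record))

async def replay(channel: discord.TextChannel) -> list[dict]:
    """Rebuild state by reading the log from the beginning."""
    records = []
    async for msg in channel.history(limit=None, oldest_first=True):
        try:
            records.append(json.loads(msg.content))
        except json.JSONDecodeError:
            pass  # ignore chatter that isn't a record
    return records
```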
initing context for blog:
on a mission to make the bot shut up
I'll improve this later and make the celebration reply turn into just a reaction. It might be nice to be able to post multiple messages at the same time but probably not worth it.
doot
This is a test post