💠when learning something technical from an llm, good to always assume the llm is wrong and bullshitting you and treat it like a puzzle to find the gaps and mistakes in its reasoning
💠when a project is clearly a fool's errand, think carefully before doing it anyway
💠I promise not to use it for evil, but I think I have a system in mind that could make an llm optimally impersonate me, at least for short texts with little context
💠I remember struggling with how to record memory about temporary state such that it wouldn't become confusing later when the state became untrue. Maybe such state could be defining and setting a variable (eg location), though you'd still need to avoid reading outdated data. Could try to trigger an update as soon as it becomes untrue somehow, but that sounds very heavyhanded. Could try to standardize what kinds of state are tracked, so next time a location is noted it updates the value (rather than the old memory needing to be marked no longer true). Could also treat state memories that haven't been set for a while as suspect and ask for them to be confirmed, which will hopefully happen while the value change is still in context so it knows the answer.
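A minimal sketch of the variable idea, all names hypothetical: one value per state key with a set-time, so new notes overwrite old ones instead of accumulating, and long-unconfirmed values get flagged.

```python
import time

STALE_AFTER = 7 * 24 * 3600  # assumption: unconfirmed for a week = suspect

state = {}  # key -> (value, last_set_timestamp)

def set_state(key, value):
    # overwrites, so an old "location" never needs to be marked untrue
    state[key] = (value, time.time())

def get_state(key):
    value, ts = state[key]
    fresh = time.time() - ts < STALE_AFTER
    return value, "fresh" if fresh else "stale, ask to confirm"

set_state("location", "the library")
print(get_state("location"))  # ('the library', 'fresh')
```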
💠I feel irrationally like generating games from detailed rules descriptions shouldn't be terribly hard. I think it is, but I have some intuition that that's wrong
💠it looks like making a statistical model on eg dominion card attributes is actually hard, at least if you want it to be explainable. And if it's not explainable it's going to be very hard for it to give meaningful feedback or iteratively improve on
💠you could recreate discord nitro with a quick browser extension
💠I think tragic characters might be easier to write long plots for, since if a character fixes their flaw it's now time for them to get off screen, while if they tragically fail to fix it, that failure can keep demonstrating itself in different ways and escalating. Though I suppose it's important it doesn't feel like the writing made a promise it didn't keep
💠it'd be nice if paper notebooks had a better way of reading pages facing the same direction, like both sides of the same sheet. feels pretty stupid taking a picture of a page to read it.
💠2am startling revelation that a pineapple does not resemble an apple and its tree doesn't resemble a pine
💠if you can cheaply and effectively score responses does automatic prompt optimization pretty much solve the problem?
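A sketch of what I mean, assuming you already have that cheap score() plus an llm_answer() call and an llm-powered generate_variants() mutator (all hypothetical): simple hill climbing over prompt variants.

```python
def optimize_prompt(seed_prompt, score, llm_answer, generate_variants, rounds=10):
    # keep the best-scoring prompt found so far; variants come from an llm mutator
    best, best_score = seed_prompt, score(llm_answer(seed_prompt))
    for _ in range(rounds):
        for candidate in generate_variants(best):
            s = score(llm_answer(candidate))
            if s > best_score:
                best, best_score = candidate, s
    return best, best_score
```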
💠trying to get good llm rpg players sounds like a fool's errand given how many massive sub problems it has
💠hard to tell someone about my interests without it sounding like a string of boring words
💠I enjoy having notebooks full of hand-drawn diagrams and things crossed out
💠I enjoy knowing words spellcheck doesn't know
💠llm powered scribblenautsy cheatcode typing game
💠when you make a social blunder, mark your calendar so that you can take an annual moment of silence to recognize it
💠technically that could be done by parsing into json, using some form of embedding and basic ml, but that wouldnt be able to give any interpretable feedback. Could auto/llm generate interpretable input measures and use those instead. Maybe some model where you can predict each input from all the other inputs and directly see how in-distribution each input or combination of inputs is
💠a sort of stylometric anomaly detection function to try to discern which content is official / popular. Self improving like a programmatic gan, and able to give feedback
💠llms are also poor at recognizing good ideas in a sea of poor ideas. It's possible human feedback per generation may be needed for quality assurance, which rules out ideas in which the new generated content is meant to be surprising or generated mid game unseen.
💠Common knowledge at least to me specifically that if you ask an llm to make content for a game it'll do a trash job, which is no good because I want unlimited content.
How to solve?
You'd teach a human by having them read a guide on good and bad design principles for the game, get them familiar with common bad or overused ideas, and then give them discerning user feedback. Llms aren't well suited since that's a large content dump, and they don't learn from past feedback without context bloat or tuning I'd love to avoid. They'd learn somewhat from examples but it trades off coverage with context bloat.
Llms have an additional problem of clustered sampling where ten parallel prompts will lead to roughly the same ideas each time, and thus meaningful seeding is required to force originality.
💠soldier launcher: You and enemies launched by kb from this have 30% less gravity. While rmb is held, changes to +30% gravity. Some downside that doesn't affect kb at all.
Synergy with airshots as enemies are floaty, better rollouts, possibly makes mantread kills doable, probably not good with gardener unless rmb works even while not held
💠train an ai to read my blog
💠Note for predictive image generation that I should try making a custom conditioning node in comfy, being the earliest point to skip text prompting.
💠amazon's dumb little website, aws
💠What if you don't discard action vectors, but rather than trying to predict the set of actions (may be many, complicated to predict and compute loss on, many may be poor actions) we try to predict the top n highest visit count actions, maybe 3. Going to have to fill a lot of notepaper before coding anything or I'll waste more time
💠I keep redesigning muzero and then realizing my new plan doesn't actually make sense. I'd just been confident that inner nodes could make a single sample of successor states to MCTS over, but that highly favors high variance states, because if you don't make stochasticity an additional after-choosing-action step then you can just choose the actions where you've deterministically decided the luck was good? Edit: no, the sampler would learn to produce 'average' states, which don't represent any actual states, like drawing an average of all possible card vectors. I assume that such loss of fidelity would limit the usefulness of inner node exploration?
My first approach was least magical so I should take that as a baseline
Every node, root and inner, uses input action vectors, and the state vector + action makes a sampler for the next child. Root node also has a sampler to go from observed to actual state, which I'm pretty sure I don't need for inner nodes. Raises the difficult question of how to get the action inputs for the inner states since you don't have game rules to generate them, which I did with some token generation. Measuring the loss of generated actions against the actual actions in that state in the game history was expensive and a hard thing to train, though maybe I just did it badly.
So the idea was to do away with that and not need actual action vectors in the inner nodes, just generate successors directly, with dummy actions leading to them. But for muzero to work it needs to be able to choose between actions. If each action has one successor it assumes no stochasticity, the problem given in the first paragraph. But if the successor is sampled without taking an action, how would you even differentiate the different edge samplers? Can't sample a sampler, I think, partly because all you have to train on is the actual successor state.
💠Do I need:
1. observed root node samples latent for actual root nodes (no hidden info), then actual root nodes + action sample successor (due to stochasticity)
2. (observed root node itself + action)s sample successors (one sampler handling hidden info and stochasticity)
3. observed root node samples latent for actual root nodes with a 'fixed random seed' (sampling hidden info and stochasticity) and actions from that point have fixed outcomes
Would they give the same result?
💠note that I should try again automatically generating Dominion or wiz war tts cards from text or better modeling generating cards from nothing
💠1. have ai generate the image you want (boo hiss) 2. reverse image search to find the most similar non generated image. No Ai art is used
💠great job young me for identifying programming as a rewarding activity to pursue. I strongly suspect this is more engaging than the median option I could have considered
💠there are so many words that mean 'a tensor output, with a little added context': embedding, encoding, latent, hidden
💠I'd like to see someone try to play multiple games at a high level simultaneously, maybe using foot pedals and peripherals to route inputs
💠llm tokens are case sensitive (yes?), so could you make a model dramatically cheaper to train by lowercasing all training data and input prompts? Imagine you're just working with plaintext
💠Blokus would be badly represented with tabular data. It may be an outlier but it shows a lack of universality
💠I just can't think of any simple game that would require graph relationship between entity tokens. I'm pretty confident spirit island and probably mtg would badly want that but anything reasonable to code could do fine without it. I can't properly validate the idea without an environment that requires it.
💠- use debugger to understand how swarmui is generating images and at what point a partial result could be embedded to skip prompts and go straight to vectors.
💠if you provide 99% uptime that means you can just turn your servers off for about 7 hours per month (1% of ~730 hours in a month ≈ 7.3)
💠I think any mtg rules engine could be modified to make real time mtg, similar to real time chess. I imagine there are other opportunities to add stress and chaos to normally turn based games
💠Inner muzero nodes would have a representation vector and a next state vae, with no information about what actions reach those states. Value can be derived from the representation vector
💠I'm thinking I could just progressive widen likely following muzero states rather than generate actions. I don't know if I'll have time to try that before end of December
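The usual progressive widening rule, sketched on a hypothetical node type: only sample a new successor from the learned dynamics model once visits justify it, capping children at k·N^alpha.

```python
K, ALPHA = 1.0, 0.5  # standard widening hyperparameters, values are guesses

def maybe_widen(node, sample_successor):
    # child count grows roughly with the square root of node.visit_count
    if len(node.children) < K * node.visit_count ** ALPHA:
        node.children.append(sample_successor(node.state))
```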
💠I suspect there's a good general approach to generating inputs that produce high rated outputs via a system, eg automatically getting image generator prompts trading between information gain and expected rating
💠Pretty concerned I'll end up accomplishing ~nothing novel on the board game ai front. I think so far none of my innovations have been demonstrated to be a good idea, and many parts have been shelved for scope.
Currently working on:
Current status for my implementations:
💠seems also at odds, making the story satisfactory at variable lengths, and making it have a cohesive full story arc. Difference between having a big story circle with smaller circles inside (full arc), vs chaining circles (episodic arcs). Episodic arcs feel tacked on.
💠I'm guessing arc focus and easy extensibility are at odds, given arc focus wants to pull things to a close and wants events on the way to tie in heavily. There's only so much you can reasonably do to support one arc, at which point new content must just be filler. Filler can be fine but is subtracting from the arc.
💠pretty sure some of my favorite books mostly ignore character arcs to focus on problem solving. Sorta like Death Note
💠related, need to draw out major character arcs so they finish as close to the final ending as possible, since fully developed characters quickly get dull and need to be taken off screen. That seems stupid. Though at least a longer transformation is more likely to feel natural than a sudden Disney character shift when the plot demands it
💠how do you make a fiction feel done but also extendable? I think typical answer is either making it episodic and neatly solving each episode or inventing new arcs for each season which tends to feel awfully forced.
💠Sweet beans are made of these ^
💠more like scamazon slime
💠you can of course recompute the state with the next set of action components, which reduces the problem to just being "some actions will take two or three times longer to process due to repeated state computation"? And somehow it'll need to handle inputs representing partial actions.
💠I realize that compound multiplicative action spaces kill actions as input tokens dead which probably also kills general muzero dead which invalidates a lot of my work. I have a notion that of course it must be possible somehow, but I think that's not actually true.
💠true punk is replying to donotreply emails
💠I learned if you use cross entropy on a smooth target you should calculate and subtract the base entropy from the loss if you plan on adding it to other loss terms. 🎈
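The reason: cross entropy against a smooth target p bottoms out at H(p), not 0, and CE − H(p) = KL(p‖q), which is 0 at a perfect fit and so safe to sum with other terms. A minimal pytorch sketch:

```python
import torch
import torch.nn.functional as F

def soft_ce_as_kl(logits, target_probs):
    log_q = F.log_softmax(logits, dim=-1)
    ce = -(target_probs * log_q).sum(-1)  # H(p) + KL(p||q)
    # base entropy H(p); clamp avoids log(0) on hard zeros in the target
    h_p = -(target_probs * target_probs.clamp_min(1e-12).log()).sum(-1)
    return (ce - h_p).mean()  # KL(p||q), reaches 0 when logits match the target
```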
💠Hmm what if the llm output a predicted rlhf score for its response to indicate 'confidence this would get upvoted'? I expect it wouldnt be super useful due to limitations of human granted thumbs but it might be better than nothing, and if you somehow had objectively perfect rlhf it'd be a solution?
💠idk how to categorize this error, but errors like: the llm says it's necessary to do a tournament step after each round of training alphazero to ensure the training has led to actual improvement
It sounds wise. It can come up with arguments that use words like "collapse," and talk about problems of proxy measurements. It's not true though, since MCTS provably converges to optimal strategy and you're learning to predict MCTS. I could imagine it's possible to fall into some collapse state but it'd be a wild fluke, and neither alphazero nor muzero do tournament checks (though alphago did).
Still, I guess because the tournament is an associated piece of jargon it adds it to the plan, and then having done so it keeps acting smart and backing itself up. You can ask "Why is it necessary to run a tournament check..." and "Why isn't it necessary..." in fresh chats and it'll fart out an incorrect but wise looking answer either way. Obviously hallucination exists, but I'm not sure how to deal with it in these sorts of situations besides already knowing the answer.
💠could probably use llm question generation + information theory wrapper to make a better akinator 20 questions game. I wonder if similar thinking could work for other problems.
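Sketch of the wrapper, with answers[q][entity] standing in for llm-provided yes/no data (hypothetical): keep a belief over candidate entities and ask the question minimizing expected posterior entropy, i.e. maximizing expected information gain.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def best_question(questions, answers, belief):  # belief: entity -> probability
    def expected_entropy(q):
        total = 0.0
        for side in (True, False):
            mass = sum(p for e, p in belief.items() if answers[q][e] == side)
            if mass > 0:
                # entropy of the belief renormalized onto this answer branch
                total += mass * entropy([p / mass for e, p in belief.items()
                                         if answers[q][e] == side])
        return total
    return min(questions, key=expected_entropy)
```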
💠even microblogging to the void about writing feels like cringey faking, which I should also fix.
💠I'm not hogwild about writing but I want to have broader capabilities than I do, so I'll hard focus a different skill every two months or so and try to break through the skill floors. I have a problem where I get stuck extremely easily: I think of a question I need to answer, I fail to think of a satisfactory answer in a few minutes of brainstorming, and I switch tasks to passive consumption. I've got to kill that somehow for this to work. Debugging and planning programming tasks feel more like following a trail, while these problems feel like trying to generate an answer from nothing.
💠I don't think there are any serious sota breaks in recent llm models, at least for my practical usage. They're good and bad at roughly the same things to roughly the same degree, and the differences I can notice are mostly standard variance I think.
💠comparing gpt5, 2.5pro, and 3pro on a very technical research question I'm pretty familiar with at this point (how to generate candidate actions for muzero in environments with dynamic action spaces). They all do quite badly.
2.5 pro followed instructions better, gave the most likely to be useful solution (use a vae to generate a fixed number of actions, which could make sense if we also predict the number of actions to generate), and noted one of the errors it made earlier, framing it as a "downside" of the suggestion.
3 pro for some reason thought that any action could be represented as a pair of two entities and built everything off of that. It redesigned muzero to support that in a way that'd be extremely expensive to compute and didn't mention that.
gpt5 was similar to 3, answered a lot more than asked for, and gave a solution with poor scaling, seemed to strategically not look for problems with the solution.
None of them came up with "just generate action vectors until you get a stop token" which seems like the obvious baseline, and all came up with answers with more buzzwords.
💠I think a character being likable comes from the reader (almost said user) being able to empathize with them, rather than anything that normally makes a person likable. Possibly also secondhand liking them on the MC's behalf.
💠now that I've solved self play data generation I get to discover that the training pipeline is too slow.
💠might be a worthwhile exercise to formulate value statements and try to find / llm generate maximally strong steelman arguments against them. Maybe only if the value isn't improving your life satisfaction
💠reword: I wonder if they feel productive knowing that work is being done while they're idle as I do
💠I wonder if managers/employers feel productive by telling people to do things in the same way I feel productive by telling computers to do things
💠I'm guessing the answer is technically yes but practically not without weight access
💠I forget the technique which finds an input which maximizes a certain classification in a model. Could the same thing be used to find a system prompt which maximizes the rlhf scores of data, thereby avoiding finetune for rlhf?
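If you did have weight access, a sketch of what I'm imagining, prompt-tuning style: treat the system prompt as trainable embeddings and run gradient ascent against a frozen reward model. reward_model and base_embeddings here are hypothetical stand-ins, not a real API.

```python
import torch

def optimize_soft_prompt(reward_model, base_embeddings, prompt_len=20, steps=200):
    d = base_embeddings.size(-1)
    soft_prompt = torch.randn(prompt_len, d, requires_grad=True)
    opt = torch.optim.Adam([soft_prompt], lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        score = reward_model(torch.cat([soft_prompt, base_embeddings], dim=0))
        (-score).backward()  # ascend the rlhf score
        opt.step()
    return soft_prompt.detach()  # a "prompt" that exists only in embedding space
```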
💠I suspect we have the technology to randomly generate pretty good board games from nothing
💠okay, if something can be decomposed into data and structure then it can be pattern analyzed statistically and therefore generated
💠so whats the lowest fidelity way to make a class based fps like tf2 or overwatch that feels good? Can 3d modeling be avoided somehow? Sprite based animations might be actually worse though particularly if supporting many angles / above and below. Is there some unexpected way to get silhouette and character and aesthetic almost for free?
💠discrete outputs are a truer representation of a discrete game though, it'd just somehow need to learn an absurdly large codebook. One item per state is stupid. Maybe it could predict multiple separate discrete outputs and combine them somehow. Technically it could predict a variable size state representation like a series of tokens but that sounds too expensive.
💠Stochastic muzero uses vq-vae with a discrete set of outputs rather than regular vae with a continuous output. Maybe that makes sense for games with smaller state spaces but doesn't make sense for eg mtg? Unsure, but reducing the expressivity of the state space sounds like a really bad idea when it's supposed to be able to express basically any imaginable gamestate
💠landmark in that I've finally got selfplay distributed to cloud compute, so I should be able to get self play games far faster and have cpu to spare for other things
💠llmy techniques could probably catch higher level stylometric patterns which could be converted into cheaper to compute patterns. I suspect that's more interpretable but not more powerful for author prediction, which is the common usecase
💠automatic feedback via stylometric analysis? Doesn't exactly give advice but maybe suggests where to find problems. I wonder if stylometrics for music are established. Every time I've tried to compress music to stats and patterns I've quickly found a brick wall given how many ways those patterns could be expressed and obfuscated.
💠part of why I do mostly programming is that I'm obviously comparatively crap at things I haven't spent a decade on. Making crap isn't bad, but it's not terribly satisfying, and I think there's a draw to hide it away due to its garbage status, which is probably counterproductive.
💠pretty sure for some games, efficient symmetry handling is mandatory, like the 100 coppers example. Can manually change the legal actions to avoid obvious duplicate actions, which might be good enough :/
💠had an llm read my blog and tag things and it classified an alarming number of posts as shitposts
💠symmetry handling is a pain. A game needs to define in which ways its zones are symmetrical, eg chess is symmetrical only horizontally.
I'm pretty confident there's no way to avoid actually transforming the state into each symmetrical form and checking for equality, which is a lot of overhead to be doing at each step.
Games with multiple zones need to handle their symmetries separately lest the number of possible symmetries scale multiplicatively (eg you have a board which is symmetrical by any reflection+rotation and each player has an unordered set of piles of tokens which the game refers to by index)
You don't want to actually transform the board to its canonical form, so even if you calculate the canonical hash and go to the already created node studying that position, you need to be able to produce the original board. Similarly, the legal moves from the canon position are transformed versions of the actual legal moves. D:
It's preferable for the AI to just not consider moves that lead to symmetrical positions (though I think it may be isomorphic to perfectly handling symmetry in-tree) but I'm pretty sure you can't automatically know which moves will lead to symmetrical positions prior to simulation. My existing works have manually avoided symmetrical moves in cases where it's easy to program, but that's added game scripter responsibility.
In some circumstances symmetry handling is hugely important (AI has 100 coppers in their hand and is trying to figure out what order to play them) while in other cases symmetry will virtually never occur (chess, I think) and it's potentially a pretty expensive and complicating system.
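The canonicalization I keep describing, as a sketch (transforms and hash_state are whatever the game supplies; all names hypothetical): hash every symmetric form and key nodes by the smallest, while move generation still uses the untransformed state.

```python
def canonical_key(state, transforms, hash_state):
    # the identity transform should be included in transforms
    return min(hash_state(t(state)) for t in transforms)

# usage sketch: nodes[canonical_key(board, all_symmetries, zobrist_hash)] holds
# the visits/values shared across symmetric positions; legal moves still come
# from `board` itself and need the matching transform applied to line up.
```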
💠todo: make a contact page that's got a blender animated animal-crossing-Rover (but with my head?) doing a dialogue tree to get context for their message
💠unbounded cache size caused times to get much worse, which doesn't make much sense unless I presume the slowness was due to using up all the system resources
💠also gotta start running my compute heavy jobs on cloud because this was not efficient.
💠should implement the following separately, maybe even in different branches:
Feel like I'm missing one. For each I think I need to get it working for alphazero OR muzero or both. I'm not convinced it's always worthwhile to implement everything in alphazero given it adds expensive overhead, and muzero magics away a lot of problems with its internal representation.
💠I should find something more useful to do with all these orphan thought posts. Maybe getting a weekly email with random past posts or getting an llm to auto tag or something.
💠going to try putting an end of December "deadline" on the ai work after which I have to work on something else in the hopes it makes me plan better.
💠at some point I need to make an actual blog to go with the microblog since I do spend time working on weird stuff and what I've learned is potentially useful to the right person. Also feels wrong not to have a blog because all the cool kids have one
💠if a small beautiful boy without hands was difficult to take care of, they'd be a handsome handless handful
💠selfplay is taking up 14 gigabytes of ram. This may explain some problems I've noticed
💠magic item: a clock that shows the time ten minutes into the future
💠is there any way to make a long llm chat with text message length messages cost effective? I think even the most generous caching wouldn't help
💠if you tell a cpu to play board games against itself 24/7 for weeks it gets all sweaty
💠how to efficiently get an ml model understanding a game state to be able to work with deck information. Traditionally you write a (potentially very complex) simulator handling the hidden info, but that doesn't scale.
If you just have a shuffled deck, yes you can simulate random draws. But even simple manipulations like "I put card A 3 cards from the top" or "I shuffled the discard pile of these cards" make a simulator complex to write. If the opponent mulligans some cards, neither their hand nor deck is now random, and the distribution of states describing them is now very expensive to compute.
So maybe you don't do that and you just use ml state prediction, but how do you make a model that'll be good at that?
💠I need me a taxonomy of board game mechanics at the smallest level. In mtg "target creature has +3/+3 until end of turn" makes sense but "target player has +3 health until end of turn" doesn't. They're different types of numbers or they have different properties? How many are there? What are the absolute constraints of when x vs while x? Can any programmed system have truly arbitrary event handling?
💠I think there's usually some tension between "do I make big project X" or "do I make bigger project Y that'll make projects like X much easier in future", at least for me. Y seems obviously higher impact but also more work before the initial reward. I don't like not automating things.
💠I realize a drawback to using self play data with non symmetrical models (stronger vs weaker model). In addition to dilution and stuff, it may learn to infer "if we made it to this state, we must be stupid, and therefore stupid actions are more likely" and learn to recommend them. It'd also learn that stupid positions are more likely to lead to losing states (more than is actually true) but that's not a big deal
💠you can automatically measure how easy to use a library/framework is by the success rate of getting llms to make test passing code with it
💠I think a sufficiently easy to use and familiarize with game engine plus llm code and test generation sounds like the path towards automatic board game coding. I want that.
💠recurring trend where I have some issue with a large established library (pandas, instructor) and eventually I rapidly reimplement the sliver of it that I need
💠a person cleaning plates in a restaurant dreaming of wealth is a dish washer dosh wisher
💠very important you load your model weights, not just init the model
💠tip: if you need to do expensive data processing, do it in the dataloader, not your forward function, like a goof.
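The shape of the fix, with expensive_featurize and rows as hypothetical stand-ins: the work moves into the Dataset, so DataLoader workers do it in parallel once per sample instead of it rerunning every forward.

```python
from torch.utils.data import Dataset, DataLoader

class FeaturizedDataset(Dataset):
    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        # runs in a worker process, not inside the training loop's forward()
        return expensive_featurize(self.rows[i])

loader = DataLoader(FeaturizedDataset(rows), batch_size=64, num_workers=4)
```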
💠imagine the utopia if young children were taught how to use vectorized operations instead of loops
💠Possibly nonsense. For some non-text problem, like predicting one system state from a history of past states: express the problem with new unique tokens, then tune an existing small llm on the prediction task. It'll learn to embed the new tokens. Optionally heavily ablate the tuned llm, making it as small as possible without hurting performance on your dataset.
Since llms are already able to work effectively broadly over many domains, the idea is to try to leverage existing circuitry.
💠if one could induce parts of their brain to sleep at a time they could remain mostly functional without sleeping, but would get a sort of split personality issue where at different times they have different memories and relative capabilities.
💠If life had mods the most downloaded would be difficulty reduction and qol like No Sleeping and Fast Travel Anywhere, and beneath that would be high effort content mods like Better Cooking and Pokémon, and beneath that a sea of lore unfriendly joke mods like Hatsune Miku Joe Biden Skin
💠llm generates mechanics/systems, generates orthogonally interesting entities that interact with each other and the systems, simulates things in an auto dwarf fortress way. Becomes a prompt based text adventure probably with a limited action vocab where the player needs to read the generated wiki to learn the environment.
💠game where the core mechanic is reading the wiki
💠oh shoot, an issue with my idea of using action tokens rather than dominion's separate action network: the Dominion method can handle hierarchical actions (choose card then choose target then choose x etc) iteratively without recomputing the big network, whereas I'd need to either make an action token for each step, which sounds complicated and lossy, or rerun the transformer encoder on the new tokens. Woe. Not sure if that idea is shot dead.
💠measured positioning in war games might be the only use of continuous variables in non-dextrous board games?
💠if everyone could assign stat points, what would the established norm be? Presumably the majority in the past would go into physical stats while now most would go into mental stats. If everyone's stats are visible maybe all in mental would be seen as necessary to be a serious employee, or maybe "wasted" stats would be a form of costly value signaling.
💠1. wake up 2. self report qualia 3. exhibit questionably goal driven behavior 4. appear coherent
💠seems kind of simple but input -> very cheap llm/query system deferring to more expensive llm/query system recursively -> output seems good. It's like a rag with more knobs that could hopefully under some parameters minimize cost while ensuring good answers over large contexts. Not sure what shapes the recursive querying could take or under what conditions defer upgrading would be needed, and there's a tradeoff where cheaper systems are worse at knowing when to defer
💠I'm not at all happy with my cool thing production output. I can blame work and I can blame ai dev being hard but that doesn't put me on track. I need to get more directed and intentional.
💠really good automated midi to roman numeral analysis would sure help understand music
💠essentially I'd like to be able to generate fictional wikis of cool sounding novel content - not vague samey uninspiring content.
💠some way to search the game mechanic space from a set of primitive actions to maximize novelty and leading to interesting choices. Not sure how to define any of that
💠is there a name for the phenomenon where you excitedly let something run overnight and then in the morning find it stopped almost immediately for one reason or another?
💠getting frisson from rereading the google c# style guide
💠hang glider horse
💠^could be a Papers, Please about finding security vulnerabilities in PRs actually
💠cozy indie game about resolving merge conflicts
💠normally you train a game ai to try to win quickly and lose slowly, in other words a loss now is worse than a loss later. Unfortunately that means if the ai finds an action which has no effect on the game except to make the action history longer, it'll learn to do that when losing
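Tiny illustration of why: with discounting, a guaranteed loss t steps away is worth -gamma^t, so any do-nothing action that pushes the loss later strictly raises the losing side's value.

```python
gamma = 0.99
for t in (1, 10, 100):
    print(t, -(gamma ** t))  # -0.99, ~-0.90, ~-0.37: stalling "improves" a lost game
```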
💠I wonder if you could train a sufficiently large cross game muzero such that it could more quickly learn new games by transferring shared concepts like resource management, game theory, value of information, etc.
💠key weakness of llms is trying to do everything in one message? Not "thinking step by step" enough. Eg I explain a very difficult problem I'm trying to work through, and it'll just pretend it's solving it, when a thoughtful reply would be acknowledging the difficulty, breaking down the problem, making connections to existing techniques, etc. If you ask an llm to write something long form it'll just start putting words down without any kind of outlining and with total blindness to how generic and sloppy the writing is. You could wrap any prompt in "make a plan breaking down the steps to do x" and forward those tasks to more llm calls, but how do you know when you're at a level of detail that can be acted on rather than one that needs more breakdown?
💠web dev framework for making early 2000s style sites
💠Sometimes without any context or prompting, my brain will produce something like "How blessed I am to live in the same universe as the classic SNES game 'Chrono Trigger'" Actually the mental non sequiturs are common enough that I'm curious what exactly my subconscious is up to, and if it has anything to do with my inability to maintain a train of thought.
💠a programmed function can generate in/out observations. An ml model can approximate the observations. Since we then have a mapping from program ast tokens to ml weights, we can generate training data to learn the inverse, to write code describing the behavior of an ml model?
💠Social deduction ai: Might be worthwhile to make a single highly powerful state solver rather than affordable full players. Could provide all chat history up to a point and have it try to predict the state likelihood. Would need to somehow script authentic observations to test it.
💠perhaps humans doing a play by post game could tolerate playing with an llm to test, but the games are long and I don't imagine most would want to risk it being ruined.
💠as for the larger problem of making llms actually play well, that's quite fun. I don't have a cheap solution, but I think listing and maintaining counterfactual states and having each state separately explored and evaluated might be effective. Will need to be able to create meaningful counterfactuals, combine them, rule them out, and get a global view of how to act given the whole information state. Unfortunately while I think this may be affordable for one player, I can't imagine it being cost effective to have 10 such agents play a long game together.
💠on natural many-participant llm chat (as opposed to long-form turn taking).
At a 10 person public discussion, turn taking fails terribly. A challenges D. B gives evidence against A. C talks about F. By the time A is allowed to contest B's comment, it's one of many simultaneous conversations, and now A must also comment on the other present conversations. Every message is overburdened and the chat history is full of such walls of text.
Two challenges to fix this. Cost is one since shorter messages likely means more total messages (though ideally not everyone needs to weigh in on every claim). I'm hoping a strong llm can make a plan / playbook which can be performed by a cheaper llm. I fully anticipate being disappointed.
The other is knowing when to speak, since of course an llm is not "hearing" the chat until it is prompted to respond and thus accruing cost. A cheap llm could represent each actor, though frankly I don't have much faith in cheap classifier llms and imagine by default everyone would want to comment on everything, and tuning it could be painful. Another option is an invisible moderator who chooses who should speak next, possibly with some kind of weighted queue. Each actor can list what subjects they might want to comment on, and the moderator could enqueue anyone directly mentioned or who appears relevant given their list. That still would necessarily lead to more opportunities to talk than actual messages (presuming most messages could potentially be responded to by multiple others) so I suppose I have to hope the cheap llms are cheap enough and can be tuned to shut up when they have nothing important or new to say.
💠The core design for white knuckle is great. Horror games are usually mechanically about sneaking/running, which are both fine but have their own problems. Sneaking is usually mechanically along the lines of red light green light where you become safe by halting your progress for a while (an unfair reduction). Running is less common and in most games is pretty much pressing shift+w. In either case, the mechanic being used to survive isn't challenging in itself, meaning it doesn't carry so much of a "I need to succeed" tension. They're scary in other ways.
White knuckle is pretty much 100% running based survival where the running (climbing) is technically complex and varied enough to be the entire game, with high enough cost of failure that the whole game is "I need to succeed" tension.
Tangent: since tension is usually tied to cost of failure, non-dextrous games like board games usually handle it by putting more emphasis on bad luck. Darkest dungeon where enemies can crit and you can permanently lose trained party members. Kingdom Death Monster where any damage can cause an injury roll that could explode your head. I find this to be a bit boring but I can't think of something better. Maybe make it a puzzle with a time limit.
💠Difficult social deduction game play has other problems besides high cost of course.
💠Previously mentioned problem: Long running llm task has different degrees of intelligence needed per step. Sometimes you need to plan/navigate, and other times you're trivially enacting the plan/walking the path. Using a strong llm at all times is needlessly expensive and using a cheap llm performs poorly at planning. This ties into other long term agency systems but isn't directly entangled I think.
Given a new situation, use the strong llm. Strong llm produces a plan and takes the first step. Future responses use the weak llm until
If this works well it'll be exciting for things like the text adventure project and social deduction game play given those fell apart partly under the cost of needing strong llms over long terms. If success can be quantified, could even statistically tune some parameters.
💠if you had jackbox with 4 humans and 4 llms it might be interesting to see which models do best. Might be fun to anonymize everyone
💠not actually using pokemon because they're not meaningfully distinct from each other with their shared move pool and minimal mechanical space, but every other part of pokemon with collection and countering and powering up over time. Say, via an autochess store or pack drafting. Then need a way to avoid developing a quick meta where you use the same team every time, maybe with injuries benching units or something like in deck builders where your combo may not be available in the random hand you draw (though I'd like to avoid bad options diluting the pool rather than just adding to it if possible since I think it discourages experimentation). I think given interesting enough units the autobattle part will be interesting. The games of tttf I solo playtested were usually pretty hype (though I expect an ai to play far more predictably).
The pokemon esque powering up can be done cheaply with add on systems like attaching upgrades. Not sure about rarity unless I start designing units specifically for imbalanced play
💠only just saw setting a deadline by how much time the project is worth rather than "when we need it by" or "how long it'll probably take". If it appears it won't be done by the worthwhile point you stop and do something else
💠if you take the muzero states with the worst next state prediction I wonder if that could reveal interesting gameplay? Probably not, it'd just show high uncertainty states like drawing from an mtg deck, but there might be something there
💠when you ask a child what they want to be when they're grown up they never say a corrupt executive
💠on how to make a team building game like pokemon or autochess. Assume collecting is good. Variable teams is good. Manually directing teams is bad (I don't like mashing a in jrpgs). Strange synergies and counters are good due to personal interest. Possibly finding rare things is good.
💠not sure if I already wrote this: obviously being able to make ai for very complex games is good, but any improvement over existing game ais is also good. Imagine you have a standard simple rules based ai. Rather than being a total flowchart, it could reduce the action space to 4-12ish possible moves, maybe combining and abstracting actions, like "move to and attack nearest [unit]". The transformer based state reduction I'm working with would pretty efficiently be able to score those options, and it could maybe seem a lot smarter than should otherwise be possible. You could do only a single state encoding to choose all the ai controlled characters' choices if they don't have separate hidden info, which should be very performant
💠addendum, better to have an llm guess and check tuning the input to the desired behavior rather than using rl
💠Imagine you have a game like pikmin. Imagine you already have a strong hierarchical long term agency system, so an ai can remember its long term goals and avoid rabbitholing. Suppose we have a very fast improved image understanding model that can consistently give object recognition coords on arbitrary objects with low latency. Possibly using save states for training, imagine an llm makes a command (rotate_cw_90, withdraw_pikmin) and an input generator ml can train based on llm feedback until it can reliably perform the command
💠note that rivals could probably run test code on a special combination input or when going to a special test stage. Could maybe do tdd.
💠apparently image generators can do sprite sheets now which might make making rivals characters less awful for me if it works well
💠not sure which direction is causal but me being productive on projects correlates with me making more thought posts. I expect it goes both ways, because a readiness to write down more thoughts forces me to actually consider things enough to have something worth writing
💠really need something like an llm that can understand structured/grid data better, like solve a maze. It'd be really nice to combine rl with llm high level direction. Combine 'able to learn to play well' and 'have some common sense of what the objective is'
💠not sure if I said this yesterday -> can ml produce effective and performant rules based ai? If you have a discrete action space, could you do some kind of progressive simplification of the model into a sufficiently simple rules flowchart?
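One concrete version of that simplification, sketched with featurize, policy_model, and sampled_states as hypothetical stand-ins: label lots of states with the big model's chosen action and fit a shallow decision tree, which reads out directly as a flowchart.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [featurize(s) for s in sampled_states]
y = [policy_model.best_action(s) for s in sampled_states]

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)  # depth caps rule complexity
print(export_text(tree))  # a human-readable if/else flowchart
```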
💠I thought my rivals assistant was largely unknown, but I also have an automatic git backup of many rivals mods, and searching all their source codes at once shows lots of notable mods/modders are using it. Neat. Looks like nearly everyone is just using the sprite exporting part, but that's okay.
💠multi stage takeoff, first with mcts then alphazero then muzero, in order of how strong they are untrained
💠even in reasonably sized action spaces it still might be useful to choose actions to simulate based on how different the expected avg resulting state would be from other actions, thus exploring the broadest range of possibilities rather than seeing mostly the same result each time. Idk how to formalize that.
💠alphazero specifically could get more training data by using in search tree states as rows so long as they're thoroughly enough explored. Could actually weight how strong the training data is as evidence though I've never heard of that being done
💠is it maybe useful for muzero to act like it's in a continuous space and raise the level of abstraction arbitrarily past the game system's definition of action? Eg if you had a move of 1 on a grid you want to check each destination as a sim fork. But if you scale down the grid size to be a tiny mesh and give you a speed of a billion you'd no longer want to think of every possible movement as a possible action (unless you have some fantastic filtering process). More likely you'd want to sort of cluster them like "this area is in cover and near the door" or "this area doesn't rely on moving past the window" and sample from each cluster, or otherwise sample evenly over the action space, building up a pattern of what leads to a good sample
💠ml tuned rules based ai? Dramatically reduce action space by combining and defaulting choices. Get a few attributes of the game state to predict value from for a simpler weaker model if possible. Maybe do without hidden info and rng handling for performance. I can't think of a way to avoid sim overhead without learning a simulation model so hopefully it's enough to just do few sims.
💠I wonder if there's some theoretical way to universally make a desktop operating system mobile convenient. It'd be nice to not need android versions of software when I already have a working windows setup that could be copied or remote connected
💠an inverse of muzero, imagine playing a board game powered by ml simulation, like the playable ai generated minecraft.
💠van gogh kazooie
💠rough thought that llms expanding something from a summary is like diffusion generation. Could maybe be good recursively on demand for arbitrary depth. Hard to say what context is needed because id expect other branches of the generation tree to have relevant info which makes it sound intractable. Could maybe be trained to undo "summarize" instructions.
💠I'm not really at all confident that muzero scales up to highly complex games the way alphazero clearly does given dominion. It might, but highly complex games are obviously a lot harder to simulate accurately, which muzero needs to learn to do itself. All the successes of muzero have been on relatively simple games, because that's all anyone's used board game ai for so far. I worry that without a really, really impressive trained simulator model, it'll make dumb mistakes with some frequency since its entire search is based off of insufficient simulation. Needs more thought. I don't like relying on a highly performant and compatible game implementation, but the rules have to come from somewhere
💠actually more reasonably, could have llm powered characters and with different backgrounds and interview them. It'd be similar to the social deduction game work I was doing but the human would be the only one responsible for figuring out the hidden state, which given current capabilities would be far less frustrating.
💠there are 'fake phone' style detective games, but now that we can manufacture entire fake internets we could make them open world
💠aws has the only UI/UX I've used that gives me a physical sense of revulsion, in a nails-on-the-blackboard kind of way.
💠the hidden state is continuous (necessarily?) so I need to use a vae that produces a continuous distribution(?) but obviously most random changes in board game states are discrete, eg which die roll you get or which card you draw.
:[ gotta reread how discrete vaes work
💠note: I should sample the vae with frequency corresponding to its std. Total std? Mean std? That way waste less time simulating nearly identical samples
💠llms are probably already used for government message reading surveillance in some places
💠currently training the previously mentioned muzero and getting basically flat loss curves on every experiment
💠coincidentally Chief Marketing Officer and Cringe Minimization Officer share the same acronym and responsibilities
💠Before moving out of my browser I always open a new empty tab, I think so that when I tab back into my browser I won't have the previous tab enter my vision and disrupt my chain of thought. This seems like probably something other people don't do.
💠I think it's plausible reddit could be sold to a larger company that could absorb its costs better, which sounds just horrible
💠putting together a reward free muzero with vae states and progressive widening sampling to cover hidden information and stochasticity, and action token prediction to handle arbitrary action spaces, and a heterogeneous transformer state.
💠I wonder how obviously nonsensical you could make a conversation without the llm realizing its entirely hallucinatory
💠via llms and stuff, could you take a picture of a puzzle and its rules and have it automatically converted into a sat solver friendly form and solved, for any arbitrary puzzle?
💠Introducing Jira Kidz
💠Once you get more familiar with ML techniques, and it stops looking like a heap of unfamiliar words explained with dense math, it's nicely open ended and friendly to creativity
💠if I ever get a decent working setup for finetuning base models, as I've been trying to do for dankleffen with little success, I should train it on these posts and see if it writes anything of value.
💠A key benefit of thoughtblog is I can sorta tell what I was working on for any given week since I'll most likely yap about it. It feels bad to think "what did I do this year" and not really remember
💠I realize my preferred way of being is to have some sort of obsession, usually a project. I think of it as "being immersed". This conflicts with my other preference to do stuff with others because it's hard to find someone else who wants to spend time focused on the same things
💠I tend to worry about boring someone listening to me and compensate by going too fast for them to understand. Need to not do that in my blog posts.
💠gradually losing my mind trying to figure out why the ai performs well in one eval but badly in another despite not seeing any differences to cause it
💠gentleman's fencing/dueling game with button for randomly generated insult
💠try adding more data before investing a lot of time scrutinizing the model for issues
💠neat trick: if your ml model isn't training well, try adding more data
💠if you can consistently anonymize players, having a tournament win awarded by audience vote would optimize the game for how fun it is to watch
💠I wonder if there's a feasible way to put the build variety and ridiculous synergy from mtg into a fast paced spectator friendly game
💠should really write those follow up blog posts for board game ai project
💠current experiment is using about 100x more training data for alphazero to see if that makes it improve over mcts. If not it's clearly broken
💠Basic playbook 1. Announce new useful product, free forever 2. Clarify after buy-in that you meant freemium with a highly limited free tier.
💠I think the word 'agent' confuses people terribly about what an llm call is. I think they immediately start anthropomorphizing function calls into robot assistants
💠this sounds solveable: say you have any domain that can be handled with a smallish set of functions, like manipulating a graph (add and remove nodes and edges, searching based on structure, adding metadata). How can you llm from prompt to a flow of actions that does the user's request? Seems challenging because of the function parameterization and converting plain text into domain objects like finding best matches
💠if I was trying to solve the rivals art problem now I'd train a pixel art friendly art generation model on frame data -> animation
💠Prompt method that seems to actually work with gemini
"I directly ask you to please not try to solve my problem with your big brilliant brain. I do not want you to jump to a solution, please, for love of god. I am asking you to survey the problem area. Do not oh so cleverly list bad option 1 bad option 2 brilliant option 3 in conclusion my answer is perfect Im very smart. Thank you."
💠having different physical notebooks for different projects is fun
💠every now and then I run my core mcts implementation on some sanity checks like "does it prefer winning moves to losing moves" and find it broke at some point. MCTS is very hard to debug.
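The kind of check I mean, as a unit-test sketch (TicTacToe and mcts_search are hypothetical stand-ins for my implementation):

```python
def test_prefers_winning_move():
    state = TicTacToe.from_rows(["XX.",
                                 "OO.",
                                 "..."])  # X to move; (0, 2) wins on the spot
    root = mcts_search(state, simulations=500)
    best_child = max(root.children, key=lambda c: c.visits)
    assert best_child.move == (0, 2)  # the winning move should dominate visits
```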
💠on tool assisted brainstorming: Creating new entities just to make relationships is dangerous because it makes the graph less dense rather than more. Idea adjusted into
I kind of don't see any issues with this approach. I think I can just implement it. I wonder if relationships should be pages so that they're easily read from the context of any of their related nodes
💠on tool assisted brainstorming, one framework might be
Constraint based on limited attention: Everything is either an individual or collection. Anything 'named' or user facing is either unique, belongs to a small collection (2-5 contrasting items), or is attached to something in a small collection. You could have millions of people/magic items but the only people/items that matter would be either the three warring politicians, or the magic item belonging to a character in a small collection like that.
As a graph with long texts, wikilike: Enqueue pages to create. When handling a page, enqueue jobs. Jobs are
Queue is automatically handled. At each step an llmy system could list point form proposals. Unsure, but it might be beneficial to start with a relationship before creating a new page. Rather than making two entities and relating them, have an entity and create another specifically to relate to them, like how foils are made for characters. Not sure how that fits in.
💠somehow possible to find spans of words that are semantically coherent and otherwise meet some requirement but are very rarely found in general text datasets, as a way to find new ideas
💠yes it took maybe 20 minutes to make a lightweight "dataframe" class with virtually no performance impact.
💠It turns out tabular data libraries written in performant languages are just not at all comparable to python datastructures when it comes to many small operations. Kind of sucks? Maybe I can make something
Polars: 0.85 ms per run
PyArrow: 0.64 ms per run
Basic Python: 0.09 ms per run
Basic Python (in-place): 0.07 ms per run
💠today I totally reworked the game environment engine I made for the board game ai project to use polars dataframes for state, which should theoretically be pretty much optimal for state handling and conversion to tokens. Instead I find performance is now about 1/10th what it was
💠Madoka x backyardigans crossover
💠Toren remembers that Pandas and Polars exist. It's like my brain is running in slow motion. So if you keep your state as performant tables and read/manipulate it with accessors, you lose ergonomic datatypes but gain ml readiness and performance and potentially persistence / change logging
💠if I need to have the state in tables, maybe they could be optimized for rapid copying during simulation. Can I do that and have them sql queriable?
💠Not a gan, a transformer vae
💠so muzero predicting the next state via normal ml doesn't work for stochasticity (and thus also hidden info). Basically if you draw a card, the new state is as if you drew the average of all cards, because ml by default is predicting an avg best answer. What we need is to generate new data that looks exactly like it came out of the simulator. I don't love any of the solutions I've seen to this, though I haven't looked that hard.
Maybe a GAN could do it?
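The averaging failure is easy to demonstrate in a few lines: fit a constant under MSE to a target that's randomly +1 or -1 and it converges to 0, an "average card" that never actually gets drawn. A sampler has to match the distribution instead.

```python
import torch

# two discrete outcomes, 50/50, like drawing one of two cards
y = torch.where(torch.rand(10_000) < 0.5, torch.tensor(1.0), torch.tensor(-1.0))
pred = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([pred], lr=0.1)
for _ in range(200):
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(pred.item())  # ~0.0: the mean of the outcomes, not a state that can occur
```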
💠putting a maximum word limit is a bit like negative reasoning tokens, but can we go farther? Would writing the answer in code or something take additional reasoning power and endumbify further?
💠which operations can an llm easily perform on text that it can then consistently undo? Which could a human not do? Which transformations can an llm understand immediately without added context?
Eg an llm can rewrite text as ascii hex, and the same llm can read that ascii hex just fine without context explaining what it is.
What is this useful for..?
💠Is there a way to make a basemodel more coherent while not losing uh, semantic range? More coherent without being less interesting. Pretty sure temperature doesn't work.
💠previously looked at llm social deduction gameplay as real time, but hypothetically what if it was play by post and (outside of synchronous conversations throughout the day) there was basically unlimited reasoning time, limited by some budget. Could leave the paradigm of "think between messages". Is there a way to get very slow llms for a considerable discount?
💠trying to put an ontology to my thinking so I can write things in the right place. There seems to be a blurry line between project and idea. For example, making a bunch of new wiz war cards is a project. Making a wiz war card that makes gravity go sideways like a 2d platformer is an idea. Making a set of 5 related wiz war cards is somewhere between those. I think it should be handled by linking to notes
💠thingsIMade.toren.dev will soon be joined by thingsIMade.toren.dev/notyet.
Partly because I have a great number of things I'd like made and partly because they tend to create a web of downstream dependency projects I work on along the way, like how an llm ttrpg player would require good llm long term memory and long conversation performance
💠The obvious advice for board game ai project of validate the model, ignore the engine framework part, start with small games and build up - is actually really bad advice in my particular position? We already know that alphazero/muzero work. We already know transformer architecture is quite powerful with them. The only new finding I could make from my work would be if the generality improvements are possible. Otherwise I'm just manually making ai for games. I guess I should settle for that.
💠web serial and fan fiction writers could collect their works into a magazine format and maybe make more selling early issues? Also magazine format is cool.
💠discovering that for efficient tokens I need efficient relational structure leads to the new problem of "how can I turn an object oriented schema into a nice relational form" which is apparently an ancient question for which there is still no good answer. Converting to an ugly relational form isn't too hard I think but that's not very helpful? Regardless it'll involve a lot of type introspection and advanced type hinting.
So either the game scripter would need to write their own serialization code (bad) or they'd need to handle all the data like sql while in memory (very weird) and probably write their own high level adapters like grids and decks.
Maybe I'm misunderstanding things.
💠You could semirandomly losslessly combine files of arbitrary types into a single composite unreadable file, and then use pattern analysis to painstakingly split it apart into its original source files.
💠text rot by having a basemodel infill random snips of a text. Too bad I haven't seen infill text since gpt3.
💠so as previously established, turning a nested state into a series of tokens is the same problem as (or is isomorphic to?) storing it in a db, and efficiently storing in a db is the same as efficiently storing in tokens. The issue is that turning a nested state efficiently into either is hard! Attempts at automating tend to lead to solutions that are not efficiently stored in a db (tables with one row, data spread across more tables than really needed, etc), and is still painfully complex regardless. It'd be very nice to tell the game scripter "you need to store your state in sql rather than in helpful datastructures, sorry. If you want helpful datastructures like a grid, please write it yourself because different game contexts actually require quite different grid serializations, sorry" which would kill the project's usefulness unless I eventually automate the work? Rather confused how to proceed.
💠how to prompt llm so it thinks rather than bedazzling me with its brilliance. If it comes up with a poor solution I want it to acknowledge that rather than end by summarizing how the solution is exactly what I want.
💠on dankleffen, will just try sending randomized fewshot prompts and using 405b, unless there's a better easily available base model I haven't heard of. It still feels like a weak impression though.
💠human feedback into optimized prompt adjustment?
💠Say you can't finetune but need to do styletransfer. Naive few shot prompting has some degree of success but is obviously not optimized. Could you optimize few shot prompting and could that be cheaper and more portable than finetuning? Might need to tune an adversarial model to determine how good the transfer is which sounds like a funny way to not make things cheaper.
💠so muzero normally implies a single vector hidden state to another single vector hidden state and obviously no simulator. On the surface that looks flatly incompatible with both dynamic policy solutions we have, simulator powered and action-tokens. Edit: Short version is just use seq2seq transformers for the dynamics model to predict the encoded tokens for the next resulting state.
Assume the only real requirement for predicting value is a state vector, possibly a game-token. Assume the requirement for predicting policy is a sequence of encoded action tokens. Assume the requirement for the dynamics model next-state-prediction is a sufficient encoding of the entire state.
The first iteration of muzero can of course encode the actions and state tokens to get all of the above. Later iterations have no simulator so they need to get the above from the dynamics model. The dynamics model needs an encoding of the state which could or could not be a token sequence. In fact, the entire state handling could be compressed to a single vector if that performs well, since without a simulator powered policy we don't normally use their encoded forms. I'll refer to it as a sequence assuming squashing doesn't perform well. Then need a seq2seq from the previous encoded state (but not legal moves I think) to a new encoded state with legal moves. Important, we need to know which tokens belong to certain categories (state token, action token, game token) somehow, so seems like it might be a group of seq2seqs. That seems like it solves it.
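A very rough torch skeleton of that dynamics seq2seq; every size, the learned-query trick, and the pooled value head are assumptions I'm making for illustration, not a tested design:

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2, n_state_tokens=32):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        # learned queries that will become the next state's token sequence
        self.state_queries = nn.Parameter(torch.randn(n_state_tokens, d_model))
        self.value_head = nn.Linear(d_model, 1)

    def forward(self, prev_state_tokens, action_token):
        # memory = previous encoded state plus the chosen action token
        memory = torch.cat([prev_state_tokens, action_token.unsqueeze(1)], dim=1)
        queries = self.state_queries.unsqueeze(0).expand(prev_state_tokens.size(0), -1, -1)
        next_state = self.decoder(queries, memory)       # predicted next state tokens
        value = self.value_head(next_state.mean(dim=1))  # crude stand-in for a game token
        return next_state, value

model = DynamicsModel()
s = torch.randn(2, 32, 128)   # batch of encoded state sequences
a = torch.randn(2, 128)       # one encoded action token each
next_s, v = model(s, a)
```

Policy and legal-move heads would be more seq2seqs off the same state encoding, which is the "group of seq2seqs" part.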
💠wasted time learning about muesli as an alternative to muzero. It's trash on board games.
💠download thoughtblog posts, run many passes with cheap llm outputting list of pairs of related posts, then shuffle and repeat n times. Make graph. Detect clusters.
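Pipeline sketch; llm_related_pairs is a made-up stand-in for the cheap-llm call that returns index pairs it considers related within one shuffled batch:

```python
import random
import networkx as nx

def build_post_graph(posts, llm_related_pairs, n_passes=5):
    g = nx.Graph()
    g.add_nodes_from(range(len(posts)))
    for _ in range(n_passes):
        order = list(range(len(posts)))
        random.shuffle(order)
        # pairs are indices into the shuffled batch; map back to post ids
        for i, j in llm_related_pairs([posts[k] for k in order]):
            g.add_edge(order[i], order[j])
    return g

# communities = nx.community.louvain_communities(build_post_graph(posts, fn))
```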
💠muzero could work with a transformer network using seq2seq transformer and a hidden state of tokens. Not sure how thatd work with heterogeneous tokens with different meanings and uses. Maybe they're separate sequences
💠on db to ml, more generally, any relational db could be turned into a heterogeneous graph, and any heterogeneous graph could be turned into an ml input
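Toy version of that conversion (the schema is invented for illustration): node type per table, edge type per foreign key, remaining columns become node features:

```python
tables = {
    "player": [{"id": 0, "vp": 3}, {"id": 1, "vp": 5}],
    "card":   [{"id": 0, "owner_id": 0, "cost": 4},
               {"id": 1, "owner_id": 1, "cost": 2}],
}

hetero_graph = {
    # feature lists per node type (id columns dropped)
    "nodes": {t: [[r[k] for k in r if not k.endswith("id")] for r in rows]
              for t, rows in tables.items()},
    # one edge type per foreign key
    "edges": {("card", "owned_by", "player"):
              [(c["id"], c["owner_id"]) for c in tables["card"]]},
}
```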
💠also if you have n posts and two of them are "duplicates" or closely related in a way only a smart llm could detect, I think thats provably very expensive to detect? That's kind of what graph formation is about. I guess it'd be iterative and you'd not assume to find all connections but maybe find obvious connections and gradually refine. Things that share 1 connection are more likely to share more so you could get more efficient that way.
💠Could all these thoughts form a graph. Yes probably. Might be useful for seeing patterns over time, summarizing into something to write out more legibly, or at least demonstrating how many times I've had the same thought over months and years, which is a tonne.
💠It might be worth keeping an updated scene description of the current time and place to track event memories. That'd probably need yet another response attribute or helper call.
💠on long term memory, probably need to explicitly separate events from current state and try to never remember temporary facts "im in the museum" as state since then you get incorrect memories if you fail to update at the right time. "I entered the museum" with appropriate timing attached somehow is better.
💠long term memory project failed due to high costs. Could most calls be done with a cheap model and thinking be done with a more expensive model when needed? Different tiers? How to know when a given tier is needed.
💠hype, listen up. I'm pretty sure that the ideal simplest representation of a game state for a transformer model is exactly the same as the simplest representation in a relational DB. Can provide examples. This means, I think, that any state written in a relational DB with meaningful types and constraints could be easily(?) turned into a heterogeneous graph transformer for policy value. Slow sim time won't matter much if using muzero. Can automatically track changes for undo or networking.
💠interesting thing with the board game ai project is I think I've largely filled all the unknowns (that I know of). Every hard thing between me and the goal I have, at minimum, an established paper known to solve the problem. The substance of the gap between me and success is just a massive gulf of careful engineering and tuning. I'm used to such situations being a speedrun, but this is huge and complex enough that if I move forward with any shaky foundation I'll basically need to restart later. Eg I need an easy to write for game engine which creates definitions that can be automatically converted into optimized ml setups. A change to any part of that pipeline could cause problems in the other parts.
💠thinking things like "what if legal actions were encoded as tokens so that their outputs could be used as a dynamic policy? That'd skip needing the simulator to handle policy in a separate network step which could make the system compatible with muzero for better performance when sims are slow" makes me feel smart
💠could do with a first person survival horror simulated ecosystem monster hunting game.
💠the answer? Use an llm. And if that don't work? Use more llm.
💠you know how if you directly consume some media you could be affected by some cool world element, but reading about the thing in a wiki is unlikely to surface the cool parts or give an intuitive sense that you'd care. Is there some way to speed run getting useful inspiration from diverse media?
💠worldbuilding game / tool. Everything's a node. Llm plus systems looks for overlap for merging and room for relationships. Lots of random and llm options generation. Maybe good on mobile with an interface of add node or relation or event and pick-one / write in
💠How to make a project: Notice a problem, look at if the problem resembles things that can be handled by code or llms or ml. Think about how a naive solution would work and notice any cringey inefficient parts, and what you'd do instead as a human. Decompose into parts, and scan for parts you've never seen solved before to make sure they're actually solvable. Make each part a file or function and give it a nice interface, comment what it needs to become. Try to implement them largely independently and use caching and fake inputs where needed to make dev faster.
💠I'm very irritated that "function that calls an llm" is frequently referred to as an "agent". I'd call it an llm call or llm function or something and keep agent to mean "llm + chathistory or other memory + tool use or other function calling"
💠safe browser automation if you manually select which buttons can be pressed. Eg can only press the next page button.
💠highly general personal recommendation tool. Pick a kind of content and a source, where the existing filtering is insufficient. Some kind of post on a subreddit, relevant jobs listings. You thumbup thumbdown, and preferably give reasoning. LLM can write code to automatically generate metadata, or manually generate metadata via llm call. metadata can go through a little nn to predict whether you'll like it. If some metadata is more expensive to collect than others (eg requires loading the content page and reading the content, rather than parsing a short form list with many titles) could make that a second phase that only occurs if the first gate passes. User/llm may want to also assert strict filters at times to gate out content without needing to go through the model. Eg no job listings that'd require relocation.
💠to build a drawing habit could make a modified photoshop where tools and brushes and stuff need to be unlocked through a gamified reward structure
💠dbd is miraculously fun despite movement being largely pressing W and sometimes interacting with windows and things to create distance. If movement was actually fun and diverse you could probably make a whole game about tag. Tf2 soldier tag would have a very high and expressive skill ceiling and there's lots of room for interesting movement abilities.
Maybe map with n people. Someone is somehow non-randomly made 'it' which drains health and grants substantial extra mobility. 'it' is passed on hit. That'd mostly promote hiding and being disengaged so might want to replace last-man-standing with a different objective and or use a central capture point.
💠if you did have a highly general board game ai that could be easily superhuman at any ruleset provided to it, I'm not sure thatd be actually very useful outside of games
💠pve movement shooter / pokemon snap called foetographer where you clear enemies by taking pictures of them and high score by taking good pictures and pictures in special circumstances
💠better simulation probably means more potential in timeloop games. If llm powered simulation can be made consistent that might be neat.
💠what if every time you ate it was always served by the polar express hot chocolate men
💠so I'm looking for a game with the kind of open ended action space as a modern board game, but preferably really simple core, no hidden info, and little to no rng involved.
That's my game. Tttf is the only thing I can think of that matches that
💠not sure if this is at all significant, but in media-shapes-culture: in media everyone has a strong simple opinion. Media about people arguing different sides of an issue is more interesting than watching intellectually humble truthseekers examine nuance
💠unsure, but rather than slowly working through many simple games, it might be more demonstrative to try to support a complex modern board game but only implement a small subset of its content. Since the idea is scaling, the investigative prototypes need to test scale. Reminds me of the issue with long term memory system, which was also testing how well systems could scale up which made it hard to develop
💠have llm eli5 basic concepts like "mall" and "firefighter" in few words. Take the explanation and maybe telephone corrupt it a little. Without the title for context use as a worldbuilding seed, extrapolating on the tiny definition
💠base model as random oracle
💠chatroom with two llms, one of which only tells the truth and the other only tells lies
💠worldbuilding tool idea. Tag every entity with adjectives. Use llm to search for related or contrasting existing adjectives when making a new entity to avoid redundancy and find relationships
💠for tabletop teamfight stress mechanic should make one of your health bars a stress die which is normally emptied last and stress damage or healing applies directly to it.
💠People don't start with high agency. Children certainly don't, and then they get a decade or so of school which I'd argue discourages agency at every turn. Then suddenly they're released and expected to create a meaningful life and successful career and all, but what they've practiced is doing what everyone else is doing and not thinking too hard about why.
💠play medic say "spy as medic" in chat hold your syringe gun out run directly at teammates, swerve towards them when you get close, and jump down stairs towards them. Especially engineers holding things. do your best to avoid getting shot by teammates, and flee from them somewhat
💠I learned the composer for chrono trigger hadn't been given a scoring position before, and had been waiting so long he threatened to quit over it. When he was put to scoring chrono trigger it looked like the only opportunity he might get, and he worked himself into the hospital over it, and at one point lost a hard drive with 40 songs.
💠Going through everything I've made to assemble an index, and it looks a lot like pretty much everything I made was in the last 5 years. Everything I can think of or find in github or notes is 2020+, with a couple outliers in 2018 related to overwatch automation. Start of 2020 is when I graduated sfu. So pretty much everything that's come from my time that I value came from after I stopped having my time wasted in school.
💠jrpg style games are boring to me because there's not a lot of interesting choice. Something like dwarf fortress adventure mode is laden with choice but is too hard to play. Llm interfacing over a dense system game seems like it could make an easy to play immersive systematic world.
💠You could represent a connect4 board as a list of columns alternating between players. 1122 would make YY RR
However, 2211 would also make YY RR
How could one enumerate all unique board states in this way without needing to make any sort of uniqueness check?
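Brute force on a tiny 2-column 2-row board shows the collision (this only demonstrates the problem, it doesn't answer the enumeration question):

```python
from itertools import product

def play(seq, cols=2):
    heights, board = [0] * cols, {}
    for turn, c in enumerate(seq):
        board[(c, heights[c])] = turn % 2   # player 0 = yellow, 1 = red
        heights[c] += 1
    return frozenset(board.items())

# all length-4 column sequences that drop exactly twice per column
seqs = [s for s in product(range(2), repeat=4) if s.count(0) == 2]
boards = {}
for s in seqs:
    boards.setdefault(play(s), []).append(s)
print(len(seqs), "sequences ->", len(boards), "distinct boards")
# 6 sequences -> 5 boards; (0,0,1,1) and (1,1,0,0) collide
```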
💠hehe chuckecheeses saw. Everyone gets a little foam mallet to bop off their leg with.
💠infinite chuckecheezes backrooms
💠this might be well established by now, but gemini in particular is very intelligence coded. It tries very hard to make an answer that looks smart even if it sacrifices quality. In code this manifests as things like variable names containing 0 value words, and an unhealthy obsession with error checks. Older llms would pretend problems were highly nuanced as a way to hedge out of giving an answer. Gemini pretends problems are highly nuanced so it can give a complicated answer instead of a simple one.
💠1. could a competitive prediction market environment be made for llms/agents? Doing well over time requires updating from past information to correct your inaccuracies, and potentially learn the same weaknesses in the other players. 2. would that, and other competitive environments, potentially be a powerful and sustainable benchmarking tool?
💠I think my directory page will be thingsimade.toren.dev and be a vis-network graph with some nodes being category, represented by a word, and other nodes being projects, represented by a titled image. Clicking a node updates a side panel explaining it and linking to the project.
I previously thought of a moving masonry grid where panels fit a grid rather than being horizontally or vertically aligned and the panels resize and slide around every now and then, but that doesn't provide any kind of organization. Also previously prototyped a spinning dial selector UI, basically scrolling but with rotation, but thats also not good for finding anything in particular. I should really start with function rather than whatever sounds neat.
💠for tactics games when to have intothebreach/tacticalbreachwizards preplanned enemies and rewindable turns vs fireemblem/xcom standard turn taking
💠could probably auto-scrape websites with llm generated scraping? Whenever the site owner breaks it, could automatically generate new scraper code. I think this works? Could also get around adblockblockers
💠the eye of sauron now turns upon updating our python sdk to support the v2 api
💠I wonder if games could procedurally generate jazz solos in their music
💠a typical problem with adding strong ai to a board game app is its nice to write the ai in python, and you're basically never writing the game in python. MCTS requires constant rapid simulation so any overhead talking between services would be an awful bottleneck. I think muzero just gets around that entirely because it simulates from a predicted next game state rather than a real simulation, which takes all the load off the cpu game sim. I imagine that'd be much less friendly for mobile board games but is far better for separate-service or online ai
💠could an nlp ai be created to generate game simulation from the rules and component text?
💠lancer style tactical rpg for wizard swat teams sounds like a good way to run fantasy settings
💠my blog could implement loot boxes and a battle pass for unlockable fonts and banner ad frames
💠duct tape, a woo-oo
💠Pretty sure I'd like darkest dungeon more if it was more similar to xcom in every way besides fluff. Xcom strategy layer is far more interesting and combats feel more like decision making rather than jrpg combat.
💠llm powered animalcrossing type game doesn't exist yet and doesn't even sound that compelling. What's missing? I think answer maybe its the same sorts of things rimworld uses for emergent storytelling. Animalcrossing is extremely static, but you could easily have two characters not like each other, or someone get sick, or one character trying to uncover another's secret. Underground demon cult. My technique for making stories happen in sandbox campaigns is to have npc groups with competing agendas and make everything connect densely with multiple groups to pull them together.
💠I'm very interested in the idea of tttf as an ai played top down real time game. The ai would need to be excellent somehow since otherwise it's just banging rocks together. No point in two heroes having a wild synergy if it doesn't get used.
💠hehe loz macarena of time
💠seems fixable with very strong llm based simulation but obviously that's hard.
💠solo rpgs seem like they'd be perfect for me (having far more games to play than players) but a main aspect I care about is trying to reason about the world and figure things out. Solo games usually randomly generate answers to hidden info as needed which screws any attempt to solve the world like a puzzle.
💠auto battle based on ecosystem simulation. A god game. Maybe asymmetrical goals?
💠okay so you have a very strong game ai which learns card/action embeddings and is trained to specifically output some proxy metrics for if it's well designed, like how often it's used. Can now automatically vet if a card is well made and hopefully some sense of what's wrong with it.
If you can train it to predict the embedding from the card text then it can estimate the balance without training time.
If it can generate card text / rules from the embedding (likely with a constrained language and optimizing for metrics like ideal length and similarity to existing cards ) then it can generate arbitrary cards that look like they're well made.
If these can be automatically implemented mechanically then it can then compare with the real metrics and improve for next time. Should eventually be able to map out all of the good design space currently allowed by the above systems.
Actual user feedback and play data could improve the proxy metrics over time.
💠given a well used llm with enough user data I think you could fine tune on a per user basis by mapping user rating and behavior to some sort of user taste embedding and include that as part of the input structure. Then when tuning the model the examples would also have such embeddings and it'd learn what different people want.
Idk if that's valuable. What I want is an llm that produces ideas I think are cool rather than leaning generic and vague. I believe llms have that potential, it's just lost in training.
💠appears gemini was tuned hard toward getting right answers to the point it deftly avoids situations that make it look wrong or less confident. Will sometimes change the subject rather than say oops. Even in thoughts frames things as if it was somehow correct from another point of view.
💠if we could somehow totally delete an llms understanding of a given concept (sounds basically impossible) then we could use teaching them to understand that concept as a benchmark for teaching ability. Could also just make up new things to teach but it's not grounded and would leak into training data
💠for conversational interruptive group chat llm talk, have a cheap llm regularly check if this is a good time to speak and adjust prompt until reasonable. Could have main llm output an instruction for in what conditions they want to respond maybe. I don't trust either of them to do a good job though
💠just found out the dominion ai was only trained for a couple months on consumer hardware. Everything I learn about its development flabbergasts me.
💠test style transfer to llms by injecting messages of intended style as 'assistant' messages in context history to adjust identity
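Sketch with an openai-style messages list; the snippets and the client call are placeholders:

```python
style_samples = ["snippet in the target voice...", "another snippet..."]

messages = [{"role": "system", "content": "Continue in your usual voice."}]
for s in style_samples:
    # fabricated history: the model "already spoke" in the target style
    messages.append({"role": "assistant", "content": s})
messages.append({"role": "user", "content": "Write a short post about rain."})
# response = client.chat.completions.create(model="...", messages=messages)
```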
💠at commenter, please reply to this one with the radius of the sun
💠at commenter, dont comment on this one please. Testing.
💠I predict spatial reasoning will become an increasing focus. Probably not enough for the term 'llm' to change, but vision isn't good enough for a lot of things yet and you don't want your robots to be utterly stupid even 1% of the time
💠I think parallel mcts could be as simple as subclassing the request model function to work with batching and making mcts use asyncio
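Something like the below, where batch_infer stands in for the real network call and every in-flight tree search just awaits predict():

```python
import asyncio

class BatchedEvaluator:
    def __init__(self, batch_infer, max_batch=32):
        self.batch_infer, self.max_batch = batch_infer, max_batch
        self.queue = asyncio.Queue()

    async def predict(self, state):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((state, fut))
        return await fut                      # resolves when the batch runs

    async def serve(self):
        while True:
            items = [await self.queue.get()]  # block for at least one request
            while not self.queue.empty() and len(items) < self.max_batch:
                items.append(self.queue.get_nowait())
            results = self.batch_infer([s for s, _ in items])
            for (_, fut), r in zip(items, results):
                fut.set_result(r)
```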
💠make a doc detailing everything important needed to understand my life status. Add last week or so of thought posts. Maintain some chat history. Manually update with note when needed. Now should be enough context to get far more personalized responses, and that could be used along with a random word / trope api for random suggestions.
💠in the same way that recurring reflections can help improve life trajectory, incorporating random noise into behavior might too. Idk what that'd mean.
💠gemini making far fewer mistakes means its easier to analyze the mistakes it does make (or at least they feel more notable). I'd say they're largely either
💠Idk what the term is for "crowded fantasy" where there's a tonne of races, gods, magic systems, spirits, misc power sources.
In such a setting, figuring out rates of exchange between systems and doing arbitrage sounds fun.
💠llm coding is analogous to everyone getting a phone with an automatic setting camera and decent results becoming trivial. Pro photographers exist but a lot fewer of them.
💠doing a pytorch course finding it useful to
💠Other day I said multiplayer games would be more fun if you could play all the positions yourself. Opposite: Take any game with a clear consistent objective, split it into bits like "5 seconds of gameplay" or "1 turn". Players are assigned such snippets of game to play. Should be immune to matchmaking issues because its so asynchronous. Obvious issues where you don't clearly get any feedback since you're usually not the one playing the winning move, and the course of the game is highly diluted with other players to blame, and you're playing so many separate games you can't be attached to any of them. You can see an average win rate of the games youve played though.
Coincidentally this is very similar to the performance parallelization for alphazero I did.
💠But without actual finetuning it'll be much harder to get the right style out of them. Yes you can give them a big context prompt, but that costs and they're influenced by the content too much and start quoting things from past songs.
💠so deepleffen emulator is still dead because the company I finetuned on seems to be falling apart and the relevant models giving clearly broken outputs with or without finetune. But maybe I can use an ensemble of basemodels with parameter finetuning and a powerful manager llm who can hopefully 'get' the sense of humor to automatically pick out gems from the pile of crap.
I wish I could have a bunch of me try different things and then pick the one that worked out best at the end of the day - or just have a bunch of me.
💠should maybe put another post emoji on this for daily 'what I did today' because itd be good to record that
💠if we could temporarily split into clones by some mechanism, team based games would probably be much improved
💠Trying to be less blind to music theory so I can write music again more intentionally than last time, and thought I'd analyze the smb overworld. The very first chord is an out of key 'secondary dominant', the V chord of the V chord of the main key, and replaces the 5th with the 9th for interest. I think there's a lesson that everything is more complicated than I expect, even when I expect it to be complicated.
💠text adventure project requires quite in depth information modeling of the player which has to be remembered long term. Imagine you see an npc on the street and it's described like "you see Charles the grocer" when you should have no idea who he is yet. Or if its not memorized properly then you could have married Charles and saved the world with him and he could be described like "A mustached man with an orange sweater"
This requires the llm to accurately track state over long periods of time, and we know how good they are at that!
💠saw several instances of gemini following bad chains of reasoning without noticing its contradictions, particularly when it needs to estimate numbers it's not very familiar with. Still best model.
💠for tabletop teamfight what if I skip the first round by letting you place your units further in on the map? Not sure the consequence of that.
💠Instead of audio books, have an llm make a stage adaptation of your book and have it performed by robots with tts
💠going to also try having games start close to the end of previous games because positions near the ends of games should be easier to learn (closer to the value signal) and intuitively I feel like that should produce a visible learning curve faster. It's kind of like whatever that technique is where you give gradually harder versions of the problem (curriculum learning?).
💠idk if this makes sense, but embed every page on the internet as graph embeddings, like a search engine, and make something like a generative internet
💠when I get back to that, maybe a good next step "given an arbitrary setup and message history, maximize the chance the llm is able to correctly identify the evil players" and abstract out of playing real games and communicating effectively, which are adding noise over the core challenge of solving the game given known info. Not that the other stuff isn't necessary to win, its just probably more tractable. Creating novel schemes and leading people into traps and effectively modeling other's possible viewpoints are also difficult and important though.
💠on the llm social deduction thing, its noisy gathering how good their reasoning is from playing, because naturally some fraction of them are lying and trying to put forward convincing looking reasoning that's deflective. So they might look like they're reasoning badly when they're being intentionally misleading. Not unsolvable and you can read their thoughts post game, but it compounds with all the other issues like them being pretty unfamiliar with the game and how 'looking like playing a social deduction game' is a poor proxy.
💠On hueshifting in minipainting
💠cool fun status update, not only is my connect4 alphazero not improving, it has ~40% win rate against pure mcts with no network using crappy random rollouts.
💠Llm guided learning probably fits well with language since it doesn't take any special interfacing for the llm to see what you're doing and they're domain experts. Obviously still needs wrapping to teach effectively. Aside, I wonder if learning languages will become low value. Translation jobs are already gone, and I expect at some point you'll have low latency audio translation. Though there's a ceiling there, given word order and stuff, you couldn't ever translate each word as its spoken.
💠yes could use llms to generate anki cards from content and potentially could do so in bulk so you get new questions rather than repeating old passed questions, but not very useful without strong modeling of the users current knowledge which gets back to the learning tree project
💠if good enough could replace taking and reviewing notes at least sometimes
💠some way of collecting all the knowledge you injest so you can get automatic spaced repetition recall and application quizzes generated for you
testing its updated 🧿
testing its live 💠
💠I think gemini (output, not so much thoughts) defaults to not changing their initial answer, like I mentioned earlier. It might correlate with gemini being more likely to be correct, but I think it defends bad answers approximately as much. Recently I asked for something, it answered, then I clarified what I needed such that its answer was no longer appropriate, and it still focused on defending its first answer - given the original question. This seems like another point where making the llm think previous messages came from another user and it's a new arrival may help break up patterns in behavior
💠hey if you can hibernate a computer by saving its ram to disk could you save multiple states to jump between tasks? Probably not without the os being designed for it because I think "state" is defined by much more than ram and without clear boundaries for what's needed.
💠when explaining my interests it usually comes off like "so games basically" and then I feel kind of shallow. I think the reason most of my projects are at least tangentially related to games (or simulations resembling games) is
💠possible most llms were finetuned more to be nice to talk to, while gemini was trained more for being right. Could also be about the reasoning.
💠gemini is just different from other top llms. It's much less lead by your words or prone to answers that parrot your question, and less likely to immediately backtrack when challenged so you can get meaningful debate. Sometimes it's stubborn when it's genuinely wrong though, so.
💠given my objective with the board game ai is to eventually develop very complex algorithms and keep things modular for comparison, the work I've done engineering for performance has a fair chance of getting in the way. I'm not sure if "just do tictactoe with few mcts sims" as a proxy is actually good, but if it is then I did a stupid with all that performance work.
💠imprecise but rag is better for declarative knowledge vs fine tuning for procedural
💠Bulk llm calls might be useful for fermi estimating / planning
💠thinking takes time and mental stamina, so being able to outsource some of it is pretty valuable. I can't see a good way to outsource general awareness in the same way though, and attention is similarly finite.
💠liberty launcher was designed to work with reserve shooter and market gardener. It sort of works with gardener because of lower self damage, but its changes largely feel like they have unintended drawbacks. Having an irregular projectile speed for example makes using it actively harmful to muscle memory.
Liberty launcher: Adjust speed to match direct hit so muscle memory transfers. Adjust kb to match direct hit (scaled by damage difference) Damage possibly lowered further. Damage increased while rocket jumping. Possibly minicrit. Possibly increased reload speed because "more total time spent reloading" usually feels like it was an unintended aspect of valves increased clip size weapons. Could possibly also minicrit airborne targets, making it function like both reserve and gardener, though it's unenhanced damage would need to be pretty dismal.
This would hopefully make it a high floor high ceiling weapon designed for rollouts, bombing, and juggling.
💠related project of automating tts content generation which in practice I think is not similar at all
💠given adequate spatial representation and an engine providing legal moves and simulations, llms could play any board game badly. Combine this with techniques like mcts (llm provides intuitive strategy, mcts prefers winning states) and you might have a decent baseline ai for any game where those things can be provided.
Can an engine be made such that llms can code in needed rules on the fly? I imagine if you already have an mtg engine adding an arbitrary new card is something a modern llm could do with sufficient tooling. If the game can be iteratively built up that way, all that's left is the gui
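Sketch of the combination; the node fields (children, visits, value_sum, priors) are assumptions about whatever mcts code surrounds this:

```python
import math

def puct_select(node, c=1.5):
    total = sum(ch.visits for ch in node.children.values())
    def score(move):
        ch = node.children[move]
        q = ch.value_sum / ch.visits if ch.visits else 0.0
        u = c * node.priors[move] * math.sqrt(total + 1) / (1 + ch.visits)
        return q + u                      # exploitation + llm-prior exploration
    return max(node.children, key=score)

# node.priors would come from asking the llm to weight the engine's legal
# moves, normalized to sum to 1 over exactly those moves
```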
💠things for this week: figure out if the performance changes are enough for connect 4 alphazero to be reasonable, else frown and scale back. Get a functioning llm based gm system because that sounds like so much potential now that the llms are strong enough to follow rules and not be stupid. Finish learning basic torch and einops transforms to be more literate.
💠asking gemini to develop adventure game puzzles I notice that they suck and gemini doesn't notice and doesn't do enough to fix problems when pointed out. This is concerning because it may indicate more general issues around writing or general large scale coherence.
💠'lets try some alphazero performance improvements to see if I can make hardware less of a concern' turns into 5 or so days deep in rearranging very large and delicate systems. First two approaches died under their own weight and difficulty to debug. Third approach which didn't become an unsolvable mess was my design rather than gemini's, so I'm good for something. In short
In the end its annoyingly hard to tell exactly how much faster it is because the system is so transformed and the multiprocessing makes profiling and logging complicated. I sort of don't want to know in case it wasn't an improvement.
💠improving mental habits requires mental awareness. How do you get more mental awareness? I think meditating might be an answer since it trains an outside detachment from thoughts, but sounds suboptimal. Maybe spending time self narrating your thoughts, mentally saying what you just saw your brain do.
💠with llms as helpers for thinking, I wonder if they could also help instill mental habits. There are many helpful habits that are just hard to install without effort and repetition, and maybe it'd help to make them part of the system prompt so you'd be regularly exposed to them. Eg don't think of something as impossible without at least dedicating some minutes to find a way, don't do things that you'd see as stupid if someone else did them, etc
💠llms can definitely work with json state and therefore things like graphs. What about more directly spatial information like top down map info or a large tetrisy game or a hard maze or something. Say you needed llm processing of a state that included something like that, how would you make it work?
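One plausible encoding, rendering the grid as rows of characters with labeled axes so the model gets alignment it can reference (purely illustrative):

```python
LEGEND = {0: ".", 1: "#", 2: "@"}   # floor, wall, player

def grid_to_text(grid):
    header = " " + "".join(str(i % 10) for i in range(len(grid[0])))
    rows = ["".join(LEGEND[cell] for cell in row) for row in grid]
    return "\n".join([header] + [f"{i % 10}{r}" for i, r in enumerate(rows)])

print(grid_to_text([[1, 1, 1], [1, 2, 0], [1, 1, 1]]))
#  012
# 0###
# 1#@.
# 2###
```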
💠no objective data but I think llms are getting less funny over time, more like they're writing text in the format of a joke. Iirc chatgpt3.5 was pretty funny without much effort
💠a lot of the best advice is obvious in retrospect. Should maybe install something in thoughtbot to provide a comment if it looks like I'm missing something obvious or making a mistake. Just don't want it to spam with typical llm responses to everything
💠a gitignore implies the presence of something you can't see. Sounds like a horror game premise
💠how do you stop llms from mode collapsing? I think they all do it to some degree and it'd be nice to theoretically eliminate it rather than just improving it.
💠when trying to improve a system, it may be useful to deliberately degrade that system to make it easier to test. Eg could test alphazero fast on a low sim mcts and an easy game. Eg when improving long term llm conversation could work with an llm weaker at long term conversation to see the benefits more easily. All these cases risk solving the wrong problem so they're additive rather than replacement.
💠unable to sleep for several hours as my tired brain tries to figure out if it'd technically be possible to mark a class attribute as unknown and have automl gradually learn to predict its value given the rest of the class and any params. I think the answer is sort of but you'd need a feedback system to train on and would need to be fine with the result being garbage for a long time
💠or jenga for engineers where you take turns removing a component and demonstrating that the product still works
💠take apart an appliance and put it back together replacing each screw with a bit of scotch tape
💠spent time trying to make alphazero training efficient by implementing the most common techniques. Running many parallel games on a single thread and interrupting the tree search to build a batch of network requests and then plugging the responses back into the tree search and handling the end results as they arrive. Turns out that's difficult. Rapidly losing interest
💠I think tree based chat wouldn't help make chat more interpretable as I'd hoped because it needs to be flattened for llm view and I predict it would gradually load up with more dangling and redundant "threads". Also worse ui at a point.
💠that is, due to many small messages costing more than few large messages. A realistic conversation not only has much more back and forth, it also has active participants choosing when to speak, implying they're reading gradually throughout the conversation rather than just being prompted. Simple solutions like @-ing or keyword matches would not scale far enough
💠now thinking a key sub issue in llm social deduction games is creating natural feeling multi user chat, which fits very poorly with llms due to multiplying their input costs quickly
💠automatic tiny constructive comment replies to thought posts
💠environment sim. Rooms. Each room has connections to other rooms, entities, features. Each entity and feature has a brief explaining how it interacts with the world and its priority hierarchy, and has current attributes as point form notes (eg broken wing). Llm updates the room for one tick which can be variable length, logs the events, and updates the room state. Also updates messages to other rooms, such as passing an entity from one room to another, or fire spreading, etc, which are included as part of that room's next state.
Is this simple and powerful? Not sure where the pain is going to be but it seems pretty great. Complex entities like intelligent npcs could use subcalls. The update llm probably sees previous updates to prevent weird behavior like looping.
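Data shapes only, as a sketch; tick() would format a room into a prompt for the update llm and parse its response, and all the field names are my assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    brief: str                         # how it interacts, priority hierarchy
    attributes: list[str] = field(default_factory=list)  # e.g. "broken wing"

@dataclass
class Room:
    name: str
    connections: list[str]             # names of adjacent rooms
    entities: list[Entity]
    features: list[str]
    inbox: list[str] = field(default_factory=list)  # messages from other rooms
    log: list[str] = field(default_factory=list)    # recent updates, anti-looping
```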
💠priority for social deduction game is to explicitly analyze counterfactuals with sub prompting and seeing if that fixes bad reasoning. It won't fix the models just being stupid, like playing poorly given their known info or leaking key info. I wonder if outputting a draft before the full response would help
💠would a conversation tree be a better way of handling async group discussions than a single chat thread? Want to avoid everyone posting a wall of text that responds to distant previous walls of text, and key points not being properly addressed.
💠every month automatically post a digest of that month's thoughts? Produce outlines for potential essay posts given all thoughts that haven't yet been put into an essay?
💠gemini seems approximately as bad as sonnet3.5new at social deduction games.
💠unformed community building website for topics too niche to have a living community. Every user has a list of things they'd like to talk about and can see semantic matches.
💠repeating experiment with toy GMing environment which past llms failed dismally at. First experiment (idk which llm) couldn't keep state consistent or follow instructions. Second experiment I think with sonnet3.5old didn't work because sonnet3.5old hated fun 'even in the context of a roleplaying game'. Gemini seems to just do it. Maybe 3.7 would as well.
💠anchorhead requires a strong llm. Its not an easy game. Testing the long term memory requires long sessions. Long session * strong llm = too much money! I don't believe there is an adequate alternative testbed that doesn't risk being a bad proxy and wasting my time (though maybe my standards are too high). This indicates I should give up on trying to automatically test the system, and just dogfood it and test it that way (which is unfortunately demoralizing when it fails partway into something hopeful).
💠going to try figuring out low hanging fruit for ml training efficiency to see if I can 'easily' make things adequate. Thinking of also using literal tictactoe with very few mcts sims as a way of testing the model works - at least on a toy problem. It leaves the scaling question for later.
💠llm can play text adventure. Llm could gm simple local interactions. Room based multiplayer game mostly populated by dumb llms. Could use that for pray
💠might want to either make a proper long form blog for retrospectives or whenever I have something substantial to say. If the microposts are thoughts, what are the essay posts?
💠I have a concern that a key obstacle to developing strong game AI is compute, which I don't have and don't want to pay much for. Figuring out generalizable algorithms thatd work for diverse complex games kind of presumes training AIs for those game as validation. Its very unlikely everything will work in few attempts. Might get to a point where I have a research direction I feel good about and then shelve it because I don't want to pay that much gpu money to go the rest of the way. Also rl is painful stuff.
💠philosophically, programming is requirements engineering on every scale
💠on long term memory system, after much planning and replanning, a simple "every 10 messages, ask the llm to check up if this is the best thing they could be doing" seems to be handling things pretty well. This won't scale up indefinitely since the memory system only gets tested when there's too much critical info to keep in the immediate context, but it's doing a lot of good for virtually no cost.
💠aider in ask mode + obsidian
💠At this point my plan for handling arbitrary board games is
💠note that apparently wsl will just keep partitioning more and more space as you put stuff on it, and it won't give that space back when the stuff is deleted
💠oh hey so usually a game ai model needs to first understand the whole state and then use that to predict the policy and value. The policy part is normally an output for every possible move. Then you mask to the legal moves and take the one it liked best, ignoring all the illegal moves it scored. It might learn to put less attention to parts of the game state that aren't relevant to the legal moves. If however the legal moves are calculated first and are made part of the input game state, then the model can easily adjust its attention toward effectively choosing from only those moves. Imagine if the world is huge but the legal moves are only in one room - it could put much less attention outside the area that'd be affected.
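Shapes-only contrast of the two setups (torch, sizes invented): A is the fixed head over every possible move with illegal entries masked after the fact, B encodes only the legal moves as input tokens and scores each one's output:

```python
import torch

ALL_MOVES, D = 500, 64
state_repr = torch.randn(1, D)

# A: full head + post-hoc mask
full_logits = torch.nn.Linear(D, ALL_MOVES)(state_repr)
legal = torch.zeros(1, ALL_MOVES, dtype=torch.bool)
legal[0, [3, 41, 97]] = True
masked = full_logits.masked_fill(~legal, float("-inf"))

# B: only the 3 legal moves exist as tokens; score each token's embedding
legal_move_tokens = torch.randn(1, 3, D)   # would come from a move encoder
scores = (legal_move_tokens @ state_repr.unsqueeze(-1)).squeeze(-1)  # (1, 3)
```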
💠muzero learns to simulate the game in its head rather than using a game simulator provided to it. The main power to this is it means the game can be very slow to simulate and the model could still simulate it quickly. In particular this means you don't have to worry about keeping the game state performance optimized. You can freely translate the state to whatever might be most useful for the ai.
💠normally old self play data is discarded as low quality because it came from a previous version of the model. Imagine if instead the model is periodically given an objective score (could base on win rate vs random, then win rate vs previously evaluated model) and attach that score to each move. Now rather than old play data being misleading because a move might lead to a win despite being terrible, it'd show that the move led to a win in a low score game. If all that shows is "ignore this one" then it's pointless and you should just discard old data. If instead it's able to get a better sense of what moves are good or poor based on the score of the player, then it's a useful thing to do with masses of old data. In actual play the model would be trying to output moves that look very high score.
💠free rate limited models on openrouter could potentially open up dumb llm tool space. Though you still need the user's api key for rate limit handling.
💠Gemini leaves a lot of comments, and I posted that silly example, but I think they're actually pretty good comments most of the time. I tend towards using as few comments as possible, but if I'm more open minded
I optimistically theorize that these comments might improve the success rate of future changes. Even if they don't help me understand the code (and they might) they may be valuable just by helping the future llm understand the code. I can always strip them later.
💠hehe if I program something called H.E.I I can say "I developed H.E.I"
💠this project seems a bit like a kid deciding to make an mmo, and finding no matter how far they lower their standards and reduce scope, the project is still very hard. Even drawing concept art of a character is hard.
💠concern.
💠alternatively you could see any state change as a graph modification. Is there some compressible/embeddable way to express any graph modification
💠can anyone think of any action in a board game that can't be seen as a combination of Source component (optional) Target component (optional) Verb (from some finite list as supported by the game mechanics)
Some actions may have multiple verbs happening to different components but I think you could probably evaluate those separately and hopefully summing their values is close enough to handling them together
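Schema sketch of that claim; the verb list is obviously a per-game assumption:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Verb(Enum):
    MOVE = auto()
    DRAW = auto()
    DISCARD = auto()
    ATTACK = auto()
    GAIN = auto()

@dataclass(frozen=True)
class Action:
    verb: Verb
    source: Optional[str] = None   # component id, optional
    target: Optional[str] = None   # component id, optional

play = Action(Verb.MOVE, source="pawn_3", target="tile_b4")
```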
💠Have a desktop and have a crappy laptop you can remote desktop with
💠tried using wsl Linux for the env and it's not going to work because my c drive is always nearly full as it is and wsl needs the partition on the c drive so every Linux installation will end up there
💠strange that there's no structured output first llm that doesn't try to chat at all. Enter 2+2, responds with 4. Maybe responds with tool use json by default
💠gitignore is nice, but if you could instead set a gitprivate file then you could upload a private and public version of the same repository, so that your env file could be easily backed up and transferred between workspaces. I'm sure it'd get more complicated with collaborators but even a simple implementation would mean you could potentially use git + something like LFS for general backups
💠a lot of useful stuff isn't compatible with windows. I can hardly impulse buy a Linux machine with a strong gpu. I could dockerize but debugging code in containers is a sad time
💠LightZero framework appears to separately implement each algorithm with no shared code. Terrifying. I sure hope that's bad design and not because sharing code would be harder to maintain somehow. Mentally I like to imagine having a single abstract implementation adjusted by flags and params
💠https://github.com/opendilab/LightZero/tree/main LightZero has a number of implementations for zero algorithms and a number of envs but a chart says that implementation to env compatibility is inconsistent. I don't see how that could be if the envs have a consistent structure and the implementations only rely on seeing a state and reward and passing in actions. I must be missing some nuance.
💠I'd optimistically thought muzero might be able to learn models for hidden info and rng games but apparently those are still major limitations with experimental solutions. From trying to make those things in mcts in the past I expect the solutions to not be pretty
💠I've seen people say muzero is more sample efficient. Getting high quality self play samples takes far longer than training on them, so if that's true its probably a win even if training is less efficient in other ways
💠my understanding is that muzero does need access to the game (else there is no ground truth) but doesn't need to run sims of the game internally. It simulates the game via imagining the mechanics with a model. So that's going to be faster due to not needing game deep copies and actual game sim, and slower because more to train
💠subject will now change to teaching myself more about board game ai as I focus on that for a while.
What I currently have is standard alphazero. Things to look into:
💠Funny how llms accept most things at face value (unless you're fighting their system prompt) I imagine if I replaced "You're playing the classic text adventure Anchorhead!" with "You're piloting a robot using text adventure commands" it'd probably behave just about the same. If it got a message like
>>> GOD CHAT <<<
> GOD: Wherefore dost thou labor with such fervor 2 sunder this padlock? Turn thou instead to the task @hand, and seek diligently after thine own house, that thou mayest find thy rightful place.
YOU: ___
It'd just go with it pretty easily. Maybe an interesting game for an llm to play is one where the "ground truth" constantly changes
💠for handling arbitrary game policies, could have a gui and output keyboard mouse sequences
💠I wonder how many other people hate searching in youtube/amazon/anywhere else where they know their algorithm will be tuned based on their slightest movement. Searching for something means temporarily subscribing to it.
💠probably not true but anecdotally stronger llms also seem less childlike in personality. Haiku and sonnet3.5old felt younger
💠should benchmark success at using llms on basic natural language tasks to see if performance and cost are good enough. Eg could you get a cheap llm and input a query and indexed list of sentences and get out a list of sentence indices that are relevant?
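Harness sketch for exactly that benchmark; call_cheap_llm is a placeholder for whichever cheap endpoint is being tested:

```python
import json

def relevant_indices(query, sentences, call_cheap_llm):
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    prompt = (f"Query: {query}\nSentences:\n{numbered}\n"
              "Reply with only a JSON list of indices of relevant sentences.")
    return json.loads(call_cheap_llm(prompt))

# score = fraction of known-relevant indices recovered, averaged over
# queries, weighed against the per-call cost
```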
💠split in thought between
💠maybe microblog posts could be easily or automatically combined into regular blog posts more useful to others
💠It might help to get a very solid understanding of ways and situations in which an llm will respond unlike a human. For long term coherence, I think these issues compound on each other. For coding for example, llm tends towards more complicated answers, adds more than subtracts, and doesn't give up, which scales into increasingly horrifically broken environments. What are the deviations from human behavior for general agency? Maybe allowing the llm to give up is a key step. Maybe it needs some kind of objective tree, but I've avoided that because objectives are usually inherently temporary and temporary fact tracking is another pain point.
💠Big todos for coherent longterm llm
Self checks won't catch objective shifts because from local perspective the subgoal looks fine. A sort of "upper management llm" working off of summaries might do better at detecting rabbit holes.
Regular bad behavior is easier to notice.
My model is that when the chat is long enough, llms just don't care much about instructions.
Approach needs to work without detection and has to work generally for any divergence from instruction
Could not allow the llm to see its past behavior, but that sounds insane.
Could have the llm work off of summaries instead of direct output, but it seems like the summarizer would have the same issue.
Could tell the llm it's someone else being dropped in place and should correct any poor behaviors.
Could have an llm validator check its performance and edit or redo - this sort of thing in the past has caused good behavior to be 'corrected' into worse and usually more complicated behavior.
Could make instructions louder / closer to the front (doesn't seem to work).
Could rotate models to maybe break up consistency. This'd also reduce desired consistency.
💠longer version
Say you have an agentic llm system working at a task. Llms exhibit mode collapse, where they try to create more text looking like previous text, as its 'more likely'. This means the llm may poison its own well of context when it looks at its past actions. For example, it may start working on a subtask, and eventually so much of its context is work on the subtask that its original task is a footnote in the system message. Or say the llm is asked not to repeat itself in its
💠llm agents who see their past outputs can easily poison their own context with bad behavior. I really don't know what to do about that
💠possible system for difficult single-message llm tasks where time isn't important
in some cases may instead want a sonnet to give the r1's jobs and recompile. R1 is actually a better plan maker sometimes so could chain the whole thing.
💠terror.toren.dev could be a just-for-fun malware web extension which does nothing for a long random amount of time, then opportunistically changes your browser screen with the intent of making the user panic. Eg making their bank account appear empty, or synthesizing highly concerning chat messages, or making a little animated vampire dance on their search bar
💠I'm finding sonnet 3.7 increasingly frustrating. In coding it ignores instructions too often and doesn't try to match the surrounding style.
💠Need to figure out what are good metrics for evaluating intelligent progress through text adventures. Ideally useful even if the llm is stuck on a puzzle
💠todo, play with stable diffusion desktop background generation again. Makes really striking modern art.
💠todo at some point I really have to try improving llm based education. It has never before been potentially feasible to automate educational instruction. Any success wouldn't have that much impact unless actually used in schools though, since that's where kids are imprisoned
💠board game ai engine. Get better with transformer architecture. Figure out if there's a reason graph transformers can't perform sufficiently or can't express arbitrarily complex games. If they're capable, get very comfortable with that architecture as well, and rewrite the engine to represent everything(?) with graph transformers
💠todo for memory system:
Break down reporting of token costs so I can see where the budget is going.
Allow llm to send multiple text adventure commands at once, useful when it wants to brute force or try various versions of a command.
Write success metrics to run on logs.
Limit context size and implement ranking.
Still concerned that the ease of getting stuck in text adventures will hide improvement behind high random noise making it too expensive to get sufficient data, but will see if efficiency changes plus multiple command inputs are enough
💠what exactly is needed for a llm powered state machine game? Maybe it's simpler than it sounds (yeah right)
Other games could be manually converted to text format but obviously that's work and it'll take care to make sure memory is a sufficient factor in success
💠ties in to previous idea of using llm+state machine as a sort of more flexible text adventure / automatic gm. Chicken egg problem sort of because such a system really wants a good memory system
💠could give claude access to a hint book but that breaks progress quantification since optimal play would be following the walkthrough closely.
💠critical issues with text games where the puzzles tend to be unreasonable and it was expected you'd ask for help when you get stuck. "Softlock" wasn't even a word at that point and it was considered normal to need to reload a previous save. These sorts of issues cause progress tracking to be extremely noisy rather than a gradual feedback curve
💠This is really premature but so far watching claude play a text adventure has been really fun, maybe because of the enthusiasm in its thinking. claude pokemon is also popular now, so seems like ai lets plays might be a good idea
💠you can maybe estimate how soulless and shovelware-crap-producing an organization is based on 'how much of the decision making is based on passion vs profit maximization' which itself could be derived out of
💠low confidence because I've tried something like this before. llms have consistent (stubborn?) beliefs due to consistent identity. This is why its often better to back up rather than argue. It may be useful to have the llm believe it is new, replacing the previous llm, not the same llm continuing the chat. Possibly do that only after a critique llm sees an issue. Problem I had last time is you'd get a right answer, critique bot would flag it, and itd get replaced with a wrong (and usually more complicated) answer.
💠stronger version of a previous claim, I think llms may actively resist efficient simple answers whenever given the opportunity. If part of your question is irrelevant, they will try to make it fit. If you seem to be asking a complex question they will give a complex answer even when a simple answer exists.
💠spaced repetition is obviously great for memorizing information, but I'm kind of not into memorizing information. I find that if I don't remember something, its usually because I haven't needed it in practice, and I can just look it up when I do need it. Drilling something isn't worthwhile if I spend more total time drilling than I would take looking it up across all times I need it. Anyway spaced repetition is still how memory forms, so I imagine the ideal way to structure learning projects is to make sure skills you're learning are demanded at the same rate as the forgetting curve. To learn many things you'd want to interleave them where new subjects are practiced proportionally more often
Possible heuristic for learning: when you don't recognize something, try to figure out if it's fundamental or a detail and follow the trail of fundamentals up the tree until you find where you need to start learning. Don't recognize a word, ask an llm or Google, and focus on the direction that gets more general and basic rather than extensions
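toy sketch of the interleaving idea, where the geometric growth factor is a made-up stand-in for a real forgetting model:
```python
# each skill's gap between practice sessions grows geometrically, so newer
# skills naturally get practiced proportionally more often than older ones.

def practice_schedule(start_day: int, horizon: int, first_gap: float = 1.0,
                      growth: float = 2.0) -> list[int]:
    """Days on which one skill should be practiced, gaps widening over time."""
    days, day, gap = [], float(start_day), first_gap
    while day <= horizon:
        days.append(round(day))
        day += gap
        gap *= growth
    return days

# interleave three skills started on different days
for name, start in [("skill_a", 0), ("skill_b", 5), ("skill_c", 12)]:
    print(name, practice_schedule(start, horizon=30))
```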
💠I now have claude + memory system playing the text adventure anchorhead. Neat. Claude is wandering the town looking for his real estate agent.
💠I should really make my home page link to my subdomains somehow. Being a homepage and being a portfolio are sort of at odds
💠while in development, at cost of greatly increasing token cost, could run every llm message multiple times with different contexts (different retrieved memories) and grade the answers against each other to see how well the contexts are performing relative to each other. Highly noisy though given nondeterminism and that many context items are retrieved each time so the feedback would need to be divided. A bit like team based ELO, but you're assigning players to hopefully make the best team. If you want the teams to be equal in elo you'd need to deliberately not put all the best players on one team, like you would in production.
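rough sketch of that team-elo credit assignment, where the pairwise grade is assumed to come from some llm judge:
```python
# each retrieved memory is a "player", a sampled context is a "team", and a
# pairwise grade between two answers updates every member of both teams by a
# fraction of the usual elo delta, diluting the noisy feedback across the team.

K = 16  # elo step size, split across the team

def expected(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict[str, float], team_a: list[str], team_b: list[str],
           score_a: float) -> None:
    """score_a: 1.0 if team_a's answer was judged better, 0.0 if worse, 0.5 tie."""
    avg_a = sum(ratings[m] for m in team_a) / len(team_a)
    avg_b = sum(ratings[m] for m in team_b) / len(team_b)
    delta = K * (score_a - expected(avg_a, avg_b))
    for m in team_a:
        ratings[m] += delta / len(team_a)   # divided credit per context item
    for m in team_b:
        ratings[m] -= delta / len(team_b)

ratings = {m: 1000.0 for m in ["mem1", "mem2", "mem3", "mem4"]}
update(ratings, ["mem1", "mem2"], ["mem3", "mem4"], score_a=1.0)
print(ratings)
```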
💠any time an llm fails that should be saved as a test case. Run the llm multiple times with that history to look for failure rate. Make modifications to the context/system message and rerun to quantify improvement
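could look something like this, where run_llm and judge are stand-ins for the real call and the pass/fail check:
```python
import json, pathlib

CASES = pathlib.Path("failure_cases")

def save_case(name: str, history: list[dict]) -> None:
    """Snapshot a failing conversation history as a regression test case."""
    CASES.mkdir(exist_ok=True)
    (CASES / f"{name}.json").write_text(json.dumps(history))

def failure_rate(name: str, run_llm, judge, trials: int = 20) -> float:
    """Replay a saved failing history n times; rerun after context/system
    message changes to quantify improvement."""
    history = json.loads((CASES / f"{name}.json").read_text())
    failures = sum(1 for _ in range(trials) if not judge(run_llm(history)))
    return failures / trials
```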
💠More thinking about 'fact subtypes' for llm memory, making the ontology yet more unclear
💠maybe llm agents should have some growing metric causing them to get bored/change approach, and this could get them out of inefficient rabbit holes or encourage them to automate things
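toy version of the boredom meter, where what counts as "progress" is left as whatever predicate the agent already has:
```python
class Boredom:
    """Grows each step without measurable progress; past a threshold the
    agent should abandon its current approach (or automate it)."""
    def __init__(self, threshold: int = 10):
        self.level, self.threshold = 0, threshold

    def step(self, made_progress: bool) -> bool:
        self.level = 0 if made_progress else self.level + 1
        return self.level >= self.threshold
```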
💠useful framing to keep in mind, that llms are not chatbots, they are predicting what a chatbot would say in a conversation. If you took away their 'stop' character they'd predict the text that comes next and simulate the user or the function responses or whatever.
💠whenever I see evidence of the world having changed, often because of a new technology, I feel a brief sinking feeling. The end of the time before this. Even if the new is better, the old is still gone.
💠I wonder how much you can infer about a person by the kind of media they enjoy. I imagine there are patterns in the emotional needs of people who enjoy slice of life vs power fantasy stories
💠my fact ontology has objective, question, and theory. I worry that's incomplete, but also that adding more types will make things less usable to the llm. Are problem and possible solution useful subtypes? Procedural knowledge, like how to navigate between two known locations, doesn't fit any current fact type either.
💠importance should be tracked separately. More important things should be tracked more carefully. Need to figure out under which cases a fact could be out of date without any way of noticing that's so. For example "I used timestop today" is recorded and then a week later the fact comes back in memory. The fact itself doesn't specify which day "today" is so there is insufficient context to determine if it has expired.
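one possible shape for this: record absolute time at write, so relative words like "today" can be checked for staleness at recall even though the text itself doesn't specify the day (field names invented, not my actual schema):
```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Fact:
    text: str
    recorded_at: datetime
    ttl: timedelta | None = None   # e.g. one day for "today"-scoped facts
    importance: float = 0.5        # tracked separately from recency

    def suspect(self, now: datetime) -> bool:
        """Expired facts should be re-confirmed rather than trusted."""
        return self.ttl is not None and now - self.recorded_at > self.ttl

fact = Fact("I used timestop today", datetime(2025, 1, 1), ttl=timedelta(days=1))
print(fact.suspect(datetime(2025, 1, 8)))  # True: a week later, don't trust it
```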
💠note that asking an llm to maintain a document of stuff it thinks is worth tracking is hard because they like to add far more than they remove, though I'm hopeful a hard length cap may help
💠In the rpg example above it seems like the llm would want a character sheet. A self contained way of tracking things that need tracking. How does that generalize? There's no hard rule for what is "temporary" or is "worth tracking" or is important enough to go in a bundle of commonly seen information. Your character's age hardly needs to be part of a regularly recalled character sheet yet it is known to go out of date.
💠llm playing rpg remembers "I used timestop today". Seems reasonable but now it must remember to clear that memory the next day, which isn't super likely because itd rely on the memory consolidation that handles the start of the new day also judging that fact relevant enough for the context. This seems like a big issue.
💠wasn't early ai supposed to become virtual assistants. When I think of virtual assistants its predicting my needs and handling a schedule and noticing growing issues and things. The sorts of things llms are either poor at or that would be insecure due to prompt-injection jailbreaking.
💠if I can automate figuring out which deepleffen outputs are things I'd approve I could make a twitter bot I'd really enjoy. Which twitter alternative are people using these days?
💠malware to corrupt a website in a way that it still passes all visible stability checks and most automated web testing
💠Adversarial fine tuning might actually just be a good idea for mimicking text, if that's something that matters
💠bad idea: use adversarial model design to gradually fine tune an llm for turing test passing. Would need dataset of real human behavior in the same conditions though
💠Prediction markets could be an interesting benchmarking technique. Have several of the model being tested and several of some grounding model like 4o or mini. They all digest information and make and bet on markets, and you see who wins the most at the end. They could read a book a chapter at a time with bets between each chapter.
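simplified sketch using log scores instead of a full market, which keeps the "who wins the most" comparison between models:
```python
import math

def log_score(p: float, outcome: bool) -> float:
    """Reward for stating probability p on a question that resolved outcome."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
    return math.log(p if outcome else 1 - p)

bankroll = {"model_under_test": 0.0, "grounding_model": 0.0}  # placeholder names
# after each chapter: every model states a probability on every open question
for model, p in [("model_under_test", 0.8), ("grounding_model", 0.6)]:
    bankroll[model] += log_score(p, outcome=True)
print(bankroll)  # highest total at the end of the book scores best
```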
💠Everyone who's good at tf2 sniper drank the space jam secret stuff
💠I think I've been exceptionally productive, for me, over the last week or so. I wonder why that is. Maybe I'm doing well defined interesting tasks, so I'm avoiding my usual problem of getting to an uncertain point and then doing roughly nothing
💠I'm making an environment for llms to play text adventures
💠llm gming, llm interactive fiction, and llm book authoring will probably all be solved at the same time
💠llm could write a text adventure and could add code on demand when user tries something unexpected that ought to work. Other users inherit changes. LLM would need a very strong cohesive memory system.
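hand-wavy sketch of the loop, where generate_handler stands in for the llm writing python on demand (a real version would sandbox the generated code):
```python
HANDLERS: dict[str, str] = {}   # verb -> python source, shared by all users

def handle(verb: str, world: dict) -> str:
    """Unknown actions get a handler written once, then inherited by everyone."""
    if verb not in HANDLERS:
        HANDLERS[verb] = generate_handler(verb, world)  # hypothetical llm call
    namespace: dict = {}
    exec(HANDLERS[verb], namespace)   # demo only; never exec untrusted code
    return namespace["run"](world)

def generate_handler(verb: str, world: dict) -> str:
    # placeholder for the llm; returns a handler that just narrates
    return f"def run(world):\n    return 'You {verb}. Nothing obvious happens.'"

print(handle("yodel", {}))
```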
💠I wonder if there's a technically possible alternative to my universal board game ai that'd play games at human level and that wouldnt require programming a game simulation
I recall trying to get llms to play boardgames and it feeling totally impossible but I don't remember what all I tried and it'd be cool to get it to work. I think it was doing a terrible job of making legal moves from one coordinate to another, like it was walking through walls and doing stupid things.
💠if my brain chemistry stays stable and my situation didn't get worse I think I could keep myself well entertained for centuries
This actually seems really hard to do better than what openrouter already has
💠Figure out how to make crowdsourcing llm costs a good UX
💠when Im speculating about stuff I don't know well, maybe it'd be helpful to have a convenient way, like an emoji reaction, to prompt a bot to give me context and tell me why I'm wrong. I expect it'd be too vacuous and hallucinatory in niche subjects though.
💠or really the bottleneck is the ability for the llm to robustly test out, debug, and iterate on its work. 'Edit, run, error message, loop' is awfully limited at least by my debugging standards. And there's no way for a gamedev llm to really "try out the game". Similarly image generation and understanding are not nuanced enough for human like specification and iteration
💠something like an OS level llm that doesn't need a concept of a screen, possibly with non-word tokens
Imagining an agent that functions with computer use, making small changes at a time and immediately observing their results. Like a transformer model streaming output and streaming results back in to the input as one continuous call
Seems wrong that there are still tasks ai is totally unhelpful for. Like if I wanted to make a Rivals of Aether character I'd have to do all the art myself since image gen can't handle doing the same character as pixel art matching an art style in different parts of various animations, and do all the code myself because roa coding is too niche for an llm to do any good. Could maybe improve the code part by providing lots of example context but the art is nowhere near doable
💠todo, in order to handle messages posted before the bot started, need to look up permissions related to 'partial' messages and reactions, which otherwise aren't loaded into the bot's view. Probably low impact unless the bot crashes or needs restarting a lot
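if this is discord.py, raw events plus an explicit fetch are probably the route, since raw events fire even for messages from before the bot started and don't depend on the message cache:
```python
import discord

intents = discord.Intents.default()
intents.reactions = True
client = discord.Client(intents=intents)

@client.event
async def on_raw_reaction_add(payload: discord.RawReactionActionEvent):
    # unlike on_reaction_add, this fires for uncached ("partial") messages
    channel = client.get_channel(payload.channel_id)
    message = await channel.fetch_message(payload.message_id)  # explicit fetch
    print(message.author, payload.emoji)

# client.run(TOKEN)  # token omitted
```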
ooo yeah thats flexibility
c:
💠hhmm?
bots could probably use a discord channel as a persistence layer, like a database or queue
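quick sketch of the channel-as-database idea in discord.py: append json records as messages, rebuild state by replaying the channel history oldest-first:
```python
import json
import discord

async def append_record(channel: discord.TextChannel, record: dict) -> None:
    """The channel is an append-only log; each message is one record."""
    await channel.send(json.dumps(record))

async def replay(channel: discord.TextChannel) -> list[dict]:
    """Rebuild state by reading the log from the beginning."""
    records = []
    async for msg in channel.history(limit=None, oldest_first=True):
        try:
            records.append(json.loads(msg.content))
        except json.JSONDecodeError:
            pass  # ignore chatter that isn't a record
    return records
```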
initing context for blog:
on a mission to make the bot shut up
I'll improve this later and make the celebration reply turn into just a reaction. It might be nice to be able to post multiple messages at the same time but probably not worth it.
doot
This is a test post