SuperLLM Council feels like the natural sequel to Jarvis and Hive Mind: not just “one smart assistant,” but a ruthless, self‑improving layer on top of every major model that keeps evolving personas, killing the bad ones, and quietly converging on the best possible brain for each task and each human.
Setting the scene
i’ve been working on jarvis for a while now, but it still feels like a very cracked prototype of what it could be. it does some cool random shit, it can control stuff, but it still behaves like one model trying to be everything at once. that’s fine for a demo, not fine for the kind of future i actually want to live in.
then a weird combo of things clicked together in my head: pewdiepie’s AI video where he runs this little persona council that literally deletes characters if they underperform, andrej karpathy’s whole LLM council idea, and anthropic’s claude + “ceo layer” experiment where they stacked similar models and still got non‑trivial gains before hitting the “they’re too similar” wall. all of them are dancing around the same point: single‑model brains are mid. the real game is orchestration, selection, and evolution over time.
Pewds, councils, and personas
pewds accidentally shipped a better research lab than most labs. in his video, he set up a system where different AI personas played roles, got judged, and literally got wiped if they sucked. you end up with a kind of evolutionary pressure on vibes: bad characters die, good ones survive and spawn variations. over time, the “cast” converges toward what actually works, not what sounds smart in a pitch deck.
what stuck with me wasn’t the memes, it was the mechanic. imagine doing that, but applied to real work personas instead of entertainment characters: researcher, hacker, therapist, strategist, meme lord, founder coach, pitch writer. you start to see which ones actually move your life forward and which ones are just aesthetic noise. and instead of one model pretending to be all of them, you let different models compete under the same persona constraints.
Claude, CEOs, and similarity traps
anthropic mentioned that when they tried a “ceo layer” on top of claude, performance went up at first, but then plateaued because both layers were fundamentally the same brain. there’s only so much extra signal you get when the manager and the worker share the same blind spots, the same biases, the same failure modes.
that’s where karpathy’s council framing gets interesting. his take is: don’t just stack more of the same, stack different. multiple models, multiple agents, different priors, different weaknesses, all staring at the same problem from different angles. ensembles are not just an accuracy hack; they’re a way to escape local minima in thinking.
the missing piece for me was: okay, so a council of models is cool, but who designs the personas, and who decides which model gets which skin?
The Super LLM Council idea
superLLM council is basically that missing glue: an outer system that doesn’t just pick “the best model,” but continuously discovers which persona–model pairs are cracked for which kind of task, and then keeps reshaping that mapping over time.
core loop in my head looks like this:
define a persona once: “unhinged but correct junior dev who overcomments everything”, “obsessive ops manager that hates ambiguity”, “investor who only cares about unit economics”.
run that same persona across all the big models: GPT, claude, gemini, your OSS beast on H100, whatever you add next.
fire tasks at this persona across models: coding, planning, brainstorming, reasoning, shitposting.
score and log everything: latency, cost, subjective quality, how often it hallucinates, how often you actually accept its suggestions.
cull and mutate: kill off underperforming persona–model pairs, tweak the survivors, spawn new variants.
repeat, forever.
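the loop above can be sketched in a few lines. this is a minimal toy, not the real system: `score_pair` is a random stand-in for actually firing tasks at a model and rating the output, and the “mutate” step just tags a variant persona instead of doing anything clever.

```python
import random
from dataclasses import dataclass, field

@dataclass
class PersonaPair:
    persona: str
    model: str
    scores: list = field(default_factory=list)

    def avg(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

def score_pair(pair):
    # stand-in for a real eval: run a task through this persona+model,
    # then rate latency, cost, hallucination, acceptance, etc.
    return random.random()

def evolve(pairs, rounds=5, cull_frac=0.5):
    for _ in range(rounds):
        # fire tasks at every persona-model pair and log scores
        for p in pairs:
            p.scores.append(score_pair(p))
        # cull: drop the worst-performing fraction of pairs
        pairs.sort(key=lambda p: p.avg(), reverse=True)
        survivors = pairs[: max(1, int(len(pairs) * (1 - cull_frac)))]
        # mutate: spawn a tweaked variant of each survivor's persona
        variants = [PersonaPair(p.persona + "+v", p.model) for p in survivors]
        pairs = survivors + variants
    return pairs
```

swap `score_pair` for real task runs and the same skeleton becomes the “repeat, forever” engine: survivors compound, dead pairs disappear.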
the result isn’t “which model is best?” but way more granular: “this exact sarcastic researcher persona on this exact model is god‑tier at literature review, but mid at writing actual production code.” that’s a cooler asset than any benchmark screenshot.
Concrete stack and experiments
the fun part is that this doesn’t have to stay a thought experiment. i’ve got access to a few H100s through modal, so the current plan is something like:
spin up a big OSS model (something like a GPT‑scale 120B on H100, probably QN‑style) as the weird, chaotic member of the council.
plug in the usual suspects: gemini, claude (opus or sonnet 4.5), GPT 5.2 (sota‑tier speed demon), maybe others as they show up.
build a unified persona spec layer: one YAML / JSON definition that can be translated into prompts / system messages for each provider.
add aggressive logging: every run tagged with persona id, model id, task type, cost, latency, and outcome rating.
build a graph view: something that makes it obvious which persona+model combos are secretly carrying and which are just cosplaying competence.
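the persona spec layer and the logging could both be dead simple to start. the spec shape, field names, and log schema below are my own assumptions, not any provider’s format; the point is just one canonical definition that renders into a system message per provider, plus one tagged record per run.

```python
import json
import time

# hypothetical persona spec -- the shape here is an assumption, not a standard
PERSONA_SPEC = {
    "id": "junior-dev-unhinged-v1",
    "role": "unhinged but correct junior dev",
    "traits": ["overcomments everything", "admits uncertainty out loud"],
    "output_style": "code-heavy, casual",
}

def to_system_prompt(spec):
    """Render one persona spec into a provider-agnostic system message.

    Per-provider quirks (message roles, length limits) would layer on top.
    """
    traits = "; ".join(spec["traits"])
    return (f"You are {spec['role']}. Traits: {traits}. "
            f"Style: {spec['output_style']}.")

def log_run(persona_id, model_id, task_type, cost, latency_s, rating):
    """One tagged record per run -- an append-only JSONL file would do."""
    return json.dumps({
        "ts": time.time(),
        "persona": persona_id,
        "model": model_id,
        "task": task_type,
        "cost": cost,
        "latency_s": latency_s,
        "rating": rating,
    })
```

everything downstream (the graph view, the culling, the routing) just reads from that log, which is why tagging every run with persona id, model id, and task type matters from day one.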
the UI in my head is very pewds‑coded: a panel of persona cards, some glowing, some fading, some with “dead” stamped across them because they underperformed and got wiped. except behind that aesthetic is a real research engine discovering structure in model behavior.
Why this feels like “super”
the “super” in superLLM here isn’t “bigger model” or “more parameters”. it’s the meta‑brain that sits above them and learns how to use all of them over months and years of interaction with you.
over time, this thing could:
know your taste: maybe you like claude’s writing but gemini’s recall and GPT’s code; your council learns that routing.
automatically specialize: it notices that one persona on one model is cracked at debugging your specific stack, so it starts defaulting there.
adapt to model drift: when providers quietly update their models, the council metrics shift and personas get reassigned without you having to care.
become your personal AI lab: you’re running a continuous, private benchmark on “what actually works for my life”, not “what scores well on some static eval”.
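the “learns that routing” part doesn’t need anything fancy at first: rank the logged ratings per task type and route to whichever persona–model pair has the best track record. a hypothetical sketch over the kind of records the logging step would produce:

```python
from collections import defaultdict

def best_pair_per_task(records):
    """records: (persona, model, task_type, rating) tuples from the run log.

    Returns task_type -> (persona, model) with the highest mean rating,
    i.e. the routing table the council defaults to for new tasks.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (task, persona, model) -> [total, count]
    for persona, model, task, rating in records:
        s = sums[(task, persona, model)]
        s[0] += rating
        s[1] += 1
    best = {}
    for (task, persona, model), (total, n) in sums.items():
        mean = total / n
        if task not in best or mean > best[task][0]:
            best[task] = (mean, (persona, model))
    return {task: pair for task, (mean, pair) in best.items()}
```

model drift falls out of this for free: when a provider quietly updates a model, its ratings shift, the means move, and the routing table reassigns personas without you touching anything.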
this is also where it folds back into the hive mind idea. hive mind was “one external brain that sees everything and acts for you”. superLLM council is what happens when that external brain realizes it doesn’t have to be one brain at all. it can be a shifting parliament of specialized weirdos, curated by data instead of vibes.
The big futuristic angle
push this forward a few years and you get something kind of wild: every serious human has their own superLLM council, constantly evolving in the background. models come and go, providers rise and fall, but your council’s experience compounds.
new model drops? it gets A/B tested as just another worker in the council. if it’s cracked for a certain persona, it starts stealing traffic; if not, it fades.
your council becomes a kind of personal “model‑native OS”: everything you do with AI is routed, evaluated, and optimized by it.
people trade not prompts but council configs: “here’s my founder mode council setup”, “here’s my research brain”, “here’s my relationship therapist panel, don’t judge”.
and then the really cursed thought: multiple humans, multiple councils, all talking to each other. your superLLM doing deals with someone else’s, negotiating schedules, contracts, collabs, maybe even pricing between agent systems. the unit of interaction stops being “you vs app” and becomes “council vs council”.
money starts to look like just one primitive in that system: tokens flowing between agents that buy compute, data, and capabilities on your behalf. half the economy is just councils paying other councils so their humans don’t have to think about it.
Building vs just vibing
for now, my plan is way simpler: start the experiment. wire up the models, log everything, thrash personas, steal the best ideas from pewds’ persona recycling and karpathy’s council framing, and see what actually survives contact with reality.
some links if you want to see the core inspirations:
pewdiepie’s AI persona council video: https://www.youtube.com/watch?v=qw4fDU18RcU&t=624s
andrej karpathy talking about LLM councils: https://github.com/karpathy/llm-council
anthropic’s claude / claudius and orchestration ideas in their updates: https://www.youtube.com/watch?v=5KTHvKCrQ00
if jarvis was “one assistant that lives with you”, superLLM council is more like “an evolving studio of specialists that compete to serve you best”. and the fun part is that you don’t have to pick a side in the model wars anymore. you just run the tournament and let the council decide.