
AGI Will Not Feel Real Until It Has a Life

AGI will not feel real to me until a model can pick a hard problem and just live with it for a stupidly long time. Not just ace a benchmark, not just ship one cute weekend project, but actually grind on the same thing for months while remembering every tiny scar it picked up along the way.

What is funny is that the “intelligence” part is getting close on paper. There is already a formal AGI definition that basically lists the usual suspects: reasoning, working memory, long term memory, vision, audition, spatial navigation, all neatly itemized. Hand that checklist to GPT 6 or Claude Opus 6 in a couple of years and they will probably tick a terrifying number of boxes. Research coding, multi-step reasoning, reading ten papers and stitching them into something new, all of that is already emerging and will only get scarier.

But that still sidesteps the question that actually matters to me: can they stick with anything?

Memory is more than “just add tokens”

A lot of “AGI is close” hype quietly treats memory as a context window problem. If 128k tokens are good, 1M tokens must be “real memory.” The story is that if you can shove the whole history back into the prompt, the model “remembers” you.

Neuroscience people roll their eyes at this. They talk about episodic memory, which is about experiences stored as episodes in time, with the “what, where, and when” bound together. Think of a specific night out: who you were with, which table you sat at, the one dumb joke that derailed the conversation. That is not just text in a buffer, that is an event on your timeline.

Under the hood, the hippocampus is a big part of how brains do this. It encodes sequences of states and actions, helps you remember paths you took, and lets you replay those trajectories later to learn and plan. There are already hippocampal-inspired memory modules for agents that try to copy this trick: they store whole trajectories as compact episodes that can be replayed and reused, rather than just logging everything and hoping RAG can find it later.

When people say “memory is the last problem before AGI,” they usually mean something very specific:

Persistent long term memory that survives sessions, resets, and model updates.

Episodic memory so the system can say “last time I tried this kind of move with this user, it went badly, so I should adjust.”

Right now, even the best coding models feel like brilliant amnesiacs. They are hyper-competent inside one prompt and can be aggressively proactive on short tasks, but ask them to live with a codebase for six months and you start to see the cracks. They have whatever happens to be in the buffer, plus whatever a retrieval system can reconstruct, but not a felt timeline of “our project together.”

There is also a darker side to “just give agents episodic memory.” Strong, sticky personal memories make them better collaborators, but also better stalkers and better schemers. Some recent work argues that episodic memory in agents introduces new risks around long term tracking, privacy, and hard-to-audit internal histories, and that we should treat it as a safety-sensitive feature, not a free upgrade.
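Setting the safety question aside, here is a rough sketch of what “store whole trajectories as compact episodes” could look like in practice. Everything in it, the Episode record, the EpisodicMemory class, the keyword-overlap recall, is an invented toy for illustration, not one of the hippocampal-inspired modules from the literature; it only shows the what/where/when binding and the replay idea.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Episode:
    """One experience: what happened, where, when, and how it ended."""
    what: str                    # e.g. "migrated users table to UUID keys"
    where: str                   # e.g. "repo:billing-service"
    when: datetime
    steps: list[str] = field(default_factory=list)  # the trajectory of actions
    outcome: str = ""            # "worked", "rolled back", "broke prod", ...

class EpisodicMemory:
    """Toy episodic store: record episodes, recall similar ones, replay them."""

    def __init__(self):
        self.episodes: list[Episode] = []

    def record(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def recall(self, query: str, k: int = 3) -> list[Episode]:
        # Naive relevance: keyword overlap with the "what" field.
        # A real system would use embeddings plus temporal and spatial cues.
        words = set(query.lower().split())
        scored = sorted(
            self.episodes,
            key=lambda e: len(words & set(e.what.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def replay(self, episode: Episode) -> str:
        # "Replay" here just linearizes the trajectory so a planner or model
        # could condition on it before attempting a similar task.
        return f"{episode.when:%Y-%m-%d} @ {episode.where}: " + \
               " -> ".join(episode.steps) + f" => {episode.outcome}"

memory = EpisodicMemory()
memory.record(Episode(
    what="database migration to UUID keys",
    where="repo:billing-service",
    when=datetime(2025, 3, 2),
    steps=["wrote migration", "ran on staging", "locked table in prod", "rolled back"],
    outcome="rolled back, needed batched backfill",
))

for past in memory.recall("plan a new database migration"):
    print(memory.replay(past))
```

The point is not the retrieval trick. The point is that the unit of storage is an event on a timeline, what happened, where, when, and how it ended, which is exactly what a raw token buffer does not give you.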

Spatial intelligence: actually knowing where things are

There is another piece that feels underweighted in AGI discourse: spatial intelligence. Modern AGI definitions literally call out spatial navigation memory and related abilities as core capabilities, not nice-to-have extras. Neuroscience reviews keep pointing out that navigation and memory are deeply entangled in the brain. The same machinery that lets you find your way through a city also supports more abstract planning and learning.

Translated into human terms, spatial intelligence is things like:

Packing a suitcase so everything actually fits instead of rage-closing the zip.

Looking at your room and mentally simulating three different bed positions and two desk options.

Walking a new campus once and then just intuitively knowing which corridor to take next time.

Spatial AI and vision systems already do parts of this in narrow settings. They can build 3D maps from depth sensors, track objects and people over time, and estimate trajectories so robots and forklifts do not crash. On the model side, you now see papers on “spatial VLMs” and spatial benchmarks that probe whether vision-language models actually understand “behind,” “left of,” rotations, and 3D structure. Scores are improving, but simple shape rotation and packing puzzles still trip them up in ways that a bored child probably would not.

That gap matters more than we admit. Ask a frontier LLM “help me rearrange my tiny bedroom to fit a new desk,” and what you mostly get is vibe-checked advice: measure your furniture, do not block the door, keep light near the window. It does not actually see your room.

A spatially competent AGI should be able to:

Look at a quick 3D scan or a handful of photos of your space.

Build an internal model of the room with walls, doors, furniture, and walking paths.

Simulate layouts that keep the door clear, respect outlets, maintain sane walking routes, and match your preferences.

This is not about perfect robotics. It is about the system having a sense of place rather than floating above the world as pure text.
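To make “simulate layouts” slightly less hand-wavy, here is a toy sketch, assuming the room has already been reduced to a coarse 2D occupancy grid, which is itself the hard perception step. The grid size, the furniture footprints, and the breadth-first walkability check are all illustrative choices, not a real spatial-AI pipeline.

```python
from collections import deque

# Room as a coarse occupancy grid: 0 = free floor, 1 = blocked.
W, H = 8, 6                         # 8 x 6 cells, say 0.5 m per cell
DOOR = (0, 2)                       # door cell that must stay clear

def place(grid, x, y, w, h):
    """Mark a w x h furniture footprint as blocked; fail on overlap or overflow."""
    if x + w > W or y + h > H:
        return False
    cells = [(i, j) for i in range(x, x + w) for j in range(y, y + h)]
    if any(grid[j][i] for i, j in cells):
        return False
    for i, j in cells:
        grid[j][i] = 1
    return True

def reachable(grid, start, goal):
    """Breadth-first search over free cells: is there a walking path?"""
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < W and 0 <= ny < H and not grid[ny][nx] and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return False

def layout_ok(furniture):
    """furniture: list of (x, y, w, h) footprints. Valid if nothing overlaps,
    the door stays clear, and you can still walk from the door to the desk."""
    grid = [[0] * W for _ in range(H)]
    if not all(place(grid, *f) for f in furniture):
        return False
    if grid[DOOR[1]][DOOR[0]]:
        return False                # door blocked
    desk_front = (6, 1)             # cell in front of the desk
    return reachable(grid, DOOR, desk_front)

print(layout_ok([(2, 0, 3, 2),      # bed
                 (6, 0, 2, 1)]))    # desk
```

The interesting part is not the search. It is that constraints like “keep the door clear” and “keep a walking route to the desk” become checkable against an explicit model of the space instead of staying generic advice.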

Vision that is more than a bolted-on camera

Computer vision on its own is ridiculously mature. We have models for classification, detection, segmentation, pose estimation, optical flow, and more. That stack is already doing real work in face unlock, medical imaging, traffic monitoring, retail analytics, and industrial safety.

The more interesting frontier is spatial AI, where vision is fused with depth, mapping, and tracking to form a live world model. Cameras plus depth sensors feed into a system that builds a 3D map, labels objects, and keeps track of “who is where and what they are doing” over time.

You can feel the difference in small examples:

Narrow AI in a store is a self-checkout camera that catches a barcode mismatch.

More general store intelligence is an agent that watches flows for weeks, learns that queues always jam near one choke point, proposes a new layout, and runs simulations on how people would move under that layout.

Same at your desk: narrow AI is your webcam doing background blur or virtual eye contact. A more general assistant would learn how you physically and digitally behave over time. At 2 a.m., when a certain stack of apps opens and your posture slumps, it might infer “this is doom-scroll mode, productivity is about to tank.” With your consent, it could start nudging work into tomorrow, muting certain feeds, and telling you to go to bed.

None of this requires giving the AI a human body, but it does require making perception and spatial context first-class citizens instead of treating vision as a checkbox.
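Here is a deliberately tiny sketch of the “keeps track of who is where and what they are doing” part, assuming some upstream perception stack already produces labeled detections with positions and timestamps. The WorldModel class and its fields are invented for illustration; real spatial-AI stacks built on SLAM and multi-object tracking are far more involved.

```python
import math
import time
from dataclasses import dataclass

@dataclass
class Observation:
    """One detection from an upstream perception stack (assumed to exist)."""
    label: str             # "person:alice", "forklift_2", ...
    x: float
    y: float               # position in a shared floor-plan frame, metres
    t: float               # unix timestamp

class WorldModel:
    """Toy live world model: last known position and time for each tracked thing."""

    def __init__(self, stale_after_s: float = 30.0):
        self.tracks: dict[str, Observation] = {}
        self.stale_after_s = stale_after_s

    def update(self, obs: Observation) -> None:
        self.tracks[obs.label] = obs           # keep only the latest sighting

    def where_is(self, label: str) -> tuple[float, float] | None:
        obs = self.tracks.get(label)
        if obs is None or time.time() - obs.t > self.stale_after_s:
            return None                        # never seen, or track has gone stale
        return (obs.x, obs.y)

    def near(self, x: float, y: float, radius: float) -> list[str]:
        """Who or what is currently within `radius` metres of (x, y)?"""
        now = time.time()
        return [
            label for label, obs in self.tracks.items()
            if now - obs.t <= self.stale_after_s
            and math.hypot(obs.x - x, obs.y - y) <= radius
        ]

world = WorldModel()
world.update(Observation("person:alice", 2.0, 3.5, time.time()))
world.update(Observation("forklift_2", 2.5, 3.0, time.time()))
print(world.near(2.0, 3.0, radius=1.0))        # both show up near the choke point
```

Nothing about the class is clever; the shift is that the assistant’s answer to “what is going on right now” is a queryable state of the world, not whatever was last mentioned in chat.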

An AGI that actually lives with you

If you pull these threads together, you get a picture of “real” AGI that feels different from today’s chatbots.

A brain with memory that is more like a life:

Episodic memory modules that store experiences as sequences and let the system replay and remix them to solve new problems.

Long term memory that survives model updates and can reliably store and recall what it learned, not just what fits in the current context window.

A body, or at least a viewpoint, that understands space:

Spatial reasoning so the agent knows where it is, what is around it, and how changes will play out in 3D.

Spatial AI stacks that keep a live map of the environment and the people and objects in it, and that use this map to guide decisions.

An agent that can stay on one mission long enough to matter:

Architectures that integrate working memory, semantic memory, and episodic memory so the system can truly stick with a project over months instead of treating everything as a fresh prompt (sketched further below).

A personal timeline of attempts, failures, and small wins that actually changes how it behaves, the same way your past shapes your habits.

In code land, that gives you something more interesting than “Claude Code but bigger.” You get:

A model that has been co-developing your codebase for a year.

It remembers every weird migration, every refactor that blew up, every batch of user complaints as actual episodes, not just as archived logs.

It holds a mental map of your system architecture like a city map and can refactor it the way a city planner might reroute traffic.

It can connect your Git history, your tickets, and your physical work habits into one coherent picture of how you build things.
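To make the memory side of that picture concrete, here is one more rough sketch of how working, semantic, and episodic memory could sit together for a long-running coding agent. The class name, the fields, and the context assembly are all assumptions for illustration; this is a shape, not a real architecture.

```python
from collections import deque

class ProjectMemory:
    """Toy three-layer memory for an agent that lives with one project."""

    def __init__(self):
        self.working = deque(maxlen=20)   # what is on the desk right now
        self.semantic = {}                # stable facts: {"db": "postgres 15", ...}
        self.episodic = []                # (when, what happened, how it went)

    def note(self, item: str) -> None:
        self.working.append(item)

    def learn_fact(self, key: str, value: str) -> None:
        self.semantic[key] = value

    def log_episode(self, when: str, what: str, outcome: str) -> None:
        self.episodic.append((when, what, outcome))

    def context_for(self, task: str) -> str:
        """Assemble what a model would see before touching the task:
        current scratchpad, stable project facts, and relevant past episodes."""
        words = set(task.lower().split())
        relevant = [
            f"{when}: {what} -> {outcome}"
            for when, what, outcome in self.episodic
            if words & set(what.lower().split())
        ]
        return "\n".join([
            f"TASK: {task}",
            "FACTS: " + "; ".join(f"{k}={v}" for k, v in self.semantic.items()),
            "RECENT: " + "; ".join(self.working),
            "PAST EPISODES:\n" + "\n".join(relevant[-5:]),
        ])

mem = ProjectMemory()
mem.learn_fact("db", "postgres 15, single primary")
mem.log_episode("2025-03-02", "schema migration on billing tables", "locked prod, rolled back")
mem.note("user asked for a new reporting column")
print(mem.context_for("plan the next schema migration"))
```

The difference from “just add tokens” is that the fact table and the episodic log outlive any single context window, and the agent decides what to pull back in, rather than hoping the right history happens to still be in the buffer.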

At that point, calling it “autocomplete on steroids” starts to feel like a lie of comfort.

Until then, AGI still feels like an overachieving intern. Absurdly smart in the moment, but with no deep story, no grounded sense of space, and no real continuity of experience. The papers are already sketching what is missing: richer memory, stronger spatial intelligence, and more embodied vision. The open question is whether we will actually build systems that do not just talk like us, but also slowly accumulate something that looks uncomfortably like a life.
