Linda: Many of our other mammalian friends get born and immediately get up and run around. We don't. We sit there for three months looking and listening. That's what we do. Got to believe it's important, right?
[THEME MUSIC]
Abha Eli Phoboo: From the Santa Fe Institute, this is Complexity.
Melanie Mitchell: I’m Melanie Mitchell.
Abha: And I’m Abha Eli Phoboo.
[THEME MUSIC FADES OUT]
So far in this season, we’ve looked at intelligence from a few different angles, and it’s clear that AI systems and humans learn in very different ways. And there’s an argument to be made that if we just train AI to learn the way humans do, they’ll get closer to human-like intelligence.
But the interesting thing is, our own development is still a mystery that researchers are untangling. For an AI system like a large language model, the engineers that create them know, at least in principle, the structure of the algorithms and the data that’s being fed to them. With babies though, we’re still learning about how the raw ingredients come together in the first place.
Today, we’re going to try to look at the world through an infant’s eyes. We know that the information babies are absorbing is very different from an LLM’s early development. But how different is it? What are babies seeing, hearing, smelling, and feeling? How are they processing it? And how much does the difference between babies and machines matter?
Part One: What do babies see, and what do they know?
Mike Frank (00:58.787)
I'm Mike Frank. I'm a professor of psychology at Stanford, and I'm generally interested in how children learn. So how they go from being speechless, wordless babies to, just a few years later, kids that can navigate the world. And so the patterns of growth and change that support that is what fascinates me, and I tend to use larger data sets and new methodologies to investigate those questions.
Melanie Mitchell (02:21.363)
What are the major challenges of doing experiments on babies and young kids?
Mike Frank (02:28.334)
Well, we're in the era of big data where there are these web scale data sets that seem like they're super accessible, and the challenge of working with babies is that it's much more of a retail operation. You know, if you want to deal with the baby, you have to recruit that family, make contact with them, get their consent for research. And then the baby has to be in a good mood to be involved in a study or the child has to be willing to participate.
And so we work with families online and in person. We also go to local children's museums and local nursery schools. And so often for each of the data points that you see, at least in a traditional empirical study, that's hours of work by a skilled research assistant or a graduate student doing the recruitment, actually delivering the experience to the child.
Melanie Mitchell (03:23.048)
So how is it that children learn so much in the first few years of life? I mean, it seems like they learn a lot more than in any other part of life. Is that true? And how do they do it?
Mike Frank (03:38.523)
Well, there's a lot of definitional questions there, right? What does it mean to… you know, how do we quantify the amount that a kid is learning? So going from not being able to talk at all to being able to talk is pretty amazing. I mean, if you've ever studied another language, you know, started as an adult, there's this really amazing transformation of like, I can say a few words, like I can say, you know, Mary went to the store in this new language. And then there's this immense frustration of like, wow, there are just a world of details that I don't know how to say. I'm not going to know the word for anything in the hardware store. So in some sense, the transformations that kids go through are from not being able to do something to being able to do it. And that's just incredibly exciting and can be very revealing of the process and what it takes to do that learning, even if, as it turns out, the details are being worked out over the course of your entire lifetime. I mean, we're continuing to learn words forever as you continue to read and experience new things. As new concepts get defined, you learn the words for those. So what we're seeing in kids is really the beginnings of this lifelong learning process.
But before we can speak, and before we can even move, we spend a lot of time as babies doing… basically nothing. Or what looks like nothing.
Linda: Many of our other mammalian friends get born and immediately get up and run around. We don't. We sit there for three months looking and listening. That's what we do. Got to believe it's important, right?
This is Linda Smith.
Linda Smith (01:13.24)
Okay, so I'm Linda Smith and I'm a professor at Indiana University. I'm a developmental psychologist and what I am interested in and have been for a kind of long career is how infants break into language. And some people think that means that you just study language, but in fact, what babies can do with their bodies, how well they can control their bodies, determines how well they can control their attention and what the input is, what they do, how they handle objects, whether they emit vocalizations, all those things play a direct role in learning language. And so I take a kind of complex or multimodal system approach to trying to understand the cascades and how all these pieces come together.
Several decades ago, a popular method for studying babies was something called preferential looking studies. These studies involved researchers observing where babies looked, and for how long, when given a stimulus, like dropping an object nearby. In theory, this seems logical — if babies look at the object, then that means they’re paying attention to it, and we can draw a line from the stimulus to their internal cognition. Except, maybe not.
Linda Smith: I have done many, many in my life, okay? And everybody who does them knows what the data looks like, okay? It is always a mess and you're just on the edge of reliability. And young babies, this is just not straightforward. So I personally will never do a preferential looking study again ever. And everybody knows the data looks bad.
So instead of the traditional methods, in which researchers watch babies and record what they do, Linda helped pioneer a different approach: looking at what the babies are looking at, from their perspective. She and several other researchers started mounting cameras onto babies’ heads to capture the visual and audio input they got — the raw data they were absorbing.
Linda Smith (18:33.112)
I began putting head cameras on babies because people throughout my career, major theorists, have at various points made the point that all kinds of things were not learnable. Language wasn't learnable. Chomsky said that, basically. All this is not learnable. The only way you could possibly know it was for it to be a form of pre-wired knowledge. It seemed to me, even back in the 70s, my thoughts were: we are way smarter than that. And I should sure hope that if I was put on some mysterious world, in some matrix space or whatever, where the physics work differently, that I could figure it out. Okay. But we had no idea what the data are.
Mike has also done a lot of work collecting data from head-mounted cameras on young kids.
Melanie Mitchell (19:32.187)
What motivated you to take on this kind of data collection project?
Mike Frank (19:48.581)
Well, back when I was in grad school, people started working with this new method: they started putting cameras on kids' heads. So Pavan Sinha did it with his newborn and gave us this amazing rich look at what it looked like to be a newborn perceiving the visual world. And then pioneers like Linda Smith and Chen Yu and Karen Adolph and Dick Aslin and others started experimenting with the method and gathering these really exciting data sets that were maybe upending our view of what children's input looked like.
And that's really critical because if you're a learning scientist, if you're trying to figure out how learning works, you need to know what the inputs are as well as what the processes of learning are. So I got really excited about this. And when I started my lab at Stanford, I started learning a little bit of crafting and trying to build little devices. You know, we'd order cameras off the internet and then try to staple them onto camping headlamps or glue on a little aftermarket fisheye lens. We tried all these different little crafty solutions to get something that kids would enjoy wearing.
Melanie Mitchell (21:57.679)
So what are the most interesting things people have learned from this kind of data?
Mike Frank (22:03.441)
Well, as somebody interested in communication and social cognition and little babies, the discovery that really floored me, which I think belongs to Linda Smith and to her collaborators, was this: we'd been talking about gaze following and looking at people's faces for years, saying that human gaze and human faces were this incredibly rich source of information. And then when we looked at the head-mounted camera videos, babies actually didn't see faces that often, because they're lying there on the floor. They're crawling. They're really living in this world of knees.
Linda: Little babies spend about 70% of their time under three months where the input is big, simple edges going in the same direction.
Linda again.
Linda: Big, simple edges. Now, why is that? Well, it's all they can see, so they look at what they can see.
Mike: And so it turned out that when people were excited to, you know, spend time with the baby or to manipulate their attention, they would put their hands right in front of the baby's face and put some object right in the baby's face. And that's how they would be getting the child's attention or directing the child's attention or interacting with them. It's not that the baby would be looking way up there in the air to where the parent was and figuring out what the parent was looking at. So this idea of sharing attention through hands and through manipulating the baby's position and what's in front of the baby's face, that was really exciting and surprising as a discovery. And I think we've seen that borne out in the videos that we take in kids homes.
Linda: They wait till they get all the basic vision worked out before they can do anything else. The first three months define the period of faces: they recognize parents' faces. They become biased toward faces.
So while they’re looking and listening, we might want to ask, what do babies already have in their toolkit? Do they have certain innate knowledge or skills that help them learn more? Innateness has long been a controversial and contentious issue in developmental psychology. Linda doesn’t even think the notion makes sense.
Linda: What does it mean to be innate? I once had a discussion with Susan Carey (this was quite recent, so I think my characterization of her is going to be correct) in which the claim was made that as long as something was nearly universal and pretty hard to stop from happening, it was innate. Now, this would mean, for example, that… my fingers are innate. But from a developmental point of view, this is not true. Everything gets made in time and is part of a complex cascade from the molecular level on up. There are ways it can go wrong.
So it just is a question that doesn't make sense to me at all. You want to know how it's made. I'd like to give one example that I give in my Psych 1 course on this, because it's so clear. When baby rats are born, they look like little erasers. They actually can't control their bodies or move. They have no hair. But they manage, after being born, to immediately find their way to the ventral side of their mother so that they can suckle. It is absolutely, totally proper to call it innate and an instinct in the core knowledge sense. However, we actually know how this happens and how it can go wrong. There is a reflex in baby rats that, when babies are being pushed through the birth canal for birth, gets linked through learning to the smell of amniotic fluid. And they know it's a smell because scientists have gone in and made it mint-smelled or banana-smelled or whatever. And they know that that elicits it. And if you're not born through the birth canal, you don't have it. They also know it's the pressure of being squished, because they do little cesarean sections and either squish them to mint or squish them to banana. It's absolutely learned by one of the most fundamental learning processes, a form of classical conditioning, okay? But you would think because they all do it, okay? They all do it. So what does it mean to be innate? It's not a good question, okay? It's just not.
Human development is like an intricate, interlocking set of dominoes. Most of us start out with similar sets, but the dominoes might not fall the same way if the conditions are different, or if someone swipes a hand across one of the pathways and interrupts it. The dominoes are set up so that babies take in basic input first, like big, simple edges. And then they move on to faces, and from there they move on to more complex information.
Linda: I think what Mother Nature is doing is, it's taking the developing baby who's got to learn everything in language and vision and holding objects and sounds and everything, okay, and social relations and controlling self-regulation. It is taking them on a little walk through the solution space, okay, so they don't get lost somewhere. They make that walk. The data for training children has been curated by evolution.
As we know, one of the major distinctions between us humans and LLMs is that the input we learn from is totally different. To start, humans are, as we’ve mentioned before, interacting with the real world in all kinds of ways — looking, listening, smelling, tasting, and feeling. But let’s just focus on language for a moment, and how many words humans absorb before they start speaking. Here’s Mike again.
Mike: It's a trade secret of the companies how much training data they use, but the most recent systems are at least in the 10 trillion plus range of data. And just to calibrate that, you know, a five-year-old has maybe heard 60 million words. That'd be a reasonable estimate; that's kind of a high estimate for what a five-year-old has heard. So that's, you know, five to six orders of magnitude different. So the biggest thing that I think about a lot is how huge that difference is between what the child hears and what the language model needs to be trained on.
Kids are amazing learners. And I think by drawing attention to the relative differences in the amount of data that kids and LLMs get, that really highlights just how sophisticated their learning is.
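Mike's back-of-the-envelope comparison can be checked in a few lines. This sketch uses his rough round numbers, not measurements: training-set sizes are trade secrets, and 60 million words is described above as a high estimate for a five-year-old.

```python
import math

# Rough figures from Mike's estimates above (not measurements):
# LLMs train on "at least in the 10 trillion plus range" of words,
# while a five-year-old has heard maybe 60 million words.
llm_words = 10e12
child_words = 60e6

ratio = llm_words / child_words
print(f"LLM sees roughly {ratio:,.0f} times more language")
print(f"that's about {math.log10(ratio):.1f} orders of magnitude")
```

Running the numbers this way shows why Mike hedges between five and six orders of magnitude: the ratio is about 170,000, a bit over five.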
And from Linda’s point of view, quantity does not equal quality.
Linda Smith (36:57.752)
If you were to take a system and you were to train it on all the data, visual, auditory, language, whatever that you think is relevant, that you can grab your hands on, what would happen to knowledge? Would it get better over time? Just more data from what people have done or someone said or whatever? Would it get smarter over time? You know, you kind of think about the geocentric models of the, you know, planets and the sun going around Earth, right? There was nothing to stop people from continuing to, why would you just not think that forever? Every time you fed something into your AI system, there it would be. Everybody loves a geocentric model, okay? It's like great, okay? But it wasn't great. It was wrong, okay?
Melanie (39:48.001)
But I guess the question is: is it a matter of getting better data, or a better ordering of how you teach these systems, or is there something more fundamental missing?
Linda Smith (40:05.848)
Well, I don't think it's more fundamental, actually, okay. I think it's better data. I think it's multimodal data. I think it's data that is deeply in the real world, not in human interpretations of that real world, but deeply in the real world: data coming through the sensory systems. It's the raw data. It is not data that has gone through your biased, cultish views on who should or should not get funded in the mortgage, not biased by the worst elements on the web's view of what a woman should look like, not biased in all these ways. It's not been filtered through that information. It is raw, okay? It is raw.
In fact, Linda has quite a few thoughts on whether or not the project of large language models is really about intelligence. Especially when you compare their learning process to babies.
Linda: This is about money. Why would you use crap data? Why would you just pull anything off Reddit and put it in there? Why would you do that, okay? Again, would you raise your child this way? No.
That's not their task, to build intelligence. Their task is to build better search, to take over writing plays. You really don't want it used in any important case like building bridges or curing cancer or even diagnosing whether somebody should get a screening test or not. Right. You don't want to do that. So you're not really saying it's intelligent. It's like a very bad Wikipedia.
Melanie Mitchell (01:10:23.553)
I mean, believe me, there's many people who say it's intelligent.
Linda Smith (01:10:56.28)
Hey, you know, some of the things it did show, okay: it showed that for the sentence frames and syntactic properties of language, you don't need anything special, just a lot of data, okay? So it seems intelligent that way, okay? And it can answer things that are common and not too weird, you know, like how do I get my garage door open when I can't open it? Okay. It's pretty good. Okay. But so is YouTube. So, it's not science, okay? It's no more about science than Facebook was about social relations.
TK REFLECTION
So in Part One, we asked about the data — what are babies seeing? What are they hearing and smelling and feeling? But then, there’s also the baby. How do they absorb that input? What sticks and what doesn’t?
Part Two: How do babies learn?
Clearly, Linda sees the quality of real-life data as really, really important. But it’s not the only factor that influences how babies learn and develop intelligence.
Mike: Maybe it's actually the baby, not the data.
Mike Frank again.
Mike Frank (25:53.843)
This is right where the scientific question is for me, which is: what part of the child as a system, as a learning system or in their broader data ecosystem, makes the difference? And you could think, well, maybe it's the fact that they've got this rich visual input alongside the language. Maybe that's the really important thing. And then you'd have to grapple with the fact that just adding pictures to language models doesn't make them that much smarter. At least in the most recent commercial systems, adding pictures makes them cool and they can do things with pictures now, but they still make the same mistakes about reasoning about the physical world that they did before.
Maybe though it's having a body, maybe it's being able to move through space and intervene on the world to change things in the world. Maybe that's what makes the difference. Or maybe it's being a social creature interacting with other people who are structuring the world for you and teaching you about the world. That could be important. Or maybe it's the system itself. Maybe it's the baby and the baby has built in some concepts of objects and events and the agents, the people around them as social actors. And it's really those factors that make the difference.
And unsurprisingly, trying to train a baby by throwing a ton of language at it isn’t going to be very effective. Unlike large language models, babies and children are selective with their attention — they only care about what’s interesting to them, and what fits their needs.
Mike: The best context for supporting language is through things like reading or storytelling or singing. It's these interactive things that are engaging to the child, that are fun for the child and the caregiver, and that allow the child to explore and ask questions and so forth. All of that interaction has been shown in a variety of different studies to support language development. Just putting on word salad in the background isn't going to do that. The child won't attend to it. So don't train your baby like an LLM. Take care of them, and if you want to add some language to the mix, reading to them is always fun. It's a snuggly nice time. I love reading to my kids and I think it does help support their reading once they get to school.
Melanie Mitchell (33:53.041)
But don't read them Wikipedia pages.
Mike Frank (33:58.946)
Yeah, I knew a guy who read Proust to his six-month-old. And, you know, I think that's because he wanted to read Proust and he was bored while he was holding his six-month-old, and good for him. It was great. You know, I think he got to snuggle with his six-month-old, and the six-month-old probably liked hearing the sound of the words or something, you know, maybe to put them to sleep. And he enjoyed reading the Proust, but it's not because that's going to fundamentally make the kid, you know, become the next Proust.
And if you flip this logic the other way around and try to train a computer with the data babies are getting, it doesn’t go very well either.
Mike: If you train a computer vision model on ImageNet, which is this huge corpus of images that's kind of the de facto standard in computer vision, the models get really good. And that's because ImageNet is this curated set of images of objects, different classes and different poses, all set up to provide this really diverse and rich data set.
But if you go to videos from a child's home and train models on that, the video is all of the kid playing with the same truck, or, you know, there's only one dog in the house. And then you try to get that model to recognize all the dogs in the world, and it's like, no, that's not the dog. So that's a very different thing, right? So the data that kids get is both deeper and richer in some ways and also much less diverse in other ways. And yet their visual system is still remarkably good at recognizing a dog, even when they've only seen one or two. So that kind of really quick learning and rapid generalization to the appropriate class, that's something that we're still struggling with in computer vision. And I think the same thing is true in language learning.
In our first episode, we heard a clip of Alison Gopnik’s one-year old grandson experimenting with a xylophone — it’s a really interactive kind of learning, where the child is controlling and creating the data, and then they’re able to generalize to other instruments and experiences. And when it comes to the stuff that babies care about most, they might only need to experience something once for it to stay with them.
Linda Smith (47:30.328)
Abha, can I ask you a question? Do you have a child?
Abha (01:06:10.542)
Yes I do, I have a house full of toys too. She's four.
Linda Smith (01:06:13.56)
How old is this child? Four, okay. Remember back, okay? I don't know what kind of mother you are, so I might not be using the best perfect example, but what is the chance that you took this child for the very first time to McDonald's and they understood how it worked immediately, forever after, and could even generalize to Arby's or some other place? It's one-shot learning. They don't need multiple episodes, because it's all interconnected and interpredictive. Similarly, what child has had their first evening of pizza? This usually happens in all these kids we study somewhere around 14 months. And never forgets pizza or what the word pizza means. It's a one-shot event, okay? A permanent memory, okay?
I think understanding how the structure of episodes, with all these predictive and associative relations, builds robust memories immediately is a really key factor that people have not thought about. Notice again, this is kind of like a data question. You don't want to scramble everything and just massively throw it in. There's an area I want to know more about, which is that some AI models are both extracting and learning from sort of graph-theoretic representations, and they strike me as potentially a path. I don't know enough about them. I just know they exist.
In our previous episode about LLMs, we touched on the fact that we humans can update our representations and beliefs in real time. If we learn that something is wrong, we can change our approach. And for Linda, this ability to constantly check against reality and adjust is the other key factor that makes babies and children superior learners to large language models, which don’t have these internal checks — they need humans to tell them when something is wrong.
Linda: You know, and then there's the question of when do you realize that something just isn't good enough, even if it sort of works, like the geocentric models of the solar system, right? Okay, so how do you know? Okay, so, anyway, I think there's a lot to be solved there. And they don't have internal evaluation systems of how they're doing that say, whoa, that shouldn't be so hard.
And finally, it’s worth pointing out that even though it takes babies quite a while to learn how to use language, they actually do figure out how to communicate pretty effectively much earlier.
Mike Frank (16:52.291)
A six month old can communicate. They can communicate very well about their basic needs. They can transfer information to other people. There's even some experimental evidence that they can understand a little bit about the intentions of the other people and understand some rudiments of what it means to have a signal to get somebody's attention or to get them to do something. So they actually can be quite good at communication.
So communication and language are two different things. Communication enables language and is at the heart of language, but you don't have to know a language in order to be able to communicate. LLMs do not start with communicative ability. LLMs are, in the most basic, you know, standard architectures, prediction engines. They are trying to optimize their prediction of the next word. And then of course we layer on lots of other fine-tuning and reinforcement learning with human feedback, these techniques for changing their behavior to match other goals, but they really start basically as predictors. And it is one of the most astonishing parts about the LLM revolution that you get some communicative behaviors out of very large versions of these models.
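The "prediction engine" idea Mike describes can be made concrete with a toy sketch. This is not an LLM architecture, just an illustration of the next-word objective: count which word follows which in a tiny made-up corpus, then predict the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy next-word predictor over a hypothetical mini-corpus.
# Real LLMs pursue the same next-word objective with neural
# networks trained on trillions of words.
corpus = "the baby sees the dog and the baby hears a dog bark".split()

# Count, for each word, how often each other word follows it.
successors = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    successors[word][nxt] += 1

def predict(word):
    """Return the most frequent successor seen in training, or None."""
    counts = successors.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict("the"))  # "baby": it followed "the" twice, "dog" once
```

Nothing in this counting scheme knows anything about intentions or communication, which is why, as Mike says, it's astonishing that communicative behaviors emerge at all from scaled-up versions of the same objective.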
So, if there’s one thing we can take away from Linda and Mike, it’s that babies can actually do much more than we usually give them credit for.
Abha (56:37.167)
I'm wondering how, you know, when you were learning developmental psychology and you were doing experiments, a personal question here, but how old are your kids?
Mike Frank (56:49.208)
My son is five and my daughter is eleven.
Abha (56:51.971)
Okay, I'm just wondering like how it changed the way you looked at how they were interacting with the world.
Mike: It's just humbling and fascinating to see kids grow and change. We can sometimes make generalizations about what kids are doing, but understanding any individual kid's behavior in the moment is certainly beyond science. But what it did is it attuned me to tiny little aspects of their behavior, those first little signs of word comprehension, the ways they're using language, the ways their language grew. I'm always one to say that kids are underrated. Kids are amazing learners. And I think by drawing attention to the relative differences in the amount of data that kids and LLMs get, that really highlights just how sophisticated their learning is.
TK WRAP UP DISCUSSION
Trying to compare the intelligence of humans and computers can often feel like trying to compare apples to oranges — or apples to nothing, if, like Linda, you don’t think LLMs exhibit any real intelligence at all. We’ve seen how trying to use tests that are typically used on humans, like the Sally Anne test we talked about in our last episode, don’t always give us the answers we’re looking for. And then it gets even trickier if you try to look at other beings, like animals. So, what kinds of ways should we try to measure intelligence? In our next episode, we’ll look more closely at how we assess intelligence, and if we’re asking the right questions.
[TKTK TEASE AUDIO FROM NEXT EPISODE]
Melanie: That’s next time, on Complexity.
Complexity is the official podcast of the Santa Fe Institute. This episode was produced by Katherine Moncure, and our theme song is by Mitch Mignano. Additional music from Blue Dot Sessions. I’m Melanie, thanks for listening.
________________________________________
ADDITIONAL SELECTS
Sometimes in the AI literature, people talk about models being pre-trained. That is, you do training ahead of time, and then you apply them, and maybe you fine-tune them to a particular task, or you retrain a little bit for the particular task. That distinction, I think, is actually really helpful for thinking about evolution versus development. That is, in some sense, mammalian brains, and human brains in particular for our niche, are pre-trained by evolution. Now that pre-training gets compressed down by this bottleneck because you can't just hand off the full model. You can't say, my kid is going to have all the right object recognition tools and all the representations. But genetics can pre-train in certain ways the architecture, compress that down and then recreate it so that it's better prepared to learn. And there's really exciting theoretical work that tries to look at that trade-off between evolutionary genetic pre-training and the learning that happens during development. So that's a theoretical approach to the kind of question that Linda's posing that I think is really promising and exciting from my perspective.
Babies start with communication and then move to language — LLMs go the other way
So that's really remarkable and I think it's true. I think you can see pretty good evidence that they are engaging in things that we would call communicative. Does that mean they fundamentally understand human beings? I don't know and I think that's pretty tough to demonstrate. But they engage in the kinds of reasoning about others' goals and intentions that we look for in children. But they only do that when they've got 500 billion words or a trillion words of input. So they don't start with communication and then move to language the way we think babies do. They start with predicting whatever it is that they are given as input, which in the case of LLMs is language. And then astonishingly, they appear to extract some higher level generalizations that help them manifest communicative behaviors.