Transcript Generated by Easy Cloud AI’s Beluga
What’s up, brother? How are you? Good to see you, my friend. Good to see you. Hey, what have your people done? Your AI people, with this ChatGPT. This scares me. It’s your people. What do you mean, your AI people? Your AI people. Your wacky coders. What have you done? Yeah, it’s super interesting.
Fascinating. Language models, I don’t know if you know what those are, but that’s the general kind of system that underlies ChatGPT and GPT. They’ve been progressing aggressively over the past maybe four years. There’s been a lot of development: GPT-1, GPT-2, GPT-3, GPT-3.5. And with ChatGPT there’s a lot of interesting technical stuff that maybe we don’t want to get into.
Sure, let’s get into it. I’m fascinated by it. So ChatGPT is fundamentally based on a 175-billion-parameter neural network, GPT-3, and the rest is what data it’s trained on and how it’s trained. So you already have a brain, a giant neural network, and it’s just trained in different ways. GPT-3 came out about two years ago, and it was impressive but dumb in a lot of ways.
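To give a sense of the scale of a 175-billion-parameter network, here is a rough back-of-the-envelope sketch in Python. It only counts the storage for the weights themselves and assumes a fixed number of bytes per parameter (4 for 32-bit floats, 2 for 16-bit, 1 for 8-bit); how GPT-3 is actually stored and served isn’t stated in this conversation.

```python
# Back-of-the-envelope storage for the weights of a 175-billion-parameter network.
# Assumption: every parameter takes a fixed number of bytes; real deployments vary.
NUM_PARAMETERS = 175_000_000_000


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the storage needed for num_params weights, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for label, nbytes in [("32-bit", 4), ("16-bit", 2), ("8-bit", 1)]:
    print(f"{label} weights: {weight_memory_gb(NUM_PARAMETERS, nbytes):.0f} GB")
```

At 16 bits per weight this comes to 350 GB just to hold the parameters, before counting anything extra that training needs.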
You would expect, as a human being, for it to generate certain kinds of text, and it was saying kind of dumb things that were off. You know, like, all right, this is really impressive, but it’s not quite there; you can tell it’s not intelligent. And what they did with GPT-3.5 is they started adding more and different kinds of data sets.
One of them, probably the smartest neural network currently, is Codex, which is fine-tuned for programming. It was trained on code, on programming code. And when you train on programming code, as with ChatGPT, you’re also teaching it something like reasoning, because it’s no longer just information and knowledge from the internet. It’s also reasoning. It’s like logic, even though you’re looking at code, programming code. You’re looking at me like, oh Jesus.
What is he talking about? No, no, no, that’s not what I’m looking at. I’m looking at you like, oh my God. But reasoning: in order to stitch together sentences that make sense, you not only need to know the facts that underlie those sentences, you also have to be able to reason. And we take it for granted as human beings that we can do some common-sense reasoning.
Like, this war started at this date and ended at this date; therefore the start and the end have a meaning. There’s a temporal consistency. There’s cause and effect. All of those things are inside programming code. By the way, a lot of the stuff I’m saying we still don’t understand; we’re intuiting why this works so well.
Really? These are the intuitions. Yeah, there’s a lot of stuff that’s not clear. So, GPT-3.5, which ChatGPT is likely based on. There’s no paper yet, so I don’t know exactly the details. But it was trained on code and more data, which is able to give it some reasoning. Then, and this is really important, it was fine-tuned in a supervised way by human labeling.
A small data set, made by human labeling, of here’s what we would like this network to generate. Here’s the stuff that makes sense. Here’s the kind of dialogue that makes sense. Here’s the kind of answers to questions that make sense. It’s basically pointing this giant Titanic of a neural network in the right direction, one that aligns with the way human beings think and talk.
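The supervised step can be pictured as ordinary next-token training on human-written demonstrations: the model is scored on how much probability it gives to each token the labeler actually wrote, and the weights are nudged to lower that score. The Python sketch below only computes that score (the average negative log-likelihood) for one demonstration; the probability tables and tokens are hypothetical toy values, and the details of OpenAI’s actual setup aren’t given in this conversation.

```python
import math


def demonstration_nll(step_probs, target_tokens):
    """Average negative log-likelihood of one human-written demonstration.

    step_probs[i] maps candidate tokens to the model's probability at position i;
    target_tokens[i] is the token the human labeler actually wrote there.
    Supervised fine-tuning adjusts the weights to make this number smaller.
    """
    if len(step_probs) != len(target_tokens):
        raise ValueError("need one probability table per target token")
    total = 0.0
    for probs, token in zip(step_probs, target_tokens):
        total += -math.log(probs[token])
    return total / len(target_tokens)


# Hypothetical toy example: the labeler wrote the two-token reply "Hello there".
probs = [
    {"Hello": 0.5, "Hi": 0.5},
    {"there": 0.25, "friend": 0.75},
]
print(demonstration_nll(probs, ["Hello", "there"]))  # roughly 1.04
```

If the model put more probability on "there", the score would drop, which is the sense in which the labeled examples point the network toward the answers humans want.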
So it’s not just using the giant wisdom of Wikipedia, and I can talk about what data sets it’s trained on, but basically the internet. It was pointed in the wrong direction. So this supervised labeling allows it to point in the right direction, so when it says shit, you’re like, holy shit, that’s pretty smart. So that’s the alignment.
And then they did something really interesting, which is using reinforcement learning based on labeled data from humans, and that’s quite a large data set. The task is the following. You have this smart GPT-3.5 thing generate a bunch of text, and humans label which one seems the best. So, ranking. Like, you ask it a question. For example, you say, generate a joke in the style of Joe Rogan.
And you have labelers. They have five options. And the labelers, I don’t know how exactly, but you get them to rank. The human labelers are just sitting there. There’s a very large number of them. They’re working full time. They’re labeling the ranking of the outputs of this model.
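One common way to use rankings like these, described in OpenAI’s published InstructGPT work rather than confirmed here for ChatGPT, is to split each ranking into pairwise comparisons and train a separate reward model to score the preferred output higher. The Python sketch below shows only that bookkeeping and the pairwise loss; the five joke candidates and their reward scores are hypothetical values, and the reward model itself is left out.

```python
import itertools
import math


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first item of each pair is
    always the one the labeler ranked higher. K outputs give K*(K-1)/2 pairs.
    """
    return list(itertools.combinations(ranked_outputs, 2))


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)), computed stably.

    Small when the reward model already scores the preferred output higher,
    large when it has the order backwards.
    """
    diff = reward_preferred - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical example: a labeler ranks five joke candidates from best to worst,
# and a (hypothetical) reward model has assigned each one a score.
ranking = ["joke_c", "joke_a", "joke_e", "joke_b", "joke_d"]
scores = {"joke_a": 1.2, "joke_b": -0.3, "joke_c": 2.0, "joke_d": -1.0, "joke_e": 0.4}

pairs = ranking_to_pairs(ranking)  # 10 comparisons from 5 ranked outputs
average_loss = sum(pairwise_ranking_loss(scores[w], scores[l]) for w, l in pairs) / len(pairs)
print(len(pairs), round(average_loss, 3))
```

Splitting a ranking into pairs means one labeler’s ordering of five outputs yields ten training comparisons, which is part of why ranking is a cheap way to collect human preferences.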
And that kind of ranking, used together with the technique of reinforcement learning, is able to get this thing to generate output that’s very impressive to humans. So there’s not actually a significant breakthrough in how much knowledge was learned. That was already in GPT-3. And there were much more impressive models already trained. So it’s on the way.
Not just OpenAI. But with this kind of fine-tuning, as it’s called, by human labelers plus reinforcement learning, you start to get to where students don’t have to write essays in high school anymore. Yeah. And style transfer, like I said: do a Louis C.K. style, a Joe Rogan style. And it does an incredible job at those kinds of style transfers.
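In the reinforcement learning stage, the reward model’s score is what the language model is trained to push up. The InstructGPT paper also subtracts a penalty for drifting too far from the supervised model, to keep it from over-optimizing the reward model. The function below is a minimal Python sketch of that per-response reward, assuming that formulation; whether ChatGPT uses exactly this isn’t stated here, the `beta` value is a hypothetical placeholder, and the policy-optimization algorithm that consumes the reward (PPO in that paper) is not shown.

```python
def penalized_reward(
    reward_model_score: float,
    logprob_policy: float,
    logprob_reference: float,
    beta: float = 0.02,  # hypothetical penalty strength, not a published ChatGPT value
) -> float:
    """Reward for one generated response during the RL stage.

    logprob_policy:    log-probability the model being trained assigns to the response.
    logprob_reference: log-probability the frozen supervised model assigns to it.
    Their difference is a per-sample estimate of how far training has drifted;
    beta scales how strongly that drift is penalized.
    """
    drift = logprob_policy - logprob_reference
    return reward_model_score - beta * drift


# Hypothetical numbers: both responses get the same reward-model score, but the
# second is far more likely under the trained model than under the reference,
# so its reward is reduced.
print(penalized_reward(1.5, -12.0, -12.0))
print(penalized_reward(1.5, -2.0, -12.0))
```

The penalty is the design choice that keeps the "Titanic" from being steered anywhere the reward model happens to like; without it, the network could learn odd text that scores well but reads badly.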
You can more accurately query things about different historical events, all that kind of stuff. Holy shit, man. The idea that you don’t exactly know why it works the way it works. That’s too close to human. It’s too close to human thinking. Like, you know what, this is eerily similar to the plot of Ex Machina, when he’s talking about how he coded the brain.
Do you remember that plot, that scene? That scene when... Yeah. No, the gentleman who’s the... what’s the gentleman’s name, the actor? That dude’s bad. Really good, really good. Oscar Isaac. Great casting. He’s amazing. Alex Garland, the director. Somebody I’ve got to see, right? Yeah, no, that movie was one of... it’s one of my top 10s.
I love that movie, but that scene where he’s... This is below John Wick one, two, and three. Well, three, I’m not a fan of three. Three didn’t have any muscle cars. Still worse than Central woman gone. Which one, John Wick three or one? One, all day. How dare you. All of them. They’re silly movies, man. Yeah, you ever watch them when you’re on a treadmill, though?
No, I don’t. I don’t. Yeah. It’s constant action. You ever watch them a hundred times? Which apparently you have. Well, I was trying to win a bet. All right. You know, Rocky is better for that, I think. Really? I’m a sucker for Rocky, the whole... all of it, the whole soundtrack, the... I can’t get over the bad fight scenes.
All the bad fight scenes. I can’t; my disconnect won’t allow that. Have you seen the montages recently? No. Cheesy as hell, but they still work. Because he’s doing kind of fitness, he’s doing, he’s doing like pull-ups, and like, he’s doing the silliest of stuff. Even Drago, it’s... anyway, it’s just... there’s so much corniness in the actual physical confrontations.
Sure. Like, as an analyst, you know, come on, it doesn’t work like that. Which is the interesting thing about Ex Machina for me, as somebody who knows about AI and robotics. And the corny signal doesn’t. What is this? So this is the one where he’s in Russia. Yeah. Old-school training. Running in the snow, jogging in the snow.