Tesla Bot
At Tesla's AI Day event, Elon Musk revealed the Tesla Bot and said it would be real, with a prototype targeted for 2022. His reasoning: if you think about what Tesla is already doing with cars, it is arguably the biggest robotics company in the world, because its cars are essentially semi-sentient robots on wheels, running neural nets that recognize the world and understand how to navigate through it. Given that, it makes sense to put the same capability into a humanoid form. Tesla is also quite good at sensors, batteries, and actuators, so Musk said there would probably be a prototype sometime next year that basically looks like the design shown on stage.
The bot is designed to navigate a world built for humans and to eliminate dangerous, repetitive, and boring tasks. It is also being set up so that, at a mechanical and physical level, you can run away from it and most likely overpower it; hopefully that never becomes necessary, but you never know. The bot stands around 5'8", with a screen where the face would be for useful information, and otherwise it essentially carries the Autopilot system, including the eight-camera setup used in the cars. The broader message of the event was that Tesla is much more than an electric car company.
Tesla has deep AI activity at the hardware level, at the inference level, and at the training level, and considers itself arguably the leader in real-world AI, that is, AI as it applies to the physical world. Anyone who has seen the Full Self-Driving beta can appreciate the rate at which Tesla's neural nets are learning to drive. The presentation illustrated the pipeline: raw camera inputs feed into a stack of neural networks that process them into a vector-space representation of the surroundings, and parts of that vector space are what you see rendered in the instrument cluster.
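To make the idea concrete, here is a minimal sketch of that kind of pipeline in PyTorch: several camera images go through one shared backbone, and the features are fused into a single bird's-eye "vector space" grid. The module names, layer sizes, and the toy fusion step are placeholders of my own, not Tesla's architecture.

```python
import torch
import torch.nn as nn

class VectorSpacePipeline(nn.Module):
    """Illustrative sketch: 8 camera images -> shared backbone -> fused
    bird's-eye-view feature map. All dimensions are invented."""

    def __init__(self, num_cameras: int = 8, feat_dim: int = 256, bev_size: int = 64):
        super().__init__()
        # One backbone shared across all cameras (weight sharing).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Toy "fusion" that maps stacked camera features onto a BEV grid.
        self.to_bev = nn.Conv2d(num_cameras * feat_dim, feat_dim, kernel_size=1)
        self.bev_size = bev_size

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, cameras, 3, H, W)
        b, n, c, h, w = images.shape
        feats = self.backbone(images.view(b * n, c, h, w))   # per-camera features
        feats = feats.view(b, -1, *feats.shape[-2:])         # stack cameras on channels
        feats = nn.functional.interpolate(feats, size=(self.bev_size, self.bev_size))
        return self.to_bev(feats)                            # (b, feat_dim, bev, bev)

bev = VectorSpacePipeline()(torch.randn(1, 8, 3, 128, 256))
print(bev.shape)  # torch.Size([1, 256, 64, 64])
```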
That is, we’re effectively building a synthetic animal from the ground up, to understand the car as an animal that moves around it, senses the environment, and you know, that Works autonomously and intelligently, and we are building all the components from scratch in-house so we are of course building all the mechanical components of the body.
The nervous system that is all electrical components, and for our purposes the brain of the autopilot and the synthetic visual cortex in particular for this section we are just processing the individual image, and we are making a large number of predictions about these images. For example, here you can see the predictions of stop signals.
stop lines, edges, cars, uh traffic lights, curbs uh or car not parked, all stationary objects like garbage cans cones and so on and here everything is coming out of the net, in this case the hydra is out of the net, so This was all well and great, but as we worked towards FSD, it was quickly found that this was not enough.
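The HydraNet pattern named here is a shared trunk with many small task-specific heads, so the expensive features are computed once and reused by every prediction task. Below is an illustrative sketch of that pattern; the task list and layer sizes are placeholders, not Tesla's actual network.

```python
import torch
import torch.nn as nn

class HydraNetSketch(nn.Module):
    """Shared trunk, one lightweight head per prediction task
    (the HydraNet pattern described in the talk; sizes are placeholders)."""

    def __init__(self, tasks=("stop_lines", "curbs", "traffic_lights", "vehicles")):
        super().__init__()
        self.trunk = nn.Sequential(  # shared features, computed once per image
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One small decoder per task; all heads reuse the same trunk output.
        self.heads = nn.ModuleDict({t: nn.Conv2d(128, 1, kernel_size=1) for t in tasks})

    def forward(self, image: torch.Tensor) -> dict:
        shared = self.trunk(image)
        return {task: head(shared) for task, head in self.heads.items()}

outs = HydraNetSketch()(torch.randn(1, 3, 128, 128))
print({k: tuple(v.shape) for k, v in outs.items()})
```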
Where it first started to break down was Smart Summon. The presentation showed curb detection predictions for each of the cameras: the car has to wind its way around a parking lot to find the person who is summoning it. The problem is that you cannot drive directly on image-space predictions; you need to cast them out of the image and form a vector space around the car. Tesla first attempted this with hand-written C++ code, building what was called at the time the occupancy tracker.
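The core geometric step in casting image-space detections into a vector space can be shown with a toy flat-ground unprojection: intersect the ray through each detected pixel with the ground plane. The real occupancy tracker was far more involved (full extrinsics, distortion, cross-camera and temporal fusion); the intrinsics and camera height below are assumed values for illustration.

```python
import numpy as np

def image_to_ground(pixels: np.ndarray, K: np.ndarray, cam_height: float) -> np.ndarray:
    """Toy flat-ground unprojection: cast a ray through each pixel and
    intersect it with the ground plane below a forward-looking camera
    mounted at `cam_height`. Convention: x right, y down, z forward.
    Real systems need much more than this; it only shows the geometry."""
    ones = np.ones((pixels.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T  # pixel -> camera ray
    scale = cam_height / rays[:, 1]   # scale each ray until it reaches the ground
    forward = rays[:, 2] * scale      # distance ahead of the car (meters)
    lateral = rays[:, 0] * scale      # offset to the side (meters)
    return np.stack([forward, lateral], axis=1)

K = np.array([[1000.0, 0.0, 640.0],   # assumed pinhole intrinsics
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
curb_pixels = np.array([[640.0, 500.0], [800.0, 560.0]])
print(image_to_ground(curb_pixels, K, cam_height=1.5))
```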
Curb detections from the individual images were stitched together across camera boundaries and over time, and two major problems emerged. First, it quickly became clear that tuning the occupancy tracker and all of its hyperparameters was extremely complicated. Second, image space is the wrong output space: if a camera only sees a small part of a car, the network cannot predict that car well, and the predictions come out in poor shape. A multi-camera network does not have that problem. In another clip, from a more routine situation, cars crossing camera boundaries in a tight space inject a lot of jitter into the predictions, and the whole setup breaks down, especially for very large vehicles. Multi-camera networks struggle far less with these kinds of predictions.
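One way a multi-camera network can avoid the stitching problem is to fuse per-camera features inside the network, for example with learned bird's-eye-view queries cross-attending over all camera features at once. The sketch below shows that general idea only; the layer choices and dimensions are my assumptions, not the network presented on stage.

```python
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    """Learned BEV queries attend over flattened multi-camera features,
    so objects spanning camera boundaries are fused inside the network
    instead of being stitched afterwards. Sizes are illustrative."""

    def __init__(self, dim: int = 128, bev_cells: int = 32 * 32, heads: int = 4):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_cells, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cam_feats: torch.Tensor) -> torch.Tensor:
        # cam_feats: (batch, cameras, tokens, dim) -> one flat token sequence
        b = cam_feats.shape[0]
        kv = cam_feats.flatten(1, 2)                         # (b, cams*tokens, dim)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)  # (b, bev_cells, dim)
        fused, _ = self.attn(q, kv, kv)
        return fused                                         # (b, bev_cells, dim)

fused = BEVCrossAttention()(torch.randn(2, 8, 64, 128))
print(fused.shape)  # torch.Size([2, 1024, 128])
```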
The presentation then showed vector-space predictions: road boundaries in red, intersection areas in blue, road centers, and so on, with only some of the predictions displayed to keep the scene clear. This is done by a spatial RNN, and only a single clip was shown; the same machinery also produces excellent kinematic labels for training.
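A spatial RNN keeps its hidden state as a two-dimensional map of the space around the car and updates it frame by frame from new features. The talk did not spell out the exact cell, so the sketch below uses a simple conv-gated update as a stand-in.

```python
import torch
import torch.nn as nn

class SpatialRNNCell(nn.Module):
    """Sketch of one spatial RNN step: the hidden state is a 2-D bird's-eye
    grid, updated each frame from new BEV features with a conv-gated rule
    (a ConvGRU-style cell; the actual cell was not specified in the talk)."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.gate = nn.Conv2d(2 * dim, dim, 3, padding=1)  # update gate
        self.cand = nn.Conv2d(2 * dim, dim, 3, padding=1)  # candidate state

    def forward(self, bev_feat: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        both = torch.cat([bev_feat, hidden], dim=1)
        z = torch.sigmoid(self.gate(both))                 # how much to update
        h_new = torch.tanh(self.cand(both))
        return (1 - z) * hidden + z * h_new                # blended spatial memory

cell = SpatialRNNCell()
hidden = torch.zeros(1, 64, 32, 32)                        # persistent BEV memory
for _ in range(5):                                         # five video frames
    hidden = cell(torch.randn(1, 64, 32, 32), hidden)
print(hidden.shape)
```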
Putting it all together on the hardware side, the result is Tesla's training-optimized chip, the D1. It was designed entirely in-house by the Tesla team, all the way from the architecture to GDS out, and it delivers GPU-level compute with CPU-level flexibility. The chips are assembled into training tiles whose compute plane is completely orthogonal to the power supply and cooling, which makes a high-bandwidth compute plane possible. Each training tile provides nine petaflops and becomes the unit of scale for the system, and, as the presenters emphasized, it is real.
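The nine-petaflops figure follows from the publicly stated numbers at the event: 362 BF16 teraflops per D1 chip and 25 chips per training tile. A quick check:

```python
# Publicly stated AI Day figures: 362 BF16 TFLOPS per D1 chip, 25 chips per tile.
tflops_per_d1 = 362
chips_per_tile = 25
print(f"{tflops_per_d1 * chips_per_tile / 1000:.2f} PFLOPS per training tile")  # 9.05
```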