Abstract

When I first took notice of him, I was deep in machine learning, and I really wanted to put a brain on this thing, productize it, take it to market, scale up manufacturing, and so on. We hit it off super fast. We thought the exact same way about how we wanted these things to work.
We did not want to compromise anywhere. We set the vision as making these biomimetic superintelligent androids, not settling for anything less than that, and going as close to one-to-one with the human body as possible. Łukasz was leveraging principles of biomechanics that he had been studying for years to make these prototypes. From my own research, looking at various classes of artificial muscles, it was clear that there was nothing better than McKibben muscles.
While I was looking for something that could be a purely electrically actuated fiber, or something like that, nothing really came close to the performance of an actuator made from a rubber tube surrounded by some kind of fiber-braided sleeve. That is just the basis of the actuator, and from there you can take it very far, much farther than people would imagine.
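To give a rough sense of why this simple construction performs so well, the standard idealized static model of a McKibben muscle (the Chou–Hannaford formulation; the numbers and parameter names below are my illustrative assumptions, not Clone's specifications) relates output force to bladder pressure, braid diameter, braid angle, and contraction:

```python
import math

def mckibben_force(pressure_pa, d0_m, theta0_rad, contraction):
    """Idealized static force of a McKibben muscle (Chou-Hannaford model).

    pressure_pa  -- gauge pressure inside the rubber bladder [Pa]
    d0_m         -- initial diameter of the braided sleeve [m]
    theta0_rad   -- initial braid angle from the long axis [rad]
    contraction  -- fractional contraction (L0 - L) / L0
    """
    a = 3.0 / math.tan(theta0_rad) ** 2
    b = 1.0 / math.sin(theta0_rad) ** 2
    return (pressure_pa * math.pi * d0_m ** 2 / 4.0) * (
        a * (1.0 - contraction) ** 2 - b
    )

# Illustrative numbers: a 10 mm braid at a 20-degree initial angle, 300 kPa.
f_start = mckibben_force(3e5, 0.01, math.radians(20), 0.00)  # at full length
f_short = mckibben_force(3e5, 0.01, math.radians(20), 0.30)  # 30% contracted
```

The force is largest at full length and falls to zero at a geometry-determined contraction limit, which is the characteristic muscle-like force-length behavior that makes this actuator class so attractive.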
I think the story of Clone is about the relentlessness of making possible what other people do not think is possible, in every dimension. Everything we make is custom: custom bones, custom valves, custom muscles, custom sensing, a custom pump motor. It was done out of need, not because we wanted to do everything custom. We could not get suitable muscles on the market. We bought Festo muscles, but they would not fit inside a hand; they are pneumatic and would require a bulky compressor. It is just not going to work. You need to make your own muscles. And when you make the muscle, you realize that there are issues with the fluidic actuator, so you need to make it better in many dimensions.
You need to improve the braided sleeve, and you need to improve the way you manufacture it so that it can be made very quickly and cheaply and can be scaled up. The story of Clone is “How do we get biomimetic androids?” Well, you need to figure out a million problems. And when we started the company, I was lucky enough to have a cofounder like Łukasz who had already shown that relentlessness.
But also on the control side, it turns out that when you are making such a high-dimensional system to begin with, it is very difficult to control classically: the nonlinearities at the joint level, the muscle level, and the tendon level accumulate so quickly that it is silly to even attempt classical control and pretend you could deliver a product to customers where the robot is doing real work while you are classically controlling it the whole time.
That is not going to be tenable, so why even pretend it is something you should do in the early days? That assumption helps you on the hardware side too, because once you accept that classical control is not tenable, you can start making much easier trade-offs on the hardware side. You can just say, “Hey, I don't care about these nonlinearities. It doesn't matter at the end of the day, because I'm going to be controlling you with a neural network, so I can optimize my hardware design in all these other dimensions that make the hardware better in terms of power density, strength, speed, etc.”
Obviously, if you use off-the-shelf components like that, there is going to be noise. It is very steampunk and not exactly optimized for a real product. As we went about optimizing the design for a real product, you can get rid of the noise in the valve, and you can get rid of the noise in the pump and motor. This is even without any acoustic insulation; you can imagine that with acoustic insulation, the noise goes to virtually zero. In the most recent videos of the hand, the noise you can still hear is squeakiness in the servo valves, and we are not using those anymore. Our new valves are completely silent.
From when we started the company up until 2023, the focus was on durability: getting the hand durable enough that you can run it without any human intervention, so you can run a learning algorithm in the real world and let the hand keep learning to solve some manipulation task on its own. That is the point we got to by 2023. We are very happy about that; it makes the hand ready to be sold.
But then the final thing was that researchers did not want to deal with a hand whose design was going to change very quickly, iteration after iteration. We have been updating the hardware quickly. We are just about there, where we are finalizing the valve design. It is miniaturized and consumes very little power. It fits in the vein, opening up the entire torso volume for the energy supply.
In terms of actuation, you have a skeleton and then you put muscles on it. This is different from every other approach; nobody else makes robots like this. Everyone else makes a rigid exoskeleton and then stuffs as many batteries and motors inside as they can. But that is not how you have to make robots. You can just start with the human skeleton, put your actuators over it, wrap it up in skin, and wow, you have an android. That is the approach we are going for, and it scales nicely.
If you want to teleoperate, it is much easier to control some linear motors than it is to control these nonlinear muscles. And that was the problem. Łukasz had this certainty, even without knowing where it was going to go, just knowing, “Hey, AI is going to get better and better, so it should be fine.” And I jumped in because I knew, “Hey, AI is good enough now. We can train very large world models that can learn all kinds of rich information about reality that'll let us control these very high-dimensional systems.”
We finally put together a teleoperation demonstration with proportional-integral-derivative (PID)-type control, which is still rather finicky to work with. But teleoperation is, I think, going to be more short term than people realize. It is just not fun to scale unless you have a nice rig where the fine control is good and you have haptic feedback, so that there is a fast learning curve for using the teleoperation rig. For us to do haptic feedback, we would have to make some kind of glove, and we would be spreading ourselves way too thin at that point.
For us it is going to be limited teleoperation: getting some amount of teleoperated play experience in virtual reality (VR) and, with that, augmenting the data as best we can, using diffusion models to take a single experience and change up the environment on that experience a thousand times over.
Then you also have the teleoperated experience. You can run offline reinforcement learning (RL) on that teleoperated demonstration and let the robot trial-and-error the same type of short-horizon task over and over, so it keeps collecting more offline data, on top of your teleoperated motor data of course, to solve that task. At the end of the day, if you look at the trend on the language side of things, the aim is to get these robots working autonomously in the real world. We are talking about something like the billions of neurons that humans have.
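One minimal instance of that offline-RL-on-demonstrations idea, purely my illustrative sketch rather than Clone's actual training stack, is advantage-weighted regression: refit the policy on logged (state, action, return) tuples, upweighting samples whose returns beat the batch average so that successful trials dominate the fit.

```python
import math
import random

def awr_fit_linear(dataset, beta=0.25):
    """Advantage-weighted regression on logged (state, action, return) data.

    Fits a linear policy a = w * s by weighted least squares, where each
    sample is weighted by exp((return - baseline) / beta): high-return
    demonstrations dominate, low-return trial-and-error is downweighted.
    """
    baseline = sum(ret for _, _, ret in dataset) / len(dataset)
    num = den = 0.0
    for s, a, ret in dataset:
        wgt = math.exp(min((ret - baseline) / beta, 10.0))  # clipped for stability
        num += wgt * s * a
        den += wgt * s * s
    return num / den

# Toy log: "good" trials follow a = 2*s and earn return 1; "bad" trials
# output a = 0 and earn return 0. AWR should recover w close to 2.
random.seed(0)
log = []
for _ in range(500):
    s = random.uniform(0.5, 1.5)
    log.append((s, 2.0 * s, 1.0))
    log.append((s, 0.0, 0.0))

w = awr_fit_linear(log)
```

With an ordinary unweighted fit the two trial types would average out to roughly w = 1; the exponential weighting is what lets the refit policy improve beyond the average demonstration.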
Getting the analog of that requires a ton of data, a ton of experience. The story of language, of course, has been deep learning: stochastic gradient descent on large amounts of data. That is what works, and that is what lets you see the magical results from deep learning on the language side of things. But there, of course, you are working with copious amounts of Internet data.
I think what changed in 2022 was that people started to figure out, “Hey, you can leverage video data of human hands on the Internet,” millions of hours of it, toward motor control. The way they were doing that was something like building large pretrained visual representations and then separating those from low-level control policies. I think that going forward this will all be end-to-end, where you are pretraining entire behavior representations rather than just visual representations. To do that, of course, you need a ton of video data, but you also need some sensorimotor data.
The video data you can get from the Internet, or from the robot just standing there watching you do stuff. Past that, though, you are still going to need some threshold amount of sensorimotor data. The first part of that you need to get from teleoperation, so it is high quality; after that you can get it through trial and error, and from data augmentations, which you can do in various ways these days.
The material itself matters less than its properties. If a material has a property where you can run a small current through some fiber and the fiber contracts 60%, then it sounds like a magic material that would be great to replace our muscles with.
The muscles we are using give you the best set of properties: the best contraction ratio, strength that comes with that contraction ratio, speed (they can actuate quickly, with very fast response times), extremely low cost to make, and a force profile that approximately follows skeletal muscle and is close to linear. It is more about approximating all these various properties as closely as possible than about whether it feels like what a human muscle feels like.
In terms of what Will Jackson has said about our not getting anywhere close to a human skeletal muscle, I thought I would address that. In the forearm, at least, I think human muscles contract around 20% to 30%, and ours do the same: 30% with no load, and over 30% with a 1-kg load.
These are great muscles already, but in some parts of the body, especially in the leg, I think human skeletal muscle contracts up to 60%. So there is some gap there, though for all intents and purposes you will not notice it in our design; you will not be able to tell, “Oh, it's contracting 20% less. It's not a skeletal muscle.” We have one more improvement we can make on our muscle to match human skeletal muscle; it is the same type of muscle as what we make now, with a slight modification, and it should be able to reach those 50% to 60% contraction ratios that some muscle regions in the body contract at.
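For context on those contraction numbers: under the idealized braid model commonly used for McKibben muscles (my assumption for illustration, not a statement about Clone's proprietary design), the zero-force contraction limit depends only on the initial braid angle and tops out near 42%, which is one way to see why reaching 50% to 60% requires modifying the basic construction.

```python
import math

def max_contraction(theta0_rad):
    """Zero-force contraction limit of an ideal braided muscle.

    Force vanishes when the braid reaches the 'magic angle'
    (about 54.7 degrees), which happens at a contraction of
    1 - 1 / (sqrt(3) * cos(theta0)).
    """
    return 1.0 - 1.0 / (math.sqrt(3) * math.cos(theta0_rad))

# Shallower initial braid angles give larger contraction limits,
# but never beyond 1 - 1/sqrt(3), i.e. about 42%.
limits = {deg: max_contraction(math.radians(deg)) for deg in (10, 20, 30)}
```

So a plain braid sized like the forearm examples above lands naturally in the 30% to 40% range, consistent with the measured figures quoted in the text.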
I think that once we make that work, chances are that some of the know-how I mentioned, which I cannot talk about in too much detail, including how we make this energy-dense enough to operate several times longer than Atlas, for example, will be critical for making a future prosthetic with a powering system so light and portable that you do not notice it as a wearer.
We had to do that so many times over before we got to where we are now. There are probably a couple more trade-offs for us to make before we have the full android next year. But we see the path in front of us, and we have cleared most of the way. I think you will likely see more of our design out there as soon as we put out our first torso, our first android, and people can see the delta in performance of having an actual two-handed full android, able to do both strong tasks and dexterous tasks, and to walk around without needing to recharge often. I think this will blow minds, and it is likely that we will end up seeing more designs like ours.
