How Speeders in Star Wars Outlaws Are Powered by Machine Learning
Using reinforcement learning in a sensible and practical way.
AI and Games is a fan-funded publication and is only possible thanks to our paying subscribers. Support the show to have your name in video credits, contribute to future episode topics, watch content in early access on Discord, and receive exclusive supporters-only content.
If you'd like to work with us on your own games projects, please check out our consulting services provided at AI and Games. To sponsor our content, please visit our dedicated sponsorship page.
Released in 2024 and developed by Massive Entertainment, Star Wars Outlaws has players run missions across a variety of planets in the Star Wars universe, be it the moon of Toshara, the dry and arid Tatooine, or the damp jungles of Akiva. In each case you ride around on a speeder bike, a classic vehicle from the franchise that hovers above the ground and moves at high speed.
While the non-player characters you meet around the world are built using traditional game AI, all of that changes when an NPC gets on a speeder bike. These vehicles need to navigate complex terrain, respond to in-game physics, and are often used to chase the player down in key sequences. As a result they need a fast and effective solution to compute how and where to move in the heat of the moment. And as given away in the introduction, the solution was to adopt machine learning.
Like its contemporaries EA and Activision, Ubisoft has an internal technology group whose teams of specialists build tools and tech to support its game development teams, be it for game engines like Snowdrop or other bespoke toolchains. Within that technology group sits a Machine Learning team that provides a suite of tools allowing studios to use machine learning in ways they see fit, and this led to a collaboration between the technology group and Massive to ship these ML-controlled speeders.
As detailed in a GDC 2025 talk by Colin Gaudreau and Andreas Lasses, the team at Massive were excited about the prospect of adopting the ML tools built by the technology group. But their previous games, notably Tom Clancy's The Division - which we've covered at length on the channel - had no machine learning in them, so they found themselves at an interesting crossroads: trying to figure out whether machine learning could be made to suit their design goals. Ultimately, they found a good balance that enabled it all to work as intended. So let's take a look!
Follow AI and Games on: BlueSky | YouTube | LinkedIn | TikTok
Understanding the Problem
So as stated, the team needed controllers for the speeders: something to handle their movement both in the open world, as they simply drive along the roads, and in chase sequences with the player that arise from encounters or specific story missions. While Ubisoft already had proportional-integral-derivative (PID) controllers - a common approach for controlling dynamic objects such as vehicles in games, and one used in other titles - the team were more confident that an ML solution could handle the faster-paced driving style necessary for Star Wars Outlaws.
And so Outlaws uses an existing ML solution within Ubisoft known as SmartDrive: a machine learning controller designed for handling the movement of vehicles. SmartDrive uses reinforcement learning to train neural networks such that they can follow paths, adhere to speed limits imposed on them, and avoid both static and moving obstacles - and we'll discuss how that works in a second.
SmartDrive has actually been part of the technology group's Machine Learning Framework for several years now and has been showcased in a variety of demos during that time (see above). Said framework is similar in principle to Unity's ML-Agents toolkit or Unreal Engine's Learning Agents API, in that it interfaces the game engine with a Python backend for training AI agents to solve specific tasks. Though of course in this instance, the ML Framework is designed to work with Snowdrop - the engine used for the likes of The Division and Star Wars Outlaws - and Anvil - which most recently was used to ship Assassin's Creed Shadows.
And in fact, when they simply dropped in an existing model that had been trained to drive a regular car and hooked it up to the speeder, they found that a good chunk of the behaviour worked from the get-go. So the next step was to figure out how best to adjust it, so the model could be trained to handle the specific requirements set out by the game's designers.
Training an ML Speeder
Okay, so how do you train a Star Wars speeder using machine learning? The model is built using an artificial neural network, and of course we've seen neural networks crop up in a variety of shapes and sizes over the years. As a reminder, a neural network is a non-linear data structure that processes input signals through data-processing units referred to as neurons. You train these systems by modifying the weights between neurons, which adjusts the strength of the signals as they move from the input layer to the output layer. All of this reminds me, we've still not done an AI 101 episode on neural networks - I should really look into that. But there is a case study from 2024 on Little Learning Machines, which has you train neural networks using reinforcement learning to guide little NPCs to solve puzzles, akin to games like Creatures and Black & White. So yeah, check that out if you're curious!
Now the networks used in Outlaws are actually rather small, at least when compared to what we're often talking about in contemporary AI - in fact, compared to the generative AI models we discussed in episode 78, which try to simulate the behaviour of entire video games, the speeder network is minuscule. Generative networks that simulate games require billions of parameter weights, but this one had only tens of thousands to resolve. But as we'll see, even at this size they're a really good fit for the problem they're trying to solve - and the tech works well once they started to make some smart concessions. But yeah, the actual network: very small. It takes inputs from the game engine and passes them through two layers of 256 neurons each, which then feed into three output neurons: two that handle the throttle and steering of the speeder (continuous values, given they're mapped to the right trigger and left stick on a controller), and a final neuron for the handbrake, which is a simple yes/no boolean.
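To make that shape concrete, here's a minimal sketch of what such a network might look like in PyTorch. This is my own illustration based on the talk's description, not Ubisoft's actual code: the two hidden layers of 256 neurons and the three outputs come from the talk, while the input size, activations, and class name are assumptions.

```python
import torch
import torch.nn as nn

class SpeederPolicy(nn.Module):
    """Illustrative network of the shape described in the talk: two hidden
    layers of 256 neurons, three outputs (throttle, steering, handbrake).
    Input size and activation choices are assumptions, not shipped values."""
    def __init__(self, obs_size: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_size, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.throttle_steering = nn.Linear(256, 2)  # continuous actions
        self.handbrake = nn.Linear(256, 1)          # yes/no action

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        # throttle/steering squashed to [-1, 1], like trigger/stick input
        continuous = torch.tanh(self.throttle_steering(h))
        # handbrake treated as a simple boolean decision
        brake = torch.sigmoid(self.handbrake(h)) > 0.5
        return continuous, brake
```

With a 64-value observation, a network like this works out to roughly 80,000 weights - squarely in the "tens of thousands" scale described above, and a world away from billion-parameter generative models.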
But how does the speeder see the world? Well, the devs break the game world down into two distinct elements: the immediate vicinity, and the path the speeder is meant to be moving along. The paths themselves are computed courtesy of the navigation tools built into the engine, and then broken down into a corridor of triplets, containing the left point, the centre point, and the right point of the path at fixed distances along the way.
Meanwhile, nearby obstacles are recorded by breaking up the space around the speeder and checking whether each sensor intersects a navmesh edge that lines up with an obstacle. But of course, that only works for static obstacles - because even if the navmesh could capture moving objects, they'd be moving too quickly to wait for the navmesh to update itself - so dynamic obstacles are tracked by their position relative to the speeder, though it can only track two at a time.
But it's not just the environmental information; there's a lot of other data being fed into the network as well, including the current forward, angular and drift velocity of the speeder, the elevation angles of the speeder (given the surfaces are seldom flat), and also the actions taken by the speeder over the previous frames. This is super important: if it's trying to make turns or fast movements, it's vital that it retains context of what's happening, rather than, like... forgetting what it was doing halfway around a turn.
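As a rough illustration of how all of that might be packed into a single observation vector for the network, here's a sketch. The grouping reflects the inputs described above, but the exact layout, counts, and function name are my own assumptions for illustration.

```python
import numpy as np

def build_observation(corridor_triplets, obstacle_sensors, dynamic_obstacles,
                      forward_vel, angular_vel, drift_vel, pitch, roll,
                      previous_actions):
    """Flatten the speeder's view of the world into one input vector.

    corridor_triplets:  (left, centre, right) points sampled at fixed
                        distances along the path, relative to the speeder.
    obstacle_sensors:   distances at which sectors around the speeder
                        intersect a navmesh edge (static obstacles).
    dynamic_obstacles:  relative positions of up to two moving obstacles.
    previous_actions:   the last few frames of throttle/steer/handbrake,
                        so the network keeps context mid-manoeuvre.
    """
    return np.concatenate([
        np.asarray(corridor_triplets, dtype=np.float32).ravel(),
        np.asarray(obstacle_sensors, dtype=np.float32),
        np.asarray(dynamic_obstacles, dtype=np.float32).ravel(),
        np.asarray([forward_vel, angular_vel, drift_vel, pitch, roll],
                   dtype=np.float32),
        np.asarray(previous_actions, dtype=np.float32).ravel(),
    ])
```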
Now the way this works is that the neural network takes in all this input, processes the information, and generates the output. But when starting out, it's as smart as a bag of hammers. You next need to build the infrastructure for it to learn! The bots are trained using a setup whereby the ML framework runs in Python, separate from the game engine, where it observes how well the agent is performing and feeds that information back to update the model. Now the algorithm used, for all the ML fanatics out there, is soft actor-critic, which is a rather popular reinforcement learning algorithm for this kind of problem. I won't go into detail on how the algorithm works here - I'll leave some resources in the description - but the key thing to understand is that it assesses how well the agent controls the speeder in each training segment, known as an episode, for the scenario it is given. It then adjusts the weights of the neural network as it goes, so it can try all over again, and over time it gives the illusion that the speeder is learning.
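In broad strokes, that training setup boils down to a loop like the one below. This is a deliberately simplified sketch: the real system runs the game in Snowdrop alongside the Python trainer, and the `env` and `agent` objects here are hypothetical stand-ins rather than anything from Ubisoft's framework.

```python
# A highly simplified view of the reinforcement learning loop described above.
# `env` wraps the game-side simulation; `agent` wraps the soft actor-critic learner.
def train(env, agent, num_episodes=10_000, max_steps=2_000):
    for episode in range(num_episodes):
        obs = env.reset()                       # spawn the speeder on a fresh path
        for step in range(max_steps):
            action = agent.act(obs)             # throttle, steering, handbrake
            next_obs, reward, done = env.step(action)
            agent.store(obs, action, reward, next_obs, done)
            agent.update()                      # soft actor-critic gradient step(s)
            obs = next_obs
            if done:                            # reached the goal, failed, or timed out
                break
```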
So speaking of, let's dig into how training episodes and the environment needed to make it happen were built.
The Training Process
The setup we've discussed thus far means there's a mechanism for feeding information to the neural network, it can control the speeder, and there's an algorithm in place for improving the model. So now it's about setting up the problem so it's well scoped. The training environment and the reward function together tell the agent what behaviour is desired, without dictating exactly what the end result should look like. This is one of the strengths of machine learning in practice: you can build the problem up around it and then leave it to figure out the answer, and critically it's learning a general behaviour (or as we call it, a policy) to handle any path it could meet in the game. This makes it rather resilient to any changes made to the game map during development. But despite this, it can still be quite difficult to get the end result the way you want it. So let's start by discussing how the training environment was built, and then we'll get into the reward function and all the little problems that cropped up that the dev team had to work around.
One of the very first challenges faced is how to build a training environment for the speeders to run in, typically known as a gym. The development team originally created a fairly basic game environment like the one shown here, in which the agents could spawn randomly on the map, be fed a simple script telling them where to go, and then figure it out along the way. Leaving the bots to train in this limited space for only around 12 hours was enough for them to figure out how to move around it successfully, and the result when deployed in-engine was good, but not great. What you see below is an example of a speeder bot trained in just 12 hours on a single development machine, and already it's producing pretty decent output with little effort spent tweaking the learning algorithm and everything else. But it still needs a bit of work.
One of the biggest issues is that they need to make sure it can handle all of the possible scenarios that will emerge in the open world. But it wasn't feasible to just have the bots training in the game's main open-world maps. To do that, you'd need to customise a lot of the backend code to create a version of the game world where everything except the relevant scripts for training the ML bots is deactivated. That's a lot of work - effectively creating a unique copy of Star Wars Outlaws that only works for the ML bots - and it opens up all sorts of production risks when you're on a tight schedule for a game that needs to ship! So instead they expanded the gym to include a lot more features you'd expect to see in the main game map, including the likes of road networks, narrow canyons, and even wildlife NPCs, so the bots learn to dodge around them. They then worked to automate the whole process, making it easier for the external training codebase to interface with the game engine and run the training.
Plus the process had to be conducted for each type of speeder that exists in the game world, so once a type of speeder was trained and ready to go, they also populated the gym with these pre-trained vehicles, so as to normalise the idea that you're going to be driving around them on the regular.
Building Reward Systems
So now that the training environment is in place, the second critical aspect for the training is to properly define the rewards. When using a reinforcement learning algorithm, we need to define what is known as the reward function. This helps the AI speeder figure out what it's doing that is considered good, versus what is bad or undesirable, and that's how it learns over time to do what we want.
During training, the speeders are rewarded per step - meaning at every time point after they've taken an action, which is critically just before they're about to act again. They are rewarded each step for three things:
How far along the prescribed path have they reached?
How well are they staying on the path?
How well are they matching a target speed set by the designers?
But, given this is a reinforcement learning algorithm, they also need to be punished every time they do something wrong (thereby reinforcing good actions over bad ones). Hence the speeder receives a penalty when it collides with something, but this is scaled between 0 and 1 based on severity over time. So if it slightly clips something, it's fine. But if it's grinding against another vehicle for several seconds, then that's really bad.
The training episode is considered successfully completed if the speeder actually reaches the end of the path, and it gets a big chunk of reward for doing so - thus incentivising it to reach for the goal. But there's also a time limit on how long the episode can run, and it receives a big penalty if it never reaches the end of the path. And if the vehicle falls off the map, hasn't moved for a while, or simply hasn't moved along the track in any meaningful way, then they terminate the episode and denote it as a failure. This last one is critical: as we mentioned, the reward function gives credit when a bot continues to move down the track, but it could conceivably score a lot of reward by simply moving back and forth along the same segment. So the success and failure conditions of the episode have to factor in that the AI speeder might learn to cheat by exploiting a loophole in the scoring system.
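Pulled together, the reward scheme described above might look something like the sketch below. The shape of the function follows the talk - per-step rewards for progress, path-following and target speed, a collision penalty scaled by severity, a terminal bonus or penalty, and the extra failure conditions - but all of the specific weights and thresholds here are made up for illustration.

```python
def step_reward(progress_delta, path_offset, speed, target_speed,
                collision_severity):
    """Per-step reward roughly matching the scheme described above.
    All weights are illustrative, not Ubisoft's values."""
    reward = 0.0
    reward += 1.0 * progress_delta                        # distance gained along the path
    reward += 0.5 * (1.0 - min(abs(path_offset), 1.0))    # staying on the path
    speed_error = abs(speed - target_speed) / max(target_speed, 1e-6)
    reward += 0.5 * (1.0 - min(speed_error, 1.0))         # matching the designer's target speed
    reward -= collision_severity                          # 0..1, worse the longer/harder the contact
    return reward

def episode_outcome(reached_goal, timed_out, fell_off_map, stalled):
    """Terminal reward and whether the episode ends."""
    if reached_goal:
        return +10.0, True    # big bonus for finishing the path
    if timed_out or fell_off_map or stalled:
        return -10.0, True    # failure: never finished, fell off, or stopped making progress
    return 0.0, False
```

The `stalled` check is what closes the back-and-forth loophole: a bot shuffling over the same segment of track never makes meaningful progress, so the episode is cut short and marked as a failure rather than letting it farm progress reward.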
So with this, the bots get much closer to the intended behaviour. They can stay on the road, and generally perform a little more intelligently. But as you can see, there were still a bunch of issues in how they handled movement, plus a bunch of extra problems along the way. So let's wrap it up by looking at all the weird edge cases that the dev team had to factor in.
Adjustments
When I was researching this episode, the thing I really liked about the work Massive did here is that they spent a lot of time evaluating the performance of these speeders. Both quantitatively - checking how fast and efficient the model was to calculate - and qualitatively, looking at how good the model's behaviour was and then making tweaks to suit, be it in the model itself or in how it was trained.
Returning to that clip from a minute ago: the behaviour still had some issues with the throttle, but it's also weaving a lot and not staying consistent on the track. This was reduced somewhat by limiting the width of the corridor the bots follow, but also by modifying the reward function so that they get slightly more reward if they stick to the centre of the path. While they don't overtly punish being off-centre, the small additional reward encourages the bots over time to snake along the track a lot less. But this resulted in another problem: they now stay closer to the centre, but the bot was learning to constantly wobble the steering column - something of a by-product of the centring reward, perhaps? So they altered the reward function again to penalise the bot for large changes in steering input between each time step, and that cut it down drastically.
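In reward-shaping terms, those two tweaks are just extra terms added on top of the per-step reward sketched earlier. Again, the weights below are purely illustrative; only the idea - a small centring bonus plus a penalty on frame-to-frame steering changes - comes from the talk.

```python
def shaping_terms(path_offset, steering, previous_steering):
    """Two shaping tweaks described above, with made-up weights:
    a small bonus for hugging the centre of the corridor, and a
    penalty for large changes in steering input between time steps."""
    centre_bonus = 0.1 * (1.0 - min(abs(path_offset), 1.0))
    steering_jerk_penalty = 0.2 * abs(steering - previous_steering)
    return centre_bonus - steering_jerk_penalty
```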


But critically, the secret to getting the final bits of desirable behaviour is that there are specific edge cases where they turn off or override the SmartDrive controller. Regular viewers may recall this is something the Forza Drivatars do as well, as discussed back in episode 60. So yeah, there are two distinct scenarios where they do this regularly.
The first was for when a speeder needs to stop. The devs were struggling to get it to slow down and stop where intended - the bots would overshoot their final destination. The first phase of resolving this was to tune the inputs to the model so it had a clearer indication of the desired target speed relative to the distance to the target location. This made a small improvement, but still wasn't perfect. So to get over it, once the speeder gets within a radius of the destination, the AI controller simply ignores SmartDrive, and a hand-written controller handles the braking more aggressively.
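Conceptually that hand-off is a simple check sitting above the learned controller, something like the sketch below. The radius value and the scripted stop behaviour here are my own placeholders, not the shipped logic.

```python
def choose_controls(distance_to_target, obs, smartdrive_policy, stop_radius=15.0):
    """Hand off from the learned controller to a scripted brake when the
    speeder is close to its destination. Radius and scripted behaviour
    are illustrative assumptions."""
    if distance_to_target < stop_radius:
        # Scripted stop: cut throttle, hold the steering straight, brake hard.
        return {"throttle": 0.0, "steering": 0.0, "handbrake": True}
    # Otherwise defer to the trained SmartDrive-style policy.
    throttle, steering, handbrake = smartdrive_policy(obs)
    return {"throttle": throttle, "steering": steering, "handbrake": handbrake}
```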
The final issue was actually something the bots weren't trained to do, and that's to crash! After all, they've been taught to behave intelligently, but the devs wanted to create a scenario where a speeder drives erratically and visibly loses control before it blows up. So they override and ignore SmartDrive entirely to run a custom behaviour that has it swerving everywhere until it blows up, like so!
Closing
As we move to wrap up this episode: the reason I like this machine learning project in Star Wars Outlaws is that as soon as I read about it, it just made perfect sense. It's exactly the kind of problem that machine learning is well suited for, and it's deployed in a way that facilitates a very specific design goal without interfering with any of the other AI systems in the game. But it also shows we've reached a point where this kind of idea can not only be explored but implemented quite efficiently. Naturally I make it sound a lot easier than it was in reality, and we're still at a point where it's a challenge for ML toolsets and philosophies to be integrated into game development workflows. But in the past decade or so, the technology and the knowledge of how to make this work have reached a point where you can deploy ML for a specific part of your game, and it becomes just another aspect of the overall experience.
Related Reading
This episode is based on the talk by Colin Gaudreau and Andreas Lasses presented at the Game AI Summit for GDC 2025.
As mentioned, Ubisoft's ML tools have also been used in the likes of For Honor and Rainbow Six Siege, check out the GDC 2025 talks on these as well:
https://gdcvault.com/play/1035522/Streamlining-Bot-Development-in-For
https://gdcvault.com/play/1035224/Machine-Learning-Summit-Rainbow-Six
For those interested in learning about SmartDrive's RL algorithm - soft actor-critic - here are some relevant links: