Exploring Convai's LLM-Powered NPC Creation Tools
The recent surge in Large Language Model (LLM) technology has led to a lot of speculation about how it can be employed in the video games industry. But it's not as straightforward a process as many appear to believe: the tech is increasingly volatile and needs a series of guardrails and other mechanisms to ensure it is deployed safely and practically in a game engine.
To that end, I recently caught up with the team at Convai: an AI company that specialises in 'Conversational AI for Virtual Worlds'. The name might be familiar if you've seen the recent demo of NVIDIA's ACE platform, as it was Convai's tools that allowed the AI to strike up a conversation. Right around that time, I had a chat with Convai's CEO, Purnendu Mukherjee, who asked if I would fancy getting to know Convai's tools a little better.
AI and Games is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
For this entry of AI and Games, I'm digging into the tools that Convai have built for creating intelligent NPCs, how they've utilised them so far, and even having a chat with Mukherjee to find out a little bit more about Convai's background and where they plan to take the tool in the near future.
But in amongst all that, I’m also building my own Convai-powered character, whom I have named Dave. I’ll detail the experience of building this character within the Convai ecosystem, and how he’s built to be my ‘production assistant’, having been trained on old episode scripts.
Who Are Convai
All of the recent buzz surrounding Generative AI for non-player characters is rather exciting, but from my perspective, it's been difficult to see how it could all work in practice or at scale.
If you're not familiar, generative AI is all about applying recent innovations in deep learning to generate the likes of text, images or voice lines, and even to interpret and respond to human input. Recently, companies big and small across the sector have been advocating for how this can be used in games.
As you'll know if you caught my recent episode of Artifacts on the current state of generative AI, my view is that while the technology is there in theory, it requires additional work to be palatable for video game production and to fit into existing game projects, beyond small sample projects or fun and silly videos on social media.
I've been watching these developments for a while, and there are several companies cropping up that appear to be working to address the specific issues of using generative AI in a game.
Convai is indeed one of those companies, advocating for a full stack of tools to help developers apply generative AI to non-player characters in games. The rate at which this technology is developing is why CEO Purnendu Mukherjee felt it was time to throw his hat in the ring.
[Convai CEO Purnendu Mukherjee]: There were two major reasons. The first is how fast I saw large language models improving and getting to superhuman level in some cases, human level in some cases. But if people want to make this available inside virtual worlds, and add them to non-player characters, the stack wasn't easy. What we tried doing was solve that: make it as easy as possible for game developers to add this kind of intelligence to their characters, right, to the existing assets that they have. The other part of it was the research bent of mine, which is: LLMs are cool, but they learn from text primarily. That's not how we humans learn the world and language. Overall, we learn from the 3D world around us. And I wanted a sandbox environment where AI can have more contextual awareness and understanding, and be more humanlike.
Convai's API and tools allow developers to create virtual characters via their platform, and then export and hook them up not just to game engines like Unity and Unreal, but also to the likes of NVIDIA's Omniverse platform, Roblox, and even Discord. As stated earlier, in May 2023 it was showcased as one of the interaction systems provided within NVIDIA's Avatar Cloud Engine (ACE).
But for me, the big questions are how exactly does it work? Does it work as well as advertised? What potential is there in taking these tools and using them in the future? But also what is missing, what are the gaps in the technology and whether Convai are going to work to address them?
Convai were happy to oblige, giving me access not only to the platform but also to some of their product demos. After spending time with their own prepared non-player characters, as well as building some of my own, I can see real potential in the tools provided, but also areas that merit improvement.
And I'll give Convai credit: Purnendu was happy to discuss not only the things they've got right but also the elements of the tools that still need ironing out, and even gave me a sneak peek of some things coming up in the pipeline.
How Do You Build NPCs with Convai?
So how does it all work? Convai is built around two systems, each with its own application programming interface (API): the Character Tool and the Voice API.
The Character Tool is responsible for all conversational dialogue that a character can create. But in addition to this, the key component for end users is the ability to establish the non-player character's identity. For each character, you can establish their name, avatar and backstory. There's even a separate system that allows you to dictate their knowledge: you can either write out everything the character knows, or simply upload text files complete with data, and then 'connect' them to the character.
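As a rough illustration of the kind of data a character definition holds, here's a minimal Python sketch. The `CharacterProfile` class and its methods are hypothetical and not Convai's actual API; they simply model a name, avatar and backstory, plus uploaded text files that only become usable once 'connected'.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterProfile:
    """Hypothetical NPC definition; field and method names are illustrative only."""
    name: str
    avatar: str
    backstory: str
    # Uploaded knowledge files, keyed by filename.
    knowledge: dict[str, str] = field(default_factory=dict)
    # Filenames currently 'connected' to (usable by) the character.
    connected: set[str] = field(default_factory=set)

    def upload(self, filename: str, text: str) -> None:
        self.knowledge[filename] = text

    def connect(self, filename: str) -> None:
        if filename not in self.knowledge:
            raise KeyError(f"no uploaded file named {filename!r}")
        self.connected.add(filename)

    def disconnect(self, filename: str) -> None:
        self.connected.discard(filename)

    def active_knowledge(self) -> list[str]:
        """Text the character may draw on: backstory plus connected files."""
        return [self.backstory] + [self.knowledge[f] for f in sorted(self.connected)]


dave = CharacterProfile(
    name="Dave",
    avatar="dave.png",
    backstory="Dave grew up in the town of Dunfermline in Scotland.",
)
dave.upload("scripts.txt", "Old AI and Games episode scripts.")
dave.upload("unused.txt", "A file that is uploaded but not connected.")
dave.connect("scripts.txt")
```

Keeping uploads and connections separate mirrors the idea that a file can sit in your account without the character drawing on it.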
And secondly, there's the Standalone Voice API. This provides a set of tools for a character built in Convai to interact with players through the use of two distinct voice interaction systems.
There is a text-to-speech API that allows for a non-player character built in Convai to be able to speak any dialogue that they generate, using a series of pre-established voices.
And then secondly, the speech-to-text API transcribes anything the user says into text that the character will be able to understand.
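The two voice APIs sit either side of the character's dialogue system. The sketch below shows that flow as a single conversation turn, assuming each stage is a plain function; `conversation_turn` and the stub functions are hypothetical stand-ins, not Convai's SDK.

```python
from typing import Callable

# Hypothetical signatures, not Convai's actual API: each stage is injected
# as a plain function so real services could be swapped in later.
SpeechToText = Callable[[bytes], str]
Reply = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def conversation_turn(audio_in: bytes, stt: SpeechToText,
                      reply: Reply, tts: TextToSpeech) -> tuple[str, str, bytes]:
    """One exchange: player's speech -> text -> NPC reply -> NPC speech."""
    heard = stt(audio_in)   # speech-to-text: transcribe the player
    answer = reply(heard)   # character system: generate the NPC's line
    spoken = tts(answer)    # text-to-speech: voice the NPC's line
    return heard, answer, spoken


# Toy stubs so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(text: str) -> str:
    return f"You asked: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


heard, answer, spoken = conversation_turn(
    b"Where did you grow up?", fake_stt, fake_reply, fake_tts
)
```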
Now, as mentioned, you can establish the knowledge base of an existing character, so it's important to understand how that works.
Convai's conversation systems can be configured to use a variety of LLMs, such as OpenAI's GPT, Meta AI's Llama 2, or NVIDIA's NeMo. But in each case there is still a risk involved, given they can waffle on about whatever they like if they don't receive guidance.
This is often referred to as the LLM hallucinating, which is a very polite way of saying it makes stuff up. As stated in my recent Artifacts video, GPT is essentially a super-intelligent parrot of the internet. So it's difficult to know what it's going to say unless you take the time to direct it.
Directing it in this way is possible using what are known as 'embeddings'.
Embeddings are essentially vectors of numbers that encode pieces of text, and they are typically stored in a database so they can be searched. That text is made up of what are known as tokens: the basic building blocks of how large language models understand text. Once encoded this way, text can be compared and classified, and the most relevant passages retrieved to help the model summarise, translate, or generate new text inspired by it.
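To give a feel for how text becomes searchable vectors, here's a deliberately simplified sketch. It assumes a crude word-level tokenizer and uses bag-of-words counts as a stand-in 'embedding'; real systems use learned neural embeddings and subword tokens, but the retrieval idea (find the stored vector closest to the query by cosine similarity) is the same.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Crude word-level tokenizer; real LLMs use learned subword tokens."""
    return re.findall(r"[a-z']+", text.lower())


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# A tiny 'vector database' built from the character's knowledge.
database = {
    "origin": "Dave grew up in the town of Dunfermline in Scotland.",
    "job": "Dave works as a production assistant on AI and Games.",
}
vectors = {key: embed(text) for key, text in database.items()}
query = embed("Where did Dave grow up?")
best = max(vectors, key=lambda k: cosine(query, vectors[k]))
```

Here `best` is the key of the stored passage most similar to the question, which is the piece that would then be handed to the language model.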
Essentially, the embeddings create a scaffolding that surrounds GPT with useful information it might need. Hence if you were to ask the NPC questions tied to what has been uploaded to the knowledge base (such as what their name is, where they come from, or what they know about a given subject), this information can be pulled out of the embeddings and used to help formulate a query to the text generation system. This is a process known as prompt engineering, in which we carefully construct the input so that we get an intelligent response. So instead of asking GPT about my NPC without any context, we ask it to summarise the NPC's backstory while also feeding it lots of information about them.
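The retrieve-then-prompt step can be sketched as follows. This is a hypothetical illustration rather than Convai's pipeline: word overlap (Jaccard similarity) stands in for embedding similarity, and the prompt template is made up for the example.

```python
import re


def words(text: str) -> set[str]:
    """Lower-cased word set; a crude stand-in for tokenization."""
    return set(re.findall(r"[a-z']+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap, standing in for embedding similarity."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def build_prompt(question: str, backstory: str, knowledge: list[str], k: int = 2) -> str:
    """Retrieve the k most relevant knowledge chunks and place them around the question."""
    ranked = sorted(knowledge, key=lambda chunk: similarity(question, chunk), reverse=True)
    context = "\n".join(f"- {chunk}" for chunk in ranked[:k])
    return (
        "You are playing a character in a video game. Stay in character.\n"
        f"Backstory: {backstory}\n"
        f"Relevant facts:\n{context}\n"
        f"Player: {question}\n"
        "Character:"
    )


knowledge = [
    "Dave grew up in the town of Dunfermline in Scotland.",
    "Dave works as a production assistant.",
    "Dave has read old episode scripts of AI and Games.",
]
prompt = build_prompt("Where did Dave grow up?", "Dave is a Convai character.", knowledge, k=1)
print(prompt)
```

Only the Dunfermline fact makes it into the prompt, so the model answers from the knowledge base rather than inventing a hometown.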
Mukherjee: GPT works only when you get it to work, meaning you give it an input and that's when it produces an output, right? It is not always on as we humans are. We are always doing some kind of inference to the environment. You know, we are seeing something, we are like reacting to what not: it’s always on. GPT is not always on.
You know, when you send in the input, that's when the inference works, or it becomes ‘aware’ during that particular moment when it's processing. And you only have so much of a budget to get it to work with, which is the prompt size basically, or the input length. But if you have a big story or a world, you know, or a lot of experience of the character that you want to add, that may not be enough.
And also, it's kind of expensive to provide that many tokens and that much computation. So what you can do is basically add a layer of memory augmentation. I mean, you still have that kind of budget, let's say a 4,000-token limit that you are working with for your particular LLM. But what you can do is be intelligent enough to provide the most relevant part that the LLM would need to give that immediate response [to the user]. So it's basically a search to find the most relevant piece and provide it to the LLM.
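Mukherjee's point about working within a fixed budget can be illustrated with a greedy packing routine: rank the retrieved passages by relevance, then keep adding them while they still fit under the token limit. This is a minimal sketch under two assumptions: the relevance scores are already computed, and tokens are approximated by counting whitespace-separated words.

```python
def count_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words. Real subword
    tokenizers often produce more tokens than words for English text."""
    return len(text.split())


def pack_context(scored_chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that still fit in the token budget."""
    selected: list[str] = []
    used = 0
    for score, chunk in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(chunk)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected


chunks = [
    (0.9, "one two three four"),
    (0.7, "five six seven eight nine ten"),
    (0.4, "eleven twelve"),
]
print(pack_context(chunks, budget=7))
```

The second chunk is skipped because it would overflow the budget, but the smaller third chunk still fits. In a real deployment the budget would be something like the 4,000-token limit Mukherjee mentions, minus room for the question and the reply.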
And so all of this, combined with a guardrail system that acts as a form of content moderation, is what keeps these NPCs on point and acting as intended. With that, let's check it out in practice.
Building My Own NPC
Convai introduced me to their tools and left me alone to go and make my wee Dave character. Dave has his own backstory that is provided for him, and from that you can engage in conversation.
One thing that became apparent once I had written Dave's fictional biography is that, when asked about it, he follows it quite strictly.
Often if you ask a basic question, he'll start his response with the answer, but will then continue with a chunk of what was written in the backstory. Given that the system is GPT plus the embeddings, there's an interesting interplay between relying on the knowledge in the embeddings and relying on what GPT itself can generate. So for example, if you ask Dave where he grew up, he'll tell you it was in the town of Dunfermline in Scotland, and will then often tell you something else, like who his parents are, given that's literally the next thing I wrote in the knowledge base.
However, while this is still an issue in the public version of the tool, I was shown some behind-the-scenes work on how it is being addressed: the character’s personality can be more fleshed out, including how chatty or curt they are in a given conversation.
Alongside the basic backstory is the expanded knowledge base. The tools enable you to add even more information to the system, which is analysed for use in the LLM embeddings, and you can add individual documents to, or remove them from, the active database if you wish.
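Before long documents such as episode scripts can be used for retrieval, they generally need to be split into smaller passages that are embedded individually. Convai doesn't say how its analysis step works, so the sketch below assumes a simple fixed-size word window with overlap; `chunk_words` and its default sizes are hypothetical.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words so each piece
    can be embedded separately. The overlap keeps sentences that straddle
    a boundary retrievable from either neighbouring chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


script = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(script, size=4, overlap=1)
```

With 10 words, windows of 4 and an overlap of 1, each chunk starts 3 words after the previous one.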
Back in episode 69 of AI and Games, I built an AI and Games scriptwriter using freely available tools and GPT-2, and fed the system my episode scripts, which are all written in simple text files. It wasn’t particularly successful, and you can watch the video above to find out what worked and what didn’t.
So I took all of the scripts that are about the AI of games (not the episodes on academic research projects), and uploaded as many of them as I could before hitting the limit of my free account. However, it's worth saying that paid accounts have a near-unlimited cap on uploads.
Once these were 'connected' to the knowledge base, Dave was able to regurgitate a lot of knowledge from earlier episodes. But again, much like before, he can provide information but often sticks to what I told him. You really have to start asking questions about topics outside of what I've discussed in previous episodes for the system to start considering other information sources.
All that said, it performs far better than my own script generation system I mentioned before. My script generator would routinely invent words, create nonsense statements, and straight-up lie about stuff. Every now and then there would be a glimmer of knowledge that made sense, but by and large, it was all rather useless. Hence I focussed on making a video about it as a joke more than anything else.
By comparison, Dave reproduces my information far more coherently and intelligently. That isn't to say it's perfect: as we've seen already, he is prone to getting things wrong on occasion, but he has a much higher success rate than before.
Testing Out the Tour Builds
I had a chance to try out two different demos designed as tour guides: the first a walkthrough of a museum and its exhibits, the second a tour of a temple. In each case, the NPCs are designed to have knowledge of their local environment, and from that they can answer questions relevant to what’s on display. It's an interesting example of how these technologies could be applied outside of traditional game settings, though it's worth mentioning that both demos are built in Unreal Engine. So they're not games per se, but they do use game tech, which is a useful way to see how it could be applied in an actual game.
Each demo highlights the ability to communicate information about a particular element, respond to my queries, and even provide more information than was initially prompted.
The tour guide will know what exhibits are in the museum, and has a set script to follow, but can also give more information about specific elements when prompted. As mentioned before, that balance between how much it relies on the embeddings versus what it can pull from GPT is an area for improvement.
That said, the embedding process plays a big part in keeping the NPCs largely on point.
You can try to get these characters to deviate, but the knowledge base largely keeps them on topic: it knows what information is accurate for the tour.
So if I ask about elements that aren't in the museum, or just random conversation topics, the guide will tell me so. But even then, it brings me back to the point and steers the conversation towards the tour once again.
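One plausible way to get this 'steer back to the tour' behaviour is a relevance threshold: if nothing in the tour's knowledge is similar enough to the question, the guide redirects instead of improvising. The sketch assumes word overlap as the similarity measure and an arbitrary threshold of 0.1; it's an illustration, not how Convai implements it.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, standing in for embedding similarity."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def answer_or_redirect(question: str, tour_facts: list[str],
                       threshold: float = 0.1) -> str:
    """Answer from the tour's knowledge, or steer back if nothing is relevant."""
    best = max(tour_facts, key=lambda fact: overlap(question, fact))
    if overlap(question, best) < threshold:
        return "I'm not sure about that, but let's get back to the tour."
    # A real system would pass `best` to the LLM as context, not return it verbatim.
    return best


tour_facts = [
    "The Buddha statue is one of the exhibits on this tour.",
    "The broken statue is one of the exhibits on this tour.",
]
```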
Mukherjee: You know, because we are setting the context there of what could be relevant, useful information. They tend to focus on those information. Right. And giving you the answer. So it helps with the guardrail or the, you know, the contextualization or solving the hallucination problem to quite an extent. Effectively, what all these knowledge base is, is a memory system, augmented memory system.
If you played with those demos, as you are going from spot to spot, there is a context that is coming from the scene. So let's say there is a statue, the Buddha statue or the broken statue, for example. What it's doing is fetching from the scene information what you are seeing, or the character is seeing, and querying the knowledge base.
So you could upload to your character a lot of information about the statue. So it is able to see what are the relevant conversation aspects, but what is the scene information around me, based on which I should search from my memory. And contextualize even further.
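Combining what the player says with what the character can see can be sketched like this, assuming the engine can report the object the NPC is currently focused on. `contextual_query` and `retrieve` are hypothetical helpers, and word overlap again stands in for embedding search.

```python
import re
from typing import Optional


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(query: str, memory: list[str]) -> str:
    """Return the memory entry sharing the most words with the query."""
    return max(memory, key=lambda entry: len(words(query) & words(entry)))


def contextual_query(player_text: str, focus: Optional[str]) -> str:
    """Augment what the player said with what the NPC is currently looking at."""
    return player_text if focus is None else f"{player_text} {focus}"


memory = [
    "Notes about the broken statue.",
    "Notes about the Buddha statue.",
]
question = "What can you tell me about this?"
```

On its own, a vague question like "What can you tell me about this?" matches both entries equally; adding the object in view breaks the tie.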
I was given a chance to try what were, at the time of writing, the most up-to-date versions of these demos, and they included what was, for me, a pretty critical feature: the conversation systems are to some extent hooked into the game engine's AI tools. This meant I could call upon the NPCs to follow me around and respond to my questions, rather than simply running off and not listening to me. Having characters that don't just respond to my questions, but whose behaviour can actually be influenced by what I'm saying, is a pretty big deal. It's early days, but that is pretty significant to me.
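Hooking conversation into NPC behaviour means turning an utterance into a game command. The sketch below uses a hypothetical keyword table as a stand-in; a production system might instead ask the LLM to emit a structured action alongside its reply. The `NpcAction` names are invented for illustration.

```python
from enum import Enum
from typing import Optional


class NpcAction(Enum):
    FOLLOW_PLAYER = "follow_player"
    STAY = "stay"


# Hypothetical phrase table mapping spoken requests to engine-side actions.
ACTION_PHRASES = {
    NpcAction.FOLLOW_PLAYER: ("follow me", "come with me"),
    NpcAction.STAY: ("wait here", "stay here"),
}


def detect_action(utterance: str) -> Optional[NpcAction]:
    """Return the first action whose trigger phrase appears in the utterance."""
    text = utterance.lower()
    for action, phrases in ACTION_PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return action
    return None  # plain conversation: no behaviour change
```

The game would then hand any detected action to its own navigation or behaviour system, while the dialogue reply is generated as usual.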
The characters can follow me around, but it's worth mentioning that at present they still lack broader spatial awareness.
Now, what we've seen thus far is pretty impressive, but there are areas in which the tools will improve, both to provide new features and to address issues with the conversation system. Some of these are Convai-specific, which I'll get to in a second, but a common bugbear for me is always speech-to-text and the ability for the AI to recognise my voice correctly. The reason is that I'm Scottish, and if there's one thing all Scottish people are painfully aware of, it's that voice recognition still struggles with our accent to this day.
It's fair to say this isn't a problem unique to Convai, it's largely sector-wide. Voice recognition technology has improved drastically over the past 10 years. But it still struggles with my voice fairly frequently. That said, it succeeded more than it failed. The majority of the time it could understand me, though funnily enough when chatting with Dave, mentioning the Scottish town of Dunfermline did add to the hilarity. A Scotsman mentioning Scottish locations is a vicious mix...
Now, larger technology issues aside, there are areas that Convai is looking to improve to handle the myriad ways game developers may use the tools. One point I raised with Purnendu is a strange side effect of building these tools: to my mind at least, any NPC that uses a conversational AI system will then receive an increased level of scrutiny.
Perhaps for me, the big area of improvement is going to be the audio delivery. The voice lines often lack emotional weight, and the cadence is off in places. This isn't a Convai-specific problem but a sector-wide one, though the state of the art is making real progress each and every year.
At the time of writing, the platform is up and running, but is still going through significant updates. The internal product demos shared with me were updated several times while I was still producing this video. Plus the team currently have a multilingual version of Convai in beta, with it supporting communication in Spanish, French, Arabic, Hindi, Chinese, Korean, Japanese and more, and this is slowly improving over time as well.
Mukherjee: There are two ways to achieve this. One is to apply a translation layer both in the input and the output for that particular language and the rest of the LLM underneath working in English. The other one is get the LLM to just understand the language. Long term, the latter or the second option is the way to go.
Like the LLM understands by default from the context. Because what we are using right now is the translation layer. And the problem with that is - a little caveat there especially which will become more important in the games and arts aspect or entertainment aspect - you can't break emotion. So the goal is to not have translation and have the LLM understand that and all the nuances that goes with it directly.
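The translation-layer approach Mukherjee describes (translate the input, let an English-only LLM respond, translate the output) can be sketched as a wrapper around the response function. The phrasebook translator and toy LLM are hypothetical stubs; in practice both would be calls to real services.

```python
from typing import Callable

# (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]


def translated_turn(player_text: str, player_lang: str,
                    respond_in_english: Callable[[str], str],
                    translate: Translate) -> str:
    """Translation-layer approach: the LLM only ever sees English."""
    english_in = translate(player_text, player_lang, "en")
    english_out = respond_in_english(english_in)
    return translate(english_out, "en", player_lang)


# Toy phrasebook standing in for a machine-translation service.
PHRASEBOOK = {
    ("hola", "es", "en"): "hello",
    ("hello, traveller", "en", "es"): "hola, viajero",
}


def toy_translate(text: str, source: str, target: str) -> str:
    return PHRASEBOOK.get((text.lower(), source, target), text)


def toy_llm(text: str) -> str:
    return "Hello, traveller" if text == "hello" else "Sorry?"


result = translated_turn("Hola", "es", toy_llm, toy_translate)
```

Because the LLM only sees the English intermediate text, any tone or nuance lost in either translation step is lost for good, which is the drawback Mukherjee points to.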
Perhaps one of the most interesting elements that isn't quite addressed as yet, but something we had an interesting conversation about, was the quality of the writing that GPT generates and how to work around it. As you'll have heard throughout the video, every NPC we've interacted with is painfully civil and polite. For a tour guide that's fine, but for a lot of other characters we need a little more nuance. What about a character that uses profanity? Or one designed to be a bit more villainous? That would naturally influence the language the character uses.
And this all leads to an area that, thankfully, Convai are already looking into in some detail: how do you moderate the output? Not just for creative purposes, but also in terms of quality assurance. Their approach is to provide tools that handle moderation by default. This limits dialogue that includes the likes of references to violence, hate speech, religious topics, or more adult content. That said, Convai doesn't intend to be the arbiter of content, given it's impossible for one company to handle the myriad cultural and religious nuances that may emerge depending on how the tools are used. Nor do they want to dictate what developers can and cannot do with the tools. Hence, on the higher subscription tiers, a user can disable the moderation parameters. It is then in the hands of creators to ensure characters stay within the acceptable bounds of their intended design, not to mention local laws and conventions.
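Default-on moderation with an opt-out might look something like the following. This is a hypothetical sketch: a keyword blocklist with placeholder terms stands in for the trained classifiers real moderation systems typically use, and `ModerationSettings` is not a Convai API.

```python
import re
from dataclasses import dataclass


@dataclass
class ModerationSettings:
    """Hypothetical settings; only some tiers might allow enabled=False."""
    enabled: bool = True
    # Placeholder terms; real moderation typically uses trained classifiers.
    blocked_terms: frozenset[str] = frozenset({"forbidden", "restricted"})
    fallback: str = "Let's talk about something else."


def moderate(reply: str, settings: ModerationSettings) -> str:
    """Swap a generated reply for a safe fallback if it trips the filter."""
    if not settings.enabled:
        return reply  # responsibility now sits with the developer
    tokens = set(re.findall(r"[a-z']+", reply.lower()))
    return settings.fallback if tokens & settings.blocked_terms else reply


default = ModerationSettings()
unfiltered = ModerationSettings(enabled=False)
```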
Overall, the thing I've taken away from this is that Convai is on the right path to making this a useful tool for game developers.
Plus it's already in a state you can start playing with right away. The technology will continue to mature in the near future, but there's plenty for anyone keen to start using today: you can head over to Convai.com to try it out yourself.
As I wrap up, a special thanks to Convai and to Purnendu for reaching out to have a chat with me. I'll be keeping an eye on these developments on my end, and we might see another episode on all of this in the future. But for now, thanks for reading this episode of AI and Games.