What's your NAME?

By Yevhen Loza (EugeneLoza)


In roguelike and sandbox games, procedural generation is used for more than creating world areas and filling them with items and enemies. Friendly NPCs also want to be part of the game, and for that they need a worthy name. In this article I'd like to share a simple algorithm for generating names and nouns that I wrote this summer.

A name is not a random set of letters. It's a word. Maybe it lost its meaning long ago, or maybe it came from a long-forgotten people. But most often it follows the normal word-building rules of its native language. It therefore seems like a good idea to build names from an ordinary dictionary.

Word Structure

First of all, let's make an important simplification so that we aren't forced to analyze each syllable in a grammatically correct way. We can picture the structure of each word as "(C)VC+VC+VC(V)", i.e. each "syllable" is a vowel+consonant block (VC). The first "syllable" may also have a leading consonant block, and therefore should be stored and used separately. Let's also require that every syllable contains at least one consonant, to avoid generating names like "Miisaaaal".
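As a rough illustration of the "(C)VC+VC+VC(V)" split, here is a minimal Python sketch. It is not the FreePascal code from the repository; the function name split_word, the choice to treat "y" as a vowel, and the decision to return the trailing vowels separately are all assumptions made for this example.

```python
import re

VOWELS = "aeiouy"  # assumption: "y" counts as a vowel


def split_word(word):
    """Split a word as (C)VC+VC+...(V).

    Returns (first_syllable, other_syllables, trailing_vowels),
    or None if the word contains no vowel+consonant block.
    """
    word = word.lower()
    # Each run of vowels followed by a run of consonants is one "syllable".
    blocks = re.findall(rf"[{VOWELS}]+[^{VOWELS}]+", word)
    if not blocks:
        return None
    # Optional leading consonant block, glued to the first syllable.
    lead = re.match(rf"[^{VOWELS}]*", word).group(0)
    # Optional trailing vowels that are not followed by a consonant.
    tail = re.search(rf"[{VOWELS}]*$", word).group(0)
    return lead + blocks[0], blocks[1:], tail
```

For example, split_word("watermelon") yields the first syllable "wat" and the normal syllables "erm", "el" and "on", while split_word("banana") yields "ban", one normal syllable "an" and the trailing vowel "a".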

Then we split each word into "syllables" and store them in an array or a list. However, some syllables are used frequently and others are rare. The algorithm should therefore also store the number of "hits" per syllable, so that generated words follow the same frequency pattern.
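One possible way to keep the hit counts and to draw syllables in proportion to them is sketched below in Python. The names count_syllables and pick are hypothetical, and the input is assumed to be a list of words that have already been split into a first syllable and a list of normal syllables.

```python
import random
from collections import Counter


def count_syllables(split_words):
    """Count hits separately for first syllables and for normal syllables.

    split_words is an iterable of (first_syllable, other_syllables) pairs.
    """
    first_hits = Counter()
    next_hits = Counter()
    for first, rest in split_words:
        first_hits[first] += 1
        next_hits.update(rest)
    return first_hits, next_hits


def pick(hits, rng=random):
    """Draw one syllable with probability proportional to its hit count."""
    syllables = list(hits)
    weights = list(hits.values())
    return rng.choices(syllables, weights=weights, k=1)[0]
```

With counts such as {"on": 2, "erm": 1}, pick returns "on" about twice as often as "erm", so frequent syllables stay frequent in the generated names.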

Namespace

You know, my name is a very common one where I live, and it took me quite some time to stop reacting to random people around me calling it :) So we should avoid that problem by creating a "namespace" of the kind we're all used to in programming: no two variables may share a name. While generating a name, we simply check it against the names that have already been generated.

Moreover, we've just used almost a million words to build our set of syllables, and it's quite possible that the generated word will be... yes, exactly one of the words used to generate it. So, right after we've loaded the list of words, we add them all to the namespace to avoid NPCs called "Rhinoceros", "Skyscraper" and "Watermelon".
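The namespace idea can be sketched in Python with a plain set. The Namespace class, the unique_name helper and the max_tries limit are hypothetical names introduced for this example; the comparison is assumed to be case-insensitive.

```python
class Namespace:
    """Remembers every name that must not be handed out (again)."""

    def __init__(self, dictionary_words=()):
        # Pre-fill with the source dictionary so real words are never generated.
        self._taken = {word.lower() for word in dictionary_words}

    def claim(self, name):
        """Reserve the name; return False if it was already taken."""
        key = name.lower()
        if key in self._taken:
            return False
        self._taken.add(key)
        return True


def unique_name(make_candidate, namespace, max_tries=1000):
    """Call make_candidate() until it produces a name not yet in the namespace."""
    for _ in range(max_tries):
        name = make_candidate()
        if namespace.claim(name):
            return name
    raise RuntimeError("could not find an unused name")
```

Here make_candidate stands for whatever function assembles a raw name from syllables. A set keeps each lookup cheap even when the namespace starts out with close to a million dictionary words.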

Ban List

I know a Chinese scientist whose name sounds like [CENSORED] in Ukrainian. Such things happen in real life and often result in stupid jokes and puns.

We should try to avoid such inconveniences by creating a "ban list". It works similarly to the namespace, but while the namespace checks the whole word, the ban list scans all the substrings. We don't want a character named "Mol[CENSORED]as" in our game, do we? The probability is not as low as one might think: about 2% of all possible names are affected.

But of course, only the human ear can tell what sounds nice or ugly. So we may also create a manual ban list of letter combinations like "rdsp" or "thyth", which are hard to pronounce but may appear in generated names by chance.
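The difference from the namespace check is that a ban list looks inside the word. A minimal Python sketch, assuming the banned fragments are stored in lowercase and using the hypothetical function name is_banned:

```python
def is_banned(name, banned_fragments):
    """Return True if any banned fragment occurs anywhere inside the name."""
    lowered = name.lower()
    return any(fragment in lowered for fragment in banned_fragments)


# The two hard-to-pronounce combinations mentioned above; a real list
# would also hold the offensive words.
HARD_TO_PRONOUNCE = ["rdsp", "thyth"]
```

A candidate such as "Mordspan" is rejected because it contains "rdsp", while "Bernof" passes. For a long ban list this linear scan may become the slow part of generation, in which case a combined regular expression or a similar structure could be worth trying.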

We should also control the consonant-to-vowel ratio in words, so that there are not too many complex consonant blocks. An average of 2-4 letters per vowel in a word produces quite nice results.
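One reading of the "2-4 letters per vowel" rule is to divide the length of the word by the number of vowels in it and reject anything outside that range. The Python sketch below follows this reading; the function names, the treatment of "y" as a vowel and the inclusive bounds are assumptions.

```python
VOWELS = "aeiouy"  # assumption: "y" counts as a vowel


def letters_per_vowel(name):
    """Total letters divided by the number of vowels (infinite if none)."""
    vowels = sum(1 for ch in name.lower() if ch in VOWELS)
    return len(name) / vowels if vowels else float("inf")


def is_pronounceable(name, low=2.0, high=4.0):
    """Accept names whose letters-per-vowel ratio lies within [low, high]."""
    return low <= letters_per_vowel(name) <= high
```

Under this reading "Inscrana" has 8 letters and 3 vowels, so it passes, while "Strengths" (9 letters, 1 vowel) is rejected as too consonant-heavy.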

The End?

The last thing remaining is to add the word ending. The "VC" syllable structure guarantees that the word always ends with a consonant. A name generated this way suits a male character, and adding the letter "a" at the end makes a female name. We may also use a simple trick to make short names (nicknames) for our characters: take the first syllable and add the letter "i" at the end (e.g. "Lepuha" aka "Lepi").
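The endings described above amount to a couple of string operations. A small Python sketch, with hypothetical function names and the assumption that syllables are stored in lowercase:

```python
def male_name(syllables):
    """Join the syllables; the VC structure makes the result end in a consonant."""
    return "".join(syllables).capitalize()


def female_name(syllables):
    """Same as the male form, with the letter "a" added at the end."""
    return male_name(syllables) + "a"


def nickname(syllables):
    """Short form: the first syllable plus the letter "i"."""
    return syllables[0].capitalize() + "i"
```

With the syllables "lep" and "uh" this gives "Lepuh", "Lepuha" and the nickname "Lepi".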

Result

In the example below, 4 Public Domain English word lists were used, 900660 words in total, and ban lists were also created. Processing them with the algorithms described above produces 11760 first syllables and 1802 normal syllables. On an average desktop PC each name takes about 16-18 ms to generate.

With up to 4 syllables per word, that provides for over a trillion unique names. And of course, this algorithm can be used not only for names in English, but for most other languages too.
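The "over a trillion" estimate can be checked with a quick calculation. The Python sketch below assumes that a name is one first syllable followed by zero to three normal syllables, and it ignores the names later rejected by the namespace, the ban list and the vowel ratio check, so it is an upper bound rather than an exact count.

```python
FIRST_SYLLABLES = 11760   # from the word lists described above
NORMAL_SYLLABLES = 1802
MAX_SYLLABLES = 4


def count_raw_names(first, normal, max_syllables):
    """Count syllable sequences of length 1..max_syllables."""
    return sum(first * normal ** extra for extra in range(max_syllables))


total = count_raw_names(FIRST_SYLLABLES, NORMAL_SYLLABLES, MAX_SYLLABLES)
print(f"{total:.2e} raw combinations")
```

The total comes out in the tens of trillions, dominated by the 4-syllable names, so even after filtering there is plenty of room above a trillion.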

And let's welcome our new NPCs named Inscrana, Bernof, Phimowax, Baltela, Sniggit, Kagudum and Degluca!

You may find the corresponding FreePascal code and word lists here:

https://gitlab.com/EugeneLoza/DecoNouns