Faker has always been able to generate random text, based on a “lorem ipsum” corpus. But Faker-generated text is gibberish in every language, and misses the structures particular to each language. Today, Faker introduces a new real text generator, which produces “almost understandable”, although completely random, portions of text.
In addition to the old text() formatter, Faker now exposes a new realText() formatter:
<?php
$faker = Faker\Factory::create('en_US');
echo $faker->realText(); // generates a random string of 200 characters at most
Alice, as the jury had a little door about fifteen inches high: she tried the effect of lying down with wonder at the stick, and held out its arms folded, frowning like a telescope. And so it was.
You may recognize a taste of Lewis Carroll in this text, and you would be right. Here is another random string of real text:
<?php echo $faker->realText(180); // you can specify a max number of characters
Knave was standing before them, in chains, with a sigh. ‘I only took the opportunity of showing off her head!’ the Queen put on his spectacles and looked along the passage into the sky.
Yet look through the complete works of Lewis Carroll, and you will never find these passages.
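The new real text generator uses a very simple algorithm, a Markov chain generator. Here is how it works: read a large piece of text word by word and, for each word, record every word that follows it somewhere in the text.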
You will end up with a long index looking like the following:
...
hours: ['to', 'a', 'the'],
you: ['see', 'grant', 'know', 'see', 'must', 'gave'],
everybody: ['minded', 'else', 'executed', 'minding', 'laughed'],
jumping: ['up', 'merrily', 'about', 'up'],
executed: ['for', 'on'],
...
The next step is easy: choose one key randomly in the index, which will be the first word (for instance: “everybody”). Then, choose a random word among the words following the first word (for instance: “executed”). Use this new word as a key to the index to choose the next word randomly (for instance: “for”), and repeat until the final text has the desired length.
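The two steps above (build the index, then walk it) can be sketched in a few lines of PHP. This is only an illustration of the idea, not Faker's actual code: buildIndex() and generateText() are hypothetical names, the corpus is a made-up sentence, and the length limit is counted in words rather than in characters to keep things short.

```php
<?php
// Hypothetical helpers for illustration; these are not part of Faker's API.

/**
 * Build a one-word index: each word maps to the list of words that follow it.
 * Duplicates are kept on purpose, so frequent followers get picked more often.
 */
function buildIndex(string $text): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $index = [];
    for ($i = 0; $i < count($words) - 1; $i++) {
        $index[$words[$i]][] = $words[$i + 1];
    }
    return $index;
}

/**
 * Walk the index: start from a random key, then repeatedly pick a random
 * follower of the current word. Stops at $maxWords words, or earlier when
 * the current word has no known follower.
 */
function generateText(array $index, int $maxWords): string
{
    if ($index === []) {
        return '';
    }
    $word = array_rand($index);
    $result = [$word];
    while (count($result) < $maxWords && isset($index[$word])) {
        $followers = $index[$word];
        $word = $followers[array_rand($followers)];
        $result[] = $word;
    }
    return implode(' ', $result);
}

// Made-up corpus, far too small for real use.
$corpus = 'the queen said to the king and the king said to the knave';
$index = buildIndex($corpus);
echo generateText($index, 8), "\n";
```

With such a tiny corpus the output stays very close to the source sentence; the randomness only becomes interesting with a novel-sized text.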
This will produce some perfectly English-looking text, although not quite grammatically correct. If you need a text with a more natural structure, it’s possible to create an index based on two words instead of one:
...
'hours to': ['turn'],
'hours a': ['day'],
'hours the': ['first'],
...
But correctness has a cost: randomness. There are fewer words following a combination of two words than following a single word, so the produced text will be less random.
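To make the trade-off concrete, here is a small sketch that builds both kinds of index from the same text and compares the candidates available after a key. The buildNgramIndex() helper and the corpus are assumptions made for this example; they are not taken from Faker.

```php
<?php
// Hypothetical helper for illustration; not Faker's actual implementation.

/**
 * Build an index keyed on $size consecutive words. Each key maps to the
 * list of words found right after that sequence in the text.
 */
function buildNgramIndex(string $text, int $size): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $index = [];
    for ($i = 0; $i + $size < count($words); $i++) {
        $key = implode(' ', array_slice($words, $i, $size));
        $index[$key][] = $words[$i + $size];
    }
    return $index;
}

// Made-up corpus built around the "hours" example.
$corpus = 'hours to turn and hours a day and hours the first';
$one = buildNgramIndex($corpus, 1);
$two = buildNgramIndex($corpus, 2);

print_r($one['hours']);    // three candidates: to, a, the
print_r($two['hours to']); // a single candidate: turn
```

After the single word “hours” the generator can pick among three words, but after “hours to” only one continuation remains, which is why a two-word index reads better and surprises less.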
Faker’s Markov chain generator uses a two-word index by default. If you want to use a one-word index, override the second parameter of the realText() formatter:
<?php
// generate at most 200 characters with a single-word index
echo $faker->realText(200, 1);
It doesn’t mind.’ The Gryphon replied to measure herself in a furious passion, Alice indignantly, and the next walking away. She is the earth takes twenty-four hours to get in that it would be.
This great new formatter was contributed by Tim Düsterhus. Thanks a lot to him!
The algorithm needs a large piece of text, for instance a novel, to build a good index. The larger the novel, the better. But too large a novel will slow down the generator, as the initial computation required to build the index takes time. Ideally, a novel between 300kB and 700kB is fine.
Faker already has three locales with a real text generator, based on the following works:
en_US: Alice’s Adventures in Wonderland, by Lewis Carroll
de_DE: Die Leiden des jungen Werther, by Johann Wolfgang von Goethe
fr_FR: Madame Bovary, by Gustave Flaubert
You can contribute a real text generator for other locales by following the implementation of these providers. There are a few constraints, though:
Project Gutenberg is a great source of free ebooks in all languages. Look there first for the most famous novel in your language.
There is an important thing I would like to say before you leave.
The Queen smiled and passed on. ‘Who ARE you talking to?’ said one of these cakes,’ she thought, ‘and hand round the court with a cart-horse, and expecting every moment to think this a very grave voice, ‘until all the rats and–oh dear!’ cried Alice, with a sigh: ‘he taught Laughing and Grief, they used to say.’ ‘So he did, so he with his head!”’ ‘How dreadfully savage!’ exclaimed Alice. ‘That’s very curious.’ ‘It’s all his fancy, that: they never executes nobody, you know. But do cats eat bats? Do cats eat bats? Do cats eat bats?’ and sometimes, ‘Do bats eat cats?’ for, you see.
Published on 04 Mar 2014
with tags faker php