MB-01/Implementation
Revision as of 16:07, 17 April 2002
At the moment, Moby is a first-order MarkovChaining bot.
After some consideration, I *don't* need an SQL backend for a Markov bot. I don't even need a DBM backend. I can make do with a text file of tab-separated values. The only three columns I need are "previous word", "next word", and "author".
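
Loosely, that storage could look like the Python sketch below. The file name "markov.tsv" and the helper names are my own assumptions, not anything Moby actually uses.

# Sketch only: file name and helper names are assumptions.
def load_rows(path="markov.tsv"):
    """Read (prev_word, next_word, author) tuples from the tab-separated file."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 3:
                rows.append(tuple(fields))
    return rows

def save_row(prev_word, next_word, author, path="markov.tsv"):
    """Append one transition as a tab-separated row."""
    with open(path, "a") as f:
        f.write(prev_word + "\t" + next_word + "\t" + author + "\n")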
To find the first word of a chain from all #afda users, I'd do the equivalent of a statement like this:
select NextWord from Markov where PrevWord='';
Then I get a list of words, pick one at random (say, "jordanb:"), and then run:
select NextWord from Markov where PrevWord='jordanb:';
Lather, rinse, repeat.
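
In Python terms, that select-pick-repeat loop might look roughly like the sketch below, run over the list of (prev, next, author) tuples loaded above. The function name and the length cap are my assumptions.

import random

def markov_chain(rows):
    """Walk the chain: PrevWord='' marks the start, an empty NextWord the end."""
    words = []
    prev = ""
    while len(words) < 200:  # crude cap so a cycle can't run forever
        # the "select NextWord from Markov where PrevWord=..." step
        candidates = [nxt for (p, nxt, author) in rows if p == prev]
        if not candidates:
            break
        prev = random.choice(candidates)
        if prev == "":
            break
        words.append(prev)
    return " ".join(words)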
For a Markov chain based on a particular person, I'd do something like the following:
select NextWord from Markov where PrevWord='' and Author='Screwtape';
I stop, of course, when NextWord is empty.
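
The per-author variant is then just the same walk over a pre-filtered row list. Again only a sketch, reusing the hypothetical markov_chain() above.

def markov_chain_by(rows, author):
    """Same walk, restricted to rows spoken by one author."""
    filtered = [row for row in rows if row[2] == author]
    return markov_chain(filtered)

# e.g. markov_chain_by(load_rows(), "Screwtape")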