MB-01/Implementation

At the moment, Moby is a first-order Markov chaining bot.

After some consideration, I *don't* need an SQL backend for a Markov bot. I don't even need a DBM backend. I can make do with a text file of tab-separated values. The only three columns I need are "current word", "next word", and "speaker".
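
To make the format concrete, here's a minimal sketch of reading and writing that file. The path and function names are illustrative only, not Moby's actual code:

def save_rows(rows, path="markov.tsv"):
    # One (current word, next word, speaker) triple per line,
    # tab-separated -- the three columns described above.
    with open(path, "w") as f:
        for curr, nxt, speaker in rows:
            f.write("%s\t%s\t%s\n" % (curr, nxt, speaker))

def load_rows(path="markov.tsv"):
    # Read the triples back into an ordinary Python list of tuples.
    rows = []
    with open(path) as f:
        for line in f:
            curr, nxt, speaker = line.rstrip("\n").split("\t")
            rows.append((curr, nxt, speaker))
    return rows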

For the moment, I'm storing these as an ordinary Python list of Python tuples. This worked well up until about 40,000 rows, at which point my 300 MHz PC spent about ten seconds at full CPU to generate a single chain.
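
That slowness is no surprise: with nothing but a flat list, every step of a chain has to scan every row. A sketch of the naive approach (hypothetical names, not Moby's source):

import random

def naive_chain(rows, start, length=20):
    # Each step scans the entire row list for matches on "current word",
    # so generating a chain is O(length * rows) -- painful at 40,000 rows.
    word, out = start, [start]
    for _ in range(length):
        candidates = [nxt for curr, nxt, _ in rows if curr == word]
        if not candidates:
            break
        word = random.choice(candidates)
        out.append(word)
    return " ".join(out)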

Then I had the bright idea of putting the data in a DBM file, which would save much time and energy and let me re-use code from dagny's infobot module. I quickly added code to write rows to the DBM file, started Moby back up, and waited.

The next morning, I woke up to find a 78MB data file, and that Moby had crashed with an unspecified error (by comparison, the tab-delimited file stored on disk is only 1.1MB now, and 302k zipped).

Currently, I still store the data tab-delimited on disk and in a Python list in memory, but now I have two hash tables - one keyed off "current word" and one keyed off "next word" - to help speed up the chaining process. It looks something like this:

self.dataByCurrWord["roses"] = [
    ("roses", "are", "Screwtape"),
    ("roses", "stink.", "jordanb"),
    ("roses", "seem", "bbz"),
]

where [] denotes a Python list, and () denotes a Python tuple.
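
Building both indexes is a single pass over the row list. A sketch, assuming the second table is named something like dataByNextWord (only dataByCurrWord appears above):

def build_indexes(rows):
    # File each (current, next, speaker) row under both its
    # current word and its next word.
    by_curr, by_next = {}, {}
    for row in rows:
        curr, nxt, speaker = row
        by_curr.setdefault(curr, []).append(row)
        by_next.setdefault(nxt, []).append(row)
    return by_curr, by_next

With the indexes in place, each chain step is one dictionary lookup plus a random choice, instead of a scan of the full list.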

For the most part, Markov chains, even chains for a particular person (an un-indexed column), are generated instantly. Sometimes there is lag; I suspect that is due to Moby saving his current data to disk (still an expensive operation, which occurs every 60 seconds), or to Moby having been swapped to disk at that point.

For "markov by <speaker>", each time it builds a list of potential next words, it then goes through that list and filters out words that aren't spoken by <speaker> before choosing a next word. A mite slower, but meh.
