WeightedRandom

From ThorxWiki
Jump to: navigation, search

A random choice, where different percentile ranges have different probabilities of chance

The ability to do weighted randomness is essential to the NIPL project - the intent being to choose a song at random, but to weight the choice in favour of songs that are known (or estimated by the system) to be preferable choice, and weight away from undesirable choices.

So how do we do it?

That is to say, how do we set the weighted scale?

Well, first let's look at what might be considered ideal cases. (these examples are in the context of NIPL)

Ideally, the higher scored choices will have a higher probability of choice. Three possible curves showing this are here: http://www.cheeky.house.cx/~nemo/extras/randomfu/random1.png

Note that unweighted randomness whould be a horizontal line.

However, the probability-against-score curve here doesn't take into account the relative population densities of the different scores. No point in choosing the highest scored result in 9 times out of 10, if there is only 1 item to choose from... so we need to look at the population-against-score values.

Again, we'll start off by looking at ideal/starting cases. At first, everything will be at a neutral/midpoint score, and as the scores database is populated, I expect this to spread into a "normal" sort of bell curve:

http://www.cheeky.house.cx/~nemo/extras/randomfu/random2.png

Of course, we have no way of really determining if this will be the natural/typical spread till after first-round implementation and analysis is complete. In the worst case, the population could jump up and down more or less arbitrarily:

http://www.cheeky.house.cx/~nemo/extras/randomfu/random3.png

So, how do we reconcile an arbitrary population spread against a desire for an increasing probability?

First up, let's map the population curve into the probability curve - some nice normalising algorithm can handle the exact changes, but let's just assume that's been done... we now have two competing curved on the same graph.

One solution would be to average between the known(population) curve, and the desired(probability) curves. It would look something like this:

http://www.cheeky.house.cx/~nemo/extras/randomfu/random4.png

(applying the same to the other probability curves, and the bell-curve population curves, I leave as an exercise to the reader)

Another solution is to consider the basic aim of the desired probability curves - that the probability consistently increases as the score increases. So we have another potential solution:

http://www.cheeky.house.cx/~nemo/extras/randomfu/random5.png

(again, I leave as an exercise to the reader to consider the same solution applied to the bell curves)


So the question is, what do we use? Stepping, or Averaging solutions?

Nemo is inclined to feel that the averaging solution is best - given the expected bellcurve shape of the population distribution. For simplicity sake, it would seem that a true unweighted average would be simplest! :)


Nemo wonders if Bayesian probabilities might be worth looking at? No doubt some other probability combining methods also.

Personal tools
Namespaces

Variants
Actions
Navigation
meta navigation
More thorx
Tools