In this, the second test of Blood Bowl dice, I used CHESSEX block dice and again rolled them 1000 times.
I again set the threshold at 5%. That means the actual count for an outcome would have to deviate by 5% or more from its expected count for the test to indicate a potential problem. Strictly speaking, this is a relative deviation threshold rather than a p-value, so it should not be read as a formal significance level. I figured 5% over 1000 rolls should be safe enough.
Test results:
Outcomes | Total | Expected | % diff |
Pow | 179 | 166.67 | 7.40 |
Pow! | 155 | 166.67 | -7.00 |
Skull | 174 | 166.67 | 4.40 |
Pow/Skull | 165 | 166.67 | -1.00 |
Push | 327 | 333.33 | -1.90 |
Check | 1000 | 1000 | 0.00 |
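For anyone who wants to check the % diff column, here is a minimal sketch of the arithmetic. It assumes, as the Expected column implies, that Push sits on two of the six faces and every other outcome on one. The variable and function names are my own and purely illustrative.

```python
# Observed counts copied from the table above (1000 rolls in total).
observed = {"Pow": 179, "Pow!": 155, "Skull": 174, "Pow/Skull": 165, "Push": 327}

# Face probabilities implied by the Expected column:
# Push on two of the six faces, each other outcome on one.
probabilities = {"Pow": 1 / 6, "Pow!": 1 / 6, "Skull": 1 / 6, "Pow/Skull": 1 / 6, "Push": 2 / 6}

total_rolls = sum(observed.values())


def percent_diff(observed_count, expected_count):
    """Relative deviation of the observed count from the expected count, in percent."""
    return (observed_count - expected_count) / expected_count * 100


for outcome, count in observed.items():
    expected = total_rolls * probabilities[outcome]
    print(f"{outcome:<10} {count:>5} {expected:>8.2f} {percent_diff(count, expected):>7.2f}")
```

Each printed row should match the corresponding row of the table to two decimal places.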
Conclusions
The dice finished outside the 5% threshold for both Pow and Pow!, and so from this test I have to highlight a potential issue with the fairness of the CHESSEX block dice. More specifically, teams heavy in the DODGE skill could have problems due to the higher than normal occurrence of Pow and the lower than normal occurrence of Pow!
As for the test itself, rolling more dice would, as always, provide a greater degree of certainty, so the results would be safer after 10,000 or even 100,000 rolls. Also, any study could provide a set of fluke results and so these outcomes would need repeating in a second test for significance to be assured.
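One way to put a formal 5% significance level on these counts is a chi-square goodness-of-fit test. To be clear, this is not the method used in the test above, which only compares per-outcome percentage deviations. What follows is a rough sketch of an alternative check. The critical value 9.488 is the standard table value for 4 degrees of freedom (five outcomes minus one) at the 5% level. The names are illustrative.

```python
# Observed and expected counts in the order Pow, Pow!, Skull, Pow/Skull, Push.
observed = [179, 155, 174, 165, 327]
expected = [1000 / 6, 1000 / 6, 1000 / 6, 1000 / 6, 2000 / 6]


def chi_square_statistic(observed_counts, expected_counts):
    """Sum of (observed - expected)^2 / expected over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts))


# Standard chi-square table value: 4 degrees of freedom, 5% significance level.
CRITICAL_VALUE_DF4_5PCT = 9.488

statistic = chi_square_statistic(observed, expected)
print(f"chi-square statistic = {statistic:.3f}")
if statistic > CRITICAL_VALUE_DF4_5PCT:
    print("Deviation is significant at the 5% level")
else:
    print("Deviation is not significant at the 5% level")
```

With the counts from the table, the statistic comes out at roughly 2.19, well below 9.488, so this particular test would not flag the dice at the 5% level. That fits with the caution above about fluke results and the need for a repeat test.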
Interesting! I guess we could try a similar test to check on custom D6.
You should do a bootstrapping of your results to get an idea of the error inherent in your data set.
Randomly draw 1000 results from your data with replacement; that is, the probability of drawing a certain die face does not change as you construct the set of 1000. Determine the per cent deviation from expectation for each face from this set. Then repeat this process many times (around 100, though you'll do this with a computer, so the exact number doesn't really matter). The result is a distribution of per cent differences for each face, from which you can calculate the standard deviation and get an idea of the error.
It's a bit more work but it will give you a good idea of just how significant the 7% deviations are.
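The bootstrap described above could be sketched roughly as follows, using only the Python standard library. The counts come from the table in the post. The function name, the default of 1000 resamples, and the fixed seed are my own assumptions for illustration. Drawing with weights equal to the observed counts is equivalent to resampling the 1000 recorded rolls with replacement.

```python
import random
import statistics

# Observed counts from the post's table.
observed = {"Pow": 179, "Pow!": 155, "Skull": 174, "Pow/Skull": 165, "Push": 327}
# Face probabilities for a fair block die (Push on two faces).
probabilities = {"Pow": 1 / 6, "Pow!": 1 / 6, "Skull": 1 / 6, "Pow/Skull": 1 / 6, "Push": 2 / 6}

outcomes = list(observed)
weights = [observed[o] for o in outcomes]
n_rolls = sum(weights)


def bootstrap_percent_diffs(n_resamples=1000, seed=0):
    """Resample the recorded rolls with replacement and collect the
    per cent deviation from expectation for each face, per resample."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    diffs = {o: [] for o in outcomes}
    for _ in range(n_resamples):
        # Draw n_rolls results with replacement, weighted by observed counts.
        sample = rng.choices(outcomes, weights=weights, k=n_rolls)
        for o in outcomes:
            expected = n_rolls * probabilities[o]
            diffs[o].append((sample.count(o) - expected) / expected * 100)
    return diffs


diffs = bootstrap_percent_diffs()
for o in outcomes:
    mean = statistics.mean(diffs[o])
    spread = statistics.stdev(diffs[o])
    print(f"{o:<10} mean {mean:7.2f}%   std dev {spread:5.2f}%")
```

As a sanity check on what to expect: for a fair one-in-six face over 1000 rolls, the binomial standard deviation is sqrt(1000 × 1/6 × 5/6), about 11.8 rolls, or roughly 7% of the expected 166.67. So a 7% deviation on Pow or Pow! is about one standard deviation from expectation.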
I would love to do tests like this on my chessex dice but the time required for rolling and recording is significant. Good job doing this 1000 times!