Actually, it’s 5 4 10 12 2 9 8 11 6 7 3 1 for me, but I'm too lazy to edit the image.

  • tetris11@lemmy.ml
    19 hours ago

    Which language provides the most random alphabetically sorted sequence?

    Data
    |  N | Eng | Dut | Ger | Tur | Lex |
    |----+-----+-----+-----+-----+-----|
    |  1 |   8 |   8 |   8 |   6 |   1 |
    |  2 |  11 |   3 |   3 |   5 |  10 |
    |  3 |   5 |   1 |   1 |   1 |  11 |
    |  4 |   4 |  11 |  11 |   9 |  12 |
    |  5 |   9 |   9 |   5 |   4 |   2 |
    |  6 |   1 |  10 |   9 |   2 |   3 |
    |  7 |   7 |  12 |   6 |  10 |   4 |
    |  8 |   6 |   2 |   7 |  11 |   5 |
    |  9 |  10 |   4 |   4 |  12 |   6 |
    | 10 |   3 |   5 |  10 |   8 |   7 |
    | 11 |  12 |   6 |   2 |   3 |   8 |
    | 12 |   2 |   7 |  12 |   7 |   9 |
    

    Sourced from comments in this thread (English from the image, Dutch from Vinny93, German from TJA, Turkish from some rando, Lexicographical from monogram)

    Plot with Pearson Score
    Code
    gnuplot -p -e '
      set xlabel "Base Sequence";
      set ylabel "Alphabetic";
      set xtics 1,1,12;
      set ytics 1,1,12;
      set title "Alphabetic Number Plot with Correlation Score";
      set key outside left;
      set size ratio 0.45;
      stats "alphabetic.tab" using 1:2 name "E";
      stats "" using 1:3 name "D";
      stats "" using 1:4 name "G";
      stats "" using 1:5 name "T";
      stats "" using 1:6 name "L";
      plot "" using 1:2 with lines title sprintf("%s (%.3f)", columnhead(2), E_correlation),
           "" using 1:3 with lines title sprintf("%s (%.3f)", columnhead(3), D_correlation),
           "" using 1:4 with lines title sprintf("%s (%.3f)", columnhead(4), G_correlation),
           "" using 1:5 with lines title sprintf("%s (%.3f)", columnhead(5), T_correlation),
           "" using 1:6 with lines title sprintf("%s (%.3f)", columnhead(6), L_correlation)
    '
    

    Going by the Pearson score, the most random language looks like Dutch (closest to zero), and Turkish appears to be the least random (the consecutive 10, 11, 12 run probably skewed it).
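
For anyone without gnuplot handy, here is a rough Python sketch of the same comparison. It recomputes the Pearson coefficient straight from the table above rather than reading it off the plot, so treat it as a cross-check, not as what gnuplot does internally. The names `SEQUENCES` and `pearson` are made up for this example.

```python
from math import sqrt

# Alphabetical orderings copied from the table above (rows N = 1..12).
SEQUENCES = {
    "Eng": [8, 11, 5, 4, 9, 1, 7, 6, 10, 3, 12, 2],
    "Dut": [8, 3, 1, 11, 9, 10, 12, 2, 4, 5, 6, 7],
    "Ger": [8, 3, 1, 11, 5, 9, 6, 7, 4, 10, 2, 12],
    "Tur": [6, 5, 1, 9, 4, 2, 10, 11, 12, 8, 3, 7],
    "Lex": [1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9],
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


base = list(range(1, 13))  # the "Base Sequence" axis, 1..12
scores = {name: pearson(base, seq) for name, seq in SEQUENCES.items()}

# Print from closest-to-zero to furthest-from-zero.
for name, r in sorted(scores.items(), key=lambda kv: abs(kv[1])):
    print(f"{name}: {r:+.3f}")
```

Sorting by absolute value puts the "most random" candidate first, which is the ranking the paragraph above is talking about.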

    That said, Lexicographic also has a near-zero score despite being the most ordered, so I think Pearson is a bad measure here. A serial correlation test might be better.
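
A minimal sketch of what such a test could look like, assuming "serial correlation" means the lag-1 autocorrelation: the Pearson coefficient between each sequence and itself shifted by one position. That is one reading of the idea, not the only one, and the helper names `pearson` and `lag1_serial_correlation` are hypothetical.

```python
from math import sqrt

# Same table as above (rows N = 1..12).
SEQUENCES = {
    "Eng": [8, 11, 5, 4, 9, 1, 7, 6, 10, 3, 12, 2],
    "Dut": [8, 3, 1, 11, 9, 10, 12, 2, 4, 5, 6, 7],
    "Ger": [8, 3, 1, 11, 5, 9, 6, 7, 4, 10, 2, 12],
    "Tur": [6, 5, 1, 9, 4, 2, 10, 11, 12, 8, 3, 7],
    "Lex": [1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9],
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lag1_serial_correlation(seq):
    """Correlate each element with its successor (assumed definition: lag 1)."""
    return pearson(seq[:-1], seq[1:])


serial = {name: lag1_serial_correlation(seq) for name, seq in SEQUENCES.items()}

# Highest first: a larger value means neighbours tend to resemble each other.
for name, r in sorted(serial.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {r:+.3f}")
```

Unlike the Pearson score against 1..12, this only looks at neighbouring entries, so long consecutive runs count toward "ordered" wherever they sit in the list. With only 12 points, a single big jump can still move the number a lot, so it is worth eyeballing the output before reading too much into it.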