Non-genetic sequencing

[Sequence straight to the tournament data]

Perhaps seeing how the games were added to a <CG> tournament might be useful. This can roughly be done via the gid (game ID). I wrote a simple routine to track this; here are the results for <56th US Open (1955)>, a tournament discussed on the forum:

May-30-15
Premium Chessgames Member
chessgames.com: Here’s what I’m investigating right now: An anomaly discovered in the 56th US Open (1955). This tournament contains 315 games and was inducted into the Tournament Index Sun Jan 19 2014. Taking a header at random from the event gives us this:<
[Event “56th US Open”]
[Site “Long Beach, CA”]
[Date “1955.08.11”]
[EventDate “?”]
>

Maybe not perfect (no event date, and “USA” should be part of the Site tag), but that’s how it was inducted and all of the games should conform to that standard.

However, they don’t all conform. 16 of them mysteriously have headers that look like this:

<
[Event “US Open”]
[Site “Long Beach, CA”]
[Date “1955.08.15”]
[EventDate “?”]
>

It doesn’t say <56th US Open> it says <US Open>. What happened to the <56th>?

All of the games without the <56th> were uploaded by Phony Benoni back in 2011, and as such they have nearly consecutive GIDs:

1751663, 1751664, 1751666, 1751667, 1751668, 1751669, 1751671, 1751672, 1751673, 1751674, 1751675, 1751676, 1751677, 1751678, 1751679, 1751680.

So it would appear that gremlins snuck in during the night and stripped out <56th> from 16 consecutive games in the database, but left all the others alone. What makes these 16 games so special?

May-30-15
Premium Chessgames Member
chessgames.com: Another observation/revision: although I swear that I see these games in Phony’s upload of 2011, the GIDs of these 16 games act as a kind of loose timestamp. It places them as being uploaded AFTER March 2014, which is after the tournament was inducted. That’s very confusing in itself. Anyhow, now knowing they were uploaded after the tournament was inducted, that would certainly explain why they would not have normalized headers. Then the new question becomes, why would they be added to the existing tournament? After all, the names aren’t even the same, and even if they were the same, a special clause prevents new games from being auto-inserted into existing Tournament Index events. To top it off, no admin has touched any of these 16 games in the past year.
May-30-15 zanzibar: <chessgames> now you’re scaring me. That’s enough ghost stories for tonight!
May-30-15
Premium Chessgames Member
Benzol: <That’s enough ghost stories for tonight!> Is this the chessgames forum or a story by M R James? You be the judge. 🙂
Jun-02-15
Premium Chessgames Member
Tabanus: <1751663, 1751664, 1751666, 1751667, 1751668, 1751669, 1751671, 1751672, 1751673, 1751674, 1751675, 1751676, 1751677, 1751678, 1751679, 1751680> Perhaps all these games had “alternate scores”, and the wrong game was deleted? If that’s so, maybe there’s not any ‘bug’ but a ‘bugger’. Or you are hacked! Or the software deleted an old duplicate for a “new” reason? I’m not sure, but I think that all the problem games in “my” voted-in tournaments had alternate scores. Just trying to contribute with my chisel.

(Forum posts extracted according to thread)


I forgot about the discussion and chimed in again. <Tab> reminded me, and I went off to code up a small routine which produced this:

>>> seq(t)

Tournament: 83016  56th US Open (1955),  N_games = 315

# tid gid pid1 pid2   -- game data
 83016  1070984  11209  18720     1955.08.12   D10  23     (R5)  1-0   Reshevsky, Samuel -- Burger, Karl

 83016  1119619  18982  21925     1955.08.19   C98  66     (R12)  0-1   Evans, Larry Melvyn -- Rossolimo, Nicolas

 83016  1649308 134474  43983     1955.08.08   E30  28     (R1)  0-1   Blaney, John W -- Zalys, Ignas
 83016  1649337 133669  21871     1955.08.12   A04  73     (R5)  1-0   Yarmak, Saul -- Steiner, Herman
30

 83016  1649396  19206  11209     1955.08.13   D32  50     (R6)  0-1   Brandts, Paul -- Reshevsky, Samuel
 83016  1649519  43983  62488     1955.08.17   E11  68     (R10)  1-0   Zalys, Ignas -- Rozsa, Bela
124

 83016  1649521  92422 134498     1955.08.18   B50  36     (R11)  1-0   Almgren, Sven Elias -- Koutz, Russell
 83016  1649575  43983 134475     1955.08.19   D39  37     (R12)  1-0   Zalys, Ignas -- Wallace, Charles
55

 83016  1751663 138066  97792     1955.08.13   C57  12     (R6)  0-1   Hart, Michael -- Jacobs, Robert Merton
 83016  1751664  24250 135064     1955.08.09   A53  43     (R2)  0-1   Pupols, Viktors -- Kaufman, Allen
2

 83016  1751666  15474 149557     1955.08.08   E66  32     (R1)  1-0   Byrne, Donald -- Colby, Kenneth Mark
 83016  1751669  15474  13157     1955.08.11   D02  41     (R4)  1-0   Byrne, Donald -- Bagby, Charles L
4

 83016  1751671  26427  10176     1955.08.16   D07  48     (R9)  0-1   Graf-Stevenson, Sonja -- Levin, Eugene
 83016  1751680  40284  92435     1955.08.19   C02  51     (R12)   =    Gross, Henry -- Myers, Hugh Edward
10

 83016  1785004 152478  19232     1955.08.08   B02  43     (R1)  0-1   Markus, Albert -- Seidman, Herbert
 83016  1785091  19204 133669     1955.08.11   E81  32     (R4)  1-0   Sherwin, James T -- Yarmak, Saul
88

>>> 104+55+124+30+2
315

(The total matches N_games: the 104 lumps together 88 + 10 + 4 + 2, and the two singlets count as one game each.)

Again, apologies if you have word-wrap in your browser. The games come in singlets or in sequences, where a sequence is a series of tournament games with sequential gids (i.e. gid0, gid0+1, gid0+2, etc.).

A singlet is a game whose gid is isolated, like the first two games listed above.

The sequences show up as doublets: the starting and finishing games, followed by the tally of games in the sequence.
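
For anyone who wants to try the grouping themselves, here is a minimal sketch of the idea. It is not the actual seq() routine: the function name find_runs and the (first, last, count) tuple layout are my own choices for illustration. The sample gids are the two singlets from the listing plus the 16 gids quoted in the forum thread.

```python
def find_runs(gids):
    """Group gids into runs of consecutive integers.

    Returns a list of (first_gid, last_gid, count) tuples.
    A run with count == 1 is a singlet.
    (Hypothetical helper, not the original seq() routine.)
    """
    runs = []
    for gid in sorted(set(gids)):
        if runs and gid == runs[-1][1] + 1:
            # gid directly follows the current run: extend it
            first, _, count = runs[-1]
            runs[-1] = (first, gid, count + 1)
        else:
            # gap in the numbering: start a new run
            runs.append((gid, gid, 1))
    return runs


# Two singlets plus the 16 gids listed in the forum posts
gids = (
    [1070984, 1119619, 1751663, 1751664]
    + list(range(1751666, 1751670))
    + list(range(1751671, 1751681))
)

runs = find_runs(gids)
for first, last, count in runs:
    label = "singlet" if count == 1 else "sequence"
    print(f"{first}-{last}  {count:4d}  {label}")
```

On this sample the 16 forum gids split into sequences of 2, 4 and 10 games, matching the tallies in the listing above.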

I can list this another way:

1070984, 1119619, 1649308-1649337, 1649396-1649519, 1649521-1649575,
                               30              124               55

1751663-1751664, 1751666-1751669, 1751671-1751680, 1785004-1785091
              2                4               10               88

And I can use color coding to match up interrupted sequences (where there is some small number of injected games not belonging to the tournament).

Singlets (brown), Pure Sequences (blue), Interrupted Sequences (green)
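
The same classification can be roughed out in code. This is only a sketch under my own assumptions: merge_runs and classify are hypothetical helpers, and the max_gap threshold (how many foreign gids may sit between two pieces before they stop counting as one interrupted sequence) is a guess, set here to 2. The run boundaries are copied from the listing above.

```python
def merge_runs(runs, max_gap=2):
    """Merge runs separated by at most max_gap missing gids.

    runs: list of (first_gid, last_gid) pairs.
    Returns (first_gid, last_gid, n_games, n_pieces) tuples.
    max_gap is an assumed threshold, not a value from the original routine.
    """
    merged = []
    for first, last in sorted(runs):
        n_games = last - first + 1
        if merged and first - merged[-1][1] - 1 <= max_gap:
            # only a few foreign gids in between: same interrupted sequence
            start, _, games, pieces = merged[-1]
            merged[-1] = (start, last, games + n_games, pieces + 1)
        else:
            merged.append((first, last, n_games, 1))
    return merged


def classify(n_games, n_pieces):
    """Label a merged run using the three categories from the post."""
    if n_pieces > 1:
        return "interrupted sequence"
    if n_games == 1:
        return "singlet"
    return "pure sequence"


# Run boundaries taken from the listing above
RUNS = [
    (1070984, 1070984), (1119619, 1119619),
    (1649308, 1649337), (1649396, 1649519), (1649521, 1649575),
    (1751663, 1751664), (1751666, 1751669), (1751671, 1751680),
    (1785004, 1785091),
]

merged = merge_runs(RUNS)
for first, last, n_games, n_pieces in merged:
    print(f"{first}-{last}  {n_games:4d}  {classify(n_games, n_pieces)}")
```

With max_gap=2, the three short runs starting at 1751663 merge into a single interrupted sequence of 16 games, the same 16 games discussed in the forum thread. A different threshold would draw the boundaries differently.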

For some reason, displaying data visually like this makes it easier for me to see patterns. Or would that be “imagine” patterns?!

zz
