US Open (1996) – first pass

Having developed the machinery with ’97, let’s explore ’96.

The first step is to load the USCF Swiss xtab and hope it parses correctly:

C:\Users\..\Chess> py -i parse-xtab.py "USCF - US Open (1996).xtab.txt"


Processed input:  N_players = 543,  N_rounds = 12,  N_games =  2700

                  N_byes = 542,  N_unplayed = 522,  N_forfeits = 26

Well, that looks about right: another ~500-player, 12-round tournament.
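One quick way to check that the parse "looks about right" is to account for every round slot. The sketch below rests on my own reading of the summary, not on anything parse-xtab.py documents: N_games counts only games actually played, a forfeit fills two slots (winner and loser), and each bye or unplayed round fills one. Under those assumptions the numbers close exactly:

```python
# Slot accounting for the parse-xtab.py summary (values copied from its output).
# Assumptions (not documented by the parser): N_games counts only games actually
# played; a forfeit fills two slots (winner and loser); each bye or unplayed
# round fills one slot.
N_entries, N_rounds = 543, 12
N_games, N_byes, N_unplayed, N_forfeits = 2700, 542, 522, 26

slots_total = N_entries * N_rounds                 # one slot per entry per round
slots_used = (2 * N_games + N_byes + N_unplayed
              + 2 * N_forfeits)

print(slots_total, slots_used)   # 6516 6516
```

If any of those assumptions were wrong the totals would generally not match, so an exact 6516 = 6516 is some reassurance, though not proof.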

The N_players number is mislabeled; it should read N_entries. Let’s see how many multiple entries we have:

>>> len( [n for n in N if '(2)' in n] )
28
>>> len( [n for n in N if '(3)' in n] )
0
>>> 543 - 28
515

I maintain two parallel data structures: one indexed by standing, M, and the other by entry name, N. To give multiple entries unique names, I append “(2)”, or if necessary “(3)”, to the name, which also makes a convenient tag for finding them. Unlike 1997, there are no triple entries this year, so maybe things will be simpler (one can only hope). There are 28 duplicate entries, which makes the number of unique players entered in the tournament 515.
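The tag-based count can be sketched as follows. The sample names and the helper functions are hypothetical, and I assume the tag always sits at the very end of the name as " (2)" or " (3)". Counting distinct base names, rather than subtracting, stays correct even if a tagged entry ever appeared without its untagged twin, and anchoring the pattern at the end avoids matching "(2)" elsewhere in a name.

```python
import re

# Multiple-entry tag: a trailing " (2)" or " (3)" appended to the entry name
# (assumed to always sit at the very end of the name).
TAG = re.compile(r" \((\d+)\)$")

def base_name(entry):
    """Strip a trailing multiple-entry tag, if present."""
    return TAG.sub("", entry)

def count_players(entries):
    """Return (number of entries, tagged repeat entries, unique players)."""
    repeats = sum(1 for e in entries if TAG.search(e))
    unique = len({base_name(e) for e in entries})
    return len(entries), repeats, unique

# Hypothetical sample following the tagging convention.
names = ["Yermolinsky, Alex", "Yermolinsky, Alex (2)",
         "Neff, Elliott", "Neff, Elliott (2)", "Shabalov, Alexander"]
print(count_players(names))   # (5, 2, 3)
```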

The first stage of exploration is to divide the players into subgroups according to their early-round play. For the 1997 tournament this was very complicated because of liberal scheduling (i.e. what I called cross contamination). Let’s see what we have this year…

We dive straight in by looking at the grouping for the first 25 players:

>>> for i in range(1,26):
        print("%2d %-*s %3d" % (i, max_len, M[i].name, group(i)[0]))

 1 Gabriel Schwartzman            1
 2 Alex Yermolinsky               2
 3 Gregory Serper                 1
 4 Aleks Wojtkiewicz              4
 5 Alexander Ivanov               4
 6 Angelo Young                   4
 7 Jonathan Schroer               4
 8 Alexander Shabalov             4
 9 Alexandr V Reprintsev          9
10 Lawrence Charles Kaufman       9
11 Ronald Wayne Burnett          11
12 Greg Shahade                   9
13 Michael J Wierzbicki           9
14 Federico Angel Garcia          9
15 Irina Krush                    2
16 Vladislov Fedorov              4
17 Miles F Ardaman                4
18 Emory Tate                     4
19 Walter J Shipman              19
20 John A Curdo                  20
21 John C Meyer                   9
22 Arthur B Bisguier              4
23 Steven J Szpisjak              9
24 James Allen                    9
25 George Mortimer Kramer         2

So the groupings already look very clean, without needing any exclude_list. I’m not sure whether color-coding makes this clearer or muddier, but it does show the pattern. Each group is labeled by the index of its top player, and 1, 2, 4 and 9 form the main groups. We will return to the singlets, 11, 19 and 20, later; they are expected to be BP-ers (Busy Persons, i.e. players who take byes in R1-6 and so belong to no R1-4 group).
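For readers who want to reproduce this kind of grouping, here is a minimal sketch. It is not my group() function but a hypothetical reconstruction: two entries are connected if they met in R1-4, connected components are found with a union-find, and each component is labeled by its best-placed (lowest) standing, matching how the groups above are enumerated. The row layout, a list of (result, opponent) pairs with opponent 0 meaning no game, is an assumption.

```python
# Hypothetical sketch of early-round grouping (not the author's group()).
# rows: standing -> list of (result, opponent) per round; opponent 0 = no game.

def early_groups(rows, rounds=4):
    """Label each entry by the lowest standing in its R1..R`rounds` component."""
    parent = {m: m for m in rows}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]   # path halving
            m = parent[m]
        return m

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # the smaller (better-placed) standing stays the root, so every
            # root is the minimum of its component
            parent[max(ra, rb)] = min(ra, rb)

    for m, row in rows.items():
        for _, opp in row[:rounds]:
            if opp:                          # skip byes and unplayed rounds
                union(m, opp)
    return {m: find(m) for m in rows}


def row(*opps):
    """Build a toy xtab row; only the opponent field matters here."""
    return [("-", opp) for opp in opps]


rows = {
    1: row(3, 5, 0, 0), 2: row(4, 6, 0, 0), 3: row(1, 0, 0, 0),
    4: row(2, 0, 0, 0), 5: row(0, 1, 0, 0), 6: row(0, 2, 0, 0),
    7: row(0, 0, 0, 0),                      # BP-style entry: no R1-4 games
}
labels = early_groups(rows)
print(labels)   # {1: 1, 2: 2, 3: 1, 4: 2, 5: 1, 6: 2, 7: 7}
```

An entry with no opponent in R1-4 ends up in a component of its own, which is how singlets such as 11, 19 and 20 would arise if they are indeed BP-ers.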

Here is a breakdown of the groupings found just by looking at the Swiss xtab:

>>> for i in [1,2,4,9]:
        print("%3d  %-*s %3d" % (i, max_len, M[i].name, len(group(i))))

  1  Gabriel Schwartzman          100
  2  Alex Yermolinsky              45
  4  Aleks Wojtkiewicz             76
  9  Alexandr V Reprintsev        254

Listed are the player index, name, and the number of entries in the group they lead.

Reprintsev played a D5 schedule in 1997 and perhaps did the same here, but I can’t confirm that without help from <Phony> or some other knowledgeable authority on US Opens, and on 1996 in particular. Reprintsev’s group is rather large and might end up being split into two smaller groups. Quite possible, but maybe not.

This is an awesome start, since we have such a clean breakdown into groups. Lucky.

OK, then, let’s try to put the tournament together. Since I don’t know the exact sections, I’ll use G1, G2… to label and track the groups. One 1997 group that I assume also exists in 1996 is the Busy Person (BP) group: players with byes in R1-R6. Let’s find them, along with the other groups, in my Python session:

>>> BP = [ m for m in M if all( x[1] == 0 for x in M[m].X[:6] ) ]
>>> len(BP)
58

>>> G1 = group(1); G2 = group(2); G4 = group(4); G9 = group(9)

>>> A = G1 + G2 + G4 + G9 + BP

The first statement is a small Python list comprehension. It takes the R1-6 slice of each entry’s xtab row, “M[m].X[:6]”, and requires a bye in every one of those rounds, “all( x[1] == 0 …)”. Some care is needed here because byes can be scoring, so a bye can’t be recognized from the points alone; testing x[1] == 0, as this code does, is perhaps the most reliable check.

Anyway, that’s just coding detail; the important thing is that there are 58 BP-ers.
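To illustrate why a bye test should look at the pairing field rather than the score, here is a sketch with made-up rows. The (result, opponent) layout, with opponent 0 for a bye, and both function names are assumptions for illustration. A half-point bye scores, so a points-based test misses that entry, while the opponent-based test catches both kinds of bye:

```python
# Toy rows (hypothetical): standing -> list of (result, opponent); opponent 0 = bye.
rows = {
    10: [(0.5, 0)] * 6 + [(1.0, 11)],   # half-point (scoring) byes in R1-6
    11: [(0.0, 0)] * 6 + [(0.0, 10)],   # zero-point byes in R1-6
    12: [(1.0, 13)] + [(0.0, 0)] * 6,   # played a game in R1
    13: [(0.0, 12)] + [(0.5, 0)] * 6,   # played a game in R1
}

def busy_persons(rows, first=6):
    """Entries with no opponent in any of the first `first` rounds."""
    return [m for m, row in rows.items()
            if all(opp == 0 for _, opp in row[:first])]

def zero_score_start(rows, first=6):
    """A tempting but wrong test: zero points in each of the first rounds."""
    return [m for m, row in rows.items()
            if all(res == 0 for res, _ in row[:first])]

print(busy_persons(rows))       # [10, 11]
print(zero_score_start(rows))   # [11]
```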

Then I form a master list, A, of all categorized entries: the four known groups plus the BP entries. The question is whether this list is complete, i.e. whether it includes every entry in the tournament, and if not, why not. Let’s look:

>>> A = G1 + G2 + G4 + G9 + BP

>>> miss = [ m for m in M if m not in A]
>>> len(miss)
10
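The counts alone already say something useful. Using only the numbers reported so far, the four main groups hold 475 entries, and adding the 58 BP-ers gives 533, the length of A as a concatenated list. Since miss holds the entries not in A, 543 - 10 = 533 is the number of distinct entries in A; because that equals the length of A, no entry is listed twice, so the groups and the BP list do not overlap. The arithmetic, as a small sketch:

```python
# Partition bookkeeping from the counts printed so far.
group_sizes = {"G1": 100, "G2": 45, "G4": 76, "G9": 254}
n_bp, n_miss, n_entries = 58, 10, 543

grouped = sum(group_sizes.values())   # entries in the four main groups
len_A = grouped + n_bp                # A = G1 + G2 + G4 + G9 + BP (list concatenation)

# miss holds the entries not in A, so n_entries - n_miss counts the distinct
# entries in A. If that equals len(A), no entry is listed twice in A.
distinct_in_A = n_entries - n_miss
print(grouped, len_A, distinct_in_A)  # 475 533 533
```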

>>> Gp( miss )

  38   4  NY  2281   Gelman, Geoffrey                    0    0    0    0  199  347  123   77    5  107   52  108
 142   4  MD  1718   Ita, Eyo                            0    0    0    0  136  166  111  278   15  215  149   63
 169   5  MD  1952   Griffin, Wayne C                    0    0    0    0    0  297   68   73   66   84  293   87
 171   4  PA  2011   Shutt, Stephen D                    0    0    0    0  293  185  348  298  392   49  232   74
 276   6  NY  2200   Kreitner, Ilan                      0    0    0    0    0  336  172  323  221  178   97    0
 303   4  NV  1700   Horne, Kenneth B                    0    0    0    0  157  439   93  217  229   97  412  180
 356   4  MD  1716   Andree, Mark                        0    0    0    0  165  417  428  162  441  247  409  227
 365   4  VA  1636   Johnson, Scott D                    0    0    0    0  458  489  403  470  419  180  343  245
 532  11  PA  1459   Traldi, Lorenzo                     0    0    0    0    0  542    0    0    0    0    0    0
 542  11  KY  1303   Pary, Raymond                       0    0    0    0    0  532    0    0    0    0    0    0

>>> Gp( miss, group_flag=1 )

  38   4  NY  2281   Gelman, Geoffrey                    0    0    0    0    9    9    9    9    4    4    9    9
 142   4  MD  1718   Ita, Eyo                            0    0    0    0    4    4    4    4    2    9    9    1
 169   5  MD  1952   Griffin, Wayne C                    0    0    0    0    0    9    9    2    9    9    1    9
 171   4  PA  2011   Shutt, Stephen D                    0    0    0    0    1    1    1    1    4    1    4    9
 276   6  NY  2200   Kreitner, Ilan                      0    0    0    0    0    4    4    4    9    1    9    0
 303   4  NV  1700   Horne, Kenneth B                    0    0    0    0    9    9   93    9    1    9    1    9
 356   4  MD  1716   Andree, Mark                        0    0    0    0    9    9    9  162    9    9    4    2
 365   4  VA  1636   Johnson, Scott D                    0    0    0    0    4    4    4    4    4    9    1    1
 532  11  PA  1459   Traldi, Lorenzo                     0    0    0    0    0  542    0    0    0    0    0    0
 542  11  KY  1303   Pary, Raymond                       0    0    0    0    0  532    0    0    0    0    0    0

So 10 entries were not grouped; these are singlets. Why? The listing shows that all of them have byes (or forfeits, or unplayed games) in each of R1-4, and several in each of R1-5. My grouping algorithm uses only R1-4 by default, so these entries were all missed, i.e. left as singlets.
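The "listing them out" step can be made quantitative. The sketch below copies the opponent columns from the Gp( miss ) listing above and finds the first round in which each entry actually had an opponent; the helper name first_played_round is hypothetical.

```python
from collections import Counter

# Opponent columns (R1-R12) for the ten ungrouped entries, copied from the
# Gp( miss ) listing; 0 marks a round with no opponent.
miss_opps = {
    38:  [0, 0, 0, 0, 199, 347, 123, 77, 5, 107, 52, 108],
    142: [0, 0, 0, 0, 136, 166, 111, 278, 15, 215, 149, 63],
    169: [0, 0, 0, 0, 0, 297, 68, 73, 66, 84, 293, 87],
    171: [0, 0, 0, 0, 293, 185, 348, 298, 392, 49, 232, 74],
    276: [0, 0, 0, 0, 0, 336, 172, 323, 221, 178, 97, 0],
    303: [0, 0, 0, 0, 157, 439, 93, 217, 229, 97, 412, 180],
    356: [0, 0, 0, 0, 165, 417, 428, 162, 441, 247, 409, 227],
    365: [0, 0, 0, 0, 458, 489, 403, 470, 419, 180, 343, 245],
    532: [0, 0, 0, 0, 0, 542, 0, 0, 0, 0, 0, 0],
    542: [0, 0, 0, 0, 0, 532, 0, 0, 0, 0, 0, 0],
}

def first_played_round(opps):
    """1-based index of the first round with an opponent, or None if none."""
    return next((r for r, opp in enumerate(opps, start=1) if opp), None)

first = {m: first_played_round(o) for m, o in miss_opps.items()}
print(Counter(first.values()))   # Counter({5: 6, 6: 4})
```

Six entries first play in R5 and four in R6, so none of them can be linked to anyone through R1-4.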

Looking at R5-6 shows that most of these entries can also be placed into our canonical groups in the “standard” way. This can be done by adding them by hand or by rerunning my group() function with different parameters (though it is best to understand the data with the more stringent cut first).
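One way to do the by-hand step is a simple majority vote: assign each entry to the most common main group among its R5-6 opponents. This is a hypothetical heuristic, not a rerun of group(); the labels are copied from the group_flag=1 listing above, and a tie (none occurs here) would go to whichever label Counter saw first.

```python
from collections import Counter

MAIN_GROUPS = {1, 2, 4, 9}

# Opponents' group labels in R5 and R6, copied from the
# Gp( miss, group_flag=1 ) listing; 0 = no opponent that round.
r56_labels = {
    38: [9, 9], 142: [4, 4], 169: [0, 9], 171: [1, 1], 276: [0, 4],
    303: [9, 9], 356: [9, 9], 365: [4, 4], 532: [0, 542], 542: [0, 532],
}

def assign_by_opponents(labels_by_entry, known=MAIN_GROUPS):
    """Assign each entry to the most common known group among its opponents.

    Entries with no opponent in a known group are left as None.
    """
    assigned = {}
    for m, labels in labels_by_entry.items():
        votes = Counter(g for g in labels if g in known)
        assigned[m] = votes.most_common(1)[0][0] if votes else None
    return assigned

assigned = assign_by_opponents(r56_labels)
print(assigned)   # 532 and 542 come back as None
```

Eight of the ten land in G1, G4 or G9, while 532 and 542, whose only opponent is each other, stay unassigned.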

However, this year, as in 1997, there is one pair of amateurs who seem to play a single game in a “featured” section all their own: Traldi, Lorenzo (532) and Pary, Raymond (542), who meet in R6 and have no opponent in any other round.

Moving on, we can now return to the players with multiple entries and prepare a “duplicate” report. For each of a player’s entries it lists the standing, the index of the group leader, and the group size:

>>> dReport()

Yermolinsky, Alex                   2 -   2 /  45     527 -   9 / 254
Schroer, Jonathan                   7 -   4 /  76     500 -   1 / 100
Neff, Elliott                      28 -   9 / 254     512 -   2 /  45
Johnson, Ben C                     32 -   1 / 100     501 -   9 / 254
Traldi, Matthew R                  57 -   9 / 254     480 -   2 /  45
Hahn, Anna                         64 -   2 /  45     535 -   9 / 254
Massie, Curtis C                   86 -   9 / 254     460 -   2 /  45
Rusyniak, George                   95 -   4 /  76     503 -   1 / 100
Rufty, Alan E                     150 -   2 /  45     502 -   9 / 254
Neff, Raphael                     203 -   9 / 254     538 -   2 /  45
Rotenberg, Marc S                 206 -   2 /  45     384 -   4 /  76
Slack, David Richmond             211 -   9 / 254     481 -   2 /  45
Traldi, Arthur                    253 -   9 / 254     504 -   2 /  45
Brand, John                       265 -   9 / 254     506 -   2 /  45
Kreitner, Ilan                    276 - 276 /   1     459 - 459 /   1
Neff, Dr Robert                   322 -   9 / 254     539 -   2 /  45
Neff, Ethan                       324 -   9 / 254     536 -   2 /  45
Hendershot, David                 343 -   1 / 100     515 -   9 / 254
Wilson, Wylie                     346 -   9 / 254     505 -   2 /  45
Markowski, Alina F                362 -   9 / 254     520 -   2 /  45
Traldi, Lorenzo                   372 -   9 / 254     532 - 532 /   1
Cohen, Jason B                    383 -   2 /  45     479 -   9 / 254
Woods, Charles E                  468 -   2 /  45     523 -   9 / 254
Randolph, Donald E                476 -   2 /  45     525 -   9 / 254
Bradley, Dan                      490 -   2 /  45     524 -   9 / 254
Robinson, Dick A                  521 -   2 /  45     540 -   9 / 254
Dorsey, Darren C                  528 -   2 /  45     529 -   9 / 254
Pary, Raymond                     533 -   9 / 254     542 - 542 /   1
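A report in this format can be sketched from a few hypothetical inputs: entry names keyed by standing (with the " (2)" tag on repeat entries), each entry's group label, and the group sizes. The three sample players and their numbers are taken from the report above, but which entry carries the tag, and the function itself, are illustrative assumptions, not my dReport().

```python
import re
from collections import defaultdict

TAG = re.compile(r" \(\d+\)$")   # trailing " (2)" / " (3)" multiple-entry tag

# Hypothetical inputs, with values taken from three lines of the report.
entries = {
    2: "Yermolinsky, Alex", 527: "Yermolinsky, Alex (2)",
    276: "Kreitner, Ilan", 459: "Kreitner, Ilan (2)",
    533: "Pary, Raymond", 542: "Pary, Raymond (2)",
}
label = {2: 2, 527: 9, 276: 276, 459: 459, 533: 9, 542: 542}   # group leader
size = {2: 45, 9: 254, 276: 1, 459: 1, 542: 1}                 # group sizes

def duplicate_report(entries, label, size):
    """One line per multi-entry player: standing - leader / size per entry."""
    by_player = defaultdict(list)
    for m in sorted(entries):
        by_player[TAG.sub("", entries[m])].append(m)
    lines = []
    for name, ms in sorted(by_player.items(), key=lambda kv: kv[1][0]):
        if len(ms) < 2:
            continue   # single entry: nothing to report
        cells = "   ".join("%3d - %3d / %3d" % (m, label[m], size[label[m]])
                           for m in ms)
        lines.append("%-30s %s" % (name, cells))
    return lines

for line in duplicate_report(entries, label, size):
    print(line)
```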

Again, having done this much, the tournament seems manageable; it only needs a seasoned hand to guide it in for the landing.

And so, once again, I’ll just declare victory while the declaring is good. Cheers.

z
