CG vs TWIC – round 1

Well, the title is a little more provocative than the content. But the issue remains – should <CG> try to be as inclusive as (or even more inclusive than) <TWIC>?

This question arose after I saw yet another request by a <CG> member for games missing from <CG> to be included:

PhilFeeley: I wonder why <> “only” has about 800,000 games in its database. Mark Crowther over at TWIC is advertising his 1 million+ database for sale. Other databases (e.g. ChessKing, MegaDatabase, etc.) have multiple millions. Why doesn’t have at least all the possible games from 2000 on? I keep coming across games in books and DVDs that are not here.

Another, much older post shows that much of what <CG> does collect is submitted by users:

Aug-05-09 Troller: <Dredge> If you find that a tournament is missing, then contact I suppose someone has done so in this case, which is why Dutch Open has a page. Politiken Cup only got a page when I sent an email with the link to the tournament and the pgn download.

Of course, it has long been noted that <CG> does not have as extensive a collection of games as other online sites – which is partly by design. Still, <TWIC> is a well-known source of current events, and generally includes all tournaments of interest to the general public. <CG> has been known to rely on it in the past:

Premium Chessgames Member I swear, Mark Crowther has some sort of magic wand. Just when we’re halfway finished cleaning up the BNblitz PGN he provides the clean scores at The Week in Chess ( ready for download. And so, partially thanks to TWIC, we can now present the BNbank Blitz (2009). Enjoy!

The beauty of <TWIC>, besides its coverage, is that the tournaments are almost always properly normalized, and the PGN is cleaned up. In fact, “almost always” might just be better put as simply “always”.

So, why doesn’t <CG> generally “shadow” <TWIC>, and republish the tournaments and games that it doesn’t already have?

As the above post shows, it already does for selected tournaments, when pressed by need. I argued for a more comprehensive strategy:

zanzibar: I call attention to this post:
Kibitzer’s Café (kibitz #168167)

So, why doesn’t <CG> contain all of <TWIC>?

Are there copyright issues?

I know there might be conflicts, given that <CG> often distributes on-going tournament games before <TWIC> gets to publishing them.

But, I also know that such games often need cleaning up, and can lead to erroneous results otherwise. <TWIC> does a great job at cleaning up games.

But <CG>’s structure makes such clean-up more difficult. I’ve often felt that there should be a two-stage process for live games. A preliminary stage, then a <TWIC>-quality or better stage.

None of this is relevant for the many <TWIC> tournaments that <CG> lacks, which can just be incorporated wholesale (that is, once players are matched up correctly).

<tpstar> sent a reply which should be read (just click on the date on any of these comments to jump to the relevant <CG> forum post). But let me quote <chessgames>’s reply:

Premium Chessgames Member On the topic of why we have less than a million games and not gazillions of them, not much has changed our opinions, but I have three things to add:

<1> Zanzibar suggested “if it’s good enough for TWIC it should be good enough for Chessgames.”

I strongly disagree with this. Understand how TWIC works. I could mail Mark Crowther a PGN of a local high school event, and as long as the PGN is in good shape, and there’s not too many players with 3-digit ratings, he’d be happy to merge it into his next release. He effectively has no criteria for worthiness other than whether the PGN is syntactically correct.

I am not criticizing Mr. Crowther for this approach—I praise him for it! He wants to disseminate as much chess as possible and leave the job of sorting the gold from the dross to others. I wouldn’t have him change a thing.

<2> From time to time we resynchronize with TWIC by seeking recent games by highly rated players that are missing in our database. Ideally there should be none but quite often we find that we missed a fairly major event or two and insert the games after the fact.

<3> Most importantly, please trust me that one day you (Zanzibar et al) will get what you want: the ability to look up millions of games here, new games, amateur games, fairly high level games that we slip through the cracks, internet games, you name it. We will still retain the concept of the “master database” but allow other methods to get to the vast ocean of secondary games.

That serves as a fair record of the discussion thus far.

I should have included some additional user requests for tournaments to be included on <CG>, but it would just be a somewhat random sampling, and I’m having a bit of difficulty finding representative posts anyway.


