
Society for American Baseball Research Projects


SABR Encyclopedia: First Update
Written by Ted Turocy   
Monday, 13 April 2009 12:49

This is the first in a series of occasional updates on the development of the SABR Encyclopedia wiki.

It's been about six weeks since the Board approved the concept of the Encyclopedia and authorized us to begin work. After a month of planning, the first automated "bot" went into action on April 1, creating a page for each person listed in the Minor Leagues Database, the single largest dataset anywhere of people involved with professional baseball. About five days later the upload was complete, and a few intrepid souls (Jack Morris, Cliff Blau, Joel Dinda, and John Zajc among them) have begun organizing and expanding biographical knowledge about these people. Meanwhile, pages have been built automatically for most professional leagues and teams, as well as for every ballpark that has hosted at least one Major League game.

A major focus of development in the coming weeks will be organizing these pages and "stubbing" out pages for other persons, leagues, and concepts. This breadth-first approach is motivated by the belief that most potential contributors will be more comfortable expanding existing pages rather than creating new ones from scratch. Organization and navigation, through categories, navboxes, and the like, will make it possible for contributors to find the best pages on which to make their contributions.

We have begun using the Semantic MediaWiki extension within the wiki, and we are very excited about the possibilities it offers for autogenerating information. We currently use it to generate roster tables for each club, and have just implemented a similar feature to autopopulate executive roles for leagues. We intend to add similar features for club managers and general managers, and for league umpires. We are also using this feature to create an automatically updated necrology for 2009, which we hope will help Rod Nelson and the Emerald Guide crew get a head start on next year's edition. (Even a few days later, it still affects me to see Nick Adenhart's name at the top of that page.)

A key design feature is the use of templates to record information about entities in the wiki systematically, for easy extraction down the road. Some of these templates wrap Semantic MediaWiki properties, so contributors don't need to learn how SMW works; the properties are created automatically behind the scenes. Even where templates do not wrap SMW properties, they are easy enough to parse that tools will be able to spider the wiki to extract and cross-check information.
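To illustrate why such templates are easy for a tool to read, here is a minimal Python sketch that pulls the named parameters out of a template call in a page's wikitext. The template name `Person` and its fields (`name`, `born`, `height`) are hypothetical placeholders, not the Encyclopedia's actual templates, and the sketch assumes a flat call with no nested templates or links inside the parameter values.

```python
import re


def parse_template(wikitext: str, name: str) -> dict[str, str]:
    """Return the named parameters of the first {{name|...}} call in wikitext.

    Assumes a flat template call: no nested templates, and no '|' or '}}'
    inside parameter values. The template name is a hypothetical example.
    """
    pattern = r"\{\{\s*" + re.escape(name) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, wikitext, re.DOTALL)
    if match is None:
        return {}
    params = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            params[key.strip()] = value.strip()
    return params


# Placeholder page text using a hypothetical "Person" template.
sample = """{{Person
|name=Example Player
|born=1900-01-01
|height=6-0
}}"""
print(parse_template(sample, "Person"))
# {'name': 'Example Player', 'born': '1900-01-01', 'height': '6-0'}
```

A production spider would more likely use a full wikitext parser, but the point stands: because every page records the same fields in the same shape, even a few lines of code can recover them.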

One such spider, now in development, extracts the basic biographical data and updates the Persons table in the Minor Leagues Database. The Encyclopedia wiki is now the primary place to update biographical data: both the basic demographics (name, height, weight, date of birth, and so on) and the assignment of playing, managing, and other records to each person. We are hopeful that this will make it easier to process this information in a timely fashion and will minimize the chance of errors. Early experience indicates that this will be a viable solution, if managed properly.
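A spider like this ultimately has to write what it extracts back to the database. The sketch below is only an illustration under stated assumptions: it uses SQLite and a hypothetical `persons` table with `person_id`, `name`, `height`, `weight`, and `birth_date` columns, since the real Minor Leagues Database schema and engine are not described here. It overwrites a column only when the wiki supplies a different, non-empty value, and it returns the list of changed columns so a person can review them, which is one way to keep errors visible.

```python
import sqlite3

# Hypothetical column names; the real Persons table schema may differ.
FIELDS = ("name", "height", "weight", "birth_date")


def sync_person(conn: sqlite3.Connection, person_id: str,
                wiki_fields: dict[str, str]) -> list[str]:
    """Update one persons row from wiki-extracted fields.

    Returns the columns that changed, so the update can be reviewed.
    """
    row = conn.execute(
        f"SELECT {', '.join(FIELDS)} FROM persons WHERE person_id = ?",
        (person_id,),
    ).fetchone()
    if row is None:
        raise KeyError(person_id)
    changed = []
    for column, db_value in zip(FIELDS, row):
        wiki_value = wiki_fields.get(column)
        # Only overwrite when the wiki has a non-empty value that differs.
        if wiki_value and wiki_value != db_value:
            # Column names come from the fixed FIELDS tuple, never from wiki input.
            conn.execute(
                f"UPDATE persons SET {column} = ? WHERE person_id = ?",
                (wiki_value, person_id),
            )
            changed.append(column)
    conn.commit()
    return changed
```

For example, given a row whose weight is empty, calling `sync_person(conn, "p1", {"height": "6-0", "weight": "180"})` would fill in the weight and report `["weight"]`, leaving the matching height untouched.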

We will continue expanding the breadth of the Encyclopedia in the coming weeks. One of the next datasets to arrive will be minor league ballparks, based on Gord Brown's register. A few states' worth have been wikified already, and we will soon be seeking volunteers to do the rest. Also on the shortlist of major tasks are working with Gary Benner's collegiate summaries and adding major league managers and umpires, whose records are currently largely missing from the wiki.

For the sake of posterity: as I write this, the front page of the wiki reports 217,779 pages, including 169,789 people, 4,933 league-seasons, 31,003 team-seasons, and 301 ballparks. There have been 250,509 page edits and 245,292 pages in all, which means there have been roughly 5,200 non-bot edits. I don't put too much stock in raw edit statistics; after all, mechanically adding navboxes to all the seasons of a league creates a lot of mindless edits that don't yet accomplish much directly. Even so, given how few of us are active right now, that's a sizeable number, and I take it as an indication that we're off to a good start.


Last Updated on Wednesday, 16 December 2009 12:04