Tuesday, July 13, 2010

A Semantic Wikipedia

Wikipedia has long been the one-stop shop for quick and cheap information. If you know what you're looking for and where it is, it can be pretty easy to find. Many have complained about Wikipedia's accuracy, but its sheer usefulness has kept it up and running for many years now. The Wikimedia Foundation, the organization behind Wikipedia, plans to keep up that trend and take the word "useful" to a whole different level. Just two words: Semantic Wikipedia.

The back end of Wikipedia is free wiki software called MediaWiki. This software is highly extensible: its PHP code provides hooks at various points in its class methods so that plugins can add functionality without having to change the core code itself (if you didn't understand that, just read on). One plugin of particular interest is the Semantic MediaWiki extension. When added to a MediaWiki installation, it lets editors tag specific pieces of information in articles. These tags allow computers as well as humans to understand what is being said in the article, and search engines and other creepy crawlers can pick them up to provide more accurate content to users and to process non-textual data, such as images, more easily.
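For readers who did want to understand the part about hooks, here is a minimal sketch of the general idea. MediaWiki itself is written in PHP, and this Python example does not reproduce its actual hook API; the names `register_hook`, `run_hooks`, `before_render`, and `footer_plugin` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical hook registry: maps a hook name to the handlers plugins attached.
# This illustrates the pattern only; it is not MediaWiki's real API.
_hooks = defaultdict(list)

def register_hook(name, handler):
    """Called by a plugin to attach its handler to a named hook point."""
    _hooks[name].append(handler)

def run_hooks(name, value):
    """Called by the core code: pass the value through every attached handler."""
    for handler in _hooks[name]:
        value = handler(value)
    return value

def render_article(text):
    """Core code. It knows nothing about any plugin; it only exposes a hook point."""
    return run_hooks("before_render", text)

def footer_plugin(text):
    """A plugin: adds functionality without touching render_article."""
    return text + "\n(rendered with footer_plugin)"

register_hook("before_render", footer_plugin)
print(render_article("Hello, wiki."))
```

The core function never changes when a plugin is added or removed, which is the property that makes this kind of software easy to extend.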

The Semantic MediaWiki project debuted its version 1.0 in 2007, yet Wikipedia is still without this awesome functionality. A traditional search for information today takes a Google search (or Wikipedia search) and then a manual scan of the results for the data you are looking for. With this extension, a search engine could extract the information for you automatically, drastically reducing the time spent and thus improving the user experience. Many other wikis have already implemented this extension, and the Wikimedia Foundation seems to be next. At the 2010 Semantic Technology conference, Wikimedia Foundation representatives reached out to conference attendees, developers, and users to find a way to help semantic information find its way into Wikipedia.

However, no benefit comes without drawbacks. Though a semantic Wikipedia would improve the user experience on the reader's end, it would worsen it on the editor's end. The wiki syntax used in MediaWiki is hard enough for new users, and the Semantic MediaWiki extension adds one more item to the ever-growing list of things to learn. Semantic MediaWiki steals from MediaWiki's syntax for internal links in order to create semantic "tags," as they are called. The structure for writing a tag in an article is:
[[property name::property value|displayed text]]
The bar and displayed text can simply be removed if you want the property value itself to be shown in the article, and the property name and double colon can be removed if you want a normal internal link. However, the syntax becomes very confusing when viewed in line with the rest of the article's text, and it makes it harder to understand what the wiki-text is actually trying to say.
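To make the tag structure concrete, here is a rough Python sketch of how a program might pull property/value pairs out of wiki-text and recover the sentence a reader would see. It handles only the basic form shown above, not the rest of Semantic MediaWiki's syntax, and it is not how the extension itself parses articles. The property names `founding year` and `headquarters` and the sample sentence are invented for illustration.

```python
import re

# Matches [[property::value]] or [[property::value|displayed text]].
# Plain internal links such as [[Page]] or [[Page|label]] contain no "::",
# so they are left alone.
ANNOTATION = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def extract_properties(wikitext):
    """Return the (property, value) pairs found in the wiki-text."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in ANNOTATION.finditer(wikitext)]

def render_plain(wikitext):
    """Replace each tag with the text a reader would see in the article."""
    def shown(m):
        label = m.group(3)
        return label if label is not None else m.group(2).strip()
    return ANNOTATION.sub(shown, wikitext)

# Made-up sample sentence; the property names are assumptions.
sample = ("Google was founded in [[founding year::1998]] and is based in "
          "[[headquarters::Mountain View, California|Mountain View]].")

print(extract_properties(sample))
print(render_plain(sample))
```

Comparing `sample` with the output of `render_plain` also shows the editor's problem: the tagged sentence is noticeably harder to read than the plain one.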

Despite the difficulties for editors, I would still like to see the age of a Semantic Wikipedia. It would mean a sharp increase in the potential uses of an online encyclopedia that is already so popular. Hopefully the Wikimedia Foundation will act soon so we can get a glimpse of what Web 3.0 is like.


  1. If Semantic MediaWiki tags are confined to infoboxes and other templates (as I think they should be), the average Wikipedia editor will never have to see them, so usability won't be affected at all.

  2. True, but there are a number of things worthy of tags that are not in the infobox. Examples would be images (since that would allow search engines to associate the pictures in an article with the appropriate information rather than just take the first photo and put it next to some text) and a company's products (though there is an infobox item for a company's products, many companies have too many products to list there, and a fuller description is in the article's main text). There are probably more; I found those examples by looking at the Google article alone.