Rank and File

While storing and searching through categorized, indexed data helps make corporate knowledge more accessible, it doesn't address the problem of assigning value to the stored content. Web search engines face a similar challenge: how to direct the user to the most relevant content about a particular topic. One method is to use information ranking, where retrieved documents are presented in ranked order based on how closely the phrase being searched for is matched, the frequency with which it appears, or where it appears in a given document.

The most common type of self-organizing Web site employs a similar approach. However, instead of performing a search on documents, the documents are automatically ranked and presented based on the importance assigned to them by their readers. This importance ranking can be arrived at through a system of voting, say on a scale of one to ten. Documents are then presented in descending order of their average vote.
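To make the voting idea concrete, here is a minimal Python sketch of ranking by average vote. The function name, the document identifiers, and the sample votes are all invented for illustration; they don't come from any particular site or package.

```python
from collections import defaultdict


def rank_by_average_vote(votes):
    """Return (document, average) pairs, highest average first.

    `votes` is an iterable of (document_id, score) pairs, where each
    score is an integer from 1 to 10.
    """
    totals = defaultdict(lambda: [0, 0])  # document_id -> [sum, count]
    for doc_id, score in votes:
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        totals[doc_id][0] += score
        totals[doc_id][1] += 1
    averages = {doc: total / count for doc, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: (document, vote) pairs cast by readers.
votes = [("faq", 9), ("faq", 7), ("howto", 10), ("memo", 3), ("memo", 5)]
ranking = rank_by_average_vote(votes)
```

Note that a plain average lets a single enthusiastic vote push a document to the top. A production site would probably want to require a minimum number of votes before a document is ranked.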

Many community sites where users contribute written works or community news items use this approach. Phpnuke.org, for example, lets you set up polls for voting on a posted topic. The poll is presented as a set of multiple-choice questions, and the results of the votes cast are displayed in a horizontal bar chart.

Qualitative classification is a more complex permutation of the same idea, in which individual content items are organized based on feedback reviews that the system receives about them. The feedback is scanned for keywords or phrases by which the product, service or article can be qualitatively grouped.
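A rough Python sketch of the keyword-scanning step might look like the following. The categories, keyword lists, threshold, and sample feedback are assumptions made up for this example. Real systems would likely need stemming, phrase matching, and some handling of negation ("not reliable").

```python
import re

# Hypothetical keyword lists; a real site would tune these to its content.
CATEGORY_KEYWORDS = {
    "reliable": {"reliable", "stable", "solid"},
    "easy to use": {"easy", "simple", "intuitive"},
    "slow": {"slow", "sluggish", "lag"},
}


def classify_feedback(reviews):
    """Count, per category, how many reviews mention one of its keywords."""
    counts = {category: 0 for category in CATEGORY_KEYWORDS}
    for review in reviews:
        words = set(re.findall(r"[a-z']+", review.lower()))
        for category, keywords in CATEGORY_KEYWORDS.items():
            if words & keywords:
                counts[category] += 1
    return counts


def group_items(feedback_by_item, min_mentions=2):
    """Place each item in every category mentioned by enough reviews."""
    groups = {category: [] for category in CATEGORY_KEYWORDS}
    for item, reviews in feedback_by_item.items():
        for category, mentions in classify_feedback(reviews).items():
            if mentions >= min_mentions:
                groups[category].append(item)
    return groups


# Hypothetical feedback gathered for two articles.
feedback = {
    "router-guide": ["Easy to follow and reliable.", "Simple, solid advice."],
    "vpn-setup": ["Too slow to load.", "Slow, but easy steps."],
}
groups = group_items(feedback)
```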

You can extend the concept to form classifications based on a composite set of criteria that may include keywords, counts, number of purchases or hits, and so on. These criteria are similar to those used by search engines to determine the relevance of a Web site. However, the key difference is that qualitative classification provides grouping based on subjective validation that's provided by the Web site's users and visitors.
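One plausible way to combine several criteria is a weighted score, sketched below in Python. The weights, the field names, and the choice to dampen hit counts with a logarithm are all assumptions for illustration, not a formula any search engine or site is known to use.

```python
import math
from dataclasses import dataclass


@dataclass
class ItemStats:
    name: str
    keyword_mentions: int  # feedback reviews that matched a keyword
    average_vote: float    # reader votes on a 1-10 scale
    hits: int              # page views or purchases


# Hypothetical weights; they sum to 1 so the score stays between 0 and 1.
WEIGHTS = {"keywords": 0.3, "votes": 0.5, "hits": 0.2}


def composite_score(item, max_mentions, max_hits):
    """Blend the three criteria into one number between 0 and 1."""
    keywords = item.keyword_mentions / max_mentions if max_mentions else 0.0
    votes = (item.average_vote - 1) / 9
    # The logarithm keeps one very popular item from swamping the rest.
    hits = math.log1p(item.hits) / math.log1p(max_hits) if max_hits else 0.0
    return (WEIGHTS["keywords"] * keywords
            + WEIGHTS["votes"] * votes
            + WEIGHTS["hits"] * hits)


def rank_items(items):
    """Return the items ordered from highest to lowest composite score."""
    max_mentions = max((i.keyword_mentions for i in items), default=0)
    max_hits = max((i.hits for i in items), default=0)
    return sorted(items,
                  key=lambda i: composite_score(i, max_mentions, max_hits),
                  reverse=True)


# Hypothetical statistics for three items.
items = [
    ItemStats("A", keyword_mentions=10, average_vote=9.0, hits=1000),
    ItemStats("B", keyword_mentions=2, average_vote=5.0, hits=100),
    ItemStats("C", keyword_mentions=0, average_vote=10.0, hits=0),
]
ranked_names = [item.name for item in rank_items(items)]
```

Here item C outranks item B on the strength of its votes alone, which shows how much the outcome depends on the weights a site chooses.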

For our hypothetical company, information ranking lets users sift through the most useful documents in a knowledge database. The ranking can also benefit the company's customer support representatives, by automatically ranking the most common problems posted to the customer support Web site and intuitively presenting solutions. In addition, customers can be easily directed to experts in a given subject area through searches of qualitatively classified feedback and comments from other site users.

Everything in Moderation

Sites that draw upon their users for content ranking and classification face a pesky problem: the potential for dishonest contributors to skew the results. People have devised numerous technical strategies to counteract such behavior, including tracking users' IP addresses or limiting how soon a user can vote after reading a document. However, even with solutions like these in place, it's difficult to stop someone bent on subversion.
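The two safeguards mentioned above, tracking IP addresses and enforcing a delay between reading and voting, could be sketched in Python as follows. The class name, the 30-second threshold, and the in-memory storage are assumptions for this example. As the article notes, checks like these raise the cost of cheating without eliminating it, because IP addresses can be shared or changed.

```python
class VoteGuard:
    """Accept at most one vote per IP address per document, and only
    after the document has been open for a minimum reading time."""

    def __init__(self, min_read_seconds=30):  # hypothetical threshold
        self.min_read_seconds = min_read_seconds
        self._opened = {}    # (ip, doc_id) -> time the document was first opened
        self._voted = set()  # (ip, doc_id) pairs that have already voted

    def record_view(self, ip, doc_id, now):
        # Keep the earliest view time so reloading the page has no effect.
        self._opened.setdefault((ip, doc_id), now)

    def try_vote(self, ip, doc_id, now):
        key = (ip, doc_id)
        if key in self._voted:
            return False  # this address has already voted on this document
        opened = self._opened.get(key)
        if opened is None or now - opened < self.min_read_seconds:
            return False  # never opened, or voted too quickly to have read it
        self._voted.add(key)
        return True


# Times are plain seconds here; a live site might pass time.time() instead.
guard = VoteGuard(min_read_seconds=30)
guard.record_view("10.0.0.1", "faq", now=0)
too_soon = guard.try_vote("10.0.0.1", "faq", now=10)
accepted = guard.try_vote("10.0.0.1", "faq", now=45)
repeat = guard.try_vote("10.0.0.1", "faq", now=60)
```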

Another approach to overcoming the problem of dishonest or biased information ranking is to employ a moderation system. In such systems, a moderator assists in filtering and categorizing useful information. In this model, the moderator has primary responsibility for the valuation of site data; the decision isn't opened up to a vote by every user.

Note that this doesn't necessarily mean that an organization must dedicate a particular employee to act as moderator. To do that would only bring back the old problem of requiring administrative intervention before new content could be posted or ranked. Instead, the self-organizing moderation model uses a site's user base to identify moderators. Typically, moderators are granted special status to modify or present a different content organization. How a moderator is chosen differs from site to site.

Some Web sites, like Plastic.com, offer the role to a single user by judging the frequency and quality of his or her contribution to the site. Plastic.com picks a heavy user at random. That user is allowed to moderate comments positively or negatively. Comments tagged negatively tend to be filtered out, while comments tagged positively by many moderators filter to the top. The user's role as a moderator expires after two days.

Slashdot.org, which is based on the open-source Slash software package, employs an even more random approach. Any user is potentially eligible to be chosen as a moderator at random. Many moderators can be active at once, and each is given a certain number of moderation points that can be expended to rate a given piece of content up or down. Even if the user hasn't used all of the moderation points, his or her moderator status expires after a few days. As a further defense against malicious moderation practices, even non-moderators are given the opportunity to "meta-moderate," or to point out cases of abuse.
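The general shape of a points-based scheme like this can be sketched in Python. This is not Slash's actual code or API: the class, the five-point budget, and the three-day term are assumptions chosen to illustrate random selection, a limited budget of points, and expiring status. Meta-moderation is left out.

```python
import random

DAY = 24 * 3600


class ModerationPool:
    def __init__(self, points_per_moderator=5, term_seconds=3 * DAY, rng=None):
        # Both defaults are hypothetical; the article says only "a certain
        # number" of points and "a few days."
        self.points_per_moderator = points_per_moderator
        self.term_seconds = term_seconds
        self.rng = rng or random.Random()
        self.moderators = {}  # user -> {"points": int, "expires_at": float}
        self.scores = {}      # comment_id -> running score

    def appoint(self, users, count, now):
        """Pick `count` users at random and give each a fresh point budget."""
        for user in self.rng.sample(list(users), count):
            self.moderators[user] = {
                "points": self.points_per_moderator,
                "expires_at": now + self.term_seconds,
            }

    def moderate(self, user, comment_id, delta, now):
        """Spend one point to rate a comment up (+1) or down (-1)."""
        if delta not in (1, -1):
            raise ValueError("delta must be +1 or -1")
        mod = self.moderators.get(user)
        if mod is None or now >= mod["expires_at"] or mod["points"] <= 0:
            return False  # not a moderator, term expired, or out of points
        mod["points"] -= 1
        self.scores[comment_id] = self.scores.get(comment_id, 0) + delta
        return True


pool = ModerationPool(points_per_moderator=2)
pool.appoint(["ann", "bob", "cy"], count=3, now=0)
results = [pool.moderate("ann", "c1", +1, now=60) for _ in range(3)]
```

In this run, ann's third attempt fails because she has spent both of her points, even though her term hasn't expired.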

Our hypothetical company might look over user profiles to select users with a particular level of expertise and let these people rate the usefulness of information provided by other users. Over time, the results of this moderation could even become a rating system for the site contributors themselves, with ratings stored in contributors' profiles. This system further aids in identifying contributors who are experts in a given subject.
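As a sketch of how moderation results might roll up into contributor ratings, the Python below averages the moderation scores of each contributor's posts by subject. The record layout, the names, and the rating threshold are invented for illustration.

```python
from collections import defaultdict


def contributor_ratings(moderated_posts):
    """Average the moderation scores of each contributor's posts, by subject.

    `moderated_posts` is an iterable of (contributor, subject, score) tuples,
    where score is the net moderation score a post received.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for contributor, subject, score in moderated_posts:
        scores[contributor][subject].append(score)
    return {
        contributor: {subject: sum(vals) / len(vals)
                      for subject, vals in by_subject.items()}
        for contributor, by_subject in scores.items()
    }


def find_experts(ratings, subject, min_rating=1.0):
    """List contributors rated at least `min_rating` in a subject, best first."""
    qualified = [(profile[subject], name)
                 for name, profile in ratings.items()
                 if profile.get(subject, float("-inf")) >= min_rating]
    return [name for _, name in sorted(qualified, reverse=True)]


# Hypothetical moderation history.
posts = [
    ("dana", "networking", 2), ("dana", "networking", 1),
    ("eli", "networking", -1), ("eli", "databases", 2),
    ("dana", "databases", 0),
]
ratings = contributor_ratings(posts)
networking_experts = find_experts(ratings, "networking")
```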

Adding a Personal Touch

Site builders can take advantage of open-source software like Slash or PHP-Nuke, or they can build customized content management systems. Either way, for self-organization to reach the next level of sophistication, there's some work to be done. Despite attempts to rank content and moderate submissions, not every user will have the same goals when accessing a site, nor will every user's interests tend toward the same information. For this reason, the most effective self-organizing sites will need to continually evolve their personalities to serve individual users' preferences.

User registration is the starting point of such a customized site, because it formalizes a Web site's relationship with its users. For some sites, registration may only entail providing a login name and password, while others may require more detailed information. Whatever the approach, registration is an essential tool for identifying users, so that you can then give them customized information.

Personalization and customization features take many forms. In one form, information content is customized to suit the user based on his or her stored user profile. In another, the user selects the specific type of information he or she wants and chooses how it is arranged in the browser. Sophisticated personalization systems attempt to track user preferences automatically, and invisibly build profiles over time. (For a complete discussion of personalization and customization techniques, see "Getting Personal" in the November 2001 issue of Web Techniques.)
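The automatic variety can be approximated very simply. The Python sketch below counts the topics of the documents a user views, then uses those counts to reorder what the user sees. The class, topic labels, and titles are hypothetical, and real personalization systems weigh far more signals than page views.

```python
from collections import Counter


class UserProfile:
    """Build a profile implicitly from the topics of documents a user views."""

    def __init__(self):
        self.topic_views = Counter()

    def record_view(self, topics):
        self.topic_views.update(topics)

    def personalize(self, documents):
        """Order (title, topics) pairs by how often the user has viewed
        their topics; equally scored titles keep their original order."""
        def interest(doc):
            return sum(self.topic_views[topic] for topic in doc[1])
        return [title for title, _ in
                sorted(documents, key=interest, reverse=True)]


profile = UserProfile()
profile.record_view(["python", "web"])
profile.record_view(["python"])

# Hypothetical documents tagged with topics.
documents = [
    ("Perl tips", ["perl"]),
    ("Python CGI", ["python", "web"]),
    ("Web design", ["web"]),
]
front_page = profile.personalize(documents)
```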

When companies use better personalization and customization technologies to augment existing techniques, self-organization will become even more valuable. Although the technology hasn't been perfected, self-organization already lets users sift through the deluge of information more confidently. For site-builders, it automates the process of collecting and filtering relevant information from all sources. In addition, it's achievable with current Web technology. Self-organization is still in its infancy, but it has the potential to add a new functional dimension to Web capabilities.

Hisham Alam is a principal consultant for PricewaterhouseCoopers and has extensive experience implementing strategies and solutions in Web-based applications, data warehousing, and technical architecture. You can email him at hisham@hisham.net.