Search engines strive to provide users with the highest-quality results. Yet both the personal experience of every search engine user and numerous retrieval tests (see, for example, Griesbaum et al. 2002; Griesbaum 2004) show that this goal is only partially achieved: search engines succeed only in part at bringing relevant hits into their result lists.
Quality assessment already takes place in the ranking: relevant documents are first identified using classical text-statistical methods, and their order is then adjusted by static or dynamic quality measures calculated with link-topological methods. Beyond the evaluation of the Web's link structure, additional quality measures have been proposed to further improve the ranking (Mandl 2005).
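The two-stage ranking described above can be illustrated with a minimal sketch. The linear blending, the weights, and the scores below are purely illustrative assumptions, not any search engine's actual formula:

```python
# Illustrative sketch: combine a query-dependent, text-statistical
# relevance score with a query-independent, link-topological quality
# score (e.g. a static link-based measure). The weight is hypothetical.

def combined_score(text_score: float, quality_score: float,
                   text_weight: float = 0.7) -> float:
    """Blend text-statistical relevance with a static quality measure."""
    return text_weight * text_score + (1 - text_weight) * quality_score

# Two documents: doc_a matches the query text better, but doc_b comes
# from a source with a much higher static quality measure.
docs = {
    "doc_a": combined_score(text_score=0.9, quality_score=0.2),  # 0.69
    "doc_b": combined_score(text_score=0.7, quality_score=0.9),  # 0.76
}
ranking = sorted(docs, key=docs.get, reverse=True)  # doc_b overtakes doc_a
```

The point of the sketch is that the quality measure can reorder documents that pure text statistics would rank differently.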
In addition to the general ranking, which takes into account all documents in the search engine's index, some search engines also draw on sources that are listed separately (Lewandowski 2004, 186ff.). These can be external sources regarded as being of high quality as well as sources from the search engine's own offering. The latter case is found mainly with search engines that also operate a portal (for example, Yahoo).
While the regular ranking lists individual documents according to their quality in relation to the search query, in the latter case entire information resources (i.e. sources) are displayed preferentially. With the popular search engines these are only a few sources that are highly relevant for their subject area, and in addition sources that the search engines cannot crawl at all or can cover only incompletely. An example is the patent database of the US Patent Office, whose documents (in the form of individual HTML pages) are in principle contained in Google. When the word patent is entered together with a patent number, Google also points to the patent database above the regular search results.1
Finally, the search engines' hit lists refer to high-quality sources by displaying hits from a web directory or by displaying suitable directory categories. As with the integrated information resources, the sources in the directories are selected manually and thus offer a vetted quality. Such hits are therefore also less susceptible to spamming attempts.
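The preferential display of a few vetted sources above the regular hits can be sketched as a simple merge step. The data model below (plain lists of hit identifiers, a cap on promoted sources) is an assumption for illustration, not any engine's actual mechanism:

```python
# Minimal sketch: show a small number of manually vetted directory or
# source hits above the regular algorithmic results, removing any
# duplicates from the algorithmic list.

def merge_hits(directory_hits: list[str], algorithmic_hits: list[str],
               max_directory: int = 2) -> list[str]:
    """Promote a few vetted sources to the top of the hit list."""
    promoted = directory_hits[:max_directory]
    rest = [h for h in algorithmic_hits if h not in promoted]
    return promoted + rest

merged = merge_hits(["uspto.gov"],
                    ["example.com", "uspto.gov", "other.org"])
# -> ["uspto.gov", "example.com", "other.org"]
```

Capping the number of promoted sources reflects the observation above that only a few highly relevant sources are displayed in this way.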
In recent years, the general web directories have fallen significantly behind the search engines. Stand-alone directories have become rare; they are usually offered in conjunction with an algorithmic search engine. But even within the search engines' offerings, the directories are now placed less prominently. Perhaps the clearest example is Yahoo, whose original offering consisted solely of a directory; today the directory occupies only a minor place among many other services.
This article will show that directory hits are well suited for high-quality searching within algorithmic search engines. The biggest obstacle to using directory hits is their hitherto poor integration into the hit lists, which fails to fully exploit the great value that could be derived from these intellectually selected information resources.
Classically, search engines and web directories serve different search paradigms. To briefly recall the paradigms of web search according to Dennis, Bruza and McArthur (2002), these are:
1. unassisted keyword search
2. assisted keyword search, where the support mainly comes from automatically generated suggestions for narrowing the search
3. directory-based search
4. finding similar documents (query-by-example)
Search engines support paradigm 1, and in part also paradigms 2 and 4; paradigm 3 is the domain of the web directories. Following a description of previous approaches to combining search engine and directory, we turn to the question of how directory-based search can best be combined with simple keyword search.
2 Coverage of the Web by search engines and directories
The main distinguishing feature between web directories and search engines is that the former are created by humans, i.e. editors are responsible for selecting and cataloguing suitable websites. Compared with search engines, therefore, only a relatively small number of sites can be covered. While search engines have built indexes of up to about eight billion documents2, the largest web directory claims to have catalogued over four million websites3. At this point, however, it is important to distinguish between the indexing of web pages, as done by search engines, and the indexing of web sites, as done by web directories. A single site can consist of thousands of pages; the number of documents covered by search engines and directories can therefore be compared only to a limited extent. Individual documents are usually not indexed in directories.