In this chapter:
■ An explanation of how search software works
■ The differences between ranking, relevance, precision and recall
■ The value of a taxonomy
■ How to ensure that document confidentiality is not compromised by a search engine
Introduction
To evaluate and implement a search application, it is important to understand the basic elements of search technology, which underpin both search software and search application products. The process has several distinct elements, from capturing content to presenting it within a browser on demand:
■ content definition
■ indexing
■ query management
■ ranking of results
■ results formatting
■ document access.
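The elements listed above can be illustrated with a deliberately tiny, in-memory sketch. It does not describe how any particular product works. The sample documents, the simple tf-idf scoring and the hypothetical `user_can_read` permission check are assumptions chosen for illustration. Real search applications persist their indexes and use far more elaborate ranking. Content definition is represented by the `DOCUMENTS` dictionary. Indexing is `build_index`. Query management and ranking are `search`. Results formatting and document access are `format_results`.

```python
import math
import re
from collections import Counter, defaultdict

# Content definition: a hypothetical, hard-coded collection standing in for
# the repositories and file types a real application would be told to index.
DOCUMENTS = {
    "doc1": "Search engines build an index of the content",
    "doc2": "The index maps each term to the documents that contain it",
    "doc3": "Ranking orders the results by estimated relevance",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexing: build an inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index

def search(index, query, n_docs):
    """Query management and ranking: score matching documents with a simple tf-idf sum."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})  # .get avoids creating empty entries
        if not postings:
            continue
        idf = math.log(n_docs / len(postings)) + 1.0  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id so the order is stable
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

def format_results(results, docs, user_can_read):
    """Results formatting and document access: list only documents the user may open."""
    lines = []
    for doc_id, score in results:
        if user_can_read(doc_id):  # hypothetical permission check
            lines.append(f"{doc_id} ({score:.2f}): {docs[doc_id][:40]}")
    return lines

index = build_index(DOCUMENTS)
results = search(index, "index term", len(DOCUMENTS))
for line in format_results(results, DOCUMENTS, lambda doc_id: True):
    print(line)
```

Here doc2 ranks above doc1 because it matches both query terms, and "term" is rarer in the collection than "index".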
The challenge in selecting a search application is that every vendor carries out these processes in a slightly different way. Understanding the similarities and differences between the products on the market is important in ensuring that user requirements are fully accommodated. A good understanding of search technology is also important when it comes to implementing and ‘tuning’ a search engine. There will be a continuing need to look at the types of search being carried out and the results they obtain, and then to change various elements of the search process (especially the way in which ranking is carried out) so that users obtain the maximum number of relevant results.
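Tuning usually means adjusting the parameters that drive ranking and then checking how the results change. The sketch below assumes a hypothetical scoring scheme in which each query-term match in a title is multiplied by a `title_boost` weight, while matches in the body count once. Commercial products expose their own, different controls, so the names, documents and weights here are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    title: str
    body: str

def count_matches(text, terms):
    """Count how many times any query term appears as a whole word in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in terms)

def rank(docs, query, title_boost=1.0):
    """Rank documents by boosted title matches plus body matches (hypothetical scheme)."""
    terms = query.lower().split()
    scored = []
    for doc in docs:
        score = title_boost * count_matches(doc.title, terms) + count_matches(doc.body, terms)
        if score > 0:
            scored.append((doc.doc_id, score))
    # Highest score first; ties broken by doc_id
    return [doc_id for doc_id, _ in sorted(scored, key=lambda item: (-item[1], item[0]))]

DOCS = [
    Doc("policy", "Travel policy", "Rules for booking trains and hotels"),
    Doc("minutes", "Meeting minutes",
        "We discussed the travel budget and travel approvals and travel claims"),
]

print(rank(DOCS, "travel policy", title_boost=1.0))  # ['minutes', 'policy']
print(rank(DOCS, "travel policy", title_boost=3.0))  # ['policy', 'minutes']
```

With no boost, the meeting minutes win simply because they repeat "travel" three times. Raising the title weight puts the policy document, which is probably what the user wanted, at the top. This is the kind of before-and-after comparison that tuning relies on.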
Content definition
The initial task is to define the content that needs to be searched, and hence indexed. In smaller applications, the index is generated on the same server that holds the content to be indexed. This arrangement is common for website search. The search query is then run against this index. However, the index to a set of documents is typically between 15 and 50% of the file size of the content collection, and querying the index also requires a substantial amount of processing.
As a result, in larger applications the index and query management are usually hosted on a separate server. A number of options are then available.
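To put the 15 to 50% figure in concrete terms, the short calculation below estimates the index storage needed for a content collection. The 200 GB collection is an assumed example, and the default ratios are simply the range quoted above. The real figure depends on the product and on which indexing options are enabled.

```python
def index_size_range_gb(collection_gb, low_ratio=0.15, high_ratio=0.50):
    """Return (low, high) estimated index size in GB for a collection of the given size."""
    if collection_gb < 0:
        raise ValueError("collection size must be non-negative")
    return collection_gb * low_ratio, collection_gb * high_ratio

# Assumed example: a 200 GB content collection
low, high = index_size_range_gb(200)
print(f"Estimated index size: {low:.0f} to {high:.0f} GB")
```

For 200 GB of content this gives an index of roughly 30 to 100 GB. That storage comes on top of the content itself, which helps explain why larger installations move the index onto its own server.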