Search quest [3/3] - improvements

October 17, 2008

This is the third and last part in a small series of blog posts about search engines. In the first post I wrote about types of search engines and their requirements. Relevance was the main topic of my second post. In this third post I’d like to share some hands-on tips and talk about implementing a search engine improvement process.

But first

I am very glad to announce that the entire GX WebManager search engine documentation has been rewritten. From installation, to everyday usage, to implementing improvements, it’s all there. Additional chapters have been added with best practices, how-tos and a troubleshooting guide. You can download the new documentation here: http://www.gxdeveloperweb.com/Software/Documentation.htm. I hope this shows that we are very serious about the future of the GX WebManager search engine and that we will continue improving it.

Besides updating the documentation, we spent time improving the configuration of the search engine. We also added an advanced search element that contains a lot of useful filters. For developers in particular, this should save some hours of re-inventing the wheel.

[Image: The new advanced search element]

The following paragraphs should not contain any new information for people who have read the new search documentation. It is basically a summary of the chapter “Improving the search results”.

Measuring

When the question arises “How can we improve our search engine/search results?”, the number one activity to start with is measuring. Get to know your visitors. Know what they are looking for. Know how they search for information.

There are several ways to learn about visitors’ search behavior and the queries they use. The easiest way is to use a web analytics tool and filter on the search engine URL. The search engine URL always contains “&keyword=”, so it’s relatively easy to isolate these URLs. The search engine documentation contains steps to do this for Google Analytics. Beyond heavy statistical analysis there is always an even better option: talk to your customers. They usually receive the visitor complaints, so they should be well aware of errors and possible improvements.
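
As a rough illustration of the URL filtering described above, the sketch below counts the most frequent search terms in a list of page URLs. It assumes you can export the visited URLs from your analytics tool as plain strings; the sample URLs and the function name `top_queries` are made up for this example, and only the `keyword` parameter comes from the search engine itself.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def top_queries(urls, param="keyword", limit=20):
    """Count search terms found in the given URLs, most frequent first."""
    counts = Counter()
    for url in urls:
        # parse_qs decodes "+" and "%20" to spaces for us.
        query = parse_qs(urlsplit(url).query)
        for term in query.get(param, []):
            term = term.strip().lower()
            if term:
                counts[term] += 1
    return counts.most_common(limit)


# Hypothetical URLs, as they might appear in an analytics export.
sample_urls = [
    "http://www.example.com/search.htm?page=1&keyword=opening+hours",
    "http://www.example.com/search.htm?page=1&keyword=Opening%20hours",
    "http://www.example.com/search.htm?page=2&keyword=vacancies",
    "http://www.example.com/contact.htm",
]

for term, count in top_queries(sample_urls):
    print(f"{count:3d}  {term}")
```

Lowercasing and trimming the terms is a design choice in this sketch: it groups “Opening hours” and “opening hours” as one query, which is usually what you want for a top-20 list.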

Besides looking at the behavior on your website you should also be aware of the information that is indexed by the search engine. Take some time to investigate things like:

  • Which information is indexed in which fields in the search index? Remember: Garbage in = Garbage out.
  • What is the ratio between HTML documents and PDF/Word/other documents?
  • What is the average size of the documents? (Large documents tend to lower the relevance)
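
The last two questions can be answered with a few lines of code once you have a list of the indexed documents. The sketch below assumes you can obtain such a list with a MIME type and size per document; the `IndexedDocument` record and the sample data are hypothetical and not part of any GX WebManager API.

```python
from dataclasses import dataclass


@dataclass
class IndexedDocument:
    url: str
    mime_type: str
    size_bytes: int


def index_summary(documents):
    """Return the HTML share and the average document size of an index dump."""
    if not documents:
        return {"total": 0, "html_share": 0.0, "average_size_bytes": 0.0}
    html = sum(1 for doc in documents if doc.mime_type == "text/html")
    total_size = sum(doc.size_bytes for doc in documents)
    return {
        "total": len(documents),
        "html_share": html / len(documents),
        "average_size_bytes": total_size / len(documents),
    }


# Hypothetical index contents: three HTML pages and one large PDF.
sample_index = [
    IndexedDocument("/home.htm", "text/html", 2000),
    IndexedDocument("/contact.htm", "text/html", 4000),
    IndexedDocument("/news.htm", "text/html", 6000),
    IndexedDocument("/annual-report.pdf", "application/pdf", 188000),
]

print(index_summary(sample_index))
```

In this made-up sample a single large PDF dominates the average size, which is exactly the kind of skew worth spotting before tuning relevance.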

Analysis

Once you have gathered enough information from your web analytics tools, customers and index, it’s time to sit down with a group of content owners. These could be editors, administrators or other domain experts who are very familiar with the information on your website. A very useful exercise is to look at the top-20 query terms and ask the domain experts which documents/URLs/pages should be returned for each one. Comparing these documents with the documents that are actually returned hopefully leads to conclusions such as: certain keywords are not getting enough weight, some pages are not found at all, some pages need a higher relevance for certain keywords, etc. The point of this exercise is to analyze what goes wrong. How to solve it is a later concern.
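
The comparison in this exercise can be kept very simple. The sketch below, which assumes you have written down the expected pages per query and can copy the returned result URLs from the search page, reports which expected pages show up in the top results and which are missing. The function name and sample URLs are made up for illustration.

```python
def compare_results(expected, returned, top_n=10):
    """Compare the pages experts expect with the top results actually returned."""
    top = returned[:top_n]
    found = [url for url in expected if url in top]
    missing = [url for url in expected if url not in top]
    # Share of expected pages that appear in the top results.
    recall = len(found) / len(expected) if expected else 1.0
    return {"found": found, "missing": missing, "recall": recall}


# Hypothetical outcome for the query "opening hours".
expected_pages = ["/contact/opening-hours.htm", "/locations.htm"]
returned_pages = [
    "/news/2008/holiday.htm",
    "/contact/opening-hours.htm",
    "/about.htm",
]

report = compare_results(expected_pages, returned_pages)
print(report["missing"])
print(report["recall"])
```

The list of missing pages per query is the useful output here: it tells you what goes wrong without yet saying anything about how to fix it.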

Besides analyzing the search results you can ask the domain experts fundamental questions such as:

  • Is the relevance of the documents more important, or is the recency of documents more important?
  • Do visitors expect a) answers, b) links or c) direct information? A, B or C can lead to a totally different search approach. If A is the answer, then it might be smart to use methods other than a text search engine, for example a natural language search engine or more question-driven technology such as Q-Go (http://www.q-go.com).
  • Do we really need to index our 100,000-document database? Are people actually looking for information in this database?

Improve

After carefully analyzing what goes wrong you have probably come up with some improvements yourself. Without going into too much detail here, the most common improvements are:

  • Improve the way information is indexed. Omit irrelevant information and optimize relevant information.
  • Tune the keyword factors (the factors that determine which fields are most relevant)
  • Offer advanced filtering options on the search page (searching in parts of the site, searching for certain document types)
  • Provide search tips and example queries (based on your top-20 queries of course)
  • Remove irrelevant pages and documents from the index. This will increase the relevance of other documents.
  • Implement a ‘best bets’ search. This is still a proven way to add handpicked search results on top of the normal search results. Because the search results are handpicked they are most certainly relevant.
  • Implement a taxonomy search: if you have a website with a lot of tags (or ‘terms’), then adding a taxonomy filter or search option helps to get to relevant and related information too.

[Image: A best bets search]
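
To make the ‘best bets’ idea from the list above concrete, here is a minimal sketch of how handpicked results could be placed on top of the normal results. It is not how GX WebManager implements it; the lookup table, the function name `with_best_bets` and the URLs are assumptions for this example.

```python
def with_best_bets(query, results, best_bets):
    """Put handpicked pages for this query ahead of the normal results."""
    picked = best_bets.get(query.strip().lower(), [])
    # Drop handpicked pages from the normal results to avoid duplicates.
    rest = [url for url in results if url not in picked]
    return picked + rest


# Hypothetical table maintained by editors: query -> handpicked pages.
best_bets = {
    "opening hours": ["/contact/opening-hours.htm"],
}

normal_results = ["/news/2008/holiday.htm", "/contact/opening-hours.htm"]

print(with_best_bets(" Opening Hours ", normal_results, best_bets))
print(with_best_bets("vacancies", normal_results, best_bets))
```

Queries without an entry in the table simply get the normal results back, so editors only need to maintain best bets for the queries that matter, such as the top-20.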

Is that all?

No, I’m afraid this is only a small summary of best practices. There is a lot more information about information retrieval and search engines available in books and on the net. But this should keep you occupied for a while and help you take the first steps towards a better search experience.

I do realize that this is not exactly a set of ‘quick wins’: it takes anywhere from several days to several weeks to structurally improve the search engine. For a lot of organizations this is still worth the effort because of all the time it saves when visitors don’t have to call your employees. And let’s not forget that your website can be really fancy looking, but when people can’t find what they need there won’t be any conversion on your website.

About the Author

Martin van Mierloo is Product Manager and has many years of experience with GX WebManager. Martin writes about the GX WebManager roadmap, new product features and WCMS-related topics.
© 2009 GX creative online development B.V.
