Search quest [1/3]

March 13, 2008

In my daily job as a product manager I receive a lot of feedback from customers, partners and our own solution units. Occasionally people take the time to drop me a message when things went extremely smoothly, but hey, let’s face it: in most cases people contact me when they have something on their mind that involves improvements to our product, GX WebManager. These suggestions range from correcting typos in the manuals to moving to the .NET platform (never in a million years).

Some suggestions point to a more thematic problem. One that frequently lands on my desk is – and I quote – “your search engine doesn’t work.” Notice the dot after ‘work’, because usually that’s all the information that I or our customer service department receive. We usually respond with questions like ‘What did you do? What keywords did you use? What did you expect?’ After someone has checked whether the search engine is running at all, this is the point where you can end up in discussions about which information is relevant for a user of the search engine and which is not. This is as customer-intimate as it gets, because to answer questions about relevancy we have to try to become our customer’s customer.

This article is not about tuning your site for Google but about the search engines we use on our own websites and intranets. This series of three blog posts is about the factors that influence a search engine (part 1), about looking at relevancy (part 2) and about making improvements to your search engine (part 3) to improve the satisfaction of your customer and your customer’s customer. Of course, as a dedicated GX employee I will promote our own search engine (which I really believe fits the larger part of our install base), but I won’t ignore other types and brands of search engines.

Types of search engines

Search engines can be classified in a number of ways. By retrieval algorithm, for example, where there are three main types: ‘Boolean’, ‘Vector’ and ‘Probabilistic’ algorithms. Or by indexing algorithm, with types like ‘directory’, ‘social’, ‘federated’ and ‘semantic’ search engines, which can also be combined. Another way to classify them is by application, which gives us ‘web’, ‘enterprise’, ‘directory’ and ‘desktop’ search engines, and so on. Here, too, the categories can overlap and be combined: a public web search engine like Google can also be used as an enterprise search engine, for example with Google Custom Search or the Google Mini search appliance.
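To make the difference between the ‘Boolean’ and ‘Vector’ retrieval types a bit more concrete, here is a minimal Python sketch. It is not how GX WebManager or any particular product works; the three-document corpus and the function names are made up for illustration, and the vector variant uses plain term counts with cosine similarity rather than a tuned weighting scheme.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, invented for this illustration.
DOCS = {
    "manual": "install the search engine and configure the search index",
    "faq": "the search box does not return results",
    "news": "new product release announced",
}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def boolean_search(query, docs):
    """Boolean AND retrieval: a document matches only if it contains every query term."""
    terms = set(tokenize(query))
    return sorted(name for name, text in docs.items() if terms <= set(tokenize(text)))


def vector_search(query, docs):
    """Vector retrieval: rank documents by cosine similarity of raw term counts."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = {}
    for name, text in docs.items():
        d = Counter(tokenize(text))
        dot = sum(q[t] * d[t] for t in q)
        if dot > 0:  # skip documents that share no term with the query
            d_norm = math.sqrt(sum(v * v for v in d.values()))
            scores[name] = dot / (q_norm * d_norm)
    return sorted(scores, key=scores.get, reverse=True)


print(boolean_search("search engine", DOCS))  # only documents with both terms
print(vector_search("search engine", DOCS))   # partial matches too, best first
```

For the query ‘search engine’ the Boolean variant returns only the manual, while the vector variant also returns the FAQ, ranked lower because it matches just one of the two terms. That difference between all-or-nothing matching and ranked partial matching is exactly the kind of behavior users notice.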

All in all, choosing the right type (or category) of search engine won’t automatically bring us the best search engine for our application, because there are so many combinations and so much overlap. It is estimated that there are 300,000 web search engines, each with its own combination of search and indexing algorithms. And besides ‘web’ search engines there is so much more: basically anything with a search box has a search engine behind it and could take a different search approach. It makes more sense to begin by looking at the requirements for a search engine in your company.

Search engine requirements

To arrive at a set of requirements for your specific environment, and so choose the right search engine, there are several questions you have to ask yourself. Important questions include:

  • Which data sources do you want to include? Are 99% of them web pages, or are there many file types? Are there also structured data sources (CRM, databases), or external data sources with special or unknown formats?
  • What is the ratio between structured data (= information from relational databases) and unstructured data (= everything else: HTML, documents, video, email etc.)? Research shows that 80% of the information on our websites and intranets is unstructured. How is this for your internet/intranet/external sources, and how does it affect your searches?
    Example 1: when 80% of the pages on your website come from a Product Information System, you are most likely best off with a Boolean-type search engine that can be tailored to your product metadata.
    Example 2: when 80% of your intranet consists of research papers in PDF format, you might be better off with a concept-based search engine.
  • What are the common types of queries your visitors/employees use? What is the ratio of 1-word queries, 2-word queries, x-word queries, use of natural language, use of advanced meta search options etc.? Try to think like your visitors/employees and imagine what information they will be looking for, and how they are probably going to search for it. Tip: write down several use cases and try to recreate them with one or more search engines.
  • What kind of users use the search engine? See the section on search engine users below for more details.
  • Are authorization mechanisms required? Especially in intranet or extranet situations, roles and authorizations play an important part and could rule out certain types of search engines.
  • Do you want to be able to influence the ranking algorithm? Some search engines are very open in terms of being able to change configurations and settings, but the larger part is built around a specific ranking algorithm. Consider whether the algorithm is likely to need adjusting now or in the future, and to what extent configuration settings can influence it.
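The question about the ratio of 1-word, 2-word and x-word queries can often be answered from data you already have. As a minimal sketch, assuming you can export the search queries from your logs as a plain list of strings (the sample log and the function name below are hypothetical), a few lines of Python are enough:

```python
from collections import Counter


def query_length_ratios(queries):
    """Return the share of queries per word count, ordered by word count."""
    lengths = Counter(len(q.split()) for q in queries if q.strip())
    total = sum(lengths.values())
    return {n: count / total for n, count in sorted(lengths.items())}


# Hypothetical sample of logged queries, invented for this illustration.
SAMPLE_LOG = [
    "drivers",
    "support",
    "product manual",
    "how do I reset my password",
]

for words, share in query_length_ratios(SAMPLE_LOG).items():
    print(f"{words}-word queries: {share:.0%}")
```

In this sample half of the queries consist of a single word, which would suggest tuning for short keyword searches first. Real logs will of course be larger and messier than this.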

Besides these questions there are of course questions about costs (TCO, make/buy), management, integration, licenses etc., but for now I’ll leave these out of scope.

Search engine users

In the first paragraph I mentioned ‘our customer’s customers’, or in other words: the actual users of the search engine. It's important to look carefully at the users of a search engine, because they have some interesting characteristics. The largest group of search engine users consists of so-called ‘casual users’. These are users who use the search engine maybe several times per year, maybe for the first time and hopefully not for the last time. These users basically want to enter one or two search terms, hit the search button and find what they need in the top 3 returned results. The exact opposite are the heavy search users. These are experienced users who use the search engine several times per week or more, maybe for research purposes or for finding information in their most-used application (CRM, bug tracking system, product database).

Besides the frequency of use there are two other main differences:

  1. Casual users are always in a hurry, more so than their heavy counterparts. This sounds funny – isn’t everybody always in a hurry? – but tests show that casual users spend far less time in one search session than heavy users. They expect it to work right away, or they’re gone. We must not forget that users are used to search engines like Google. Most people expect every search engine to work as quickly and efficiently as Google, like it or not.
  2. Casual users are far less efficient. They are usually not familiar with the search engine interface, its options, the indexed data, tips to improve the search results etc. This could also be a matter of language, vocabulary, education or experience. The result is that they use irrelevant keywords, make mistakes and don’t use extra features. Just as in point 1, the end result is that users are scared away after one or two tries. And they say “your search engine doesn’t work.”

So what can we do to avoid this? First, we’ll start with the obvious: make it simple. Provide a simple interface for the casual users with tips about how to improve the search results. Provide examples of search keywords that work well (for example product numbers). Provide other ways to find information by pointing to sitemaps or product homepages – if searching doesn’t work, maybe navigating will. If most of your users are casual users, then hide advanced search options behind a mouse click.

Secondly, adjust your search algorithm to the behavior of your average user. If people mainly want product results, driver downloads or customer support, then make sure you return those results first. We'll dive deeper into this subject in parts 2 and 3 of this series.
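As a rough illustration of ‘return those results first’, the sketch below re-ranks a result list by multiplying each relevance score with a boost per content type. The content types, boost factors, sample results and function name are all hypothetical; whether and how your search engine lets you configure something like this depends entirely on the product.

```python
# Hypothetical boost factors per content type; types not listed get 1.0.
TYPE_BOOST = {"product": 2.0, "driver": 1.5, "support": 1.5}


def rerank(results, boosts=TYPE_BOOST):
    """Sort (title, content_type, base_score) tuples by boosted score, best first."""
    return sorted(results, key=lambda r: r[2] * boosts.get(r[1], 1.0), reverse=True)


# Hypothetical results as a search engine might return them, best base score first.
RESULTS = [
    ("Press release", "news", 0.9),
    ("Printer X200", "product", 0.6),
    ("X200 driver", "driver", 0.5),
]

for title, content_type, score in rerank(RESULTS):
    print(title, content_type, score)
```

With these factors the product page moves ahead of the press release even though its base score is lower, while the driver page stays in third place.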

The bottom line is to think about your users and their behavior. It matters whether only scientific researchers use your search engine or potentially the whole world population. When you have the time and budget, it will certainly be rewarding to do desk research or usability tests, maybe as part of your website acceptance.

For the next time

For now these questions should leave you with some thoughts about your own search strategy. As this blog is certainly not intended as one-way propaganda, I would like to invite you – my dear readers – to share your experiences with your current search, whether it’s the GX WebManager search engine or any other search engine. Does it work as expected? What should be improved? Etc.

As a matter of fact, we are currently making arrangements for several improvements to our search engine. We have already contacted several customers to start a dialog about possible improvements. If you would like to join the conversation, you can always contact me at martinvm -@- gx.nl or give me a call.


About the Author



Martin van Mierloo is Product Manager and has many years of experience with GX WebManager. Martin writes about the GX WebManager roadmap, new product features and WCMS-related topics.
© 2010 GX creative online development B.V.

Disclaimer

This website (GXdeveloperweb.com) may discuss or contain opinions, (sample) coding, software or other information that does not include GX official interfaces, instructions or guidelines and therefore is not supported by GX. Changes made based on this information are not supported.  GX will not be held liable for any damages caused by using or misusing the information, software, instructions, code or methods suggested on this website, and anyone using these methods does so at his/her own risk. GX offers no guarantees and assumes no responsibility or liability of any type with respect to the content of this website, including any liability resulting from incompatibility between the content of this website and the materials and services offered by GX. By using this website you will not hold, or seek to hold, GX responsible or liable with respect to the content of this website.