Monthly Archives: October 2008

Discovering Search Terms

More trawling through old code I had written brought this one to the surface. One of the requirements of the system I’m working on was to intercept a 404 (Page Not Found) response, determine whether the referrer was a search engine (e.g. Google) and, if so, redirect to a search page with the search term. Intercepting the 404 was quite easily done with an HTTP module…

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Web;

namespace DemoApplication
{
    public class SearchEngineRedirectModule : IHttpModule
    {
        HttpApplication _context;

        public void Dispose()
        {
            if (_context != null)
                _context.EndRequest -= new EventHandler(_context_EndRequest);
        }

        public void Init(HttpApplication context)
        {
            _context = context;
            _context.EndRequest += new EventHandler(_context_EndRequest);
        }

        void _context_EndRequest(object sender, EventArgs e)
        {
            string searchTerm = null;
            // Redirect only when the page was not found AND a search term was discovered.
            if (HttpContext.Current.Response.StatusCode == 404
                && (searchTerm = DiscoverSearchTerm(HttpContext.Current.Request.UrlReferrer)) != null)
            {
                HttpContext.Current.Response.Redirect("~/Search.aspx?q=" + searchTerm);
            }
        }

        public string DiscoverSearchTerm(Uri url)
        {
            …
        }
    }
}

Implementing DiscoverSearchTerm isn’t that difficult either. We just have to look at search engine statistics to see which engines are most popular, then analyse the URL each one produces when performing a search. Luckily for us, most are quite similar in that they use a very simple format that carries the search term as a parameter in the query string. The search engines I analysed were Live, MSN, Yahoo, AOL, Google and Ask. The search term parameter of these engines was named either “p”, “q” or “query”.
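To make the URL shape concrete, here is a minimal sketch that pulls one named parameter out of a referrer URL. It uses HttpUtility.ParseQueryString rather than the regular expressions shown in this post, and the ReferrerShapes class and GetParameter method are hypothetical names made up for illustration.

```csharp
using System;
using System.Web;

public static class ReferrerShapes
{
    // Hypothetical helper: returns the URL-decoded value of one named
    // query-string parameter, or null when the parameter is absent.
    public static string GetParameter(string url, string name)
    {
        var uri = new Uri(url);
        // ParseQueryString accepts the leading '?' and decodes '+' and %XX escapes.
        return HttpUtility.ParseQueryString(uri.Query)[name];
    }
}
```

For an illustrative referrer such as http://www.google.com/search?hl=en&q=extension+methods, asking for “q” gives “extension methods”, while a Yahoo-style URL carries the same information under “p”.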

Now, all we have to do is filter for all the requests that came from a search engine, find the search term parameter and return its value…

public string DiscoverSearchTerm(Uri url)
{
    string searchTerm = null;
    // Dots are escaped so that "." matches a literal dot rather than any character.
    var engine = new Regex(@"(search\.(live|msn|yahoo|aol)\.com)|(google\.(com|ca|de|(co\.(nz|uk))))|(ask\.com)");
    if (url != null && engine.IsMatch(url.Host))
    {
        var queryString = url.Query;
        // Remove the question mark from the front and add an ampersand to the end for pattern matching.
        if (queryString.StartsWith("?")) queryString = queryString.Substring(1);
        if (!queryString.EndsWith("&")) queryString += "&";
        var queryValues = new Dictionary<string, string>();
        var r = new Regex(
        @"(?<name>[^=&]+)=(?<value>[^&]+)&",
        RegexOptions.IgnoreCase | RegexOptions.Compiled
        );
        string[] queryParams = { "q", "p", "query" };
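        foreach (Match match in r.Matches(queryString))
        {
            var param = match.Groups["name"].Value;
            // Keep only the first occurrence of each candidate parameter;
            // Dictionary.Add would throw if the same name appeared twice.
            if (queryParams.Contains(param) && !queryValues.ContainsKey(param))
                queryValues.Add(param, match.Groups["value"].Value);
        }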
        if (queryValues.Count > 0)
            searchTerm = queryValues.Values.First();
    }
    return searchTerm;
}

The above code uses two regular expressions, one to filter for a search engine and the other to separate the query string. Once it’s decided that the URL is a search engine’s, it creates a collection of query string parameters that could be search parameters and returns the first one.

Unfortunately, there wasn’t enough time in the iteration for me to properly match each search engine with its correct query parameter. In practice, though, the search parameter usually appears quite early in the query string, so it’s fairly safe to assume that the first match is the correct one.
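For what it’s worth, matching each engine to its own parameter could look something like the sketch below. The host-to-parameter pairs are my assumptions based on the “p”, “q” and “query” names mentioned earlier and should be checked against real referrer URLs. The SearchTermSketch class is a hypothetical name, and it uses HttpUtility.ParseQueryString instead of the regular expression above.

```csharp
using System;
using System.Collections.Generic;
using System.Web;

public static class SearchTermSketch
{
    // ASSUMPTION: these host/parameter pairs are illustrative, not verified.
    private static readonly Dictionary<string, string> ParamByHost =
        new Dictionary<string, string>
        {
            { "search.live.com", "q" },
            { "search.msn.com", "q" },
            { "search.yahoo.com", "p" },
            { "search.aol.com", "query" },
            { "google.com", "q" },
            { "google.co.uk", "q" },
            { "ask.com", "q" },
        };

    // Returns the decoded search term, or null when the referrer is not a
    // known engine or carries no value for that engine's parameter.
    public static string Discover(Uri referrer)
    {
        if (referrer == null) return null;
        string host = referrer.Host;
        foreach (var pair in ParamByHost)
        {
            // Accept the exact host or any subdomain of it (e.g. www.google.com).
            bool hostMatches = host.Equals(pair.Key, StringComparison.OrdinalIgnoreCase)
                || host.EndsWith("." + pair.Key, StringComparison.OrdinalIgnoreCase);
            if (!hostMatches) continue;

            string value = HttpUtility.ParseQueryString(referrer.Query)[pair.Value];
            return string.IsNullOrEmpty(value) ? null : value;
        }
        return null;
    }
}
```

Because ParseQueryString returns the decoded value, a caller building a redirect URL from it would need to pass the term through HttpUtility.UrlEncode first.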

Randomly Sorting a List using Extension Methods

I was trawling through some old code I had written while doing some “refactoring” and came across this little nugget. I wanted to sort a list of objects that I was retrieving from a database using LINQ to SQL into a random order. Seeing as extension methods are all the rage, I decided to use them…

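using System;
using System.Collections.Generic;
using System.Linq;

public static class ListExtensions
{
    public static IEnumerable<T> Randomise<T>(this IEnumerable<T> list)
    {
        // OrderBy is deferred, so the sequence is re-sorted with new random keys each time it is enumerated.
        Random rand = new Random();
        return list.OrderBy(l => rand.Next());
    }
}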

How does it work…? It adds the Randomise() extension method to the end of any IEnumerable<T> (e.g. List<T>) and uses the OrderBy function to change the sort order based on a randomly generated number.

var randomCategories = context.Categories.Randomise();

The above code will execute the Randomise function to reorder the list of Category objects retrieved from the context randomly and assign the result to randomCategories.
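As a point of comparison, the same idea can be sketched with a Fisher-Yates shuffle, which copies the items once and swaps them in place rather than sorting on random keys. This is an alternative technique, not the code I used. The ShuffleExtensions class and Shuffle method are hypothetical names, and sharing a single Random instance is my own choice here to avoid creating several generators with the same time-based seed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // One shared generator; access is locked because Random is not thread-safe.
    private static readonly Random Rand = new Random();

    // Returns a new list containing the source items in random order.
    public static List<T> Shuffle<T>(this IEnumerable<T> source)
    {
        var items = source.ToList();
        lock (Rand)
        {
            // Fisher-Yates: walk backwards, swapping each item with a
            // randomly chosen item at or before its position.
            for (int i = items.Count - 1; i > 0; i--)
            {
                int j = Rand.Next(i + 1); // 0 <= j <= i
                T tmp = items[i];
                items[i] = items[j];
                items[j] = tmp;
            }
        }
        return items;
    }
}
```

Unlike the deferred OrderBy version, this returns a materialised list, so the order stays the same however many times the result is enumerated.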
