giovedì 24 giugno 2010

How to suggest alternate spellings/1 [ENG]

How to suggest a list of alternate spellings to the user when s/he is typing something?

Since you have a dictionary - a set of "allowed" strings for a field - you might first want to evaluate which of these strings are similar to the user input.

There are several strategies to accomplish that.

Here is a very naive and plain implementation of Levenshtein Distance:

        // Strings sanitizing
        if (s==null) s="";
        if (t==null) t="";
        s = s.toUpperCase();
        t = t.toUpperCase();

        // d is a table with m+1 rows and n+1 columns
        int m = s.length();
        int n = t.length();
        int[][] d = new int[m+1][n+1];

        for (int i = 1; i <= m; i++)
        {
            d[i][0] = i; // deletion
        }
        
        for (int j = 1; j <= n; j++)
        {
            d[0][j] = j; // insertion
        }
        
        for (int j = 1; j <= n; j++)
        {
            for (int i = 1; i <= m; i++)
            {
                if (s.charAt(i-1) == t.charAt(j-1))
                {
                    d[i][j] = d[i - 1][j - 1];
                }
                else
                {
                    d[i][j] = minimum(d[i - 1][j] + 1, // deletion
                            d[i][j - 1] + 1, // insertion
                            d[i - 1][j - 1] + 1 // substitution
                    );
                }
            }
        }

        return d[m][n];
    }

    public int minimum(int i, int j, int k)
    {
        if ((i < j) && (i < k))
            return i;
        if ((j < i) && (j < k))
            return j;
        return k;
    }

And here are some tests results:

PAMPLONA - PIMPLONA = 1
PAMPLONA - PIMPONA = 2
PAMPLONA - pippona = 3
PAMPLONA - RAMbONA = 3
PAMPLONA - RusTIcheLLA = 10

EDIT: there was a bug in the algorhytm, corrected

Nessun commento: