Posts Tagged ‘Lua’

A simple SQL/Flatfile abstraction in Lua

Posted in Development on July 7th, 2010 by Nayruden – 2 Comments

I’ve been wondering if it was possible to make a MySQL/SQLite/Flatfile abstracted interface for Lua for some time now. At first I thought I’d try something like LINQ, but realized that there was really no reasonable way to do that in Lua since we don’t have the power of expression trees. I then considered writing the queries in a simplified pseudo-Lua language, but that would take too much time to parse and no one really wants to learn another language anyways.

What I settled on instead was a very simplified abstraction of the data access. It can’t do everything you can do yourself from SQL or raw file I/O, but I found that it serves up exactly the kind of interface I’d need for all my past projects in Lua that use some form of I/O.

Here’s an example of declaring a table of data:

users = CreateDataTable( "users", "steamid", "string(32)", "The steamid of the user" )
users:AddKey( "group", "string(16)", "The group the user belongs to" )
users:AddKey( "name", "string(32)", "The name the player was last seen with" )
users:AddListOfKeyValues( "allow", "string(16)", "string(128)", "The allows for the user" )

We’re defining a primary key ‘steamid’ for each of the rows in the table ‘users’. We’re saying that steamid is going to be a string of max length 32 characters, and we define a comment that’s used in MySQL and Flatfiles. We then go on to add regular keys ‘group’ and ‘name’ to the table in a similar fashion, and you should note that regular keys are optional in the table. Finally, we’re making an additional key-value table named ‘allow’ (A key-value table basically means a regular, unrestricted Lua table). So, to make sure you’ve got the idea, the Lua table structure of would look like this:

users = {
    my_steamid1 = {
        group = "admin",
        allow = {
            slap = "*",
            kick = "*"
        }
    },
    my_steamid2 = {
        name = "Bob",
        allow = {}
    }
}

The API around such a clean representation of data couldn’t be much simpler. You have four operations: insert a new row by primary key, fetch an existing row by primary key, delete a row by primary key, and get the entire table. When inserting a row you can optionally pass in the data for the row. With both the insert and fetch functions you get back a table for the row that’s being “tracked”. When you change any of the contents, the change is immediately reflected into the DB or file.

There’s a caching system built around the system so if you fetch the same row multiple times, it won’t be going out to the DB or reading the file each time. You can request the cache be flushed or disabled altogether if you need to. Unlike file I/O I’ve done in the past, this system doesn’t need to parse the whole file to get a single row, which means it should probably work okay using flatfiles on very large files.

Though I haven’t coded this portion yet, I’m also planning on adding the ability to convert between MySQL/SQLite/flatfiles on the fly. This may be problematic for very large databases, so I’ve also taken care to make sure that this system can be run as a standalone script apart from any specific application.

So… if you’ve read this far, your next question is probably, “Why bother with such a system? Why not just stick with a single format?”. A single format (usually flatfiles) is great for about 95% of my users. The other 5%, however, are the power users who want to do crazy things like hook up a PHP billboard with blinking lights showing how many times a second someone says the word ‘the’. It’s the power users I prefer to cater to, so I’ve always felt guilty in the past that this was one area I couldn’t do much for them in. But now I can!

Damerau–Levenshtein Distance, Lua Implementation

Posted in Development on June 6th, 2010 by Nayruden – Be the first to comment

I stumbled across Levenshtein distance today and had to try my hand at writing an implementation in Lua. I choose the slightly more complex Damerau–Levenshtein distance, and I think it turned out pretty well.

Some notes of interest:

  • Complexity is O( (#t+1) * (#s+1) ) when lim isn’t specified.
  • This function can be used to compare array-like tables as easily as strings.
  • This function is case sensitive when comparing strings.
  • Using this function to compare against a dictionary of 250,000 words took about 0.6 seconds on my machine for the word “Teusday”, around 10 seconds for very poorly spelled words. Both tests used lim.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
--[[
    Function: EditDistance

    Finds the edit distance between two strings or tables. Edit distance is the minimum number of
    edits needed to transform one string or table into the other.
   
    Parameters:
   
        s - A *string* or *table*.
        t - Another *string* or *table* to compare against s.
        lim - An *optional number* to limit the function to a maximum edit distance. If specified
            and the function detects that the edit distance is going to be larger than limit, limit
            is returned immediately.
           
    Returns:
   
        A *number* specifying the minimum edits it takes to transform s into t or vice versa. Will
            not return a higher number than lim, if specified.
           
    Example:

        :EditDistance( "Tuesday", "Teusday" ) -- One transposition.
        :EditDistance( "kitten", "sitting" ) -- Two substitutions and a deletion.

        returns...

        :1
        :3
           
    Notes:
   
        * Complexity is O( (#t+1) * (#s+1) ) when lim isn't specified.
        * This function can be used to compare array-like tables as easily as strings.
        * The algorithm used is Damerau–Levenshtein distance, which calculates edit distance based
            off number of subsitutions, additions, deletions, and transpositions.
        * Source code for this function is based off the Wikipedia article for the algorithm
            <http://en.wikipedia.org/w/index.php?title=Damerau%E2%80%93Levenshtein_distance&oldid=351641537>.
        * This function is case sensitive when comparing strings.
        * If this function is being used several times a second, you should be taking advantage of
            the lim parameter.
        * Using this function to compare against a dictionary of 250,000 words took about 0.6
            seconds on my machine for the word "Teusday", around 10 seconds for very poorly
            spelled words. Both tests used lim.
           
    Revisions:

        v1.00 - Initial.
]]

function EditDistance( s, t, lim )
    local s_len, t_len = #s, #t -- Calculate the sizes of the strings or arrays
    if lim and math.abs( s_len - t_len ) >= lim then -- If sizes differ by lim, we can stop here
        return lim
    end
   
    -- Convert string arguments to arrays of ints (ASCII values)
    if type( s ) == "string" then
        s = { string.byte( s, 1, s_len ) }
    end
   
    if type( t ) == "string" then
        t = { string.byte( t, 1, t_len ) }
    end
   
    local min = math.min -- Localize for performance
    local num_columns = t_len + 1 -- We use this a lot
   
    local d = {} -- (s_len+1) * (t_len+1) is going to be the size of this array
    -- This is technically a 2D array, but we're treating it as 1D. Remember that 2D access in the
    -- form my_2d_array[ i, j ] can be converted to my_1d_array[ i * num_columns + j ], where
    -- num_columns is the number of columns you had in the 2D array assuming row-major order and
    -- that row and column indices start at 0 (we're starting at 0).
   
    for i=0, s_len do
        d[ i * num_columns ] = i -- Initialize cost of deletion
    end
    for j=0, t_len do
        d[ j ] = j -- Initialize cost of insertion
    end
   
    for i=1, s_len do
        local i_pos = i * num_columns
        local best = lim -- Check to make sure something in this row will be below the limit
        for j=1, t_len do
            local add_cost = (s[ i ] ~= t[ j ] and 1 or 0)
            local val = min(
                d[ i_pos - num_columns + j ] + 1,                               -- Cost of deletion
                d[ i_pos + j - 1 ] + 1,                                         -- Cost of insertion
                d[ i_pos - num_columns + j - 1 ] + add_cost                     -- Cost of substitution, it might not cost anything if it's the same
            )
            d[ i_pos + j ] = val
           
            -- Is this eligible for tranposition?
            if i > 1 and j > 1 and s[ i ] == t[ j - 1 ] and s[ i - 1 ] == t[ j ] then
                d[ i_pos + j ] = min(
                    val,                                                        -- Current cost
                    d[ i_pos - num_columns - num_columns + j - 2 ] + add_cost   -- Cost of transposition
                )
            end
           
            if lim and val < best then
                best = val
            end
        end
       
        if lim and best >= lim then
            return lim
        end
    end
   
    return d[ #d ]
end

Gist of the same source code.


Nayruden's /dev/random is Digg proof thanks to caching by WP Super Cache