Category Archives: Development

Lending Club statistical significance for loan categories

I’ve been investing with Lending Club lately. Lending Club is a form of P2P lending. In short, you’re lending more directly to the borrowers requesting loans. Since there’s less overhead, the investor gets a higher interest rate (with some increased risk) and the borrower gets a lower interest rate on his/her loan.

I’ve spent a lot of time going over the data provided by Lending Club. I’m fascinated to see what kinds of interesting information I can get out of the raw data. For example, there is a more than 95% probability that a loan that has repaid more than 65% of it’s principal will repay fully. In other words, you really don’t have to worry nearly as much once the loan is past 65% repayment.

Tonight I wanted to find out what loan information was statistically significant in regards to whether or not the loan would default. See below for the results, and keep reading if you’re interested in the technical details:

Data Statistically significant in regards to repayment? Confidence
Inquiries in the past 6 months YES >99.99%
Sate borrower lives in YES >99.99%
Credit Grade (A1, A2, etc) YES >99.99%
Loan Length NO N/A
Loan Purpose YES >99.99%
Home Ownership (Own, Rent, Mortgage) YES >99%
FICO Score YES >99.99%
Open Credit Lines NO N/A
Employment Length NO N/A

All of these are pretty much what I’d expect with the exception of the last two. I was avoiding borrowers with a lot of open credit lines or who hadn’t been employed very long. It’s good to see that this prejudice was unjustified.

Confidence factor can be a little confusing. For example, the confidence factor for “Loan Purpose” means that there is less than 0.01% chance that the differences between the observed and expected values of loan repayment for the loan purpose were caused by random chance. That’s why we are more than 99.99% confident that there must be some underlying reason other than chance that the data differed. This does not include any notion of how or why the loan purpose matters to loan repayment, only that it does.

These values were calculated using a Chi-square test. I took all the loans that were either fully paid, defaulted, or charged off. I further broke the loans down into two results: loss, which included all loans that had repaid less than 94% of the loan’s principal, and gain, which included all loans with more than 94% repayment of principal.

I only took categories that had more than 300 loans in the set. With smaller numbers you risk having your results greatly impacted by random chance. For example, only seven of the thirty-five credit grades met this criteria (A4, A5, B2, B3, B4, B5, and C1) and only four states (CA, FL, NY, and TX). Since we’re only interested in knowing whether different credit grades or states impact the likelihood of repayment, this restriction is fine.

If you’d like to see the expected vs observed tables for these results, you can grab here: Observed vs Expected Tables.

Hopefully I’ll find time to talk about future findings!

A lovely remake of Spacewar!

For those who aren’t familiar, Spacewar! is the second video game ever made. It took the original programmer 200 hours to create on a DEC PDP-1. Obviously technology has advanced a lot since when it was first created, so I thought it would be the perfect game to develop for ACM.

Over the course of two sessions at one hour each session, I was able to create a viable remake of Spacewar! using Lua and Love. The controls are the arrow keys for ship 1 (red) and wasd for ship 2 (blue). Running into the sun kills you, and each time you die (from anything) you have two seconds of immunity from dying. This immunity prevents spawn camping. I’d like to thank zerohdog on flickr for the ship image and fastcall for the sun image (also listed in credits in the game).


Download! (Needs love2d to run)

A simple SQL/Flatfile abstraction in Lua

I’ve been wondering if it was possible to make a MySQL/SQLite/Flatfile abstracted interface for Lua for some time now. At first I thought I’d try something like LINQ, but realized that there was really no reasonable way to do that in Lua since we don’t have the power of expression trees. I then considered writing the queries in a simplified pseudo-Lua language, but that would take too much time to parse and no one really wants to learn another language anyways.

What I settled on instead was a very simplified abstraction of the data access. It can’t do everything you can do yourself from SQL or raw file I/O, but I found that it serves up exactly the kind of interface I’d need for all my past projects in Lua that use some form of I/O.

Here’s an example of declaring a table of data:
[cc_lua]users = CreateDataTable( “users”, “steamid”, “string(32)”, “The steamid of the user” )
users:AddKey( “group”, “string(16)”, “The group the user belongs to” )
users:AddKey( “name”, “string(32)”, “The name the player was last seen with” )
users:AddListOfKeyValues( “allow”, “string(16)”, “string(128)”, “The allows for the user” )[/cc_lua]

We’re defining a primary key ‘steamid’ for each of the rows in the table ‘users’. We’re saying that steamid is going to be a string of max length 32 characters, and we define a comment that’s used in MySQL and Flatfiles. We then go on to add regular keys ‘group’ and ‘name’ to the table in a similar fashion, and you should note that regular keys are optional in the table. Finally, we’re making an additional key-value table named ‘allow’ (A key-value table basically means a regular, unrestricted Lua table). So, to make sure you’ve got the idea, the Lua table structure of would look like this:

[cc_lua]users = {
my_steamid1 = {
group = “admin”,
allow = {
slap = “*”,
kick = “*”
}
},
my_steamid2 = {
name = “Bob”,
allow = {}
}
}[/cc_lua]

The API around such a clean representation of data couldn’t be much simpler. You have four operations: insert a new row by primary key, fetch an existing row by primary key, delete a row by primary key, and get the entire table. When inserting a row you can optionally pass in the data for the row. With both the insert and fetch functions you get back a table for the row that’s being “tracked”. When you change any of the contents, the change is immediately reflected into the DB or file.

There’s a caching system built around the system so if you fetch the same row multiple times, it won’t be going out to the DB or reading the file each time. You can request the cache be flushed or disabled altogether if you need to. Unlike file I/O I’ve done in the past, this system doesn’t need to parse the whole file to get a single row, which means it should probably work okay using flatfiles on very large files.

Though I haven’t coded this portion yet, I’m also planning on adding the ability to convert between MySQL/SQLite/flatfiles on the fly. This may be problematic for very large databases, so I’ve also taken care to make sure that this system can be run as a standalone script apart from any specific application.

So… if you’ve read this far, your next question is probably, “Why bother with such a system? Why not just stick with a single format?”. A single format (usually flatfiles) is great for about 95% of my users. The other 5%, however, are the power users who want to do crazy things like hook up a PHP billboard with blinking lights showing how many times a second someone says the word ‘the’. It’s the power users I prefer to cater to, so I’ve always felt guilty in the past that this was one area I couldn’t do much for them in. But now I can!

Damerau–Levenshtein Distance, Lua Implementation

I stumbled across Levenshtein distance today and had to try my hand at writing an implementation in Lua. I choose the slightly more complex Damerau–Levenshtein distance, and I think it turned out pretty well.

Some notes of interest:

  • Complexity is O( (#t+1) * (#s+1) ) when lim isn’t specified.
  • This function can be used to compare array-like tables as easily as strings.
  • This function is case sensitive when comparing strings.
  • Using this function to compare against a dictionary of 250,000 words took about 0.6 seconds on my machine for the word “Teusday”, around 10 seconds for very poorly spelled words. Both tests used lim.

[ccn_lua]–[[
Function: EditDistance

Finds the edit distance between two strings or tables. Edit distance is the minimum number of
edits needed to transform one string or table into the other.

Parameters:

s – A *string* or *table*.
t – Another *string* or *table* to compare against s.
lim – An *optional number* to limit the function to a maximum edit distance. If specified
and the function detects that the edit distance is going to be larger than limit, limit
is returned immediately.

Returns:

A *number* specifying the minimum edits it takes to transform s into t or vice versa. Will
not return a higher number than lim, if specified.

Example:

:EditDistance( “Tuesday”, “Teusday” ) — One transposition.
:EditDistance( “kitten”, “sitting” ) — Two substitutions and a deletion.

returns…

:1
:3

Notes:

* Complexity is O( (#t+1) * (#s+1) ) when lim isn’t specified.
* This function can be used to compare array-like tables as easily as strings.
* The algorithm used is Damerau–Levenshtein distance, which calculates edit distance based
off number of subsitutions, additions, deletions, and transpositions.
* Source code for this function is based off the Wikipedia article for the algorithm
.
* This function is case sensitive when comparing strings.
* If this function is being used several times a second, you should be taking advantage of
the lim parameter.
* Using this function to compare against a dictionary of 250,000 words took about 0.6
seconds on my machine for the word “Teusday”, around 10 seconds for very poorly
spelled words. Both tests used lim.

Revisions:

v1.00 – Initial.
]]
function EditDistance( s, t, lim )
local s_len, t_len = #s, #t — Calculate the sizes of the strings or arrays
if lim and math.abs( s_len – t_len ) >= lim then — If sizes differ by lim, we can stop here
return lim
end

— Convert string arguments to arrays of ints (ASCII values)
if type( s ) == “string” then
s = { string.byte( s, 1, s_len ) }
end

if type( t ) == “string” then
t = { string.byte( t, 1, t_len ) }
end

local min = math.min — Localize for performance
local num_columns = t_len + 1 — We use this a lot

local d = {} — (s_len+1) * (t_len+1) is going to be the size of this array
— This is technically a 2D array, but we’re treating it as 1D. Remember that 2D access in the
— form my_2d_array[ i, j ] can be converted to my_1d_array[ i * num_columns + j ], where
— num_columns is the number of columns you had in the 2D array assuming row-major order and
— that row and column indices start at 0 (we’re starting at 0).

for i=0, s_len do
d[ i * num_columns ] = i — Initialize cost of deletion
end
for j=0, t_len do
d[ j ] = j — Initialize cost of insertion
end

for i=1, s_len do
local i_pos = i * num_columns
local best = lim — Check to make sure something in this row will be below the limit
for j=1, t_len do
local add_cost = (s[ i ] ~= t[ j ] and 1 or 0)
local val = min(
d[ i_pos – num_columns + j ] + 1, — Cost of deletion
d[ i_pos + j – 1 ] + 1, — Cost of insertion
d[ i_pos – num_columns + j – 1 ] + add_cost — Cost of substitution, it might not cost anything if it’s the same
)
d[ i_pos + j ] = val

— Is this eligible for tranposition?
if i > 1 and j > 1 and s[ i ] == t[ j – 1 ] and s[ i – 1 ] == t[ j ] then
d[ i_pos + j ] = min(
val, — Current cost
d[ i_pos – num_columns – num_columns + j – 2 ] + add_cost — Cost of transposition
)
end

if lim and val < best then best = val end end if lim and best >= lim then
return lim
end
end

return d[ #d ]
end[/ccn_lua]
Gist of the same source code.

GUI for NetTunnel

Designing the GUI for NetTunnel put my creativity to the test. I’ve never actually designed a GUI before, but I’ve seen and read a lot about GUI design theory, but theory seems to be fairly pointless for this design process. It was interesting for me to try to translate the idea in my head to the controls given in Visual Studios.

My first attempt ended up like this:

Main Window
Main Window
Services Window
Services Window

This is okay, but not great. Most of those elements are static elements that don’t move even if you resize it. It’s certainly not something I’d feel comfortable working with every day. After getting lots and lots of advice from friends, my second and final GUI design ended up like so:

Main Window
Main Window
Services Window
Services Window

A much cleaner and easier to understand layout. Services can be toggled just by clicking on the ‘service’ menu and then clicking on the appropriate service from the drop-down, or they can be toggled within the service window proper. All the most commonly used items in the gui are put in obvious places, while making sure that everything’s just a few clicks away. Everything resizes and can have the size proportions for it changed.

Now that I know how easy it is to create GUIs, I think I might start using them in future projects while retaining a command line version for power users.

Introducing NetTunnel

As part of my requirements for obtaining my degree, I’m doing a network capstone project this semester.  I’ve always been somewhat fascinated with NATs and specifically, how to break them, so I decided to work on a NAT traversal application. Enter NetTunnel: The purpose of NetTunnel is to provide a means for users with a lack of knowledge of networking or on a restrictive network an easy and simple means to share network services with other users.

Basically, say I’m on a restrictive network (like the dorm network) that does not allow me to host servers with the world due to NAT restrictions. I want to run ventrilo on my machine and have my friends join so I can chat with them. Unaided, this would be impossible, but if my friends and I are running NetTunnel a “hole” will be opened up in the network so they can connect.

It’s a pretty simple concept that’s already done in most modern desktop applications like Skype and peer to peer games. As far as I could tell this idea’s never been extended to a general level like NetTunnel, so there is a definite need for it. The closest application to it I could find is GameRanger, though GameRanger is aimed specifically at games.

I’ve set up a quick static page for NetTunnel at nettunnel.nayruden.com. Keep an eye out for further development, and I’d love to hear feedback about it!

CODN

While talking to LightSys, an organization that offers free IT services to mission organizations, I was told about the Christian Open Development Network or CODN (pronounced codin’). The site is in dire need of a refresh and a core community (last update is 2005?!). I felt like God was nudging me while they described what they wanted CODN to be, especially since what they need is exactly what I have experience with from managing the Ulysses community.

So, if you or any friends are interested in Christian open source development, be sure to watch the site over the following months as we work on it. If you’ve got any ideas on what you’d like to see there, be sure to drop a note in the comments.

Callback system in D

We’ve been evaluating D for use in Daydream, and I decided to see how easy it would be to create a callback system in the D  language (aka events or signals). This is a daunting task in C++ because C++ templates can only accept a static number of arguments… very bad when you have a function that can accept any number of arguments. To solve this problem in C++ you need to create a separate template for each number of possible arguments.

In D you can create templates that accept any number of arguments! You can treat these as a tuple, an array, or use them with tail recursion (à la PROLOG).

Combine this with the natural awesomeness of D and you’re setup for a power punch. Following this text is a very simple callback system in D.
A short but sweet 50 lines of code; it stores both functions and delegates and gives you a good launching point to create a more complicated call back system.

[ccn_d]import tango.io.Stdout;

// Converts a function to a delegate. Stolen from http://dsource.org/projects/tango/ticket/1174
// Note that it doesn’t handle ref or out though
R delegate(T) toDg(R, T…)(R function(T) fp) {
struct dg {
R opCall(T t) {
return (cast(R function(T)) this) (t);
}
}
R delegate(T) t;
t.ptr = fp;
t.funcptr = &dg.opCall;
return t;
}

class SimpleCallback(R, P…)
{
alias R delegate(P) callbacktype;
alias R function(P) function_callbacktype;

private callbacktype[] callback_list;

typeof( this ) opCatAssign( in callbacktype callback )
{
callback_list ~= callback;
return this;
}

typeof( this ) opCatAssign( in function_callbacktype callback )
{
auto dg = toDg!(R, P)( callback );
return this ~= dg;
}

R emit( P p )
{
static if ( !is( R == void ) )
R last;

foreach( callback; callback_list )
{
static if ( !is( R == void ) )
last = callback( p );
else
callback( p );
}

static if ( !is( R == void ) )
return last;
}

alias emit opCall;
}[/ccn_d]

Here’s some example code:

[cc_d]SimpleCallback!( void ) sc = new SimpleCallback!( void );
SimpleCallback!( bool, char[] ) sc2 = new SimpleCallback!( bool, char[] );
sc ~= function void() { Stdout.formatln( “#1” ); };
sc ~= function void() { Stdout.formatln( “#2” ); };
sc2 ~= function bool( char[] str ) { Stdout.formatln( “#1 called with {}, returning false”, str ); return false; };
sc2 ~= function bool( char[] str ) { Stdout.formatln( “#2 called with {}, returning true”, str ); return true; };
sc();
Stdout.formatln( “Last sc2 callback returned {}”, sc2( “coffee” ) );[/cc_d]
And here’s the output:
[cc_text]#1
#2
#1 called with coffee, returning false
#2 called with coffee, returning true
Last sc2 callback returned true[/cc_text]

Pleased with Daydream

The response I got with Daydream far exceeded my expectations! We now have four excellent developers onboard, myself included. We’ve been putting the rubber to the road and have been coming up with basic design documents for Daydream. This is tedious work for now, but well worth the time it will save us in the coming months.

Are you a modeler or texture artist looking to bolster your profile? Join Daydream and help us and your resume. We’re looking for all skill levels at this point.

Thank you all for your support, and watch this blog for various news about Daydream in the future. 🙂

Here’s some links for Daydream:

Website: http://daydreamonline.net (just redirects to forums right now)

Twitter updates: https://twitter.com/ProjectDaydream

Wiki: http://wiki.daydreamonline.net

Git repo: http://github.com/Nayruden/daydream

For those who may be interested, we’ve decided to use the LGPL license.

Project Daydream

Daydream has been a dream of mine (haha, get it?) for quite some time. It’s something that I believe is needed, but when I start looking at the time/effort needed to work on it, I usually run for the hills. I’ve compromised with myself in this regard, I do not expect to complete it. No matter how much I work on it or not, it will be worth the experience. I’ll take it nice and slow, baby steps the whole way. This is part of the reason I named it ‘Daydream’. If Garry Newman, whose math (see quaternion section) and programming skills never cease to amuse, can code something similar, I certainly can too.

So, what is it?

Daydream takes a page from Garry’s Mod‘s book, an open sandbox. It makes a good starting point for our explanation and most of the readers have experience in GMod, so we’ll list some of the major points of difference between Daydream and GMod below instead of starting from scratch.

  • Stability. This is the crux of what’s holding GMod back from wide public acceptence. Server owners simply have to expect a popular to crash roughly once an hour. If you don’t believe me, I’d happily get you in touch with some other server owners. GMod is built off HL2, which was simply not built to do what GMod is doing. It’s true that VALVe has put a lot of effort into supporting this, but it’s simply not enough. Daydream will have a high emphasis on making it as stable as humanly possible.
  • Cross-platform. Steam, which powers HL2 clients, does not support anything except windows. This does not hold true for servers, but Garry has repeatedly refused to support Linux. Our goal for Daydream is to have it run on Mac, Linux, and Windows. I have lots of experience programming on all the platforms except Mac, but now that I own a Mac, supporting it shouldn’t be a problem. 😉
  • FOSS (Free and Open Source Software). The community has been vital to GMod’s success, and the same would certainly be true of Daydream. What better way to do this than to have true community collaboration? I personally believe that the community is what would make or break Daydream. Making this open source also brings in the traditional benefits… more help with coding, greater bug visibility, easier to make modifications, etc.
  • Support for more players. Garry built GMod with exclusively single player in mind, telling people asking for multiplayer that they were essentially crazy. I have no idea if Garry still sticks by this design decision, but the impacts can be clearly seen. Garry is also relegated to the fact that Source cannot support more than 32/64 players on a server. One of Daydream’s long-term goals is to be essentially an MMO (similar to Second Life). This isn’t a feature that can happen overnight, and considerations for hardware must be made. Phsyics sandboxes are very intensive in memory and CPU.
  • More focus on learning. An online MMO physics sandbox is a great place to learn about the real world! What happens when a soccer ball is hit with a tornado? What’s the best way to light a wooden house on fire? Users should feel like they have nearly limitless possiblities while using Daydream.
  • Persistance. Disconnecting from a server shouldn’t be a death sentence. You should be able to bring your creations with you wherever you go, share them with others as you like (maybe even sell!). If someone’s created an awesome model (the building blocks of a sandbox game), they should be able to show the model to other people without intervention of the server owners (with reservations). Ideally (until Daydream is a true MMO), there would be some sort of cooperation between server hosts in order to store information about players.

Other features we want to implement:

  • In place script editor. You should be able to look at the scripts powering, for example, an ATV and change the behavior of the object on the fly. I want to be able to create a sphere, bring up the editor and see all relevant functions either stubbed out for me or easily dropped in the script, and make the sphere act like the the golden snitch from Harry Potter within a period of five minutes.
  • Clearly defined and standardized object I/O. If an object can move, other objects should be able to control this outside the script editor. IE, I should be able to create a button, give it a value of 0 when off and 255 when on, and hook it up to say, a cube, specifying acceleration along the z-axis or such. Then, when I press the button, the cube will shoot straight up into the air accelerating at 255 units/sec. This clean I/O interface allows even non-programmers to take full advantage of their surroundings in new and innovative ways.
  • Everything must be standardized, modular, and extensible. We realize that we’re a small team and relatively inexperienced. To counter this, everything should be as modular and extensible as possible so we can come along later and easily improve previous work. This also goes toward our MMO goal. We should litterally be able to replace the server software completely with as few changes to the clients as possible. Even if clients written in completely different languages were to join the server, this shouldn’t be a problem. I know this goal sounds lofty, but it is attainable.

An invitation for other developers:

If you are experienced in C++ or Lua, we’d like to have your help! Our development approach is very casual. So far the development team is made up of college students who are in this mostly to learn. We’re in no rush, we want to go slowly and make sure we get this right.

If this sounds like your type of project, please get in touch with me! Either leave a comment here, or email me, using megiddo AT (the address of this domain).

Technologies you should be willing to learn and become familiar with as a developer:

  • Ogre3d – This is our graphics engine
  • Git – Our source control (via github)
  • Lua – Our scripting language
  • SWIG – Our interface to Lua
  • Newton – Probably? We’re still toying around, PhysX seems to limited

An invitation for ideas:

Have an idea for this project? Leave it in the comments! We appreciate any feedback we can get at all.

Website for daydream:

Not much to look at yet, but it’s http://daydreamonlinet.net