Protected: Measuring Complexity

In this post I want to examine one way of measuring code complexity or other similar processes such as business logic. There are already several algorithms available for measuring code complexity such as Cyclomatic Complexity, the article Programming Tools: Code Complexity Metrics gives an overview of several complexity metrics.

We want to measure code complexity by somehow mapping the problem to one of packing. The idea is to map code to a set of 2d polygons (one way is to have each class be its own polygon) and then measure the complexity of the set of 2d polygons through packing. The complexity metric I use for a set, S, of 2d polygons is the density of the best 2d packing of S.

First the mapping of code to polygons needs to be defined. For an object oriented language each object (by object I mean a Java or C++ class) is mapped to its own polygon, similarly for a functional language each function is mapped to a polygon. For the remainder of this article I’ll assume the language is object oriented.

The mapping from a class to a polygon needs to use some simple measures of complexity.

1. The number of calls to distinct functions outside the class (by distinct I mean two calls to strlen() would count as 1 call to a distinct outside function).
2. The number of functions insides the class (this may need merging with 3).
3. The total number of branches inside the class (by branch I mean an if-then statement, case statement, or any similar construct).

Let P be the polygon to be mapped from an object O. The number of vertices of P equals 1. The average interior angle of P will be equal to the interior angle of a regular polygon with 1 vertices but the standard deviation of the interior angles of P will be equal to 2 + 3 (A linear scaling factor might be needed). By perturbering the interior angle of P a more complex shape is created. The element of randomness does mean that the mapping is non-deterministic but that’s not a large problem because measuring code complexity is fundamentally a fuzzy task.

My next post will finally describe a slightly different algorithm for solving tangram puzzles.

Protected: Tangram Applications

Today I want to give details on some possible applications of algorithm 3, that I outlined in my last post on tangram puzzles.

The first application is 2D packing, which is a variation on bin packing.

The 2-dimensional packing problem I want to solve is defined as follows:

Given a set of simple polygons S, find the smallest rectangle R and a packing such that the polygons in S are packed inside the rectangle R.

Notice that one of the constraints on problem is that we need to find the smallest rectangle that can be packed with the polygons in S. I don’t know if the algorithm I outline below satisfies this constraint, a later post will revisit this issue. My algorithm also restricts the rectangle R to be a square.

The algorithm for solving the above packing problem is as follows:
0. Set a tolerance ε.
1. Set a step size, dl, of 1.
2. Create a square, R, with sides of length l = 1.
3. Check using algorithm 3 (position_puzzle_pieces) if the polygons in S can be packed into R.
4. If S can be packed into R, then set l = ldl.
5. If S cannot be packed into R, then set l = l+dl.
6. If steps 3 through 5 have looped more than once, then goto step 7. Otherwise, if for the last two loops of steps 3 through 5:
(1) S couldn’t be packed into R, then set dl = 2dl;
(2) S could be packed into R, then set dl = 2dl;
(3) otherwise dl = dl/2.

7. If, in step 3, S could be packed into R and dl < ε, then stop; otherwise goto step 3.

The above algorithm uses a binary search in combination with position_puzzle_pieces to find the optimal length l. The algorithm does assume that if S cannot be packed into a square with sides of length x, then S cannot be packed into any square with sides of length less than x.

That’s it for today. There are a couple other applications I want to talk about tomorrow and hopefully I’ll also have time to delve more deeply into variations on the position_puzzle_pieces algorithm.

More spoilage

The question of minimizing spoilage still interests me. Today I want to consider the consequences of relaxing one of the constraints on the problem. In particular I do not assume that the two different types of fruit cost the same per unit. And I want to minimize the total amount of wasted money (i.e. the amount of spoiled fruit times the unit cost.)

Definitions:

1. There are two types of fruit $A$ and $B$. The unit costs are $c_A$ and $c_B$ respectively.
2. We can consume $z$ units of fruit per day.
3. The goal is to minimize the cost of the total spoiled fruit.
4. The percent of fruit that spoils at the end of each day is $m_A d$ and $m_B d$ respectively for A and B, where $m_A$, $m_B$ are constants and $d$ is the number of days since buying the fruit (i.e. $d = 0$ for the first day.)

Calculations:
First, note that by minimizing the cost of spoiled fruit at the end of each day the total cost of spoiled fruit is minimized (I should prove this and I might in a future post. I think this comes from the fact that we’re solving a convex optimization problem?).

Given $x$ and $y$ units of fruit for A and B respectively.
We can consume $z_A$ and $z_B$ units of $A$ and $B$ respectively, where $z_A + z_B = z.$

The amount of spoiled fruit at the end of the day is
$m_Ad(x-z_A) + m_Bd(y-z_B),$ the cost of the spoiled fruit is
$m_Ad(x-z_A)c_A+m_Bd(y-z_B)c_B.$

Using $z_B = z - z_A,$ the cost of the spoiled fruit is $m_Ad(x-z_A)c_A+m_Bd(y-z+z_A)c_B.$ Expanding we get

$m_Ac_Adx-m_Ac_Adz_A+m_Bc_Bdy-m_Bc_Bdz+m_Bc_Bdz_A$
which reduces to $= z_A(m_Bc_B - m_Ac_A)d + m_Ac_Adx + m_Bc_Bdy - m_Bc_Bdz.$

Taking the derivative with respect to $z_A$ we obtain $(m_Bc_B - m_Ac_A)d.$
If $m_Bc_B - m_Ac_A$ is greater or equal to zero, then $z_A = 0, z_B = z$ minimizes the cost of spoiled fruit; otherwise $z_A = z, z_B = 0$ minimizes the cost of spoiled fruit.

From the above it’s clear that introducing the factor of fruit cost does not fundamentally alter the solution. But it does introduce some interesting factors, that warrant further consideration.

Protected: Solving tangram puzzles

A description of the tangram puzzle is available at wikipedia.
My sister Katy and I had the opportunity to play tangrams while traveling on the trains in Europe last summer. It was then that I had the idea of trying to solve the puzzle computationally.

There are several different heuristics for solving a tangram puzzle. But first we need to make some definitions:

1. Silhouette means the given shape that we need to form.

2. Puzzle piece means one of the elements from the set of 7 shapes to be used in forming the silhouette (on the wikipedia page these pieces are called tans.)

The following are possible methods/algorithms to solve a tangram puzzle.

1. Use a heuristic to cut the given silhouette into a number of shapes, then check if the resulting set of shapes matches the set of puzzle pieces.

The following images demonstrate the above algorithm, using a set of two puzzle pieces.

2. A Heuristic Solution to the Tangram Puzzle, E. S. Deutsch & K. C. Hayes Jr., Machine Intelligence v7, p205-240, 1972 (TODO: Lookup algorithm and give short description)

3. Use a combinatorial approach, that is try each shape in all possible positions and quit when one of the configurations is a solution. In more detail the following is the algorithm. Note that the algorithm can solve more general problems than just tangram puzzles.

function position_puzzle_pieces(P = {set of puzzle pieces}, s = silhouette)
if P == ∅: return true
for p ∈ P:
g=position_puzzle_piece(p, s) #create a generator to position the puzzle piece p, inside the shape s.
P = Pp #remove puzzle piece p from the set of puzzle pieces.
while g.next_starting_position() == true: #continue loop, until p is in the last possible position inside of s.
s1 = s\p #subtract puzzle piece p from shape s and set s1 equal to the shape resulting from the subtraction.
if position_puzzle_pieces(P, s1) == true: return true
end for
return false

I’ve coded up method (3) for solving tangram puzzles. The algorithm name for position_puzzle_pieces is Poly-Solve. There are a number of low level issues to deal with when implementing Poly-Solve, but for this blog we’re interested in just the high level details.

My next post will go into some of the applications for method (3), maybe more of the high level details in implementing method (3) and running time/space analysis, and other related problems to tangram puzzles.l

Minimizing spoilage

Yesterday we bought some strawberries, cherries, apricots, and plums at the local farmers market. Infact we bought more than we can possible consume before some of the fruit spoils. The problem under consideration is how to minimize the total of amount of spoiled fruit.

This problem is interesting since different fruits spoil at different rates. I’ll consider the simplest case of just two types of fruits. There are different possible models for the rate of fruit spoilage. I think the simplest one is to assume that the percent of good fruit that spoils in a day increases as a linear function.

As an example we assume the following table for the percent of strawberries that spoil on a particular day

…………………Percent of fruit that spoils
Day 0……………………..0%

Day 1……………………33%

Day 2……………………66%

Day 3………………….100%

By the above table, on day two 66% of the strawberries that weren’t spoiled at the beginning of day two, are spoiled by the end of day two. For convenience sake I assume that we consume fruit at the end of a day.

Table for cherry spoilage
…………………….Percent spoiled
Day 0……………………..0%

Day 1……………………25%

Day 2……………………50%

Day 3…………………..75%

Day 4…………………100%

Now, given 100 strawberries, 100 cherries and the ability to consume 20 pieces of fruit per day, how should we schedule our fruit consumption to minimize the amount of fruit that spoils (i.e. goes unconsumed.)

My solution is to use a greedy algorithm, I’m not positive that it minimizes total spoilage but I believe it does (Another possibility is to use linear programming but I like the greedy algorithm more.)

The algorithm is as follows

Given two goods A and B, percent X of A will go bad the next day, percent Y of B will go bad the next day, we have n pieces of fruit A, m pieces of fruit B, and we can consume p pieces of fruit (where p Y.

The amount of fruit that spoils, S = X * (n – n1) + Y*(m – m1), where n1 and m1 are the amount of fruit A and B we consume respectively, and n1 + m1 = p.

Now note that n1 = p – m1, obtaining S = X*(n-p + m1) + Y*(m – m1) = X*n – X*p +X*m1 + Y*m – Y*m1 = X*(n-p)+Y*m +m1*(X-Y)

Looking at S, it’s apparent that S is minimized when m1 is minimized, that is n1 is maximized. So
we should consume the strawberries first and then consume the cherries.

That’s all for today.