Archive for the 'Tech' Category

Circular Dependencies in the real world

In programming there is a problem known as circular dependency, which you know is real because it has its own Wikipedia page and everything.  In summary the problem is that Class A needs to know about Class B, but Class B needs to know about Class A.  The problem arises because the compiler has to pick one to start with.
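For the code-minded, here is a minimal C++ sketch of the problem and the usual way out. The classes `A` and `B` and their members are made up for illustration. The trick is a forward declaration: it tells the compiler that `B` exists before it has seen what `B` looks like, which is enough to hold a pointer to one.

```cpp
#include <string>

class B;  // forward declaration: "B exists, details to follow"

class A {
public:
    void setPeer(B* peer) { peer_ = peer; }
    std::string name() const { return "A"; }
    std::string peerName() const;  // needs the full B, so it is defined further down
private:
    B* peer_ = nullptr;  // a pointer only needs the forward declaration
};

class B {
public:
    void setPeer(A* peer) { peer_ = peer; }
    std::string name() const { return "B"; }
    // A is fully defined by now, so calling its members here is fine.
    std::string peerName() const { return peer_ ? peer_->name() : "nobody"; }
private:
    A* peer_ = nullptr;
};

// B is complete at this point, so A can finally use it.
inline std::string A::peerName() const {
    return peer_ ? peer_->name() : "nobody";
}
```

Create one of each, point them at each other with `setPeer`, and both `peerName` calls work. Neither class had to be fully known first; the compiler only needed a promise that the other one was coming.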

As an analogy let’s take Bob and Alice who have been married for many years and won’t go anywhere without each other.  Strolling through the park they find the bridge across a stream is out but they are in luck, someone with a two seater swan boat is willing to offer rides.  Since it is his boat (and no one is going to question why in the world he owns one personally) he must remain in it at all times.

Now since Bob and Alice, or Alice and Bob if you prefer, won’t go anywhere alone here is the problem.  The Cygnus captain only has room for one.  He points to Alice and says “I’ll take you across first” to which she responds “But Bob won’t be on the other side when I get there so I can’t.”  He turns to Bob and says “If you go first then you will be on the other side and therefore she can cross right after.”  Bob replies “That sounds great but only if Alice meets me across the river when I get there.”  Exasperated, holding his face in his hands, the captain doesn’t see the iceberg coming downstream which proceeds to sink his swan.  Luckily the river is only a few feet deep but still he winds up wet and boat-less.

He hit an impasse: there is no way to get one to cross without the other being there first.  Despite The Substitute teaching otherwise, sometimes the best solution is not to throw a boombox out a second story window.  Outside of programming this is known as the chicken or the egg problem.  Clearly one came first but there is no clear cut answer as to which is correct (the egg is the right answer btw).

Why am I talking about eggs?

Other than the fact that they are delicious, this problem comes up ALL THE TIME.  Just today I was listening to a cross discipline engineering project where the hardware people said they couldn’t finish without the software (true) but the software people couldn’t get started without the hardware it runs on (also true).  Think back and it shouldn’t take too long to come up with similar situations in your life.  Businesses (especially tech support) thrive on this.

“To recover your pin number, please type in your pin on the phone keypad now.”  Most cases aren’t this bad but stuff that stupid does come up now and again.  Sadly the people involved are likely alive and employed.

How do we fix this?

You’ve convinced me I hear you saying, now how do I solve these seemingly insurmountable cycles?  First I say: Get out of my house!  Secondly we have to take a look and see if we really need one before the other, if not hooray the problem just fixed itself.  The key to break the cycle is to do anything that breaks the cycle.  In the world of making things that depend on other things this would be a prototype.

Wait, did you say hardware Prototype?

Not exactly, you filthy word injector, but yes that is what was implied.  Certain situations make prototypes more than helpful, they become essential.  That situation would be any time you don’t know how to do something.  Since any interesting project will have at least some unknown aspect this covers 99.9999+% of all products.  People thought all vacuums were the same until James Dyson got tired of clogged filters and did something about it.

The way to  break the cycle is to build something, anything that gets you closer to the goal.  IT WILL BE WRONG  and that is okay.  Failing when you absolutely, 100% upfront intend to fail isn’t bad, it’s only bad if you don’t learn from the experience.

Say you are building a widget; I’ll wait until you finish speaking to continue.

This widget is revolutionary and will sell like hotcakes back when people used to buy a lot of hotcakes.  The only problem is like many products it requires hardware and software.  Damn, now we are really screwed because apparently programmers and engineers (not software engineers) have a fundamental mental disconnect in how they approach problems.

walrus and carpenter

The Engineer and the Programmer, one of which is about to get nailed in the bean with a hammer

The Hardware Engineer Says:

I have to design this perfectly up front, it is too expensive otherwise.   If I miss one detail it will take months to get the manufacturer to retool the assembly line, that will cost tons of money, hell it may kill the entire project.  Better take as much time as it takes to get it right the first time.  Wish that lazy Programmer would tell me the specifics he needs, then I could spec just enough power to make it work.

The Programmer Says:

I have never done anything like this before, there is no way I can get it right the first try.  Would love to make a quick prototype that supports the most important features of the product but the hardware to run it doesn’t exist yet.  What the hell is wrong with that Engineer wasting time on crap that will clearly be wrong the first time, if not multiple times?

Who’s right?

As one of my professors says “it depends,” which is sadly almost always the correct answer for a complex situation.  Both the Engineer and Programmer raise good points.  It does cost money to make mistakes but it is hubris to assume that by working really really hard you can get it perfect the first time.  In the computer world contradictions this strong cause your compiler to lock up, machine to blue screen (even on Linux), and the cpu to burn itself to ash to avoid the agony.  The real world, however, thrives on this shit.  Those familiar with quantum weirdness know that Schrödinger’s cat is both alive and dead at the same time, at least until you open the box.  Chickens and eggs don’t care which one came first, they just keep doing whatever it is they do (being delicious).

Pictured: deliciousness

That wasn’t an answer at all!

Damn, you caught me.  So here is my opinion on how you solve the problem.  In building our widget you get the engineer to create a really cheap prototype that does the absolute essentials probably right.  But it is cheap and fast.  He then gives this to the programmer who can start prototyping while the engineer works on fixing obvious and not so obvious design flaws.  You know, the kind that make you slap your head and say “wow, wish I had known about that back in the design phase.”  The really important part is that they share feedback throughout the product life cycle.  If the engineer flings a circuit board over the wall and then a bit later a CD comes bouncing back, it absolutely won’t work.  Close feedback is the key to making this work.  Note: you can have good feedback without being able to physically talk face to face, these days this thing called the internet solves those kinds of problems.  In the software world this form of management is called Agile Development.  In the engineering world it is called Blasphemy.

So we are starting out this supposed year long project.  Team A is using the old standby methods to build, Team B is trying out new, more radical thinking.  Suppose the deadline is absolute.  There is no reasoning with it, no extending it, no nothing.  Here is how these projects work out often enough.  Numbers may or may not be accurate, but they prove my point.

| Date | Traditional Engineer | Traditional Programmer | Agile Engineer | Agile Programmer |
|---|---|---|---|---|
| Month 1 | Got lots of research to do! | Lots of research to do! | Finishing touches built on first prototype after week one brainstorming session. | Been working with engineer to get prototype working. |
| Month 3 | Moved on to designing boards and actual engineering. | Toyed around with a few ideas but not really sure what the hardware will be so can't write anything. | Got lots of feedback from first prototype, started on next, more polished one. | Got a pretty good idea where the hard parts are going to be, creating software prototypes of tough stuff. |
| Month 6 (of 12) | Design 90% done but still a few kinks to work out. | Playing solitaire as there is no real work to be done until have some device to test out. | Last prototype nailed it, now polishing design and adding optional features. | Testing of required features mostly done, minor bugs remain. |
| Month 8 | 90% done. | Trying to look busy. | Final polish being added. | Minor features being coded. |
| Month 10 | Done with board, passed off to programmers. | Coding frantically. | More final polish, working with marketing to improve product launch. | Optimizing code and working on documentation. |
| Month 11 | Waiting on feedback from lazy programmers. | Code harder than first expected, putting in extra hours. | Started work on new project while waiting for launch. | Final cleanup and ready for release. |
| Month 12 | Yelling at programmers to get done. | FUUUUUUUUUUCK! | Enjoying work on second product while success of first happens. | Started on code for second product, not burned out from first project. |

Yes this is an extreme exaggeration, but it’s my blog so tough crap.  And many real world products move along exactly the same way as the first team.

Summary

If you want to break the chain of impossible dependencies you have to do something!  Planning gets you no closer to shipping when you cover the same ground over and over and over again.  Take a chance and build something that might work, or might not.  As long as you aren’t making life critical products (like a pacemaker) this is not only a good way to go, but in my opinion it is the best way.  Look for future posts on why waterfall design methods (that all engineers use) are crap, and on agile hardware development.

DVD Ripping the Windows Way

I finally decided to digitize my dvd collection.  Two things prompted this: 1) the rack is running out of room (see below) and 2) they are all at my parents’ house.

dvdrack

This prompted research on how to break the annoying dvd decryption and obtain personal use backups of dvds I actually legally purchased (as opposed to many of my other collections).   Googling quickly showed a few programs that work and that I find useful.  The first is the zombie program DVD Shrink.

While DVD Shrink is no longer being worked on,  you can still find it floating around the net.  The most recent version is 3.2.0.15, note it is freeware so avoid the sites that charge for it.  Or just download it here.

The next piece of software used is called Handbrake, which takes dvd formatted data and converts it to normal video files such as mp4, avi or ogm.  It is open source and therefore free.  The thing I am liking most about it is the ability to queue up multiple conversions and let it run overnight, which is important as it takes around 1.5 hours on my gaming machine per movie.

The two aforementioned pieces of software work most of the time but since DVD Shrink is a few years old it can’t break some of the newest encryption methods (damn you Disney).  This is annoying because Wall-E is an awesome movie.  So to get around this I found AnyDVD HD.  It tears out the encryption on all dvds, the HD version even works on Blu-ray and HD DVD discs which is awesome.  The only not awesome part is the price, 79 euros = $104 real dollars.  There is no way I was dropping that kind of money on software for a single week long ripping party, luckily they have a 3 week trial version.  The only downside is it forgets your settings when it shuts down, which only matters if you turn off your computer (I don’t).

With my newly assembled arsenal of software tools I began tearing into the rack of discs like a rabid badger, or at least a wounded chipmunk.  Here is my method.

DVD Shrink makes fine backups but as they are huge (4+ gigs each, some over 8 ) I decided to convert them to the more portable and popular avi format.  Max quality avi’s of these work out to be around 1.6 gigs tops (Lord of the Rings), pretty close to 1.4 gigs on average though.  If space is an issue then they can be shrunk later on.  Let’s start making the MPAA cry.

First you insert the dvd in your machine and launch DVD Shrink.  Then choose open disk.  If you have it AnyDVD should be launched first as it intercepts dvds before you open them in DVD Shrink.  Closing or modifying AnyDVD settings while ripping with DVD Shrink causes it to lose connection to the disk and you have to start over.  When the fox icon is pink it is thinking, when it is red it is ready.

rip1

Wait while it analyzes the movie, this takes about 2 minutes.  Usually mine start off at a pretty slow speed, like 2,000 kbps then work up to 9,000 max.  The encoding phase does this as well.

rip2

If you want to back up the entire disk, menus extras etc, then just choose backup after the dvd is analyzed and skip the next step.  If like me you are only interested in the feature film go to re-author and drag the title from the “Main Movie” area over to the empty area on the left.  Now when you choose backup it will only save that one, cutting down on important disk space.  I have a 1 terabyte drive as the buffer for this but 7 gigs adds up pretty quickly.

rip3

Once you have the tracks to rip, let ‘er rip.  It takes anywhere from 7 minutes to an hour, depending on how much there is to work with and if you chose any other fancier options.  Generally get less than 10 minutes though.  Again it starts off slow and works up to a decent speed.

rip5

This is the message you are looking for, the a-ok to move on.  Congratulations, you are now out a large chunk of disk space.  If you are ripping multiple disks at once make sure to switch back to “Full Disc” mode as otherwise it pops up an annoying message and beeps at you.  Also you can’t open a new disc until you change to full disc mode.  Note the discrepancy in times between the previous picture and this one.

rip7

Next we can start converting the big honking mess of dot vob files into a usable format.   Start up Handbrake and choose Source in the upper left corner, then select the DVD / VIDEO_TS Folder.  Find the folder you want to convert on your hard drive and select it.  Sometimes you have to choose the VIDEO_TS folder itself, the immediate parent one won’t work.

rip8

Then you choose the file name, type and all the other settings I don’t mess with to create the actual file you wanted in the first place.  Normal has worked in all cases, I just change the file type.  Make sure to look at the title drop down and check that the file length is appropriate (seen here), some of them have multiple ones to screw you up.  Annoying to waste 2 hours to find out you chose the wrong chapter to burn.

rip9

Choose where you want to save the file and what to call it.  Also the file type, I went with avi as it is widely supported and ogm didn’t work on my machine (will figure that one out later).

rip11

Go over all the settings to make sure it looks good then start it going.  The start button makes it go right now, alternatively you can hit the “Add to Queue” and make a big list of them for your computer to work on while you sleep.  It is very cpu intensive so best to let it run alone on your machine if at all possible.  When you hit Start you get a friendly CMD.exe window that has an eta on it.  Don’t close this window.  It works in two stages, the first is encoding which is the fast phase.

rip12

Then there is more encoding which eventually writes it to disk.  Takes over an hour for my machine per dvd so settle in.  This next picture is just to show what the second phase looks like.

rip13

The next to final stage is to verify the rip actually worked.  Choose your favorite media program (I highly recommend VLC) and load the file, make sure the sound is sync’d right and other such things.  Note the video did work, the screenshot capture method didn’t.

rip14

When you know the video ripped correctly go back and delete the raw dvd folder as it is just a waste of space.  Now you too in around 2 hours can make digital copies of stuff you already owned.  Sad fact of the day: with a fast internet connection it is  quicker to pirate the movies than convert the ones you own.  But piracy is wrong, and backing up dvds is mostly legal.

One final tip, you can run as many copies of DVD Shrink as you have dvd drives.  I crank on two at a time, tried to put in a third pulled from an old computer but my motherboard didn’t have a second ide cable.

rip151

Have fun backing up your movies!

Dreaming of clouds

Cloud computing is a somewhat nebulous term currently.  I interpret it to mean a way of writing code, it runs somewhere and just works.  You don’t care where the server is, what kind of database it uses or any of that.

IEEE has a wonderfully vague definition: “It is a paradigm in which information is permanently stored in servers on the Internet and cached temporarily on clients that include desktops, entertainment centers, table computers, notebooks, wall computers, handhelds, etc.”

Some examples that fit this definition are: Bittorrent, Facebook, Google Chrome, Grid Computing, Ruby on Rails; the list goes on.  That is sufficiently general as to be worthless.

One thing is certain, cloud computing is HOT.  It is flavor of the month of keywords.  People talk about using cloud computing for all kinds of things without knowing what it is ( it seems no one really knows either ).

But as it stands now, despite the continuous buzz, it will not work, because there is a fatal flaw.

Amazon has the Elastic Compute Cloud, Microsoft recently published they are planning a cloud based OS, Google has, well pretty much everything they do falls under cloud computing.  New companies every day say “I want a piece of that fat cloud pie!”  This will only accelerate the problem.  If the industry sees the problem soon enough, as I have, then cloud computing has a rich future.  If not it will join Web 2.0 in stagnation and, eventually, extinction. ( I know Web 2.0 is not dead but it certainly has a head start down the trail cloud computing is on. )

The problem stems from the fact that since no one knows what cloud computing is, everyone gets to form their own definition.  Definitions are only a symptom.  I feel the main goal of cloud computing is to write code and it will work.  You don’t care how, you don’t care where, it just works.  Really that is the goal of all code, I don’t really care that the file my program needs is located in My Documents in Windows or somewhere else in Linux and another place in Mac, I just want to open the damn file.

Stories help illustrate points, here is one about a developer named Dave.

Dave has a cool idea for a web app.  It needs a database and some server space and Dave has no experience with either.  He hears Amazon has a cloud setup that allows him to just write code that works without knowing all the gory details.  Extra bonus: if his service takes off and gets thousands of users it will scale automatically.  Dave loves that and what’s more he buys books from Amazon, so he knows they can be trusted.  Dave can’t sign up fast enough.  And for a time, things are good.  It does what he wants but isn’t everything he imagined.  One day while explaining his project to Joe at the water cooler, Joe nods and follows along.  He then says “Dave, that sounds good but all the problems you are having just don’t happen in Google’s cloud system, it really is a much better fit for what you are doing.”  This gets Dave thinking, he starts looking into it and Joe is right, it would be a better fit.  So he signs up for Google’s cloud computing to see the differences.  Lo and behold it really is perfect.  He decides to transfer immediately.  Now the problem happens: Dave has tons of fully functional code on Amazon’s Cloud and now it needs to work on Google’s Cloud.  The only way to do that is to re-write it from scratch.  Well not scratch per se, Dave knows everything it needs to do, he just needs to change all the API specific stuff out.  That could be easy or nigh impossible depending on a variety of factors.

This is called vendor lock in.  If you use one product for long enough your stuff tends to revolve around it.  This has the effect of cementing it into your foundation.  After a while the only way to remove it is to destroy the entire house and rebuild from scratch, something most people and companies will never do.

The solution to this is so simple it is scary.  Make a standard.  These are the functions that will be supported on all cloud server architectures come hell or high water.  They work the same no matter who is hosting it.  A vendor can make special non-standard functions and users can use them at their own portability peril but it doesn’t stop new ideas from happening.  Any sufficiently cool idea should be added to the standard.  This way Dave can take his code from Amazon to Google or Microsoft or any of the others and it will do what was promised: it will work.
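To make that concrete, here is a rough C++ sketch (C++17 for `std::optional`) of what coding against such a standard could look like. Everything in it is invented for illustration: `CloudStore`, `VendorA`, `VendorB` and `saveAndLoad` are hypothetical names, not any real vendor's API, and in-memory maps stand in for the actual services.

```cpp
#include <map>
#include <optional>
#include <string>

// The hypothetical "standard": every vendor must support these two functions.
class CloudStore {
public:
    virtual ~CloudStore() = default;
    virtual void put(const std::string& key, const std::string& value) = 0;
    virtual std::optional<std::string> get(const std::string& key) const = 0;
};

// Made-up vendor A: stores keys exactly as given.
class VendorA : public CloudStore {
public:
    void put(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    std::optional<std::string> get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::string> data_;
};

// Made-up vendor B: does things differently inside (prefixes every key),
// but the standard functions behave the same from the outside.
class VendorB : public CloudStore {
public:
    void put(const std::string& key, const std::string& value) override {
        data_["b/" + key] = value;
    }
    std::optional<std::string> get(const std::string& key) const override {
        auto it = data_.find("b/" + key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::string> data_;
};

// Dave's code only knows about the standard, never about a vendor.
std::string saveAndLoad(CloudStore& store, const std::string& user) {
    store.put("greeting", "hello " + user);
    return store.get("greeting").value_or("");
}
```

Dave can hand `saveAndLoad` a `VendorA` or a `VendorB` and get the same answer, so switching vendors means changing one line where the store is created instead of rewriting everything.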

Programming languages have been doing this for years.  They all come with a set of functionality ( how much depends on the language, compiler, etc ) and it works regardless of where you write that code.  C++ programmers can use the boost library but not everyone is guaranteed to have it, but int better damn well be declared and work the way I know it should in all my C++ compilers.

If cloud computing embraces a standard then it can save itself from death.  It would allow people to change vendors when it suited them, to not be locked in for all eternity.  More importantly it would allow the bad vendors to die, instead of leeching off of one group who is too entrenched to get rid of them.  Code portability is something that would elevate cloud computing to its place in the clouds, instead of wandering around buzz word marsh until it winds up rotting in forgotten bog.

Spore

Finally the long awaited day:  Spore is here!  I am sitting here staring at the EA Download Manager just waiting for it to let me in.  Last night felt like Christmas eve when I was six all over again.  Pretty safe to say I have never been this excited about any game, or movie for that matter.  Several web sites had reviews up but it doesn’t matter what they say, nothing could deter me at this point.  Might post again after some play but more likely will be sucked into the game all day.

Big O Notation

It is time for a primer on big o.
Big O

No, not that Big O. I am talking about the one we use in computer science. (If you don’t understand the picture, it is from an anime called The Big O.)

What is big o notation?

Big O notation is a method used in computer science, and mathematics, to describe the efficiency of an algorithm. There are several other notations but big o is the most common for computer science. In short it is a function that gives an upper bound on how an algorithm's cost grows, and it is most often quoted for the worst case. Big o means given n inputs the algorithm uses O(x), pronounced O of x, resources. x is a function that depends on n. Generally we are measuring operations, and thus speed, but it can also be used for memory usage or many other metrics.  Big o is useful because it tells us the algorithm will never be any worse than this. Big omega tells you it can never be better than this and theta notation says it will be in this family, but those are beyond the scope of this post.

Figuring out big o

Actually proving some algorithm belongs to a certain set of big o families is not always easy but it is usually pretty simple to ballpark it. There are several commonly used types that we will cover.

Constant Time

Let’s look at a simple function, it just returns a value from an array given the index.  All the examples are going to be in C++, just a forewarning.

int getIndex( int Data[], int Index )
{
   return Data[Index];
}

This will take the same amount of time no matter what values we give it. We call this constant time or O(1). Now for a more interesting example.

Logarithmic Time

Logarithmic time is most often found where you partition the data into two even parts then continue working on one of them.  Binary search is a good example of this, it looks for a value in a sorted array.  It chooses a pivot value in the middle, determines if the value it is looking for is above or below the pivot then starts the search again using the new partition.

int binarySearch(int sortedArray[], int first, int last, int key) {
   while (first <= last) {
      int mid = first + (last - first) / 2; // compute mid point without overflowing.
      if (key > sortedArray[mid])
         first = mid + 1; // repeat search in top half.
      else if (key < sortedArray[mid])
         last = mid - 1; // repeat search in bottom half.
      else
         return mid; // found it. return position
   }
   return -(first + 1); // failed to find key; negative value encodes where it would go
}

The proof showing the relation between the amount of work done and the number of inputs is not trivial, so I will just say that these types of algorithms are classified as O(log n).  (Look up master theorem if you need more nitty gritty)  Strictly the halving gives log base 2 of n, but logarithms of different bases differ only by a constant factor, which big o ignores, so just saying log n is fine.
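If you would rather see it than prove it, here is a small sketch that counts how many times the binary search loop runs. The helper names `countBinarySearchSteps` and `makeSorted` are mine, added only for this experiment; the search logic itself is the same halving loop as above.

```cpp
#include <vector>

// Runs a binary search for `key` and returns how many loop
// iterations it took, whether or not the key was found.
int countBinarySearchSteps(const std::vector<int>& sorted, int key) {
    int first = 0;
    int last = static_cast<int>(sorted.size()) - 1;
    int steps = 0;
    while (first <= last) {
        ++steps;
        int mid = first + (last - first) / 2;
        if (key > sorted[mid])
            first = mid + 1;   // keep the top half
        else if (key < sorted[mid])
            last = mid - 1;    // keep the bottom half
        else
            break;             // found it
    }
    return steps;
}

// Builds the sorted array {0, 1, ..., n-1} to search in.
std::vector<int> makeSorted(int n) {
    std::vector<int> values(n);
    for (int i = 0; i < n; ++i) values[i] = i;
    return values;
}
```

Searching `makeSorted(1024)` for a value that is not there (say 1024) takes 11 steps, and doing the same with a million elements takes only 20. Multiplying the input by roughly a thousand added 9 steps, which is the log n behavior in action.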

Linear Time

Here is a function that takes an array of ints and prints them out.

#include <iostream>

void printInts( int Data[], int Size )
{
   for( int i = 0; i < Size; i++ )
   {
      std::cout << Data[i] << std::endl;
   }
}

We would say that Size is n because that is how many elements we have to look at. Since there is no way to get the job done without printing each of the n elements this is O(n) or linear time.

Polynomial

If you take a loop that runs n times and place inside it a loop that does an additional n work, what do you get? You get polynomial time.

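#include <iostream>

// Data is an array of SizeOne row pointers, each row holding SizeTwo ints.
// (A parameter written as int Data[][] will not compile in C++.)
void printAll( int** Data, int SizeOne, int SizeTwo )
{
   for( int i = 0; i < SizeOne; i++ )
   {
      for( int j = 0; j < SizeTwo; j++ )
      {
         std::cout << Data[i][j] << " ";
      }
      std::cout << std::endl;
   }
}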

If we say SizeOne is n and SizeTwo is m, then we do n work, but each n work consists of m work as well. So we have to do n*m work. If n = m we would get n*n or n^2 work. Polynomial time is anything of the form n^k, where k doesn’t have to be an integer but it is greater than one. This is written as O(n^2), O(n^3), etc. O(n^2) is referred to as quadratic because it appears often enough to get a special name.

Others

Exponential is O(c^n), where c is some constant value greater than one. An example would be the traveling salesman problem, a notoriously hard problem in computing: the best known exact algorithms for it take exponential time.

N to the n is O(n^n), something you should avoid writing if at all possible. These are slow. And by slow I mean the universe will end before this code finishes running.

In summary here are the most common forms and their names.
Big O Notation table

Why You Care

If each operation takes one second, then with n at 1 million this is how long it takes for each of the common ones to run.

Big O Timing

It should be obvious from the chart that a small change in the big o value of your algorithm can make a huge difference in computation time.  For most problems you can find something that is polynomial or better, log-linear being the best you can get for many problems.
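If you want to check the arithmetic yourself, here is a minimal sketch that works out the operation counts for a given n and converts them to days or years at one second per operation. The function names are made up for this example, I am assuming a 365 day year, and the chart's exact figures may be rounded differently.

```cpp
#include <cmath>

// Number of operations for each growth rate at input size n.
double opsLogarithmic(double n) { return std::log2(n); }
double opsLinear(double n)      { return n; }
double opsLogLinear(double n)   { return n * std::log2(n); }
double opsQuadratic(double n)   { return n * n; }

// At one second per operation, an operation count is also a number
// of seconds. These convert it into friendlier units.
double secondsToDays(double seconds)  { return seconds / 86400.0; }
double secondsToYears(double seconds) { return seconds / (86400.0 * 365.0); }
```

With n at 1 million this works out to about 20 seconds for O(log n), about 11.6 days for O(n), about 231 days for O(n log n) and roughly 31,700 years for O(n^2). Exponential is left out because 2 to the millionth power does not fit in a double, which is rather the point.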

Big o is usually quoted for the worst case performance.  So getting O(n^2) may only occur when all the ducks are in a row, and the average case may be O(n log n), which is much faster for large n.  Another thing to consider is that while many algorithms may have the same big o, implementations affect performance as well.  So for small data sets a lean O(n^2) algorithm may outperform an O(n log n) one that has large amounts of overhead.  All things to keep in mind when you are writing code.
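To see how much the input itself matters, here is a small sketch using insertion sort, which I picked only because it is short. It compares the friendliest input against the nastiest one, so it shows best case versus worst case and not the average. The function name `countInsertionSortComparisons` is my own.

```cpp
#include <cstddef>
#include <vector>

// Sorts a copy of `data` with insertion sort and returns how many
// element comparisons were needed to do it.
long long countInsertionSortComparisons(std::vector<int> data) {
    long long comparisons = 0;
    for (std::size_t i = 1; i < data.size(); ++i) {
        int value = data[i];
        std::size_t j = i;
        while (j > 0) {
            ++comparisons;
            if (data[j - 1] <= value) break;  // already in the right spot
            data[j] = data[j - 1];            // shift the larger element right
            --j;
        }
        data[j] = value;
    }
    return comparisons;
}
```

Feed it 1,000 numbers that are already sorted and it makes 999 comparisons. Feed it the same numbers in reverse order and it makes 499,500. Same algorithm, same big o of O(n^2), wildly different amounts of work depending on how the ducks are lined up.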