Feb 26, 2006
PS3 Woes and IBM’s Octopiler
Sony’s PS3 is supposed to be out this spring, but there seem to be several issues that may hold it back. Of course, Sony isn’t backing down from its date just yet…
First is cost. The financial people are saying that each box is going to cost over $800 at launch. OUCH!
Second seems to be the IBM Octopiler, better known as the software that is supposed to tame the beastly Cell Broadband Engine. See, most programs that run on the PC you are reading this on, or on gaming consoles, are written in some kind of high-level language (like C++, Java, etc.). A “compiler” then turns that language into a program you can execute. In very simplistic terms, it turns the human-readable source code into the 0’s and 1’s (binaries) that your computer can read and understand.
Well, the Octopiler (multiple armed compiler…) is much more ambitious. Even its headline is complicated – “Supporting Single Instruction Multiple Data (SIMD) and Heterogeneous Parallelism Automatically!” Huh?
Well, for those of us who DON’T think in 0’s and 1’s – basically that means that the Octopiler is going to put out binaries that are already optimized for multiprocessor and massively multiprocessor systems.
What’s the wow factor? That just isn’t being done today.
And you say: “But I know that there are computers with 2 CPUs in them, and they even have dual-core CPUs now, with two processors on a single chip. There is nothing new in that…”
Oh, but there is. See, in today’s world, most of the heavy lifting of using multiple processors is left to the OS – a huge beefy chunk of code that sits BETWEEN the program you want to run and the multiple processors that will actually do the work. If an application is written to be multi-threaded (meaning it spawns off multiple little threads of data crunching), then the OS can hand those threads off to multiple CPUs. The program itself doesn’t think about multiple processors – it leaves that to the OS. And people write multi-threaded code for many reasons – not just for this benefit when running on a dual-processor machine.
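For the curious, here is a minimal Python sketch of what “written to be multi-threaded” looks like from the program’s side. The names `crunch` and `run_multithreaded` are made up for illustration. The program only decides how to split the work into threads; which CPU each thread lands on is left to the OS. (In standard Python the interpreter itself can limit how much of this number crunching truly runs at the same time, so treat this as a picture of the division of labor, not a benchmark.)

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(chunk):
    """One little unit of data crunching: sum the squares of a chunk."""
    return sum(n * n for n in chunk)

def run_multithreaded(numbers, threads=4):
    # The program's only decision: how to split the work into threads.
    chunks = [numbers[i::threads] for i in range(threads)]
    # Which CPU runs each thread, and when, is up to the OS scheduler.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(crunch, chunks))

total = run_multithreaded(list(range(1000)))
```

Notice that nothing in the code mentions how many processors the machine has. That is exactly the part the program leaves to the OS.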
Well, the Octopiler is set to change all of that. It is intended to be the next level of compiler – one that can take a sequential program that’s written to a unified memory model, and output binaries that BY THEMSELVES make efficient use of beefy multiprocessing systems like Cell’s.
The closest analogy I can think of is this:
Task
Say you need 16 items, 2 each from 8 different stores. Your spouse has to follow your plan for a shopping trip and they have to be back home by 5pm with everything purchased.
Today’s Compiler
You are today’s normal compiler, the task is your source code, your spouse is the CPU and the goal is to have all 16 items in your house as quickly as possible.
So, as today’s compiler, you sit down and look up all the addresses of the stores, map them out and come up with what you think is a logical plan of attack based on distances, etc. You factor in traffic patterns, pee breaks and lunch. You come up with an exact, optimized step-by-step process from point A through to Z. You hand the plan to your spouse and walk them out to the car. But, when you get outside, you find that there are actually 3 more people there to help your spouse achieve the results (spouse + 3 more = 4 processors).
Hmm.. Nuts. An agent for the other three people (the OS) takes your detailed A to Z one-person plan, tears off individual pieces of it and hands each person 1 store to go to, directing them to come back to him when they are done to receive their next task. 1 person ends up going to 3 stores, 2 people go to 2 stores each and the last person only goes to 1 store.
The final results? Even with these inefficiencies, more is still less – time that is. The job is finished in 1/2 the time you originally expected because of the extra help. Why not 1/4 the time since you had 4 times the number of people? Easy – the inefficiencies of not knowing all the facts before you made the plan. A lot of time was “wasted” in extra round trips for each person to receive their subsequent instructions.
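To put rough numbers on that round-trip overhead, here is a minimal Python sketch of the agent handing out one store at a time. Everything in it is made up for illustration: the `agent_dispatch` name, the per-store minutes and the 10-minute trip back to the agent are my assumptions, not a model of any real OS scheduler.

```python
import heapq

# Hypothetical minutes needed at each of the 8 stores (made-up numbers).
STORE_MINUTES = [50, 45, 20, 15, 15, 10, 10, 10]

def agent_dispatch(store_minutes, people, round_trip=10):
    """Simulate the agent: whoever gets back first is handed the next store."""
    # Heap of (time this person is free, person id); everyone starts at 0.
    free_at = [(0, person) for person in range(people)]
    heapq.heapify(free_at)
    stores_visited = [0] * people
    for minutes in store_minutes:
        free_time, person = heapq.heappop(free_at)
        # Each assignment costs the store itself plus a trip back to the agent.
        heapq.heappush(free_at, (free_time + minutes + round_trip, person))
        stores_visited[person] += 1
    finish = max(free_time for free_time, _ in free_at)
    return finish, stores_visited

# Your original one-person plan: no agent, so no round trips.
solo_finish, _ = agent_dispatch(STORE_MINUTES, people=1, round_trip=0)
# Four people, but every single store costs a trip back to the agent.
team_finish, team_visits = agent_dispatch(STORE_MINUTES, people=4)
```

With these invented numbers the team finishes well ahead of the solo trip, but nowhere near 4 times faster, and the stores end up split unevenly because the agent only ever looks one store ahead.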
The Octopiler
Now you are the Octopiler, the task is still your source code, your spouse is still the CPU and the goal is still standing at home with your 16 items as quickly as possible.
But, this time you know that it won’t just be your spouse going on the trip. You aren’t exactly sure how many people will be there, but you do know that there will be more than 1. Instead of mapping out the procedural process (go to store 1 then store 2, etc.), you figure out all of the operations that can be done in parallel with each other. It is completely possible for different people to be going to store 1 and store 2 at the same time – that can be done in parallel. But it is impossible for someone to be driving to store 1 and buying the items from store 1 at the same time – those steps have to be done one after the other.
So, once you have figured out the parallel nature of the tasks, you then figure out the relationships that exist between these parallel tasks, taking into account their relative distances from each other, the multiple pee breaks and lunches, traffic patterns, etc. You then make a little chart that outlines optimized plans for X number of people. When you find out how many people there will be, all you have to do is look up the plan for that specific number.
So, with your planning done, you walk outside and find out that 3 other people will be working with your spouse. You hand them each their portion of the 4-person optimized plan while the agent just looks on and nods – the 4 people zoom off to do their work. The 4-person optimized plan calls for 2 of the people to each go to 3 stores and for the other 2 to only go to 1 store each. Since you had all the facts up front, you provided the most efficient plan to accomplish your specific task for the specific number of resources you had at your disposal.
The results? The task is completed in 1/4 of the time allotted for your spouse alone and is also done in 1/2 the time taken by the “regular” compiler plan. Why? Because you had an exact plan that was optimized for the specific number of resources for the specific task to be performed.
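The “little chart” idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the store names and minutes are invented, `plan_for` and `PLAN_CHART` are hypothetical names, and the simple “give the longest remaining trip to the least-busy person” heuristic is my stand-in, not a description of how the Octopiler actually plans anything.

```python
# Hypothetical minutes needed per store (made-up numbers for the analogy).
STORE_MINUTES = {
    "store1": 50, "store2": 45, "store3": 20, "store4": 15,
    "store5": 15, "store6": 10, "store7": 10, "store8": 10,
}

def plan_for(people):
    """Greedy plan: give the longest remaining trip to the least-busy person."""
    routes = [[] for _ in range(people)]
    loads = [0] * people
    longest_first = sorted(STORE_MINUTES.items(), key=lambda kv: -kv[1])
    for store, minutes in longest_first:
        idx = loads.index(min(loads))  # person with the least work so far
        routes[idx].append(store)
        loads[idx] += minutes
    return routes, max(loads)  # everyone is done when the busiest person is

# The chart is worked out ahead of time, one plan per possible head count.
PLAN_CHART = {people: plan_for(people) for people in range(1, 9)}

# At the curb you learn there are 4 people, so you just look the plan up.
routes, finish = PLAN_CHART[4]
```

The point of the sketch is the last line: all of the thinking happens before anyone leaves the house, and finding out the head count costs nothing more than a lookup.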
—–
Now, that is an extreme oversimplification and I even fudged some of the real workings of the two systems to better cram them into my silly analogy. But, I’m sure you get the point.
You can see how the Octopiler is very different from what exists today. It needs to be able to take high-level source code for a sequential program written to a unified memory model and output optimized binaries for a massive, heterogeneous multiprocessor system-on-a-chip. In fact, this task is so different from what exists today that it prompted the Ars Technica folks to say this:
This isn’t just a tall order, or even a doctoral dissertation. It’s a generation’s worth of doctoral research. Meanwhile, the PS3 is due out in 2006.
I would say, “One long, expensive step for the PS3; One giant leap for mankind…”
And, that’s why I’m not a full time journalist, and the Ars guys are.





