Oct 7, 2011

What this blog is about and final project pitch

Dear visitors,

This semester I'm taking CIS565 at Penn, taught by Joe Kider. Joe is a wonderful instructor, and he gave us various options for the final project. We're also required to keep a blog to report our development progress, which is why this blog was born.

I thought about the final project for several weeks. I've been strongly interested in iOS development for two years, so I came up with the idea of combining iOS development with GPU programming. The graphics processors in iOS devices keep getting better with each generation. In the iPhone 4, Apple used a PowerVR SGX 535 GPU embedded in its A4 chip, which delivers decent graphics performance. This article compared the iPhone 4 with popular handheld game consoles, and the iPhone 4 clearly stands out. Here's a table from Wikipedia with the technical details of the PowerVR SGX series, which includes the SGX 535 used in the iPhone 4:

Model    Year      Die size (mm²)   Config core   MTriangles/s   MPixel/s   Bus width (bit)   DirectX   OpenGL   GFLOPS
SGX520   Jul 2005  2.6 @ 65 nm      1/1           7              250        64                N/A       N/A      0.8?
SGX530   Jul 2005  7.2 @ 90 nm      2/1           14             500?       64                N/A       2.0      1.6
SGX531   Oct 2008  ?                2/1           14?            500?       128               N/A       N/A      1.6?
SGX535   Nov 2007  ?                2/2           14             1000?      64                9.0c      2.1      1.6
SGX540   Nov 2007  ?                4/2           28             1000       64                N/A       2.1      3.2
SGX545   Jan 2010  12.5 @ 65 nm     4/2           40             1000       64                10.1      3.2      3.2?

(Fillrate in MTriangles/s and MPixel/s, and GFLOPS, are given at 200 MHz. The DirectX and OpenGL columns list the supported API versions.)

As the table shows, although mobile GPUs don't reach the GFLOPS figures of desktop GPUs, they're already capable of handling a lot. In 2007, the Khronos Group introduced OpenGL ES 2.0, a major upgrade over OpenGL ES 1.1. OpenGL ES 2.0 eliminates most of the fixed-function rendering pipeline in favor of a programmable one. Almost all rendering features of the transform and lighting pipelines, such as the materials and light parameters formerly specified through the fixed-function API, are replaced by shaders written by the graphics programmer. This opens up many possibilities for programmers to leverage the power of the iPhone GPU.

There are tons of apps on the iOS platform right now. Last summer, during my internship at SAP in Newtown Square, I did iOS development under Martin Lang. Martin often showed cool apps to his colleagues, and two of them impressed me a lot. One is Layar, which overlays the names and details of the buildings around you on the live camera view.

The other app is WordLens, which detects text in the camera view in real time and translates it. It's really amazing.

However, neither of these apps is open source, and they don't use the GPU for the computation. That's where my idea came from: I want to build an augmented reality app on iOS that leverages the GPU at the same time.

Apple has already done something similar in iOS 4.3: when Apple introduced the iPad 2, it brought Photo Booth from the Mac to iOS.

Photo Booth uses the Core Image API, which runs on GLSL underneath, to process the video frames. So GPU-powered video processing is doable on iOS.

As for what I want to implement specifically: I read this paper from CVPR, and the algorithm it introduces is cool and efficient. So I want to implement it on iOS, using the GPU to process the video frames and extract the text within them. Ideally, if I could also read the text aloud with a TTS (text-to-speech) engine on iOS, that would be perfect, but right now I have no idea whether that's possible.

My plan for the final project is as follows:

1. Build a simple iPhone app that captures video and grabs the video frames.

2. Implement the algorithm from the paper to extract the text within each frame.

3. Speed up step 2 with GLSL/Core Image to get better frame rates, ideally real-time performance.
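Step 3 is mostly about moving per-pixel work off the CPU. The paper's algorithm isn't described in this post, so this is not it; it's a minimal C sketch of a generic preprocessing step, converting a camera frame to grayscale, assuming the camera is configured to deliver 32-bit BGRA frames. The function name `bgra_to_luma` and its parameters are hypothetical. A CPU version like this can serve as a correctness baseline before porting the same per-pixel math to a GLSL fragment shader or Core Image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU reference for a per-pixel pass: convert one BGRA frame
   (assumed layout: 4 bytes per pixel in B, G, R, A order) to 8-bit luminance.
   Integer Rec. 601 weights: 77 + 150 + 29 = 256, so ">> 8" divides by 256. */
void bgra_to_luma(const uint8_t *bgra, uint8_t *luma,
                  size_t width, size_t height, size_t bytes_per_row)
{
    /* Camera rows may be padded, so each row must hold at least width pixels. */
    assert(bytes_per_row >= 4 * width);

    for (size_t y = 0; y < height; ++y) {
        const uint8_t *row = bgra + y * bytes_per_row;
        for (size_t x = 0; x < width; ++x) {
            const uint8_t *p = row + 4 * x;
            unsigned b = p[0], g = p[1], r = p[2]; /* alpha (p[3]) is ignored */
            /* Each output depends only on its own input pixel, which is what
               makes this step a natural fit for a fragment shader. */
            luma[y * width + x] = (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
        }
    }
}
```

For a 2x1 frame holding one white pixel and one pure red pixel, it produces luminance values 255 and 76. On the GPU, the loop body would become the fragment shader, with one invocation per output pixel.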

I think this project could be very useful: for example, blind people could use their iPhones to read information in public places, and it's also fun for anyone to play with.

That's it. As suggested by the final project write-up, I'll post news updates on my Twitter account, nlyrics2. Wish me luck with the development ahead!

1 comment:

Patrick Cozzi said...

I made a similar comment on Robin's blog, but if you find the iPhone's GPU is not fast enough, you could also implement this by sending frames to a server with a beefy GPU running CUDA. You'll have to balance the network latency vs. the speedup gained. On Android, this also has the advantage that all phones will run at a similar speed even though their GPUs are all over the place. On iPhone, this is less of a concern because the GPUs are much less fragmented, and it sounds like you are targeting a particular one.