April 18, 2014

Initial Machine Generated OpenCL for sqlite

I’ve been working on moving the OpenCL accelerated sqlite prototype toward being able to support the general case instead of just the contrived set of initial SQL SELECTs.

First, why did I have to start out with a contrived set of SQL SELECTs to accelerate? Consider:

SELECT id, uniformi, normali5 FROM test WHERE uniformi > 60 AND normali5 < 0

For a query we need to have the equivalent in OpenCL. For the prototype I hand coded these OpenCL kernels and called them with the data obtained from the sqlite infrastructure. I had to start somewhere. I thought a series of SQL statements, chosen to shake out patterns for generation, would be the best path to validate this idea.
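To make "the equivalent" a little more concrete, here is a minimal Python sketch, not the prototype's code. It runs the query above through sqlite and then computes the same answer as a per-row result mask, which is roughly the shape of work a kernel does. The sample rows are made up, and `result_mask` is a hypothetical stand-in for a kernel.

```python
import sqlite3

# Made-up sample rows (id, uniformi, normali5); not data from the prototype.
ROWS = [(1, 75, -3), (2, 10, -1), (3, 90, 4), (4, 61, -8)]

QUERY = ("SELECT id, uniformi, normali5 FROM test "
         "WHERE uniformi > 60 AND normali5 < 0")


def sqlite_matches(rows):
    """Run the query through sqlite and return the matching ids."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test (id INTEGER, uniformi INTEGER, normali5 INTEGER)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    ids = [r[0] for r in con.execute(QUERY)]
    con.close()
    return ids


def result_mask(rows):
    """Hypothetical stand-in for a kernel: one 0/1 flag per row."""
    return [1 if (u > 60 and n < 0) else 0 for (_id, u, n) in rows]


def mask_matches(rows):
    """Turn the mask back into matching ids, as the host side would."""
    return [row[0] for row, m in zip(rows, result_mask(rows)) if m]


if __name__ == "__main__":
    print(sorted(sqlite_matches(ROWS)))
    print(mask_matches(ROWS))
```

Comparing the two lists is the same functional check I use on the real thing: both paths have to agree on which rows match.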

The next evolutionary step is to generate an OpenCL kernel by reading the parse tree that sqlite generates as it pulls apart the SQL statement.

This is what a machine generated kernel looks like for the previously mentioned SQL statement:

__kernel void x2_entry (__global int * id, __global int * uniformi, __global int * normali5, __global int * _cl_resultMask) {
    __private int4 v0;
    __private int4 v1;
    __private int4 v2;
    __private int4 _cl_r;
    int i = get_global_id(0);
    size_t offset = i * (totalRows/workUnits);
    do {
        v0 = vload4(0, id + offset);
        v1 = vload4(0, uniformi + offset);
        v2 = vload4(0, normali5 + offset);
        _cl_r = (( uniformi  >  60 ) && ( normali5  <  0 ));
        vstore4(_cl_r, 0, _cl_resultMask + offset);
    } while(totalRows);
}
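As a rough illustration of the generation step, the Python sketch below walks a small expression tree and emits the predicate line of a kernel. The tuple-based tree, the `OPS` table and the function names are all hypothetical; this is not sqlite's real parse tree nor the generator described here. It also assumes the comparison should use the loaded registers (`v1`, `v2`) rather than the column names.

```python
# Hypothetical expression tree: tuples of (op, left, right); leaves are
# column names (str) or integer literals. This is NOT sqlite's parse tree.
WHERE = ("AND", (">", "uniformi", 60), ("<", "normali5", 0))

# SQL operators mapped to their OpenCL C spelling.
OPS = {"AND": "&&", "OR": "||", ">": ">", "<": "<",
       "=": "==", ">=": ">=", "<=": "<="}


def emit(node, regs):
    """Recursively render a tree node as an OpenCL C expression string."""
    if isinstance(node, int):
        return str(node)
    if isinstance(node, str):
        return regs[node]  # column name -> private vector register
    op, left, right = node
    return "(%s %s %s)" % (emit(left, regs), OPS[op], emit(right, regs))


def emit_predicate(where, columns):
    """Assign v0, v1, ... to columns in order and build the mask assignment."""
    regs = {name: "v%d" % i for i, name in enumerate(columns)}
    return "_cl_r = %s;" % emit(where, regs)


if __name__ == "__main__":
    print(emit_predicate(WHERE, ["id", "uniformi", "normali5"]))
```

The recursion is the whole trick: each operator node only needs to know how to spell itself, so nested AND/OR conditions fall out without special cases.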

Why are we generating OpenCL kernel code there? Isn’t there a better way? Well there is. In later versions of the OpenCL standard (and HSA) there is something called an intermediate representation (IR) form which is very much akin to what compilers translate high level languages to before targeting the native instruction set of whatever that code will run on.

Unfortunately OpenCL’s IR, otherwise known as SPIR, isn’t available to us since the OpenCL drivers for ARM’s Mali currently don’t support it. Imagination’s PowerVR doesn’t either. (Heck, Imagination requires an NDA to be signed to even get their drivers, talk about unfriendly!) They might someday but that day isn’t today. Likewise HSA has an IR as part of its standard called HSAIL.

Either one would be much better to emit, of course, presuming that the OpenCL drivers could take that IR as input.

Nonetheless, as soon as I have “parity” with the prototype and a little testing, I’ll commit the code that machine generates these OpenCL kernels to git. I’m getting close. The next step after that will be to make a few changes internal to sqlite to use those kernels.

Posted in OpenCL, sqlite

April 15, 2014

OpenCL accelerated sqlite on Shamrock, an open source CPU only driver

Within the GPGPU team Gil Pitney has been working on Shamrock which is an open source OpenCL implementation. It’s really a friendly fork of the clover project but taken in a bit of a new direction.

Over the past few months Gil has updated it to make use of the new MCJIT from llvm which works much better for ARM processors. Further he’s updated Shamrock so that it uses current llvm. I have a build based on 3.5.0 on my chromebook.

The other part about Gil’s Shamrock work is it will in time also have the ability to drive Keystone hardware, which is TI’s ARM + DSPs on board computing solution. Being able to drive DSPs with OpenCL is quite an awesome capability. I do wish I had one of those boards.

The other capability Shamrock has is to provide a CPU driver for OpenCL on ARM. How does it perform? Good question!

I took my OpenCL accelerated sqlite prototype and built it to use the Shamrock CPU only driver. Which would you expect to be faster: SQL SELECT queries offloaded to a CPU only OpenCL driver, or the sqlite engine?

If you guessed OpenCL running on a CPU only driver, you’re right. Now remember the Samsung ARM based chromebook is a dual A15. The queries are against 100,000 rows in a single table database with 7 columns. Lower numbers are better, and times are in microseconds.

sql1 took 43653 microseconds
OpenCL handcoded-opencl/sql1.cl Interval took 17738 microseconds
OpenCL Shamrock 2.46x faster
sql2 took 62530 microseconds
OpenCL handcoded-opencl/sql2.cl Interval took 18168 microseconds
OpenCL Shamrock 3.44x faster
sql3 took 110095 microseconds
OpenCL handcoded-opencl/sql3.cl Interval took 18711 microseconds
OpenCL Shamrock 5.88x faster
sql4 took 143278 microseconds
OpenCL handcoded-opencl/sql4.cl Interval took 19612 microseconds
OpenCL Shamrock 7.30x faster
sql5 took 140398 microseconds
OpenCL handcoded-opencl/sql5.cl Interval took 18698 microseconds
OpenCL Shamrock 7.5x faster
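For anyone who wants to check the arithmetic, the speedup is just the sqlite time divided by the OpenCL time. A small Python sketch using the timings copied from the results above (the dictionary layout and function name are only for illustration):

```python
# Timings in microseconds, copied from the results above:
# (sqlite engine, OpenCL on the Shamrock CPU only driver).
TIMINGS = {
    "sql1": (43653, 17738),
    "sql2": (62530, 18168),
    "sql3": (110095, 18711),
    "sql4": (143278, 19612),
    "sql5": (140398, 18698),
}


def speedup(sqlite_us, opencl_us):
    """How many times faster the OpenCL run was than the sqlite engine."""
    return sqlite_us / opencl_us


if __name__ == "__main__":
    for name in sorted(TIMINGS):
        sqlite_us, opencl_us = TIMINGS[name]
        print("%s: %.2fx faster" % (name, speedup(sqlite_us, opencl_us)))
```

Note how the OpenCL times barely move between queries while the sqlite times roughly triple, which is where the growing ratio comes from.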

These numbers for running on the CPU are pretty consistent and I was concerned there was some error in the process. Yet the returned number of matching rows is the same for both the sqlite engine and the OpenCL versions which helps detect functional problems. I’ve clipped the result row counts from the results above for brevity.

Frankly, I wasn’t expecting this kind of speedup, especially with a CPU only driver. Yet there it is in black and white. It does speak highly of the capabilities of OpenCL to be more efficient at computing when you have data parallel problems.

Another interesting thing to note in this comparison: the best results so far have been achieved on the Mali GPU using vload/vstore, which takes advantage of SIMD vector instructions. On a CPU this would equate to use of NEON. The Shamrock CPU only driver doesn’t at the moment have support for vload/vstore, so the compiled OpenCL kernel isn’t even using NEON on the CPU to achieve these results.

Posted in linaro, OpenCL, open_source

April 12, 2014

Building Android KitKat on OS X Mavericks

I run OSX on my laptop. (gasp!) I ssh into my various linux boxes to work on various projects. As I’m doing a little work with Renderscript and my sqlite acceleration project I thought it would be handy to build Android on my OS X laptop. Turns out it’s not entirely difficult and required just one fix to the code.


There are several projects to bring various linux/unix tools onto OSX. I use MacPorts. Brew is probably another good option. Either way this gives us a foundation of tools that the android build system is going to need.

The install instructions offer an extra easy pkg option.

Next we need to install some software.

sudo port install coreutils findutils pngcrush gsed gnupg


Xcode is of course Apple’s development environment for OSX and iOS. You need it, and it can be installed directly out of the App Store.


Make sure you have java installed.

java -version
java version "1.6.0_65"

If you don’t, you’ll get a popup dialog that will ask if you want to install it. Do!


Make sure you have python installed. If I recall correctly that’s a default install with OSX Mavericks.  There is an option to install via ports.

sudo port install python


Pull down repo.

curl http://commondatastorage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
chmod a+x ~/bin/repo

Make sure you add your ~/bin to your PATH

export PATH="$PATH:$HOME/bin"

Android SDK tools

You need to download the android sdk tools built for the Mac. Download these from here. Extract. At this point I created an android directory and put the tools inside of it.

mkdir -p ~/android
mv <wherever>/android-sdk ~/android

Filesystem setup

OSX, for all its joys, doesn’t deal with case differences in its file system unless you specifically created the file system to do so. The default doesn’t. It’s not 8.3, but it’s still 1990s lame. So you’ll need to create a file system for the Android source code to live in.

Make sure you have the space in your file system. I created a 100 gig file system. I wouldn’t go below 50. I also put this onto my desktop. Makes it easy to double click later to mount it. Feel free to mount it where it works best for you. However remember this location!

hdiutil create -type SPARSE -fs "Case-sensitive Journaled HFS+" -size 100g -volname "android" -attach ~/Desktop/Android
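If you want to confirm the new volume really is case sensitive before starting a long sync, a quick check is easy to script. This is a minimal Python sketch; the function name and the `casecheck_` file prefix are made up for illustration. It creates a temporary lower-case file in the directory you give it and looks for the same name in upper case.

```python
import os
import sys
import tempfile


def is_case_sensitive(directory):
    """Return True if the file system holding `directory` distinguishes case."""
    # Create a file with a lower-case name, then probe for the upper-case one.
    fd, path = tempfile.mkstemp(prefix="casecheck_", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        return not os.path.exists(os.path.join(head, tail.upper()))
    finally:
        os.remove(path)  # always clean up the probe file


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print("%s case sensitive: %s" % (target, is_case_sensitive(target)))
```

Run it with the mount point of the new volume as the argument (for example `/Volumes/android`, assuming that is where the image attached on your machine), and compare against your home directory.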

Android source code

Download as you normally would. (Note the cd to the location where you just attached the new HFS case sensitive file system.)

cd /Volumes/android
git clone http://android.googlesource.com/platform/manifest.git
cd manifest
git branch -r   # this will show you all the branch options. I was after the latest.
cd ..
repo init -u http://android.googlesource.com/platform/manifest.git -b android-4.4_r1.2
repo sync

Environment Setup

We need to set up a few environment variables. First add the android sdk tools to your path.

export PATH=~/android/android-sdk/sdk/platform-tools:$PATH
export LC_CTYPE=C
export LANG=C

The One Fix

So in jni_generator.py there is a slight issue where it doesn’t handle the situation where one of the tool parameters isn’t available. So we need to defensively work around it. (yeah yeah I should just post the patch)

In external/chromium_org/base/android/jni_generator/jni_generator.py

At the top of the file (around line 20) add

import platform

Then lower down add the following if to check for Darwin so that -fpreprocessed isn’t passed:

  def _RemoveComments(self, contents):
    # We need to support both inline and block comments, and we need to handle
    # strings that contain '//' or '/*'. Rather than trying to do all that with
    # regexps, we just pipe the contents through the C preprocessor. We tell cpp
    # the file has already been preprocessed, so it just removes comments and
    # doesn't try to parse #include, #pragma etc.
    #
    # TODO(husky): This is a bit hacky. It would be cleaner to use a real Java
    # parser. Maybe we could ditch JNIFromJavaSource and just always use
    # JNIFromJavaP; or maybe we could rewrite this script in Java and use APT.
    # http://code.google.com/p/chromium/issues/detail?id=138941
    system = platform.system()
    if system == 'Darwin':
      cpp_args = ['cpp']
    else:
      cpp_args = ['cpp', '-fpreprocessed']
    p = subprocess.Popen(args=cpp_args,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    stdout, _ = p.communicate(contents)

Ready To Build

That’s it. Least I hope I captured everything I had to do. Build away.

Posted in android

April 10, 2014

Q1 2014 CEO Note

The announcement of five new members at Linaro Connect, including Club members Qualcomm, MediaTek and ZTE, is a significant endorsement of the value of Linaro’s business model, and further increases the ROI for every one of our members.

By coincidence the Linux Foundation recently published a report on collaborative engineering subtitled:  “Companies increase collaborative development, view as essential to success”.  The report points out that technology companies face enormous pressure to innovate faster and cut costs. A survey carried out for the Linux Foundation shows that professionals in a wide variety of leading technology companies recognize that collaborative software development is increasing in their organization and across industries, and indeed that it’s becoming central to their company’s mission.

Key findings are that:

  • Companies get involved in collaborative software development to advance business objectives and to be part of industry innovation
  • Investments in collaborative software development are on the rise
  • Individual developers and businesses both benefit from the trend toward collaboration
  • Business needs are driving increased adoption of collaborative development practices

This is a further strong validation of Linaro’s approach. Rather than each company itself developing non-value adding but key technology, duplicating effort and in many cases complicating and delaying the open source process, Linaro offers a different path. Instead, our members come together to engineer open source software to meet their common product needs once. The result is significantly lower costs for everyone and a higher quality deliverable given the shared effort and knowledge that has been utilized. Furthermore, using the same code base reduces maintenance costs and can accelerate time to market. Using this open source software as the basis of members’ products, coupled with the knowledge gained from being part of the development process, enables Linaro members to focus their own resources on their own value add and accelerates their ability to innovate in their chosen market segment(s). The Linux Foundation report supports the idea that members derive substantially more value from being inside Linaro than outside.

The current work on ARMv8 inside Linaro is a good example of the results that collaborative engineering can deliver. Many engineers in Linaro are working on ARMv8 projects, from boot architecture to virtualization, from kernel device drivers to Java, and the results speak for themselves. This effort is delivering code that all members with ARMv8 projects will be using.

While Linaro’s work is deliberately carried out as new work in the relevant open source projects (for example, currently against the Linux 3.14 mainline kernel tree), most members cannot easily leverage this new work directly into products. To address this challenge, members asked Linaro to provide a stable kernel, based on the Linux Long Term Supported (LTS) kernel, but including backports of the latest stable Linaro technology.

The Linaro Stable Kernel (LSK) is based on the most recent LTS release (currently Linux 3.10). Today the LSK (base and Android versions) includes many of the latest ARMv8 patches from later kernels, backported both by ARM into LTS (and therefore reflected into LSK) and by Linaro directly into the LSK. At present this functionality can only be tested on models, and in a limited manner on member hardware that we have in house under the terms of a restricted use license.

As additional member hardware becomes available to Linaro we are committed to having the LSK support the functionality, quality and stability that members need for their initial products. Of course a balance between stability and new features must always be made, but our goal is to enable our members to use the LSK as the basis of their products now. This will be a significant demonstration of the benefits of collaborative engineering that have been advocated by Linaro and the Linux Foundation.

1 Linux Foundation Collaborative Development Trends Report 2014

The post Q1 2014 CEO Note appeared first on Linaro.

Posted in armv8, connect, kernel, Linaro Blog, linux, Linux on ARM, LSK

April 5, 2014

A day in the life…

This video is so perfect... I could literally write a book about it. If you're studying to be a software developer and are wondering what your future will hold, don't imagine your life as the next Zuckerberg, watch this video. Watch this video until it is no longer funny, because it is not funny, this will be your career:

What it’s like to be an engineer in a sales meeting
Posted in Uncategorized