Archive for the ‘linaro’ Category

April 15, 2014

OpenCL accelerated sqlite on Shamrock, an open source CPU-only driver

Within the GPGPU team, Gil Pitney has been working on Shamrock, an open source OpenCL implementation. It’s really a friendly fork of the Clover project, but taken in a bit of a new direction.

Over the past few months Gil has updated it to make use of the new MCJIT from LLVM, which works much better for ARM processors. He’s also updated Shamrock to use current LLVM; I have a build based on 3.5.0 on my Chromebook.

The other part of Gil’s Shamrock work is that, in time, it will also be able to drive Keystone hardware, TI’s ARM + DSPs on-board computing solution. Being able to drive DSPs with OpenCL is quite an awesome capability. I do wish I had one of those boards.

The other capability Shamrock has is to provide a CPU driver for OpenCL on ARM. How does it perform? Good question!

I took my OpenCL accelerated sqlite prototype and built it to use the Shamrock CPU-only driver. Would you expect a CPU-only OpenCL driver offloading SQL SELECT queries to be faster, or the sqlite engine itself?

If you guessed OpenCL running on a CPU-only driver, you’re right. Remember, the Samsung ARM based Chromebook is a dual A15. The queries run against 100,000 rows in a single-table database with 7 columns. Lower numbers are better; times are in microseconds.

sql1 took 43653 microseconds
OpenCL handcoded-opencl/ Interval took 17738 microseconds
OpenCL Shamrock 2.46x faster
sql2 took 62530 microseconds
OpenCL handcoded-opencl/ Interval took 18168 microseconds
OpenCL Shamrock 3.44x faster
sql3 took 110095 microseconds
OpenCL handcoded-opencl/ Interval took 18711 microseconds
OpenCL Shamrock 5.88x faster
sql4 took 143278 microseconds
OpenCL handcoded-opencl/ Interval took 19612 microseconds
OpenCL Shamrock 7.30x faster
sql5 took 140398 microseconds
OpenCL handcoded-opencl/ Interval took 18698 microseconds
OpenCL Shamrock 7.5x faster

These numbers for running on the CPU are so consistent that I was concerned there was some error in the process. Yet the returned number of matching rows is the same for both the sqlite engine and the OpenCL versions, which helps rule out functional problems. I’ve clipped the result row counts from the output above for brevity.

Frankly, I wasn’t expecting this kind of speed up, especially with a CPU-only driver. Yet there it is in black and white. It speaks highly of OpenCL’s ability to compute more efficiently when you have data-parallel problems.

Another interesting thing to note in this comparison: the best results so far have been achieved with the Mali GPU using vload/vstore, thus taking advantage of SIMD vector instructions. On a CPU this would equate to use of NEON. The Shamrock CPU-only driver doesn’t at the moment support vload/vstore, so the compiled OpenCL kernel isn’t even using NEON on the CPU to achieve these results.

Posted in linaro, OpenCL, open_source | No Comments »

April 2, 2014

sq-cl code posted and some words about vectors

I’ve posted my initial OpenCL accelerated sqlite prototype code:

Don’t get excited. Remember, it’s a prototype and a quite contrived one at that. It doesn’t handle the general case yet and of course it has bugs. But!  It’s interesting and I think shows what’s possible.

Over at the Mali developer community that ARM hosts, I happened to mention this work in a post that ended up drawing some good suggestions on the use of vectors, as well as other helpful feedback. While working with vectors was a bit painful due to some bugs of my own making, I made my way through it and now have initial numbers for a couple of kernels, so I can get an idea of just what a difference it makes.


The core of the algorithm for sql1 changes from:

    do {
        if ((data[offset].v > 60) && (data[offset].w < 0)) {
            resultArray[roffset].id = data[offset].id;
            resultArray[roffset].v = data[offset].v;
            resultArray[roffset].w = data[offset].w;
            roffset++;
        }
        offset++;
    } while (offset < endRow);

to:

    do {
        v1 = vload4(0, data1 + offset);
        v2 = vload4(0, data2 + offset);
        r = (v1 > 60) && (0 > v2);
        vstore4(r, 0, resultMask + offset);
        offset += 4;
    } while (offset < totalRows);

With each spin through the loop, the vectorized version is of course operating on 4 values at once when checking for a match. Obvious win. To do this the data has to come in as pure columns, and I’m using a vector as essentially a bitmask to indicate whether each row is a match or not. This requires a post-processing loop to spin through and assemble the resulting data into a useful state. For the 100,000 row database I’m using, it doesn’t seem to have as much of a performance impact as I thought it might.

For the first sql1 test query the numbers look like this:

CPU sql1 took 43631 microseconds
OpenCL sql1 took 14545 microseconds (2.99x or 199% better)
OpenCL (using vectors) took 4114 microseconds (10.6x or 960% better)

Not bad. sql3 sees even better results:

CPU sql3 took 111020 microseconds
OpenCL sql3 took 44533 microseconds (2.49x or 149% better)
OpenCL (using vectors) took 4436 microseconds (25.02x or 2402% better)

There’s another factor in why these vectorized versions are doing better: the newer code uses fewer registers on the Mali GPU, so I’m able to up the number of work units from 64 to 128.

I do have one bug that I need to track down. I am (of course) validating that all the versions are coming up with the same matches. The new vector versions are off by a couple of rows. The missing rows don’t seem to follow a pattern. I’m sure I’ve done something dumb. Now that there is the ability for more eyes on the code perhaps someone will spot it.

Posted in linaro, OpenCL, open_source | No Comments »

March 27, 2014

Linaro 14.03 Release Now Available for Download!

When you have a great and difficult task, something perhaps almost impossible, if you only work a little at a time, every day a little, suddenly the work will finish itself.  ~  Isak Dinesen (Karen Blixen)

See the detailed highlights of this release to get an overview of what has been accomplished by the Working Groups, Landing Teams and Platform Teams. The release details are linked from the Details column for each released artifact on the release information:

This post includes links to more information and instructions for using the images. The download links for all images and components are available on our downloads page:


The Android-based images come in three parts: system, userdata and boot. These need to be combined to form a complete Android install. For an explanation of how to do this please see:

If you are interested in getting the source and building these images yourself please see the following pages:


The Ubuntu-based images consist of two parts. The first part is a hardware pack, which can be found under the hwpacks directory and contains hardware specific packages (such as the kernel and bootloader). The second part is the rootfs, which is combined with the hardware pack to create a complete image. For more information on how to create an image please see:


With the Linaro provided downloads and with ARM’s Fast Models virtual platform, you may boot a virtual ARMv8 system and run 64-bit binaries.  For more information please see:


More information on Linaro can be found on our websites:

Also subscribe to the important Linaro mailing lists and join our IRC channels to stay on top of Linaro developments:


For any errata issues, please see:

Bug reports for this release should be filed in Launchpad against the individual packages that are affected. If a suitable package cannot be identified, feel free to assign them to:


Registration for Linaro Connect USA (LCU14), which will be in Burlingame, California from September 15 – 19, 2014 is now open.  More information on this event can be found at:

Posted in android, big.little, connect, CortexA8, CortexA9, Evaluation builds, kernel, linaro, Linaro Connect, linux, Linux on ARM, release cycle, Releases, Toolchain, ubuntu | No Comments »

March 26, 2014

LAVA packages for Debian

Packaging LAVA for Debian unstable

with notes on other distributions

I’ve been building packages for LAVA on Debian unstable for several months now, and I’ve been running LAVA jobs on my laptop, on devices in my home lab, and on an ARMv7 Arndale board too.

Current LAVA installations use lava-deployment-tool, which has only supported Ubuntu 12.04 LTS Precise Pangolin. There has been a desire in LAVA to move away from a virtual environment, to put configuration files in FHS compliant paths, to use standard distribution packages for dependencies, and so to make LAVA available on more platforms than just Precise. Packaging opens the door to installing LAVA on Debian, Ubuntu, Fedora and any other recent distribution. Despite LAVA currently being reliant on 12.04 Precise, some of the python dependencies of LAVA have been able to move forward using the virtual environment provided by buildout and pypi. This means that LAVA, as packaged, requires a newer base OS than Precise – for Ubuntu, the minimal base is Saucy Salamander 13.10; for Debian it would be Jessie (testing), although there is currently a transition ongoing in Debian which means that uwsgi is not in testing and Debian unstable would be needed instead.

The work to migrate configuration snippets out of deployment-tool and to ensure that the tarball built using setuptools contains all of the necessary files for the package has already been done. The packaging itself is clean and most of the work is done upstream. There is, as ever, more to do but the packages work smoothly for single install LAVA servers where the dispatcher is on the same machine as the django web frontend.

The packages have also migrated to Django 1.6, something which is proving difficult with the deployment-tool as it has not kept pace with changes outside the virtual environment, even if other parts of LAVA have.

LAVA will be switching to packages for installation instead of deployment-tool, and this will mean changes to how LAVA works outside the Cambridge lab. When the time comes to switch to packaging, the plan is to update deployment-tool so that it no longer updates /srv/lava/ but instead migrates the instance to packages.

Main changes

  1. Configuration files move into /etc/

    • Device configuration files /etc/lava-dispatcher/devices/

    • Instance configuration files /etc/lava-server/

  2. Log files move into /var/log/

    • Adding logrotate support – no more multi-Gb log files in /srv/lava/

  3. Commitment to keeping the upstream code up to date with dependencies

  4. Support for migrating existing instances, using South.

  5. Packaging helpers

    • add devices over SSH instead of via a combination of web frontend and SSH.

    • Developer builds with easily identifiable version strings, built as packages direct from your git tree.

  6. New frontend

    • Although Django 1.6 does not change the design of the web frontend at all, LAVA will take the opportunity to apply a Bootstrap frontend which has greater support for browsers on a variety of devices, including mobile. This also helps distinguish a packaged LAVA from a deployment-tool LAVA.

  7. Documentation and regular updates

The Plan

LAVA has made regular releases based on a monthly cycle and these will be provided as source tarballs for distributions to download. The official monthly release and any intervening updates will be made available for distributions to use for their own packaging. Additionally, Debian packages will be regularly built for use within LAVA and these will be available for those who choose to migrate from Ubuntu Precise to Debian Jessie. LAVA will assist maintainers who want to package LAVA for their distributions and we welcome patches from such maintainers. This can include changes to the developer build support script to automate the process of supporting development outside LAVA.

Initially, LAVA will migrate to packaging internally, to prove the process and to smooth out the migration. Other LAVA instances are welcome to follow this migration or wait until the problems have been ironed out.

The Issues

  • Remote workers – this is work to be completed during the migration to packaging within LAVA as well as pending work upstream on the internals of the connection between a remote worker and the master scheduler. Expect some churn in this area whilst the code is being finalised. A lava-worker package is being prepared which borrows enough code from lava-server to run the lava-scheduler-daemon on the remote worker, until such time as the remote worker communications are refactored.

  • Integration – there are plans to integrate some of the modules which LAVA uses which are not commonly packaged: linaro-dashboard-bundle and linaro-django-xmlrpc

  • OpenID – the certificates which underpin OpenID use with Launchpad have been removed in Debian and LAVA is currently investigating alternatives.

  • json-schema-validator currently has a broken test suite, so this will need to be patched by LAVA to allow the package to build. The code may be replaced with a different validator if the issue persists.

  • namespace handling – the current package install is unnecessarily noisy with complaints about the lava namespace. This will need a fix but does not affect current operation.

  • Unit tests – LAVA is working hard to add a much larger coverage of internal unit tests. These use a temporary database which is not generally available during the build of a distribution package. LAVA is already running continuous integration tests to ensure that these tests continue to pass and the packages will gain documentation on how to run these tests after installation.

Build Dependencies

For Debian unstable, the list of packages which must be installed on your Debian system to be able to build packages from the lava-server and lava-dispatcher source code trees are:

debhelper (>= 8.0.0), python | python-all | python-dev | python-all-dev,
python-sphinx (>= 1.0.7+dfsg) | python3-sphinx, python-mocker,
python-setuptools, python-versiontools

(python-versiontools may disappear before the packages are finalised)

In addition, to be able to install lava-server, these packages need to be built from tarballs released by Linaro (the list may shorten as changes upstream are applied):

python-django-restricted-resource (>= 0.2.7), 
lava-tool (>= 0.2), lava-utils-interface (>= 1.0), 
linaro-django-xmlrpc (>= 0.4),
python-versiontools (>= 1.8),
linaro-dashboard-bundle (>= 1.10.2), 
lava-dispatcher (>= 0.33.3),
lava-coordinator, lava-server-doc

The list for lava-dispatcher is currently:

python-json-schema-validator, lava-tool (>= 0.4), 
lava-utils-interface, linaro-dashboard-bundle (>= 1.10.2)

The packages available from my experimental repository are using a new packaging branch of lava-server and lava-dispatcher where we are also migrating the CSS to Bootstrap CSS.

Installing LAVA on Debian unstable

$ sudo apt-get install emdebian-archive-keyring

Add the link to my experimental repository (amd64, i386, armhf & arm64) to your apt sources, e.g. by creating a file /etc/apt/sources.list.d/lava.list containing:

deb sid main

Update with the new key

$ sudo apt-get update

It is always best to install postgresql first:

$ sudo apt-get install postgresql

There are then three options for the packages to install. (Please be careful with remote worker setup; it is not suitable for important installations at this time.)

  1. Single instance, server and dispatcher with recommended tools
    apt-get install lava
    Installs linaro-image-tools and guestfs tools.

  2. Single instance, server and dispatcher
    apt-get install lava-server
    Installs enough to run LAVA on a single machine, running jobs on boards on the same LAN.

  3. Experimental remote worker support
    apt-get install lava-worker
    Needs a normal lava-server installation to act as the master scheduler but is aimed at supporting a dispatcher and boards which are remote from that master.

The packages do not assume that your apache2.4 setup is identical to that used in other LAVA installations, so the LAVA apache config is installed to /etc/apache2/sites-available/ but is not enabled by default. If you choose to use the packaged apache config, you can simply run:

$ sudo a2ensite lava-server
$ sudo apache2ctl restart

(If this is a fresh apache install, use a2dissite to disable the default configuration before restarting.)

Information on creating a superuser, adding devices and administering your LAVA install is provided in the README.Debian file in lava-server:

$ zless /usr/share/doc/lava-server/README.Debian.gz

Provisos and limitations

Please be aware that packaged LAVA is still a work-in-progress but do let us know if there are problems. Now is the time to iron out install bugs and other issues as well as to prepare LAVA packages for other distributions.

It will be a little while before the packages are ready for upload to Debian – I’ve got to arrange the download location and upload the dependencies first – and some of that work will wait until more work has gone in upstream to consolidate some of the current dependencies.

Posted in linaro | No Comments »

March 25, 2014

Using bmaptool To Create A Memory Card

Here's the scenario: I have just used OE to build a core-image-minimal that I want to run on my Wandboard-dual. I insert my 4GB microSD card into my desktop, use dd to write the image to the card, insert the card into my board, boot, and get:

Size=62.0M Used=19.1M

But it's a 4GB card?! Where's the rest of my disk?

OE has no idea how big a card you want to use, so by default it makes an image that is just a bit bigger than required (or 8MB, whichever is larger).

Writing this small image is quick:
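The dd command was stripped from this post when it was archived; as a sketch, with scratch files standing in for the real OE image and the card device (against real hardware the output would be the card device itself, something like /dev/sdX, run under sudo):

```shell
# Create an 8MiB stand-in for the small OE image, then write it with dd.
dd if=/dev/zero of=core-image-minimal.img bs=1M count=8 status=none
dd if=core-image-minimal.img of=card.img bs=1M status=none
sync
stat -c '%s bytes written' card.img
```

At ~80MB, a write like this finishes in seconds; it’s the full-card image below that takes the better part of an hour.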


If I want to use (roughly) the entire 4GB card I simply ask OE to build an image of that size. Edit conf/local.conf and add/edit:
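The conf/local.conf fragment was also lost in archiving; a sketch using the standard OE variable for setting the rootfs size (the variable name is real, but the exact value below is an assumption, in KiB, sized for a ~3.7GB image):

```
IMAGE_ROOTFS_SIZE = "3700000"
```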
Now when I build my image, the output from OE will be roughly 3.7GB in size. Writing this image to a card will take much longer:


The funny thing is, the data hasn't changed; I'm still using the same amount of data on the card. What has changed is that I now have access to (roughly) the entire card, but at the cost of having it take ~160 times longer to write the image!

Size=3.4G Used=86.9M

In this case we're wasting lots of time (and flash write cycles) writing empty portions of the image to the disk. This is where bmaptool comes in. In essence, bmaptool looks at your image and determines which parts are important and which are empty. When you go to actually write your image, only the non-empty parts are transferred -- saving you lots of write time (and flash cycles).
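The empty-block idea is easy to see with coreutils alone (a sketch; the filename is a placeholder): a mostly-empty image can be a sparse file whose apparent size far exceeds the disk blocks actually allocated, and those unallocated holes are exactly the parts a tool like bmaptool can skip.

```shell
# Build a 100MiB 'image' that is one big hole plus a few real bytes.
truncate -s 100M image.img
printf 'bootdata' | dd of=image.img conv=notrunc status=none
stat -c 'apparent size: %s bytes' image.img
stat -c 'allocated: %b blocks of %B bytes' image.img  # far fewer than 100MiB worth
```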

Using bmaptool is a two-step process:
  1. use bmaptool create to create a mapping file
  2. use bmaptool copy to write your image to a disk (with the help of the mapping file you just created)
Applying bmaptool to our 4GB image:


It's not the 18 seconds from above (i.e. dd'ing the 80MB image), but it's still better than the 49 minutes required to dd the 4GB image. The image written with bmaptool works:

Size=3.4G Used=86.9M

Note that if I use bmaptool on the first (80MB) image, there isn't much savings:


The real benefits are seen when trying to write an image such that most of the card is then available for use, and most of the image to be written is empty.
Posted in linaro, open-embedded | No Comments »