Intelligent Systems Lab Project: Image processing on an embedded system

Participants

Christian Ascheberg
Markus Lux

Project Supervisors

Dr.-Ing. Thorsten Jungeblut
Dipl.-Ing. Daniel Klimeck
Dipl.-Ing. Marten Vohrmann

Abstract

We implemented a simple face detection algorithm on the CoreVA processor, a platform designed to consume much less energy than a modern CPU.
To evaluate our results, we measured and calculated the energy consumption of our implementation on the CoreVA and compared its efficiency with that of a commercially available notebook processor.

Motivation

Today, image processing requires a lot of computational power and therefore a lot of energy.
Because embedded systems must run autonomously and fit into a constrained space, general-purpose hardware cannot meet their demands for low energy consumption and small size.
With these problems in mind, the CoreVA processor was developed for use in embedded environments. It is very energy efficient, and not only because of its parallel VLIW architecture.

Application Scenario

There are numerous applications for energy-efficient image processing units. One example is a swarm of mini robots that need to recognize and classify objects of interest (e.g. human faces or other robots) in order to collaborate with them. A more real-world example is the integration into mobile medical devices carried by doctors in a hospital. Such devices could detect visual patterns of diseases and help to quickly analyze regions of interest.

Objectives

The project goals are:

Implementation of the eigenfaces algorithm for face detection on the supplied CoreVA hardware.
Multiple pre-processing steps covering a subset of image processing algorithms.
Keeping the power consumption for these tasks to a minimum.
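The eigenfaces method named in the first objective classifies a candidate region by how well it can be reconstructed from a small set of eigenfaces (principal components of training faces). A small reconstruction error, often called the distance from face space, suggests a face. The sketch below assumes the host has already computed the mean face and K orthonormal eigenfaces and stored the eigenfaces as integers scaled by 2^12; it uses integer arithmetic only, since the target has no floating point unit. The function name, the scaling factor and the buffer layout are our own assumptions, not the project's code.

```c
#include <stddef.h>
#include <stdint.h>

#define Q_SHIFT 12               /* assumed: eigenface entries stored as value * 2^12 */
#define Q_ONE   (1 << Q_SHIFT)

/* Distance from face space: project the mean-subtracted region onto k
 * orthonormal eigenfaces and return the squared reconstruction error.
 * Small values mean the region is well explained by the face subspace. */
int64_t face_space_distance(const uint8_t *region,  /* n greyscale pixels        */
                            const uint8_t *mean,    /* n mean-face pixels        */
                            const int16_t *eigen,   /* k rows of n entries, Q12  */
                            size_t n, size_t k,
                            int32_t *weights,       /* out: k projection weights */
                            int32_t *scratch)       /* n ints of scratch space   */
{
    /* 1. mean-subtracted input x = region - mean */
    for (size_t j = 0; j < n; j++)
        scratch[j] = (int32_t)region[j] - (int32_t)mean[j];

    /* 2. projection weights w_i = <u_i, x>, rescaled from Q12 */
    for (size_t i = 0; i < k; i++) {
        int64_t acc = 0;
        for (size_t j = 0; j < n; j++)
            acc += (int64_t)eigen[i * n + j] * scratch[j];
        weights[i] = (int32_t)(acc / Q_ONE);
    }

    /* 3. squared norm of the residual x - sum_i w_i * u_i */
    int64_t err = 0;
    for (size_t j = 0; j < n; j++) {
        int64_t rec = 0;
        for (size_t i = 0; i < k; i++)
            rec += (int64_t)weights[i] * eigen[i * n + j];
        int64_t d = scratch[j] - rec / Q_ONE;
        err += d * d;
    }
    return err;
}
```

A region would then be accepted as a face if the returned error stays below a threshold tuned on training data; that threshold is a hypothetical parameter here.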

Description

The CoreVA resource-efficient VLIW processor

The CoreVA is a processor developed by the Cognitronics and Sensor Systems Group at CITEC Bielefeld. It belongs to the family of VLIW (very long instruction word) processors, which means it can exploit instruction-level parallelism determined at compile time.
We will not go into much detail here; more information can be found on the website.
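To illustrate what compile-time instruction-level parallelism means, the generic C sketch below computes the same sum in two ways. It is not CoreVA-specific code: how many operations a compiler actually places in one instruction bundle depends on the target's issue width, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: every addition depends on the previous one,
 * so there is little for a VLIW scheduler to bundle. */
int32_t sum_serial(const int16_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Four independent accumulators: the four additions in the loop body do
 * not depend on each other, so a VLIW compiler may schedule them in the
 * same instruction word if enough functional units are available. */
int32_t sum_unrolled4(const int16_t *x, size_t n)
{
    int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        a0 += x[i];
    return a0 + a1 + a2 + a3;
}
```

Both functions return the same result; the second only exposes more independent work per loop iteration.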

Setup and architecture


In real-world use, the CoreVA would be part of an embedded system. For development purposes, it is connected to a host system via a RAPTOR board. The host system computes the training data for the eigenfaces algorithm. It also generates the input data, which it sends to the processor. Furthermore, it reads the data sent back by the processor and converts it into an image in which the detected faces are marked.

Face detection process

The detection process consists of the following steps:

skin-locus detection
greyscale conversion
segmentation of skin-colored regions
discarding of wrongly sized regions
eigenface detection on the remaining regions
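The segmentation and size-filtering steps above can be sketched as a connected-component pass over a binary skin mask, followed by a filter on the bounding boxes. This is a minimal sketch under our own assumptions (4-connectivity, an explicit stack instead of recursion to keep memory use predictable, hypothetical names and size limits); the project's actual segmentation may differ.

```c
#include <stdint.h>

typedef struct { int x0, y0, x1, y1; int count; } Box;  /* inclusive bounds */

/* Label 4-connected regions of a binary mask (1 = skin) and store their
 * bounding boxes. labels and stack must each hold w*h ints.
 * Returns the number of boxes written (at most max_boxes). */
int find_regions(const uint8_t *mask, int w, int h,
                 int *labels, int *stack, Box *boxes, int max_boxes)
{
    int nboxes = 0;
    for (int i = 0; i < w * h; i++)
        labels[i] = 0;

    for (int start = 0; start < w * h; start++) {
        if (nboxes >= max_boxes)
            break;
        if (!mask[start] || labels[start])
            continue;
        Box b = { start % w, start / w, start % w, start / w, 0 };
        int top = 0;
        stack[top++] = start;
        labels[start] = nboxes + 1;
        while (top > 0) {
            int p = stack[--top];
            int x = p % w, y = p / w;
            b.count++;
            if (x < b.x0) b.x0 = x;
            if (x > b.x1) b.x1 = x;
            if (y < b.y0) b.y0 = y;
            if (y > b.y1) b.y1 = y;
            /* neighbours left, right, up, down (-1 = outside the image) */
            int nb[4] = { x > 0 ? p - 1 : -1, x < w - 1 ? p + 1 : -1,
                          y > 0 ? p - w : -1, y < h - 1 ? p + w : -1 };
            for (int k = 0; k < 4; k++) {
                int q = nb[k];
                if (q >= 0 && mask[q] && !labels[q]) {
                    labels[q] = nboxes + 1;   /* label on push: each pixel pushed once */
                    stack[top++] = q;
                }
            }
        }
        boxes[nboxes++] = b;
    }
    return nboxes;
}

/* Keep only boxes whose width and height lie in [min_side, max_side];
 * the limits are hypothetical tuning parameters. Returns the new count. */
int filter_by_size(Box *boxes, int n, int min_side, int max_side)
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        int bw = boxes[i].x1 - boxes[i].x0 + 1;
        int bh = boxes[i].y1 - boxes[i].y0 + 1;
        if (bw >= min_side && bw <= max_side && bh >= min_side && bh <= max_side)
            boxes[kept++] = boxes[i];
    }
    return kept;
}
```

Because every pixel is labelled when it is pushed, it enters the stack at most once, so a stack of w*h entries always suffices.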

Challenges

While developing the components for our face detection system, we were confronted with a few challenges specific to the architectural environment.

file I/O only via FIFO buffers
limited memory
debugging at hardware level
prototypical hardware implementation
no floating point unit available
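Without a floating point unit, steps such as the greyscale conversion have to be done in integer arithmetic. A common approach, shown here as a sketch rather than as the project's code, approximates the ITU-R BT.601 luma weights 0.299, 0.587 and 0.114 with fixed-point factors out of 256; the function names are hypothetical.

```c
#include <stdint.h>

/* Fixed-point luma: 77/256, 150/256 and 29/256 approximate 0.299, 0.587
 * and 0.114. The factors sum to 256, so pure white maps to 255. */
static inline uint8_t rgb_to_grey(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
}

/* Convert n interleaved RGB pixels (r, g, b, r, g, b, ...) to greyscale. */
void image_to_grey(const uint8_t *rgb, uint8_t *grey, int n)
{
    for (int i = 0; i < n; i++)
        grey[i] = rgb_to_grey(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
}
```

The shift by 8 replaces a division by 256, so the conversion needs only integer multiplications, additions and one shift per pixel.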

Results

Main goals

We reached the goal of detecting and marking multiple faces in an image. One test image showed a group of people. In the processed image, white rectangles mark detected faces, black rectangles mark skin-colored regions, and grey rectangles mark regions rejected by the detector. In this sample, 14 of the 17 faces were detected, with three false positives.
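The counts above translate into the usual detection metrics, with TP = 14 detected faces, FN = 17 - 14 = 3 missed faces and FP = 3 false positives. They describe this single test image only and should not be read as a general accuracy figure.

```latex
\begin{aligned}
\text{recall}    &= \frac{TP}{TP + FN} = \frac{14}{14 + 3} = \frac{14}{17} \approx 0.82\\
\text{precision} &= \frac{TP}{TP + FP} = \frac{14}{14 + 3} = \frac{14}{17} \approx 0.82
\end{aligned}
```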

Efficiency

We compared the energy consumption of our program on the CoreVA processor with that on a commercially available notebook processor and obtained the following results:

	CoreVA (100 mW)	Intel P8400 (25 W TDP)
Energy consumed:	45 mJ	250 mJ
Runtime:	0.45 s	0.01 s

Keep in mind that this comparison is not fully representative, but it gives a rough estimate of the real values. The notebook executable was compiled with the -O3 optimization flag; without it, the notebook would be about four times less efficient.
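Energy estimates of this kind follow from E = P · t. The helper below is a hypothetical sketch that keeps the calculation in integers by multiplying milliwatts by milliseconds, which yields microjoules, and expresses the efficiency ratio scaled by 100.

```c
#include <stdint.h>

/* Energy estimate E = P * t: milliwatts * milliseconds = microjoules. */
uint64_t energy_uj(uint32_t power_mw, uint32_t runtime_ms)
{
    return (uint64_t)power_mw * runtime_ms;
}

/* How many times less energy system A needs than system B, scaled by 100
 * (two implied decimal places). Returns 0 if energy_a_uj is 0. */
uint64_t efficiency_ratio_x100(uint64_t energy_a_uj, uint64_t energy_b_uj)
{
    return energy_a_uj ? (100 * energy_b_uj) / energy_a_uj : 0;
}
```

For example, 100 mW sustained for 450 ms gives 45 000 µJ (45 mJ), and 25 W (25 000 mW) for 10 ms gives 250 000 µJ (250 mJ); for these two values the ratio helper returns 555, i.e. roughly 5.6 times less energy.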

Discussion and Conclusion

We were able to get the CoreVA up and running, including data transfer, a set of image processing methods and face detection. On this system we obtained good results with a non-optimized code base, which forms a solid basis for future development. The eigenfaces algorithm is no longer state of the art, but it served to demonstrate the approach; further work using more sophisticated algorithms can be expected to yield better results.

Outlook

As already mentioned, more robust object detection algorithms can be implemented on this hardware. Besides this, there are other areas for improvement:

TFT display on the RAPTOR board
Ethernet connection to the processor
real-time object detection in video streams
using the newly ported GCC to compile programs
