Intelligent Systems Lab Project: Image processing on an embedded system
Participants
- Christian Ascheberg
- Markus Lux
Project Supervisors
- Dr.-Ing. Thorsten Jungeblut
- Dipl.-Ing. Daniel Klimeck
- Dipl.-Ing. Marten Vohrmann
Abstract
- We implemented a simple face detection algorithm on the CoreVA processor, which aims to consume much less energy than a modern CPU.
- To evaluate our results, we measured and calculated the energy consumption of our implementation on the CoreVA and compared its efficiency to that of a commercially available notebook processor.
Motivation
- Image processing systems typically require substantial computational power and therefore a lot of energy.
- Embedded systems are autonomous and constrained in size, so their demand for low energy consumption and a small footprint cannot be met by general-purpose hardware.
- With these problems in mind, the CoreVA processor was developed for use in embedded environments. Its parallel VLIW architecture is one of several reasons why it is very energy efficient.
Application Scenario
Energy-efficient image processing units have numerous applications. One example is a swarm of mini robots that need to recognize and classify objects of interest (e.g. human faces, other robots) in order to collaborate with them. A more practical example is integration into mobile medical devices carried by doctors in a hospital. Such devices could detect visual patterns of diseases and help to quickly analyze regions of interest.
Objectives
The project goals are:
- Implementation of the eigenfaces algorithm for face detection on the supplied CoreVA hardware.
- Multiple pre-processing steps covering a subset of image processing algorithms.
- Keeping the power consumption for these tasks at a minimum level.
Description
The CoreVA resource-efficient VLIW processor
The CoreVA is a processor developed by the Cognitronics and Sensor Systems Group / CITEC Bielefeld. It belongs to the family of VLIW (very long instruction word) processors, which means it can exploit instruction-level parallelism that is determined at compile time. We won't go into much detail here. More information can be found on the website.
Setup and architecture
In real-world use the CoreVA would be part of an embedded system. For development purposes it is attached to a host system via a RAPTOR board. The host system computes the training data for the eigenfaces algorithm and generates the input data that it sends to the processor. It also reads the data sent back by the processor and converts it into an image in which the detected faces are marked.
Face detection process
The detection process consists of the following steps:
- skin-locus detection
- greyscale conversion
- segmentation of skin-colored regions
- discarding of wrongly sized regions
- eigenface detection on the remaining regions
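The steps above can be sketched in a few lines of integer-only Python. This is a minimal illustration under assumptions, not the project's code: all function names are hypothetical, a simple box of chromaticity thresholds stands in for a real skin locus, the size limits are arbitrary, and the eigenface step uses the common "distance from face space" criterion, which the project may or may not have used.

```python
# Illustrative sketch; every name and threshold here is an assumption.

def skin_mask(rgb):
    """Flag pixels whose normalised r/g chromaticity falls in an assumed box."""
    mask = []
    for row in rgb:
        flags = []
        for r, g, b in row:
            total = r + g + b
            # Per-mille chromaticities keep the arithmetic integer-only.
            rn = 1000 * r // total if total else 0
            gn = 1000 * g // total if total else 0
            flags.append(380 <= rn <= 600 and 250 <= gn <= 370)
        mask.append(flags)
    return mask

def to_grey(rgb):
    """Integer approximation of 0.299 R + 0.587 G + 0.114 B."""
    return [[(77 * r + 150 * g + 29 * b) >> 8 for r, g, b in row] for row in rgb]

def skin_regions(mask):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected groups of skin pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            stack = [(x, y)]
            x0 = x1 = x
            y0 = y1 = y
            while stack:  # iterative flood fill, no recursion needed
                cx, cy = stack.pop()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height \
                            and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def plausible_size(box, min_side=2, max_side=64):
    """Discard regions that are too small or too large to be a face."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0 + 1, y1 - y0 + 1
    return min_side <= width <= max_side and min_side <= height <= max_side

def distance_from_face_space(patch, mean_face, eigenfaces):
    """Squared residual after projecting a flattened patch onto orthonormal eigenfaces."""
    centred = [p - m for p, m in zip(patch, mean_face)]
    energy = sum(c * c for c in centred)
    for face in eigenfaces:
        coeff = sum(c * f for c, f in zip(centred, face))
        energy -= coeff * coeff
    return energy

# Tiny example: a 2x2 skin-coloured block and one isolated skin pixel.
S, B = (200, 130, 100), (0, 0, 255)
image = [[S, S, B, B],
         [S, S, B, B],
         [B, B, B, B],
         [B, B, B, S]]
candidates = [box for box in skin_regions(skin_mask(image)) if plausible_size(box)]
# candidates == [(0, 0, 1, 1)]: the lone pixel is rejected as too small
```

In such a design, each remaining candidate would be cut out of the greyscale image, scaled to the eigenface size, and accepted as a face if its distance from face space stays below a threshold chosen during training.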
Challenges
While developing the components for our face detection system, we were confronted with a few challenges specific to the architectural environment:
- file I/O only via FIFO buffers
- limited memory
- debugging at hardware level
- prototypical hardware implementation
- floating point unit not available
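A common workaround for a missing floating point unit is fixed-point arithmetic, where fractional values are stored as scaled integers. The sketch below shows the idea with 8 fractional bits. The source does not say which approach the project took, so the format and all helper names are assumptions chosen for illustration.

```python
# Hypothetical fixed-point helpers; the choice of 8 fractional bits is an assumption.
FRAC_BITS = 8
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0

def to_fixed(value):
    """Convert a number to fixed point (done offline, e.g. on the host)."""
    return int(round(value * ONE))

def fixed_mul(a, b):
    """Multiply two fixed-point numbers using integer operations only."""
    return (a * b) >> FRAC_BITS

def fixed_dot(u, v):
    """Dot product of two fixed-point vectors; rescale once at the end."""
    acc = 0
    for a, b in zip(u, v):
        acc += a * b
    return acc >> FRAC_BITS

def to_float(value):
    """Convert back to a float for inspection (not needed on the device)."""
    return value / ONE

weights = [to_fixed(0.5), to_fixed(0.25)]
pixels = [to_fixed(2), to_fixed(4)]
projection = fixed_dot(weights, pixels)  # 0.5 * 2 + 0.25 * 4 in fixed point
```

Accumulating the raw products and shifting only once keeps rounding error lower than shifting after every multiplication, at the cost of a wider accumulator.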
Results
Main goals
We reached the goal of detecting and marking multiple faces in an image. One test image showed a group of people. In the processed image, white rectangles mark detected faces, black rectangles contain skin-colored regions, and grey rectangles mark regions rejected by the detector. In this sample, 14 of 17 faces (about 82%) were detected, with three false positives.
Efficiency
We compared the energy consumption of our program running on the CoreVA processor with that of a commercially available notebook processor and obtained the following results, where each energy value is the listed power multiplied by the runtime:

| | CoreVA, 100 mW consumption | Intel P8400, 25 W TDP |
|---|---|---|
| Energy consumed | 45 mJ | 250 mJ |
| Runtime | 0.45 s | 0.01 s |
Keep in mind that this comparison is not fully representative and gives only a rough estimate of real values; in particular, the notebook processor is represented by its TDP. The executable on the notebook was compiled with the -O3 switch; without it, it would be 4 times less efficient.
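As a quick arithmetic check, energy is power multiplied by runtime. The sketch below assumes that the 0.45 s runtime belongs to the CoreVA and the 0.01 s runtime to the Intel P8400, which is the pairing under which the power figures reproduce the listed energies. The helper name and the choice of integer units are hypothetical.

```python
# Sketch under stated assumptions; helper name and unit choices are hypothetical.

def energy_microjoule(power_mw, runtime_ms):
    """Energy in microjoules: milliwatts times milliseconds."""
    return power_mw * runtime_ms

coreva_uj = energy_microjoule(100, 450)    # 100 mW for 0.45 s
intel_uj = energy_microjoule(25_000, 10)   # 25 W (TDP) for 0.01 s

print(coreva_uj // 1000, "mJ on the CoreVA")
print(intel_uj // 1000, "mJ on the Intel P8400")
print(round(intel_uj / coreva_uj, 2), "times more energy on the notebook CPU")
```

Under these assumptions the CoreVA takes 45 times longer but uses roughly 5.6 times less energy for the same task.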
Discussion and Conclusion
We were able to get the CoreVA up and running, including data transfer, a set of image processing methods, and face detection. On this system we obtained good results with a non-optimized code base, which forms a solid basis for future development. The eigenfaces algorithm is no longer state of the art, but our results suggest that further work with more sophisticated algorithms can yield better results.
Outlook
As already mentioned, more robust object detection algorithms can be implemented on this hardware. Besides this, there are other areas for enhancement or improvement:
- TFT display on the RAPTOR board
- Ethernet connection to the processor
- real-time object detection in video streams
- using the newly ported GCC for compiling programs