Introduction: Edge Vision AI Development
When you look at the current landscape of vision-based AI solutions in IoT, you will find that more and more companies are using neural networks to process image data at the edge. Processing image data at the edge greatly lowers latency, network traffic, and privacy risks. But what is ‘edge’? The term can mean different things depending on who you ask. In this blog, ‘edge’ refers to where small computing units with sensors are deployed; it is the first place where sensor data can be processed after it is generated. In this type of edge environment, the devices are usually microcontroller units (MCUs) running a real-time operating system (RTOS), and they are severely resource constrained. Thus, the environment where these devices are deployed is often referred to as the ‘constrained edge’. For vision AI processing, these constrained edge devices must be equipped with an image sensor, and they are commonly used as IoT cameras.
But building an edge vision AI solution for these devices is extremely challenging for most developers, because it requires developing both a neural network model and an application highly optimized to run in an extremely constrained environment. This blog provides a high-level summary of what these challenges are, and how the combination of WebAssembly as a sandboxing technology and an AI-capable image sensor is the key to overcoming them.
This blog is about application development with WebAssembly, but there will be another blog following this one that addresses the challenges around the creation of the model.
Challenges in Developing Edge Vision AI Applications
1. Resource Constraints
To develop applications that run on these tiny devices, you have to face the challenges typical of embedded systems development.
The first challenge to mention is an obvious one: these constrained edge devices come with extremely limited resources available for applications. To get an idea, here are some of the popular models:
- STM32 (e.g. STM32F4)
- Arduino (e.g. Arduino Due)
- ESP32 (e.g. ESP32-WROVER)
With image processing, the bottleneck is typically memory because of the size of the image data, and as such, boards with less than 1 MB of RAM are simply not appropriate for vision-based use cases. On the other hand, something like the ESP32 model above, with up to 8 MB of PSRAM and 16 MB of flash, should give us more than enough to run fairly complex vision sensing applications.
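To get a feel for why memory is the bottleneck, here is a rough back-of-the-envelope sketch. The resolutions, pixel formats, and the double-buffering assumption are illustrative choices of ours, not measurements from a specific board, and the calculation ignores everything else the application needs in RAM.

```python
# Rough frame-buffer arithmetic for uncompressed images.
# Resolutions, pixel formats, and RAM budgets are illustrative assumptions.

MIB = 1024 * 1024


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


def fits(frame_size: int, ram_bytes: int, buffers: int = 2) -> bool:
    """True if `buffers` frames fit in RAM, ignoring all other memory use."""
    return frame_size * buffers <= ram_bytes


qvga_rgb565 = frame_bytes(320, 240, 2)  # 16-bit color
vga_rgb888 = frame_bytes(640, 480, 3)   # 24-bit color

for name, size in [("QVGA RGB565", qvga_rgb565), ("VGA RGB888", vga_rgb888)]:
    print(
        f"{name}: {size} bytes per frame, "
        f"double-buffered in 1 MiB: {fits(size, 1 * MIB)}, "
        f"in 8 MiB: {fits(size, 8 * MIB)}"
    )
```

Under these assumptions, a single uncompressed VGA color frame already takes most of 1 MiB, so two of them do not fit, while an 8 MiB budget leaves room to spare.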
It’s important that we maintain a realistic view of the capabilities of these devices before we start thinking about application development, because at some point we will definitely hit the hardware limits. That being said, devices will continue to get more powerful. For example, consider NXP i.MX RT crossover MCUs (e.g. i.MX RT1170), which are still microcontrollers running an RTOS, but ship with significantly higher specs. If you want to develop a highly complex application for an IoT device, these may indeed hit the sweet spot!
2. Device Fragmentation
The second challenge is device fragmentation in the IoT space. If you search the Internet for IoT solutions, you will find a huge variety of RTOS distributions (TinyOS, NuttX, FreeRTOS, Zephyr, seL4, …) as well as many different instruction set architectures (ARM Cortex-M, ARM Cortex-A, RISC-V, MIPS, Xtensa…).
With this much fragmentation, embedded developers continuously need to be aware of the hardware architecture and the operating system for which the application is developed. This is highly undesirable, as you end up writing code tailored to each target environment and generating a catalog of non-portable applications. What we want is to completely decouple applications from hardware so that developers no longer need to be hardware-aware at the time of development, as is mostly the case on the cloud side thanks to virtualization technologies.
3. Security (Memory Safety)
Thirdly, developers need to guard against numerous security threats in IoT. According to “The 2023 IoT Security Landscape Report” by Bitdefender and NETGEAR, 3.6 billion security events were generated from the 120 million IoT devices investigated! Memory safety issues are among the most commonly reported of these security threats. Microsoft has reported in their annual security vulnerabilities survey that “~70% of the vulnerabilities Microsoft assigns a CVE [Common Vulnerabilities and Exposures] each year continue to be memory safety issues.” According to their blog post, the main cause is memory bugs inadvertently introduced into C/C++ code by developers. Unless developers opt for memory-safe languages, this problem will likely persist.
Unfortunately, in embedded systems, C and C++ still dominate. One study reported in the 2023 Embedded Survey revealed that 70% of development is still written in C and C++ (52% and 18%, respectively). Memory safety vulnerabilities are especially concerning in IoT where applications generally do not run in isolation and applying a patch is often not trivial; a single application with carelessly managed memory could easily ‘brick’ the entire device with no easy way to recover.
The dominance of C and C++ is not surprising, as many embedded applications require high performance and low-level optimization for the target hardware. But if embedded application development were no longer restricted to low-level, unsafe languages, we could have a bigger pool of developers in this space, leading to greater potential for innovation.
And this naturally leads to the next topic, which is especially important in vision AI development: Python!
4. Python Support
For AI application developers, Python is the most popular programming language, and one of the reasons is its rich ecosystem of libraries for AI application development. In the Stack Overflow Developer Survey of 2023, Python is the third most popular programming language, and NumPy, Pandas, scikit-learn, and PyTorch made the list as some of the most popular libraries.
These surveys show a clear gap between AI and embedded developers when it comes to the choice of programming languages and frameworks. It is this way for good reasons, of course, as AI and embedded development have been done completely separately by differently skilled developers tackling different kinds of problems. But for edge AI application development to flourish, this gap needs to start closing!
This is not to say that edge AI application development can only be done with Python. It’s just that Python deserves extra attention because of its popularity among AI developers.
WebAssembly (Wasm) as a Solution
To overcome these challenges for edge vision AI development, after looking into several options, we came to a firm conclusion that Wasm, short for WebAssembly, is exactly what we were looking for!
What is WebAssembly?
Wasm is a portable binary instruction format and a compilation target for many programming languages (C, C++, Rust, Go, and many more!). It provides application sandboxing that is orders of magnitude smaller than containers, near-native performance, and a true polyglot programming experience for developers. It also comes with a strong security model in which a Wasm module is allowed to access resources outside of its sandbox only through explicit declarations of interface imports and exports; each module, therefore, is only allowed to access its own linear memory within the sandbox.
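To make the idea of explicit imports a little more concrete, here is a toy model in Python. It is not the API of any real Wasm runtime: `instantiate`, `LinkError`, and the host function names are all hypothetical, made up for illustration. The point it tries to capture is that a module only ever receives the host functions it declared, and instantiation fails if a declared import is not provided.

```python
# Toy model of Wasm import resolution. NOT a real runtime API:
# `instantiate`, `LinkError`, and the host function names are hypothetical.


class LinkError(Exception):
    """Raised when a declared import is not provided by the host."""


def instantiate(declared_imports, host_functions):
    """Return only the host functions the module declared; fail if any is missing."""
    missing = [name for name in declared_imports if name not in host_functions]
    if missing:
        raise LinkError(f"unresolved imports: {missing}")
    return {name: host_functions[name] for name in declared_imports}


# Everything the host is able to offer (hypothetical names).
host = {
    "sensor.read_frame": lambda: b"\x00" * 16,
    "net.send": lambda payload: len(payload),
}

# This module declares only the sensor import, so it never receives net.send.
linked = instantiate(["sensor.read_frame"], host)
print(sorted(linked))
```

A module that declared an import the host does not offer, say a hypothetical `fs.open`, would fail at instantiation time in this model rather than at some later point during execution.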
Wasm officially began its life in 2017 with the initial goal of running high-performance applications safely inside the browser (superseding the earlier effort by Mozilla with asm.js). Wasm is also a W3C standard. While it was originally targeting the web, its design from the start never excluded it from being used outside the web. Indeed, with its lightweight and strong security design, Wasm is a great fit for IoT. According to The State of WebAssembly 2023 published by SlashData™ and the Linux Foundation, IoT is already the third most common usage of Wasm.
One of the main benefits of Wasm on IoT is the application portability. The concept of ‘Write Once, Run Anywhere’ is very appealing for largely fragmented IoT devices! In order to achieve this level of portability, however, the runtime must support different architectures and operating systems.
Wasm Runtime for Constrained Edge
Wasm modules can run anywhere as long as there’s a Wasm runtime installed, but is there a runtime small enough for microcontrollers? The answer is yes!
Midokura is a proud member of the Bytecode Alliance, a non-profit organization focusing on the actual implementation of the Wasm specifications. One of the projects in the Bytecode Alliance is WAMR (WebAssembly Micro Runtime), a Wasm runtime targeting microcontrollers that is fully compliant with the W3C Wasm MVP. It is super lightweight, has rich support for operating systems and architectures, and supports ahead-of-time (AOT) compilation, which is essential for the performance required in many IoT use cases.
👉 How small is the runtime? The runtime binary size to execute AOT is only ~29.4K!
With WAMR, it becomes possible to run applications on microcontrollers with all the benefits of Wasm!
WASI: Application Portability in Constrained Edge
As mentioned earlier, IoT devices are extremely heterogeneous, and writing a portable application is very difficult. One key benefit of using Wasm for IoT is the application portability, but is it truly portable? Currently, not in all cases.
While Wasm’s binary format itself is portable, that is not enough to achieve true portability without standard interfaces for the modules. WASI (WebAssembly System Interface), which is developed by a subgroup of the W3C WebAssembly Community Group and defines a set of standard APIs for Wasm modules, does just that. Through WASI, modules can interact with other modules and access host resources through common APIs. Currently, WASI includes a mix of POSIX-like APIs, such as file I/O (wasi-io), and higher-level APIs, such as HTTP (wasi-http). It even includes a set of machine learning APIs called wasi-nn. To give you a better idea of the state of WASI, the list of most anticipated WASI features according to The State of WebAssembly 2023 includes SQL, key-value store, runtime config, and many others.
But WASI is still at an early maturity level, and it does not yet offer all the interfaces needed by edge vision AI applications. For example, these applications typically need to interact with the image sensor and computer vision libraries on the host to extract and manipulate images, but at this moment, there are no such interfaces. Defining standard interfaces for these operations is the key to achieving true portability. With WASI continuing to make rapid progress with each release, we are confident that these interfaces will be standardized sooner rather than later. There are already some proposals and discussions around this topic (wasi-i2c and wasi-sensor), and we expect these activities to pick up in the near future.
Securing IoT with Wasm Sandbox
The security model of Wasm brings significant security benefits in IoT. Within the Wasm sandbox, each module gets its own memory, and is restricted from accessing any resources outside its sandbox. This means that while Wasm does not prevent code with memory bugs from running and wreaking havoc within the sandbox, it does effectively prevent such code from causing harm outside the sandbox. Because the fault domain is limited, a single faulty or rogue application cannot ‘brick the device’. The memory isolation of Wasm is particularly beneficial for MCUs because they typically lack a memory management unit (MMU). Furthermore, patching is much easier and more efficient, as you only need to replace the offending module, whereas before you would most likely have needed a full firmware OTA update!
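The fault-domain idea can be illustrated with a small toy model. This is only a sketch in Python, not how WAMR or any other runtime is implemented, and `LinearMemory` and `Trap` are names we made up. What it borrows from Wasm is that linear memory is sized in 64 KiB pages and that an out-of-bounds access traps instead of touching memory that belongs to someone else.

```python
# Toy model of per-module linear memory with bounds checking.
# `LinearMemory` and `Trap` are hypothetical names, not a runtime API.

PAGE_SIZE = 64 * 1024  # Wasm linear memory is sized in 64 KiB pages


class Trap(Exception):
    """Stands in for a Wasm trap: the faulting module stops, the host survives."""


class LinearMemory:
    def __init__(self, pages: int):
        self._data = bytearray(pages * PAGE_SIZE)

    def _check(self, addr: int, length: int) -> None:
        # Every access is validated against this module's own memory only.
        if addr < 0 or length < 0 or addr + length > len(self._data):
            raise Trap(f"out-of-bounds access at {addr} (+{length} bytes)")

    def store(self, addr: int, payload: bytes) -> None:
        self._check(addr, len(payload))
        self._data[addr:addr + len(payload)] = payload

    def load(self, addr: int, length: int) -> bytes:
        self._check(addr, length)
        return bytes(self._data[addr:addr + length])


# Two modules, two separate memories.
camera_app = LinearMemory(pages=1)
other_app = LinearMemory(pages=1)
other_app.store(0, b"important state")

try:
    # A buggy write that runs past the end of camera_app's memory.
    camera_app.store(PAGE_SIZE - 4, b"12345678")
except Trap as trap:
    print("camera_app trapped:", trap)

# The other module's memory is untouched by the faulty write.
print(other_app.load(0, 15))
```

In this model the overflowing write is rejected as a whole, and there is simply no address the buggy module could use to reach the other module's data.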
Security is further enhanced with the introduction of the Component Model, but since it is not supported by WAMR at the time of this writing, I will not expand on it.
Python Support and MicroPython for Wasm
Hopefully I have convinced you that Wasm and WASI, along with WAMR, create a great execution environment for edge vision AI applications on constrained edge devices. But how easy is it really to develop an edge AI Wasm module? With Wasm’s rich support for high-level languages, it should be easier than before, when C and C++ were the only options. It is not that simple, however, because this depends largely on how well each programming language’s toolchain supports Wasm. And in the context of edge AI application development, we want to put our focus on Python.
To be clear, Python code can already compile into Wasm, and indeed, it is the fourth most commonly used programming language according to The State of WebAssembly 2023 report.
However, if you want to compile a Python application to Wasm with the full feature set of Python, you need to compile it together with the entire CPython interpreter. When we tried it, the size of the AOT binary came out to be ~20MB, which is clearly too big to run on microcontrollers. While the Python community continues to give great attention to improving Wasm support, a different approach must be considered to run Python on microcontrollers.
One such approach might be MicroPython, which is a lean implementation of Python 3 that includes a small subset of the standard library and is optimized for microcontrollers. It is lightweight (256K of code space and 16K of RAM), includes a numpy-like library (ulab), and the AOT size came out to be only 1.5MB! Even though we still need to do more investigation and optimization, we are very encouraged by what we have seen so far. It no longer seems far-fetched to run an edge AI Python application on a microcontroller!
AI Processing with Specialized Hardware
So far, I’ve only covered the software side of solving these challenges, and ignored the hardware and anything around AI. It is important to keep in mind that no matter how much you manage to shrink the software size, in many instances the device may just be too resource constrained for what you want to do. In that case, an effective solution is to rely on special hardware to offload the AI processing of the images.
For example, Sony’s IMX500 image sensor is an intelligent vision sensor that is equipped with an AI processing functionality, which allows for both the ISP (Image Signal Processor) and AI processing on signals acquired by the pixel chip! In this case, since the inference is performed on the sensor itself, the application is only responsible for post-processing of the result, which greatly reduces the resource requirement.
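To give an idea of how small the application’s share of the work can become, here is a minimal sketch of a post-processing step. The output format is an assumption on our part: we pretend the sensor hands the application a list of detections with a class label, a confidence score, and a normalized bounding box. The `Detection` type and both functions are hypothetical and do not reflect the actual IMX500 output format or SDK.

```python
# Minimal post-processing sketch. The detection format is an assumption,
# not the actual IMX500 output; all names here are hypothetical.
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: int
    score: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), 0..1


def postprocess(detections: List[Detection], min_score: float = 0.5,
                max_results: int = 10) -> List[Detection]:
    """Keep confident detections, best first, capped at `max_results`."""
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:max_results]


def to_pixels(box: Tuple[float, float, float, float],
              width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates for a given frame size."""
    x0, y0, x1, y1 = box
    return (int(x0 * width), int(y0 * height), int(x1 * width), int(y1 * height))


raw = [
    Detection(label=1, score=0.30, box=(0.10, 0.10, 0.20, 0.20)),
    Detection(label=2, score=0.90, box=(0.25, 0.50, 0.75, 1.00)),
    Detection(label=1, score=0.60, box=(0.00, 0.00, 0.50, 0.50)),
]

for det in postprocess(raw):
    print(det.label, det.score, to_pixels(det.box, 640, 480))
```

Nothing here touches pixel data: the application works on a handful of numbers per frame, which is why the resource requirement drops so sharply when inference runs on the sensor.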
If you are interested in getting started with IMX500, you can either get an ESP32-based, IMX500-equipped camera or an IMX500-integrated Raspberry Pi AI Camera module and attach it to a compatible Raspberry Pi.
Conclusion
The intention of this blog is to give a high-level overview of why Wasm is a great choice for developers interested in building edge vision AI applications. I hope it raised your interest in this topic! In the next part, my colleague will explain in detail the challenges of AI in the constrained edge environment, and how working with the aforementioned IMX500 sensor opens up the possibility of running true edge vision AI applications on tiny IoT cameras. We will follow that up with more blogs to show how Wasm edge vision AI application development can be done with open source tools and SDKs. Stay tuned!