RPMsg-Based Inter-Core Communication in Heterogeneous Embedded Systems

Abstract

In heterogeneous multicore System-on-Chip (SoC) devices, communication between processor cores with different execution environments is a fundamental requirement. Remote Processor Messaging (RPMsg) provides a framework for message-based inter-core communication, enabling efficient and deterministic data exchange between general-purpose and real-time processors.

Introduction

Heterogeneous multicore architectures integrate processors with different capabilities on a single silicon die to meet the requirements of modern embedded systems. High-performance application processors (e.g., ARM Cortex-A) typically run Linux and handle computational workloads, while low-power microcontrollers (e.g., ARM Cortex-M) execute real-time tasks under an RTOS or bare metal.

Figure 1. Communication between applications running on different systems - Linux, Nucleus RTOS, and bare-metal

Reliable communication between these cores is essential, but their differing execution models make direct interaction complex. RPMsg addresses this challenge by providing a lightweight and standardized inter-processor communication (IPC) mechanism. It enables message exchange between Linux-based host processors and real-time cores and is a key component of many heterogeneous SoC designs.

Figure 2. Different communication settings between the host and the remote core

Remote Processor Messaging Overview

RPMsg operates on top of VirtIO and shared memory, supporting asynchronous, bidirectional communication between a host processor (e.g. Cortex-A core) and a remote processor (e.g. Cortex-M core).

Each processor establishes an endpoint identified by a unique address for each logical communication channel it requires. Message queues are implemented as VirtIO vrings in shared memory, handling buffer allocation and synchronization.

The host processor runs the VirtIO-RPMsg driver under Linux, initializing communication and setting up shared memory. The remote processor runs the RPMsg implementation (commonly through OpenAMP), registers endpoints, and waits for messages.

Typical RPMsg communication flow:

  1. Initialization – Firmware loading on the remote core and shared memory setup.
  2. Endpoint creation – Logical channels established on both sides.
  3. Message exchange – Bidirectional transmission through vring buffers.
  4. Buffer reuse – Upon message consumption, processed buffers are released and returned to the pool.

Example: Data Exchange Between Cores

A common implementation involves a Cortex-M core reading sensor data in real time and sending it to a Cortex-A core for processing.

  • The Cortex-M core collects sensor data and transmits it via RPMsg.
  • The Cortex-A core receives, processes, and optionally responds to the message.

Buffers used for RPMsg can reside in SRAM or DDR. Data from the kernel is typically exposed to user space through the TTY driver.

Figure 3. Typical RPMsg communication between Cortex-A and Cortex-M cores

Technical Challenges

1. Systems Without MPU

Problem Description

The target system lacked a Memory Protection Unit (MPU). An MPU allows memory segmentation with configurable access permissions, ensuring that user applications cannot access restricted regions. Without it, software mechanisms must ensure safe use of shared memory.

Solution

A Hardware Abstraction Layer (HAL) was introduced to control access to shared memory via defined interfaces:

hal_shared_memory_write();
hal_shared_memory_read();

Developers define:

  • The base address of the shared memory region.
  • The maximum payload size.
  • The number of message slots in shared memory.

Implementation

The write function accepts a message index, data pointer, and data length. It validates parameters before writing to shared memory. Invalid indices or oversized payloads are rejected.

The read function accepts an index, destination pointer, and data length. It verifies index validity before reading data from the corresponding offset. This approach isolates shared memory and prevents user-space access to unintended regions.

2. Periodic Message Transmission

Problem Description

The system required periodic message transfer between cores. However, the standard TTY-based RPMsg driver could not maintain synchronization because it treats data as a continuous byte stream. The TTY interface (/dev/ttyRPMSGx) lacks message boundary awareness, which can result in fragmented messages and lost synchronization.

Solution: Custom Character Driver

A custom character driver was developed to replace the stream-oriented TTY interface. This driver is frame-oriented: it treats each RPMsg message as a complete packet and exposes it to user space through the /dev/rpmsg_data virtual file.

Each user-space read or write corresponds to one complete message. This design guarantees message synchronization and simplifies debugging, as messages can be inspected individually.

Driver Selection and Implementation

Two configurations were evaluated:

  1. Disable the TTY driver entirely and use only the custom character driver.
  2. Support both drivers with runtime selection capability.

The second option was implemented, allowing driver selection using a kernel module parameter (mux_mode). Based on the selected mode, the probe function initializes the corresponding driver.

Implementation Steps

  1. Register a callback in the RPMsg driver probe function.
  2. Implement rpmsg_char_read() and rpmsg_char_write() for user-space communication.
  3. Register these functions in the file operations structure.

Example


// define module parameters for driver selection
static int mux_mode = 0;
module_param(mux_mode, int, 0644);
MODULE_PARM_DESC(mux_mode, "Set to 1 for CHAR mode, 0 for TTY mode");

// callback function that adds incoming data to a FIFO
static int rpmsg_char_cb(struct rpmsg_device *rpdev, void *data, int len, void *priv, u32 src) {
    enqueue_message(&rx_queue, data, len);  // copy the payload into the receive FIFO
    wake_up_interruptible(&read_queue);     // wake up processes blocked in read()
    return 0;
}

// probe selects the driver type based on mux_mode
static int probe(struct rpmsg_device *rpdev) {
    int ret = 0;
    if (mux_mode == 1) {
        ret = rpmsg_char_init(rpdev);
    } else {
        ret = rpmsg_tty_init(rpdev);
    }
    return ret;
}

// the char driver init function creates an endpoint and registers the callback
int rpmsg_char_init(struct rpmsg_device *rpdev) {
    struct rpmsg_channel_info chinfo = {
        .src = RPMSG_ADDR_ANY,
        .dst = RPMSG_ADDR_ANY,
    };
    rpdev->ept = rpmsg_create_ept(rpdev, rpmsg_char_cb, NULL, chinfo);
    if (!rpdev->ept)
        return -ENOMEM;
    // register and create the character device afterwards
    return 0;
}

// RPMsg char read function, called when user-space calls read()
static ssize_t rpmsg_char_read(struct file *fp, char __user *buf, size_t len, loff_t *off) {
    // dequeue one message from the FIFO (blocking on read_queue until one
    // is available), then copy it to user space with copy_to_user()
}

// RPMsg char write function, called when user-space calls write()
static ssize_t rpmsg_char_write(struct file *fp, const char __user *buf, size_t len, loff_t *off) {
    // copy data from user space to kernel buffer, copy_from_user can be used
    // send data to remote processor using rpmsg_send

    // NOTE: this function is optional; implement it if two-way communication is required
}

// rpmsg_char_read and rpmsg_char_write must be registered in file operations structure
static const struct file_operations rpmsg_fops = {
    .owner   = THIS_MODULE,
    .read    = rpmsg_char_read,
    .write   = rpmsg_char_write,
    .open    = nonseekable_open,
    .llseek  = no_llseek,
};

Figure 4. Bidirectional communication

Future Work

Further improvements may include:

  • Dynamic buffer sizing – Modify the VirtIO RPMsg driver to support multiple vrings with variable buffer sizes, optimizing memory allocation.
  • Data integrity verification – Introduce validation using magic numbers or checksums (e.g., CRC-32) to detect and prevent data corruption in shared memory.

Conclusion

RPMsg provides a compact and efficient framework for inter-core communication in heterogeneous SoCs. It achieves a good balance between simplicity and performance, making it well-suited for systems that combine general-purpose and real-time processing.

In environments lacking hardware protection or relying on stream-based drivers, additional mechanisms are necessary to maintain reliable communication. The combination of a software-based protection layer and a frame-oriented communication driver ensures deterministic, safe, and synchronized inter-core data exchange.
