The main task I worked on over the summer was to add CRC support to vkms module. In this post, I’ll explain the general requirements needed to add CRC API support to a DRM driver, then I’ll talk about the specific details related to adding CRC support to the virtual driver vkms.

CRC API in DRM provide to userspace CRC checksums of the final frame shown on the display.

Testing the CRC is extensively used in test stuites such as IGT, which expects frames that are identical to have identical CRC values [1].

Typically, CRC entries are provided by the hardware in which the driver would poll for every frame. Examples of GPU drivers that implement the CRC API includes: i915, Rockchip, amdgpu.

Given that vkms is a virtual driver, we needed a way to access the framebuffer and then compute a unique CRC for each frame. This gave me an opportunity to get familiar with the framebuffer abstraction and how it is mapped to the final visual output displayed on the screen.

This post is structured as follow:

# DRM CRC API

Generally, CRC (Cyclic Redundancy Check) is an error-detecting code that gets attached to a data frame so that the recieving system can check for any corruption to the data by recomputing the CRC value again and comparing the new value with the attached one.

In DRM subsystem, since the hardware usually provides a CRC value, recomputing the checksum again on the framebuffer is not feasible. For that, the only way to detect if a frame is valid is by comparing CRC values for frames that are expected to have similar contents.

The DRM exposes CRC API through the debugfs directory at dri/0/crtc-N/crc, where the N represents the index of the CRTC (output). It adds two files per CRTC, control and data files.

$ls /sys/kernel/debug/dri/0/crtc-0/crc control data  The userspace can specify the source of the frame CRCs by writing to the control file. The default value is auto, which let the driver select a default source. For example, in addition to the default value, i915 driver support the following sources: {plane1, plane2, pf, pipe, TV, DP-B, DP-C, DP-D}. To add CRC API support, a driver need to implement (verify_crc_source(), set_crc_source()) callbacks in drm_crtc_func to check if the specified source is supported by the driver, and if so, then to start gerenating frame CRCs or stop them if the specified source is NULL. Part of the callbacks implementation, the driver has to specify the number of CRC values the driver would provide per entry by updating values_cnt. Then, adding CRC entries can be done by callingdrm_crtc_add_crc_entry(), which takes as input an array of CRC values, and the number of the frame the CRC is computed from if that is supported by the driver, otherwise, the field would be filled with XXXXXXXX instead. For example, i915 has 5 CRC values per entry as we can see below. For each line, the first column eg. 0x011346cc indicates the frame number associated with a framebuffer the CRC values have been computed against and the next values_cnt=5 columns represent the CRC values of the framebuffer. $ root@Haneen:/sys/kernel/debug/dri/0/crtc-0/crc# cat data
0x011346cc 0xd7aec7a9 0x00000000 0x00000000 0x00000000 0x00000000
0x011346cd 0xd7aec7a9 0x00000000 0x00000000 0x00000000 0x00000000
0x011346ce 0x7892d600 0x00000000 0x00000000 0x00000000 0x00000000
0x011346cf 0x7892d600 0x00000000 0x00000000 0x00000000 0x00000000
....


# Adding CRC API to vkms

Since vkms is not associated with a particular hardware, we had the liberty to decide how many CRC values vkms would provide per entry, and the method to compute the CRC.

Using crc32_le function we compute one CRC value per entry (it didn’t matter if the system is little endian or not as long as the reported CRC is unique per fb content)

In addition, we need a way to access the framebuffer regularly, compute CRC, and add it through drm_crtc_add_crc_entry(). To solve that, we used a combination of an hrtimer callback and an ordered workqueue.

### Compute CRC value

To compute CRC checksum for a framebuffer, the function crc32_le expects a pointer to the address where the buffer starts at in the memory, and the size of the buffer.

 u32 crc32_le(u32 crc, unsigned char const *p, size_t len) 

To do that, we had to map the backing memory of the framebuffer to the kernel’s virtual address space first. That was acomplished by the help of vmap function which maps a set of pages to the kernel’s virtual address space. The following two patches addressed that issue [2, 3].

Since the CRC is computed against the visible part of the framebuffer, it’s crucial to understand how a framebuffer is mapped in memory.

 crc = function(visible portion of fb) 

First, a framebuffer (struct drm_framebuffer) has a 2D abstraction which is described by a width, height, and a pitch attribues. The width and height represents how many pixels are present horizantally and vertically respectively, whereas the pitch represents one single row of pixels in bytes in addition to some padding bytes.

There are multiple ways a 2D framebuffer can be stored in a 1D memory layout (called modifiers) with the linear format being the easiest type to understand. In the linear format, a framebuffer rows are stored in increasing memory locations [4].

The final displayed frame on screen can be a subset of the framebuffer’s content, as well as the result of blending multiple framebuffers together (ex. blending a framebuffer with the cursor image with another framebuffer that displays the layouted windows on screen).

To describe the subset of framebuffer that would be visible and where it should be mapped to in the screen space, the DRM subsystem uses an abstraction called plane represented by: struct drm_plane.

There are three main types of plane abstraction in DRM: Primary, Cursor, and Overlay. Each driver must provide one primary plane per display output (CRTC).

Planes describe how the framebuffer is clipped or scaled out. It specifies from where in the framebuffer the image should be mapped from (x, y, width, height) by using src_ prefix, and crtc_ to describe the destination of the image in the screen space.

To figure out where the pixel data starts, framebuffers stores offsets value that describes when the actual pixel data for this framebuffer plane starts. Offset value is helpful for when multiple planes are allocated within the same backing storage memory.

In summary, the framebuffer abstraction stores the backing storage information (source of pixels), which it feeds it to one or multiple planes to be blended with other planes to construct the final image that would be displayed on a screen.

Adding all the above together, we can compute a CRC value over the content of a framebuffer by iterating over the pixels of the final displayed frame as follow:

for (i = src_y; i < src_y + src_h; ++i) {
for (j = src_x; j < src_x + src_w; ++j) {
v_offset = i * pitch;
h_offset = j * cpp /* bytes per pixel */;
src_offset = offset + v_offset + h_offset;
crc = crc32_le(crc, vaddr + src_offset, sizeof(u32));
}
}


### Report CRC value to userspace

So far we’ve managed to figure out how to compute a CRC value per framebuffer. We still need to actually call the function and append the value to the data file using: drm_crtc_add_crc_entry() at the end of a vblank.

Usually, drivers add the function call along with their vbalnk interrupt handle [5]. For example, amd GPU call drm_crtc_add_crc_entry() from within dm_crtc_high_irq() interrupt handle.

static void dm_crtc_high_irq(void *interrupt_params)
{
...
drm_handle_vblank(adev->ddev, crtc_index);
amdgpu_dm_crtc_handle_crc_irq(&acrtc->base); /* calls drm_crtc_add_crc_entry() */
}


But again, since vkms is a virtual driver that is not associated with a particular hardware that can derive a regular vblank interrupt from, we had to simulate vblank interrupt using hrtimer and call a handler at a regular intervals. This issue was addressed by Rodrigo and the details about the implementation can be found here [6]. The following is a snippet from the code where it initializes the hrtimer:

drm_calc_timestamping_constants(crtc, &crtc->mode);
hrtimer_init(&out->vblank_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
out->vblank_hrtimer.function = &vkms_vblank_simulate;
out->period_ns = ktime_set(0, vblank->framedur_ns);
hrtimer_start(&out->vblank_hrtimer, out->period_ns, HRTIMER_MODE_REL);


Since hrtimer callback shouldn’t include heavy computations, we have used an ordered workqueue to start computing CRC value and add the value to the data file.

The ordered workqueue ensures that the CRC computations callback are run in the order thay’re submitted.

 queue_work(output->crc_workq, &state->crc_workq); 

With all of the above (plus taking into account synchronization issues) vkms can support CRC API and the specific details can be found in the final CRC patch [7].


root@haneenDRM: modprobe vkms
<.. submit fb contents (eg. using igt tests) ..>
root@haneenDRM: cat /sys/kernel/debug/dri/0/crtc-0/crc/control
auto
root@haneenDRM: cat /sys/kernel/debug/dri/0/crtc-0/crc/data
0x00121580 0x4c1cc376
0x00121581 0x4c1cc376
0x00121582 0x00000000
0x00121587 0x7c9b032e
0x00121588 0x7c9b032e