The main task I worked on over the summer was adding CRC support to the vkms module.
In this post, I’ll explain the general requirements for adding CRC API support to a DRM driver,
then I’ll go into the specific details of adding CRC support to the virtual driver vkms.
The CRC API in DRM provides userspace with CRC checksums of the final frame shown on the display.
CRC checking is used extensively in test suites such as IGT, which expect identical frames to have identical CRC values [1].
Typically, CRC values are computed by the hardware, which the driver polls for every frame. Examples of GPU drivers that implement the CRC API include i915, Rockchip, and amdgpu.
Given that vkms is a virtual driver, we needed a way to access the framebuffer and then compute a unique CRC
for each frame. This gave me an opportunity to get familiar with the framebuffer abstraction and how it is mapped to the
final visual output displayed on the screen.
This post is structured as follows:
DRM CRC API
Generally, a CRC (Cyclic Redundancy Check) is an error-detecting code attached to a data frame so that the receiving system can detect corruption by recomputing the CRC value and comparing it with the attached one.
In the DRM subsystem, the hardware usually provides the CRC value, so recomputing the checksum over the framebuffer for comparison is not feasible. Instead, the only way to check that a frame is valid is to compare the CRC values of frames that are expected to have the same contents.
DRM exposes the CRC API through the debugfs directory dri/0/crtc-N/crc, where N is the index of the CRTC (output).
It adds two files per CRTC: control and data.
$ ls /sys/kernel/debug/dri/0/crtc-0/crc
control data
Userspace can select the source of the frame CRCs by writing to the control file.
The default value is auto, which lets the driver select a default source.
For example, in addition to the default value, the i915 driver supports the following sources: {plane1, plane2, pf, pipe, TV, DP-B, DP-C, DP-D}.
To add CRC API support, a driver needs to implement the verify_crc_source() and set_crc_source() callbacks
in struct drm_crtc_funcs to check whether the specified source is supported by the driver and, if so, to start generating
frame CRCs, or to stop generating them if the specified source is NULL.
As part of the callback implementation, the driver has to report the number of
CRC values it provides per entry by updating values_cnt.
CRC entries are then added by calling drm_crtc_add_crc_entry(),
which takes as input an array of CRC values and the number of the frame the CRC was computed from,
if the driver supports that; otherwise, the frame field is filled with XXXXXXXX instead.
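The callback contract described above can be modelled in plain userspace C. This is only a sketch: `struct mock_crtc`, `mock_verify_crc_source()` and `mock_set_crc_source()` are hypothetical stand-ins rather than the kernel's types, and supporting a single `auto` source with one CRC value per entry is an assumption about what a simple virtual driver might choose.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for per-CRTC driver state (not struct drm_crtc). */
struct mock_crtc {
	bool crc_enabled;
};

/* Accept only NULL (disable) or "auto"; report one CRC value per entry. */
int mock_verify_crc_source(struct mock_crtc *crtc, const char *source,
			   size_t *values_cnt)
{
	(void)crtc;
	assert(values_cnt != NULL);

	if (source && strcmp(source, "auto") != 0)
		return -EINVAL; /* unsupported source */
	*values_cnt = 1;
	return 0;
}

/* Start CRC generation for a supported source, stop it when source is NULL. */
int mock_set_crc_source(struct mock_crtc *crtc, const char *source)
{
	size_t values_cnt;
	int ret = mock_verify_crc_source(crtc, source, &values_cnt);

	if (ret)
		return ret;
	crtc->crc_enabled = (source != NULL);
	return 0;
}
```

In this model, writing an unknown source name would be rejected with -EINVAL, while NULL simply turns CRC generation off.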
For example, i915 has 5 CRC values per entry, as we can see below.
In each line, the first column (e.g. 0x011346cc) is the frame number
associated with the framebuffer the CRC values were computed against,
and the next values_cnt=5 columns are the CRC values of that framebuffer.
$ root@Haneen:/sys/kernel/debug/dri/0/crtc-0/crc# cat data
0x011346cc 0xd7aec7a9 0x00000000 0x00000000 0x00000000 0x00000000
0x011346cd 0xd7aec7a9 0x00000000 0x00000000 0x00000000 0x00000000
0x011346ce 0x7892d600 0x00000000 0x00000000 0x00000000 0x00000000
0x011346cf 0x7892d600 0x00000000 0x00000000 0x00000000 0x00000000
....
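A userspace consumer has to split each line of the data file into a frame number and its CRC values. The following is a minimal sketch of such a parser. `struct crc_entry`, `MAX_CRC_VALUES` and `parse_crc_line()` are hypothetical names, and the line layout is assumed to be exactly what the sample above shows: whitespace-separated hexadecimal fields, with a run of `X` in place of the frame number when none is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CRC_VALUES 10 /* hypothetical upper bound for this sketch */

struct crc_entry {
	bool has_frame;   /* false when the frame field is XXXXXXXX */
	uint32_t frame;
	size_t n_values;
	uint32_t values[MAX_CRC_VALUES];
};

/* Parse one line of the data file; returns 0 on success, -1 on bad input. */
int parse_crc_line(const char *line, struct crc_entry *out)
{
	char *end;

	assert(line != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* First field: frame number, or a run of 'X' if unsupported. */
	if (*line == 'X') {
		line += strspn(line, "X");
	} else {
		out->frame = (uint32_t)strtoul(line, &end, 16);
		if (end == line)
			return -1;
		out->has_frame = true;
		line = end;
	}

	/* Remaining fields: the CRC values themselves. */
	while (out->n_values < MAX_CRC_VALUES) {
		uint32_t v = (uint32_t)strtoul(line, &end, 16);

		if (end == line)
			break; /* no more numbers on this line */
		out->values[out->n_values++] = v;
		line = end;
	}
	return out->n_values > 0 ? 0 : -1;
}
```

Two consecutive entries can then be compared value by value to tell whether the displayed frame changed.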
Adding CRC API to vkms
Since vkms is not tied to any particular hardware, we were free to decide how many CRC values vkms would provide per entry and how to compute them.
Using the crc32_le function, we compute one CRC value per entry
(it doesn’t matter whether the system is little endian or not, as long as the reported CRC is unique per fb content).
In addition, we need a way to access the framebuffer regularly, compute the CRC, and add it through drm_crtc_add_crc_entry().
To solve that, we used a combination of an hrtimer callback and an ordered workqueue.
Compute CRC value
To compute the CRC checksum of a framebuffer, the function crc32_le
expects a pointer to the address where the buffer starts in memory,
and the size of the buffer.
u32 crc32_le(u32 crc, unsigned char const *p, size_t len)
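For readers without a kernel tree at hand, the following userspace sketch computes the same kind of checksum bit by bit. It assumes the reflected CRC-32 polynomial 0xEDB88320 and, like the signature above, takes the running CRC as a seed so that a buffer can be processed in pieces. `crc32_le_sw()` is a hypothetical name, and the loop is written for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE 0xEDB88320u /* reflected CRC-32 polynomial */

/* Fold len bytes at p into the running CRC, least significant bit first. */
uint32_t crc32_le_sw(uint32_t crc, const unsigned char *p, size_t len)
{
	assert(p != NULL || len == 0);

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32_POLY_LE : 0);
	}
	return crc;
}
```

Seeding with 0xFFFFFFFF and inverting the result gives the well-known CRC-32 check value 0xCBF43926 for the ASCII string "123456789". Feeding a buffer in two chunks, passing the first result as the seed of the second call, yields the same value as a single call.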
To do that, we first had to map the backing memory of the framebuffer into the kernel’s virtual address space.
That was accomplished with the help of the vmap function, which maps
a set of pages into the kernel’s virtual address space. The following two patches
addressed that issue [2, 3].
Since the CRC is computed against the visible part of the framebuffer, it’s crucial to understand how a framebuffer is mapped in memory.
crc = function(visible portion of fb)
First, a framebuffer (struct drm_framebuffer) has a 2D abstraction described by width, height, and pitch attributes.
The width and height give the number of pixels horizontally and vertically, respectively,
whereas the pitch is the size in bytes of a single row of pixels, including any padding bytes.
There are multiple ways a 2D framebuffer can be stored in a 1D memory layout (described by modifiers),
with the linear format being the easiest to understand.
In the linear format, a framebuffer’s rows are stored at increasing memory locations [4].
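As a small worked example of the linear layout, the helpers below derive a padded pitch and the byte offset of a pixel. The row alignment is purely an assumption for illustration (real drivers choose their own), and `linear_pitch()` and `linear_pixel_offset()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per row: width * cpp rounded up to a power-of-two alignment. */
size_t linear_pitch(uint32_t width, uint32_t cpp, size_t align)
{
	size_t row = (size_t)width * cpp;

	assert(align != 0 && (align & (align - 1)) == 0);
	return (row + align - 1) & ~(align - 1);
}

/* Byte offset of pixel (x, y) from the start of a linear buffer. */
size_t linear_pixel_offset(uint32_t x, uint32_t y, size_t pitch, uint32_t cpp)
{
	return (size_t)y * pitch + (size_t)x * cpp;
}
```

With 4 bytes per pixel and an assumed 64-byte alignment, a 5-pixel-wide row holds 20 bytes of pixel data but has a pitch of 64, so pixel (2, 3) sits at byte 3 * 64 + 2 * 4 = 200.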
The final frame displayed on screen can be a subset of a framebuffer’s content, as well as the result of blending multiple framebuffers together (e.g. blending a framebuffer holding the cursor image with another framebuffer that displays the laid-out windows).
To describe which subset of a framebuffer is visible and where it should be mapped to in screen space,
the DRM subsystem uses an abstraction called a plane, represented by struct drm_plane.
There are three main types of plane abstraction in DRM: Primary, Cursor, and Overlay. Each driver must provide one primary plane per display output (CRTC).
Planes describe how the framebuffer is clipped or scaled.
A plane specifies which part of the framebuffer the image is taken from (x, y, width, height) using the src_ prefix,
and uses the crtc_ prefix to describe the destination of the image in screen space.
To figure out where the pixel data starts, a framebuffer stores an offsets value
that describes where the actual pixel data for each framebuffer plane starts.
The offset is helpful when multiple planes are allocated within the same backing storage.
In summary, the framebuffer abstraction stores the backing storage information (the source of pixels) and feeds it to one or more planes, which are blended with other planes to construct the final image displayed on the screen.
Putting all of the above together, we can compute a CRC value over the content of a framebuffer by iterating over the pixels of the final displayed frame as follows:
for (i = src_y; i < src_y + src_h; ++i) {
	for (j = src_x; j < src_x + src_w; ++j) {
		v_offset = i * pitch;
		h_offset = j * cpp /* bytes per pixel */;
		src_offset = offset + v_offset + h_offset;
		crc = crc32_le(crc, vaddr + src_offset, sizeof(u32));
	}
}
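The loop can be exercised outside the kernel with a userspace sketch such as the one below. Everything here is an assumption for illustration: `struct fb_view` and `crc_visible()` are hypothetical names, the pixel format is fixed at 4 bytes per pixel, and a compact bitwise CRC-32 stands in for the kernel's `crc32_le()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a mapped linear framebuffer and its visible rect. */
struct fb_view {
	const unsigned char *vaddr;          /* start of the mapped backing storage */
	size_t offset;                       /* where pixel data starts */
	size_t pitch;                        /* bytes per row, padding included */
	uint32_t src_x, src_y, src_w, src_h; /* visible rect in whole pixels */
};

/* Bitwise stand-in for crc32_le(), reflected polynomial 0xEDB88320. */
static uint32_t crc32_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* CRC over the visible pixels only, one row segment at a time. */
uint32_t crc_visible(const struct fb_view *fb)
{
	const size_t cpp = 4; /* assumed bytes per pixel */
	uint32_t crc = 0;

	assert(fb != NULL && fb->vaddr != NULL);
	for (uint32_t y = fb->src_y; y < fb->src_y + fb->src_h; y++) {
		size_t row = fb->offset + (size_t)y * fb->pitch +
			     (size_t)fb->src_x * cpp;

		crc = crc32_sketch(crc, fb->vaddr + row, fb->src_w * cpp);
	}
	return crc;
}
```

Because the CRC is a running value, feeding each row segment in one call gives the same result as feeding it pixel by pixel. Bytes outside the visible rectangle, such as row padding, never influence the result.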
Report CRC value to userspace
So far we’ve figured out how to compute a CRC value per framebuffer.
We still need to actually call the function and append the value to the data file using drm_crtc_add_crc_entry() at the end of a vblank.
Usually, drivers make this call from their vblank interrupt handler [5].
For example, the amdgpu driver calls drm_crtc_add_crc_entry() from within the dm_crtc_high_irq() interrupt handler.
static void dm_crtc_high_irq(void *interrupt_params)
{
	...
	drm_handle_vblank(adev->ddev, crtc_index);
	amdgpu_dm_crtc_handle_crc_irq(&acrtc->base); /* calls drm_crtc_add_crc_entry() */
}
But again, since vkms is a virtual driver with no hardware to derive a regular vblank interrupt from,
we had to simulate the vblank interrupt using an hrtimer that calls a handler at regular intervals.
This issue was addressed by Rodrigo, and the details of the implementation can be found in [6].
The following snippet shows how the hrtimer is initialized:
drm_calc_timestamping_constants(crtc, &crtc->mode);
hrtimer_init(&out->vblank_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
out->vblank_hrtimer.function = &vkms_vblank_simulate;
out->period_ns = ktime_set(0, vblank->framedur_ns);
hrtimer_start(&out->vblank_hrtimer, out->period_ns, HRTIMER_MODE_REL);
Since an hrtimer callback shouldn’t do heavy computation, we used an ordered workqueue
to compute the CRC value and add it to the data file.
The ordered workqueue ensures that the CRC computation callbacks run in the order they’re submitted.
queue_work(output->crc_workq, &state->crc_workq);
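The division of labour between the timer callback and the worker can be modelled in userspace with a simple FIFO. This is a single-threaded sketch with hypothetical names (`struct work_fifo`, `tick()`, `take_next()`), not kernel code: the tick handler only records which frame needs a CRC, and the worker later takes the requests in submission order, which is the property the ordered workqueue provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_SLOTS 8 /* hypothetical capacity, power of two */

struct work_fifo {
	uint64_t frames[FIFO_SLOTS];
	size_t head, tail; /* head: next to run, tail: next free slot */
};

/* Cheap "timer callback": queue the frame number and return immediately. */
bool tick(struct work_fifo *q, uint64_t frame)
{
	assert(q != NULL);
	if (q->tail - q->head == FIFO_SLOTS)
		return false; /* queue full, request dropped */
	q->frames[q->tail++ % FIFO_SLOTS] = frame;
	return true;
}

/* "Worker" side: fetch the oldest pending frame, if any. */
bool take_next(struct work_fifo *q, uint64_t *frame)
{
	assert(q != NULL && frame != NULL);
	if (q->head == q->tail)
		return false; /* nothing pending */
	*frame = q->frames[q->head++ % FIFO_SLOTS];
	return true;
}
```

The heavy part, computing the CRC and reporting it, would run on each frame returned by `take_next()`, so the tick path stays short no matter how large the framebuffer is.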
With all of the above (plus taking synchronization issues into account), vkms can support the CRC API.
The specific details can be found in the final CRC patch [7].
root@haneenDRM: modprobe vkms
<.. submit fb contents (eg. using igt tests) ..>
root@haneenDRM: cat /sys/kernel/debug/dri/0/crtc-0/crc/control
auto
root@haneenDRM: cat /sys/kernel/debug/dri/0/crtc-0/crc/data
0x00121580 0x4c1cc376
0x00121581 0x4c1cc376
0x00121582 0x00000000
0x00121587 0x7c9b032e
0x00121588 0x7c9b032e
References:
[1] New debugfs API for capturing CRC of frames
[2] [v4,1/4] drm/vkms: Add functions to map/unmap GEM backing storage
[3] [v4,2/4] drm/vkms: map/unmap buffers in [prepare/cleanup]_fb hooks
[4] Intel Open Source Graphics Programmer’s Reference - Memory Views
[6] Add infrastructure for Vblank and page flip events in vkms simulated by hrtimer