# Demosaic RTL ISP Design

Dr. Maikon Nascimento, 23/Dec/2021

- maikon@ualberta.ca
- https://github.com/maikonadams
- http://scientistengineer.blogspot.com/
- https://www.linkedin.com/in/maikonadams/
- https://maikon12.wixsite.com/drmaikon

## Agenda

- Introduction
- Workflow
- Octave Model
- RTL Design
- Results
- Conclusion and Future Work

Target device: xc7z020clg400-1

## Introduction

- The image sensor of most common digital cameras is a matrix of pixels arranged in 2x2 blocks: the two diagonally opposite pixels are green and the other two are red and blue. This pattern resembles a mosaic, as illustrated in the figure [\*0].
- The coloured filters, taken from [\*1], are the interpolation filters that use two colours to interpolate the missing samples and fill the gaps in the pixel matrix.
- These filters work together with multiplexers that select the right filter according to the pixel position and the CFA (colour filter array) order; some filters are reused at more than one position.

[\*0] Donald Bailey, *Design for Embedded Image Processing on FPGAs*.

[\*1] Malvar, He, and Cutler, *High-Quality Linear Interpolation for Demosaicing of Bayer-Patterned Color Images*.

## Workflow

- The basic framework for RTL ISP design is illustrated in the figure: the algorithm is first modelled in Octave (a free alternative to Matlab).
- The Octave model uses a true-colour image to evaluate the algorithm, and it also generates the inputs for the RTL and the expected output that the RTL is compared against.
- True-colour images also provide a reference for the overall quality of the algorithm, measured with PSNR. One of the images is shown here.
- In the RTL simulation, the output of the design is compared with the output of the Octave model, which is a hardware-friendly version of the algorithm.

## Octave Model

- Octave was chosen because of its similarity to Matlab and because it provides all the functions and packages needed. Matlab itself could be used, or Python, or any language suited to quick algorithm prototyping that also has libraries for reading and writing files.
- The grey image on top is the result of converting the true-colour RGB image into a mosaiced Bayer pattern; it simulates a raw image from the camera and is written out as the input to the RTL design.
- A header with a few parameters is included for quick modification of the test conditions, such as the image dimensions and pixel width.
- The model also runs a hardware-friendly version of the demosaicing algorithm itself: `floor` operations truncate words when requantization is needed, and word sizes are limited to the RTL bus width, e.g. `uint8` or `uint16`.
- Another important decision is what to do with the border; in this first version it is padded with 0.
- The code is on GitHub; some snippets are shown here.

```
% apply the filters and reconstruct the image
% shape = 'full';
shape = 'same';
sv_image_tmp_G_at_G = conv2(iv_image, G_at_G_coef, shape);
sv_image_tmp_G_at_G = uint16(sv_image_tmp_G_at_G);                    % g0
% G at B reuses the G-at-R kernel; the /8 rescales its integer coefficients
sv_image_tmp_G_at_B = floor(conv2(iv_image, G_at_R_coef, shape) / 8);
sv_image_tmp_G_at_B = uint16(sv_image_tmp_G_at_B);                    % g1
```

```
% assemble the RGB output from the reconstructed planes
rgb_image(:,:,1) = sv_image_R;
rgb_image(:,:,2) = sv_image_G;
rgb_image(:,:,3) = sv_image_B;
% zero the two outermost rows and columns on every side (border padded with 0)
rgb_image(1:2, :, 1:3) = 0;
rgb_image(G_IMG_HEIGHT-1:G_IMG_HEIGHT, :, 1:3) = 0;
rgb_image(:, 1:2, 1:3) = 0;
rgb_image(:, G_IMG_WIDTH-1:G_IMG_WIDTH, 1:3) = 0;
```

## RTL Design

- The basic RTL architecture is shown in the figure. The controller manages the interfaces, such as AXIS4 (AXI4-Stream), the pixel addresses, and the transmission of the valid and ready signals.
- Fifo5x5 converts the incoming pixel stream into a 5x5 window, keeping the 2-D shape of the block.
- The router handles the conditions that change the computation based on the pixel position.
- Finally, conv performs the convolution and clips and clamps the pixels.
- The initial simulation tool used is ModelSim, but any simulation software could be used.
  For debugging, testbenches are also included, together with code to read and write the files generated by the Octave model.

## RTL Design - FIFO

Figure from [\*0] Donald Bailey, *Design for Embedded Image Processing on FPGAs*.

- The image shows a 3x3 window that buffers two rows of the frame; this design uses a similar approach, but for a 5x5 window.
- The FIFO buffers four rows and uses shift registers to form the 5x5 grid.
- From the incoming pixel stream, the FIFO outputs 25 pixels in parallel to the next module, the router.
- This is the module expected to consume the most memory blocks.

## RTL Design - Router

- The figure is not from this design, but it illustrates the concept: the shape of the filter mask may change according to the pixel position, which is why the router is needed.
- In this first version the condition is very simple: when the pixel is on the border, i.e. in the two outermost rows or columns, the output is 0 (zero padding). The calculation is therefore only made for the interior, which is easy to model.
- A second version will modify the filter mask to adapt it to the borders and corners of the image.

## RTL Design - Convolution

- The figure on top shows the basic DSP block (DSP48E1) present in Zynq and other Xilinx 7 Series FPGAs [\*2]. Using these blocks instead of logic is a design decision: my first design did not use them, and later I changed the coding style so that the DSP tiles were inferred.
- Because the coefficients are fractional, a fixed-point conversion is needed; in the code I also kept the coefficients in their 2-D shape to make the code easier to read.
- A third implementation approach is to expand the operations into additions and bit shifts, which is possible because the coefficients can be written as sums of a few powers of two.

[\*2] *7 Series DSP48E1 Slice User Guide* (UG479).

*Figure: filter coefficients for G at R locations.*

## Results

The grey image on the left is the raw mosaic image; the image on the right is the demosaic output, in which RGB is recovered. The PSNR against the original true-colour image is 39.2 dB.
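The FIFO, router and convolution stages described in the RTL Design slides can be mirrored in a few lines of software, which helps when reasoning about the datapath. The Python sketch below is a hypothetical model, not the author's RTL or Octave code, and all function names are illustrative. It assumes an RGGB CFA order and 8-bit output, and it reconstructs only the green plane. `windows_5x5` imitates Fifo5x5 (four line buffers feeding a five-column shift register), the parity test stands in for the router's position-based selection, border pixels stay at 0 as in this first version, and `G_AT_R` is the Malvar-He-Cutler G-at-R kernel [\*1] scaled by 8, matching the `/ 8` in the Octave model. `g_at_r_shift_add` illustrates the shift-and-add implementation option.

```python
from collections import deque

# Malvar-He-Cutler "G at R" kernel scaled by 8 (the same kernel serves G at B).
G_AT_R = [
    [0, 0, -1, 0, 0],
    [0, 0, 2, 0, 0],
    [-1, 2, 4, 2, -1],
    [0, 0, 2, 0, 0],
    [0, 0, -1, 0, 0],
]


def windows_5x5(pixels, width):
    """Hypothetical software model of Fifo5x5: 4 line buffers + a 5-column shift register.

    Consumes a row-major pixel stream and yields (row, col, window) for every
    centre pixel whose full 5x5 neighbourhood exists; window[i][j] is the pixel
    at (row - 2 + i, col - 2 + j).
    """
    line_buf = [[0] * width for _ in range(4)]  # rows r-4 .. r-1, oldest first
    cols = deque(maxlen=5)                      # the five most recent columns
    for n, p in enumerate(pixels):
        r, c = divmod(n, width)
        cols.append([line_buf[k][c] for k in range(4)] + [p])
        for k in range(3):                      # shift this column up by one row
            line_buf[k][c] = line_buf[k + 1][c]
        line_buf[3][c] = p
        if r >= 4 and c >= 4:                   # window full: centre is interior
            yield r - 2, c - 2, [[cols[j][i] for j in range(5)] for i in range(5)]


def g_at_r_mult(w):
    """Multiplier form: integer kernel, then floor(acc / 8) like the Octave model."""
    acc = sum(G_AT_R[i][j] * w[i][j] for i in range(5) for j in range(5))
    return acc // 8                             # // floors, as Octave floor() does


def g_at_r_shift_add(w):
    """Shift-and-add form: 4x -> x << 2, 2x -> x << 1, /8 -> >> 3."""
    near = w[1][2] + w[3][2] + w[2][1] + w[2][3]   # coefficient +2
    far = w[0][2] + w[4][2] + w[2][0] + w[2][4]    # coefficient -1
    acc = (w[2][2] << 2) + (near << 1) - far
    return acc >> 3                             # arithmetic shift also floors negatives


def clamp(x, bits):
    """Clip to the unsigned output range [0, 2**bits - 1]."""
    return max(0, min(x, (1 << bits) - 1))


def green_plane(raw, height, width, bits=8):
    """Reconstruct G for an RGGB mosaic (assumed CFA order); border rows/cols stay 0."""
    out = [[0] * width for _ in range(height)]
    flat = [p for row in raw for p in row]
    for r, c, w in windows_5x5(flat, width):
        if (r + c) % 2 == 1:                    # G site: pass the sample through
            out[r][c] = w[2][2]
        else:                                   # R or B site: "router" selects G-at-R
            out[r][c] = clamp(g_at_r_shift_add(w), bits)
    return out


if __name__ == "__main__":
    img = [[(r * 31 + c * 17) % 256 for c in range(9)] for r in range(8)]
    flat = [p for row in img for p in row]
    same = all(g_at_r_mult(w) == g_at_r_shift_add(w) for _, _, w in windows_5x5(flat, 9))
    print("shift-add matches multiplier:", same)
    for row in green_plane(img, 8, 9):
        print(row)
```

Because `acc >> 3` and `acc // 8` both round toward negative infinity in Python, the two forms agree for every window, which mirrors the bit-exactness required between the RTL and the Octave model; `clamp` then plays the role of the clip stage in the conv module.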
## Results

- ModelSim was used for the basic development and functional validation. The image on top shows the project: Octave created text files with the input vectors and the expected output, and the simulation displays the image dimensions and basic verification signals, such as valid and error, to flag any mismatch against Octave. The pink trace shows 0 errors, so the design is bit-accurate.
- For FPGA performance and resource utilization, the Vivado report is shown.
- The RTL and Octave code can be found at: https://github.com/maikonadams/fpgaip/tree/master/demosaicing

## Results

- This slide shows how a simple coding-style decision can penalize the design by misleading Vivado. In (0), the memory was not inferred as block RAM.
- In (1), block RAM is inferred but no DSP blocks are used, and the maximum frequency is 150 MHz on the xc7z020.
- Pipelining the convolution further, keeping only two operands per operation, and making sure the DSP attribute is set to "true" makes Vivado infer the DSP blocks, which further reduces resource utilization and increases the maximum frequency to 200 MHz.

## Conclusion and Future Work

- I presented my usual methodology for developing ISP RTL designs using Octave, ModelSim, and Vivado. The system uses real data and aims to be a professional design that can be implemented on any FPGA interfacing an imaging sensor.
- My design achieved 0 bit errors (bit accuracy) against the Octave model; the comparisons were made in the RTL simulation project.
- I also showed how to improve the maximum frequency by pipelining the convolution module. These optimizations also reduce the resources utilized.
- The next step, still missing, is a demo on a live system using DMA, the AXIS4 interface, and embedded Linux to test the design with the real images that I have.
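The two checks used in the Results slides, a bit-exact comparison of the RTL output against the Octave expected-output file and the PSNR of the demosaiced image against the true-colour original, can also be reproduced offline. The Python sketch below is hypothetical and is not the author's testbench: it assumes the text files hold one hexadecimal pixel value per line and that pixels are 8-bit (PSNR peak value 255); the function and file names are illustrative.

```python
import io
import math


def parse_pixels(lines):
    """Parse one hexadecimal pixel value per line (assumed file format), skipping blanks."""
    return [int(s, 16) for s in (line.strip() for line in lines) if s]


def count_mismatches(expected, actual):
    """Count pixels where the RTL output differs from the Octave expected output.

    Every missing or extra pixel also counts as an error, so 0 means bit-accurate.
    """
    errors = sum(1 for e, a in zip(expected, actual) if e != a)
    return errors + abs(len(expected) - len(actual))


def psnr(reference, test, max_value=255):
    """PSNR in dB between two equally long flat pixel lists (e.g. all RGB samples)."""
    if not reference or len(reference) != len(test):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf                         # identical images
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # In practice these would be opened from the Octave file and the testbench
    # dump (hypothetical names), so in-memory text stands in for them here.
    expected = parse_pixels(io.StringIO("00\n7f\nff\n"))
    actual = parse_pixels(io.StringIO("00\n7f\nff\n"))
    print("errors:", count_mismatches(expected, actual))
    print("PSNR (dB):", psnr([100, 100, 100, 100], [100, 100, 100, 96]))
```

A mismatch count of 0 corresponds to the 0-error trace in ModelSim. PSNR measures something different: it compares the algorithm with the true-colour image, so it stays finite (39.2 dB in the reported run), because linear interpolation cannot recover the discarded colour samples exactly.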