As 3D immersive media continues to gain prominence, Point Cloud Quality Assessment (PCQA) is essential for ensuring high-quality user experiences. This paper introduces ViSam-PCQA, a no-reference PCQA metric guided by visual saliency information across three modalities, which enhances quality prediction. Firstly, we project the 3D point cloud to acquire 2D texture, depth, and normal maps. Secondly, we extract a saliency map from the texture map and refine it with the corresponding depth map; this refined saliency map is used to weight low-level feature maps, highlighting perceptually important areas in the texture channel. Thirdly, high-level features from the texture, normal, and depth maps are processed by a Transformer to capture global and local point cloud representations across the three modalities. Lastly, the saliency, global, and local embeddings are concatenated and passed through a multi-task decoder to derive the final quality scores. Our experiments on the SJTU, WPC, and BASICS datasets yield high Spearman rank-order correlation coefficients/Pearson linear correlation coefficients (SROCC/PLCC) of 0.953/0.962, 0.920/0.920, and 0.887/0.936 respectively, demonstrating superior performance compared to current state-of-the-art methods. The code is available at https://github.com/cwi-dis/ViSam-PCQA_MM2024Workshop.
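The SROCC and PLCC figures reported above measure rank agreement and linear agreement, respectively, between predicted quality scores and ground-truth mean opinion scores. As a minimal, self-contained sketch of how these two metrics are computed, the snippet below implements both from their definitions; the score values are synthetic placeholders, not data from the paper.

```python
# Pearson linear correlation coefficient (PLCC): covariance of the two
# score lists normalized by the product of their standard deviations.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(x):
    # Assign ranks 1..n by sorted order (distinct scores, so no tie handling).
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

# Spearman rank-order correlation coefficient (SROCC): the Pearson
# correlation of the rank-transformed scores.
def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Synthetic predicted scores and mean opinion scores for illustration only.
pred = [3.1, 4.2, 2.0, 4.8, 1.5]
mos = [3.0, 4.5, 2.2, 4.6, 1.4]
srocc, plcc = spearman(pred, mos), pearson(pred, mos)
```

Because the two synthetic lists agree perfectly in rank order, SROCC here is 1.0 while PLCC stays slightly below 1.0, illustrating that the two metrics capture different notions of agreement.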

https://doi.org/10.1145/3689093.3689183
MM '24: The 32nd ACM International Conference on Multimedia
Distributed and Interactive Systems

Zhou, X., Viola, I., Yin, R., & César Garcia, P. S. (2024). Visual-saliency guided multi-modal learning for no reference point cloud quality assessment. In Proceedings of Quality of Experience in Visual Multimedia Applications Workshop (pp. 39–47). doi:10.1145/3689093.3689183