Conference 13137
Applications of Digital Image Processing XLVII
19 - 21 August 2024
19 August 2024 • 9:30 AM - 11:10 AM PDT
Session Chair:
Andrew G. Tescher, AGT Associates (United States)
13137-1
19 August 2024 • 9:30 AM - 9:50 AM PDT
Market-available automated microscopy systems are often unaffordable for research institutions, especially those in economically disadvantaged nations, limiting their access to advanced technologies. This work addresses this challenge by developing a cost-effective virtual microscopy and telemicroscopy system, aiming to create a remote-controlled microscopy setup for analyzing digital samples with performance comparable to high-end equipment but at a reduced cost. The system includes a web platform for telemicroscopy, enabling remote control of the robotic stage and real-time viewing of the microscope camera. Additionally, a decision support system has been implemented, integrating AI-based models for identifying objects of interest in two use cases: i) analyzing water quality in biological samples and ii) identifying cancerous tissue in digital pathology samples. These models enhance diagnostic capabilities, leading to increased productivity for experts and reducing manual workload. Sample virtualization and automatic processing simplify tasks for professionals, allowing remote participation in concurrent work sessions and streamlining processes for digital samples.
13137-2
19 August 2024 • 9:50 AM - 10:10 AM PDT
This research compares two deep-learning models, BetaVAEClassifier and PCAEClassifier, for identifying white matter lesions in the brains of multiple sclerosis patients. Both models use convolutional encoder-decoder architectures with different approaches for feature representation. The dataset, comprising various MRI modalities, undergoes data enhancement, compression, and augmentation. Evaluation metrics show promising results, highlighting the potential for accurate diagnosis and assessment in multiple sclerosis research.
13137-3
19 August 2024 • 10:10 AM - 10:30 AM PDT
This research extends previous studies on white matter lesion identification in multiple sclerosis. While the initial study with CVIPtools achieved a 90.63% success rate and the second study using deep learning architecture reached 93%, our current investigation focuses on compressed MRI datasets. The results indicated a significant 50% decrease in lesion identification accuracy using established methods, highlighting a limitation with the CVIPtools approach. However, the deep learning model maintained a remarkable 98.53% accuracy despite compression challenges, demonstrating its resilience and effectiveness in accurately classifying lesion and non-lesion classes.
13137-4
19 August 2024 • 10:30 AM - 10:50 AM PDT
This paper introduces an automated system comparing VGG16 and ResNet50 for dermatoscopic image processing and classification. Swift and accurate diagnosis of skin lesions enables skin cancer detection at an early stage. This method utilized transfer learning, fine-tuning VGG16 and ResNet50 on the HAM10000 dataset. Random resampling balanced the dataset, optimizing the models for accurate results with limited resources. We preprocessed images, performed data augmentation, modified the pre-existing models, and tuned the hyperparameters to increase the overall accuracy of both models. Results demonstrate VGG16 and ResNet50 achieving 92.10% and 91.8% accuracy, respectively, showcasing the effectiveness of the proposed system in advancing early skin cancer intervention with deep learning techniques.
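The random resampling step mentioned in the abstract can be sketched in a few lines. The function below (`oversample` is a hypothetical name, not from the paper) duplicates minority-class samples at random until every class matches the size of the largest class; the authors' actual pipeline is not specified:

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class
    matches the size of the largest class (illustrative sketch only)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep all originals, then draw random duplicates up to the target
        picked = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picked)
        out_y.extend([y] * target)
    return out_x, out_y

# Toy items standing in for HAM10000 images, with imbalanced labels
xs, ys = oversample(["a", "b", "c", "d"], ["nv", "nv", "nv", "mel"])
```

After resampling, each class has the same number of samples, which prevents the majority class from dominating the loss during fine-tuning.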
13137-5
19 August 2024 • 10:50 AM - 11:10 AM PDT
This study introduces a new method for automating the classification of brain tumors in MRI images using three deep learning models: VGG16, ResNet18, and DenseNet. The research uses a dataset that includes 7023 brain MRI images categorized into glioma, meningioma, no tumor, and pituitary classes. Data augmentation techniques are used to improve the learning process of the models, and an advanced image enhancement algorithm enhances tumor visibility. The study compares the models and identifies a methodology that achieves up to 95% accuracy. This research is a significant advancement in automated brain tumor classification, providing insights into deep learning models for medical imaging and guiding future research for more precise diagnostic devices.
Coffee Break 11:10 AM - 11:40 AM
19 August 2024 • 11:40 AM - 12:40 PM PDT
Session Chair:
Frederik Temmermans, Vrije Univ. Brussel (Belgium)
13137-6
19 August 2024 • 11:40 AM - 12:00 PM PDT
While distributed version control systems offer a solid foundation for monitoring revision history, their effectiveness is hindered when dealing with digital media assets, which are often treated as opaque binary data. This makes it challenging to precisely track modifications and compromises storage efficiency. Despite this, a significant portion of embedded metadata within these files is actually textual in nature, though it remains unrecognized due to its integration into the binary structure. Moreover, alterations to the metadata and the underlying structure of metadata container formats, such as the JPEG Universal Metadata Box Format (JUMBF), go unnoticed during media rendering, further complicating the identification process. To address these issues, this paper proposes a solution that defines a standardized asset decomposition and structured serialization scheme. This framework enables the individual tracking of subcomponents within media assets, facilitating more accurate version control and metadata management.
AI-driven image manipulation techniques offer unprecedented capabilities for creativity and visual enhancement, but they also pose significant challenges in terms of authenticity, integrity, and misinformation. Current state-of-the-art techniques for image manipulation detection often struggle to discern subtle alterations made by AI algorithms and, as such, report poor detection results, necessitating the development of advanced detection methods capable of discerning AI manipulations. This paper presents a dataset of images containing AI-generated modifications and a new method for the detection of image manipulations that excels at detecting AI-generated manipulations.
13137-8
19 August 2024 • 12:20 PM - 12:40 PM PDT
The massive development of IoT, Big Data, and other technologies has led to security concerns with respect to data protection. It has become imperative to develop solutions to protect our data, such as images, text, and audio, from unauthorized access. This work presents an encrypted image transmission scheme based on a chaotic dynamic configuration of two synchronized three-dimensional spherical chaotic attractors in a master-slave topology. We synchronized the future evolution of the chaotic systems, starting from different initial conditions, using the Hamiltonian observer-based approach, and then used the resulting phase-space points as pseudo-random numbers for securing images transmitted through the communication channel. The scheme is realized and implemented on a Multiprocessor System-on-Chip (MPSoC) platform by harnessing the easy and synthesizable programming features of Python with the MPSoC. The image is transmitted through the state variables x1, x2, and x3, and analyzed using two statistical techniques, namely information entropy and correlation analysis; the results show full recovery of the image transmitted through the state variables.
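The two statistical checks named in the abstract, information entropy and pixel correlation, are standard measures and straightforward to compute. A minimal sketch (not the authors' implementation) is shown below; an ideal 8-bit cipher image has entropy close to 8 bits per pixel and near-zero adjacent-pixel correlation:

```python
import math
from collections import Counter

def entropy(pixels):
    """Shannon entropy in bits per pixel from the grey-level histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def correlation(xs, ys):
    """Pearson correlation of paired pixel values (e.g. adjacent pixels)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)

# A perfectly uniform 8-bit histogram gives the maximum entropy of 8 bits
uniform = list(range(256)) * 4
print(entropy(uniform))  # → 8.0
```

In practice the cipher image's entropy should approach 8.0 while the plain image's entropy is noticeably lower, and adjacent-pixel correlation should drop from near 1 to near 0 after encryption.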
Lunch Break 12:40 PM - 2:10 PM
19 August 2024 • 2:10 PM - 3:10 PM PDT
Session Chair:
Frederik Temmermans, Vrije Univ. Brussel (Belgium)
13137-9
19 August 2024 • 2:10 PM - 2:30 PM PDT
The past few years have witnessed remarkable advancement in the domain of face recognition thanks to the development of deep learning. However, the robustness of deep face recognition techniques in varying real-world conditions is a pressing challenge. This paper proposes to incorporate both pose-invariant and cross-resolution strategies into one face recognition framework and to learn a unified feature representation. Firstly, a knowledge distillation paradigm is employed as the learning framework. The face recognition model learns to extract pose- and resolution-robust features from varying faces in the wild under the guidance of the feature representation from frontal and high-resolution faces. Secondly, two sub-networks attached to the feature extractor are devised, which learn to bridge the discrepancy between face images in different poses or resolutions in deep feature space. Extensive experiments on different in-the-wild face recognition benchmarks demonstrate the superiority of the proposed method over the state-of-the-art.
13137-10
19 August 2024 • 2:30 PM - 2:50 PM PDT
JPEG Trust is a novel international standard that responds to the pressing need to assess trust in digital media assets. JPEG Trust provides a comprehensive framework addressing key elements such as provenance, authenticity, integrity, and copyright. Built on top of established JPEG and industry standards, the framework ensures compatibility across digital media ecosystems. This paper provides an overview of the JPEG Trust framework and illustrates its use in several scenarios.
13137-11
19 August 2024 • 2:50 PM - 3:10 PM PDT
There is insufficient information in the literature about the impacts of image manipulation on society. While some anecdotes about qualitative factors exist, such factors are thinly covered in the literature, and estimates of quantitative, especially monetary, costs are even less available. That these costs are substantial is perhaps indicated by a 2019 study jointly issued by the University of Baltimore and the Israel-based cybersecurity firm CHEQ claiming that, on the whole, fake news costs the global economy $78 billion annually; however, the bases for such a figure are difficult to find.
Quantifying the impacts of misinformative images is an important first step in addressing and mitigating these impacts. Furthermore, identified quantitative factors have the potential to inform models providing justification for the implementation of control measures for fake images, and may also assist in informing relevant policy and regulation. This paper identifies factors that may contribute to quantitative assessment of fake image impacts, with an exploration of approaches to usefully modelling such impacts.
Coffee Break 3:10 PM - 3:40 PM
19 August 2024 • 3:40 PM - 5:00 PM PDT
Session Chair:
Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
13137-12
19 August 2024 • 3:40 PM - 4:00 PM PDT
Mosquito-borne diseases annually impact 3 billion people and cause over 500,000 deaths. Traditional identification methods, requiring specialized skills and equipment, limit monitoring scalability and are challenged by climate-induced habitat changes. Our study introduces a scalable solution through citizen science, leveraging smartphone imagery for mosquito identification despite challenges like varied backgrounds. We utilize object detection for precise mosquito identification from diverse images, converting a classification dataset into one with annotated bounding boxes for two primary species: Aedes albopictus and Culex quinquefasciatus. Training on 10,000 images from Mosquito Alert and testing with a Malaysian dataset, our model demonstrates high accuracy (mAP50 of 90% and 99%, respectively), showing promise for global mosquito monitoring and enhancing public health efforts.
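The mAP50 figure quoted above rests on the standard intersection-over-union (IoU) overlap between predicted and annotated bounding boxes: a detection counts as correct when IoU ≥ 0.5. A generic sketch of the IoU computation (not the authors' code) is:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping by half share 50 of 150 units of area
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333…
```

With this overlap of 1/3, the detection would be rejected under the mAP50 criterion, since 0.33 < 0.5.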
13137-13
19 August 2024 • 4:00 PM - 4:20 PM PDT
A vegetation index (VI) is a parameter calculated from the reflectance values of vegetation at different wavelengths and is particularly sensitive to vegetation cover. The problem of detecting the vegetation index using UAVs has been addressed in multiple articles in the literature, in which special hardware and thermal or infrared cameras are adopted to improve detection. This article seeks to identify the vegetation index from its biophysical parameters, drawing on artificial intelligence and machine learning algorithms. A semi-physical model was designed to estimate the ecosystem and establish the vegetation index correctly, with the results validated by remote sensing. Finally, an ecological model will be developed to simulate the environmental impact on vegetation patterns and geographic plains. The proposed model successfully imitated the urban effect; given these results, it was possible to better predict the impact of changing seasons in a defined geographic area.
13137-14
19 August 2024 • 4:20 PM - 4:40 PM PDT
CNN algorithms have become ubiquitous within the vision domain, encompassing a wide array of tasks, including object detection, segmentation, and classification. However, executing complex CNN algorithms on real-time vision systems demands better energy efficiency, runtime, and accuracy. This has led to innovative computing architectures leveraging heterogeneity, combining CPUs, GPUs, FPGAs, and other accelerators into a single processing fabric. However, scheduling and partitioning remain arduous tasks, particularly when distributing operations among accelerators that have different computing paradigms. This paper proposes a scheduler targeting heterogeneous vision systems that performs fine-grained partitioning and mapping of the layers and sub-operations of state-of-the-art convolutional neural networks and image processing algorithms. Our experiments reveal that the scheduled, partitioned algorithms perform better in both energy and runtime efficiency than their best-performing homogeneous components executing the complete algorithm.
13137-15
19 August 2024 • 4:40 PM - 5:00 PM PDT
This study explores two algorithms for removing rain streaks from car images. The first method utilizes CVIPtools software, and the second combines CVIPtools and Python. Both methods address rain streak removal, but method 1 loses more information, while method 2, though requiring more programming expertise, is more robust and accurate in preserving significant details.
19 August 2024 • 5:30 PM - 7:00 PM PDT
13137-43
19 August 2024 • 5:30 PM - 7:00 PM PDT
This project shows the latest scientific development in digital image processing applied to wine grape characterization. The literature review highlights that existing studies on both external and internal grape characteristics often employ expensive equipment or lack essential features for robust prediction results. Consequently, experts in the field advocate for new studies considering a broader range of characteristics and economic viability for end-users. An analysis of the Scopus database, using keywords like "grape image processing", identified 285 papers covering 2012 to 2023; additionally, advanced searches related to maturation, color, chemical analysis, phenolic composition, sugar content, prediction models, and correlation of physical and chemical attributes indicate an area of opportunity due to the decrease of works found on these specific topics. Bibliometric results reveal the evolving research landscape in these areas over the past decade, with notable authors such as Whitty, M., and Liu, S. Leading institutions and countries include China, India, the United States, and Spain. The VOSviewer software was employed to confirm influential studies and trends in the field.
13137-44
19 August 2024 • 5:30 PM - 7:00 PM PDT
We conducted research on photoacoustic computed tomography (PACT) in small animal models with prostate tumors using the LOIS-3D photoacoustic tomography (PAT) system, and achieved good imaging results without the use of exogenous contrast agents. An excitation light source with a wavelength of 755 nm was employed to image the vascular structure of mice, achieving a comprehensive visualization of the overall vascular network distribution. The irregularly shaped vascular structure in tumor tissue exhibited obvious differences compared to normal tissue, which can provide a valuable reference for the diagnosis of tumors.
13137-45
19 August 2024 • 5:30 PM - 7:00 PM PDT
Simultaneous Localization and Mapping (SLAM) is the task of reconstructing a model of the environment from on-board sensors while simultaneously maintaining an estimate of the mobile sensor's location within that model. One of the known approaches to the SLAM problem is the Kalman filter, whose efficiency rests on the fact that it maintains a fully correlated posterior over feature maps and mobile sensor poses.
An important element of the SLAM problem is the reconstruction of the environmental 3D scene. In this paper, we propose a non-iterative algorithm to restore the 3D scene using a consistency condition and a modified version of the Kalman filter. Computer simulation results are provided to illustrate the performance of the proposed method.
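As a minimal illustration of the Kalman filter idea referred to above (a scalar textbook case, not the authors' modified SLAM filter), the predict-correct loop for a constant state observed with noise looks like:

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter: predict with process noise q, then correct
    each prediction against a measurement with noise variance r."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p += q                  # predict: variance grows by process noise
        k = p / (p + r)         # Kalman gain weighs prediction vs measurement
        x += k * (z - x)        # correct the state estimate
        p *= (1 - k)            # posterior variance shrinks
        estimates.append(x)
    return estimates

# Noisy observations of a true value of 1.0 (illustrative numbers)
est = kalman_1d([1.1, 0.9, 1.05, 0.95, 1.0])
```

Each iteration blends the prediction and the new measurement in proportion to their uncertainties, so the estimate converges toward the true value as measurements accumulate. SLAM extends this scalar recursion to a joint state vector of sensor pose and map features, with a full covariance matrix in place of `p`.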
13137-46
19 August 2024 • 5:30 PM - 7:00 PM PDT
3D point cloud registration is of great importance in robotics and computer vision to find a rigid body transformation to align a pair of point clouds with unknown point correspondences. In recent years, the deep learning model has dominated the field of computer vision. The important part of registration is the estimation of correspondences between point clouds. The main idea of studying correspondences between point clouds is to establish correspondences through the multidimensional features of each point.
In this paper, we propose a simple neural network algorithm to register incongruent point clouds. The proposed algorithm utilizes virtual points and is partially based on the PointNet++ neural network. Computer simulation results are provided to illustrate the performance of the proposed method.
13137-47
19 August 2024 • 5:30 PM - 7:00 PM PDT
To effectively use deep learning feature extraction, we train a large model on standard detection and segmentation tasks to find abnormalities in mammogram screening. This model is then used for distillation or transfer learning to train a smaller network, which is much easier and faster to use without much loss of quality. We create a pipeline in which this smaller distilled model extracts deep features from mammogram screenings and builds dictionaries of these features for unsupervised anomaly detection. If segment features do not match, or are too different from, any known data in the network, previously learned clusters are used to create new groups in our dictionary, which helps us find and group any similar pathology.
13137-48
19 August 2024 • 5:30 PM - 7:00 PM PDT
Mammography screening also leads to a high rate of false positive results. This may lead to unnecessary worry, inconvenient follow-up care, additional imaging studies, and sometimes the need for tissue sampling (often a needle biopsy). Convolutional neural networks (CNNs) are one of the most important networks in the field of deep learning. The feature vectors formed by neural networks often contain weak features, and there are known methods for eliminating weak features based on mutual information.
In this paper, we propose a convolutional neural network-based method to recognize local geometrical features. Computer simulation results are provided to illustrate the performance of the proposed method.
13137-49
19 August 2024 • 5:30 PM - 7:00 PM PDT
This work proposes a multi-class model for breast pathology classification using a combination of machine and deep learning methods, aimed at improving classification rates and minimizing false positives. The proposed method encompasses the following steps: preprocessing of image datasets, training of base classification models, and construction of a meta-classifier. The model enhances the performance of single classifiers and is benchmarked against various machine learning models. Finally, the method is evaluated using the MIAS and CBIS-DDSM mammography datasets.
In recent years, 3D printing has gained prominence in manufacturing. To enhance productivity and quality in this field, real-time management of equipment and printing processes is crucial. Technical vision systems utilizing video signals from cameras can aid in analyzing and optimizing the printing process. Challenges include developing algorithms for high-resolution video processing in real-time. These systems help monitor product quality, detect printing errors, and automate processes. Research is needed to integrate advanced computer vision, machine learning, and image processing methods into 3D printing control systems. Focused on the FFF method, a methodology using neural networks for real-time error detection and correction in 3D printing has been developed. Preliminary work with a dataset shows promise for enhancing printing parameter predictions using neural networks.
20 August 2024 • 8:45 AM - 10:05 AM PDT
Session Chair:
Thomas Richter, Fraunhofer-Institut für Integrierte Schaltungen IIS (Germany)
13137-16
20 August 2024 • 8:45 AM - 9:05 AM PDT
JPEG XS is a lightweight, low-latency image coding standard for the transmission of video streams over IP. To transmit video over error-prone networks such as WANs or wireless networks, error correction needs to be considered. If low latency is also a design requirement, forward error correction according to SMPTE ST 2022-5 is a favourable choice.
In this work, we study JPEG XS transmission in lossy networks with and without forward error correction, report on the outcome of an experiment in which we measured image quality as a function of error rate, and provide estimates of the additional latency due to the error correction layer.
13137-17
20 August 2024 • 9:05 AM - 9:25 AM PDT
Traditionally, rice quality relied on manually estimating whole and broken grains, a slow and subjective process. This study explores leveraging image processing and machine learning for a more efficient approach, but achieving clear images is crucial. The study details a meticulously designed protocol considering background, grain shape, translucency, and lighting to capture high-quality images. This aims to pave the way for automated analysis, ultimately improving accuracy, efficiency, and industry standards for better quality control and consistent rice products.
Rice quality, crucial for the agri-food industry, relies on tedious and subjective manual grading of whole vs. broken grains. Seeking a solution, this study proposes a fully automated technique using digital image processing. While existing methods struggle with accuracy due to manual adjustments, this approach utilizes new algorithms to analyze images and overcome oversegmentation issues. By leveraging circularity information, it significantly reduces errors compared to manual grading, offering both efficiency gains and improved accuracy, paving the way for more automated and reliable rice quality assessment.
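The circularity information mentioned above is commonly the isoperimetric ratio 4πA/P², which equals 1.0 for a perfect circle and drops for jagged, broken outlines. A hedged sketch follows; the paper's exact measure and the threshold value below are assumptions for illustration:

```python
import math

def circularity(area, perimeter):
    """Isoperimetric ratio 4*pi*A / P^2: 1.0 for a perfect circle,
    lower for elongated or jagged (e.g. broken-grain) outlines."""
    return 4 * math.pi * area / perimeter ** 2

def is_whole(area, perimeter, threshold=0.6):
    """Classify a grain outline as whole vs broken (threshold is
    hypothetical, not taken from the paper)."""
    return circularity(area, perimeter) >= threshold

# A circle of radius 5: area pi*25, perimeter 2*pi*5 → circularity 1.0
print(circularity(math.pi * 25, 2 * math.pi * 5))
```

In a real pipeline, `area` and `perimeter` would come from connected-component analysis of the segmented grain image, and the same ratio helps split touching or oversegmented regions.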
The JPEG Committee has recently initiated an activity known as JPEG AIC (Assessment of Image Coding) in response to recent advancements in image compression technology. This initiative aims to address the challenge posed by the high to nearly visually lossless range, where traditional subjective visual quality assessment protocols, such as those outlined in ITU-T Rec. BT.500, proved ineffective. The committee has issued a Call for Contributions on Subjective Image Quality Assessment. Furthermore, the committee is currently working on the Call for Proposals on Objective Image Quality Assessment. This paper aims to provide an overview of the future JPEG AIC-3 standards, highlighting recent advancements in this domain and outlining the roadmap for future advancements.
Coffee Break 10:05 AM - 10:30 AM
20 August 2024 • 10:30 AM - 11:50 AM PDT
Session Chair:
Thomas Richter, Fraunhofer-Institut für Integrierte Schaltungen IIS (Germany)
13137-20
20 August 2024 • 10:30 AM - 10:50 AM PDT
With the constant increase in video resolution and frame rate for immersive content applications, there is a need for efficient coding strategies that can deliver very high visual quality with very low latency over 5G networks. JPEG XS is a low-complexity codec that can be implemented with very low latency, designed to provide visually lossless quality at high compression ratios, making it suitable for immersive video applications. This paper reports a quality evaluation of omnidirectional videos using the JVET 360º test sequences dataset coded with JPEG XS. A subjective quality experiment used an alternating double-stimulus method in a VR environment, where subjects freely switch between reference and distorted videos. Test sequences were encoded with JPEG XS at five different bitrates, ranging from 0.25 to 3 bpp. These bit rates are suitable for real-time high-resolution video transmission over 5G networks. It was concluded that JPEG XS provides an effective low-latency solution suitable for high-quality immersive applications using 5G networks.
13137-21
20 August 2024 • 10:50 AM - 11:10 AM PDT
This paper introduces a methodology for evaluating quality of experience in mixed immersive communication systems. It focuses on assessing the impact of advanced 3D capture techniques, immersive eye-sensing light field displays, and efficient compression mechanisms in mixed setups where terminals offer different visual modalities. This research methodically investigates the influence of the above technologies on user experience in various communication scenarios. By employing a combination of qualitative and quantitative assessments, the study aims to develop methods for comprehensive evaluation of how such immersive technologies affect perceived visual quality, presence, engagement, and overall satisfaction and preference compared to traditional video communication methods. The experimental design incorporates a series of tests where participants interact through a state-of-the-art immersive communication setup, followed by detailed feedback sessions to gauge their experiences. Through this approach, the study seeks to uncover the nuances of user satisfaction in immersive environments and identify the key factors that enhance the overall quality of peer-to-peer communication.
Several image coding formats have been proposed recently by standardization committees, industry consortia, and private companies. Examples include JPEG XL, AVIF, HEIF, JPEGLI, WebP, HD Photo, etc. Often, the originators of a format claim superiority in performance over the state of the art. However, such claims are usually supported only by the originators of the technologies, who, besides obvious bias, might not have the necessary insight or time to spend optimizing the state of the art they compare against. In this paper, we start with an overview of the different performance metrics that can be, and are, used to assess the quality, efficiency, and effectiveness of image coding for current and emerging standards. We then compare the most recent state of the art in image coding and provide a detailed assessment of their performance when measured by those metrics.
13137-23
20 August 2024 • 11:30 AM - 11:50 AM PDT
Emerging 5G technologies bring various new opportunities for the media sector. In particular, they allow for the incorporation of ultra-high resolution video formats and immersive AR/VR/XR content into streaming applications while providing a reliable and high-quality user experience.
In this paper, we focus on streaming immersive content in 8K and 360º formats within two scenarios and validate the feasibility of efficient, cost-effective solutions with significant added value, measured using various key performance indicators, in the framework of a European innovation project called 5GMediaHUB.
Lunch/Exhibition Break 11:50 AM - 1:20 PM
20 August 2024 • 1:20 PM - 2:20 PM PDT
Session Chair:
Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
13137-24
20 August 2024 • 1:20 PM - 1:40 PM PDT
Real-world food recognition is a challenging task, as the contents of a plate of food can be complex intermixed objects, making it difficult to define their individual structures. Deep learning methods have shown better accuracy and ability to identify ingredients and types of food compared to traditional approaches for image classification. However, many deep learning methods rely on powerful computational resources, which have limitations in terms of cost, energy consumption, and size. Our method utilises deep-learning methods for detection and segmentation that are optimised for resource-constrained embedded platforms. The resulting system provides a fast, accurate way to recognise foods without requiring expensive, energy-intensive hardware.
13137-25
20 August 2024 • 1:40 PM - 2:00 PM PDT
Despite tremendous advancement in computer vision, especially with deep learning, understanding scenes in the wild remains challenging. Even modern image classification models often misclassify when presented with out-of-distribution inputs despite having been trained on tens of millions of images or more. Moreover, training modern deep-learning classifiers requires a lot of energy due to the need to iterate many times over the training set, constantly updating billions of model parameters. Owing to problems with generalizability and robustness as well as efficiency, there is growing interest in computer vision to mimic biological vision (e.g. human vision) in the hope that doing so will require fewer resources for training, both in terms of energy and in terms of data sets, while increasing robustness and generalizability. This paper proposes a biologically plausible neuromorphic vision system that is based on a spiking neural network and is evaluated on the classification of hand-written digits from the MNIST dataset.
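As an illustrative sketch of the kind of spiking-neuron building block such systems rest on (the paper's actual neuron model is not specified), a leaky integrate-and-fire unit can be simulated in a few lines:

```python
def lif_neuron(inputs, tau=10.0, threshold=1.0, dt=1.0):
    """Leaky integrate-and-fire neuron: the membrane potential leaks
    toward zero with time constant tau, integrates the input current,
    and emits a spike (then resets) on crossing the threshold."""
    v, out = 0.0, []
    for i in inputs:
        v += dt * (-v / tau + i)  # leak plus input integration
        if v >= threshold:
            out.append(1)
            v = 0.0               # reset after the spike
        else:
            out.append(0)
    return out

# A constant input current drives periodic spiking
spikes = lif_neuron([0.3] * 10)
```

Information is carried by the timing and rate of such binary spikes rather than by dense activations, which is what makes spiking networks attractive for low-energy neuromorphic hardware.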
13137-26
20 August 2024 • 2:00 PM - 2:20 PM PDT
While traditional on-orbit geometric calibration relies on comprehensive imaging parameters, such models are often unavailable for widely distributed remote sensing products. This limitation hinders the geometric accuracy of these images, impacting their usability for various applications. To address this challenge, we propose a novel approach that leverages rational polynomial coefficients (RPCs) to refine the geometric fidelity of remote sensing images. By employing RPCs, our method bypasses the need for a rigorous sensor model, making it applicable to a broader range of remote sensing data. This paper details the methodology and demonstrates its effectiveness in improving geometric accuracy.
Conference Break 2:20 PM - 2:30 PM
20 August 2024 • 2:30 PM - 3:15 PM PDT
Session Chair: Khan Iftekharuddin, Old Dominion Univ. (United States)
2:30 PM - 2:35 PM:
Welcome and Opening Remarks
13136-501
AI-powered MRI: towards an omni imaging technology for brain mapping
(Keynote Presentation)
20 August 2024 • 2:35 PM - 3:15 PM PDT
The ongoing paradigm shift in healthcare towards personalized and precision medicine is posing a critical need for noninvasive imaging technology that can provide quantitative tissue and molecular information. Magnetic resonance signals from biological systems contain information from multiple molecules and multiple physical/biological processes (e.g., T1 relaxation, T2 relaxation, diffusion, perfusion, etc.). So, magnetic resonance imaging (MRI) is inherently a high-dimensional imaging technology that can acquire structural, functional and molecular information simultaneously. In practice, due to the curse of dimensionality, MRI experiments are often done in a low-dimensional setting to acquire biomarkers one at a time. Such a “divide-and-conquer” approach not only reduces data acquisition efficiency but also makes it difficult to obtain molecular information in high resolution. By synergistically integrating machine learning with sparse sampling, constrained image reconstruction and quantum simulation, we have successfully demonstrated ultrafast high-dimensional imaging of the brain. This talk will give an overview of this unprecedented omni imaging technology and show some exciting experimental results of brain function and diseases.
Coffee Break 3:15 PM - 3:30 PM
20 August 2024 • 3:30 PM - 5:30 PM PDT
3:30 PM - 3:35 PM:
Welcome and Opening Remarks
13138-501
Sense making from multi-source, electro-optical, remote sensing constellations
(Plenary Presentation)
20 August 2024 • 3:35 PM - 4:15 PM PDT
With 140+ petabytes of historical data holdings, 3.8 million square kilometers of daily multi-spectral collection, integration of Synthetic Aperture Radar, and newly launching assets every quarter, the opportunities to develop insight from sense-making technologies at Maxar are ever growing. During this discussion, we will cover the challenges of collecting, organizing, and exploiting multi-source, electro-optical remote sensing systems at scale, using modern machine learning architectures and techniques to derive actionable insights.
21 August 2024 • 9:15 AM - 10:15 AM PDT
Session Chair:
Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
13137-27
21 August 2024 • 9:15 AM - 9:35 AM PDT
Optical coherence tomography (OCT) is a crucial tool in ophthalmology. It aids in diagnosing and managing various ocular conditions by visualizing intricate retinal structures. Despite widespread adoption, the manual analysis of OCT images remains time-consuming and labor-intensive. This study presents a novel approach to streamlining this process by integrating artificial intelligence (AI) techniques.
13137-28
Hardware accelerators for AI-based video enhancement and optimization in video ASICs for data center
21 August 2024 • 9:35 AM - 9:55 AM PDT
Advanced AI and new compression standards are required to improve the viewing experience and reduce service costs, but the explosion in computational complexity is a major barrier to adoption. In this paper, we describe how the development of dedicated hardware accelerators for super-resolution and preprocessing to improve encoder compression performance can significantly improve video quality and compression efficiency while reducing cost and development time. These advances represent an important step towards balancing high quality streaming services with operational efficiency.
13137-29
21 August 2024 • 9:55 AM - 10:15 AM PDT
The study utilizes deep learning, specifically the U-Net convolutional neural network, to automate the segmentation of geographical features in satellite images for land cover classification. By combining cross-entropy and Dice loss during training, the model achieves precise results, aiding applications like urban planning and environmental monitoring. Data augmentation techniques enhance model robustness, while post-processing strategies refine segmentation outcomes. Future work involves expanding the dataset for improved generalization and creating a user-friendly interface. This research underscores the transformative role of deep learning in satellite image analysis.
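The combined cross-entropy and Dice objective described above can be sketched as follows. This is a minimal NumPy illustration for a binary mask (a real training pipeline would use a framework such as PyTorch or TensorFlow); the function name and the `alpha` weighting are assumptions, not the study's actual formulation.

```python
import numpy as np

def combined_loss(pred, target, alpha=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    pred:   predicted foreground probabilities in (0, 1)
    target: ground-truth binary mask {0, 1}
    """
    pred = np.clip(pred, eps, 1 - eps)
    # Binary cross-entropy term
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Soft Dice term: 1 - 2|X ∩ Y| / (|X| + |Y|)
    inter = np.sum(pred * target)
    dice = 1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
    return alpha * bce + (1 - alpha) * dice
```

Cross-entropy drives per-pixel accuracy, while the Dice term counteracts class imbalance when the foreground region is small, which is typical of satellite segmentation masks.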
Coffee Break 10:15 AM - 10:40 AM
21 August 2024 • 10:40 AM - 11:40 AM PDT
Session Chair:
Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
13137-30
21 August 2024 • 10:40 AM - 11:00 AM PDT
Recently, the image compression field has seen a shift in paradigm thanks to the rise of neural network-based models, such as the future JPEG AI standard. While most research to date has focused on image coding for humans, JPEG AI is planning to address machine vision by presenting a number of non-normative decoders addressing multiple image processing and computer vision tasks. While the impact of conventional image compression on classification tasks has already been addressed, no study has been conducted to assess the impact of learning-based image compression on such tasks. In this study, the impact of learning-based image compression, including JPEG AI, on the classification task is reviewed and discussed. The study reviews the impact of JPEG AI compression on a variety of image classification models and shows the superiority of JPEG AI over other learning-based compression models.
13137-31
21 August 2024 • 11:00 AM - 11:20 AM PDT
Rapid development and deployment of GPU-based computation has led to improvements in diffusion-based generation of video and images. Further, a rapid reduction in the effective cost of compression using NNC techniques provides opportunities to compress images and videos in new ways. The overall structure of diffusion-based generative video and images is leveraged to take advantage of the compressed latents and lower overall compression costs and latency. This paper presents an architecture that compresses latents for transmission and reduces overall latency and cost compared to alternatives using traditional codecs or NNC on the raw image. It further presents computational cost, quantitative and perceptual quality, and latency for this architecture as compared to the alternatives.
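The core idea of compressing in latent space can be sketched very simply: quantize the low-dimensional latent tensor and transmit the integer codes instead of raw pixels. This is a minimal illustration under assumed names and a fixed step size; the paper's actual architecture would add entropy coding and operate on real diffusion/VAE latents.

```python
import numpy as np

def quantize_latent(z, step=0.1):
    """Uniform-quantize a latent array to integer codes at a fixed step size."""
    return np.round(z / step).astype(np.int32)

def dequantize_latent(codes, step=0.1):
    """Reconstruct an approximate latent from transmitted integer codes."""
    return codes.astype(np.float32) * step
```

Because the latent is far smaller than the image it represents, even this naive uniform quantization transmits orders of magnitude fewer symbols than pixel-domain coding, at the cost of a bounded per-element reconstruction error of at most half the step size.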
13137-32
21 August 2024 • 11:20 AM - 11:40 AM PDT
The Versatile Video Coding (VVC) standard specifies a tool named Reference Picture Resampling (RPR), designed for dynamic adaptive resolution change. This tool is also included in the Enhanced Coding Model (ECM) currently developed as exploratory work by JVET. RPR is well designed to support changing frame resolution without inserting intra refresh pictures. Video streaming and low-delay scenarios can take advantage of RPR to ensure smooth frame-based bit-rate adaptation, compared to traditional techniques that can generate bitrate leaps. Substantial video coding gains may be obtained from this new feature by properly deciding, at encoding time, the optimal picture resolution to be used per video segment.
In this paper, a neural network regressor that predicts the picture resolution change decision is presented, and an adaptation of the downscaling factor is proposed to improve VVC coding efficiency in random access and all intra configurations.
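The decision the paper's neural regressor learns to predict can be framed as a classic Lagrangian rate-distortion comparison among candidate downscaling factors. The stand-in below illustrates only that underlying decision rule; the function names, candidate set, and cost numbers are hypothetical, not the paper's method.

```python
def rd_cost(distortion, rate, lam):
    """Classic Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate

def pick_rpr_scale(candidates, lam=0.5):
    """candidates: dict {downscale_factor: (distortion, rate_bits)}.

    Return the downscaling factor minimizing the RD cost J.
    """
    return min(candidates, key=lambda s: rd_cost(*candidates[s], lam))
```

A brute-force search like this requires trial encodes at every candidate resolution; the appeal of a trained regressor is predicting the winner without paying that encoding cost.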
Lunch/Exhibition Break 11:40 AM - 1:10 PM
21 August 2024 • 1:10 PM - 3:35 PM PDT
Session Chair:
Ryan Zhijun Lei, Meta (United States)
13137-33
21 August 2024 • 1:10 PM - 1:30 PM PDT
The current scale of online video streaming requires hardware-accelerated video transcoding solutions. Historically, hardware solutions have been excellent at offloading computationally intensive tasks from CPUs, but often came with the penalty of being inflexible and not quickly adaptable to emerging market trends. We present an architecture that maintains all the benefits of hardware acceleration while adding an unparalleled level of programmability and flexibility. This architecture supports a wide spectrum of markets, ranging from ultra-low-latency encoding all the way to high-quality video-on-demand, with only firmware changes. These capabilities are achieved by a strategic combination of built-in hardware acceleration components and many embedded CPUs that have full control over the video encoding pipeline. This architecture not only provides deterministic timing, which is critical for ultra-low-latency transcoding, but also offers flexibility and programmability, allowing robust product roadmaps through simple firmware updates.
YouTube is actively driving advancements in the AV1 and AV2 video codecs to enhance streaming quality and efficiency for diverse user-generated content (UGC). Efforts include customizing the AV1 codec for UGC, optimizing quality/bitrate/compute tradeoffs, and developing hardware encoding/decoding support within YouTube's data centers to support AV1 at scale.
To accelerate adoption, YouTube works to increase AV1 transcoding coverage, expand device compatibility, and contributes to the Alliance for Open Media (AOM) for the ongoing improvement of AV1 and AV2.
Research focuses on novel quality metrics, hardware-software analysis, and potential modifications to the codecs to support emerging use cases like AR and VR. To ensure the practical feasibility of AV2, we assess hardware complexity and propose methods to reduce it. YouTube also prioritizes reducing AV1/AV2 encoder complexity using approaches like machine learning-based partition search pruning. Furthermore, YouTube has led collective efforts to modify existing tools by chairing the Hardware Subgroup within AOM.
13137-35
21 August 2024 • 1:50 PM - 2:10 PM PDT
To extend our previous work benchmarking coding efficiency and performance for different open-source software encoders, including x264, x265, libvpx, libaom, and SVT-AV1, we want to also include hardware encoders in a similar study. In this work, we have included a few commercially available hardware AV1 encoder implementations from external vendors, along with Meta's MSVP VP9 encoder. A wider variety of test content is included in the study. To ensure a fair comparison between software and hardware encoders, we normalized encoding performance to power used in watt-hours. In this paper, we provide a detailed description of the test methodology and the process for measuring compression efficiency and power usage. We also discuss the limitations and future opportunities to improve the methodology.
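The watt-hour normalization mentioned above can be sketched in a few lines: convert average power draw and wall-clock time into energy, then express throughput per unit of energy so software and hardware encoders are comparable. The function name and the example numbers are illustrative, not measurements from the study.

```python
def frames_per_watt_hour(frames_encoded, avg_power_watts, wall_time_seconds):
    """Throughput normalized by energy: frames encoded per watt-hour consumed."""
    # Energy (Wh) = average power (W) * time (h)
    energy_wh = avg_power_watts * wall_time_seconds / 3600.0
    return frames_encoded / energy_wh
```

For example, encoding 3600 frames in one hour at an average draw of 100 W consumes 100 Wh, i.e. 36 frames per watt-hour; a hardware encoder finishing the same job faster at lower power scores proportionally higher.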
13137-36
21 August 2024 • 2:10 PM - 2:30 PM PDT
In video streaming services for the VOD use case, one important workflow is transcoding user-uploaded videos into multiple encoded bitstreams at different bitrates and resolutions, which allows client players to leverage an ABR (adaptive bitrate) algorithm to select bitstream segments based on available bandwidth. In this workflow, the key decision to be made is the optimal encoding resolution and bitrate for every video at each quality or bitrate target in an ABR ladder. To tackle this challenge, an efficient two-stage convex-hull-based dynamic optimization framework was recently proposed. In this two-stage system, two different encoders, or encoder presets, can be used to construct the convex hull to improve computational efficiency.
In this work, we study the cross-codec encoding parameter prediction problem in the two-stage system to improve compression efficiency. We first describe how we formulate the prediction as an optimization problem. We then propose two methods for this optimization, with validation results. We also discuss some potential directions that could further improve the results.
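The convex-hull step at the heart of such per-title ladder construction can be sketched as follows: from (bitrate, quality) points measured on trial encodes, keep only the Pareto-efficient upper hull. This is a generic monotone-chain sketch under assumed names, not the framework's actual implementation; the sample points are illustrative.

```python
def rd_convex_hull(points):
    """points: list of (bitrate, quality) pairs from trial encodes.

    Return the upper convex hull in ascending bitrate order: the operating
    points no other encode dominates in rate-quality terms.
    """
    pts = sorted(set(points))
    hull = []
    for p in pts:
        # Pop the last hull point while it falls on or below the chord from
        # hull[-2] to the new point (cross-product turn test).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

Rungs of the final ABR ladder are then chosen along this hull, so every delivered bitrate buys the best quality any trial encode achieved at that rate.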
Coffee Break 2:30 PM - 2:55 PM
13137-37
21 August 2024 • 2:55 PM - 3:15 PM PDT
In this paper, we will introduce how we implemented the client-side ABR algorithm to enable delivery of mixed-codec manifests. We will also share some results and potential opportunities for further optimization.
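A minimal sketch of the selection step such a client must perform: filter the manifest's renditions to codecs the device can decode, then pick the highest bitrate that fits the current bandwidth estimate. The field names and data below are hypothetical, not the paper's actual algorithm.

```python
def select_rendition(renditions, supported_codecs, bandwidth_bps):
    """renditions: list of dicts with 'codec' and 'bitrate' keys.

    Return the highest-bitrate playable rendition within budget, or None.
    """
    playable = [r for r in renditions
                if r["codec"] in supported_codecs and r["bitrate"] <= bandwidth_bps]
    return max(playable, key=lambda r: r["bitrate"], default=None)
```

The mixed-codec complication is that the "best" rendition differs per device: a client with AV1 decode support may reach a given quality at a lower bitrate than one limited to H.264, so the codec filter must run before the bandwidth comparison.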
13137-38
AI-based content-aware encoding at scale utilizing hardware resources in video ASICs for data center
21 August 2024 • 3:15 PM - 3:35 PM PDT
This paper presents a method to simplify content-aware encoding for streaming services using AI, aiming to enhance user experience and efficiency in bandwidth-limited environments. Unlike traditional encoding, which uses fixed bitrates, this AI-driven approach optimizes the bitrate based on the content's complexity, significantly reducing the necessary computational steps and bitrates. It achieves this by predicting an optimized Adaptive Bitrate (ABR) ladder through minimal encoding steps and lightweight analysis, resulting in substantial bitrate savings and streamlined workflow. The approach also fits well with the trend of integrating video processing ASICs in data centers, further enhancing cost-effectiveness and scalability.
21 August 2024 • 3:35 PM - 4:55 PM PDT
Session Chair:
Yuriy A. Reznik, Brightcove, Inc. (United States)
13137-39
21 August 2024 • 3:35 PM - 3:55 PM PDT
The use of DNA molecules as a storage medium has been recently proposed as a solution to the exponentially increasing demand for data storage, achieving lower energy consumption and higher information density. The nucleotides composing the molecules can be regarded as quaternary symbols, but constraints are generally imposed to avoid sequences prone to errors during sequencing, storage, and synthesis. While the majority of previous works in the field have proposed methods for translating general binary data into nucleotides, others have presented algorithms tailored for specific data types, such as images, or have joined source and channel coding into a single process. This paper proposes and evaluates a method that integrates DNA Fountain codes with state-of-the-art compression coding techniques, targeting the storage of images and three-dimensional point clouds. Results demonstrate that the proposed method outperforms previous techniques for coding images directly into DNA, putting forward a first benchmark for the coding of point clouds.
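The sequencing constraints mentioned above can be illustrated with a rotating code: each input symbol selects a nucleotide different from its predecessor, so homopolymer runs (a common source of sequencing errors) cannot occur. This sketch is in the spirit of rotating ternary codes from the DNA storage literature, not the paper's actual DNA Fountain scheme; names are assumptions.

```python
def trits_to_dna(trits):
    """Map base-3 symbols to nucleotides with no two equal adjacent bases.

    Each trit (0-2) picks one of the 3 bases differing from the previous base;
    the very first symbol simply indexes into A/C/G.
    """
    bases = "ACGT"
    seq, prev = [], None
    for t in trits:
        choices = [b for b in bases if b != prev]
        prev = choices[t]
        seq.append(prev)
    return "".join(seq)
```

The price of the constraint is capacity: each position carries log2(3) ≈ 1.58 bits instead of the 2 bits an unconstrained quaternary symbol would, which is part of why fountain-code screening of candidate strands is attractive.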
13137-40
21 August 2024 • 3:55 PM - 4:15 PM PDT
The Karhunen-Loève Transform (KLT) is a valuable tool in many applications, but its computation is not exactly trivial. Generally, it requires solving an eigenvector problem, and with general types of inputs, the typical path forward is to use iterative numerical methods. Such methods are usually complex. In some cases, KLTs admit approximations by sinusoidal transforms, with DCT-II likely the best-known example, but the number of such cases is limited and usually constrained to very simple (first-order) processes. However, as we will show in this paper, for some short sizes, KLTs can still be computed analytically, with only mild assumptions about the structure of the covariance matrix. For example, we show analytic solutions for arbitrary real symmetric 3x3 covariance matrices. Solutions can also be found for symmetric tridiagonal matrices and some special cases of pentadiagonal matrices. In the end, we discuss a few possible applications of such transforms for image and video coding.
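As a numerical reference point for the analytic solutions the abstract describes: the KLT basis is simply the eigenvector matrix of the signal covariance, ordered by decreasing eigenvalue. The sketch below uses a generic eigensolver and an illustrative first-order Markov covariance; the paper's contribution is avoiding exactly this numerical step for small matrices.

```python
import numpy as np

def klt_basis(cov):
    """Return (basis, eigenvalues): eigenvectors as columns, sorted by
    descending eigenvalue, so basis.T @ cov @ basis is diagonal."""
    vals, vecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vecs[:, order], vals[order]
```

Applied to the 3x3 covariance of a first-order Markov process with correlation 0.9, the resulting basis decorrelates the input exactly, which is the property a closed-form 3x3 KLT must reproduce.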
13137-41
21 August 2024 • 4:15 PM - 4:35 PM PDT
We review the history of the development of one of the most iconic tools in image and video coding – the zigzag scan. Despite its apparent obviousness, we will show that its development was a non-trivial process that took several years, multiple iterations, and multiple ideas that eventually led to the formation of its final "zigzag" shape. Remarkably, we also discover that early variants of the zigzag scan appeared before the invention of the DCT, intra-predictors, and many other techniques in image and video coding algorithms. It is one of the oldest and most fundamental techniques in this context. The paper also traces the evolution of image and video codec architectures over the last six decades and provides examples of the use of the zigzag scan in modern-era image and video coding standards.
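For readers unfamiliar with the tool under discussion: the zigzag scan visits transform coefficients anti-diagonal by anti-diagonal, from low to high frequency, alternating direction on each diagonal. A compact generator (the classic JPEG 8x8 scan for n=8; the function name is ours):

```python
def zigzag_order(n):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (
            rc[0] + rc[1],                              # anti-diagonal index
            # odd diagonals run top-to-bottom (row ascending),
            # even diagonals bottom-to-top (col ascending)
            rc[0] if (rc[0] + rc[1]) % 2 else rc[1],
        ),
    )
```

Because quantized high-frequency coefficients are usually zero, scanning in this order front-loads the nonzero values and produces the long zero runs that run-length and entropy coders exploit.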
13137-42
21 August 2024 • 4:35 PM - 4:55 PM PDT
We review the history of the development of transform-based image and video codecs and reconstruct the logical chains that led to the inventions of the DCT, the zigzag scan, adaptive coding, and the hybrid DPCM + transform architecture. We also review the subsequent evolution of this architecture and explain the reasoning behind the multiple transform choices in modern video codecs (HEVC, VVC, etc.), in-loop filters, and more. Finally, we describe the role of fast transform algorithms in image and video codec evolution and give an outlook on current developments, including the increasing use of CNNs, learning methods, and performance-energy tradeoffs that may shape future architectures.
21 August 2024 • 5:00 PM - 5:45 PM PDT
Session Chair: Jennifer Barton, The Univ. of Arizona (United States)
5:00 PM - 5:05 PM:
Welcome and Opening Remarks
13115-501
The route to attosecond pulses
(Plenary Presentation)
21 August 2024 • 5:05 PM - 5:45 PM PDT
When an intense laser interacts with a gas of atoms, high-order harmonics are generated. In the time domain, this radiation forms a train of extremely short light pulses, of the order of 100 attoseconds. Attosecond pulses allow the study of the dynamics of electrons in atoms and molecules, using pump-probe techniques. This presentation will highlight some of the key steps of the field of attosecond science.
Program Committee
California Polytechnic State Univ., San Luis Obispo (United States)
What you will need to submit
- Title
- Author(s) information
- Speaker biography (1000-character max including spaces)
- Abstract for technical review (200-300 words; text only)
- Summary of abstract for display in the program (50-150 words; text only)
- Keywords used in search for your paper (optional)
- Check the individual conference call for papers for additional requirements (e.g., extended abstract PDF upload for review or instructions for award competitions)