RIKEN PREDICTION


2023-04-25 [Past Event] The 15th Prediction Science Seminar (2023-04-06)

The 15th Prediction Science Seminar was held on Thursday, April 6, 2023, from 10:00 to 11:30, and was presented by Dr. Masahiro Suzuki of the University of Tokyo.
  Dr. Suzuki's talk focused on deep generative models, a class of statistical models that learn to generate new data resembling the samples they were trained on. They have many practical applications, such as creating realistic images for video games, generating text for language translation, and even designing candidate drug molecules for medical research.
  The talk first outlined the basic mathematical framework underpinning deep generative models in general, then presented variational autoencoders (VAEs), along with some of their properties and variants, in more depth. It next moved to conditional deep generative models, in which the generative process is conditioned on an external variable, e.g., a user prompt. Diffusion models, which are widely used in this conditional setting, have received considerable attention recently. The presentation sketched the inner workings of the diffusion process and showcased image results from prior work.
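To make the two core ideas above concrete, here is a minimal, dependency-free Python sketch (our own illustration, not code from the talk): a Monte Carlo estimate of the VAE evidence lower bound (ELBO) using the reparameterization trick, and the closed-form forward noising step used by DDPM-style diffusion models. The functions `toy_encoder` and `toy_decoder` are hypothetical stand-ins for trained networks, and the linear noise schedule (beta from 1e-4 to 0.02 over 1000 steps) is an assumed, commonly used choice.

```python
import math
import random

# Vectors are plain lists of floats to keep the sketch dependency-free.

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, log_var))

def gaussian_log_pdf(x, mean, var=1.0):
    """log N(x; mean, var * I), summed over dimensions."""
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (xi - mi) ** 2 / var)
               for xi, mi in zip(x, mean))

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): randomness is moved outside the parameters."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def elbo(x, encoder, decoder, rng, n_samples=1):
    """Monte Carlo estimate of E_q[log p(x|z)] - KL(q(z|x) || p(z)) with p(z) = N(0, I)."""
    mu, log_var = encoder(x)
    recon = sum(gaussian_log_pdf(x, decoder(reparameterize(mu, log_var, rng)))
                for _ in range(n_samples)) / n_samples
    return recon - kl_to_standard_normal(mu, log_var)

def toy_encoder(x):
    """Hypothetical stand-in for a trained encoder: returns (mu, log_var) of q(z|x)."""
    return [0.9 * xi for xi in x], [-2.0] * len(x)

def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder: mean of p(x|z), unit variance assumed."""
    return list(z)

def alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """abar_t = prod_{s<=t} (1 - beta_s) for an (assumed) linear beta schedule."""
    out, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        out.append(running)
    return out

def diffuse(x0, abar_t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in a single step."""
    return [math.sqrt(abar_t) * xi + math.sqrt(1.0 - abar_t) * rng.gauss(0.0, 1.0)
            for xi in x0]

if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.5, -1.0]
    print("ELBO estimate:", elbo(x, toy_encoder, toy_decoder, rng, n_samples=100))
    abar = alpha_bars()
    print("abar at first and last step:", abar[0], abar[-1])
    print("x_T (close to pure noise):", diffuse(x, abar[-1], rng))
```

Training a VAE maximizes this ELBO with respect to the encoder and decoder parameters; a diffusion model instead learns to reverse the `diffuse` step, which is why the forward process only needs the cumulative products `abar_t`.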
  Dr. Suzuki then turned to multimodal learning, the branch of machine learning concerned with models that handle several kinds of data simultaneously, e.g., text and images. He presented his work on JMVAE [Suzuki+ 16], together with an overview of multimodal VAEs from [Suzuki+ 22].
  The final part of the presentation covered deep generative models used as world models, i.e., generative processes that simulate the dynamics of a given environment. Examples included world models for video games, which complement traditional reinforcement learning approaches, and generative query networks, which predict what an observer would see of a scene from a given viewpoint. The use of large language models as world models was also discussed. Because world models hinge on a latent representation of the environment, Dr. Suzuki also explained research on this difficult topic, specifically object-centric representations.
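The world-model idea can be illustrated with a deliberately tiny, dependency-free sketch (our own illustration, not from the talk): an agent fits a dynamics model from observed transitions, then "imagines" rollouts with that model instead of querying the real environment. The hidden environment `true_env_step`, the linear model class (a stand-in for the deep latent generative models discussed in the talk), and the "drive the state toward 0" planning objective are all hypothetical choices.

```python
import random

def true_env_step(s, a, rng):
    """Hypothetical environment; the agent only sees it through sampled transitions."""
    return 0.9 * s + 0.5 * a + rng.gauss(0.0, 0.01)

def collect(num_steps, rng):
    """Interact with the real environment using random actions; record (s, a, s_next)."""
    transitions, s = [], 0.0
    for _ in range(num_steps):
        a = rng.uniform(-1.0, 1.0)
        s_next = true_env_step(s, a, rng)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions

def fit_linear_model(transitions):
    """Least-squares fit of s_next ~ w_s * s + w_a * a via the 2x2 normal equations."""
    sss = sum(s * s for s, _, _ in transitions)
    saa = sum(a * a for _, a, _ in transitions)
    ssa = sum(s * a for s, a, _ in transitions)
    ssy = sum(s * y for s, _, y in transitions)
    say = sum(a * y for _, a, y in transitions)
    det = sss * saa - ssa * ssa
    w_s = (ssy * saa - ssa * say) / det
    w_a = (sss * say - ssa * ssy) / det
    return w_s, w_a

def imagine(model, s0, actions):
    """Roll the learned model forward without touching the real environment."""
    w_s, w_a = model
    trajectory = [s0]
    for a in actions:
        trajectory.append(w_s * trajectory[-1] + w_a * a)
    return trajectory

def choose_action(model, s, candidates, horizon=5):
    """Pick the constant action whose imagined rollout ends closest to 0 (hypothetical goal)."""
    return min(candidates, key=lambda a: abs(imagine(model, s, [a] * horizon)[-1]))

if __name__ == "__main__":
    rng = random.Random(0)
    model = fit_linear_model(collect(500, rng))
    print("learned (w_s, w_a):", model, "vs hidden (0.9, 0.5)")
    print("imagined rollout:", imagine(model, 1.0, [0.0] * 5))
    print("planned action:", choose_action(model, 1.0, [-1.0, -0.5, 0.0, 0.5, 1.0]))
```

The point of the design is that, once the model is fitted, `choose_action` can evaluate many candidate plans purely in imagination, which is how world models can reduce the number of costly interactions a reinforcement learning agent needs.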
  Finally, Dr. Suzuki presented Pixyz, a Python library that offers an alternative programming paradigm for deep generative models. Pixyz enables a coding style that stays close to a model's mathematical formulation and thereby eases the transition from theoretical results to concrete applications.
The talk was extremely well received and prompted many questions from the audience.

References:
[Suzuki+ 16] Suzuki, Masahiro, Kotaro Nakayama, and Yutaka Matsuo. "Joint multimodal learning with deep generative models." arXiv preprint arXiv:1611.01891 (2016).
[Suzuki+ 22] Suzuki, Masahiro, and Yutaka Matsuo. "A survey of multimodal deep generative models." Advanced Robotics 36.5-6 (2022): 261-278.

Author: Cedric Ho Thanh
