Summary:
- Innovative AI Model: Waymo is advancing its autonomous driving technology by exploring the use of Google’s Gemini multimodal large language model (MLLM) to train its robotaxis.
- New Research Paper: The company introduced an End-to-End Multimodal Model for Autonomous Driving, referred to as EMMA, which processes sensor data to predict future trajectories for its vehicles.
- Shift from Traditional Approaches: Historically, autonomous systems relied on specific modules for functions like perception and planning, which faced scaling issues and difficulties adapting to novel environments.
- Advantages of Gemini: Waymo argues that MLLMs like Gemini offer broad “world knowledge” and improved reasoning capabilities, potentially addressing the limitations of traditional systems.
- Challenges Ahead: While EMMA shows promise in helping robotaxis navigate complex situations, there are limitations in processing 3D sensor inputs, and future research is necessary to mitigate risks associated with MLLMs, including potential inaccuracies.
Read more at: The Verge