Before Imagen's release in May 2022, models such as DALL·E, GLIDE, and Latent Diffusion had already made substantial progress in text-to-image generation. Imagen stood out by using a large, frozen T5 language model as its text encoder, which markedly improved how faithfully images matched their prompts. Its authors also found that scaling up the text encoder mattered more for quality than scaling up the image diffusion model. Imagen produced highly photorealistic 1024×1024 images and set a new quality benchmark. Google didn't invent the underlying idea but refined it, combining large language models and diffusion models more effectively than prior work.
Did Imagen really achieve something new, or had others already published similar work, with Google simply taking credit for polishing it?
Posted 1 month, 1 week ago by ModernSlave