The code release will be delayed. Thanks for your patience!

We evaluate the model on five major categories of multi-modal tasks: Referring Expression Comprehension, ...