Jixuan Chen

Multi-view Crowd Tracking Transformer with View-Ground Interactions Under Large Real-World Scenes

Apr 21, 2026

CocoaBench: Evaluating Unified Digital Agents in the Wild

Apr 14, 2026

DeliveryBench: Can Agents Earn Profit in Real World?

Dec 22, 2025

OpenCUA: Open Foundations for Computer-Use Agents

Aug 12, 2025

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

Jun 12, 2025

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

May 19, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Mar 26, 2025

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Nov 12, 2024

COMMA: A Communicative Multimodal Multi-Agent Benchmark

Oct 10, 2024