Existing vision-guided robot grasping methods perform impressively in dense clutter, but their reliance on a fixed view often yields incomplete object geometry at the view boundary and limits grasping in more challenging large-scale dense clutter. Moreover, analyzing all objects in the scene during grasping can detract from the reasoning about a specific object. This work proposes the Monozone-centric Instance Grasping Policy (MCIGP) to address these problems. The first component is Monozone View Alignment (MVA), in which we design a dynamic monozone that aligns the camera view to different objects during grasping, thereby alleviating view boundary effects and enabling grasping in large-scale dense clutter. The second is Instance-specific Grasp Detection (ISGD), which predicts and optimizes grasp candidates for one specific object within the monozone, ensuring an in-depth analysis of that object. We performed over 8,000 real-world grasping experiments in different cluttered scenarios with 300 novel objects, demonstrating that MCIGP significantly outperforms seven competitive grasping methods. Notably, in a large-scale densely cluttered scene involving 100 different household goods, MCIGP achieved a grasp success rate of 84.9%. To the best of our knowledge, no previous work has demonstrated similar performance.
Highlights
Both incomplete object geometry and over-analysis of every object in a scene can lead a grasping model to generate inferior grasp candidates, which may cause objects to be flung out at high speed and pose safety risks to human workers. Through extensive experiments, this work demonstrates that simply moving the camera toward a target object and analyzing only that object can substantially improve grasp success rates while alleviating these issues, offering practical insights for achieving safe grasping in dense clutter.
The pipeline of MCIGP: moving camera (MVA) → segmentation (CPS) → grasp candidate generation → sampling and optimization (GCO & GCS & OGR) → final grasp.
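The stage ordering in the pipeline above can be illustrated with a minimal Python sketch. This is not the MCIGP implementation: every function name, the `height` and `radius` parameters, and the height-based scoring rule are assumptions made for illustration only. A simple radius crop stands in for the monozone together with segmentation (CPS), and a toy heuristic stands in for grasp candidate generation and optimization (GCO, GCS, OGR).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Grasp:
    position: Point
    score: float


def align_view(target_center: Point, height: float) -> Point:
    """Hypothetical stand-in for MVA: place the camera directly above the target."""
    x, y, z = target_center
    return (x, y, z + height)


def crop_monozone(points: Sequence[Point], center: Point, radius: float) -> List[Point]:
    """Hypothetical stand-in for the monozone plus CPS: keep only points whose
    horizontal distance to the target center is at most `radius`."""
    cx, cy, _ = center
    return [p for p in points if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2]


def generate_candidates(points: Sequence[Point]) -> List[Grasp]:
    """Toy candidate generator: one candidate per point, scored by its height.
    The scoring rule is an assumption, not the paper's method."""
    return [Grasp(position=p, score=p[2]) for p in points]


def select_final(candidates: Sequence[Grasp]) -> Optional[Grasp]:
    """Toy stand-in for sampling and optimization: pick the highest-scoring candidate."""
    return max(candidates, key=lambda g: g.score) if candidates else None


def grasp_once(scene: Sequence[Point], target_center: Point,
               height: float = 0.4, radius: float = 0.1):
    """Run the stages in pipeline order for a single target object."""
    camera = align_view(target_center, height)
    zone = crop_monozone(scene, target_center, radius)
    return camera, select_final(generate_candidates(zone))


if __name__ == "__main__":
    scene = [(0.0, 0.0, 0.05), (0.05, 0.0, 0.10), (0.5, 0.5, 0.30)]
    camera, best = grasp_once(scene, target_center=(0.0, 0.0, 0.0))
    print(camera, best)
```

In this toy scene, the tallest point lies outside the cropped zone and is therefore never considered, which mirrors the idea of reasoning about one object at a time rather than the whole scene.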
Results - Comparison with First Group Baselines (Non-cluttered)
Results - Comparison with First Group Baselines (Piled 20 Objects)
Results - Comparison with Second Group Baselines (Piled 50 Objects)
Results - Comparison with GraspNet 6D (Piled 50 Objects)
Results - The Difference Between D-MVA and Q-MVA (Piled 10 Objects)
Results - The Impact of CPS: With vs. Without (Piled 100 Objects)
Results - The Impact of GCO: With vs. Without (Piled 100 Objects)
BibTeX
@article{li2025mcigp,
title={Monozone-Centric Instance Grasping Policy in Large-scale Dense Clutter},
author={Chenghao Li and Nak Young Chong},
journal={IEEE/ASME Transactions on Mechatronics},
year={2025}
}