TY - JOUR
T1 - PolyDexFrame
T2 - Deep Reinforcement Learning-Based Pick-and-Place of Objects in Clutter
AU - Imtiaz, Muhammad Babar
AU - Qiao, Yuansong
AU - Lee, Brian
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/8
Y1 - 2024/8
N2 - This research study presents a polydexterous deep reinforcement learning-based pick-and-place framework for industrial clutter scenarios. In the proposed framework, the agent learns to pick and place regularly and irregularly shaped objects in clutter through a sequential combination of prehensile and non-prehensile robotic manipulations involving different robotic grippers, in a completely self-supervised manner. The problem was tackled as a reinforcement learning problem; after the Markov decision process (MDP) was designed, the off-policy, model-free Q-learning algorithm was deployed using deep Q-networks as a Q-function approximator. Four distinct robotic manipulations were considered as actions: grasp from the prehensile manipulation category, and inward slide, outward slide, and suction grip from the non-prehensile manipulation category. The Q-function comprised four fully convolutional networks (FCNs), one per action, based on memory-efficient DenseNet-121 variants, each outputting pixel-wise maps of action values; the networks were jointly trained via the pixel-wise parametrization technique. Rewards were assigned according to the outcome of the action performed, and backpropagation was conducted accordingly for the FCN generating the maximum Q-value. The results showed that the agent learned the sequential combination of the polydexterous prehensile and non-prehensile manipulations, where the non-prehensile manipulations increased the feasibility of subsequent prehensile manipulations. Promising results were achieved in comparison to baselines and differently designed variants, as well as across testing clutter of varying density.
AB - This research study presents a polydexterous deep reinforcement learning-based pick-and-place framework for industrial clutter scenarios. In the proposed framework, the agent learns to pick and place regularly and irregularly shaped objects in clutter through a sequential combination of prehensile and non-prehensile robotic manipulations involving different robotic grippers, in a completely self-supervised manner. The problem was tackled as a reinforcement learning problem; after the Markov decision process (MDP) was designed, the off-policy, model-free Q-learning algorithm was deployed using deep Q-networks as a Q-function approximator. Four distinct robotic manipulations were considered as actions: grasp from the prehensile manipulation category, and inward slide, outward slide, and suction grip from the non-prehensile manipulation category. The Q-function comprised four fully convolutional networks (FCNs), one per action, based on memory-efficient DenseNet-121 variants, each outputting pixel-wise maps of action values; the networks were jointly trained via the pixel-wise parametrization technique. Rewards were assigned according to the outcome of the action performed, and backpropagation was conducted accordingly for the FCN generating the maximum Q-value. The results showed that the agent learned the sequential combination of the polydexterous prehensile and non-prehensile manipulations, where the non-prehensile manipulations increased the feasibility of subsequent prehensile manipulations. Promising results were achieved in comparison to baselines and differently designed variants, as well as across testing clutter of varying density.
KW - deep Q-network
KW - deep reinforcement learning
KW - DenseNet-121
KW - fully convolutional network
KW - Markov decision process
KW - non-prehensile
KW - pixelwise-parameterization
KW - polydexterous
KW - prehensile
KW - robotic manipulation
UR - http://www.scopus.com/inward/record.url?scp=85202638827&partnerID=8YFLogxK
U2 - 10.3390/machines12080547
DO - 10.3390/machines12080547
M3 - Article
AN - SCOPUS:85202638827
SN - 2075-1702
VL - 12
JO - Machines
JF - Machines
IS - 8
M1 - 547
ER -