TY - GEN
T1 - Prehensile Robotic Pick-and-Place in Clutter with Deep Reinforcement Learning
AU - Imtiaz, Muhammad Babar
AU - Lee, Brian
AU - Qiao, Yuansong (John)
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In this paper, we present a self-learning deep reinforcement learning framework for industrial pick-and-place tasks in a cluttered environment through intelligent prehensile robotic grasping. This approach aims to enable agents to learn to pick and place both regular and irregular objects in clutter through robotic grasping, in order to enhance both output quantity and quality in various industries. To do so, we design a Markov decision process (MDP) and deploy Q-learning, a model-free, off-policy temporal-difference algorithm. We use an extended, end-to-end fully convolutional network (FCN) based on the DenseNet-121 architecture for Q-function approximation. A pixelwise parameterization scheme is designed to compute pixelwise maps of action values, and rewards are allocated according to the success of the action performed. The proposed approach does not require domain-specific knowledge, geometric models of the objects, or extraordinary resources such as large datasets or heavy memory. We present training and testing results comparing our approach against its variants and across clutter of random density and size.
AB - In this paper, we present a self-learning deep reinforcement learning framework for industrial pick-and-place tasks in a cluttered environment through intelligent prehensile robotic grasping. This approach aims to enable agents to learn to pick and place both regular and irregular objects in clutter through robotic grasping, in order to enhance both output quantity and quality in various industries. To do so, we design a Markov decision process (MDP) and deploy Q-learning, a model-free, off-policy temporal-difference algorithm. We use an extended, end-to-end fully convolutional network (FCN) based on the DenseNet-121 architecture for Q-function approximation. A pixelwise parameterization scheme is designed to compute pixelwise maps of action values, and rewards are allocated according to the success of the action performed. The proposed approach does not require domain-specific knowledge, geometric models of the objects, or extraordinary resources such as large datasets or heavy memory. We present training and testing results comparing our approach against its variants and across clutter of random density and size.
KW - DenseNet-121
KW - Markov decision process
KW - Q-function
KW - Q-learning
KW - deep reinforcement learning
KW - fully convolutional network
KW - prehensile robotic grasping
UR - http://www.scopus.com/inward/record.url?scp=85138938782&partnerID=8YFLogxK
U2 - 10.1109/ICECET55527.2022.9873426
DO - 10.1109/ICECET55527.2022.9873426
M3 - Conference contribution
AN - SCOPUS:85138938782
T3 - International Conference on Electrical, Computer, and Energy Technologies, ICECET 2022
BT - International Conference on Electrical, Computer, and Energy Technologies, ICECET 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Electrical, Computer, and Energy Technologies, ICECET 2022
Y2 - 20 July 2022 through 22 July 2022
ER -