Cost Optimization at Early Stages of Design Using Deep Reinforcement Learning

With the increase in the complexity of modern Systems-on-Chip (SoCs) and the demand for a shorter time-to-market, automation becomes essential in hardware design. This is particularly relevant for complex and time-consuming tasks such as the optimization of the design cost of a hardware component. Design cost, in fact, may depend on several objectives, as in the hardware-software trade-off. Given the complexity of this task, the designer often has no means to perform a fast and effective optimization, in particular for larger and more complex designs. In this paper, we introduce Deep Reinforcement Learning (DRL) for design cost optimization at the early stages of the design process. We first show that DRL is a well-suited solution for the problem at hand. Afterward, by means of a Pointer Network, a neural network architecture designed specifically for combinatorial problems, we benchmark three DRL algorithms on the selected problem. Results obtained in different settings show the improvements achieved by the DRL algorithms compared to conventional optimization methods. Additionally, by using the reward redistribution proposed in the recently introduced RUDDER method, we obtain significant improvements on complex designs. On a dataset of industrial hardware/software interface designs, the obtained optimization is on average 15.18% on the area, as well as 8.25% and 8.12% on the application size and execution time, respectively.
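To make the approach concrete, the sketch below shows a pointer-network-style policy trained with a REINFORCE update, the general technique behind neural combinatorial optimization with a Pointer Network. This is a minimal illustration, not the authors' implementation: the class `PointerPolicy`, the surrogate `design_cost` function, and all dimensions and hyperparameters are assumptions made for the example; a real setup would replace `design_cost` with the actual cost model (estimated area, application size, execution time).

```python
# Minimal sketch (assumed, not the paper's code): a pointer-network policy that
# orders candidate design options and is trained with REINFORCE on a surrogate cost.
import torch
import torch.nn as nn

class PointerPolicy(nn.Module):
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.decoder_cell = nn.LSTMCell(input_dim, hidden_dim)
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, items):
        # items: (batch, n, input_dim), one embedding per candidate design option
        batch, n, _ = items.shape
        enc_out, (h, c) = self.encoder(items)
        h, c = h.squeeze(0), c.squeeze(0)
        dec_in = items.mean(dim=1)                     # simple start token
        mask = torch.zeros(batch, n, dtype=torch.bool)
        log_probs, choices = [], []
        for _ in range(n):
            h, c = self.decoder_cell(dec_in, (h, c))
            # additive attention over encoder states acts as the "pointer"
            scores = self.v(torch.tanh(self.w_enc(enc_out)
                                       + self.w_dec(h).unsqueeze(1))).squeeze(-1)
            scores = scores.masked_fill(mask, float("-inf"))
            dist = torch.distributions.Categorical(logits=scores)
            idx = dist.sample()
            log_probs.append(dist.log_prob(idx))
            choices.append(idx)
            mask[torch.arange(batch), idx] = True      # forbid re-selection
            dec_in = items[torch.arange(batch), idx]
        return torch.stack(choices, dim=1), torch.stack(log_probs, dim=1).sum(dim=1)

def design_cost(order, items):
    # Hypothetical surrogate cost; stands in for the area / code-size / runtime objectives.
    gathered = items[torch.arange(items.size(0)).unsqueeze(1), order]
    return gathered[..., 0].cumsum(dim=1).sum(dim=1)

policy = PointerPolicy(input_dim=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
baseline = None
for step in range(10):                                 # toy training loop
    items = torch.rand(8, 6, 4)                        # 8 design instances, 6 options each
    order, log_prob = policy(items)
    reward = -design_cost(order, items)                # lower cost = higher reward
    baseline = reward.mean() if baseline is None else 0.9 * baseline + 0.1 * reward.mean()
    loss = -((reward - baseline).detach() * log_prob).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the same spirit, the RUDDER variant mentioned above would redistribute the delayed, end-of-episode cost signal across the individual selection steps before the policy update, instead of using the single terminal reward shown here.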

[1] Yunguan Fu, et al. Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization, 2018, ArXiv.

[2] M. R. Rao, et al. Combinatorial Optimization, 1992, NATO ASI Series.

[3] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[4] Robert Wille, et al. Accurate Cost Estimation of Memory Systems Inspired by Machine Learning for Computer Vision, 2019, Design, Automation & Test in Europe Conference & Exhibition (DATE).

[5] Yinghui Xu, et al. Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method, 2017, ArXiv.

[6] Jürgen Teich, et al. Model-Based Design Automation of Hardware/Software Co-Designs for Xilinx Zynq PSoCs, 2018, International Conference on ReConFigurable Computing and FPGAs (ReConFig).

[7] Wolfgang Ecker, et al. Hardware-dependent Software: Principles and Practice, 2009.

[8] Samy Bengio, et al. Neural Combinatorial Optimization with Reinforcement Learning, 2016, ICLR.

[9] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.

[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[11] Wolfgang Ecker, et al. Metamodeling and Code Generation in the Hardware/Software Interface Domain, 2017, Handbook of Hardware/Software Codesign.

[12] Sepp Hochreiter, et al. RUDDER: Return Decomposition for Delayed Rewards, 2018, NeurIPS.

[13] Quoc V. Le, et al. Chip Placement with Deep Reinforcement Learning, 2020, ArXiv.

[14] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[15] Giovanni Righini, et al. Heuristics from Nature for Hard Combinatorial Optimization Problems, 1996.

[16] Jürgen Schmidhuber, et al. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition, 2005, ICANN.

[17] Michael C. Ferris, et al. Genetic Algorithms for Combinatorial Optimization: The Assemble Line Balancing Problem, 1994, INFORMS J. Comput.