### A Designer's Guide to Asynchronous VLSI

Create low power, higher performance circuits with shorter design times using this practical guide to asynchronous design. This practical alternative to conventional synchronous design enables performance close to full-custom designs with design times that approach commercially available ASIC standard cell flows. It includes design trade-offs, specific design examples, and end-of-chapter exercises. Emphasis throughout is placed on practical techniques and real-world applications, making this ideal for circuit design students interested in alternative design styles and system-on-chip circuits, as well as for circuit designers in industry who need new solutions to old problems.

**Peter A. Beerel** is CEO of TimeLess Design Automation – his own company commercializing asynchronous VLSI tools and libraries – and an Associate Professor in the Electrical Engineering Department at the University of Southern California (USC). Dr. Beerel has 15 years' experience of research and teaching in asynchronous VLSI and has received numerous awards including the VSoE Outstanding Teaching Award in 1997 and the 2008 IEEE Region 6 Outstanding Engineer Award for significantly advancing the application of asynchronous circuits to modern VLSI chips.

**Recep O. Ozdag** is IC Design Manager at Fulcrum Microsystems and a part-time Lecturer at USC, where he received his Ph.D. in 2004.

**Marcos Ferretti** is one of the founders of PST Electrônica S. A. (Positron), Brazil – an automotive electronic systems manufacturing company – where he is currently Vice-President. He received his Ph.D. from USC in 2004 and was co-recipient of the USC Electrical Engineering-Systems Best Paper Award in the same year.

# A Designer's Guide to Asynchronous VLSI

PETER A. BEEREL
University of Southern California/TimeLess Design Automation

RECEP O. OZDAG
Fulcrum Microsystems Inc.

MARCOS FERRETTI
PST Electrônica S. A. (Positron)

CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521872447

© Cambridge University Press 2010

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2010

Printed in the United Kingdom at the University Press, Cambridge

A catalog record for this publication is available from the British Library

Library of Congress Cataloging-in-Publication Data

Beerel, Peter A.
A Designer's Guide to Asynchronous VLSI / Peter A. Beerel, Recep O. Ozdag, Marcos Ferretti.
p. cm.
ISBN 978-0-521-87244-7 (Hardback)
1. Integrated circuits-Very large scale integration-Computer-aided design. 2. Integrated circuits-Very large scale integration-Design and construction. I. Ozdag, Recep O. II. Ferretti, Marcos. III. Title.
TK7888.4.B44 2010
621.39'5-dc22 2009042290

ISBN 978-0-521-87244-7 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Janet, Kira, and Kim – P. A. B.

To Jemelle and Charlotte – R. O. O.

To Lilione – M. F.
## **Contents**

| Chapter | Section | Title |
|---|---|---|
| | | Acknowledgments |
| 1 | | Introduction |
| | 1.1 | Synchronous design basics |
| | 1.2 | Challenges in synchronous design |
| | 1.3 | Asynchronous design basics |
| | 1.4 | Asynchronous design flows |
| | 1.5 | Potential advantages of asynchronous design |
| | 1.6 | Challenges in asynchronous design |
| | 1.7 | Organization of the book |
| 2 | | Cha… |
| | 2.1 | Asynchronous channels |
| | 2.2 | Sequencing and concurrency |
| | 2.3 | Asynchronous memories and holding state |
| | 2.4 | Arbiters |
| | 2.5 | Design examples |
| | 2.6 | Exercises |
| 3 | | Mod… |
| | 3.1 | Communicating sequential processes |
| | 3.2 | Using asynchronous-specific languages |
| | 3.3 | Using software programming languages |
| | 3.4 | Using existing hardware design languages |
| | 3.5 | Modeling channel communication in Verilog |
| | 3.6 | Implementing VerilogCSP macros |
| | 3.7 | Debugging in VerilogCSP |
| | 3.8 | Summary of VerilogCSP macros |
| | 3.9 | Exercises |
| 4 | | Pipeli… |
| | 4.1 | Block metrics |
| | 4.2 | Linear pipelines |
| | 4.3 | Pipeline loops |
| | 4.4 | Forks and joins |
| | 4.5 | More complex pipelines |
| | 4.6 | Exercises |
| 5 | | Performance analysis and optimization |
| | 5.1 | Petri nets |
| | 5.2 | Modeling pipelines using channel nets |
| | 5.3 | Performance analysis |
| | 5.4 | Performance optimization |
| | 5.5 | Advanced topic: stochastic performance analysis |
| | 5.6 | Exercises |
| 6 | | Deadlock |
| | 6.1 | Deadlock caused by incorrect circuit design |
| | 6.2 | Deadlock caused by architectural token mismatch |
| | 6.3 | Deadlock caused by arbitration |
| 7 | | A taxonomy of design styles |
| | 7.1 | Delay models |
| | 7.2 | Timing constraints |
| | 7.3 | Input–output mode versus fundamental mode |
| | 7.4 | Logic styles |
| | 7.5 | Datapath design |
| | 7.6 | Design flows: an overview of approaches |
| | 7.7 | Exercises |
| 8 | | Synthesis-based controller design |
| | 8.1 | Fundamental-mode Huffman circuits |
| | 8.2 | STG-based design |
| | 8.3 | Exercises |
| 9 | | Micropipeline design |
| | 9.1 | Two-phase micropipelines |
| | 9.2 | Four-phase micropipelines |
| | 9.3 | True-four-phase pipelines |
| | 9.4 | Delay line design |
| | 9.5 | Other micropipeline techniques |
| | 9.6 | Exercises |
| 10 | | Syntax-directed translation |
| | 10.1 | Tangram |
| | 10.2 | Handshake components |
| | 10.3 | Translation algorithm |
| | 10.4 | Control component implementation |
| | 10.5 | Datapath component implementations |
| | 10.6 | Peephole optimizations |
| | 10.7 | Self-initialization |
| | 10.8 | Testability |
| | 10.9 | Design examples |
| | 10.10 | Summary |
| | 10.11 | Exercises |
| 11 | | Quasi-delay-insensitive pipeline templates |
| | 11.1 | Weak-conditioned half buffer |
| | 11.2 | Precharged half buffer |
| | 11.3 | Precharged full buffer |
| | 11.4 | Why input-completion sensing? |
| | 11.5 | Reduced-stack precharged half buffer (RSPCHB) |
| | 11.6 | Reduced-stack precharged full buffer (RSPCFB) |
| | 11.7 | Quantitative comparisons |
| | 11.8 | Token insertion |
| | 11.9 | Arbiter |
| | 11.10 | Exercises |
| 12 | | Timed pipeline templates |
| | 12.1 | Williams' PS0 pipeline |
| | 12.2 | Lookahead pipelines overview |
| | 12.3 | Dual-rail lookahead pipelines |
| | 12.4 | Single-rail lookahead pipelines |
| | 12.5 | High-capacity pipelines (single-rail) |
| | 12.6 | Designing non-linear pipeline structures |
| | 12.7 | Lookahead pipelines (single-rail) |
| | 12.8 | Lookahead pipelines (dual-rail) |
| | 12.9 | High-capacity pipelines (single-rail) |
| | 12.10 | Conditionals |
| | 12.11 | Loops |
| | 12.12 | Simulation results |
| | 12.13 | Summary |
| 13 | | Single-track pipeline templates |
| | 13.1 | Introduction |
| | 13.2 | GasP bundled data |
| | 13.3 | Pulsed logic |
| | 13.4 | Single-track full-buffer template |
| | 13.5 | STFB pipeline stages |
| | 13.6 | STFB standard-cell implementation |
| | 13.7 | Back-end design flow and library development |
| | 13.8 | The evaluation and demonstration chip |
| | 13.9 | Conclusions and open questions |
| | 13.10 | Exercises |
| 14 | | Asynchronous crossbar |
| | 14.1 | Fulcrum's Nexus asynchronous crossbar |
| | 14.2 | Clock domain converter |
| 15 | | Design example: the Fano algorithm |
| | 15.1 | The Fano algorithm |
| | 15.2 | The asynchronous Fano algorithm |
| | 15.3 | An asynchronous semi-custom physical design flow |
| | | Index |

## **Acknowledgments**

There are many
people who helped make this book a reality and who deserve our thanks and recognition. First, we'd like to thank the many colleagues with whom we had the pleasure of working and whose research has shaped our understanding of asynchronous design. These include Dr. Peter Beerel's advisors, Professors Teresa Meng and David Dill at Stanford University, Professor Kenneth Yun of the University of San Diego, Professors Chris Myers and Kenneth Stevens of the University of Utah, Professor Steven Nowick at Columbia University, and Professor Steve Furber at Manchester University. Special thanks go to Dr. Ivan Sutherland and Marly Roncken, now at Portland State University, for their leadership, vision, support, and encouragement.

Moreover, we'd like to thank other members of the USC asynchronous VLSI/CAD research group. These include Dr. Sunan Tugsinavisut, who drafted an early version of Chapter 9 on micropipelines, and Dr. Sangyun Kim, who helped write an early version of Chapters 4 and 5 on performance analysis. In addition, we'd like to acknowledge Arash Saifhashemi for his help in drafting Chapter 3 and, in particular, in developing the VerilogCSP modeling support for asynchronous designs. Other former students who also supported this work include Dr. Aiguo Xie, Mallika Prakash, Gokul Govindu, Amit Bandlish, Prasad Joshi, and Pankaj Golani. Special thanks go to Dr. Georgios Dimou, who is helping take the culmination of much of this work to market by co-founding TimeLess Design Automation with Dr. Peter Beerel.

We would also like to recognize the various industrial and government funding sources that made much of our research possible, including Intel, NSF, and Semiconductor Research Corporation. In particular, much of the research on asynchronous pipelines was supported via an NSF Large Scale ITR research grant that funded both Dr. Recep Ozdag's and Dr. Marcos Ferretti's thesis work on advanced asynchronous pipelines and the back-end ASIC flow that was developed.
Also, we would like to recognize the support from the MOSIS Education Program and Fulcrum Microsystems in the fabrication and test of the demonstration design presented in Chapter 13.

We also thank all our colleagues at USC for their continued support and guidance, in particular Professors Mel Breuer, Massoud Pedram, Alice Parker, and Sandeep Gupta.

We also acknowledge the competence and professionalism of the Cambridge University Press personnel, especially Susan Parkinson for her great help in reviewing the book, and we thank the reviewers for their excellent feedback. In particular, Marly Roncken provided invaluable feedback on Chapter 10, Syntax-Directed Translation. Moreover, the students of EE 552, Asynchronous VLSI, at USC identified and helped fix many bugs in early drafts of the book.

Lastly, we'd like to thank our significant others, without whose support and understanding this three-year effort could never have been completed, including Janet A. Martin, Jemelle A. Pulido-Ozdag, and Lilione Sousa Ferretti.