This report introduces the notion of trivial computation, in which simple operands reduce the complexity of a potentially difficult operation. An example is integer division by two, which reduces to a simple shift. It also discusses redundant computation, in which an operation repeatedly performs the same computation because it repeatedly sees the same operands. Using two separate benchmark suites, the SPEC benchmarks and the Perfect Club, and concentrating on multiplication, we find a surprising number of trivial and redundant operations. Architectural means of exploiting this knowledge to improve computational efficiency include detection of trivial operands, memoization, and the result cache.
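The two ideas in the abstract, trivial-operand detection and a result cache for redundant operations, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the report's hardware design: the class name `MemoMultiplier`, the direct-mapped organization, the XOR-based index, the default of 64 entries, and the choice of 0 and 1 as the trivial operands are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple


class MemoMultiplier:
    """Toy model of a multiplier with a trivial-operand check and a result cache.

    Hypothetical illustration: the cache is direct-mapped and indexed by
    (a XOR b) modulo the number of entries. None of these parameters come
    from the report.
    """

    def __init__(self, cache_entries: int = 64) -> None:
        self.cache_entries = cache_entries
        # Maps a cache index to the (a, b, product) triple stored there.
        self.cache: Dict[int, Tuple[int, int, int]] = {}
        self.trivial = 0  # operations resolved by the trivial-operand check
        self.hits = 0     # operations resolved by the result cache
        self.misses = 0   # operations that required a real multiplication

    def multiply(self, a: int, b: int) -> int:
        # Trivial computation: the result is known from the operands alone.
        if a == 0 or b == 0:
            self.trivial += 1
            return 0
        if a == 1:
            self.trivial += 1
            return b
        if b == 1:
            self.trivial += 1
            return a

        # Redundant computation: look for the same operand pair in the cache.
        index = (a ^ b) % self.cache_entries
        entry = self.cache.get(index)
        if entry is not None and entry[0] == a and entry[1] == b:
            self.hits += 1
            return entry[2]

        # Miss: perform the multiplication and overwrite the indexed entry.
        self.misses += 1
        product = a * b
        self.cache[index] = (a, b, product)
        return product
```

Calling `multiply` over a stream of operand pairs and then reading the `trivial`, `hits`, and `misses` counters gives a rough picture of how many operations in that stream were trivial or redundant, which is the kind of measurement the abstract describes for multiplication.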
Index Terms
- Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation