research-article

FPGA Implementations of Kernel Normalised Least Mean Squares Processors

Published: 15 December 2017

Abstract

Kernel adaptive filters (KAFs) are online machine learning algorithms amenable to highly efficient streaming implementations. They require only a single pass through the data and can act as universal approximators, i.e., approximate any continuous function to arbitrary accuracy. KAFs belong to the family of kernel methods, which apply an implicit non-linear mapping of input data to a high-dimensional feature space, permitting learning algorithms to be expressed entirely in terms of inner products. This approach avoids explicit projection into the feature space, enabling computational efficiency. In this paper, we propose the first fully pipelined implementation of the kernel normalised least mean squares (KNLMS) algorithm for regression. Independent training tasks, necessary for hyperparameter optimisation, fill the pipeline stages, so no stall cycles are required to resolve dependencies. Together with other optimisations that reduce resource utilisation and latency, our core achieves 161 GFLOPS on a Virtex 7 XC7VX485T FPGA for a floating-point implementation and 211 GOPS for a fixed-point one. Our PCI Express based floating-point system implementation achieves 80% of the core's speed, a speedup of 10× over an optimised implementation on a desktop processor and 2.66× over a GPU.
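The KNLMS update that the core pipelines can be sketched as a software reference model. This is a minimal sketch, not the paper's hardware design: the Gaussian kernel, the coherence-based dictionary growth rule, and all hyperparameter values (`eta`, `eps`, `gamma`, `mu0`) are illustrative assumptions, not values taken from the article.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two input vectors."""
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

class KNLMS:
    """Kernel normalised LMS regressor with a coherence-limited dictionary.

    Hyperparameter values here are illustrative, not from the paper.
    """
    def __init__(self, eta=0.5, eps=1e-2, gamma=1.0, mu0=0.9):
        self.eta = eta        # step size
        self.eps = eps        # regulariser in the normalisation term
        self.gamma = gamma    # kernel width
        self.mu0 = mu0        # coherence threshold for dictionary growth
        self.dictionary = []  # stored input vectors (dictionary atoms)
        self.alpha = np.zeros(0)  # one weight per atom

    def predict(self, x):
        """Filter output: inner product of kernel evaluations and weights."""
        if not self.dictionary:
            return 0.0
        h = np.array([gaussian_kernel(x, c, self.gamma) for c in self.dictionary])
        return float(h @ self.alpha)

    def update(self, x, d):
        """One online step: predict, then adapt the weights on the error d - y."""
        x = np.asarray(x, dtype=float)
        if not self.dictionary:
            self.dictionary.append(x)
            self.alpha = np.zeros(1)
        h = np.array([gaussian_kernel(x, c, self.gamma) for c in self.dictionary])
        # Grow the dictionary only when x is poorly represented (coherence test).
        if np.max(np.abs(h)) <= self.mu0:
            self.dictionary.append(x)
            self.alpha = np.append(self.alpha, 0.0)
            h = np.append(h, gaussian_kernel(x, x, self.gamma))
        err = d - h @ self.alpha
        # Normalised gradient step on the weight vector.
        self.alpha += (self.eta / (self.eps + h @ h)) * err * h
        return err

# Usage: learn sin(x) online from a single pass over random samples.
rng = np.random.default_rng(0)
f = KNLMS()
for _ in range(2000):
    x = rng.uniform(-3.0, 3.0, size=1)
    f.update(x, np.sin(x[0]))
```

The single-pass loop mirrors the streaming setting the abstract describes; in the hardware design, several such independent training runs (e.g. different hyperparameter settings) would occupy the pipeline stages concurrently.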



• Published in ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 4 (December 2017), 119 pages
• ISSN: 1936-7406; EISSN: 1936-7414
• DOI: 10.1145/3166118
• Editor: Steve Wilton

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 December 2017
      • Accepted: 1 June 2017
      • Revised: 1 January 2017
      • Received: 1 April 2016


      Qualifiers

      • research-article
      • Research
      • Refereed
