research-article

FPGA Implementations of Kernel Normalised Least Mean Squares Processors

Published: 15 December 2017

Abstract

Kernel adaptive filters (KAFs) are online machine learning algorithms amenable to highly efficient streaming implementations. They require only a single pass through the data and can act as universal approximators, i.e., approximate any continuous function to arbitrary accuracy. KAFs belong to the family of kernel methods, which apply an implicit non-linear mapping of input data to a high-dimensional feature space, permitting learning algorithms to be expressed entirely in terms of inner products. This approach avoids explicit projection into the feature space, enabling computational efficiency. In this paper, we propose the first fully pipelined implementation of the kernel normalised least mean squares (KNLMS) algorithm for regression. Independent training tasks, necessary for hyperparameter optimisation, fill the pipeline stages, so no stall cycles are required to resolve dependencies. Together with other optimisations that reduce resource utilisation and latency, our core achieves 161 GFLOPS on a Virtex 7 XC7VX485T FPGA for a floating-point implementation and 211 GOPS for a fixed-point one. Our PCI Express based floating-point system implementation achieves 80% of the core's speed, a speedup of 10× over an optimised implementation on a desktop processor and 2.66× over a GPU.
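The KNLMS update that the core pipelines can be sketched as a software reference model. This is a minimal sketch, not the paper's hardware design: the Gaussian kernel, the coherence-based dictionary growth rule, and all hyperparameter values (`eta`, `eps`, `gamma`, `mu0`) are illustrative assumptions, not values taken from the article.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two input vectors."""
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

class KNLMS:
    """Kernel normalised LMS regressor with a coherence-limited dictionary.

    Hyperparameter values here are illustrative, not from the paper.
    """
    def __init__(self, eta=0.5, eps=1e-2, gamma=1.0, mu0=0.9):
        self.eta = eta        # step size
        self.eps = eps        # regulariser in the normalisation term
        self.gamma = gamma    # kernel width
        self.mu0 = mu0        # coherence threshold for dictionary growth
        self.dictionary = []  # stored input vectors (dictionary atoms)
        self.alpha = np.zeros(0)  # one weight per atom

    def predict(self, x):
        """Filter output: inner product of kernel evaluations and weights."""
        if not self.dictionary:
            return 0.0
        h = np.array([gaussian_kernel(x, c, self.gamma) for c in self.dictionary])
        return float(h @ self.alpha)

    def update(self, x, d):
        """One online step: predict, then adapt the weights on the error d - y."""
        x = np.asarray(x, dtype=float)
        if not self.dictionary:
            self.dictionary.append(x)
            self.alpha = np.zeros(1)
        h = np.array([gaussian_kernel(x, c, self.gamma) for c in self.dictionary])
        # Grow the dictionary only when x is poorly represented (coherence test).
        if np.max(np.abs(h)) <= self.mu0:
            self.dictionary.append(x)
            self.alpha = np.append(self.alpha, 0.0)
            h = np.append(h, gaussian_kernel(x, x, self.gamma))
        err = d - h @ self.alpha
        # Normalised gradient step on the weight vector.
        self.alpha += (self.eta / (self.eps + h @ h)) * err * h
        return err

# Usage: learn sin(x) online from a single pass over random samples.
rng = np.random.default_rng(0)
f = KNLMS()
for _ in range(2000):
    x = rng.uniform(-3.0, 3.0, size=1)
    f.update(x, np.sin(x[0]))
```

The single-pass loop mirrors the streaming setting the abstract describes; in the hardware design, several such independent training runs (e.g. different hyperparameter settings) would occupy the pipeline stages concurrently.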



• Published in ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 4 (December 2017), 119 pages
• ISSN: 1936-7406; EISSN: 1936-7414
• DOI: 10.1145/3166118
• Editor: Steve Wilton

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 December 2017
      • Accepted: 1 June 2017
      • Revised: 1 January 2017
      • Received: 1 April 2016


      Qualifiers

      • research-article
      • Research
      • Refereed
