Abstract
Kernel adaptive filters (KAFs) are online machine learning algorithms that are amenable to highly efficient streaming implementations. They require only a single pass through the data and can act as universal approximators, i.e., they can approximate any continuous function to arbitrary accuracy. KAFs belong to the family of kernel methods, which apply an implicit non-linear mapping of input data to a high-dimensional feature space, permitting learning algorithms to be expressed entirely in terms of inner products. This avoids explicit projection into the feature space and thereby enables computational efficiency. In this paper, we propose the first fully pipelined implementation of the kernel normalised least mean squares (KNLMS) algorithm for regression. Independent training tasks needed for hyperparameter optimisation fill the pipeline stages, so no stall cycles are required to resolve dependencies. Together with other optimisations to reduce resource utilisation and latency, our core achieves 161 GFLOPS on a Virtex-7 XC7VX485T FPGA for a floating-point implementation and 211 GOPS for a fixed-point one. Our PCI Express-based floating-point system implementation achieves 80% of the core's speed, a speedup of 10× over an optimised implementation on a desktop processor and 2.66× over a GPU.
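To make the algorithm concrete, the sketch below implements a plain software KNLMS filter with a Gaussian kernel and a coherence-based dictionary criterion. It is an illustrative reference model only, not the hardware architecture described in the paper; the parameter names (`gamma`, `eta`, `eps`, `mu0`), their default values, and the usage loop are assumptions made for the sketch.

```python
import numpy as np

# Minimal software sketch of kernel normalised least mean squares (KNLMS) with
# a Gaussian kernel. Illustrative only; not the paper's FPGA architecture.

def gaussian_kernel(x, D, gamma):
    """k(x, d_j) = exp(-gamma * ||x - d_j||^2) for every dictionary entry d_j."""
    return np.exp(-gamma * np.sum((D - x) ** 2, axis=1))

class KNLMS:
    def __init__(self, gamma=1.0, eta=0.5, eps=1e-2, mu0=0.9):
        # gamma: kernel width, eta: step size, eps: regulariser,
        # mu0: coherence threshold for dictionary growth (assumed defaults).
        self.gamma, self.eta, self.eps, self.mu0 = gamma, eta, eps, mu0
        self.D = None          # dictionary of stored input vectors
        self.alpha = None      # weight for each dictionary entry

    def predict(self, x):
        if self.D is None:
            return 0.0
        return float(self.alpha @ gaussian_kernel(np.asarray(x, float), self.D, self.gamma))

    def update(self, x, y):
        """One online step: predict, then adapt the weights (single pass)."""
        x = np.asarray(x, dtype=float)
        if self.D is None:
            self.D = x[None, :]
            self.alpha = np.zeros(1)
        k = gaussian_kernel(x, self.D, self.gamma)
        # Coherence criterion: only grow the dictionary if x is sufficiently
        # novel with respect to the stored entries.
        if k.max() <= self.mu0:
            self.D = np.vstack([self.D, x])
            self.alpha = np.append(self.alpha, 0.0)
            k = gaussian_kernel(x, self.D, self.gamma)
        err = y - self.alpha @ k
        # Normalised LMS step in the kernel-induced feature space.
        self.alpha += self.eta * err * k / (self.eps + k @ k)
        return err

# Usage: each (gamma, eta) pair below is an independent training task over the
# same data stream; it is this independence that lets the pipelined core
# interleave tasks to fill pipeline stages without stall cycles.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = np.sin(X.sum(axis=1))
    for gamma in (0.1, 1.0, 10.0):
        for eta in (0.1, 0.5):
            f = KNLMS(gamma=gamma, eta=eta)
            mse = np.mean([f.update(x_t, y_t) ** 2 for x_t, y_t in zip(X, y)])
            print(f"gamma={gamma}, eta={eta}, running MSE={mse:.4f}")
```

The usage loop at the bottom mirrors the point made in the abstract: hyperparameter optimisation produces many independent training runs, so filling the pipeline with different configurations avoids the data dependencies that a single run would create.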