ABSTRACT
Large scale Web search engines are complex and highly optimized systems devised to operate on dedicated clusters of processors. Any, even a small, gain in performance is beneficial to economical operation given the large amount of hardware resources deployed in the respective data centers. Performance is fully dependent on users behavior which is featured by unpredictable and drastic variations in trending topics and arrival rate intensity. In this context, discrete-event simulation is a powerful tool either to predict performance of new optimizations introduced in search engine components or to evaluate different scenarios under which alternative component configurations are able to process demanding workloads. These simulators must be fast, memory efficient and parallel to cope with the execution of millions of events in small running time on few processors. In this paper we propose achieving this objective at the expense of performing approximate parallel simulation.
- C. S. Badue, J. M. Almeida, V. Almeida, R. A. Baeza-Yates, B. A. Ribeiro-Neto, A. Ziviani, and N. Ziviani. Capacity planning for vertical search engines. CoRR, abs/1006.5059, 2010.Google Scholar
- C. Bonacic, C. Garc?a, M. Mar?n, M. Prieto-Mat?as, and F. Tirado. Building efficient multi-threaded search nodes. In CIKM, 2010. Google ScholarDigital Library
- A. Boukerche and S. K. Das. Dynamic load balancing strategies for conservative parallel simulations. SIGSIM Simul. Dig., 27(1):20--28, 1997. Google ScholarDigital Library
- A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Y. Zien. Efficient query evaluation using a two-level retrieval process. In CIKM, 2003. Google ScholarDigital Library
- C. D. Carothers and R. M. Fujimoto. Efficient execution of time warp programs on heterogeneous, now platforms. TPDS, 11(3) 2000. Google ScholarDigital Library
- L.-l. Chen, Y.-s. Lu, Y.-p. Yao, S.-l. Peng, and L.-d. Wu. A well-balanced time warp system on multi-core environments. In PADS, 2011. Google ScholarDigital Library
- V. Gil-Costa, J. Lobos, A. Inostrosa and M. Marin. Capacity planning for search engines: An approach based on coloured Petri nets. In ICATPN, 2012. Google ScholarDigital Library
- G. DAngelo and M. Bracuto. Distributed simulation of large-scale and detailed models. IJSPM, 5(2) 2009.Google Scholar
- B. Fitzpatrick. Distributed caching with memcached. J. of Linux, 2004:72--76, 2004. Google ScholarDigital Library
- A. Freire, F. Cacheda, V. Formoso and V. Carneiro. Analysis of performance evaluation techniques for Large Scale Information Retrieval. In LSDS-IR, 2013.Google Scholar
- R. Fujimoto. Parallel discrete event simulation. Comm. ACM, 33(10):30--53, Oct. 1990. Google ScholarDigital Library
- Q. Gan and T. Suel. Improved techniques for result caching in web search engines. In WWW, 2009. Google ScholarDigital Library
- R. Geist, J. Hicks, M. Smotherman, and J. Westall. Parallel simulation of petri nets on desktop pc hardware. In WSC, 2005. Google ScholarDigital Library
- D. W. Glazer and C. Tropper. On process migration and load balancing in time warp. TPDS, 4(3) 1993. Google ScholarDigital Library
- S. Jafer and G. A. Wainer. A performance evaluation of the conservative devs protocol in parallel simulation of devs-based models. In SpringSim (TMS-DEVS), 2011. Google ScholarDigital Library
- K. Jensen and L. Kristensen. Coloured Petri Nets. Springer-Verlag Berlin Heidelberg, 2009.Google ScholarCross Ref
- M. Knoke and G. Hommel. Dealing with global guards in a distributed simulation of colored petri nets. In DS-RT, 2005. Google ScholarDigital Library
- M. Knoke, F. Kuhling, A. Zimmermann, and G. Hommel. Towards correct distributed simulation of high-level petri nets with fine-grained partitioning. In ISPA, 2004. Google ScholarDigital Library
- J. Liu and R. Rong. Hierarchical composite synchronization. In PADS, 2012. Google ScholarDigital Library
- Q. Liu and G. A. Wainer. Multicore acceleration of discrete event system specification systems. Simulation, 88(7) 2012. Google ScholarDigital Library
- M. L. Loper and R. Fujimoto. Exploiting temporal uncertainty in process-oriented distributed simulations. In WSC, 2004. Google ScholarDigital Library
- M. Marin. Time Warp on BSP. In ESM, 1998.Google Scholar
- L. Mellon and D. West. Architectural optimizations to advanced distributed simulation. In WSC, 1995. Google ScholarDigital Library
- M. Marzolla. LibCppSim: A SIMULA-like, Portable Process-Oriented Simulation Library in C++. In ESM, 2004.Google Scholar
- D. M. Nicol: Discrete-Event Simulation in Performance Evaluation. Performance Evaluation 2000: 443--457. Google ScholarDigital Library
- K.S. Panesar and R.M. Fujimoto. Adaptive flow control in Time Warp. In PADS, 1997. Google ScholarDigital Library
- L. G. Valiant. A bridging model for multi-core computing. J. Comput. Syst. Sci. 1(77), 2011. Google ScholarDigital Library
- L. G. Valiant. A bridging model for parallel computation. Comm. ACM, 33(8) 1990. Google ScholarDigital Library
- R. Vitali, A. Pellegrini, and F. Quaglia. Towards symmetric multi-threaded optimistic simulation kernels. In PADS, 2012. Google ScholarDigital Library
- BSPonMPI. http://bsponmpi.sourceforge.net/Google Scholar
- MPI. http://www.mcs.anl.gov/research/projects/mpi/Google Scholar
- OpenMP. http://openmp.org/Google Scholar
- CPN Tools. http://cpntools.org/Google Scholar
Index Terms
Approximate parallel simulation of web search engines
Recommendations
A workstation-based parallel direct-execution simulator
It is important to understand and efficiently predict the performance of large codes executing on massively parallel machines. However, these very large machines are scarce, expensive, and generally unavailable to large segments of the research ...
Overlap Among Major Web Search Engines
ITNG '06: Proceedings of the Third International Conference on Information Technology: New GenerationsOur study examined the overlap among results retrieved by three major Web search engines for a large set of more than 10,316 queries. Previous smaller studies have discussed the lack of overlap in results returned by Web search engines for the same ...
A workstation-based parallel direct-execution simulator
PADS '97: Proceedings of the eleventh workshop on Parallel and distributed simulationIt is important to understand and efficiently predict the performance of large codes executing on massively parallel machines. However, these very large machines are scarce, expensive, and generally unavailable to large segments of the research ...
Comments