
Job replication on multiserver systems

Published online by Cambridge University Press: 01 July 2016

Yusik Kim*
Affiliation: University of California, Berkeley
Rhonda Righter*
Affiliation: University of California, Berkeley
Ronald Wolff*
Affiliation: University of California, Berkeley

* Postal address: Department of Industrial Engineering and Operations Research, 4141 Etcheverry Hall, Berkeley, CA 94720, USA.

Abstract


Parallel processing uses resources efficiently by processing several jobs simultaneously on different servers. In a well-controlled environment, where the status of the servers and the jobs is well known, everything is nearly deterministic, and replicating jobs on different servers is obviously a waste of resources. However, in a poorly controlled environment, where the servers are unreliable and/or their capacity is highly variable, it is desirable to design a system that is robust in the sense that it is not affected by poorly performing servers. By replicating jobs and assigning them to several different servers simultaneously, we not only achieve robustness but, under certain conditions, also make the system more efficient, so that jobs are processed at a faster overall rate. In this paper we consider the option of replicating jobs and study how different ‘degrees’ of replication, ranging from no replication to full replication, affect the performance of a system of parallel servers.
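The intuition that replication wastes capacity when service is nearly deterministic but can raise throughput when service is highly variable can be checked with a toy calculation. The sketch below is an illustrative model of our own, not the paper's: it assumes a saturated system of n identical servers split into n/d groups of d servers. Each group works on one job copied to all d of its servers. The remaining copies are cancelled when the first one finishes, and copies have i.i.d. service times. It compares deterministic, exponential and two-phase hyperexponential service times, all with mean 1. The function names and parameter values are hypothetical.

```python
from math import comb


def hyperexp_mean_min(d: int, p: float, mu1: float, mu2: float) -> float:
    """Mean of the minimum of d i.i.d. two-phase hyperexponential times.

    Each copy's service time is Exp(mu1) with probability p and Exp(mu2)
    otherwise. The minimum has survival function
    (p*exp(-mu1*t) + (1-p)*exp(-mu2*t))**d; expanding it binomially and
    integrating term by term gives the closed form returned here.
    Setting p = 1 recovers the exponential case, whose mean is 1/(d*mu1).
    """
    return sum(
        comb(d, k) * p**k * (1 - p) ** (d - k) / (k * mu1 + (d - k) * mu2)
        for k in range(d + 1)
    )


def saturated_throughput(n: int, d: int, mean_min: float) -> float:
    """Jobs completed per unit time by n servers split into n // d groups.

    Hypothetical model: each group serves one job replicated on all d of its
    servers, the job leaves when its first copy finishes, the other copies
    are cancelled, and the group starts the next job at once (never idle).
    By renewal-reward, each group then completes 1 / mean_min jobs per unit time.
    """
    if n % d != 0:
        raise ValueError("d must divide n")
    return (n // d) / mean_min


if __name__ == "__main__":
    n = 12
    for d in (1, 2, 3, 6, 12):
        det = saturated_throughput(n, d, 1.0)  # every copy takes exactly 1
        expo = saturated_throughput(n, d, hyperexp_mean_min(d, 1.0, 1.0, 1.0))
        # p = 0.9, mu1 = 1.8, mu2 = 0.2 gives mean 0.9/1.8 + 0.1/0.2 = 1
        hyp = saturated_throughput(n, d, hyperexp_mean_min(d, 0.9, 1.8, 0.2))
        print(f"d={d:2d}  deterministic={det:6.2f}  "
              f"exponential={expo:6.2f}  hyperexponential={hyp:6.2f}")
```

Under these assumptions, replication lowers throughput by a factor of d for deterministic times. It leaves throughput unchanged for exponential times, because the minimum of d Exp(1) copies has mean 1/d, exactly offsetting the d servers occupied. It raises throughput in the hyperexponential example. This is only a toy consistent with the abstract's intuition and does not reproduce the paper's own conditions or results.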

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2009 
