Site reliability engineer
Site reliability engineer (SRE) is a job description given to software engineers focused on reliability, scalability, and the development of cloud computing infrastructure. SREs develop, maintain and operate software that automates the traditional roles of the system administrator at large scale, such as configuration and cluster management systems, and that support reliability and scalability goals, such as container virtualization and the systems architecture of microservices.[1][2]
Companies employing the SRE title or involved in the SRE profession include Google, Dropbox, Mozilla, Airbnb, Netflix,[2] eBay, LinkedIn, Facebook, Adobe, Microsoft, Uber, IBM and Twitter.[3] At Google, SREs have been responsible for developing the Google File System (GFS), its successor Colossus (CFS), and the Borg cluster management system.[4]
See also
References
- ↑ Site Reliability Engineering: How Google Runs Production Systems. O'Reilly Media. 2016.
- 1 2 Donald FIscher (2016-03-02). "Are site reliability engineers the next data scientists?". TechCrunch.
- ↑ USENIX. "SREcon".
- ↑ Charles Babcock (2015-11-11). "Kubernetes Yields 'Operations Dividend,' Still Working On Scalability". InformationWeek.