Top 22 Site Reliability Engineer Interview Questions & Answers 2020

  • Site reliability engineers work with other engineers, product owners, and customers to agree on targets and measures. This helps them ensure system availability. Once everyone has agreed on a system’s uptime and availability targets, it is easy to tell when action is needed.
  • They introduce error budgets to measure risk and to balance availability against feature development. Without unrealistic reliability targets, a team has the flexibility to deliver updates and improvements to a system.
  • SRE aims to reduce toil, which means automating tasks that would otherwise require a human operator to work manually.
  • A site reliability engineer should have an in-depth understanding of the systems and their connectivity.
  • Site reliability engineers are tasked with discovering problems early to reduce the cost of failure.
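The error-budget arithmetic behind the points above can be sketched in Python. This is a minimal illustration, not a standard tool. It assumes an availability SLO measured as a fraction of minutes over a 30-day window. The release-freeze rule in `feature_releases_allowed` is a hypothetical error budget policy chosen for the example.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO allows over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * MINUTES_PER_DAY


def remaining_budget_minutes(slo_target: float, downtime_minutes: float,
                             window_days: int = 30) -> float:
    """Budget left after subtracting the downtime already observed."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


def feature_releases_allowed(slo_target: float, downtime_minutes: float,
                             window_days: int = 30) -> bool:
    """Hypothetical policy: freeze feature releases once the budget is spent."""
    return remaining_budget_minutes(slo_target, downtime_minutes, window_days) > 0


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)        # 99.9% over 30 days
    print(f"budget: {budget:.1f} min")          # budget: 43.2 min
    print(feature_releases_allowed(0.999, 30))  # True (about 13.2 min left)
    print(feature_releases_allowed(0.999, 50))  # False (budget overspent)
```

With a 99.9% SLO, the 30-day budget is (1 - 0.999) × 30 × 24 × 60 ≈ 43.2 minutes of downtime. A team that has already used 30 minutes still has about 13.2 minutes available for risky changes.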

1. Tell me the difference between DevOps & SRE

  • SRE treats Ops more like a software engineering problem.
  • DevOps focuses on both Dev and Ops departments to bridge these two worlds.
  • SRE is focused on embracing consistent technologies and information access across the IT teams.
  • DevOps focuses on automation and the adoption of technology.
  • DevOps focuses primarily on process performance and results, using feedback loops to achieve continuous improvement.
  • SRE uses SLOs as its dominant metric, since the framework treats Ops problems as software engineering problems.

2. Why do you want to become a Site Reliability Engineer?

  • The inter-relationship of SRE with DevOps and other popular frameworks
  • The underlying principles behind SRE
  • Service Level Objectives (SLOs) and their user focus
  • Service Level Indicators (SLIs) and the modern monitoring landscape
  • Error budgets and the associated error budget policies
  • Toil and its effect on an organization’s productivity
  • Some practical steps that can help to eliminate toil
  • Observability as an indicator of a service’s health
  • SRE tools, automation techniques, and the importance of security
  • Anti-fragility, the approach to failure, and failure testing
  • The organizational impact that introducing SRE brings

3. Have you heard of SLOs? If so, explain what they are.

4. Explain what a data structure is. Name some data structures.

5. How do you differentiate between a process and a thread?

  • A process is a program in execution: the running instance that carries out the actions specified in the program.
  • A thread, on the other hand, is a unit of execution within a process.
  • Processes are heavyweight; threads are lightweight.
  • A process takes more time to terminate; a thread takes less time to terminate.
  • Process creation takes more time; thread creation takes less time.
  • Processes take more time to context-switch; threads take less time.
  • Processes are isolated from each other; threads within the same process share memory.
  • Processes do not share data with each other by default; threads share data with each other.
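The memory-sharing difference can be observed directly. In this minimal Python sketch, a thread that increments a module-level counter changes the value its parent sees. A child process, by contrast, only changes its own copy. The helper names are illustrative, and the `"fork"` start method assumes a POSIX system such as Linux.

```python
import multiprocessing
import threading

counter = 0  # module-level state: shared by threads, copied into child processes


def increment() -> None:
    global counter
    counter += 1


def demo() -> tuple[int, int]:
    """Return the counter value seen by the parent after a thread and after a process."""
    global counter

    # A thread runs inside this process, so it updates the same `counter`.
    counter = 0
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    after_thread = counter

    # A forked child gets its own copy of memory; its increment is invisible here.
    counter = 0
    ctx = multiprocessing.get_context("fork")  # POSIX-only start method
    p = ctx.Process(target=increment)
    p.start()
    p.join()
    after_process = counter

    return after_thread, after_process


if __name__ == "__main__":
    print(demo())  # (1, 0)
```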

6. What is an error budget, and what is it used for?

7. Define an error budget policy.

8. Which activities reduce toil?

  1. Creating external automation
  2. Creating internal automation
  3. Enhancing the service so that it does not require maintenance intervention

9. Define Service Level Indicators
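An SLI is commonly expressed as the proportion of good events among all valid events, for example successful requests or requests served under a latency threshold. The Python sketch below assumes that framing. The function names and the sample numbers are illustrative and do not come from any particular monitoring tool.

```python
def ratio_sli(good_events: int, total_events: int) -> float:
    """SLI as the proportion of good events among valid events."""
    if total_events <= 0:
        raise ValueError("need at least one event to compute an SLI")
    return good_events / total_events


def latency_sli(latencies_ms: list[float], threshold_ms: float) -> float:
    """Proportion of requests served within the latency threshold."""
    fast = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return ratio_sli(fast, len(latencies_ms))


def meets_slo(sli: float, slo_target: float) -> bool:
    """An SLO is met when the measured SLI is at or above its target."""
    return sli >= slo_target


if __name__ == "__main__":
    availability = ratio_sli(999_500, 1_000_000)  # successful / total requests
    print(availability, meets_slo(availability, 0.999))
    print(latency_sli([120, 80, 450, 95], threshold_ms=300))  # 0.75
```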

10. List all the Linux signals you are aware of.
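On Linux, Python's standard `signal` module exposes the signals the platform defines. This gives a quick way to print the number, name, and description of each one, similar to the names `kill -l` lists. This is a minimal sketch, and the exact set and numbering vary by platform. Note that SIGKILL and SIGSTOP can never be caught or ignored by a process.

```python
import signal


def available_signals() -> dict[str, int]:
    """Map each signal name this platform defines to its number."""
    return {sig.name: int(sig) for sig in signal.Signals}


if __name__ == "__main__":
    for sig in sorted(signal.Signals, key=int):
        # strsignal gives a short description, e.g. "Terminated" for SIGTERM on Linux.
        print(f"{int(sig):>2}  {sig.name:<10} {signal.strsignal(sig)}")
```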

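11. Have you heard of TCP? List some TCP connection states.

A few TCP connection states (as defined in RFC 793) are:

  • LISTEN
  • SYN-SENT
  • SYN-RECEIVED
  • ESTABLISHED
  • FIN-WAIT-1
  • FIN-WAIT-2
  • CLOSE-WAIT
  • CLOSING
  • LAST-ACK
  • TIME-WAIT
  • CLOSED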
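TCP connection states fit together as a state machine. The following Python sketch encodes a simplified subset of the RFC 793 transitions. It covers normal open and close, including the simultaneous-close path through CLOSING, but omits resets, simultaneous open, and every timeout other than TIME-WAIT's. The event names are made up for readability.

```python
from enum import Enum


class TcpState(Enum):
    """Connection states named in RFC 793."""
    CLOSED = "CLOSED"
    LISTEN = "LISTEN"
    SYN_SENT = "SYN-SENT"
    SYN_RECEIVED = "SYN-RECEIVED"
    ESTABLISHED = "ESTABLISHED"
    FIN_WAIT_1 = "FIN-WAIT-1"
    FIN_WAIT_2 = "FIN-WAIT-2"
    CLOSE_WAIT = "CLOSE-WAIT"
    CLOSING = "CLOSING"
    LAST_ACK = "LAST-ACK"
    TIME_WAIT = "TIME-WAIT"


S = TcpState

# (current state, event) -> next state; a simplified subset of the RFC 793 diagram.
TRANSITIONS: dict[tuple[TcpState, str], TcpState] = {
    (S.CLOSED, "passive_open"): S.LISTEN,
    (S.CLOSED, "active_open/send_syn"): S.SYN_SENT,
    (S.LISTEN, "rcv_syn/send_syn_ack"): S.SYN_RECEIVED,
    (S.SYN_SENT, "rcv_syn_ack/send_ack"): S.ESTABLISHED,
    (S.SYN_RECEIVED, "rcv_ack"): S.ESTABLISHED,
    (S.ESTABLISHED, "close/send_fin"): S.FIN_WAIT_1,   # active close
    (S.ESTABLISHED, "rcv_fin/send_ack"): S.CLOSE_WAIT,  # passive close
    (S.FIN_WAIT_1, "rcv_ack"): S.FIN_WAIT_2,
    (S.FIN_WAIT_1, "rcv_fin/send_ack"): S.CLOSING,      # simultaneous close
    (S.FIN_WAIT_2, "rcv_fin/send_ack"): S.TIME_WAIT,
    (S.CLOSING, "rcv_ack"): S.TIME_WAIT,
    (S.CLOSE_WAIT, "close/send_fin"): S.LAST_ACK,
    (S.LAST_ACK, "rcv_ack"): S.CLOSED,
    (S.TIME_WAIT, "timeout_2msl"): S.CLOSED,
}


def run(events: list[str], start: TcpState = S.CLOSED) -> TcpState:
    """Apply events in order and return the final state."""
    state = start
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not valid in state {state.value}") from None
    return state


CLIENT = ["active_open/send_syn", "rcv_syn_ack/send_ack", "close/send_fin",
          "rcv_ack", "rcv_fin/send_ack", "timeout_2msl"]
SERVER = ["passive_open", "rcv_syn/send_syn_ack", "rcv_ack",
          "rcv_fin/send_ack", "close/send_fin", "rcv_ack"]

if __name__ == "__main__":
    print(run(CLIENT[:2]).value)  # ESTABLISHED
    print(run(CLIENT).value)      # CLOSED (after TIME-WAIT)
    print(run(SERVER).value)      # CLOSED (after LAST-ACK)
```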

14. What is the Linux kill command? List the common kill commands and their functions.
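The `kill` command sends a signal to a process by PID. `kill PID` (or `kill -15 PID`) sends SIGTERM, which asks the process to exit cleanly. `kill -9 PID` sends SIGKILL, which cannot be caught. The Python sketch below does the same thing with `os.kill` against a throwaway `sleep` child. It assumes a POSIX system with `sleep` on the `PATH`, and `kill_child` is a hypothetical helper name.

```python
import os
import signal
import subprocess


def kill_child(sig: signal.Signals) -> int:
    """Start a throwaway `sleep` process, send it `sig`, and return its exit status."""
    child = subprocess.Popen(["sleep", "60"])
    os.kill(child.pid, sig)  # the same kill(2) system call the `kill` command uses
    return child.wait()      # on POSIX, -N means "terminated by signal N"


if __name__ == "__main__":
    print(kill_child(signal.SIGTERM))  # -15, like `kill -15 PID`
    print(kill_child(signal.SIGKILL))  # -9, like `kill -9 PID`
```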

15. What is cloud computing?

16. How would you describe the functions of an ideal DevOps team?

17. What is observability, and how can an organization improve the observability of its systems?

  • Understand what types of data flow from an environment, and which of those data types are relevant and useful to your observability goals.
  • Get a clear view of what the team cares about, and work out how your strategy will make sense of the data by distilling, curating, and transforming it into actionable insights into the performance of your systems.
  • Observability offers potentially useful clues about an organization’s DevOps maturity level.

18. What is DHCP (Dynamic Host Configuration Protocol), and what is it used for?

  1. Requesting IP addresses and networking parameters automatically from a DHCP server, such as one run by an Internet service provider (ISP) or a local router.
  2. Reducing the need for a network administrator or a user to manually assign IP addresses to all network devices.
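DHCP clients obtain an address through a four-message exchange often summarized as DORA: Discover, Offer, Request, Acknowledge. The toy Python class below models only the server's bookkeeping for that exchange. It is an illustrative sketch that assumes a single subnet, and it ignores the real protocol's UDP broadcasts (ports 67 and 68), lease times, and renewal. `ToyDhcpServer` and its methods are hypothetical names.

```python
import ipaddress


class ToyDhcpServer:
    """In-memory stand-in for a DHCP server's address pool (no networking, no lease expiry)."""

    def __init__(self, cidr: str, gateway: str) -> None:
        self.network = ipaddress.ip_network(cidr)
        self.gateway = ipaddress.ip_address(gateway)
        # Every usable host address except the router's can be handed out.
        self.free = [ip for ip in self.network.hosts() if ip != self.gateway]
        self.leases: dict[str, ipaddress.IPv4Address] = {}

    def discover(self, mac: str) -> ipaddress.IPv4Address:
        """DISCOVER -> OFFER: propose the client's existing lease or the next free address."""
        if mac in self.leases:
            return self.leases[mac]
        if not self.free:
            raise RuntimeError("address pool exhausted")
        return self.free[0]

    def request(self, mac: str, ip: ipaddress.IPv4Address) -> dict[str, object]:
        """REQUEST -> ACK: bind the address to the client and return its settings."""
        current = self.leases.get(mac)
        if current != ip:
            if ip not in self.free:
                # A real server would answer with a DHCPNAK here.
                raise ValueError(f"{ip} is not available")
            self.free.remove(ip)
            if current is not None:
                self.free.append(current)  # release the client's previous address
            self.leases[mac] = ip
        return {"address": ip, "prefixlen": self.network.prefixlen,
                "gateway": self.gateway}


if __name__ == "__main__":
    server = ToyDhcpServer("192.168.1.0/29", "192.168.1.1")
    offer = server.discover("02:00:00:00:00:01")
    print(server.request("02:00:00:00:00:01", offer))
```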

19. What is the difference between SNAT and DNAT?

  1. On either side of a NAT device there is an outside world and an inside world. When the inside world communicates with the outside world, SNAT happens; when the outside world communicates with the inside world, DNAT happens.
  2. When each internal private IP address is translated to a fixed public IP address (one-to-one), it’s called static SNAT. When many internal private IP addresses are translated to a pool of public IP addresses, it’s called dynamic SNAT; when they all share a single public IP address, distinguished by port, it’s usually called PAT or masquerading.
  3. Source NAT changes the source address in the IP packet header. Destination NAT changes the destination address in the IP packet header.
  4. SNAT allows multiple hosts on the “inside” to reach any host on the “outside”. DNAT allows hosts on the “outside” to reach specific hosts on the “inside”, for example through port forwarding.
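These header rewrites can be illustrated with a toy model in Python. The sketch only rewrites addresses and ports on a single packet. A real NAT device also tracks connections so that replies are translated back, and usually rewrites source ports when many hosts share one public address. The addresses come from private and documentation ranges, and the forwarding rule is hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def snat(packet: Packet, public_ip: str) -> Packet:
    """Outbound: replace the private source address with the public one."""
    return replace(packet, src_ip=public_ip)


def dnat(packet: Packet, forwards: dict[tuple[str, int], tuple[str, int]]) -> Packet:
    """Inbound: rewrite the destination if it matches a forwarding rule."""
    target = forwards.get((packet.dst_ip, packet.dst_port))
    if target is None:
        return packet  # no matching rule: this sketch leaves the packet unchanged
    return replace(packet, dst_ip=target[0], dst_port=target[1])


if __name__ == "__main__":
    outbound = Packet("10.0.0.10", 51000, "198.51.100.20", 443)
    print(snat(outbound, "203.0.113.7"))  # source becomes the public address

    rules = {("203.0.113.7", 80): ("10.0.0.5", 8080)}  # hypothetical port forward
    inbound = Packet("198.51.100.20", 40000, "203.0.113.7", 80)
    print(dnat(inbound, rules))           # destination becomes 10.0.0.5:8080
```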

20. Define hard links and soft links, with examples.
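A short Python sketch using `os.link` and `os.symlink` shows the practical difference. A hard link is another directory entry for the same inode, so it survives deletion of the original name. A soft (symbolic) link stores a path, so it dangles once that path is gone. The sketch assumes a POSIX file system that supports both link types, and the file names are illustrative.

```python
import os
import tempfile
from pathlib import Path


def link_demo() -> dict[str, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp, "original.txt")
        original.write_text("hello\n")
        hard = Path(tmp, "hard.txt")
        soft = Path(tmp, "soft.txt")

        os.link(original, hard)     # roughly: ln original.txt hard.txt
        os.symlink(original, soft)  # roughly: ln -s original.txt soft.txt

        facts = {
            # A hard link is another name for the same inode.
            "same_inode": original.stat().st_ino == hard.stat().st_ino,
            "link_count_is_2": original.stat().st_nlink == 2,
            # A soft link is a separate file that stores a path.
            "soft_is_symlink": soft.is_symlink(),
        }

        original.unlink()  # remove the original name
        facts["hard_still_readable"] = hard.read_text() == "hello\n"
        # The symlink still exists but now points at nothing (dangling).
        facts["soft_dangling"] = soft.is_symlink() and not soft.exists()
    return facts


if __name__ == "__main__":
    print(link_demo())
```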

21. How will you secure your Docker containers?

  1. Choose third-party container images carefully
  2. Enable Docker Content Trust
  3. Set resource limits for your containers
  4. Consider a third-party security tool
  5. Use Docker Bench Security

22. Can you describe the best SRE tools for each stage of DevOps?



We at NovelVista Learning Solution have expertise in providing high-end training and certification programs for ITIL®, PRINCE2, PMP, SIAM, Cloud, AWS, DevOps, etc.