38228 | Site Reliability Engineer - Incident Response Job at Brilliant, Grapevine, TX

  • Brilliant
  • Grapevine, TX

Job Description

Added - 12/22/25


Technology

Grapevine, Texas | Direct Hire


Job Title: Sr. Site Reliability Engineer

Location: Atlanta, GA

Salary Range: $130,000 - $160,000

Benefits: Healthcare, PTO, 401k

 

About the Role

The SRE is a senior-level role focused on accelerating incident resolution and improving enterprise incident management practices. This individual partners closely with engineering teams during live incidents to troubleshoot issues using monitoring and logging tools. Following incidents, they deliver clear, executive-level summaries that communicate impact, root cause, and resolution. This role also plays a key part in evaluating incident response effectiveness and driving systemic reliability improvements.

Core Competencies and Qualifications

  • Bachelor’s degree in a related discipline with 4 years of relevant experience, or an equivalent combination of education and experience

  • Must be authorized to work in the United States without current or future sponsorship

  • Strong ability to design, build, and maintain engineering solutions and tooling that improve reliability, automate incident response, and reduce operational toil

  • Skilled in interpreting logs, metrics, and traces to identify root causes during live incidents

  • Experience with observability platforms such as Datadog, Splunk, New Relic, or similar tools

  • Strong programming background in Python, Java, or C#, with experience supporting production-grade services and automation

  • Proven ability to design reliable, scalable, and highly available systems using sound software engineering practices

  • Experience developing automation to improve incident response, monitoring, deployment, and recovery processes

  • Ability to collaborate closely with software engineering teams to influence architecture and operational readiness

  • Experience leveraging AI and machine learning tools to enhance incident response, automation, and daily engineering workflows

  • Strong analytical skills with attention to detail in validating incident data and identifying trends

  • Solid understanding of DevOps concepts, including CI/CD pipelines, cloud-native infrastructure, caching, and scaling

  • Experience calculating and interpreting metrics such as MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve)

Responsibilities Outside of Active On-Call

Post-Incident Review Development

  • Draft and deliver executive-level post-incident summaries

  • Develop and coach teams on blameless postmortem practices

  • Create templates and guide structured root cause analysis methods such as 5 Whys or fishbone diagrams

  • Maintain a centralized library of learnings and cross-cutting incident themes

Incident Process Improvement

  • Support engineering teams during incidents by assisting with rapid diagnosis and resolution

  • Analyze data from observability platforms to form informed conclusions about root causes

  • Evaluate incident response effectiveness to identify systemic reliability gaps

  • Standardize incident response workflows, including roles, communication, and escalation paths

  • Create or refine runbooks, incident command frameworks, and severity classification guidelines

Metrics and Insights

  • Build dashboards tracking incident frequency, MTTR, MTTA, and recurrence rates

  • Use incident data to inform reliability-focused OKRs and engineering investment decisions
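
For reference, MTTA and MTTR are usually computed as averages over incident timestamps. The following is a minimal illustrative sketch in Python, not a description of the employer's tooling. The record fields (opened, acknowledged, resolved) and the sample data are hypothetical, and the sketch assumes both metrics are measured from the time the incident was opened.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {
        "opened": datetime(2025, 1, 1, 10, 0),
        "acknowledged": datetime(2025, 1, 1, 10, 5),
        "resolved": datetime(2025, 1, 1, 11, 0),
    },
    {
        "opened": datetime(2025, 1, 2, 14, 0),
        "acknowledged": datetime(2025, 1, 2, 14, 15),
        "resolved": datetime(2025, 1, 2, 16, 0),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average elapsed time in minutes between two timestamps per record."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 60 for r in records
    )


# MTTA: average time from incident open to acknowledgement.
mtta = mean_minutes(incidents, "opened", "acknowledged")
# MTTR: average time from incident open to resolution (an assumption;
# some teams measure from acknowledgement or detection instead).
mttr = mean_minutes(incidents, "opened", "resolved")

print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
```

Definitions vary between organizations, so the start and end timestamps should be agreed on before these numbers are compared across teams or tracked on a dashboard.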

Tooling and AI Solutions

  • Identify repetitive or high-impact tasks suitable for automation

  • Develop and enhance scripts, bots, and AI-driven workflows for monitoring, alerting, and incident triage

  • Evaluate and integrate emerging AI and ML technologies to improve detection, root cause analysis, and reporting

  • Ensure tools and automations meet security, maintainability, and best practice standards

  • Document and share new tools and solutions to support adoption across teams

Cross-Team Collaboration

  • Work with engineering managers and incident leaders to gather and validate incident data

  • Partner with product, infrastructure, and leadership teams to promote reliability best practices

  • Act as a reliability consultant to teams experiencing significant or recurring incidents

  • Recommend improvements to monitoring, alerting, and response processes to reduce future incident impact

Brilliant Staffing, LLC is an Equal Opportunity Employer and encourages applications from all individuals regardless of race, color, religion, gender, gender identity, sexual orientation, national origin, disability, or veteran status.

