[Remote] Staff Platform Infrastructure Engineer: Observability

Note: This is a remote position open to candidates in the USA. Jack Henry is a technology company redefining how community banks and credit unions connect with the people they serve. The Staff Platform Infrastructure Engineer will be responsible for designing, building, and maintaining observability solutions that empower product teams with the insights needed to operate reliable services.


Responsibilities

  • Plan, design and build the observability blueprints used by Jack Henry's development and engineering teams. Craft the overarching strategy and create detailed designs for how observability will be implemented for various products & services across the organization. This includes defining data flows & telemetry pipelines into the appropriate platform (like Datadog, Honeycomb, Prometheus, etc.) and establishing best practices for instrumentation
  • Resolve critical observability incidents that impact the ability to monitor and understand system behavior. This engineer will lead the effort to pinpoint the root cause, implement solutions, and prevent recurrence. This requires deep expertise in observability tools and techniques
  • Design and implement automated pipelines for deploying, configuring, and managing observability tools and instrumentation. This includes automating tasks such as agent installation, configuration updates, and alert provisioning, leveraging IaC principles and tools
  • Ensure the observability systems themselves are healthy, reliable, and provide accurate data. Also champion the use of observability to improve the overall health and resilience of production systems, enabling faster detection and mitigation of potential problems
  • Actively engage with product teams to understand their upcoming projects, technology choices, and observability needs. Ensure that observability is embedded early in the development lifecycle and provide the insights necessary to optimize application performance and reliability
  • May perform other job duties as assigned

Skills

  • Minimum of 10 years of experience in Software Development, Observability Engineering, or Site Reliability Engineering
  • Minimum of 5 years of in-depth experience with Observability platforms like Datadog, Dynatrace, Honeycomb, New Relic, Splunk, or Prometheus
  • Minimum of 4 years of cloud experience with Azure, AWS, or GCP
  • Minimum of 4 years of experience building and managing telemetry pipelines, including at least 1 year of hands-on experience with OpenTelemetry
  • Minimum of 4 years of experience with Kubernetes environments
  • Understanding and experience with declarative infrastructure using Terraform
  • Must be able to work an on-call rotation that may include weekends as the business need dictates
  • Proven ability to collect logs, metrics & traces and implement Observability solutions for applications and infrastructure
  • Solid understanding of distributed tracing and experience instrumenting applications to analyze performance bottlenecks
  • Hands-on experience configuring and deploying OTEL collectors for telemetry data collection, processing and export
  • Strong understanding of Kubernetes architecture and experience managing observability within K8s environments
  • Proficiency in using Kustomize for Kubernetes configurations and Terraform for infrastructure provisioning
  • Experience integrating observability practices and tools into CI/CD pipelines for automated deployments
  • Exceptional analytical and problem-solving skills to diagnose and resolve complex issues within observability systems, including data pipeline failures, instrumentation errors, and performance bottlenecks
  • Ability to demonstrate a strong Site Reliability Engineering (SRE) mindset with a focus on automation, proactive monitoring, and continuous improvement to ensure system reliability and availability
  • Experience defining and implementing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure system performance and drive data-driven decisions

Benefits

  • Comprehensive benefits designed to support your physical, mental, and financial health

Company Overview

  • Jack Henry (Nasdaq: JKHY) is a well-rounded financial technology company that strengthens the connections between people and their financial institutions through technology and services that reduce the barriers to financial health. It was founded in 1976, and is headquartered in Monett, Missouri, USA, with a workforce of 5001-10000 employees. Its website is http://www.jackhenry.com.
