Job Description
United States•Remote
Full-time
Shift and schedule
- On call
Overview
At company, we bring high definition to patient data, delivering actionable insights for clinicians and empowering them with the knowledge to make better decisions. Our mission is to help providers achieve better care outcomes through these insights.
We are seeking a skilled Site Reliability Engineer to join our team. This position is ideal for an individual with a strong background in software engineering and operations who has experience working with federal agencies such as the Department of Veterans Affairs (VA). The candidate will be responsible for ensuring the reliability and scalability of our services by effectively monitoring, deploying, and maintaining our infrastructure.
Please note – We can only accept applications from U.S. citizens or Green Card holders.
What you will do
- Develop, maintain, and continuously improve CI/CD pipelines, branch management practices, and release management processes.
- Act as primary system administrator for infrastructure in VA environments, ensuring system availability, performance, and compliance with federal security standards.
- Design, implement, and operate monitoring and alerting solutions; validate and triage alerts as the first line of defense for operational issues.
- Proactively identify and resolve scalability, reliability, and performance risks across applications, infrastructure, and data pipelines.
- Lead and participate in incident response, including on-call and off-hour support, root cause analysis, and post-incident remediation.
- Manage ETL pipelines and data flow operations, including health checks, restart automation, recovery workflows, and operational validation.
- Coordinate closely with VA’s IT team, federal stakeholders, and internal development teams to resolve infrastructure and operational incidents.
- Own and evolve disaster recovery, backup, and resilience strategies.
- Document and maintain runbooks, SOPs, system configurations, deployment procedures, and escalation protocols.
- Evaluate and introduce new tools and technologies to improve system reliability, security, and operational efficiency.
What you bring to the team
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Minimum of 3 years of experience as a Site Reliability Engineer or similar role, particularly in environments subject to federal compliance requirements.
- Experience working directly with federal agencies, preferably the Department of Veterans Affairs, including navigating their compliance, security, and operational protocols.
- Strong understanding of monitoring tools and software (e.g., Prometheus, Grafana, ELK stack).
- Experience with backup, disaster recovery, and failover planning.
- Proficient with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Experienced with cloud services (e.g., AWS, Azure) and managing cloud infrastructure.
- Proficient in scripting languages (e.g., Python, Bash) for automation, and in SQL.
- Excellent problem-solving skills and ability to work under pressure.
Bonus if you have:
- Certification in cloud technologies and security standards.
- Experience with infrastructure as code (e.g., Terraform, Ansible).
- Experience working with Jenkins and/or other CI/CD pipeline management tools.
- Familiarity with VPNs, VLANs, and secured network architecture, particularly for healthcare or government systems.
- Experience with cloud and application security platforms, such as WAF (e.g., Signal Sciences/Fastly), RASP (e.g., tCell/Rapid7, Tenable), and CNAPP (e.g., Lacework/Fortinet, Wiz).
Company Description
company is a data analytics company at the intersection of artificial intelligence, bioinformatics, and decision science. Our tailored software suite empowers healthcare organizations to reduce costs, improve patient care outcomes, and mitigate the spread of infectious diseases. Bitscopic also provides solutions for managing complex clinical trials, operational efficiency, and other bioinformatics areas.
WHY JOIN US?
- Mission-Driven – We are self-funded, owned by the team, and entirely focused on making a meaningful impact in healthcare while maintaining collaborative, transparent communication and an ego-free team culture.
- Team Culture – We all value work-life balance, personal and professional growth, competitive compensation, continuous learning, and fun. We build essential products to positively impact and save lives. On our team, everyone’s voice is a vital contribution to the mission. Most of our team members have been with us for over six years.
- Freedom to Focus – Use flexibility, autonomy, and control over your schedule to do your best work. Choose your own approach to achieving results, and take responsibility for delivering for the team and our customers.
- Location Independent – Work from anywhere. We’ve been a distributed team since our founding in 2012.
Benefits:
- 401(k)
- Dental insurance
- Health insurance
- Health savings account
- Life insurance
- Paid time off
- Vision insurance
Work Location: Remote