Job Description
United States
Full-time
Shift and schedule
- On call
The IC2 Site Reliability Engineer in our OCI Sovereign Cloud team supports daily operations for a secure, large-scale OCI-based cloud environment powering mission-critical federal government workloads. This entry-level position focuses on maintaining and supporting existing infrastructure, implementing incremental improvements, and ensuring operational health and compliance. Working within a Linux-centric environment, you will leverage scripting and basic automation to manage deployments, perform fleet maintenance, and maintain system health under the supervision and guidance of senior engineers.
Key Responsibilities:
- Perform routine operational tasks such as deployments, patching, fleet maintenance, and basic troubleshooting for cloud-based systems.
- Tune team-specific alarms and thresholds, escalate incidents appropriately, and support the management of metrics, KPIs, and system health dashboards.
- Participate in incident response by quickly triaging and escalating incidents, executing operational playbooks, and documenting issues for senior review. You will follow established procedures under supervision and contribute to root-cause analysis by gathering data and providing initial troubleshooting support.
- Serve as a technical support point of contact, troubleshooting and resolving technical issues, assisting customers with environment setup and debugging, and providing timely communication and status updates to customers and internal teams.
- Own, maintain, and improve runbooks to ensure consistency and clarity for operational processes.
- Implement defined enhancements to existing tools, documentation, and monitoring solutions.
- Collaborate closely with other team members and escalate complex issues for further investigation and resolution.
- Participate in on-call rotations with support from senior engineers, ensuring continuity of coverage and timely response.
- Ensure compliance with all security, operational, and documentation standards.
Minimum Qualifications:
- U.S. citizenship; must possess and maintain a TS/SCI with polygraph (w/Poly) security clearance.
- Hands-on experience with Linux systems administration.
- Scripting ability with Python or Bash.
- Understanding of basic cloud concepts (networking, compute, identity, observability).
- Strong problem-solving skills and willingness to learn complex systems.
- Ability to work collaboratively with technical teams and communicate effectively.
Preferred Qualifications:
- Exposure to Oracle Cloud Infrastructure (OCI) or other major cloud platforms.
- Familiarity with Infrastructure-as-Code tools such as Terraform or Ansible.
- Experience supporting production systems or participating in on-call rotations.
- Understanding of security best practices within classified environments.
Why Join Us?
This is an opportunity to grow your career in a highly collaborative team supporting mission-critical systems in one of the world’s leading cloud environments. You will receive significant coaching and guidance while gaining hands-on experience with real-world enterprise operations.