- Oracle
- BENGALURU, KA
- 2 weeks ago
Job Description
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
- Design, implement, and operate scalable, secure, and highly available infrastructure for cloud and AI-driven applications on OCI.
- Apply SRE best practices including SLI/SLO definition, error budgets, automated monitoring, incident response, and post-incident reviews.
- Instrument systems using observability tools (Grafana, Prometheus, APM) to monitor performance, availability, latency, and resource utilization.
- Lead major incident management, perform deep root-cause analysis, and implement long-term preventive fixes.
- Drive large-scale noise reduction initiatives by tuning alerts, eliminating duplicate alarms, and improving monitoring quality.
- Automate common operational tasks to minimize manual intervention and reduce mean time to recovery (MTTR).
Automation & DevOps
- Build and maintain automation for infrastructure provisioning, deployments, monitoring, and remediation using Terraform, Ansible, Python, Shell, or PowerShell.
- Develop CI/CD pipelines and Infrastructure-as-Code frameworks to ensure repeatable and reliable deployments.
- Identify and eliminate toil by continuously improving operational processes through automation.
- Collaborate closely with engineering, DevOps, and platform teams to improve system resilience and scalability.
- Strong problem-solving and critical-thinking skills with attention to detail.
- Proactive, solution-oriented mindset with a focus on fixing root causes.
- Passion for automation and continuous improvement.
- Ability to work effectively under pressure in high-stakes environments.
- Eagerness to learn, innovate, and mentor others.
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
