Career Profile

Site Reliability Engineering leader with several years of experience driving reliability through SLA maintenance, efficient processes, monitoring implementation, automation development, engineering reliability back into applications, and maximizing performance.

  • Ability to run post-mortems on unexpected incidents to prevent future recurrences
  • Skilled in evaluating new possibilities and in capacity planning
  • Comfortable handling operations, monitoring, and alerting
  • Knowledge and experience in building processes and automation to support other teams
  • Ability to persuade organizations to do what needs to be done

Experience

Sr. Cloud Reliability Engineering Manager

2019 - Present
Oracle, Bangalore
  • Drive the adoption and implementation of the SRE function and associated tooling to improve resiliency and reliability in the Construction Engineering GBU
  • Improve tooling and instrumentation to implement Chaos Engineering
  • Understand technical architectures, capacity plans, tooling needs, automation plans, product launch plans, and other issues and create comprehensive plans for prioritizing technical and resourcing challenges
  • Work closely with recruiting staff to expand the team, including sourcing candidates, interviewing candidates, participating in conferences/events, and on-boarding new employees

Sr. Site Reliability Engineering Manager

2018 - 2019
MediaKind (formerly Ericsson Global Media Solutions), Bangalore
  • The SRE group is focused on improving the availability and responsiveness of internal and external components and platforms through the application of engineering best practices, tooling and instrumentation advances, and cross-organizational coordination.
  • Responsible for developing and managing a team of engineers who are focused on Site and Service Reliability.
  • Lead a team of System Reliability Engineers responsible for supporting services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, automation, and release
  • Handle communication and provide transparency on major site issues to the executive management team and the rest of the organization
  • Maintain the relationship with any relevant service providers (internal or external), keeping them accountable to the agreed SLAs
  • Apply attention to detail and accuracy to spot long-term trends in a production enterprise environment
  • Respond effectively to monitoring alerts, incident tickets, email requests, and requests from other channels coming in to the Site Reliability Engineering team.
  • Contribute to the development of new principles and concepts. Develop and implement policies, procedures, and standards

DevOps Manager

2016 - 2018
McAfee, Bangalore
  • Lead a team of System Reliability Engineers to scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Interface with Dev/QA/Ops teams to perform root cause analysis and instrument triggers to prevent future network degradation and outages
  • Provide leadership and direction to SRE staff who are responsible for break-fix, uptime, and reliability for core services, distribution, and customer access network elements and related interfaces
  • Work closely with recruiting staff to expand the team, including sourcing candidates, interviewing candidates, participating in conferences/events, and on-boarding new employees.

Site Reliability Engineer

2012 - 2016
SAP ARIBA, Bangalore
  • Escalate issues as needed to product development or service engineering team per documented procedures, while at the same time establishing a contingency plan to eliminate any intermittent service disruption
  • Document and detail areas of improvement to bolster architecture, design, technical requirements and service specifications.
  • Present architecture, design, and technical choices to internal audiences
  • Design and deploy metrics, monitoring, and logging systems on AWS / Infra systems to understand system performance and isolate bottlenecks.
  • Help drive efforts to improve triage time and bring down MTTR (Mean Time to Repair), and provide follow-up support to mitigate future incidents
  • Proactively monitor availability and performance of the SAP ARIBA cloud products using the required toolset
  • Respond effectively to monitoring alerts, incident tickets, email requests, and requests from other channels coming in to the Site Reliability Engineering team

Tech Lead, Systems Engineering & Operations

2012 - 2016
Yahoo, Bangalore
  • Led the Service Engineering and Operations efforts for Producers Desktop (PD2 and Editopia), state-of-the-art editorial tools that enable Yahoo! editors to deliver the best viewing experience on key landing sites such as www.yahoo.com and news.yahoo.com
  • Worked on Y! Post, a feed acquisition and processing platform based on the MapReduce paradigm.
  • Applied best practices at all times and encouraged others to do the same, which helped to maintain effective security.
  • Motivated software engineers who are passionate about architecture, defining issues, and building at scale in cloud environments.
  • Oversaw all testing and troubleshooting for the Y! Post application and documented issue resolutions for the development team.

Manager, Data Center

2008 - 2008
Bank of Maharashtra (PSB), Pune
  • Led the team responsible for 100% uptime of the Core Banking System. Apart from systems management, also handled vendor management and general management.
  • Optimized staff productivity by managing inter-team conflict resolution, yearly performance reviews, hiring and terminating processes, training initiatives, scheduling, time and attendance and payroll.
  • Oversaw a bank team of 6 customer service representatives and 5 vendor resources, and implemented training for all new CBS employees.
  • Developed quality assurance controls to maintain consistent design approach and effective results.
  • Provided strong program leadership to improve development and drive continuous improvement of the Core Banking System.

Junior Engineer, IT

2004 - 2008
Ordnance Factory Board (Ministry of Defence), Govt. of India
  • As a core member of the OFB COMNET team, was instrumental during its commissioning, implementation, and testing phases.
  • Configured and deployed HelpDesk tracking software with modules for inventory monitoring, staff administration, customer relationship management and report generation.
  • Installed, configured, and hardened RHEL 4/5 and SCO Unix operating systems per customer standards.
  • Performed proactive system maintenance by applying timely patch upgrades on SCO Unix and RHEL.
  • Collaborated with team engineers and other personnel to implement operating procedures, resolve system malfunctions, and provide technical information.
  • Simplified complex specifications into logical design requirements for implementation into production.
  • Researched new technologies and assessed feasibility for inclusion in new concepts.

Skills & Proficiency

Observability

Cloud Technologies

DevOps and Agile

Python

Team building and process improvement

Problem Management and RCA