📧 arivu.p@live.in 📱 +1 346-599-0347

Arivazhagan Pandiyan

Site Reliability Engineer Lead | Azure Administrator | Cloud & Infrastructure Monitoring | Datadog

About Me

I'm a Site Reliability Engineer with 10+ years of experience ensuring high availability, scalability, and performance for mission-critical systems in the financial and healthcare domains. My expertise spans Microsoft Azure (AZ-104 certified), automation (PowerShell, Python), and monitoring and incident-response tools such as Datadog, Dynatrace, Zabbix, and PagerDuty.

I specialize in:

  • Designing fault-tolerant architectures that improve uptime and reliability.
  • Driving automation initiatives that save hundreds of hours annually.
  • Reducing MTTR through proactive monitoring and incident management.
  • Collaborating across global teams to implement process improvements and optimize infrastructure.

Passionate about building resilient systems and leveraging data-driven insights to enhance operational efficiency. Always exploring new technologies and best practices to stay ahead in the evolving cloud and SRE landscape.

Professional Experience

Site Reliability Engineer Lead

Aug 2023 – Jul 2024

New American Funding

  • SRE Leadership: Defined SLIs/SLOs and led incident response, post-mortems, and Root Cause Analysis (RCA).
  • Cloud/Reliability: Designed scalable, highly available systems on Microsoft Azure.
  • Automation: Drove automation of repetitive tasks and deployment tools.

Technology Operations Associate

Oct 2017 – Oct 2022

Wells Fargo India Solutions

  • Production Support: Managed and monitored internal servers, applications, and hardware using ServiceNow.
  • Scripting: Used PowerShell for server troubleshooting (CPU/Disk/Memory).
  • Vendor Management: Coordinated extensively with vendors (IBM, DELL, HDS, Brocade) on hardware failures and production issues.

System Administrator

Sep 2015 – Oct 2017

NTT Data Global Delivery

  • Server Management: Monitored and managed live Windows and Linux production servers.
  • Process Improvement: Managed the development and rollout of IT Service Delivery tools, focusing on change management and automation strategies.

Support Engineer

Nov 2014 – Sep 2015

First source solution

  • Server Monitoring: Managed background applications and monitored servers; addressed alerts for disk, CPU, and memory utilization using Xsmart ControlUp.
  • Customer Service: Handled customer inquiries and utilized the Kayako ticketing system.

Technical Assistant Engineer

Jun 2013 – Aug 2014

Cliptos Technologies

  • Technical Support: Installed and maintained software packages.
  • Healthcare Solutions: Assisted users with application installation and configuration, particularly in the healthcare domain (MEDITOS Clinical Management Solutions).

Technical Skills

Core Skills

Microsoft Azure, Team Leadership, Cloud Administration, Uptime, Zabbix, Dynatrace, Datadog, Incident Management, PagerDuty, System Monitoring, PowerShell, Azure Application Insights, Scrum, Azure Alerts, Kusto Query Language (KQL), Python, Shell Scripting

Certifications

🏆 AZ-104 Azure Administrator Associate
📊 Datadog Fundamentals Certification

Operating Systems

Windows, Linux, UNIX, Azure, VMware

Vendors

HP, IBM, DELL, HDS, Brocade, Pure Storage, NetApp, Cisco, Oracle

Ticketing Tools

BMC Remedy, Pac 2k, ServiceNow, Kayako, Jira, Incident Management

Alert Tools

IBM Netcool, Xsmart ControlUp, OpsGenie, PagerDuty
