Monitoring and Metrics Engineer


Job Summary

In this highly visible role, you will be responsible for ensuring that Apple’s world-class Silicon Engineering Group has the infrastructure and tools it needs to engineer and design the world’s most advanced silicon devices and products. You will draw on your deep experience developing and maintaining monitoring, alerting, and metrics gathering/analysis systems for compute clusters, storage systems, networks, web infrastructure and applications, database servers, and directory services. You will use your strong communication skills to work with internal teams, enabling Apple’s world-class product development.

Key Qualifications

  • Typically requires 5+ years of experience in a large compute environment. Finance, Oil & Gas, Scientific Research, or R&D environments preferred.
  • You will have demonstrated skills in the following areas:
    • Experience monitoring a large-scale distributed computing environment
    • Ability to correlate event data across network, storage, and systems
    • Experience with Nagios, Ganglia, or similar technologies such as Zenoss, GroundWork, or SolarWinds
    • Experience deploying network monitoring tools for bandwidth, sFlow, latency, and SNMP alerts (a plus)
    • Experience automating the installation and configuration of monitoring systems and agents
    • Ability to scale monitoring solutions geographically
    • Experience with aggregation and visualization solutions such as Elasticsearch/Logstash/Kibana (ELK) or Splunk
    • Proficiency in Ruby, Python, Perl, or another common high-level scripting language
    • Experience using configuration management tools such as Puppet, Chef, or CFEngine
    • Familiarity with stream processing systems such as Kafka, Spark, or Storm for large-scale data analytics
    • Familiarity with continuous integration/deployment systems such as Jenkins or Travis CI
    • Linux system administration experience (RHEL/CentOS preferred)
    • Understanding of NFS and NAS appliances (NetApp preferred)
    • Understanding of Layer 2/Layer 3 networking (Arista or Cisco preferred)
    • Understanding of revision control systems (SVN, Git, Perforce)
    • Understanding of LDAP (OpenLDAP, DSEE, Open Directory)
  • Must be analytical and possess strong organizational / problem-solving skills.


You will support internal CSE, SRE, and Architecture teams by deploying, enhancing, maintaining, and tuning monitoring, alerting, and metrics gathering/analysis tools. Your role will directly impact the development, enhancement, and maintenance of compute clusters, storage systems, network interconnects, LAMP stacks, and directory services.


MS/BS degree or equivalent