Data Engineer- Internet Monitoring


Amazon’s network is a key differentiator for Amazon Web Services (AWS), enabling the global operation of thousands of applications across millions of servers worldwide. The network is fundamental to the success of Amazon and hundreds of thousands of AWS customers. And right alongside it sits the most pervasive, important, and complex communications network in the world: the Internet.

While we tend to think of it as a single entity, the Internet is actually composed of tens of thousands of independently administered networks, with routing protocols facilitating the constant flow of traffic within and between them. As if that weren’t complicated enough, the flow of traffic across the Internet is also subject to various political, regulatory, and business requirements. In short, even when things are humming along smoothly, the Internet is amazingly complex. Our mission is to study when and how it breaks. More specifically, we study how Amazon’s network connects to the public Internet in order to detect when those connections are disrupted and impact our customers. To that end, we are looking for a Data Engineer to join our R&D effort and help drive insight into this fruitful and wide-open problem space.

Come join us and …

  • Do what nobody else in the world is doing… literally
  • Gain world class knowledge and expertise on the inner workings of the Internet, working with top-tier Network and Software Engineers
  • Define and develop Amazon’s Internet Monitoring architecture
  • Play in the piles of data to discover patterns that advance our understanding of Internet performance and availability anomalies
  • Build massive real-time systems which inform and drive complex changes across the Internet
  • Gain practical experience building incredible distributed systems software using Amazon Web Services

The ideal candidate will have a demonstrated affinity for data analysis, knowledge of data processing and storage best practices, experience leveraging disparate data sources, and a keen eye for data hygiene.

Basic Qualifications

  • Proven track record of automating transformation of unstructured or poorly-structured data into readily-consumable formats
  • Knowledge of relational data modeling concepts and basic SQL
  • Experience programming in a general-purpose language (e.g. Java, Scala, Python, Go, Ruby, C/C++, JavaScript, etc.)
  • Comfortable working in a Unix command-line environment
  • Basic data analysis skills
  • Strong problem-solving and troubleshooting abilities
  • Knowledge of data storage best practices and use cases

Preferred Qualifications

  • Experience in ETL pipelining and data warehousing
  • Knowledge of distributed, large-scale data storage and analysis solutions (e.g. Redshift, Dynamo, Cassandra, Spark, EMR, Hadoop)
  • Familiarity with streaming data analysis solutions (e.g. Kafka, Kinesis)
  • Web development and UI/UX experience a plus
  • Familiarity with networking protocols (TCP/IP, ICMP, BGP)
