We are looking for a savvy Data Engineer to join a cross-functional data science team that typically includes data scientists, engineers, software engineers, product managers, and end users, working on projects that involve large, complex, and dynamic data sets. These projects are building blocks toward the Smarter Factory.
As a Data Engineer, you will need to understand production data and the associated manufacturing processes, and to design and develop scalable procedures and tools that customize and optimize the assembly, manipulation, and management of large, dynamic datasets in support of large data-based projects and other analytics needs.
- Leverage the existing big data platform and data pipeline tools to develop automated processes that assemble and process large data sets from different sources and ensure efficient, consistent delivery of analysis-ready data at scale.
- Work alongside data scientists, product engineers, and software engineers to translate offline analysis into viable products that can be implemented in production. Working as part of a team is a strong expectation.
- Identify opportunities to improve data processing and design customized tools that turn those opportunities into applications.
- Identify data issues, evaluate their impact on analytics projects, and find and implement solutions
- Execute complex data engineering projects
- MS in Computer Science, Engineering, or another quantitative field (or equivalent), plus a minimum of 5 years of relevant experience
- Proficient in object-oriented scripting languages. 3+ years of experience programming in Java or Python.
- Expert in SQL and relational databases. Hands-on experience in developing database applications.
- 3+ years of experience integrating, manipulating, processing, and extracting value from large, disconnected datasets.
- Strong grasp of data structures
- Working knowledge of big data tools (Hadoop, Spark, Thrift API, HiveQL) and familiarity with binary-encoded file formats (Avro, ORC, Parquet, etc.)
- Experience supporting and working with cross-functional teams in a dynamic environment
- Knowledge of HDD production and testing engineering or failure analysis
- Experience in R or SAS
- In-depth understanding of multivariate analysis, regression, and statistical inference.
- General understanding of machine learning/deep learning methods and experience in building ML models.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift