Big Data Engineer
About The Position
ThetaRay is the leading provider of AI-based Big Data analytics.
We are dedicated to helping financial organizations combat financial cybercrimes such as money laundering and fraud, which facilitate malicious crimes such as terrorist financing, narcotics trafficking, and human trafficking and negatively impact the global economy.
Our intuitive AI solutions for Anti-Money Laundering and Fraud Detection enable clients to manage risk, detect money laundering schemes, uncover fraud, expose bad loans, surface operational issues, and reveal valuable new growth opportunities.
We are looking for a Big Data Engineer to join our growing team of data experts.
The new hire will be responsible for designing, implementing, and optimizing ETL processes and data pipeline flows within the ThetaRay system.
The ideal candidate has experience building data pipelines and data transformations, and enjoys optimizing data systems and building them from the ground up.
The Big Data Engineer will support our data scientists by implementing the relevant data flows based on the data scientists’ feature designs.
They must be self-directed and comfortable supporting multiple production implementations for various use cases, some of which will be conducted on-premises at customer locations.
Responsibilities
● Implement and maintain production data pipeline flows within the ThetaRay system based on the data scientists’ designs.
● Design and implement solution-based data flows for specific use cases, enabling applicability of implementations within the ThetaRay product.
● Build machine learning data pipelines.
● Create data tools that help analytics and data science team members build and optimize our product into an innovative industry leader.
● Work with product, R&D, data, and analytics experts to strive for greater functionality in our systems.
● Train customer data scientists and engineers to maintain and amend data pipelines within the product.
Requirements
● 1+ years of hands-on experience working with Apache Spark clusters
● 1+ years of hands-on experience with Spark scripting languages: PySpark, Scala, Java, or R
● 2+ years of hands-on experience with SQL
● 1+ years of experience with data transformation, validations, cleansing, and ML feature engineering in a Big Data Engineer role
● BSc degree or higher in Computer Science, Statistics, Informatics, Information Systems, Engineering, or another quantitative field.
● Experience building and optimizing big data pipelines, architectures, and data sets.
● Strong analytic skills related to working with structured and semi-structured datasets.
● Experience building processes that support data transformation, data structures, metadata, dependency management, and workload management.
● Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
● Business-oriented and able to work with external customers and cross-functional teams.
● Willingness to travel abroad to customers as needed (up to 25%)
Nice to have
● Experience with Linux
● Experience building machine learning pipelines
● Experience with Elasticsearch
● Experience with Zeppelin/Jupyter
● Experience with workflow automation platforms such as Jenkins or Apache Airflow
● Experience with Microservices architecture components, including Docker and Kubernetes.