**This role cannot be performed remotely from outside California; remote candidates must be based in California.**
Requirements for This Role:
- Experience: 7+ years working with modern data technologies and/or building data-intensive distributed systems.
- Programming Skills: Expert-level proficiency in Java/Scala or Python, with a proven ability to write high-quality, maintainable code.
- Database and Scripting: Strong knowledge of SQL and Bash.
- Cloud Technologies: Experience in leveraging and building cloud-native technologies for scalable data processing.
- Data Systems: Hands-on experience with both batch and streaming systems, including an understanding of their limitations and challenges.
- Data Processing Technologies: Familiarity with a range of technologies such as Flink, Spark, Polars, Dask, etc.
- Data Storage Solutions: Knowledge of various storage technologies, including S3, RDBMS, NoSQL, Delta/Iceberg, Cassandra, ClickHouse, Kafka, etc.
- Data Formats and Serialization: Experience with multiple data formats and serialization systems like Arrow, Parquet, Protobuf/gRPC, Avro, Thrift, JSON, etc.
- ETL Pipelines: Proven track record of managing complex ETL pipelines using tools like Kubernetes, Argo Workflows, Airflow, Prefect, Dagster, etc.
- Schema Governance: Prior experience with schema governance and schema evolution.
- Data Quality Control: Experience in developing data quality control processes to detect and address data gaps or inaccuracies.
- Mentorship: A desire to mentor less experienced team members and promote best practices and high standards in code quality.
- Problem-Solving: Strong technical problem-solving abilities.
- Agile Environment: Proven ability to work in an agile, fast-paced setting, prioritize multiple tasks and projects, and handle the demands of a trading environment efficiently.