Purple Drive
Data Engineer (Local Candidates Only)
The client's Security Architecture, Analytics & Fusion Engineering (SA2FE) team is looking for a Data Engineer. The Fusion Analytics and Data Engineering team delivers models, insights, and tooling to help Cybersecurity teams make faster, more informed decisions as we work to secure the client's digital footprint. As a Data/Analytics Engineer, you will develop data flows, analytics pipelines, and production machine-learning systems in collaboration with data product managers, architects, engineers, and other team members to create ETL pipelines, automation, and analytics- and ML-driven data products that support our mission to build predictive models and intelligent systems that help secure State Street's information and infrastructure.
What you will be responsible for
As a Data Engineer, you will:
• Use your understanding of large-scale data processing and analytics to wrangle our unique cybersecurity data and create automation, analyses, and tools that point to the most significant business, governance, and risk management impacts.
• Participate in the design and buildout of petabyte-scale systems for high availability, high throughput, data consistency, security, and end-user privacy, defining our next generation of data analytics tooling.
• Build data modeling, automation, and ELT workflows to produce Raw, Rationalized, co-Related, and Reporting data flows for graph, time-series, structured, and semi-structured cybersecurity data.
Education & Qualifications
Minimum Qualifications
• B.S. or M.S. in Computer Science, or equivalent work experience
• 5+ years of experience building large-scale distributed systems and data analytics processes on cloud-native, in-memory, and fit-for-purpose hybrid infrastructure. Experience with cybersecurity data and globally distributed log and event processing systems built around data mesh and data federation is highly desirable.
• Experience with big data technologies such as Presto/Trino, Spark and Flink, Airflow and Prefect, Redpanda and Kafka, Iceberg and Delta Lake, Snowflake and Databricks, and Memgraph and Neo4j, as well as modern security tooling such as Splunk, Panther, Datadog, Elastic, and ArcSight.
• Experience designing and building data warehouse, data lake, or lakehouse solutions using batch, streaming, lambda, and data mesh architectures, and improving the efficiency, scalability, and stability of system resources.
• Experience working with data warehouses or databases such as Snowflake, Redshift, Postgres, and Cassandra.
• Experience writing and optimizing complex SQL, developing ETL, and designing and building data warehouse, data lake, or lakehouse solutions. Experience building data APIs and integrations using tools such as GraphQL, Apache Arrow, gRPC, and Protobuf, and designing large-scale stream processing systems with Flink, Kafka, NiFi, and similar technologies.
• Experience with distributed systems, distributed data storage, and large-scale data warehousing solutions such as BigQuery, Athena, Snowflake, Redshift, and Presto.
• Experience working with large datasets and best-in-class data processing technologies for stream and batch processing, graph and time-series data, and notebook and analytic visualization environments.
• Strong communication and collaboration skills, particularly across teams or with functions such as data scientists or business analysts.
Preferred Experience
• 5+ years of experience with Python, Java, or similar languages and with cloud infrastructure (e.g., AWS, GCP, Azure), and deep experience working with big data processing infrastructure and ELT orchestration
• Experience developing distributed batch and real-time feature stores, developing coordinated batch, streaming, and online model execution workflows, and building and optimizing large-scale data processing jobs in Spark, GraphX/GraphFrames, and Spark Structured Streaming, as well as scaling graph- and time-series-native operations.
• Experience designing for data lineage, federation, governance, compliance, security, and privacy; hands-on experience with commercial DataSecOps platforms such as Immuta or Satori, and/or experience building custom access control (RBAC/ABAC), data masking, tokenization, and FPE systems for cloud data lake environments. Experience with globally distributed, federated data systems is highly desirable.
• Experience with data quality monitoring, building continuous data pipelines, and implementing history and time travel using modern data lake storage layers such as Delta Lake, Iceberg, and lakeFS.
• Experience with MLOps and iterative cycles of end-to-end development, MRM coordination, deployment, and monitoring of production-grade ML models in a regulated, high-growth tech environment.
Why this role is important to us
Our technology function, Global Technology Services (GTS), is vital to the client and is the key enabler for our business to deliver data and insights to our clients. We're driving the company's digital transformation and expanding business capabilities using industry best practices and AI-driven, digital-first customer experiences.
We offer a collaborative environment where technology skills and innovation are valued in a global organization. We're looking for top technical talent to join our team and deliver creative technology solutions that help us become an end-to-end, next-generation financial services company. Join us if you want to grow your technical skills, solve real problems, and make your mark on our industry!
Skills:
SPLUNK, POSTGRES, BIGQUERY, GRAPHQL, DATA PROCESSING, AUTOMATION, CLOUD INFRASTRUCTURE, AZURE, JAVA, NEO4J, AWS, DATA MODELING, ML, PYTHON