Infinite Computer Solutions
Job description
Data Pipeline Development:
- Design and implement robust, scalable ETL/ELT pipelines using GCP Dataflow, Dataproc, and BigQuery.
- Develop streaming data ingestion and real-time processing pipelines using Pub/Sub and Cloud Functions.
- Integrate structured and semi-structured data from Oracle databases and AWS S3 buckets into the GCP ecosystem.
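To give candidates a concrete sense of the streaming work described above, here is a minimal, illustrative sketch only (not a required pattern), assuming Python with the Apache Beam SDK running on Dataflow; the project, subscription, bucket, and table names are placeholders:

```python
# Illustrative sketch: read JSON events from a Pub/Sub subscription and append
# them to an existing BigQuery table. All resource names are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(
        project="my-project",                 # placeholder project ID
        region="us-central1",
        runner="DataflowRunner",
        temp_location="gs://my-bucket/tmp",   # placeholder bucket
        streaming=True,
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/events-sub")
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:analytics.events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```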
Cloud Platform Engineering:
- Build and manage data solutions using GCP-native services with a strong focus on automation and performance.
- Leverage BigQuery for building analytical data warehouses and data marts.
- Use Cloud Storage, Cloud Composer, and Data Catalog for pipeline orchestration, metadata management, and storage.
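As a purely illustrative sketch of the Cloud Composer orchestration mentioned above (assuming Python, Airflow 2.x, and the Google provider package; the DAG ID, dataset, and SQL are placeholders):

```python
# Illustrative sketch: a minimal Composer (Airflow) DAG that runs a daily
# BigQuery transformation. Names and SQL are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_orders_mart",
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    build_mart = BigQueryInsertJobOperator(
        task_id="build_orders_mart",
        configuration={
            "query": {
                "query": (
                    "CREATE OR REPLACE TABLE analytics.orders_mart AS "
                    "SELECT customer_id, SUM(amount) AS total "
                    "FROM staging.orders GROUP BY customer_id"
                ),
                "useLegacySql": False,
            }
        },
    )
```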
Integration & Interoperability:
- Build cross-cloud data integration between AWS S3 and GCP services.
- Extract data from Oracle databases, apply transformation logic, and load the results into GCP-based stores using native or hybrid connectors.
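For illustration only, a minimal sketch of the S3-to-GCP movement described above, assuming Python with boto3 and the Cloud Storage client; bucket names and object keys are placeholders, and credentials would in practice come from Secret Manager or workload identity federation rather than local configuration:

```python
# Illustrative sketch: copy one object from an AWS S3 bucket into Cloud Storage
# from a GCP workload. All bucket/object names are placeholders.
import boto3
from google.cloud import storage


def copy_s3_object_to_gcs(s3_bucket: str, s3_key: str,
                          gcs_bucket: str, gcs_blob_name: str) -> None:
    # Read the object from S3 (credentials resolved by the default boto3 chain).
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=s3_bucket, Key=s3_key)["Body"].read()

    # Write the bytes to Cloud Storage using application default credentials.
    gcs = storage.Client()
    gcs.bucket(gcs_bucket).blob(gcs_blob_name).upload_from_string(body)


if __name__ == "__main__":
    copy_s3_object_to_gcs("source-s3-bucket", "exports/orders.csv",
                          "target-gcs-bucket", "landing/orders.csv")
```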
Performance Optimization & Monitoring:
- Optimize query performance and storage costs in BigQuery.
- Set up monitoring, logging, and alerting for data pipelines using Cloud Logging, Cloud Monitoring, and the related Monitoring APIs (formerly Stackdriver).
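One common BigQuery cost and performance lever behind the first bullet above is partitioning and clustering. A minimal, illustrative sketch follows, assuming Python and the google-cloud-bigquery client; the dataset, table, and column names are placeholders:

```python
# Illustrative sketch: create a date-partitioned, clustered table so queries
# that filter on the partition column scan only the matching partitions.
from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials

table = bigquery.Table(
    "my-project.analytics.events_partitioned",
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts"
)
table.clustering_fields = ["customer_id"]
client.create_table(table, exists_ok=True)

# A filter on the partitioning column limits the bytes scanned (and billed).
query = """
    SELECT customer_id, SUM(amount) AS total
    FROM `my-project.analytics.events_partitioned`
    WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY customer_id
"""
for row in client.query(query).result():
    print(row.customer_id, row.total)
```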
Security & Governance:
- Implement fine-grained access control, data encryption, and secure data handling practices as per compliance requirements (e.g., HIPAA, GDPR).
- Collaborate with DevOps and security teams to maintain secure environments and adhere to best practices.
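Purely as an illustration of the fine-grained access control mentioned above (encryption and broader IAM policy work are not shown), a minimal sketch assuming Python and the google-cloud-bigquery client; the dataset and group names are placeholders:

```python
# Illustrative sketch: grant a group read access to a single BigQuery dataset
# rather than project-wide access. Dataset and group are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.analytics")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(role="READER",
                         entity_type="groupByEmail",
                         entity_id="analysts@example.com")
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # persist only this field
```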
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of professional experience in data engineering.
- Hands-on expertise in GCP-native tools, including:
  - BigQuery
  - Dataflow (Apache Beam)
  - Dataproc (Apache Spark/Hadoop)
  - Cloud Functions
  - Pub/Sub
  - Cloud Storage
- Experience with cloud orchestration tools such as Cloud Composer (Airflow).
- Solid experience in data ingestion from Oracle databases using connectors, Dataflow templates, or batch/CDC mechanisms.
- Experience integrating with AWS S3 buckets (reading, writing, and parsing) from GCP workloads.
- Proficiency in SQL and Python; Java or Scala is a plus.
- Strong understanding of data modeling, data warehousing, and distributed data processing.
- Knowledge of CI/CD pipelines for data engineering using tools such as Terraform, Cloud Build, or GitHub Actions.
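As an illustration of the batch-style Oracle ingestion listed above, a minimal sketch assuming Python with the python-oracledb driver, pandas, and the BigQuery client; connection details, table names, and the watermark filter are placeholders, and production pipelines would typically run this from Composer or Dataflow with proper CDC or watermark handling:

```python
# Illustrative sketch: pull new rows from Oracle since a watermark and append
# them to a BigQuery staging table. All names and credentials are placeholders.
import oracledb
import pandas as pd
from google.cloud import bigquery


def extract_and_load(last_watermark: str) -> None:
    # Extract changed rows from Oracle using a positional bind variable.
    conn = oracledb.connect(user="etl_user", password="***",
                            dsn="oracle-host:1521/ORCLPDB1")
    df = pd.read_sql(
        "SELECT order_id, customer_id, amount, updated_at "
        "FROM orders WHERE updated_at > :1",
        conn, params=[last_watermark],
    )
    conn.close()

    # Append the batch to a BigQuery staging table and wait for completion.
    client = bigquery.Client()
    job = client.load_table_from_dataframe(
        df, "my-project.staging.orders",
        job_config=bigquery.LoadJobConfig(write_disposition="WRITE_APPEND"),
    )
    job.result()
```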
Preferred Qualifications:
- GCP Certification: Professional Data Engineer or Cloud Architect.
- Experience with hybrid cloud or multi-cloud data strategies.
- Familiarity with Terraform/IaC for provisioning GCP resources.
- Exposure to data quality frameworks and metadata management tools.

Soft Skills:
- Strong analytical, problem-solving, and communication skills.
- Proven ability to collaborate in a cross-functional agile environment.
- Self-starter with the ability to manage multiple tasks and deliver high-quality results under tight deadlines.