Macpower Digital Assets Edge
System Administrator (GCP/AWS/Azure, PySpark, BigQuery, and Google Airflow)
Macpower Digital Assets Edge, San Jose, California, United States, 95199
Job Overview:
This role involves managing and optimizing Big Data environments (PySpark, BigQuery, Airflow) across Google Cloud, AWS, or Azure platforms, ensuring efficient, secure, and cost-effective operations. Key responsibilities include 24x7 support, data pipeline optimization, automation, and troubleshooting, with a strong emphasis on DevOps, CI/CD, and disaster recovery.
Roles and Responsibilities (Google Cloud/AWS/Azure, PySpark, BigQuery, and Google Airflow):
- Participate in 24x7x365 rotational shift support and operations for SAP environments.
- Serve as a team lead responsible for maintaining the upstream Big Data ecosystem, handling millions of financial transactions daily using PySpark, BigQuery, Dataproc, and Google Airflow.
- Streamline and optimize existing Big Data systems and pipelines while developing new ones, ensuring efficient and cost-effective performance.
- Manage the operations team during your designated shift and make necessary changes to the underlying infrastructure.
- Provide day-to-day support, improve platform functionality using DevOps practices, and collaborate with development teams to enhance database operations.
- Architect and optimize data warehouse solutions using BigQuery to enable efficient data storage and retrieval.
- Install, build, patch, upgrade, and configure Big Data applications.
- Administer and configure BigQuery environments, including datasets and tables.
- Ensure data integrity, availability, and security on the BigQuery platform.
- Implement partitioning and clustering strategies for optimized query performance (see the table-definition sketch after this list).
- Define and enforce access policies for BigQuery datasets.
- Set up query usage caps and alerts to control costs and prevent overages (a cost-cap sketch follows this list).
- Troubleshoot issues in Linux-based systems with strong command-line proficiency.
- Create and maintain dashboards and reports to monitor key metrics such as cost and performance.
- Integrate BigQuery with other GCP services like Dataflow, Pub/Sub, and Cloud Storage.
- Enable BigQuery usage through tools such as Jupyter Notebook, Visual Studio Code, and CLI utilities.
- Implement data quality checks and validation processes to maintain data accuracy.
- Manage and monitor data pipelines using Airflow and CI/CD tools like Jenkins and Screwdriver (an example DAG skeleton follows this list).
- Collaborate with data analysts and scientists to gather data requirements and translate them into technical implementations.
- Provide guidance and support to application development teams for database design, deployment, and monitoring.
- Demonstrate proficiency in Unix/Linux fundamentals, scripting in Shell/Perl/Python, and using Ansible for automation.
- Contribute to disaster recovery planning and ensure high availability, including backup and restore operations.
- Experience with geo-redundant databases and Red Hat clustering is a plus.
- Ensure timely delivery within defined SLAs and project milestones, adhering to best practices for continuous improvement.
- Coordinate with support teams including DB, Google, PySpark data engineering, and infrastructure.
- Participate in Incident, Change, Release, and Problem Management processes.
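For illustration only (not an additional requirement): a minimal sketch of the partitioning and clustering work described above, defining a day-partitioned, clustered BigQuery table with the google-cloud-bigquery Python client. The project, dataset, table, and column names are hypothetical placeholders.

    from google.cloud import bigquery

    # Hypothetical project/dataset/table for illustration.
    client = bigquery.Client(project="my-gcp-project")

    schema = [
        bigquery.SchemaField("txn_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
        bigquery.SchemaField("txn_ts", "TIMESTAMP"),
        bigquery.SchemaField("region", "STRING"),
    ]

    table = bigquery.Table("my-gcp-project.finance.transactions", schema=schema)

    # Partition by day on the transaction timestamp and cluster on
    # columns that are commonly filtered, to cut scanned bytes per query.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="txn_ts",
    )
    table.clustering_fields = ["region", "txn_id"]

    client.create_table(table)  # creates the partitioned, clustered table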
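Likewise, a hedged sketch of one way to cap per-query spend by limiting billed bytes; the 10 GiB limit is an arbitrary example, and usage alerts themselves would be configured separately in Cloud Billing or Cloud Monitoring.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # hypothetical project

    # Reject any query that would bill more than ~10 GiB instead of letting it run.
    job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)

    query_job = client.query(
        "SELECT region, SUM(amount) AS total "
        "FROM `my-gcp-project.finance.transactions` GROUP BY region",
        job_config=job_config,
    )
    for row in query_job.result():
        print(row.region, row.total)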
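Finally, an example Airflow DAG skeleton showing the kind of scheduled, retried pipeline management implied above; the Dataproc cluster, bucket path, schedule, and job script are placeholders, not part of the actual environment.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-ops",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    # Hypothetical daily PySpark load submitted to a Dataproc cluster.
    with DAG(
        dag_id="daily_transactions_load",
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 2 * * *",  # 02:00 UTC every day
        catchup=False,
        default_args=default_args,
    ) as dag:
        run_pyspark_job = BashOperator(
            task_id="run_pyspark_job",
            bash_command=(
                "gcloud dataproc jobs submit pyspark "
                "gs://example-bucket/jobs/load_transactions.py "
                "--cluster=etl-cluster --region=us-central1"
            ),
        )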
Must Have Skills, Experience:
- 4-8 years of relevant experience.
- Strong experience with Big Data technologies including PySpark, BigQuery, and Google Airflow.
- Hands-on expertise in cloud platforms (Google Cloud, AWS, or Azure) and Linux system troubleshooting.
- Proficiency in automation and DevOps tools such as Shell/Python scripting, CI/CD processes, and Ansible.