Big Data Engineer

Disys - Oak Brook, Tampa

· Design interfaces to data warehouses/data stores and machine learning/Big Data applications using open-source tools such as Scala, Java, Python, Perl, and shell scripting.
· Design and create data pipelines that maintain a stable flow of data to machine learning models, in both batch and near-real-time modes (illustrative sketches of each appear at the end of this listing).
· Interface with Engineering, Operations, System Administration, and Data Science teams to ensure data pipelines and processes fit within the production framework.
· Ensure tools and environments adhere to strict security protocols.
· Deploy machine learning models and serve their outputs as RESTful API calls.
· Collaborate closely with subject matter experts (SMEs) and Data Scientists to understand business needs and perform efficient feature engineering for machine learning models.
· Maintain code and libraries in the code repository.
· Work with system administration teams to proactively resolve issues and install tools and libraries on the AWS platform.
· Research and develop the architecture and solutions most appropriate for the problems at hand.
· Maintain and improve tools that assist Analytics with ETL, retrospective testing, efficiency, repeatability, and R&D.
· Lead by example on software best practices, including code style, architecture, documentation, source control, and testing.
· Support the Chief Data Scientist, Data Scientists, and Big Data Engineers in creating innovative approaches to challenging problems using Machine Learning, Big Data, and Cloud technologies.
· Handle ad hoc requests to create reports for end users.

Required Skills
· Strong Apache Spark (Spark SQL) and Scala skills, with at least 2 years of experience.
· Understanding of AWS Big Data components and tools.
· Strong Java skills, with experience in web services and web development.
· Hands-on experience with model deployment.
· Experience deploying applications on Docker, Kubernetes, or similar technology.
· Linux scripting is a plus.
· Fundamental understanding of AWS cloud components.
· 2+ years of experience ingesting, cleansing/processing, storing, and querying large datasets.
· 2+ years of experience engineering large-scale data solutions with Java/Tomcat/SQL/Linux.
· Experience with data extraction, transformation, and loading (ETL) from various sources.
· Exposure to structured and unstructured data.
· Experience with data cleansing/preparation on the Hadoop/Apache Spark ecosystem: MapReduce, Hive, HBase, Spark SQL.
· Experience with distributed streaming tools such as Apache Kafka.
· Experience with multiple file formats (Parquet, Avro, ORC).
· Knowledge of Agile development cycles.
· Efficient coding skills to optimize performance and cost on AWS.
· Experience building stable, scalable, high-throughput data streams and web serving platforms.
· Enthusiastic self-starter with strong teamwork skills.
· Graduate (MS) or undergraduate degree in Computer Science, Engineering, or a relevant field.

Nice to have:
· Strong software development experience.
· Ability to write custom MapReduce programs for complex data processing.
· Familiarity with streaming data processing systems such as Apache Storm or Spark Streaming.

Additional Information
All your information will be kept confidential according to EEO guidelines.
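To illustrate the kind of batch pipeline and file-format work the role describes, here is a minimal Spark/Scala sketch: it reads Parquet, applies basic cleansing, and writes ORC. The S3 paths and column names (event_id, country) are hypothetical placeholders, not anything from this posting; treat it as a sketch of the technique, not the employer's actual pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal batch ETL sketch: read Parquet, cleanse, write ORC.
// All paths and column names below are hypothetical placeholders.
object BatchCleanseJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-cleanse-sketch")
      .getOrCreate()

    // Read the raw dataset (assumed to live at this placeholder path).
    val raw = spark.read.parquet("s3a://example-bucket/raw/events/")

    // Basic cleansing: drop rows missing the assumed key, normalize a
    // string column, and deduplicate on that key.
    val cleansed = raw
      .filter(col("event_id").isNotNull)
      .withColumn("country", upper(trim(col("country"))))
      .dropDuplicates("event_id")

    // Write the cleansed output in ORC format.
    cleansed.write
      .mode("overwrite")
      .orc("s3a://example-bucket/cleansed/events/")

    spark.stop()
  }
}
```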
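For the near-real-time side, a comparable sketch using Spark Structured Streaming with Kafka might look like the following. It assumes the spark-sql-kafka connector is on the classpath, and the broker address, topic name, and output paths are again placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal near-real-time sketch: consume a Kafka topic with Spark
// Structured Streaming and land records as Parquet for downstream use.
// Broker, topic, and paths are hypothetical placeholders.
object StreamIngestJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stream-ingest-sketch")
      .getOrCreate()

    // Subscribe to the placeholder topic and decode the message payload.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .select(col("value").cast("string").as("payload"), col("timestamp"))

    // Continuously append records to Parquet, checkpointing progress so
    // the query can recover after a restart.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/streaming/events/")
      .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
      .start()

    query.awaitTermination()
  }
}
```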