Junior Big Data Engineer

Job Title: Junior Big Data Engineer
Location: Gauteng
Industry:
Reference: JN -052019-16284
Contact Name: Nicole Vine
Contact Email: nicole.vine@peoplesolved.com
Job Published: May 22, 2019 14:51

Job Description

Responsibilities

  • Writing ETL jobs for data migration
  • Translating old SQL and ETL scripts
  • Writing scripts and data pipelines for data cleaning, transformation and data enrichment
  • Performing data validation and integrity checks
  • Performing data storage optimisation and tuning (HDFS, Hive, HBase)
  • Creating and managing schemas in different data storage systems
  • Writing and tuning SQL-based and MapReduce jobs
  • Partitioning schemas and setting up buckets, indexes and appropriate replication mechanisms
  • Moving data between different systems
  • Working with workflow management tools such as Oozie and Airflow
  • Documenting data operations
  • Writing scripts in Python and Ruby
  • Writing operational guidelines, documentation and tutorials

 

Required skills

  • Computer Science related degree
  • Extensive knowledge of databases, data warehousing architecture and concepts
  • Extensive knowledge of and experience working with various database technologies: relational, NoSQL and key-value stores
  • Experience writing ETL jobs
  • Familiar with different data structures and storage systems (columnar, NoSQL, etc.)
  • Familiar with the scripting languages Python and Ruby
  • Comfortable writing advanced SQL queries
  • Knowledge of, and preferably practical experience with, big data technologies such as Hadoop, MapReduce, Hive and Sqoop
  • Comfortable processing different file formats (text, CSV, binary, etc.) and working with data serialisation
  • Familiar with workflow management tools such as Oozie and Airflow
  • Familiar with Git and related DevOps tasks
  • Software engineering skills
  • Ability to write clean, reliable and scalable code
  • Comfortable working with Linux and the command line
  • Familiar with data governance, metadata and data lifecycle management concepts
