· A software engineering background, with specialized skills in building scalable, production-hardened data solutions
· Be able to build heavily multithreaded or client/server systems
· Have an interest in data analytics and recognize its importance and value, to the point that you have taken it upon yourself to create analytics or to capture data that the average software engineer wouldn’t
· Previous specialization in distributed systems or big data projects is preferred
· Understand when to stick to best practices and when it makes sense to diverge from them in pursuit of speed or higher scalability
· Distributed systems
An understanding of resource allocation, network bandwidth requirements, creating virtual machines and containers, replication, partitioning datasets and message queues, and handling failures and fault tolerance.
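Partitioning and replication can be illustrated with a minimal Python sketch. It assumes records are dicts keyed by a hypothetical `user_id` field. It uses hash partitioning, which is one common scheme among several (range partitioning is another), and places replicas with a naive round-robin rule. The function names are illustrative and do not come from any particular system.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition number with a stable hash.

    zlib.crc32 is used instead of the built-in hash(), because hash() on str
    is randomized per process and would send the same key to different
    partitions on different runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, key_field: str, num_partitions: int) -> dict:
    """Group records into partitions by hashing the value of key_field."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_for(str(record[key_field]), num_partitions)].append(record)
    return dict(partitions)


def replica_nodes(partition: int, nodes: list, replication_factor: int) -> list:
    """Place one partition on replication_factor distinct nodes.

    Naive round-robin placement: the primary copy goes to
    nodes[partition % len(nodes)] and the extra copies go to the following nodes.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return [nodes[(partition + i) % len(nodes)] for i in range(replication_factor)]


# Usage: spread 10 hypothetical user records over 4 partitions, 2 replicas each.
records = [{"user_id": f"u{i}", "clicks": i} for i in range(10)]
nodes = ["node-a", "node-b", "node-c"]
for p, rows in sorted(partition_records(records, "user_id", 4).items()):
    print(p, [r["user_id"] for r in rows], replica_nodes(p, nodes, 2))
```

A stable hash matters because every producer and consumer must agree on where a key lives. Plain modulo hashing moves most keys when the number of partitions changes, which is why many systems use consistent hashing instead.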
· Programming
Not just knowledge of syntax, but also responsibility for continuous integration, unit tests, and engineering processes.
· Analysis
Analyses range from simple counts and sums to more complex products that extract new dimensions from the data. Sometimes the result is simply a report handed to another business unit.
· Visual communication
Show visually what’s happening in the data, so that others can readily use the results.
· Verbal communication
Operate like internal solutions consultants, helping other teams succeed in using the data products and telling those teams what data is available.
Not only holding the system together, but also keeping it from breaking and making it possible to upgrade, even for the smallest improvements.
· Schema
Help teams lay out data: create the data definitions and design how the data is represented when it is stored, retrieved, transmitted, or received.
· Domain knowledge
Deeply understand the domain for which you’re creating data products, such as retail, health care, banking and finance, transportation, communications and media, or education.
· Other important skills
o Data Governance
How secure is the data? Should the data be masked? Does it have personally identifiable information (PII) that needs to be hidden?
o Data Lineage
Where did the data originally come from? When there is a problem with the data, how do we trace it back to its source?
o Data Metadata
What happens when you have thousands of data products? How do you keep track of their metadata and schemas?
o Discovery Systems
How do you catalog your datasets and help potential users find them?
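The Data Governance questions often end with masking PII before data is shared. The following minimal Python sketch assumes records are dicts and that the PII columns are known in advance. Here they sit in a hypothetical `PII_FIELDS` set containing `email` and `phone`. Each PII value is replaced with a keyed HMAC-SHA256 token. This is pseudonymization, which is only one option alongside redaction, partial masking, or vault-based tokenization. The key handling shown is for illustration only.

```python
import hashlib
import hmac

# Hypothetical PII columns; in practice these would come from a data catalog
# or governance policy rather than being hard-coded.
PII_FIELDS = {"email", "phone"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a PII value with a keyed hash (HMAC-SHA256, hex-encoded).

    The same input and key always give the same token, so masked data can
    still be joined on the field. The token is one-way: the original value
    cannot be recovered from it, although a key holder can check a guess.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict, secret_key: bytes, pii_fields=PII_FIELDS) -> dict:
    """Return a new dict with non-null PII fields pseudonymized; others are copied as-is."""
    return {
        field: pseudonymize(str(value), secret_key)
        if field in pii_fields and value is not None
        else value
        for field, value in record.items()
    }


# Usage with a hypothetical key; a real key would come from a secrets manager.
key = b"replace-with-a-managed-secret"
row = {"user_id": 7, "email": "ana@example.com", "phone": None, "country": "VN"}
print(mask_record(row, key))
```

Because the token is deterministic for a given key, joins and counts on `email` still work after masking. The trade-off is that anyone holding the key can test guessed values, so the key itself needs access control.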
· Create data products and the architecture for building them
· Choose the right technologies for the data and the use cases
· Build the systems, on-premises or in the cloud, that data pipelines run on, and make them scale widely
· Take raw data and transform it so that it becomes usable by the entire organization
· Maintain the data in ready-to-use formats, keeping in mind that both the shape of the data and the enterprise’s demand for it tend to change
· Write and test the code for data pipelines using the APIs of the various tools the pipelines call on
· Master the necessary programming and understand the distributed systems involved
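Because the shape of the data tends to change, readers of a data product usually need to tolerate schema drift. The following minimal Python sketch assumes a hypothetical `orders` data product. Its current schema added a `currency` field, which defaults to `"USD"`, after an earlier version. Unknown fields from newer producers are ignored, and missing optional fields take their defaults. Serialization formats such as Avro or Protocol Buffers have schema-evolution rules along these lines. This sketch only imitates the idea with a dataclass.

```python
from dataclasses import dataclass, fields


@dataclass
class Order:
    """Hypothetical current schema of an 'orders' data product.

    'currency' was added after the first version, so it has a default and
    records written under the older schema can still be read.
    """
    order_id: str
    amount: float
    currency: str = "USD"


def read_order(raw: dict) -> Order:
    """Build an Order from a raw record while tolerating schema drift.

    Fields the schema does not know (e.g. from a newer producer) are dropped,
    missing fields that have defaults take those defaults, and a missing
    required field still raises TypeError from the dataclass constructor.
    """
    known = {f.name for f in fields(Order)}
    return Order(**{k: v for k, v in raw.items() if k in known})


old_style = {"order_id": "o-1", "amount": 19.5}  # written before 'currency' existed
new_style = {"order_id": "o-2", "amount": 7.0, "currency": "EUR", "coupon": "SPRING"}  # extra field
print(read_order(old_style))  # Order(order_id='o-1', amount=19.5, currency='USD')
print(read_order(new_style))  # Order(order_id='o-2', amount=7.0, currency='EUR')
```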