Data Engineering Manager
London / £100,000 - £110,000
An opportunity for a Data Engineering Manager to join an established travel business and work closely with the Head of Data to lead a team of high-performing data engineers through a cloud migration on AWS.
- Directly managing five lead technical professionals and overseeing the data engineering squads.
- Leading on projects and ensuring that the data engineering team are meeting tight deadlines.
- Working with senior stakeholders to gather and deliver on technical requirements.
- Driving best practices, leading the digital transformation to AWS, and implementing Databricks.
YOUR SKILLS & EXPERIENCE:
- Previous experience in Data Engineering and/or Machine Learning.
- Experience leading and managing teams.
- Strong knowledge of Python and SQL.
- Experience working in a cloud environment, ideally AWS.
- Knowledge of Databricks is preferred.
This role offers a competitive salary of between £100,000 and £110,000, plus extensive benefits.
HOW TO APPLY:
Please register your interest by sending your CV to Holly Neeves via the Apply link on this page.
Data Engineer Or Software Engineer: What Does Your Business Need? | Harnham US Recruitment post
We are in a time in which what we do with Data matters. Over the last few years, we have seen a rapid rise in the number of Data Scientists and Machine Learning Engineers as businesses look to find deeper insights and improve their strategies. But without proper access to the right Data, processed and massaged, Data Scientists and Machine Learning Engineers would be unable to do their jobs properly. So who are the people who work in the background and are responsible for making sure all of this works? The quick answer is Data Engineers!… or is it? In reality, there are two similar, yet different, profiles who can help a company achieve its Data-driven goals.
Data Engineers
When people think of Data Engineers, they think of people who make Data more accessible to others within an organization. Their responsibility is to make sure the end user of the Data, whether an Analyst, a Data Scientist, or an executive, can get accurate Data from which the business can make insightful decisions. They are experts when it comes to data modeling, often working with SQL. Frequently, “modern” Data Engineers work with a number of tools including Spark, Kafka, and AWS (or any cloud provider), whilst some newer Databases/Data Warehouses include MongoDB and Snowflake. Companies are choosing to leverage these technologies and update their stack because doing so allows Data teams to move at a much faster pace and deliver results to their stakeholders.
An enterprise looking for a Data Engineer will need someone to focus more on its Data Warehouse and apply strong knowledge of querying information, whilst constantly working to ingest and process Data. Data Engineers also focus more on Data Flow and on knowing how the Data sets work in collaboration with one another.
Software Engineers – Data
Similar to Data Engineers, Software Engineers – Data (who I will refer to as Software Data Engineers in this article) also build out Data Pipelines.
These individuals might go by different names, such as Platform or Infrastructure Engineer. They have to be good with SQL and Data Modeling, working with similar technologies such as Spark, AWS, and Hadoop. What separates Software Data Engineers from Data Engineers is the necessity to look at things from a macro level. They are responsible for building out the cluster manager and scheduler and the distributed cluster system, and for implementing code to make things function faster and more efficiently.
Software Data Engineers are also better programmers. Frequently, they will work in Python, Java, Scala, and, more recently, Golang. They also work with DevOps tools such as Docker and Kubernetes, or a CI/CD tool like Jenkins. These skills are critical, as Software Data Engineers are constantly testing and deploying new services to make systems more efficient.
This is important to understand, especially when incorporating Data Science and Machine Learning teams. If Data Scientists or Machine Learning Engineers do not have strong Software Engineers in place to build their platforms, the models they build won’t reach their full potential. Software Data Engineers also have to be able to scale out systems as their platform grows in order to handle more Data, while finding ways to make improvements. They will also look to work with Data Scientists and Machine Learning Engineers in order to understand the prerequisites for supporting a Machine Learning model.
Which is right for your business?
If you are looking for someone who can focus extensively on pulling Data from a Data source or API, transforming or “massaging” it, and then moving it elsewhere, then you are looking for a Data Engineer. Quality Data Engineers will be really good at querying Data and Data Modeling and will also be good at working with Data Warehouses and using visualization tools like Tableau or Looker.
If you need someone who can wear multiple hats and build highly scalable and distributed systems, you are looking for a Software Data Engineer. It’s more common to see this role in smaller companies and teams, since Hiring Managers often need someone who can do multiple tasks due to budget constraints and the need for a leaner team. Software Data Engineers will also be better coders and have some experience working with DevOps tools. Although they might be able to do more than a Data Engineer, they may not be as strong when it comes to the nitty-gritty parts of Data Engineering, in particular querying Data and working within a Data Warehouse.
It is always a challenge knowing which type of job to recruit for. It is not uncommon to see job posts where companies advertise that they are looking for a Data Engineer, but in reality are looking for a Software Data Engineer or Machine Learning Platform Engineer. In order to bring the right candidates to your door, it is crucial to understand which responsibilities you need fulfilled.
That’s not to say a Data Engineer can’t work with Docker or Kubernetes. Engineers are working in a time where they need to become proficient with multiple tools and constantly hone their skills to keep up with the competition. However, it is this demand to keep up with the latest tech trends and choices that makes finding the right candidate difficult. Hiring Managers need to identify from the start which skills are essential for the role and which can be easily picked up on the job. Hiring teams should focus on an individual’s past experience and the projects they have worked on, rather than looking at their previous job titles.
If you’re looking to hire a Data Engineer or a Software Data Engineer, or to find a new role in this area, we may be able to help. Take a look at our latest opportunities or get in touch if you have any questions.
The Six Steps Of Data Governance | Harnham Recruitment post
The value that data analysis can provide to organisations is becoming increasingly clear. But with all the buzz around the endless ways that data can be used to revolutionise your business processes, it can be overwhelming to know where to start. Fundamentally, what you can do with your data and how useful it may be will hinge on its quality. This is the case no matter what data you may have, whether that be customer demographics or manufacturing inventories.
High-quality data is also imperative for utilising exciting and innovative new technology such as Machine Learning and AI. It’s all very well investing in tech to harness your data assets to, for example, better inform decision making, but you won’t be able to glean any useful analysis if the data is full of gaps and inconsistencies. Many will be looking at this new tech and be tempted to run before they can walk. But building quality data sets and water-tight, long-lasting processes will form the foundation for any future developments and should not be overlooked. This is where Data Governance comes into its own.
Data Governance (DG) is an effective step in improving your data and turning it into an invaluable asset. It has numerous definitions, but according to the Data Governance Institute (DGI), “Data Governance is the exercise of decision-making and authority for data-related matters.”
Essentially, DG is the process of managing data during its life cycle. It ensures the availability, usability, integrity and security of your data, based on internal data standards and policies that control data usage. Good data governance is critical to success and is becoming increasingly so as organisations face new data privacy regulations and rely on data analytics to help optimise operations and drive business decision-making. As Ted Friedman from Gartner said: ‘Data is useful. High-quality, well-understood, auditable data is priceless.’
Without DG, data inconsistencies in different systems across an organisation might not get resolved. This could complicate data integration efforts and create data integrity issues that affect the accuracy of business intelligence (BI) reporting and analytics applications.
Data Governance programs can differ significantly, depending on their focus, but they tend to follow a similar framework:
Step 1: Define goals and understand the benefits
The first step of developing a strategy should be to ensure that you have a comprehensive understanding of the process and what you would like the outcome to be.
A strong Data Governance strategy relies on ‘buy in’ from everyone in the business. By stressing the importance of complying with the guidelines which you will later set, you will be helping to encourage broad participation and ensure that there is a concerted and collaborative effort to maintain high standards of data quality. Leaders must be able to comprehend the benefits themselves before communicating them to their team, so it may be worth investing in training around the topic.
Step 2: Analyse and assess the current data
The next step is essentially sizing up the job at hand, to see where improvements might need to be made. Data should be assessed against multiple dimensions, such as the accuracy of key attributes, the completeness of all required attributes and the timeliness of data. It may also be valuable to spend time analysing the root causes of inferior data quality.
Sources of poor data quality can be broadly categorised into data entry, data processing, data integration, data conversion, and stale data (over time), but there may be other elements at play to be aware of.
Step 3: Set out a roadmap
Your data governance strategy will need a structure in which to function, which will also be key to measuring the progress and success of the program.
Set clear, measurable, and specific goals; as the saying goes, you cannot control what you cannot measure. Plans should include timeframes, resources and any costs involved, as well as identifying the owners or custodians of data assets, the governance team, steering committee, and data stewards who will all be responsible for different elements. Including business leaders or owners in this step will ensure that programs remain business-centric.
Step 4: Develop and plan the data governance program
Building around the timeline outlined, you can then drill down to the nitty-gritty. DG programs vary but usually include:
- Data mapping and classification: sorting data into systems and classifying them based on criteria.
- Business glossary: establishing a common set of definitions of business terms and concepts, helping to build a common vocabulary to ensure consistency.
- Data catalogue: collecting metadata and using it to create an indexed inventory of available data assets.
- Standardisation: developing policies, data standards and rules for data use to regulate procedures.
Step 5: Implement the data governance program
Communicating the plan to your team may not be a one-step process and may require a long-term training schedule and regular check-ins. The important thing to realise is that DG is not a quick fix; it will take time to be implemented and fully embraced. It may also need tweaks as it goes along and as business objectives change. All DG strategies should start small and slowly build up over time. Rome wasn’t built in a day, after all.
Step 6: Close the loop
Arguably the most important part of the process is being able to track your progress and checking in at periodic intervals to ensure that the data is consistent with the business goals and meets the data rules specified.
Communicating the status to all stakeholders regularly will also help to ensure that a data quality discipline is maintained throughout.
Looking for your next big role in Data & Analytics or need to source exceptional talent? Take a look at our latest Data Governance jobs or get in touch with one of our expert consultants to find out more.
Data Science Interview Questions: What The Experts Say | Harnham Recruitment post
Our friends at Data Science Dojo have compiled a list of 101 actual Data Science interview questions that were asked between 2016 and 2019 at some of the largest recruiters in the Data Science industry: Amazon, Microsoft, Facebook, Google, Netflix, Expedia, etc. Data Science is an interdisciplinary field that sits at the intersection of computer science, statistics/mathematics, and domain knowledge. To perform well, one needs a good foundation in not one but multiple fields, and this is reflected in the interview. They’ve divided the questions into six categories:
- Machine Learning
- Data Analysis
- Statistics, Probability, and Mathematics
- Programming
- SQL
- Experiential/Behavioural Questions
Once you’ve gone through all the questions, you should have a good understanding of how well you’re prepared for your next Data Science interview.
Machine Learning
As one would expect, Data Science interviews focus heavily on questions that help the company test your understanding of machine learning concepts, their applications, and your experience with them. Each question included in this category has been recently asked in one or more actual Data Science interviews at companies such as Amazon, Google, Microsoft, etc. These questions will give you a good sense of which sub-topics appear more often than others. You should also pay close attention to the way these questions are phrased in an interview.
- Explain Logistic Regression and its assumptions.
- Explain Linear Regression and its assumptions.
- How do you split your data between training and validation?
- Describe Binary Classification.
- Explain the working of decision trees.
- What are different metrics to classify a dataset?
- What’s the role of a cost function?
- What’s the difference between a convex and a non-convex cost function?
- Why is it important to know the bias-variance trade-off while modeling?
- Why is regularisation used in machine learning models? What are the differences between L1 and L2 regularisation?
- What’s the problem of exploding gradients in machine learning?
- Is it necessary to use activation functions in neural networks?
- In what aspects is a box plot different from a histogram?
- What is cross validation? Why is it used?
- Can you explain the concept of false positive and false negative?
- Explain how SVM works.
- While working at Facebook, you’re asked to implement some new features. What type of experiment would you run to implement these features?
- What techniques can be used to evaluate a Machine Learning model?
- Why is overfitting a problem in machine learning models? What steps can you take to avoid it?
- Describe a way to detect anomalies in a given dataset.
- What are the Naive Bayes fundamentals?
- What is the AUC – ROC Curve?
- What is K-means?
- How does the Gradient Boosting algorithm work?
- Explain advantages and drawbacks of Support Vector Machines (SVM).
- What is the difference between bagging and boosting?
- Before building any model, why do we need the feature selection/engineering step?
- How to deal with unbalanced binary classification?
- What is the ROC curve and the meaning of sensitivity, specificity, confusion matrix?
- Why is dimensionality reduction important?
- What are hyperparameters, how to tune them, how to test and know if they worked for the particular problem?
- How will you decide whether a customer will buy a product today or not given the income of the customer, location where the customer lives, profession, and gender? Define a machine learning algorithm for this.
- How will you inspect missing data and when are they important for your analysis?
- How will you design the heatmap for Uber drivers to provide recommendations on where to wait for passengers? How would you approach this?
- What are time series forecasting techniques?
- How does a logistic regression model know what the coefficients are?
- Explain Principal Component Analysis (PCA) and its assumptions.
- Formulate Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) techniques.
- What are neural networks used for?
- Why is gradient checking important?
- Is random weight assignment better than assigning the same weights to the units in the hidden layer?
- How to find the F1 score after a model is trained?
- How many topic modeling techniques do you know of? Explain them briefly.
- How does a neural network with one layer and one input and output compare to a logistic regression?
- Why is the Rectified Linear Unit (ReLU) a good activation function?
- When using the Gaussian mixture model, how do you know it’s applicable?
- If a Product Manager says that they want to double the number of ads in Facebook’s Newsfeed, how would you figure out if this is a good idea or not?
- What do you know about LSTM?
- Explain the difference between generative and discriminative algorithms.
- Can you explain what MapReduce is and how it works?
- If the model isn’t perfect, how would you like to select the threshold so that the model outputs 1 or 0 for label?
- Are boosting algorithms better than decision trees? If yes, why?
- What do you think are the important factors in the algorithm Uber uses to assign rides to drivers?
- How does speech synthesis work?
Data Analysis
Machine Learning concepts are not the only area in which you’ll be tested in the interview. Data pre-processing and data exploration are other areas where you can always expect a few questions. We’re grouping all such questions under this category. Data Analysis is the process of evaluating data using analytical and statistical tools to discover useful insights. Once again, all these questions have been recently asked in one or more actual Data Science interviews at the companies listed above.
- What are the core steps of the data analysis process?
- How do you detect if a new observation is an outlier?
- Facebook wants to analyse why the “likes per user and minutes spent on a platform are increasing, but total number of users are decreasing”. How can they do that?
- If you have a chance to add something to Facebook, how would you measure its success?
- If you are working at Facebook and want to detect bogus/fake accounts, how will you go about that?
- What are anomaly detection methods?
- How do you solve for multicollinearity?
- How to optimise marketing spend between various marketing channels?
- What metrics would you use to track whether Uber’s strategy of using paid advertising to acquire customers works?
- What are the core steps for data preprocessing before applying machine learning algorithms?
- How do you inspect missing data?
- How does caching work and how do you use it in Data Science?
Statistics, Probability and Mathematics
As we’ve already mentioned, Data Science builds its foundation on statistics and probability concepts. A strong grounding in these concepts is a requirement for Data Science, and these topics are always brought up in data science interviews. Here is a list of statistics and probability questions that have been asked in actual Data Science interviews.
- How would you select a representative sample of search queries from 5 million queries?
- Discuss how to randomly select a sample from a product user population.
- What is the importance of Markov Chains in Data Science?
- How do you prove that males are on average taller than females by knowing just gender or height?
- What is the difference between Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP)?
- What does P-Value mean?
- Define the Central Limit Theorem (CLT) and its application.
- There are six marbles in a bag, one is white. You reach in the bag 100 times. After drawing a marble, it is placed back in the bag. What is the probability of drawing the white marble at least once?
- Explain Euclidean distance.
- Define variance.
- How will you cut a circular cake into eight equal pieces?
- What is the law of large numbers?
- How do you weigh nine marbles three times on a balance scale to select the heaviest one?
- You call three random friends who live in Seattle and ask each independently if it’s raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of lying. All three say “yes”. What’s the probability it’s actually raining?
- Explain a probability distribution that is not normal and how to apply it.
- You have two dice. What is the probability of getting at least one four? Also find out the probability of getting at least one four if you have n dice.
- Draw the curve log(x+10)
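As an illustration of the reasoning behind the marble and dice questions above, here is a minimal Python sketch. It assumes independent trials (each marble is replaced before the next draw, and the dice are fair), so "at least once" is the complement of "never". The function name `at_least_once` is made up for this example and does not come from the original article.

```python
from fractions import Fraction


def at_least_once(p_single, trials):
    """Probability that an event with per-trial probability `p_single`
    happens at least once in `trials` independent trials."""
    # Complement rule: 1 minus the probability that it never happens.
    return 1 - (1 - p_single) ** trials


# One white marble among six, 100 draws with replacement.
marble = at_least_once(Fraction(1, 6), 100)

# At least one four when rolling two fair dice.
two_dice = at_least_once(Fraction(1, 6), 2)

print(float(marble))
print(two_dice)  # 11/36
```

For n dice, the same function gives 1 - (5/6)^n; using `Fraction` keeps the small cases exact rather than rounded.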
Programming
When you appear for a data science interview, your interviewers are not expecting you to come up with highly efficient code that uses minimal hardware resources and executes quickly. However, they do expect you to be able to use R, Python, or SQL so that you can access the data sources and at least build prototypes for solutions.
You should expect a few programming/coding questions in your data science interviews. Your interviewer might want you to write a short piece of code on a whiteboard to assess how comfortable you are with coding, as well as to get a feel for how many lines of code you typically write in a given week. Here are some programming and coding questions that companies like Amazon, Google, and Microsoft have asked in their Data Science interviews.
- Write a function to check whether a particular word is a palindrome or not.
- Write a program to generate the Fibonacci sequence.
- Explain string parsing in the R language.
- Write a sorting algorithm for a numerical dataset in Python.
- Coding test: moving average. Input: 10, 20, 30, 10, … Output: 10, 15, 20, 17.5, …
- Write Python code to return the count of words in a string.
- How do you find a percentile? Write the code for it.
- What is the difference between (i) Stack and Queue and (ii) Linked list and Array?
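To give a feel for the level of code these questions call for, here is a short Python sketch of three of them. It is one possible answer under stated assumptions, not the answer from the original article. The palindrome check is assumed to ignore letter case, the Fibonacci sequence is assumed to start at 0, and the "moving average" is read as a running (cumulative) mean, which is what the sample input and output imply. All function names are illustrative.

```python
from itertools import accumulate


def is_palindrome(word):
    """Return True if `word` reads the same forwards and backwards.

    Assumption: the comparison ignores letter case.
    """
    normalised = word.lower()
    return normalised == normalised[::-1]


def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 0, 1."""
    sequence = []
    current, following = 0, 1
    for _ in range(count):
        sequence.append(current)
        current, following = following, current + following
    return sequence


def running_average(values):
    """Return the mean of all values seen so far, after each new value."""
    totals = accumulate(values)  # running sums: 10, 30, 60, 70, ...
    return [total / seen for seen, total in enumerate(totals, start=1)]


print(is_palindrome("Level"))             # True
print(fibonacci(8))                       # [0, 1, 1, 2, 3, 5, 8, 13]
print(running_average([10, 20, 30, 10]))  # [10.0, 15.0, 20.0, 17.5]
```

In an interview it is worth stating assumptions like these out loud, since a fixed-window moving average would produce different output from the running mean shown here.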
Structured Query Language (SQL)
Real-world data is stored in databases and it ‘travels’ via queries. If there’s one language a Data Science professional must know, it’s SQL, or “Structured Query Language”. SQL is widely used across all job roles in Data Science and is often a ‘deal-breaker’. SQL questions are placed early on in the hiring process and used for screening. Here are some SQL questions that top companies have asked in their Data Science interviews.
- How would you handle NULLs when querying a data set?
- How will you explain the JOIN function in SQL in the simplest possible way?
- Select all customers who purchased at least two items on two separate days from Amazon.
- What is the difference between DDL, DML, and DCL?
- Why is Database Normalisation important?
- What is the difference between a clustered and a non-clustered index?
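The NULL-handling question above can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `orders` table, its columns, and its rows are invented for this illustration, and other database engines may offer additional NULL functions beyond the standard `IS NULL` and `COALESCE` used here.

```python
import sqlite3

# Hypothetical in-memory table with one missing discount value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, discount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5.0), ("bob", None), ("cat", 0.0)],
)

# `discount = NULL` never matches a row; IS NULL is the correct test.
missing = conn.execute(
    "SELECT customer FROM orders WHERE discount IS NULL"
).fetchall()

# COALESCE substitutes a default wherever the value is NULL.
filled = conn.execute(
    "SELECT customer, COALESCE(discount, 0.0) FROM orders ORDER BY customer"
).fetchall()

print(missing)  # [('bob',)]
print(filled)   # [('ann', 5.0), ('bob', 0.0), ('cat', 0.0)]
conn.close()
```

Whether a missing value should be filtered out or replaced with a default depends on what NULL means in the data, which is usually the point the interviewer wants discussed.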
Situational/Behavioural Questions
Capabilities don’t necessarily guarantee performance. It’s for this reason employers ask you situational or behavioural questions in order to assess how you would perform in a given situation. In some cases, a situational or behavioural question would force you to reflect on how you behaved and performed in a past situation. A situational question can help interviewers in assessing your role in a project you might have included in your resume, and can reveal whether or not you’re a team player, or how you deal with pressure and failure. Situational questions are no less important than any of the technical questions, and it will always help to do some homework beforehand. Recall your experience and be prepared! Here are some situational/behavioural questions that large tech companies typically ask:
- What was the most challenging project you have worked on so far? Can you explain your learning outcomes?
- According to your judgement, does Data Science differ from Machine Learning?
- If you’re faced with Selection Bias, how will you avoid it?
- How would you describe Data Science to a Business Executive?
If you’re looking for a new Data Science role, you can find our latest opportunities here. This article was written by Tooba Mukhtar and Rahim Rasool for Data Science Dojo. It has been republished with permission. You can view the original article, which includes answers to the above questions, here.