With over 10 years' experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.
Visit our Blogs & News portal or check out our recent posts below.
Businesses are recognizing the increasing importance of data experts to help the company grow. As a result, the hiring demand for Data Scientists and Data Management Analysts has grown by 46% since 2019. This projection will only continue to rise in the next few years. So if you’re planning to become a data analyst or a data scientist, then here’s what you need to know. Data Analytics and Data Science: What's the Difference? Data Analysts and Data Scientists are both proficient in statistics and experienced in using database management systems. However, the key differences between these two professions revolve around their purpose for using the data. The Role of a Data Analyst These professionals organize and examine structured data to create solutions that will drive a business’ growth. They are tasked with studying sets of data using various tools, such as Excel and SQL, to uncover insights and trends that will serve as an answer to certain queries. For example, they can provide data-driven answers that can explain your marketing campaigns’ conversion rates or improve the logistics of your products. Then, they present these findings to concerned individuals and departments so they can formulate strategies that would boost revenue, efficiency, and other improvements. The Role of a Data Scientist Data Scientists are required to use their mathematical and programming skills to build statistical models that can provide solutions for a company’s potential problems. These professionals handle huge sets of both structured and unstructured data and prepare these for processing and analysis. They have to be very proficient in programming to utilize Predictive Analytics, statistics, and Machine Learning in unearthing meaningful insights from all the collected data. Their multidisciplinary approach towards data helps them draw conclusions that are valuable for specific business needs and goals. 
Career Paths for Aspiring Data Analysts Businesses, governments, and other institutions are on the search for individuals who are qualified in interpreting and communicating data. Data analysts are often offered huge salaries and great work benefits because the demand is so high and yet, the pool of talent is very limited. You can become qualified for a wide array of careers in data analytics through a comprehensive master’s degree program that will teach you how to interpret data and present actionable insights. These careers span from digital marketers to quantitative analysts. Graduates can work in governments and insurance companies as financial analysts who are in charge of assessing financial statements and economic trends to boost profit. On the other hand, you can also work as a marketing analyst whose responsibilities involve monitoring sales venues and evaluating consumer data. Their salaries range from $62,000 (Insight Analysts) to as much as $225,000 (highly paid Customer Analysts). Career Paths for Aspiring Data Scientists Data Scientists are experts in statistical analysis and in programming languages, such as Python and R. Thus, the average starting salary for professionals in this field is around $100,000 per year. Data Scientists would need to earn a bachelor’s degree and a master’s degree in computer science so that they would be adept at using complex software programs that are necessary for the position. If you’re more interested in software development, then you can work as a data engineer. These professionals create infrastructures that can gather and store data that analysts and other scientists may need to use. Data modellers, on the other hand, use techniques and databases to design and document data architecture. You can become a great asset to top companies in the US by pursuing a degree and a career in data analytics or data science. 
In this digital age, you can only expect that the demand for these positions would rise as data becomes increasingly important in driving business growth. Written by Jena Burner for harnham.com
28. May 2021
This week's guest blog is written by Moray Barclay. Around 20 years ago I was showing some draft business plans with cashflow projections to my new boss. His name was Marc Destrée and I concluded by saying I’d like to get the finance department involved. “No”, Marc replied. He paused for several seconds, looked up from his desk, and explained "Do the internal rate of return. Then we discuss. Then we give it to finance." He was right of course, for three reasons which together represent best practice. Firstly, it cemented the separate accountabilities between the different job functions responsible for the business case and financial governance. Secondly, there were no technical barriers to separating the “cashflow creation process” and the “P&L creation process” as everyone in the organisation used the same product: Excel. Thirdly, it assigned the right skills to activities. Today, organisations have no equivalent best practice upon which to build their data analytics capability. The lack of best practice is caused by fragmentation: fragmentation of job functions, fragmentation of products, and fragmentation of skills. This is not necessarily a bad thing: fragmentation drives innovation, and those organisations who get it right will gain huge competitive advantage. But the application of best practice militates against unnecessary fragmentation and hence unnecessary inefficiencies. So how could best practice be applied to an organisation’s data analytics capability? In other words, how do we defragment data job functions, data products and data skills? Defragmenting data job functions A good starting point to understanding best practice for data job functions is the informative and well-written publication “The scientist, the engineer and the warehouse”, authored by the highly respected Donald Farmer of TreeHive Strategy.
He includes references to four job functions: (i) the data scientist, (ii) the data engineer, (iii) the business intelligence analyst and (iv) the departmental end user. (i) The data scientist: The accountability of the data scientist is to build data science models using their skills in maths and coding to solve business problems. In addition to using open source technologies, such as python and R, data scientists can and do use data science platforms such as Knime which enable them to spend more time on maths and less time on coding - more on data science platforms later. (ii) The data engineer: The accountability of the data engineer is to build robust and scalable data pipelines which automate the movement and transformation of data across the organisation’s infrastructure, using their skills in database engineering, database integration, and a technical process called extract/transform/load (ETL) and its variants – more on ETL production platforms later. (iii) The business intelligence (BI) analyst: Donald Farmer’s publication does not address the accountabilities of the BI analyst in any detail because that is not its focus. Unlike the clearly defined roles of data scientists and data engineers, there are no best practice descriptions for the role of BI analyst. Typical accountabilities often include designing data visualisations from existing datasets, building these visualisations into reports or online dashboards and automating their production, and configuring end users to ensure they only have access to data that they are approved to see. Beyond these core accountabilities, BI analysts sometimes create entirely new datasets by building complex analytic models to add value to existing datasets, using either a suitable open source technology (such as python, but used in a different way to data scientists) or a data analytic platform such as Alteryx which enables the creation of code-free analytic models. 
One final point - a BI analyst might also build data science models, albeit typically more basic ones than those built by data scientists. BI analysts will inevitably become more like data scientists in the future driven by their natural curiosity and ambitions, vendors creating combined data science platforms and data analytic platforms, and organisations wanting to benefit from the integration of similar functions. (iv) The departmental end-user: A departmental end-user is generally the most data-centric person within a department: it might be a sales operations professional within a sales department for example. I am told that when Excel was first introduced into organisations in the 1980’s, there would be a “go-to Excel expert”; self-evidently over time everyone learned how to use it. I was there when CRM systems like salesforce.com and Netsuite appeared 20 years later, and the same thing happened: initially there would be one or two pioneers, but eventually everyone learned to use it. The same democratisation is happening and will continue to happen with business intelligence. In the same way that CRM and Excel are used by everyone who needs to, soon anyone will be able to build their own data visualisations and reports to help identify and solve their own problems. In some organisations such as BP this is already well-established. And why stop there? If a departmental end-user can model different internal rates of return and create visualisations, then why should they not apply their own data science techniques to their own datasets? 
But this can only happen if the role of the BI analyst has an accountability for democratisation, in addition to those mentioned earlier. In summary, the following is a list of best practice accountabilities for the BI analyst:
(1) Build and automate the initial set of business intelligence reports and visualisations
(2) Create the data governance framework to enable self-service by departmental end-users
(3) Act as the initial go-to business intelligence expert
(4) Evangelise a data-driven culture and mentor those who want to become proficient in self-service
(5) Deploy resources which over time make redundant the role of a go-to business intelligence expert
(6) Over time, increase time devoted to creating innovative datasets by building complex analytic models which add value to existing datasets - using open source technologies and/or a data analytic platform
(7) Work with the data science function in such a way that over time the data science function and the BI function can be merged
The above best practice eventually results in the role of the BI analyst, or the BI analyst team, becoming redundant, much in the way that the role of a dedicated Excel specialist died out in the mid-1980s. As mentioned earlier, because BI analysts will move into data science, this should not result in people losing their jobs. Defragmenting data products Unlike open source technologies, the data product landscape is highly fragmented. Products include data science platforms, data analytic platforms, platforms which are more visualisation-centric, and platforms which are more focused on data governance. There are also ETL production platforms which are in the domain of the data engineer but which include functionality to build some types of analytic models. Fragmented markets eventually consolidate. Even the broadest three cloud vendors, Amazon, Google and Microsoft, do not cover the entire landscape.
For visualisation there is Quicksight, Data Studio, and Power BI respectively as well as competitive products, most obviously Tableau; for ETL production platforms there is Athena, Cloud Dataflow and Azure Data Factory, as well as competitive products such as Talend. But smaller vendors have the lead in data science platforms and data analytic platforms. The hiring by Microsoft of the python inventor Guido van Rossum two months ago points to their ambitions in data science platforms and data analytic platforms. Market consolidation in 2021 seems inevitable, but the details of actual acquisitions are not obvious. After all, it was salesforce.com which bought Tableau in 2019: not Amazon, Google or Microsoft. Best practice for organisations is to consider possible vendor consolidation as part of their procurement process, because product fragmentation means there is a corresponding fragmentation of skills. Defragmenting data skills Fragmentation of data skills means that the market for jobs, particularly contract jobs, is less elastic than it could be. The fragmentation of skills is partly caused by the fragmentation of products and their associated education resources and certification. Vendor’s product pricing typically falls into three categories: (i) more expensive commercial products (c. £500 - £5000 per user per month) which include free online education resources and certification; (ii) inexpensive commercial products (c. £5 to £50 per user per month) which usually require a corporate email address but have free online education resources and reasonably-priced certification exam fees (c £100- £200); and (iii) products which are normally expensive but have an inexpensive licensed version that cannot be used for commercial purposes, again including free online education resources and certification. The latter approach is best practice for solving the fragmentation of skills because the barriers to learning (i.e. 
high product cost or the need for a corporate email address) are removed. Best practice includes the Microstrategy Analyst Pass, which is available to anyone and costs $350 per year including a non-commercial product licence, online education resources and access to certification exams. University students (as well as self-educated hackers) learn open source technologies and one would expect that those skills are sufficient for them to enter the workplace in any data analytics environment. Yet several vendors who provide the more expensive commercial products (c. £500 - £5000 per user per month) and do not have discounted licences for non-commercial purposes make one exception: universities. At face value, this seems benign or even generous. But it contributes to the inelasticity of the job market at graduate level because an unintended consequence is that some graduate data analytics jobs require the graduate to be competent in a product before they have started work. Best practice is for organisations to employ graduates based on their skills in maths, statistics and open source technologies, not product. In seeking corporate acquisitions, vendors might find that their customers value “education bundling” as much as “product bundling”. Customers who are happy to pick, for example, the best visualisation product and the best data storage product from different vendors might be more attracted to their people using a single education portal with the same certification process across all products. And if an organisation can allocate 100% of its education budget to a single vendor then it will surely do so. Best practice is for vendors to consider the value of consolidating and standardising education resources, and not just products, when looking at corporate acquisitions. Defragmenting data analytics Implementing a best practice data analytics capability based on the principles of defragmentation has profound consequences for an organisation.
It enables a much richer set of conversations to the one which took place 20 years ago. A young business development manager is showing some draft business plans to their new boss. They conclude by saying they’d like to get a data scientist involved. “No”, the boss replies. He pauses for several seconds, looks up from his desk and explains "Segment our customer base in different ways using different clustering techniques. Then run the cashflow scenarios. Then we discuss. Then we give it to data science." You can view Moray's original article here. Moray Barclay is an Experienced Data Analyst working in hands-on coding, Big Data analytics, cloud computing and consulting.
12. January 2021
This week's guest post is written by Moray Barclay. Two things have caused the UK’s Test & Trace application to lose 16,000 Covid-19 test results, both of which are close to my heart. The first is the application’s data pipeline, which is broken. The second is a lack of curiosity. The former does not necessarily mean that a data application will fail. But when compounded by the latter it is certain. Data Pipelines All data applications have several parts, including an interesting part (algorithms, recently in the news), a boring part (data wrangling, never in the news), a creative part (visualisation, often a backdrop to the news), and an enabling part (engineering, usually misunderstood by the news). Data engineering, in addition to the design and implementation of the IT infrastructure common to all software applications, includes the design and implementation of the data pipeline. As its name suggests, a data pipeline is the mechanism by which data is entered at one end of a data application and flows through the application via various algorithms to emerge in a very different form at the other end. A well architected data application has a single pipeline from start to finish. This does not mean that there should be no human interaction with the data as it travels down the pipeline but it should be limited to actions which can do no harm. Human actions which do no harm include: pressing buttons to start running algorithms or other blocks of code, reading and querying data, and exporting data to do manual exploratory or forensic analysis within a data governance framework. 
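One way to picture a single pipeline is as a fixed chain of stages that data passes through with no manual hand-off in between. The Python sketch below is a toy illustration only: the stage names, the `run` helper and the `id,result` record layout are hypothetical and are not taken from any real system.

```python
from functools import reduce


def parse(lines):
    """Stage 1: turn raw 'id,result' lines into records."""
    return [dict(zip(("patient_id", "result"), line.split(","))) for line in lines]


def keep_positive(records):
    """Stage 2: keep only the positive results."""
    return [r for r in records if r["result"] == "positive"]


def to_call_list(records):
    """Stage 3: produce the identifiers the final step needs."""
    return [r["patient_id"] for r in records]


# The order of the stages is fixed in code rather than by a person copying files.
PIPELINE = (parse, keep_positive, to_call_list)


def run(raw_lines):
    """Starting the run is the only human action; every hand-off is automatic."""
    return reduce(lambda data, stage: stage(data), PIPELINE, raw_lines)


print(run(["001,positive", "002,negative", "003,positive"]))
```

Because each stage receives exactly what the previous stage returned, there is no point at which someone has to open an editable file and move data across by hand.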
The data pipeline for Test & Trace will look something like this:
(1) a patient manually fills out a web-form, which automatically updates a patient list
(2) for each test, the laboratory adds the test result for that patient
(3) the lab sends an Excel file to Public Health England with the IDs of positive patients
(4) PHE manually transpose the data in the Excel file to the NHS Test & Trace system
(5) the NHS T&T system pushes each positive patient's contact details to NHS T&T agents
(6) for each positive patient, an NHS T&T contact centre agent phones them.
This is not a single pipeline because in the middle a human being needs to open up an editable file and transpose it into another file. The pipeline is therefore broken, splitting at the point at which the second Excel file is manually created. If you put yourself in the shoes of the person receiving one of these Excel files, you can probably identify several ways in which this manual manipulation of data could lead to harm. And it is not just the data which needs to be moved manually from one side of the broken pipeline to the other side, it is the associated data types, and CSV files can easily lose data type information. This matters. You may have experienced importing or exporting data with an application which changes 06/10/20 to 10/06/20. Patient identifiers should be of data type text, even if they consist only of numbers, for future-proofing. Real numbers represented in exponential format should, obviously, be of a numeric data type. And so on. One final point: the different versions of Excel (between the Pillar 2 laboratories and PHE) are a side-show, because otherwise this implies that had the versions been the same, then everything would be fine. This is wrong. The BBC have today reported that “To handle the problem, PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates.
That should ensure none hit their cap.” This solves the specific Excel incompatibility problem (assuming the process of creating small batches is error-free) but has no bearing on the more fundamental problem of the broken data pipeline, which will stay until the manual Excel manipulation is replaced by a normal and not particularly complex automated process. Curiosity So where does curiosity fit in? The first thing that any Data Analyst does when they receive data is to look at it. This is partly a technical activity, but it is also a question of judgement and it requires an element of curiosity. Does this data look right? What is the range between the earliest and the latest dates? If I graph one measurement over time (in this case positive tests over time), does the line look right? If I graph two variables (such as Day Of Week versus positive tests) what does the scatter chart look like? Better still, if I apply regression analysis to the scatter chart what is the relationship between the two variables and within what bounds of confidence? How does that relate to the forecast? Why? This is not about skills. If I receive raw data in csv format I would open it in a python environment or an SQL database. But anyone given the freedom to use their curiosity can open a csv file in Notepad and see there are actually one million rows of data and not 65,000. Anyone given the freedom to use their curiosity can graph data in Excel to see whether it has strange blips. Anyone given the freedom to use their curiosity can drill down into anomalies. Had those receiving the data from the Pillar 2 laboratories been allowed to focus some of their curiosity at what they were receiving they would have spotted pretty quickly that the 16,000 patient results were missing. As it was, I suspect they were not given that freedom: I suspect they were told to transpose as much data as they could as quickly as possible, for what could possibly go wrong? 
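The kind of first look described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of the real Test & Trace files: the column names `patient_id` and `test_date`, the ISO date format, and the helpers `first_look` and `read_both_ways` are all hypothetical. The 65,536 figure is the row limit of a legacy .xls worksheet.

```python
import csv
import io
from datetime import date, datetime

XLS_MAX_ROWS = 65_536  # row limit of a legacy .xls worksheet


def first_look(csv_text):
    """Summarise a CSV export before anyone transposes it by hand."""
    # DictReader returns every value as a string, so an identifier such as
    # "007" keeps its leading zeros instead of being coerced to a number.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dates = [date.fromisoformat(row["test_date"]) for row in rows]
    return {
        "row_count": len(rows),
        # +1 accounts for the header row the spreadsheet would also hold
        "exceeds_xls_limit": len(rows) + 1 > XLS_MAX_ROWS,
        "earliest": min(dates) if dates else None,
        "latest": max(dates) if dates else None,
        "sample_ids": [row["patient_id"] for row in rows[:3]],
    }


def read_both_ways(raw):
    """Parse one date string as day-first and as month-first."""
    day_first = datetime.strptime(raw, "%d/%m/%y").date()
    month_first = datetime.strptime(raw, "%m/%d/%y").date()
    return day_first, month_first


sample = "patient_id,test_date\n007,2020-09-25\n0123,2020-10-02\n"
summary = first_look(sample)
print(summary)
print(read_both_ways("06/10/20"))
```

A row count above the worksheet limit, or a date range that stops earlier than expected, is exactly the sort of anomaly that invites a closer look. The second helper shows why 06/10/20 is unsafe to pass around without a stated format: the same string parses to two different dates.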
Single Data Pipeline, Singular Curiosity: Pick At Least One To reiterate, the current problems with T&T would never have arisen with a single data pipeline which excluded any manual manipulation in Excel. But knowing that the data pipeline was broken and manual manipulation was by design part of the solution, the only way to minimise the risk was to encourage people engaged in that manual process to engage their curiosity about the efficacy of the data they were manipulating. In their prototype phases – for that is the status of the T&T application - data projects will sometimes go wrong. But they are much more likely to go wrong if the people involved, at all levels, do not have enough time or freedom to think, to engage their curiosity, and to ask themselves “is this definitely right?” You can view Moray's original article here. Moray Barclay is an Experienced Data Analyst working in hands-on coding, Big Data analytics, cloud computing and consulting.
15. October 2020
As Big Data can reveal patterns, trends and associations relating to human behaviour and interactions, it’s no surprise that Data & Analytics are changing the way that the supply chain sector operates today. From informing and predicting buying trends to streamlining order processing and logistics, technological innovations are impacting the industry, boosting efficiency and improving supply chain management. Analysing behavioural patterns Using pattern recognition systems, Artificial Intelligence is able to analyse Big Data. During this process, Artificial Intelligence defines and identifies external influences which may affect the process of operations (such as customer purchasing choices) using Machine Learning algorithms. From the Data collected, Artificial Intelligence is able to determine information or characteristics which can inform us of repetitive behaviour or predict statistically probable actions. Consequently, organisation and planning can be undertaken with ease to improve the efficiency of the supply chain. For example, ordering a calculated amount of stock in preparation for a busy season can be made using much more accurate predictions - contributing to less over-stocking and potentially more profit. As a result, analysing behavioural patterns facilitates better management and administration, with a knock-on effect for improving processes. Streamlining operations Using image recognition technology, Artificial Intelligence enables quicker processes that are ideally suited for warehouses and stock control applications. Additionally, transcribing voice to text applications mean stock can be identified and processed quickly to reach its destination, reducing the human resource time required and minimising human error. Artificial intelligence has also changed the way we use our inventory systems. 
Using natural language interaction, enterprises have the capability to generate reports on sales, meaning businesses can quickly identify stock concerns and replenish accordingly. Intelligence can even communicate these reports, so Data reliably reaches the next person in the supply chain, expanding capabilities for efficient operations to a level that humans physically cannot attain. It’s no surprise that when it comes to warehousing and packaging operations Artificial Intelligence can revolutionise the efficiency of current systems. With image recognition now capable of detecting which brands and logos are visible on cardboard boxes of all sizes, monitoring shelf space is now possible on a real-time basis. In turn, Artificial Intelligence is able to offer short term insights that would have previously been restricted to broad annual time frames for consumers and management alike. Forecasting Many companies manually undertake forecasting predictions using excel spreadsheets that are then subject to communication and data from other departments. Using this method, there’s ample room for human error as forecasting cannot be uniform across all regions in national or global companies. This can create impactful mistakes which have the potential to make predictions increasingly inaccurate. Using intelligent stock management systems, Machine Learning algorithms can predict when stock replenishment will be required in warehouse environments. When combined with trend prediction technology, warehouses will effectively be capable enough to almost run themselves negating the risk of human error and wasted time. Automating the forecasting process decreases cycle time, while providing early warning signals for unexpected issues, leaving businesses better prepared for most eventualities that may not have been spotted by the human eye. 
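As a rough illustration of automated replenishment forecasting, the Python sketch below uses simple exponential smoothing. This is a basic statistical technique swapped in for the Machine Learning models the article refers to, chosen only because it fits in a few lines. The function names, the smoothing factor `alpha`, and the sample figures are all hypothetical.

```python
import math


def smoothed_demand(daily_sales, alpha=0.3):
    """Simple exponential smoothing: recent days weigh more than older ones."""
    if not daily_sales:
        raise ValueError("need at least one day of sales")
    level = daily_sales[0]
    for sold in daily_sales[1:]:
        level = alpha * sold + (1 - alpha) * level
    return level


def days_until_reorder(stock_on_hand, reorder_point, daily_sales, alpha=0.3):
    """Estimate the whole days left before stock falls to the reorder point."""
    if stock_on_hand <= reorder_point:
        return 0  # already at or below the threshold
    demand = smoothed_demand(daily_sales, alpha)
    if demand <= 0:
        return None  # no forecast demand, so no replenishment date
    return math.floor((stock_on_hand - reorder_point) / demand)


history = [10, 10, 10, 10]  # units sold per day (made-up figures)
print(days_until_reorder(stock_on_hand=135, reorder_point=30, daily_sales=history))
```

Running the same calculation on a schedule for every product line is what turns a spreadsheet exercise into an early warning signal, because the estimate is produced the same way for every region.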
Big Data is continuing to transform the world of logistics, and utilising it in the best way possible is essential to meeting customer demands and exercising agile supply chain management. If you’re interested in utilising Artificial Intelligence and Machine Learning to help improve processes, Harnham may be able to help. Take a look at our latest opportunities or get in touch with one of our expert consultants to find out more. Author Bio: Alex Jones is a content creator for Kendon Packaging. Now one of Britain's leading packaging companies, Kendon Packaging has been supporting businesses nationwide since the 1930s.
29. August 2019
Our friends at Data Science Dojo have compiled a list of 101 actual Data Science interview questions that were asked between 2016 and 2019 at some of the largest recruiters in the Data Science industry – Amazon, Microsoft, Facebook, Google, Netflix, Expedia, etc. Data Science is an interdisciplinary field and sits at the intersection of computer science, statistics/mathematics, and domain knowledge. To be able to perform well, one needs to have a good foundation in not one but multiple fields, and this is reflected in the interview. They've divided the questions into six categories:

Machine Learning
Data Analysis
Statistics, Probability, and Mathematics
Programming
SQL
Experiential/Behavioural Questions

Once you've gone through all the questions, you should have a good understanding of how well you're prepared for your next Data Science interview.

Machine Learning

As one would expect, Data Science interviews focus heavily on questions that help the company test your understanding of machine learning concepts, their applications, and your experience with them. Each question included in this category has been recently asked in one or more actual Data Science interviews at companies such as Amazon, Google, Microsoft, etc. These questions will give you a good sense of what sub-topics appear more often than others. You should also pay close attention to the way these questions are phrased in an interview.

Explain Logistic Regression and its assumptions.
Explain Linear Regression and its assumptions.
How do you split your data between training and validation?
Describe Binary Classification.
Explain the working of decision trees.
What are different metrics to classify a dataset?
What's the role of a cost function?
What's the difference between convex and non-convex cost function?
Why is it important to know bias-variance trade off while modeling?
Why is regularisation used in machine learning models?
What are the differences between L1 and L2 regularisation?
What's the problem of exploding gradients in machine learning?
Is it necessary to use activation functions in neural networks?
In what aspects is a box plot different from a histogram?
What is cross validation? Why is it used?
Can you explain the concept of false positive and false negative?
Explain how SVM works.
While working at Facebook, you're asked to implement some new features. What type of experiment would you run to implement these features?
What techniques can be used to evaluate a Machine Learning model?
Why is overfitting a problem in machine learning models? What steps can you take to avoid it?
Describe a way to detect anomalies in a given dataset.
What are the Naive Bayes fundamentals?
What is AUC - ROC Curve?
What is K-means?
How does the Gradient Boosting algorithm work?
Explain advantages and drawbacks of Support Vector Machines (SVM).
What is the difference between bagging and boosting?
Before building any model, why do we need the feature selection/engineering step?
How to deal with unbalanced binary classification?
What is the ROC curve and the meaning of sensitivity, specificity, confusion matrix?
Why is dimensionality reduction important?
What are hyperparameters, how to tune them, how to test and know if they worked for the particular problem?
How will you decide whether a customer will buy a product today or not given the income of the customer, location where the customer lives, profession, and gender? Define a machine learning algorithm for this.
How will you inspect missing data and when are they important for your analysis?
How will you design the heatmap for Uber drivers to provide recommendation on where to wait for passengers? How would you approach this?
What are time series forecasting techniques?
How does a logistic regression model know what the coefficients are?
Explain Principal Component Analysis (PCA) and its assumptions.
Formulate Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) techniques.
What are neural networks used for?
Why is gradient checking important?
Is random weight assignment better than assigning same weights to the units in the hidden layer?
How to find the F1 score after a model is trained?
How many topic modeling techniques do you know of? Explain them briefly.
How does a neural network with one layer and one input and output compare to a logistic regression?
Why is Rectified Linear Unit (ReLU) a good activation function?
When using the Gaussian mixture model, how do you know it's applicable?
If a Product Manager says that they want to double the number of ads in Facebook's Newsfeed, how would you figure out if this is a good idea or not?
What do you know about LSTM?
Explain the difference between generative and discriminative algorithms.
Can you explain what MapReduce is and how it works?
If the model isn't perfect, how would you like to select the threshold so that the model outputs 1 or 0 for label?
Are boosting algorithms better than decision trees? If yes, why?
What do you think are the important factors in the algorithm Uber uses to assign rides to drivers?
How does speech synthesis work?

Data Analysis

Machine Learning concepts are not the only area in which you'll be tested in the interview. Data pre-processing and data exploration are other areas where you can always expect a few questions. We're grouping all such questions under this category. Data Analysis is the process of evaluating data using analytical and statistical tools to discover useful insights. Once again, all these questions have been recently asked in one or more actual Data Science interviews at the companies listed above.
What are the core steps of the data analysis process?
How do you detect if a new observation is an outlier?
Facebook wants to analyse why the "likes per user and minutes spent on a platform are increasing, but total number of users is decreasing". How can they do that?
If you have a chance to add something to Facebook then how would you measure its success?
If you are working at Facebook and you want to detect bogus/fake accounts, how will you go about that?
What are anomaly detection methods?
How do you solve for multicollinearity?
How to optimise marketing spend between various marketing channels?
What metrics would you use to track whether Uber's strategy of using paid advertising to acquire customers works?
What are the core steps for data preprocessing before applying machine learning algorithms?
How do you inspect missing data?
How does caching work and how do you use it in Data Science?

Statistics, Probability and Mathematics

As we've already mentioned, Data Science builds its foundation on statistics and probability concepts. A strong grounding in these concepts is a requirement for Data Science, and these topics are always brought up in data science interviews. Here is a list of statistics and probability questions that have been asked in actual Data Science interviews.

How would you select a representative sample of search queries from 5 million queries?
Discuss how to randomly select a sample from a product user population.
What is the importance of Markov Chains in Data Science?
How do you prove that males are on average taller than females by knowing just gender or height?
What is the difference between Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP)?
What does P-Value mean?
Define Central Limit Theorem (CLT) and its application.
There are six marbles in a bag, one is white. You reach in the bag 100 times. After drawing a marble, it is placed back in the bag.
What is the probability of drawing the white marble at least once?
Explain Euclidean distance.
Define variance.
How will you cut a circular cake into eight equal pieces?
What is the law of large numbers?
How do you weigh nine marbles three times on a balance scale to select the heaviest one?
You call three random friends who live in Seattle and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of lying. All three say "yes". What's the probability it's actually raining?
Explain a probability distribution that is not normal and how to apply it.
You have two dice. What is the probability of getting at least one four? Also find out the probability of getting at least one four if you have n dice.
Draw the curve log(x+10).

Programming

When you appear for a data science interview, your interviewers are not expecting you to come up with highly efficient code that uses minimal hardware resources and executes quickly. However, they do expect you to be able to use R, Python, or SQL so that you can access the data sources and at least build prototypes for solutions. You should expect a few programming/coding questions in your data science interviews. Your interviewer might want you to write a short piece of code on a whiteboard to assess how comfortable you are with coding, as well as get a feel for how many lines of code you typically write in a given week. Here are some programming and coding questions that companies like Amazon, Google, and Microsoft have asked in their Data Science interviews.

Write a function to check whether a particular word is a palindrome or not.
Write a program to generate Fibonacci sequence.
Explain string parsing in the R language.
Write a sorting algorithm for a numerical dataset in Python.
Coding test: moving average. Input: 10, 20, 30, 10, ...
Output: 10, 15, 20, 17.5, ...
Write Python code to return the count of words in a string.
How do you find percentile? Write the code for it.
What is the difference between (i) Stack and Queue and (ii) Linked list and Array?

Structured Query Language (SQL)

Real-world data is stored in databases and it ‘travels’ via queries. If there's one language a Data Science professional must know, it's SQL - or “Structured Query Language”. SQL is widely used across all job roles in Data Science and is often a ‘deal-breaker’. SQL questions are placed early on in the hiring process and used for screening. Here are some SQL questions that top companies have asked in their Data Science interviews.

How would you handle NULLs when querying a data set?
How will you explain JOIN function in SQL in the simplest possible way?
Select all customers who purchased at least two items on two separate days from Amazon.
What is the difference between DDL, DML, and DCL?
Why is Database Normalisation Important?
What is the difference between clustered and non-clustered index?

Situational/Behavioural Questions

Capabilities don’t necessarily guarantee performance. It's for this reason employers ask you situational or behavioural questions in order to assess how you would perform in a given situation. In some cases, a situational or behavioural question would force you to reflect on how you behaved and performed in a past situation. A situational question can help interviewers in assessing your role in a project you might have included in your resume, and can reveal whether or not you're a team player, or how you deal with pressure and failure. Situational questions are no less important than any of the technical questions, and it will always help to do some homework beforehand. Recall your experience and be prepared! Here are some situational/behavioural questions that large tech companies typically ask:

What was the most challenging project you have worked on so far?
Can you explain your learning outcomes?
According to your judgement, does Data Science differ from Machine Learning?
If you're faced with Selection Bias, how will you avoid it?
How would you describe Data Science to a Business Executive?

If you're looking for a new Data Science role, you can find our latest opportunities here.

This article was written by Tooba Mukhtar and Rahim Rasool for Data Science Dojo. It has been republished with permission. You can view the original article, which includes answers to the above questions, here.
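As an illustration of the moving-average coding test listed under Programming above, here is a minimal Python sketch. It assumes the question means a cumulative (running) average, since that is the reading that reproduces the sample output 10, 15, 20, 17.5 from the input 10, 20, 30, 10. The function name `running_average` is our own choice and not part of the original question; an interviewer may instead have a fixed-window average in mind, which would give different numbers.

```python
from typing import Iterable, List


def running_average(values: Iterable[float]) -> List[float]:
    """Return the mean of all values seen so far, one entry per input value.

    Assumption: "moving average" is read as a cumulative average, because
    that matches the sample input/output given in the question.
    """
    averages: List[float] = []
    total = 0.0
    # enumerate from 1 so that `count` is the number of values seen so far
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


print(running_average([10, 20, 30, 10]))  # [10.0, 15.0, 20.0, 17.5]
```

Keeping a running total means each new value is handled in constant time, rather than re-summing the whole list at every step.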
22. August 2019
By Laura Gayle, BusinessWomanGuide.org

Smart technology is rapidly reshaping society. From cloud storage and mobile access to the internet of things and artificial intelligence, what was once regarded as science fiction is steadily becoming reality. In response, many employers are finding ways to modernise their workplaces by creating smart offices, not because it looks cool or is the trendy thing to do, but because they've discovered that doing so provides several tangible benefits.

New devices, apps, and AI-driven tools can not only make your office smarter, they also can position you for better marketing and sales efforts and provide competitive advantages in your industry. Additionally, creating a smart office offers both employees and customers a seamless experience and can attract talented millennial workers. Since millennials naturally adapt to tech innovations, they can assist in this transition to bring your company to the next level. Here are five strategies to modernise your workplace with smart tech.

1. Use cloud-based storage

Cloud-based storage offers both convenience and efficiency. Many among today's workforce probably don't even realise there was a time when offices were full of filing cabinets and computer equipment. Now that mobile access has been fully integrated into nearly all workplaces, much of the bulky equipment that previously took up space is disappearing. Filing cabinets? Paper files? Things of the past. Workers today don't typically spend hours every week filing stacks of papers because most documents are digitised and stored in the cloud. Other cloud-based technologies, such as remote workplaces and managed print services, are replacing old ways of conducting business.

Cloud technology has been a significant game-changer for the office environment. Not only are files and documents stored on the cloud, but also businesses are using cloud-based platforms as a part of their services or customer experiences.
Think about how much “software as a service” (a model in which software is licensed and accessed remotely instead of being downloaded on a user’s computer) has become a standardised part of doing business. This is all thanks to cloud technology. As time moves forward, expect it to continue making a significant impact on the modern workplace and customer experience.

2. Invest in voice-activated devices

Various well-known gadgets found in "smart" homes or apartments are now making strides in the office setting, too. For example, voice-activated products such as Siri, Echo, Alexa, and Nest are commonly found in the workplace, adding functionality by offering a seamless user experience. Employees speak and the equipment automatically does what is asked, with no more fiddling with equipment and trying to get things up and running manually. Voice-activated tech also allows workers to multitask and get things done faster, such as:

Coordinating and syncing calendars
Sending data requests
Ordering supplies
Reporting problems to the appropriate departments
Streamlining IT requests

These are just a handful of the many tasks voice-activated tech can perform. Businesses have steadily begun to include these types of products to make conference rooms even smarter. While this concept isn't mainstream in the office quite yet, it's not hard to imagine it becoming the norm within the next few years as this tech fully matures. Companies focused on modernising their workplaces are jumping on the proverbial bandwagon to get a leg up on the competition. These companies will be well ahead of the game when such tech initiatives do eventually become standard in the office.

3. Use tech to put offices in the comfort zone

As modern offices evolve into open-space floor plans, they've become more informal and far more flexible. With that concept in mind, offices today are more focused on comfort, a stark contrast to the drab cubicle environments of yesteryear.
Shifting to the open-design work environment has been a challenge for many; however, businesses are finding ways to make this transition easier through smarter tech. Solutions they are integrating into their spaces include:

Hue lighting
Virtual reality meeting rooms
360-degree video conferencing
Keyless entry

Smart tools not only appeal to workers because of their convenience and "coolness" factor; they also serve the practical purposes of enhancing comfort, personalising the remote experience, and even preventing repetitive stress injuries.

4. Integrate tech innovations to enhance the customer experience

Businesses are investing in smart technology, and customers are reaping the benefits. As companies streamline their operations and customer service processes, customers are widely experiencing the convenience and simplicity associated with smart tech. Here are some features they currently enjoy:

Chatbots for instant two-way communication
AI-based customer learning opportunities
Personalised insights and recommendations
Automation and custom ordering
Cloud storage of customer information and preference history

Many industries are relying on artificial intelligence to improve their services. Businesses that do not offer this level of tech to customers will soon find themselves unable to meet heightened consumer expectations.

5. Use AI to gain a competitive advantage

Perhaps you don't want to go as far as microchipping your employees (yes, this is also a growing trend) or issuing them Segways, but there are a lot of other relatively new gadgets and AI-driven tools that can boost the "smartness" of your business, not to mention amp up your competitive advantage. For instance, you can use AI to track the habits and patterns of your customer base while they spend time on your website and determine where they are in their "customer journey" with your brand. Armed with this information, you can customise their web experience, along with your communications to them.
This personalisation can go a long way in your marketing efforts. After all, 80 percent of consumers say they are "more likely" to do business with a company that is able to give them a personalised experience. It's also important to know that research indicates customers want way more than basic personalisation. Using smart tech can easily help you bring things up to the next level.

Smart tech adds significant value to the modern office in many ways. It's unwise to purchase tech because it's trendy, but when integrated with purpose and vision, many company decision-makers find this investment offers significant benefits and, in the end, pays off nicely.

Harnham are the global leaders in Data & Analytics recruitment. Take a look at our latest roles or get in touch with one of our expert consultants to learn more.
09. January 2019
By Noam Zeigerson

Noam Zeigerson is a Data & Analytics Executive and entrepreneur with over 16 years’ experience delivering Data solutions.

What does the role of the CDO entail and how can we succeed?

Researchers at Gartner estimate that 90 per cent of enterprises will have a ‘Chief Data Officer’ (CDO) in place by the end of 2019. They also predict that by then only half of CDOs will have been successful. So, what does the role of the CDO entail and how can we succeed?

The rise in the use of data in the enterprise to inform business decisions has led to a recent phenomenon - the Chief Data Officer. Organisations will have a CDO in place to handle the many opportunities and responsibilities that arise from industrial-scale collection and harnessing of data. Unfortunately, success is rare, due to a number of challenges.

As a new role, the CDO needs to be in a position to increase business efficiencies and improve risk management, especially since the General Data Protection Regulation (GDPR) came into effect in May 2018. This puts the CDO in a position where business expectations will be high, and we have to make tough and potentially unpopular decisions, because the CDO’s role sits at the crossroads of IT and business. We are typically responsible for defining the data and analytics strategy at our organisation.

The CDO becomes instrumental in breaking down siloed departments and data repositories, which makes information easier to find and also has ramifications for the IT team. As Gartner notes, many CDOs have faced resistance, but the successful ones are working closely with their Chief Information Officer (CIO) to lead change. To be a key part of any organisation’s digital transformation, the CDO needs a wide range of skills.

The skills required of a Chief Data Officer

The role of the CDO is multifaceted. For this reason, CDOs need to be able to combine skills from the areas of data, IT, and business to be successful.
Data skills: A background in data science is crucial. A passion for statistics and a clear understanding of how to interpret data to glean insights are core to the role of the CDO. The CDO then needs to be able to communicate what those insights mean in a business context and make information easily available to all. A knowledge of data security is also critical. In the UK, the Information Commissioner’s Office (ICO), whose job it is to enforce GDPR in the country, recommends the creation of a Data Protection Officer (DPO) role at each organisation. This should fall within the remit of the CDO. The value of sharing data at a senior level is recognised by UK organisations, by and large. Further down the authority chain the picture is different: about three-quarters of executive teams and nearly half of front-line employees actually need to have access to detailed data and analytics. The CDO needs to ensure that those who need data to further inform decision making can get it and are sufficiently trained to gain business insights from that data.

IT skills: Understanding how information flows is an advantage, as the CDO is well placed to recommend and implement technology to democratise and operationalise data, as well as improve security. The CDO will need to manage expectations across the enterprise, so appreciating what technology can deliver is key. Artificial Intelligence (AI) and machine learning are going to feature heavily in UK data projects, so many CDOs need to get to grips fast with this technology.

Business skills: Strategic business logic is essential to success as a CDO. If the expectation of the CDO is to influence strategy based on data, then consulting experience will be valuable. Project management skills are at the forefront of the CDO’s day-to-day role. Being able to bring siloed groups together and get them striving for the same common goal is a vital skill for any CDO.
It’s clear that data analytics is only going to be deployed more heavily throughout the enterprise, so the CDO’s role is only going to become more influential and pivotal within organisations as different business units seek to gain insights to improve the business further.

Making a success of the CDO role

Every organisation will have different objectives and expectations of their CDO. Gartner estimates that four in every five (80 per cent) CDOs will have revenue responsibilities, meaning we will be expected to drive new value, generate opportunities, and also deliver cost savings. No pressure! Given those expectations, it’s no wonder that Gartner expects only half of CDOs to succeed.

The core responsibilities of the CDO include data governance and quality, and regulatory compliance. The CDO must also address the way that technology is deployed to address these issues. The CDO needs leadership and team building skills, as we are the chief change agent in the organisation for creating a data-driven culture. This means first-class communications skills will be valuable.

The Chief Data Officer is going to be essential in delivering digital transformation. Organisations that create a CDO role must support that individual and make sure that they are integrated across departments, not isolated in a silo. The C-suite must lead from the front on this and, as we saw earlier, the support of the CIO will be critical.
08. November 2018