In IT terms, the concept of Big Data has been around for a very long time. But when the term was first coined, in the ancient past (around 2005), no one had any real idea of just how much raw data we’d end up talking about. A report by Statista published in June 2021 states that in 2020, the total amount of data created, captured, copied and consumed globally reached 64.2 zettabytes, and predicts this figure will exceed 180 zettabytes by 2025 (one zettabyte being equal to a billion terabytes).
But as we all know, data is only as useful as the insights you can extract from it. Within organisations, data is often stored in different forms and locations, and so lacks the context and meaning required for decision-making. To make sense of it, you need to connect your data sources and processing services, extract the data, transform it into a consistent form and load it into a central platform where it can be analysed – the familiar ETL (extract, transform, load) process.
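As a rough illustration of the three ETL stages, the minimal Python sketch below extracts records from two hypothetical departmental exports that use different formats, transforms them onto one shared schema, and loads them into a single table. The sample data, column names and function names are all assumptions, and an in-memory SQLite database stands in for the central platform; a service such as ADF carries out the same stages at far larger scale through configured pipelines rather than a hand-written script like this.

```python
import csv
import io
import sqlite3

# Hypothetical extracts from two departments, each exported in its own format.
SALES_CSV = "order_id,amount_gbp\n1,120.50\n2,80.00\n"
WEB_CSV = "id;total_pence\n3;4550\n4;1000\n"


def extract() -> dict[str, list[dict[str, str]]]:
    """Extract: read each source using its own delimiter and column names."""
    return {
        "sales": list(csv.DictReader(io.StringIO(SALES_CSV))),
        "web": list(csv.DictReader(io.StringIO(WEB_CSV), delimiter=";")),
    }


def transform(raw: dict[str, list[dict[str, str]]]) -> list[tuple[str, int, float]]:
    """Transform: map both sources onto one schema (source, order id, amount in GBP)."""
    rows = [("sales", int(r["order_id"]), float(r["amount_gbp"])) for r in raw["sales"]]
    # The web team records pence, so convert to pounds to match the sales data.
    rows += [("web", int(r["id"]), int(r["total_pence"]) / 100) for r in raw["web"]]
    return rows


def load(rows: list[tuple[str, int, float]], conn: sqlite3.Connection) -> None:
    """Load: write the unified rows into a single central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (source TEXT, order_id INTEGER, amount_gbp REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    total = conn.execute("SELECT SUM(amount_gbp) FROM orders").fetchone()[0]
    print(f"Total across departments: {total:.2f} GBP")
```

Once both sources share one schema in one place, a single query can answer a question that previously needed two differently shaped exports.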
Until fairly recently, this meant custom-building integration solutions. Then, in 2016, Microsoft launched its cloud-based data integration service, Azure Data Factory (ADF). This allowed data engineers to create ‘pipelines’, time-sliced on a specified schedule, for connecting to and collecting data, transforming and enriching it, then publishing it in the required form.
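To illustrate what ‘time-sliced’ means, here is a minimal Python sketch that splits an active period into fixed, back-to-back windows, each of which a scheduled pipeline run would process. It is a generic tumbling-window illustration under stated assumptions, not ADF’s own scheduler: the function name time_slices, the daily interval and the rule of dropping a partial final window are all choices made for this example.

```python
from datetime import datetime, timedelta, timezone


def time_slices(
    start: datetime, end: datetime, interval: timedelta
) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive, non-overlapping windows of equal length.

    Assumption for this sketch: a partial final window is dropped, so every
    slice covers exactly one full interval.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    slices = []
    window_start = start
    while window_start + interval <= end:
        slices.append((window_start, window_start + interval))
        window_start += interval
    return slices


if __name__ == "__main__":
    start = datetime(2021, 6, 1, tzinfo=timezone.utc)
    # Three days of data processed as three daily slices.
    for s, e in time_slices(start, start + timedelta(days=3), timedelta(days=1)):
        print(f"process data from {s.isoformat()} to {e.isoformat()}")
```

Because each slice has fixed boundaries, a failed run can be retried for just that window without reprocessing the rest of the period.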
The first version of ADF was widely regarded in the industry (and quietly acknowledged by Microsoft itself) as a ‘first run’ at the concept. Many organisations were understandably hesitant about adopting it, knowing (or at least assuming) it would be superseded. And now, it has been.
ADF version 2 isn’t just an update of v1: it’s a totally different animal.
We’ve been working with ADF since its inception, and v2 does everything its predecessor did – just much, much better. In our view, these are its key benefits and, if you’ve been holding back, the reasons it could now be the right solution for your organisation.
In its latest iteration (v2), ADF is a fully cloud-based ETL and data integration service, well suited to users and businesses working with big data in a multi-platform environment. It’s highly scalable and cost-effective, and because it runs as a managed service in Microsoft’s public cloud with a largely code-free visual authoring environment, it’s simple to set up and use.
Like v1, it’s built around the central concept of Pipelines: logical groupings of Activities, each of which performs one unit of ‘work’ (copying or transforming data, for example). Pipelines are executed by Triggers, which can be schedule-based or event-based. The data that Activities consume and produce is described by Datasets, which point to data held in data stores such as traditional databases, data warehouses or file storage, while Linked Services hold the connection information ADF needs to reach those stores. Finally, the Integration Runtime supplies the compute infrastructure for Data Flow, Data Movement and Activity Dispatch, acting as a ‘bridge’ between Activities and Linked Services.
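To make the relationships between Pipelines, Activities, Datasets and Triggers more concrete, the Python sketch below builds a pipeline and a schedule trigger as dictionaries whose shape approximates the JSON that ADF v2 uses for its definitions, then checks that the trigger points at the pipeline and lists the Datasets its Copy activity reads and writes. Every name here (pipeline, activity, datasets, trigger) is hypothetical, the Dataset and Linked Service definitions themselves are omitted, and the exact property names should be checked against Microsoft’s current documentation before being relied on.

```python
import json

# Hypothetical names throughout; the shape approximates ADF v2's JSON definitions.
PIPELINE = {
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {
                # One Activity: a single unit of work inside the Pipeline.
                "name": "CopySalesToWarehouse",
                "type": "Copy",
                "inputs": [{"referenceName": "SalesBlobDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "WarehouseSqlDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "SqlSink"},
                },
            }
        ]
    },
}

TRIGGER = {
    "name": "DailyTrigger",
    "properties": {
        # A schedule-based Trigger that runs the referenced Pipeline once a day.
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2021-06-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": "CopySalesPipeline",
                }
            }
        ],
    },
}


def referenced_datasets(pipeline: dict) -> set[str]:
    """Collect the names of every Dataset an Activity reads from or writes to."""
    names = set()
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            names.add(ref["referenceName"])
    return names


def triggered_pipelines(trigger: dict) -> list[str]:
    """List the Pipelines a Trigger starts."""
    return [p["pipelineReference"]["referenceName"] for p in trigger["properties"]["pipelines"]]


if __name__ == "__main__":
    print(json.dumps(PIPELINE, indent=2))
    print("Trigger starts:", triggered_pipelines(TRIGGER))
    print("Datasets used:", sorted(referenced_datasets(PIPELINE)))
```

Keeping each component as a separate, name-referenced definition is what lets one Trigger start several Pipelines, or one Dataset be shared by many Activities, without duplicating configuration.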
ADF really comes into its own in businesses where individual departments produce separate datasets on different, discrete platforms. Extracting data from these multiple platforms, Transforming it into a meaningful form and then Loading it into a single platform can provide far more powerful, complete and cohesive analytics and insights, increasing efficiency and driving growth.
We’ve worked with both cloud-native clients and those making the transition to cloud-based working to deploy and realise the benefits of ADF v2, at all scales and across a wide range of sectors. To find out more and see how ADF could work for you, please get in touch.