Benjamin Franklin said that in this world nothing can be said to be certain, except death and taxes. Not quite as certain, but close, is that the world of technology still needs to preserve and protect information with backup policies. That does not change, and it is still a thankless task (we only remember it when there is a disaster, yet avoiding one requires daily cost and effort).
What has changed is the nature and especially the volume of that information (and of the associated data services), which makes the task increasingly complex and, at the same time, makes the ability to recover increasingly uncertain. This leads us to look for new data protection models and paradigms that make the task more effective, efficient, predictable … and cheaper.
Strategic, organizational and process-related limitations
In our experience, companies show the following main limitations in protecting information, related to strategic, organizational or process aspects:
- Lack of a global view of the nature and criticality of information. Companies usually have a reasonably complete and up-to-date (not always!) blueprint of their IT systems, but very rarely maps of repositories and data flows that describe their nature (e.g. source or derived data, conceptual model) and their criticality (based on business value).
- Protection policies defined vertically by system, not horizontally by process logic. Linked to the above, data protection policies are commonly defined by the logical (or physical) system where the data is located, not by the nature of that data. In many cases, the backup team is completely unaware of the nature of the data to which it applies backup policies.
- Inability to treat data protection policies and system availability differently, but in combination. In many cases, system recovery plans (which allow a given information service to be restarted after a failure or disaster) and data protection plans (which allow a particular repository to be recovered after loss, accidental deletion or data corruption) are managed independently and without connection to each other.
- Lack of risk analysis models and of their associated economic impact. Most organizations lack reasonably complete risk models for the two aspects mentioned above. Above all, they do not model them in the metric that everyone understands: MONEY. This means that data protection policies (and the return on the investment made) are based on subjective criteria that are, in some cases, spurious.
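As an illustration of what modeling risk in money can look like, here is a minimal sketch. It assumes the common simplification that expected yearly loss is the loss per event multiplied by the expected number of events per year. The class, the function and every figure are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RiskScenario:
    """One way a repository or service can fail (all figures are estimates)."""
    name: str
    yearly_frequency: float   # expected occurrences per year
    downtime_hours: float     # expected service outage per occurrence
    cost_per_hour: float      # business cost of one hour of outage
    data_loss_cost: float     # cost of data that cannot be recovered

    def loss_per_event(self) -> float:
        return self.downtime_hours * self.cost_per_hour + self.data_loss_cost

    def expected_yearly_loss(self) -> float:
        return self.yearly_frequency * self.loss_per_event()


def yearly_net_benefit(before: RiskScenario, after: RiskScenario,
                       yearly_protection_cost: float) -> float:
    """Expected loss avoided by a protection policy, minus what it costs."""
    avoided = before.expected_yearly_loss() - after.expected_yearly_loss()
    return avoided - yearly_protection_cost


# Invented example: the same incident with and without a protection policy.
unprotected = RiskScenario("ERP database corruption", 0.25, 48, 2_000, 100_000)
protected = RiskScenario("ERP database corruption", 0.25, 4, 2_000, 0)
print(yearly_net_benefit(unprotected, protected, yearly_protection_cost=15_000))
```

A positive result suggests the policy pays for itself under these assumptions; the value of the exercise is that the discussion moves from subjective criteria to figures that can be challenged.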
On the other hand, there are technology limitations, derived from an unglamorous market (that of backup solutions) and increasingly complex requirements:
- Predictability of recovery capabilities and metrics. In general, backup tools do a more or less decent job of running backups. But ask the backup manager in the IT ops team of a large company whether they would put their hand in the fire (and their job on the line) on being able to recover, in case of disaster, the information of a critical system within the agreed service levels. If the answer is yes, ask whether they regularly run simulations and collect metrics. Backing up is reasonably easy; restoring after a failure is the hard part of the story.
- Heterogeneity of environments and backup solutions. Any medium or large company ends up with multiple systems and backup solutions, which complicates processes and increases overall risk. The adoption of Cloud models has only added a further layer of complexity that many ‘traditional’ solutions do not manage efficiently.
- Scalability with the volume of information to be protected. In bigdata we use the phrase ‘data has gravity’. That is, data weighs. Moving data, or copying it to protect or recover it, becomes technically more complex as the volume of information to be managed (incessantly) grows.
- Inability to manage according to the specific nature of the information to be protected. In many cases our backup tools, for technical reasons, apply fixed policies to repositories that hold data of different natures, which should not have the same protection requirements (for example, highly volatile information alongside voluminous information with a low or zero change rate).
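The first point in this list (run restore simulations and take metrics) can be sketched as follows. We assume here that the agreed service levels are expressed as a recovery time objective (RTO) and a recovery point objective (RPO), which is a common convention but not something stated above. The names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevel:
    rto_minutes: float   # maximum acceptable time to restore
    rpo_minutes: float   # maximum acceptable age of the recovered data


@dataclass
class RestoreDrill:
    system: str
    restore_minutes: float    # measured time to bring the data back
    data_age_minutes: float   # age of the most recent recoverable copy


def failed_drills(drills, levels):
    """Return the drills that missed the agreed service level of their system."""
    failures = []
    for drill in drills:
        level = levels[drill.system]
        if (drill.restore_minutes > level.rto_minutes
                or drill.data_age_minutes > level.rpo_minutes):
            failures.append(drill)
    return failures


# Invented service levels and drill results.
levels = {"billing": ServiceLevel(rto_minutes=120, rpo_minutes=60)}
drills = [
    RestoreDrill("billing", restore_minutes=95, data_age_minutes=30),
    RestoreDrill("billing", restore_minutes=210, data_age_minutes=30),
]
for drill in failed_drills(drills, levels):
    print(f"{drill.system}: restore took {drill.restore_minutes} min")
```

Keeping the history of drills, not just the latest one, is what turns "we think we can restore" into a measured trend.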
Trends in the era of massive data
The unstoppable growth of the information to be processed and backed up, and the advent of massive data analytics projects, add pressure that cannot be solved in a linear fashion. We need radical transformations in the technological approach. Some lines of innovation that we highlight are the following:
- Intrinsic protection. These solutions seek to substantially eliminate the need to make a backup copy with a conventional backup tool. Instead, the repository protects itself and can respond to the different information recovery scenarios we may face (full recovery after a physical disaster, accidental deletion, recovery of an earlier version, assurance that specific contents remain unaltered for legal or functional reasons, …). There are many aspects to take into account, but in general the basis for these models is the vision of a data repository as a succession of changes over time, with several geographically dispersed replicas.
- Reduction of technological assets to be protected. As mentioned earlier, backup policies were traditionally used to recover a specific service or repository, not only to protect its contents. So we back up our whole server, with all its elements, to ensure that we can restore it in case of problems (with all the information it stores and serves). This is profoundly inefficient. The trend in software engineering and IT ops, with innovations such as virtualization long ago and containers more recently, is for services to be recreatable from a descriptor of their components, not from an exact copy saved ‘just in case’. It is as if, to protect our ‘mobility service’ by car, we kept one car that we use and another, exactly the same, in the garage in case the first one fails or has an accident. But we would also have to reproduce on that second vehicle every small bump, scratch and sign of wear of the original, so that it stays exactly the same. If we were able to describe the process of building a car in an automated way, it would be much more efficient simply to keep that process up to date and, when our car fails, launch the creation of a new one. Naturally, in our example ‘making’ something physical like a car is complex, expensive and time-consuming; but ‘making’ a virtual server or container (especially the latter) is not. Therefore, if we design our software services properly, we only need to back up the data, not the systems that run them.
At Tecknolab, in our DBcloudbin service, we have reached the milestone of not backing up a single physical or virtual server; we back up only service descriptors (the instructions for automatically building the car if necessary) and data repositories, which are physically decoupled from the servers that manage them (by two completely independent and heterogeneous means, one physical and one logical, each adding robustness to the recovery capabilities).
- Combining capabilities in the same technical process (protection, availability and security). If we have to keep running data protection, keep ensuring that our services are available in case of disaster, and also ensure that the data we handle is not corrupted accidentally or intentionally (e.g. by ransomware), it seems reasonable to use a single process to meet all those needs, simplifying and increasing efficiency. Some modern backup solutions can start a new instance of the service from the backup image of a system if disaster strikes our primary environment (thereby reusing the backup for a service availability scenario). In addition, since we have a whole sequence of backups of each repository, we can identify unexpected changes (for example, an abnormal rate of change in the data of a repository, based on a historical trend analysis) and flag them as a potential virus attack. There are already vendors in the market that provide this combination of functionalities.
In summary, we cannot sustain a ‘business as usual’ model when the volume of information to be managed scales geometrically. This is especially true in the field of data protection. We must change the processes, the way of doing things and the solutions to protect what is (increasingly) the most important asset of our companies: information.
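The idea of spotting an abnormal rate of change across a sequence of backups can be sketched with a simple z-score test. This is only one possible technique, chosen here for brevity; we are not claiming any vendor works this way. The function name, the threshold and the sample data are assumptions.

```python
from statistics import mean, stdev


def is_abnormal_change(history, latest, threshold=3.0):
    """Flag `latest` if it deviates from `history` by more than `threshold`
    standard deviations. Values are the fraction of data changed per backup."""
    if len(history) < 2:
        raise ValueError("need at least two historical backups")
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly stable history: any different value is unexpected.
        return latest != average
    return abs(latest - average) / spread > threshold


# Invented history: 2% to 3% of the repository changes between backups.
change_rates = [0.02, 0.03, 0.025, 0.02, 0.03]
print(is_abnormal_change(change_rates, latest=0.60))   # mass rewrite of files
print(is_abnormal_change(change_rates, latest=0.026))  # an ordinary day
```

A sudden jump like the first call is the kind of signal that could indicate mass encryption by ransomware, although a legitimate bulk load would trigger it too, so it should raise a warning, not an automatic action.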
Adopting public Cloud in a company can reveal some surprising limitations. I propose five (and none of them is security).
There is a clear trend in companies of all sizes to move workloads to the different public Cloud IaaS services on the market, with Amazon AWS, Microsoft Azure and Google Cloud Platform as the main global offerings (and AWS the clear leader as of today).
When a company makes that decision, it will usually have run some pilots and proofs of concept with the solution and, we suppose and hope, run some numbers to define the business case.
In my experience, at least five things surprise us when we embark on this trip, because we assumed they would be solved or simple. And no, none of them is security, the perennial ‘sin’ historically attributed to the public cloud, in my opinion with little basis (in general it is safer than many private alternatives). These are five things you will foreseeably run into when adopting public cloud in a company of significant size, as opposed to an individual user or a small organization or work group:
1.- Define consumption quotas or resources to limit the use to a specific budget
In a private environment, resources are explicitly delimited. They are what they are. A great advantage of the public cloud is that capacity is (almost) unlimited and we can grow as much as we want. But in a large company this is a double-edged sword, and it is not uncommon for the organizational model to set limits for organizations or groups that, in fact, map to the budgets assigned to each of them. Well, what seems obvious is not so easy to do, because the main public Cloud services do not allow quotas to be defined, only alarms (which in many cases cannot easily be managed by the staff of each group or organization of our company). These alarms do not prevent additional consumption of resources; they only warn.
2.- Predictability of the monthly cost
We know it, or we guess it, when we embark on the Cloud: this is pay-per-use in the strictest sense of the term. In principle, once again, that is good: we only pay for what we use. But in most companies, especially in the finance and control department, an unpredictable cost makes people very nervous. The first thing asked of the person responsible for a given service is a spending forecast, often a long-term one. If we combine this with the previous point, we see that predictability is much lower than we might have assumed at the beginning, and it will possibly require economic models that are far more thoughtful and sophisticated than we first expected.
3.- Transparency and ease in the allocation of costs
One of the principles of the Cloud model is cost transparency. And so it is. Since you are billed for what you consume, it is easy to receive an invoice listing every billed item and its cost. And maybe to get a fright from the bill. But in a medium or large company that is not enough: one has to charge back these costs to the different organizations and cost centers that compose it. And there things get more difficult, with little intrinsic help from the service provider’s tools. Mapping those costs internally can become a real hell and a significant cost in effort and time.
I remember a large, long-established Spanish multinational that came to us a week after decommissioning a server to ask us to rescue it from a scrap warehouse, because it turned out ‘someone’ was actually using it. Imagine how an organization with that level of internal control can suffer in a model like this.
The proof that this is a real problem is the niche market of companies offering SaaS solutions whose main purpose is to help customers control and manage their public cloud costs, by integrating the different interfaces that provide the information and reporting it in a way that fits the client and its organizational model. And that has an additional cost, of course.
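To make the charge-back problem concrete, here is a minimal sketch. It assumes that each billed item can carry a tag naming its cost center, and that untagged items exist (they are the server that ‘someone’ was using). The invoice structure and the tag name are hypothetical, not any provider’s actual billing format.

```python
from collections import defaultdict


def charge_back(line_items, tag_key="cost_center", unallocated="UNALLOCATED"):
    """Sum invoice line items per cost center; untagged items are grouped
    under `unallocated` so that nobody can ignore them."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, unallocated)
        totals[owner] += item["cost"]
    return dict(totals)


# Invented invoice lines.
invoice = [
    {"resource": "vm-001", "cost": 120.0, "tags": {"cost_center": "marketing"}},
    {"resource": "vm-002", "cost": 30.5, "tags": {"cost_center": "finance"}},
    {"resource": "disk-17", "cost": 12.25, "tags": {"cost_center": "marketing"}},
    {"resource": "vm-legacy", "cost": 80.0},  # nobody tagged this one
]
print(charge_back(invoice))
```

The code is trivial; the hard part in a large organization is the discipline of tagging everything and deciding who pays for the unallocated bucket.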
4.- Comparison of costs between Clouds
A virtual machine is a virtual machine. It is a solved problem: there are no big differences and it works more or less the same in the basics, whether we run it on a VMware vSphere virtual infrastructure in our private environment, on an AWS instance in Ohio or on an Azure node in Ireland. In addition, the type of service (IaaS) is also very similar, at least regarding where the provider’s responsibility ends and ours begins.
All public cloud services have public and reasonably transparent pricing models (this is the Cloud …), so we can potentially make fair comparisons (assuming the instances are similar in performance or in any other characteristic relevant to us). Price comparisons seem easy: if option A is 10% cheaper, then the overall service for our company will be 10% cheaper (more or less). It’s that simple.
Nothing could be further from reality. Multiple factors knock down this simplistic assertion, but I will detail only two: (1) the cost model is much more complex than a ‘virtual machine price per hour’. There are dozens of billable items (from IP addresses to network traffic or monitoring) that are not homogeneous across providers and can change costs significantly (by around 10 or 15%); (2) there are discounts for sustained use, with three radically different models among the three main players (AWS discounts the up-front reservation of instances, which is also managed per instance type, further complicating the model; Google applies an automatic discount to instances that run continuously; and Azure integrates it into the Enterprise Agreement, the global licensing contract that 90% of large and medium-sized companies have with Microsoft).
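A small worked example shows why the headline hourly price is a poor predictor of the total bill. The cost function is a deliberately simplified assumption (compute after discount, plus a flat sum of other billable items), and the two providers and all their prices are invented, not real price lists.

```python
def monthly_cost(vm_price_per_hour: float, hours: float,
                 sustained_use_discount: float,
                 other_items: dict[str, float]) -> float:
    """Compute cost after discount, plus every other billable item."""
    compute = vm_price_per_hour * hours * (1.0 - sustained_use_discount)
    return compute + sum(other_items.values())


# Invented figures for two fictional providers, for illustration only.
provider_a = monthly_cost(0.10, 730, 0.0, {"network": 10.0, "ip_addresses": 3.0})
provider_b = monthly_cost(0.09, 730, 0.0, {"network": 18.0, "ip_addresses": 4.0})

# B's hourly VM price is 10% lower, yet its total bill is not lower.
print(provider_a, provider_b, provider_b < provider_a)
```

Changing `sustained_use_discount` for either provider shifts the result again, which is the second factor mentioned above: the comparison depends on how long and how steadily each workload runs.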
5.- The public cloud is the cheapest option
A massive service such as AWS, Azure or Google has, in principle, all the ingredients to make the most of economies of scale. This, added to the fierce competition in this market, leads to the logical assumption that the cost of a public Cloud service must necessarily be lower than that of other alternatives (which will have advantages in other respects but can hardly compete on price).
Well … not always. My own experience, and that of others I have had the opportunity to listen to, after reasonably sophisticated comparisons, leads to the conclusion that it may be more expensive. This point could fill an encyclopedia, and in general every organization is a world of its own, with situations and scenarios too specific to assert categorically and in general whether it is cheaper or more expensive. My reflection and summary is that (1) the more mature, flexible and advanced a company’s application architecture and systems operations and management are, the more likely it is to make a public Cloud service cost-effective, and (2) you should model your scenarios and costs as reasonably and completely as possible, so that you can correctly assess whether it is an advantageous option for your company in terms of cost. And yes, this is complex enough to analyze that some of us earn our living from it.
A bigdata initiative must start with a bigdata strategy. Here we discuss the recommended approach.
Today, a large number of companies of all sectors and sizes (although especially the largest ones) are launching bigdata initiatives, partly due to the pressure of competition and new business challenges and partly, why hide it, because of a certain ‘fashion’ or peer pressure (if my competitors are getting into this, so will I; I will not be left behind …).
The reality, surprising as it may seem, is that, as recent IDC studies show, most companies are starting or planning to start bigdata initiatives in the short term but do not know where to begin.
In this context, the results are in many cases tragic, because companies make big mistakes that cost a lot of money. One of the most common is to start with the technological component: “We set up a DataLake”. The result, as I say, is tragic, partly because the sector’s sales wizards have created a series of myths and legends that have been internalized.
Myth 1: New technologies based on Hadoop are cheap and are deployed in a jiffy.
This is substantially false (or only partially correct and an oversimplification), and it leads us to throw ourselves into an enthusiastic technical race without a well-built strategy and, of course, without a business case or a moderately robust economic model. Basically, we apply a false mental scheme, curiously common among many highly paid top executives: if this is much cheaper than my usual technologies (typically a DWH in this case), we should save money, no matter how we do it. And in this context Murphy’s Law always triumphs, and the result in most cases is a lot of budget consumed with no ‘real’ result to show for it.
Initial Recommendation: Strategy and economic model first. Processes and organization later. Technology, at the end. Experiment, prioritize and monitor (adapt and adjust your model).
These are actually concepts that I have been using for many years in transformation consulting in every technological area, but they are more applicable than ever to this field.
A great weakness, particularly in Latin countries, is our proverbial aversion to strategy. We are people of action. Planning is for cowards. The adaptability and improvisation of the Latin character is a great asset in my opinion, but only when adequately integrated into robust strategic planning.
In this sense, a first mistake is to confuse ‘data-driven’ business with bigdata technology. One does not necessarily imply the other. We must identify our data-driven business scenarios and how we are going to execute them. We will only adopt new bigdata technologies if we have a clear justification for it; and we will model it (I do not mean technically, but with a business case; I will talk about it in more detail in later posts of this blog).
Our approach to bigdata strategy is a bidirectional model (top-down and bottom-up). The top-down business model is in charge of identifying those ‘data-driven’ business scenarios and modeling them economically (business value) and in terms of data requirements (what data and potential analytical models I need to implement them). The bottom-up model consists of modeling and cataloging the data sources (potential or real) that we have in our business processes or need in order to solve our business use cases. The intersection of both gives us the feasibility analysis and a first cost model. At this point we can make decisions and translate them into a strategic plan; that is, substantially, which scenarios we tackle first (the cheapest ones among those with the most impact), how we are going to monitor progress (what are my KPIs?) and how we are going to feed our model with reality as we execute it, allowing us to adjust our expectations. And have fun!
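The intersection of the top-down and bottom-up models can be sketched in a few lines. This is an illustrative assumption about how to operationalize it, not a formal method: a scenario counts as feasible when every data source it requires appears in the catalog, and feasible scenarios are ranked by the ratio of business value to estimated cost. All names and figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """A 'data-driven' business scenario identified top-down."""
    name: str
    business_value: float        # estimated yearly value
    estimated_cost: float        # estimated cost to implement (must be > 0)
    required_sources: frozenset  # data sources the scenario needs


def prioritize(scenarios, catalog):
    """Keep scenarios whose data needs are covered by the bottom-up catalog,
    ordered by value-to-cost ratio (highest first)."""
    feasible = [s for s in scenarios if s.required_sources <= catalog]
    return sorted(feasible,
                  key=lambda s: s.business_value / s.estimated_cost,
                  reverse=True)


# Invented catalog (bottom-up) and scenarios (top-down).
catalog = {"crm", "web_logs", "billing"}
scenarios = [
    Scenario("churn prediction", 500_000, 100_000, frozenset({"crm", "billing"})),
    Scenario("web personalization", 300_000, 30_000, frozenset({"web_logs"})),
    Scenario("fleet optimization", 900_000, 200_000, frozenset({"gps_traces"})),
]
for scenario in prioritize(scenarios, catalog):
    print(scenario.name)
```

Scenarios that drop out (here, the one needing a data source we do not have) are not discarded for good: they show which sources would be worth adding to the catalog.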