Cloud archivos - Cloud, BigData, Analytics

Growth of non-relational data. Traditional databases are exploding.

by tecknolab | Jun 26, 2020 | BigData, Cloud, Technology

Business applications evolve very fast. The functional requirements are more sophisticated and we need to manage more non-relational data (photos, documents, images, videos …).
This need increases by several orders of magnitude the volume of information to be handled, as well as the complexity at software development and, above all, systems operations.

Data growth

Traditionally, business applications consisted mainly of some type of user interface (with some kind of more or less sophisticated forms technology) that allowed different users to interact with the environment to enter and consult data. In one way or another, data ended up in a traditional relational database (Oracle, SQL Server, DB2, Informix, …). The complexity came from the fact that, depending on the application and the company, some of these tables could have millions of rows and the query of data by very varied criteria resulted in very sensitive criteria for optimizing the queries (the famous ‘query plan’ ). In the most operational aspects, the headache was the ability to recover in critical situations (backup, replication, disaster recovery procedures, …).

That has remained so until relatively recently (a few years), when those applications were becoming more sophisticated and needed to cover other business demands. It was not enough to save all the data of our client in his client record at the application, we also had to save, for example, the contract document signed between the parties and had to be accessible from the application itself; or the mail messages, with all their attachments, that we have exchanged with the client or supplier in a certain business process. This requirement has been solved from the SW development department of the companies, typically by one of three alternative ways:

We store this data in a file service and associate it in the database through a link. This is reasonably simple but it gives quite a few management problems, and also technical ones. We have two repositories to manage (which synchronize backups, for example). If the data is sensitive, we will need specific security policies in two media (database and operating system) that are not integrated in a particularly simple way; we also complicate the transactional consistency (canceling a complete transaction in database due to a failure is simple, if we also have data in a filesystem, things get more complicated).
We store this data in a document management system or similar. In many aspects it is quite close to the previous scenario, with the advantage that a document manager system provides more and better management services, but also a greater complexity in operations and administration (we must manage, patch, upgrade two complex systems).
We store this non-relational data in the database. This is, at the level of software engineering, the simplest; unique interface (SQL), transactional consistency, a data-type (BLOB) that allows to handle objects of any nature. In many cases it is the option chosen by many customers. In particular, where the decision is directed or influenced by the software engineering team.

This last scenario leads us to the fact that these databases no longer only handle relational data, but that they must manage very high volumes of binary content. And, although the most advanced enterprise database technologies are capable of doing this, we quickly discovered that the cost of infrastructure and operation is skyrocketing. These critical environments require infrastructure of the highest quality and speed and that is paid for.

Object store. New storage paradigm

This leads us to look for alternatives and lately, with the explosion of the volume of data handled, the alternative of Object Stores for storing files and binary content has become very popular, where the Amazon AWS S3 service has become a de-facto standard. It seems sensible to move those binary contents to S3 or similar object storage with several clear and immediate benefits:

Unlimited storage with virtually no management required.
Much lower costs.
Possibility of exploiting these contents in alternative scenarios (for example, advanced analytics, machine learning) without having a direct impact on the databases of the transactional systems.
Simplification or elimination of conventional backup needs.
Possibility (depending on the technology) to apply information retention policies that facilitate compliance with data retention regulations, so that the repository itself ensures the inalterability and prevents its deletion during the defined period.

The advantages are multiple; but there are also drawbacks. The main one is the fact that these systems have access through an API that, without being very complex, forces us to change the entire data access and persistence layer in our business applications to save and access the information in this repository. And that can be a tedious job, subject to errors and with a certain risk, proportional to the level of complexity and obsolescence of our applications. And almost more important: it involves diverting the resources of engineering and software development in our company (always scarce) to solve an internal problem of IT, which does not provide a direct functional value to the business user.

Databases meet object stores

In this context, at Tecknolab we have proposed to provide a solution that allows to move binary content to the different options of object store repositories, both in public Cloud services (Amazon AWS, Microsoft Azure, Google Cloud Platform) and , for an ‘on-premise’ deployment in a local datacenter, with the main object storage manufacturer alternatives (Hitachi HCP, Dell-EMC ECS, IBM COS, among others). With this service, called DBcloudbin, the configuration at the database is immediate and, what is more important, transparent for the application; with the same software, the application continues to access the data through the database using SQL as before, but in reality the system is responsible for reading the data that has been moved to the object storage and providing it to the application as if were in the database. This gives us all the benefits of having the data centralized in the database (single access, transactional consistency) but with the savings of using a much cheaper infrastructure for those data that do not need the access speed of a relational database . For more details of the solution, visit https://www.dbcloudbin.com/solution

5 limitations that surprise in Public Cloud (and it’s not security)

by tecknolab | Dec 4, 2019 | Cloud, Strategy, Technology

The adoption of public Cloud in companies can lead to identify some surprising limitations. I propose five (and none of them is security).

There is a clear trend in companies of all sizes to move loads to the different IaaS services of Public Cloud that we find in the market, being Amazon AWS, Microsoft Azure and Google Cloud Platform the main global offers (with a clear leadership as of today from AWS).
When a company makes that decision, it will usually have done some pilots and proof of concept with the solution and, we suppose and wish, made some numbers to define the business case.

My experience with this process is that there are at least five things that surprise us because we can believe that it should be resolved or be simple, when we embark on this trip. And no, none of them is the security, everlasting ‘sin’ that has suffered historically and in my opinion little founded (in general it is safer than many alternatives of private service). These are five things that you will foreseeably clash when adopting public cloud in a company of a significant size, not an individual user or a small organization or work group:

1.- Define consumption quotas or resources to limit the use to a specific budget
In a private environment, resources are explicitly delimited. They are what they are. A great advantage of the public cloud is that the capacity is (almost) unlimited and we can grow as much as we want. But in a large company, this is a double-edged sword and it is not uncommon for our organizational model to set limits for organizations or groups that, in fact, map the budgets assigned to each of these groups. Well, that which seems obvious, is not so easy to do because the main public Cloud services do not allow to define quotas, only alarms (and in many cases can not be managed easily by the resources of each group or organization of our company). These alarms do not prevent the additional consumption of resources, they only warn.

2.- Predictability of the monthly cost
We know it, or we guess it when we embark on the Cloud; this is pay-per-use in the strictest sense of the term. In principle, once again, it’s good, we only pay for what we use. But in most companies, especially in the finance&control department, the unpredictability of a cost generates many nerves. The first thing that is required to the responsible for a certain service is a figure for forecasting the expenditure, often in the long term. If we combine this with the previous point, we see that this predictability is much lower than what we could have considered at the beginning and will possibly require economic models that are much more thoughtful and sophisticated than we would have thought at first.

3.- Transparency and ease in the allocation of costs
One of the principles of the Cloud model, is cost transparency. Yes it is. Since you are billed for what you consume it is obviously easy to receive an invoice with all the items billed and the cost that it entails. And maybe get scared with the minutes. But in a medium / large company that is not enough, one has to charge-back these costs to the different organizations and cost centers that compose it. And there the thing gets more difficult, with little intrinsic help from the tools of the service provider. It can become a real hell and a significant cost in effort and time to map those costs internally.
I remember a large long-established Spanish multinational company that came, a week later, to ask us to rescue from a scrap warehouse a server that had been decommissioned and that was actually used by ‘someone’. An organization with that level of internal control, imagine how it can suffer in a model like this.
The proof that this is a real problem is that there has been a niche market of companies that have generated SaaS solutions whose main purpose is to help customers control and manage their public cloud costs, integrating more easily the different interfaces to obtain information and report it appropriately for the client and its organizational model. And that has an additional cost, of course.

4.- Comparison of costs between Clouds
A virtual machine is a virtual machine. It is invented, it does not have big differences and it works more or less the same in the basics, no matter if we run it in a virtual infrastructure based on vSphere of vmware in our private environment or in an instance of AWS in Ohio or in an Azure node in Ireland. In addition, the typology of the service (as a IaaS service) is also very similar at least in what refers to the extent to where the provider responsibility ends and where starts ours.
All public cloud services have public and reasonably transparent pricing models (this is the Cloud …) so we can potentially do fair comparisons (supposed assured the level of performance or any other characteristic that is relevant to us) instances are similar). Price comparisons seem easy: if option A is 10% cheaper, then the global service for our company will be 10% cheaper (more or less). It’s that simple.
Nothing is further from reality. There are multiple factors that knock down this simplistic assertion but I will detail only two: (1) on the one hand the cost model is much more complex than the one of ‘virtual machine price per hour’. There are dozens of billable concepts (from IP addresses to network traffic or monitoring concepts) that are not homogeneous and can vary costs significantly (like 10 or 15%); (2) on the other hand, there are discounts for sustained use, which have three radically different models among the three main players (AWS discount by the a priori reserve of instances that is also managed by type of instance further complicating the model; Google uses a automatic discount model for permanent instances and Azure integrates it into the management of the global licensing contract that 90% of large and medium-sized companies have with Microsoft, the Enterprise Agreement).

5.- The public cloud is the cheapest option
A massive service such as AWS, Azure or Google is the one that, in principle, has all the ingredients to make the most of the economy of scale. This added to the fierce competition in this market leads to the logical scenario that the cost of a Public Cloud service must necessarily be lower than other alternatives (which will have their advantages for other factors but can hardly compete on price).
Well … not always. My own experience and others that I have had the opportunity to listen to, after reasonably sophisticated comparisons, have come to the conclusion that it may be more expensive. This point would give for an encyclopedia and in general every organization is a world and its situation and scenarios are too specific to be able to assert in a resounding and generalized way if it is cheaper or more expensive. My reflection and summary is that (1) the more mature, flexible and advanced is the application architecture and the systems operations & management in a company, the more likely it is to make a public Cloud service profitable and (2) do model reasonably and as complete as possible your scenarios and costs so that you can assess correctly if it is an advantageous option at the level of costs for your company. And yes, it is something that is complex enough to analyze so that some of us earn our living with it.