A Database Management System (DBMS) is a system for the management of digital databases, stand alone or multiple, including the storage of database content, creation and maintenance of data, search, and other core data functions.
An effective Database Management System is essential for competitive use of data analytics software. To data mine their vast store of information, companies rely heavily on their DBMS. Indeed, the DBMS is a core tool in the enterprise data center.
There are different types of Database Management Systems in existence, with the most prominent being Oracle, IBM DB2 and Microsoft SQL Server. These products and many more offer access to multiple users based on different levels of privileges for different users and are controlled centrally by a single administrator. The database is managed by a database administrator, or DBA.
A DBA is in charge of maintaining the database(s), with their primary responsibility to maintain data integrity. This means the DBA has to ensure that data does not get corrupted, is backed up and immediately recoverable and is secure from unauthorized access.
Four Key Components to a DBMS
The four data technologies define any given Database Management System.
1) The modeling language, which defines the language of each database hosted in the DBMS. There are a number of approaches – hierarchal, network, relational and object – but relational is the most popular and ubiquitous.
2) Data structures, which help organize the data such as individual records, files, fields and their definitions and objects such as visual media.
3) Data query language, which handles how queries are made to the system and maintains the security of the database by monitoring data access rights of users. The overwhelming favorite is the SQL language, especially in Relational Database Management Systems.
4) Finally, the mechanism that protects transactions by protecting data integrity and making sure same record will not be modified by multiple users at the same time.
Database Management Key Issues
Your Database Admin, DBA, has several issues that are top of mind in the day-to-day operation of his or her job. They include:
1. Scalability
No one likes a slow response time from a database. Speed is of the essence. The question is do you scale up, using a very large server and database and increasing the CPUs, memory, and storage as use demands it? Or scale out, distributing the workload across multiple machines and geographic locations.
2. Cybersecurity
Data breaches inevitably involve a database. Your security has to work every time, the bad guys only need to be right once. That means constant monitoring and staying on top of patches and vulnerabilities. And educate your staff on corporate data security policies.
3. New data sources
It used to be databases were filled with the traditional data sources, like user input and transactions. But in recent years, data sources have expanded to include IOT, sensors, and social media. This data explosion has companies struggling to cope with the influx of data and rapidly expanded databases.
4. Decentralized data
Databases have traditionally been held in a single, giant repository, but there has been a trend toward decentralized storage, especially as companies expand globally. There are benefits to decentralized data management, such as redundancy and load balancing, but there are challenges as well, such as how will the data be distributed, what’s the best decentralization method, and regulatory/compliance issues.
5. Database Talent
DBAs are rare. Good DBAs are even more rare. The result is salaries for experienced DBAs can easily run into the six-figure range, and that’s if they are on the market for a job to begin with. But you can’t live without them, either, so DBAs are going to be a necessary expense.
6. Siloing
This is one of those issues that requires considerable foresight and planning. Companies start small and grow. Or perhaps they merge with another. The result is multiple databases with multiple silos of information that have no connection to each other. This means a major overhaul of the back end to erase the silos and give a full picture. A little planning in the beginning can save a lot of heavy lifting later.
7. Commercial or Open Source
Most industries consolidate over time, but there is a rich choice of open source databases on the market, including PostgreSQL, MariaDB, CockroachDB, Neo4j, MongoDB, Redis, Cassandra, SQLite and more. The main reason for the variety is they all have specialties; CockroachDB, for example, specializes in distributed clusters rather than a single monolithic database.
While free to use, open source databases aren’t as well supported as Oracle or SQL Server and talent might prove harder to come by. As the saying goes, you get what you pay for.
8. Disaster recovery
In the good old days of a decade or two ago, DBAs’ backup plans typically consisted of making regular database backups and storing them off-site. If a disaster occurred, the backup could then be used to restore databases at their last point of backup.
That doesn’t work anymore. Too much data is coming in to allow for a once-a-week backup because you could lose up to six days of data. Databases may be too large to back up conveniently or in a timely manner.
There are two options for DBAs: hefty on-premises backup, including high capacity disk arrays, high-capacity tape storage and even hybrid hardware and software solutions, or the cloud. The former has the pluses of being immediately available but the drawback of being expensive, while the latter is convenient, requires no hardware purchase but may be subject to latency. There are a number of vendors who provide data protection as a service, or disaster recovery as a service, including IBM and Microsoft.
Moving to the Cloud
Moving a database to the cloud creates multiple issues, hence it warrants its own segment separate from the other key issues. The good news is that all of the major on-premises databases – Oracle, DB2, SQL Server, MySQL, and PostgreSQL – are available from major cloud services providers, along with smaller players like MariaDB and Apache Cassandra. This makes migration from on-premises much easier.
Moving to the cloud solves several on-prem issues. For starters, you don’t have to worry about DBAs because the cloud service provider manages the database for you. The cloud service provider also handles backup, patching, load balancing, and distribution, taking a number of chores off your hands.
But then there is the issue of data movement. Everything in the cloud is metered. Moving a multi-terabyte or even petabyte database to the cloud can become very expensive. And it can get even more expensive if you have to pull that data down on-prem for whatever reason.
It can also get difficult. There have been anecdotal stories of companies having to fight their CSP to get data back. Once your data has moved to cloud, you become more dependent upon the provider and could be locked into its services, giving the CSP leverage over you in negotiating contract terms.
In moving to the cloud, don’t assume permanent residence. Make sure the contract describes the process by which your data will be returned, quantify the costs and set a time expectation for when the data is to be returned.
And once again the regulatory issue comes into play. The major CSPs all say they are HIPAA and financial regulatory compliant but you still might want to keep particularly sensitive data in-house.
The general consensus is leave data on-premises if it’s already there, and build new databases with new data in the cloud, and do all of the processing there. So move data back and forth as little as possible.
Database vs. Data Warehouse
Databases and data warehouses are sometimes confused, in no small part because of their name, but they are very different. Databases are either SQL (relational) or NoSQL (non-relational). A SQL database organizes information into rows and columns and is highly structured. A NoSQL or non-relational database uses any paradigm for storing data besides rows and columns and is ideal and popular for non-structured data.
A data warehouse is a system that aggregates, stores, and processes information in a structured format for the express purpose of analyzing business data to gain insight, a technique commonly referred to as Business Intelligence (BI).
So a standard database might hold customer information for a business but a data warehouse holds their transactions and is used to gain consumer insight; is there a particular time of day or day of the week when sales are very good or very bad? Is there a preferred method of purchase? What products have a high return rate? And so on.
So the primary purpose of a data warehouse is for companies to analyze all of their data to derive the most accurate business insights and forecasting models, whereas a database is more general storage.
Database vs. Data Lake
A data lake is a repository, yet it bears little to no resemblance to a standard database. A data lake is a large storage repository that holds a huge amount of raw data in its original format and you don’t use it until you need it. Whereas your data is neatly structured in a data warehouse, data stored in data lakes isn’t processed until you use it.
Data lakes are primarily used by data scientists and engineers rather than business users. If you think DBAs are expensive, wait until you hire a data scientist. They are used for processing very large amounts of data of all types. So they are of little use to small and medium-sized businesses.