Today’s emerging data world needs just the Optimal service with minimal manual intervention. Simply to achieve this, Data migration is a necessity. Actually new inventions in technology like Mobile Apps and friendly UIs demands Database Systems to be compatible with new interfaces and features, and to cater this, we need to migrate to new Database Engines. The world is producing and storing more data than ever before. Heterogeneous DBMS assessments and cross platform database migrations to help customers modernize their databases and applications to leverage this amazing capability at enterprise scale is a must now a days.
Necessity is mother of invention and Data Migration is Father of Necessity :)
Most Customers planning data migration projects, perhaps modernizing the existing business processes, to save costs or become more competitive, requires retiring legacy applications and implementing new applications to support the new approach to business. May be they are integrating data from a merger or acquisition. Whatever the reason, data migration is the lynchpin of the larger initiative for the company, is investing strategy, budget, and precious time.
Why lot of databases? what to use?
Based on internet sources, Charles Bachman was the first person to develop the Integrated Data Store (IDS) which was based on network data model for which he was inaugurated with the Turing Award (The most prestigious award in the field of Computer Science the Turing Award, often referred to as the “Nobel Prize of Computing,” carries a $1 million prize, with financial support provided by Google, Inc. It is named for Alan M. Turing, the British mathematician who articulated the mathematical foundation and limits of computing.). The IDS was developed in early 1960’s.
In the late 1960’s, IBM developed the Integrated Management Systems which is the standard database system used till date in many places. It was developed based on the hierarchical database model. In 1970s, Edgar Codd developed the relational database model. Many of the database models we use till today are these model and was considered the standardized database model from then. After a decade (1980’s), IBM developed the Structured Query Language (SQL) as a part of "R project". It was declared as a standard language for the queries by ISO and ANSI. James Gray developed the Transaction Management Systems for processing transactions for which he also was felicitated the Turing Award.
For decades, since DBMS is initiated, the predominant data model that was used for Database Manageent was the relational data model used by all relational databases such as Oracle, Sybase, DB2, SQL Server, MySQL, and PostgreSQL. It wasn’t until the mid to late 2000s that other data models began to gain significant adoption and usage. To differentiate and categorize these new classes of databases and data models, based on the work load and transactions profile besides OLTP, OLAP the term “NoSQL” was coined which is used interchangeably with “nonrelational.”
Kinds of DBMS:
OLTP -> Regular RDBMS strictly comply ACID (atomicity, consistency, isolation, and durability) properties.
Most popular DBMS since 1970 till 2000s. At the time, storage was extremely expensive, so database engineers normalized their databases in order to reduce data duplication (Normalization is a process of avoid Redundancy). As most of us aware of these Databases like Oracle, Sybase, MSSQL, MySQL, PostgreSQL and DB2, simply OLTP is many small transactions in parallel. A large number of short on-line DML transactions (INSERT, UPDATE, DELETE) besides thousands of SELECTs. The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments and an effectiveness measured by number of transactions per second. Most DBMS during until early 2000s were RDBMS and especially OLTP databases only.
OLAP -> All
statistical and Analytical Systems especially with ColumnStore.
Redundancy and scale of data needed a different strategy and regular RDBMS is unable to provide the required throughput/response time and Columnar and Key Databases are emerged since 2010 but to the peaks in mid 2010s.
Analytics/Statistical queries seldom touch most columns of the table, but typically a single or few essential columns, but probably all or most of the Rows/Records. Keeping all the values of a particular column together, the Query processing becomes much more efficient than when the data stored in rowstore. When we need a subset of columns, we use indexes for better performance in Rowstore RDBMS but still it needs to fetch data from many rows, whereas all required entities stored together in a columnar database, results 100s of times faster results.
- Amazon RedShift, Sybase IQ and SAP HANA are most popular column store RDBMS.
Open source RDBMS
Open source databases allows users to create a DBMS based on their unique requirements and business needs besides get rid of huge licensing costs of regular RDBMS. It is free and can also be shared. The source code can be modified to match any user preference. Open source databases address the need to analyze data from a growing number of new applications at lower cost.
- MySQL remains on top as the #1 free and open source database, representing over 30% of open source database use.
- PostgreSQL came in 2nd place with 13.4% representation from open source database users, closely followed by MongoDB at 12.2% in 3rd place.
NoSQL
NoSQL databases (aka "not only SQL") are non tabular, and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. NoSQL databases allow database developers to store huge amounts of unstructured data, giving them a lot of flexibility, as storage costs rapidly decreased and software engineers cost increased, the amount of data to store and query increased to a great extent besides data entities/elements came in all shapes and sizes i.e. structured, semi-structured, and polymorphic plus defining the schema in advance became nearly impossible, NoSQL became is the optimal solution.
Type of Database | Popular Engines |
---|---|
Key–value cache | Apache Ignite, Couchbase, Coherence, eXtreme Scale, Hazelcast, Infinispan, Memcached, Redis, Velocity |
Key–value store | Azure Cosmos DB, ArangoDB, Aerospike, Couchbase, Redis |
Key–value store (eventually consistent) | Azure Cosmos DB, Oracle NoSQL Database, Dynamo, Riak, Voldemort |
Key–value store (ordered) | FoundationDB, InfinityDB, LMDB, MemcacheDB |
Tuple store | Apache River, GigaSpaces |
Object database | Objectivity/DB, Perst, ZopeDB |
Document store | Azure Cosmos DB, ArangoDB, BaseX, Clusterpoint, Couchbase, CouchDB, DocumentDB, eXist-db, IBM Domino, MarkLogic, MongoDB, Qizx, RethinkDB, Elasticsearch |
Wide Column Store | Azure Cosmos DB, Amazon DynamoDB, Bigtable, Cassandra, Google Cloud Datastore, HBase, Hypertable, Scylla |
Native multi-model database | ArangoDB, Azure Cosmos DB, OrientDB, MarkLogic |
- Document databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. According to DB-engines rankings/statistics, Relational Databases Oracle, MySQL, MS SQL and PostgreSQL tops the charts and MongoDB is consistently ranked in the 5th place as the world’s most popular NoSQL.
- Key-value databases are a simpler type of database where each item contains keys and values. Redis and DynanoDB are popular key-value databases. Redis is the only key-value database in top 10 consistently.
- Wide-column stores store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores.
- Graph databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Neo4j and JanusGraph are examples of graph databases.
Why Database Migrations?
As a DBA and guardians of data, we are constantly faced with the challenge of moving data from one platform to another for a multitude of reasons.
For most Customers and businesses data is become a gold mine for their critical analytics and processes and the aggressive scale up in size and usage became a crux to manage/maintain without compromising the performance. As the type of data stores, requirements, workloads and volume of the data continue to increase, an innovative solution is necessitated.
As a result many Customers are moving their databases into CLOUD and are implementing DBaaS model to rapidly deploy their databases into the cloud.
A few vital advantages of databases on cloud are:
- Optimised Costs. As cloud computing is now more cost effective it means that your IT overheads will be reduced significantly, optimising costs as it is more of a pay as you go cost structure; maintenance, software, licensing etc cost are all picked up by the cloud provider meaning your data will be in good hand and at a reduced rate.
- For clients who have data that is regularly accessed, the cloud allows you to scale the database up or down in high usage periods to deal with the extra demand.
- Faster time to market. Thanks to the Cloud you’ll be able to swiftly move your new/ existing applications straight to market; whereas offline this operation could take days or even weeks.
- Due to datacentres now being available within the Regions, organisations that must abide by certain legislations and store their data in the country of origin now have the availability to do so. Higher availability coupled with constant database back-up and recovery allows for geo-regional, secure data storage.
- Most cloud providers offer several 9s of consistency and reliability when it comes to data in the cloud. Meaning that their SLA’s are met almost every time and customers can access their data quickly and with confidence that it will be available when needed.
BTW, Along with migration of Data to a better, faster and optimal system, the ability to validate and clean existing data can be an important feature of a system. This is independent of migration. There are often mechanisms to modify data which are outside the control of the system. This can cause data to become invalid. Other data problems result from bugs in the application. Running the validation routines periodically can help identify the problem and allow the data to be cleaned before it is time for migration. As has been noted cleaning the data early can make the migration easier.
Let’s discuss various Database Migration tools/utilities and Replication technologies in the upcoming blogpost, till then, Have a safe, pleasant and productive Days ahead!
Thanks much for many authors and sites for the information I have read and used here.