Normalization in DBMS – 1NF, 2NF, 3NF and BCNF

Here you will learn about normalization in dbms with examples.

What is Normalization in Database?

Most of the projects and websites contain lot of information. They store the data in tables. Since lot of data therefire database contains so many tables. In order to retrieve some information we must go through number of tables which leads to complex quires.

We may think that the solution to avoid these complex quires is instead of number of tables store all information in one big table and retrieve required information from it. It works fine but in this case table will contain redundant information. And also arises other anomalies. They cause problems when inserting new data, updating and deleting. i.e if we try to delete some field in database due to mixing all data along with this data, some other data may lost. Same problem when updating, other fields also may update since all are in same table. And inserting data may give, so much of redundant data.

To avoid these problems of inserting, deleting and updating data, divide the tables as small as possible. The ideal size of the table is two attributes. To make this way we should follow some rules for dividing tables. First test each table whether it’s design is good or not. If not good go for division (split the table). This process of splitting tables is called Normalization.

So, Normalization means split the tables into small tables which will contain less number of attributes in such a way that table design must not contain any problem of inserting, deleting, updating anomalies and guarantees no redundancy.

 

To do normalization we use concept of “Functional dependency” and “Candidate keys”. Using these concepts we check whether table design is good or not, if not good we go from Normalization (splitting the table).

Prerequisite

For better understanding of this concept you should know about:

  • What is functional dependency
  • How to find candidate keys
  • How to find closure
  • Basic knowledge about all keys

Without these also you can understand what is 1NF, 2NF, 3NF, BCNF. But to work with more complex problems you should know those concepts.

Our final aim is that, after normalization, whatever the functional dependency (FD) applicable on table (let FD is x -> y), the left hand side is always must be a key (here x). This final form is called BCNF

BCNF guarantees zero (0) % redundancy.

To go to BCNF first we have to check whether the table is in 1NF, 2NF, 3NF and then BCNF.

Note: Any Normal form aim is that to reduce redundancy and to avoid anomalies.

Normalization in DBMS

First Normal Form (1NF)

First Normal Form says that table is flat i.e in table there must no multi-valued and no composite attributes.

Example:

IDNameCourse
1NeerajC1
2PankajC1, C2
3PawanC3

In above case Pankaj has two courses C1 and C2, so Course is multi valued. Hence it is not in 1NF.

Below table in is in 1NF.

IDNameCourse
1NeerajC1
2PankajC1
2PankajC2
3PawanC3

We need not worry about this, because while converting ER model (diagram) to relational model (table), we follow rules that they guarantees all attributes free from multi-valued and composite.

So any relational table by default is in 1NF.

Second Normal Form (2NF)

Second Normal Form says that, if candidate key containing more than one attribute then any part of that key (called partial key) should not determine anything.

Example:

RollNumStuNameCorNumCorNameResult
1Neeraj3DBMSA
2Venkat2OSC
3Phani1FLATB

Functional dependencies are:

  1. The attribute student name functionally depends on roll number. So RollNum -> StuName
  2. The attribute course name functionally depends on course number. So CorNum –> CorName
  3. The attribute Result depends on Roll number and Course number. So RollNum, CorNum -> Result

We can find that RollNum, CorNum  combine form as candidate key.

Here we can see that a part of candidate keys are deriving other things which we called partial dependency. FD1 and FD2 are partial dependencies.

So this table is not in 2NF. To convert into 2NF find the closure where problem occurred and split the table with that result.

RollNum+ = { StuName } ;;;  CorNum+ = { CorName } ;;;

These two should be separate tables and generally with candidate key other table formed.

Resultant tables which are in 2NF:

Table 1:

RollNumStuName
1Neeraj
2Venkat
3Phani

Table 2:

CorNumCorName
3DBMS
2OS
1FLAT

Table 3:

RoNumCorNumResult
13A
22C
31B

For table 1 candidate key is RollNum, for table 2 candidate key is CorNum since these tables has a single prime attribute we can say these two are in 2NF. Coming to the third table candidate key is RollNum and CotNum combine. But on this table there is only one functional dependency is existing. So this is also in 2NF.

i.e. 2NF is based on Full Functional Dependency. No partial keys are allowed. So in 2NF we checked for partial dependency and eliminated.

Important Note: Whenever you find a part of key on left hand side of FD, don’t confirm that it is partial dependency. Check right side also if right hand side is non-prime attribute then only it is partial dependency. If right hand side also prime attribute it is not a partial dependency.

Third Normal Form (3NF)

Third normal form says that there is no “Transitive Dependency”.

We know the rule of transitivity that, If A -> B and B -> C then A -> C. We can find the transitive dependency FD’s in such a way that, “Non-prime attribute derive something”. If any FD is like this we can say that it has Transitive dependency and we need to eliminate it to make it into 3NF.

We can check 3NF in other way also, formal definition of 3NF is:

Definition: A relational schema (table) is in 3NF if and only if every non trivial FD X -> Y

Either X is a super key or Y is a prime attribute (it is part of some candidate key). If this definition follows there is no chance of transitive dependency.

Example:

Student Table

StuIDStuNameRollNumClassCodeClassName
1Mishra12CS1Lect.Hall
2Amit14CS2Lab
3Jack16CS3Theorey

Functional dependencies:

  1. StuID -> StuName, StuName, RollNum, ClassCode
  2. ClassCode -> ClassName

Here StudID is candidate key which can able to derive everything. So one and only prime attribute is StuID only. But we can see that 2nd FD i.e ClassCode -> ClassName in this ClassCode is a non-prime attribute which is deriving something. So this is not in 3NF.

To convert into 3NF find the closure where problem occurred to split the table.

ClassCode+ = { ClassName };

Resultant tables after splitting are

Student Table

StuIDStuNameRollNumClassCode
1Mishra12CS1
2Amit14CS2
3Jack16CS3

In this table StuID is candidate key and only one Functional dependency existing which is StuID -> StuName, RollNum, ClassCode. So there is no problem this is in 3NF.

Class Table

ClassCodeClassName
CS1Lect.Hall
CS2Lab
CS3Theorey

In this table ClassCode is candidate key and only one functional dependency existing which is ClassCode -> ClassName. So this table is also in 3NF

Boyce Codd Normal Form (BCNF)

To make sure zero % redundancy two scientists Boyce and Codd invented this BCNF. In BCNF result each FD determinants (left hand side attribute) must be a key.

Definition: A relational schema R is in BCNF if whenever a non-trivial FD X -> Y , X should be a super key.

Example:

IpAddPortNumProcessReq
10.4.9.3480Register Application form
10.11.4.99110Gmail message request
10.1.11.11125Remote User request

Functional dependencies exist on this table are:

  1. IpAdd, PortNum -> ProcessReq
  2. ProcReq -> PortNum

Applying normalization means converting into BCNF. For that we first check 1NF, 2NF, 3NF.

By default every relational schema is in 1NF.

Before proceeding to next normal forms, we should find candidate keys. If we find candidate keys we get { IpAdd, PortNum } and { IpAdd, ProcessReq } are candidate keys. So prime attributes (part of candidate keys) are IpAdd, PortNum, ProcessReq. As per formal definition of 3NF, if right hand side has prime attribute, it is enough to say that it is in 3NF. Since all attributes are prime attributes we can say that table is in 3NF also. If already in 3NF, no need to check 2NF. So up to 1NF, 2NF, 3NF all are fine.

Now check for BCNF. According to the definition of BCNF left hand side should be key. So FD IpAdd, PortNum -> PorcessReq . Therefore AB is a key there is no problem.

Other FD PorcessReq -> PortNum, here this FD not deriving all attributes, since it’s not deriving everything ProcessReq is not a key. We can say that it is not in BCNF.  To make it into BCNF,

ProcessReq+ = { ProcessReq, PortNum } is a separate table.

PortNumProcessReq
80Register Application form
110Gmail message request
25Remote User request

And { IpAdd, ProcReq} is other table.

IpAddProcessReq
10.4.9.34Register Application form
10.11.4.99Gmail message request
10.1.11.111Remote User request

On table PortNum, ProcessReq, Functional Dependency is ProcReq -> PortNum, here ProcessReq is key, so satisfies BCNF. And on table IpAdd, ProcessReq, { IpAdd, ProcessReq } itself a key, so it also is in BCNF. But here we lost FD, { IpAddr, PortNum } -> ProcReq so called this is a not functional dependency preserving result even it is in BCNF.

Finally this BCNF guarantees that there is no redundancy and no problem of anomalies of inserting, updating and deleting.

Comment below if you have queries or found any information incorrect in above tutorial for normalization in dbms.

How to do MySQL Server Log Maintenance

MYSQL SERVER LOG MAINTENANCE

As a part of database administration, DBA has to take care of sub-components of database like server logs and has to plan for maintenance activity for these components regularly.

MySQL has various types of log i.e binary log, error log, slow query log, general log for different purposes. And after certain time these logs will grow and you will start seeing issues like low disk space, a large number of logs etc.

MySQL allows you to flush logs using flush logs command, next “How to rotate and clean up old MySQL logs? ”

Linux has a utility called “logrotate” , using logrotate we can implement log rotation for MySQL server logs.

Binary logs: This one is critical if you have replication setup, By enabling  expire_logs_days mysql variable you can manage cleanup and flush logs cmd will rotate binary log.

For General and Slow query that’s not the case, “flush log” cmd will flush the content from memory to respective log files, but it will not rotate these logs. logrotate by default configured and managed with OS root user.On a Linux (Red Hat) installation, you can use the mysql-log-rotate script for this. If you installed MySQL from an RPM distribution, this script should have been installed automatically. It kind of sample script for full implementation, let’s create a separate mysql-log-rotate script.

Prerequisites:

USER and Privileges:

1
2
CREATE USER  'logadmin'@'localhost'IDENTIFIED BY 'xyzpwd';
GRANT RELOAD ON *.* TO 'logadmin'@'localhost';

Secure user credentials using mysql_config_editor:

1
2
3
shell> mysql_config_editor set--login-path=logadmin_client --host=localhost --user=monitor --password                                                                                    
Enter password:<enter_mysql_logadmin_user_password>

NOTE: It will store user credentials info into .mylogin.cnf (This conf file will be get created under current OS user home directory)

mysql-log-rotate script

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
/PATH/log/mysqld.log /PATH/log/slow-query.log /PATH/log/general-query.log {
create 640 mysql mysql
rotate 5
daily
minsize 1M
notifempty
missingok
compress
postrotate
# just if mysqld is really running
iftest-x /usr/bin/mysqladmin
/usr/bin/mysqladmin--login-path=logadmin_client ping>/dev/null
then
/usr/bin/mysqladmin--login-path=logadmin_client flush-logs
fi
endscript
}

NOTE: Above script will flush logs 3 times since we have 3 logs in one code block.To flush log only at once you can create separate rotate code block for each log and add postrotate script only in the last rotation code block.

Automation:

Just add this script into crontab entry:

1
00 03 * * * /usr/sbin/logrotate-s /PATH/log/logrotate.status /PATH/monitor/mysql-log-rotate.sh > /PATH/log/logrotate_cron.log 2>&1

Key points:

  • You can set rotation on the basis of SIZE, TIME or both. Explore logrotate option for more options.
  • -s /PATH/log/logrotate.status file will get create/update with log name and timestamp, Which will get use for next rotation on the basis of filename and timestamp it has.
  • -f, --force
    Tells logrotate to force the rotation, even if it doesn’t think
    this is necessary. Sometimes this is useful after adding new
    entries to logrotate, or if old log files have been removed by
    hand, as the new files will be created, and logging will continue correctly.

Please comment for any suggestion, concerns.

How to TRUNCATE multiple Tables in MySQL

In this post we will learn about how can we truncate multiple tables in MySQL database management system.

truncate multiple tables in MySQL

Why we cannot TRUNCATE Multiple Tables in MySQL directly:

The basic syntax of TRUNCATE is:

In this case we can truncate only one table at a time from a single database.

What if we need to TRUNCATE multiple tables from different databases in MySQL which could include more than 1000 tables? Do you want write this line by line? I guess no, so let’s find out how to do this:

Solution:

This can be achieved using metadata information of  database. INFORMATION_SCHEMA database holds this information, it contains all the information like information about all the databases that the MySQL server maintains, information of all the users that can access the MySQL server etc.

Use the following query to get the list of tables from multiple databases:

Now execute the result of this query to truncate all the required tables.

We can also execute this query in a way so that we do not need to copy paste the output of it, simple take the ouput in a file and execute it.

Now we have output of this query in output.sql, now execute this:

Note: We may get this error:

That happen if there are tables with foreign keys references to the table you are trying to drop.

For this you can read this post to know the details:

MySQL tables with Foreign Keys

For the quick answer:

Before truncating tables All you need to check  FOREIGN_KEY_CHECKS variable value

Truncate your tables and change it back to

Hope this helps, comment for any suggestion, concerns.

How to install MySQL from Scratch

In this tutorial we will learn about how to install MySQL Database server using the RPM’s on Linux operating system. As we all know MySQL’s community version is free of cost so we will simply download the RPMs from MySQL site and I’ll show you the procedure of installing it on any of your Linux OS.

Step by Step guide to install MySQL:

1.  Download the latest RPMs from MySQL site from here https://dev.mysql.com/downloads/mysql/

Install MySQL

Download the RPMs as per your CPU architecture, if your CPU is 32 bit then download the RPMs with i386 at the end or x64 if your CPU is 64 bit. If you do not know your CPU architecture just execute this command

As you can see mine is x86_64 that means I have 64 bit CPU.

Now as you can see there are so many RPMs so which one to download, just download the following one as others are not that much necessary.

  • MySQL-client-advanced-5.7.19-1.el6.x86_64.rpm
  • MySQL-server-advanced-5.7.19-1.el6.x86_64.rpm
  • MySQL-shared-advanced-5.7.19-1.el6.x86_64.rpm
  • MySQL-shared-compat-advanced-5.7.`9-1.el6.x86_64.rpm

 

2. Install the RPMs

Now as we have downloaded the RPMs we just need to install it now, to install execute to below command:

This will install the RPMs, just install it one by one.

3. Configuring MySQL:

Now as have the RPMs installed we will now configure MySQL as per our needs, now you can find the configuration file of mysql at:

Simply edit it and change whatever your requirements are, for example if you need a custom data directory instead of default (/var/lib/mysql is default) just edit the datadir variable in the my.cnf file.

If you do not know about the parameters simply leave it as it is.

 

4. Run the installation script:

Just execute below command:

The purpose of the mysql_install_db program is to initialize the data directory, including the tables in the mysql system database. It does not overwrite existing MySQL privilege tables, and it does not affect any other data.

 

5. Starting MySQL Server:

Now as we have installed MySQL, we will start the MySQL server now, to do this simply execute below commands:

We can also use these commands too:

 

6. Script for setting passwords and other basic stuff:

Now we have a running MySQL server, its time do some basic stuff like setting root password, if you want to keep some default databases for playing with stuff you will get the option. So for all this execute the below command:

This script will give you options various options like:

  • Change the root password
  • Remove the anonymous user
  • Disallow root login from remote machines
  • Remove the default sample test database

Below will be the output while executing the script:

7. Verify MySQL Installation and Login:

Now we have everything set, we can verify the current version of MySQL running using following command:

Connect to the MySQL database using the root user and make sure the connection is successful.

Now you have entered the MySQL console and you can do anything as per your requirement.

Hope you find this useful, any problems, suggestions are most welcome via comments.

 

What’s new in MySQL 8.0?

So, a good news for all MySQL fans that the world’s most famous RDBMS is releasing it’s new version MySQL 8.0. The official release date is not declared yet but this is going to be exciting as they have added some really useful new features.

 

MySQL 8.0

So, why waste any time let’s see what’s new in MySQL 8.0:

 

  • Roles – Very exciting feature that allows to create roles in MySQL server and assign specific privileges to them. These roles can be assigned to users. So from now you don’t have to remember which permissions a programmer from team X needs, and should a QA from team Y needs privilege Z. Also, it’s very easy to set up:
    • Creating a new role:
      CREATE ROLE 'app_developer', 'app_read', 'app_write';
    • Assigning privileges to roles:
      GRANT SELECT ON app_db.* TO 'app_read';
    • Assigning the role to a user:
      GRANT 'app_read' TO 'read_user1'@'localhost', 'read_user2'@'localhost';
  • Invisible index – ever wanted to just hide an index you currently don’t need, without actually dropping it? Now you can. If you’re not sure if you need an index, you can mark it as invisible and MySQL optimizer won’t use it. After monitoring for your server and queries performance, you can decide to re-activate it in case you believe it will result in a performance boost.
  • Configuration Persistence – Changing configuration during MySQL runtime is commonly done using SET GLOBAL. This disadvantage in this technique is that the changes will not survive a server restart. here comes SET PERSIST to the rescue, which does exactly that, apply configuration changes which survive a MySQL server restart.
  • Default character set and collation – Starting MySQL 8.0, the default character set will be utf8mb4 and the collation will be utf8mb4_800_ci_ai. These are great news and one step ahead towards standardized multilingual support in data driven applications.
  • UUID Enhancements – UUIDs are usually used to generate unique ids in tables. Starting this new version, MySQL can hold these values in a VARBINARY(16) column instead of CHAR(36). The impact of this change is better storage usage and performance improvement. Also, three new functions were introduced to handle these UUID values: BIN_TO_UUID(), UUID_TO_BIN(), IS_UUID().
  • Cost Model Improvements – for the first time, MySQL’s cost model will look into the memory and check if the relevant data for the query already resides in memory. As a result of this change, different query plans can be chosen. this will happen automatically without need for extra configuration.
  • Descending Indexes – MySQL now allows to create descending indexes and scan them in a reverse order, without performance penalty. This was possible in the past, but you would have to take the performance hit on that.
  • Common Table Expressions – CTE is a new feature (which is already available in other databases) that will simplify the way you write complex queries. To put it in simple words, using this feature (the WITH select) will automatically create a temporary table behind the scenes, which you can use in the same query and refer to it. This temporary table / view can only be used in that single query. This feature introduces both readability and performance improvements.

Understanding MySQL Slow Query Logs

Hello all, here we are going to discuss about one of best performance problem solving method i.e. to understand the MySQL slow query logs and take important measures to ensure peak server performance.

 

Let’s take an example of Slow Query Log:

Now let’s understand what each line indicates:

 

  • The first line shows the time the query was logged. The format is YYMMDD H:M:S. We can see the query above was logged at 2017 April 9th at 20:10 –

NOTE: This is server time, which may be different from your local time.

  • Next, we’ll see the MySQL user, database, and hostname/ip address.
  • The third line shows the total query time, the Lock time, the number of Rows “sent” or returned, and the number of rows examined during the query.
  • Below that you’ll see SET timestamp=UNIXTIME; < this is the time the query actually happened. It is good to check that you’re not looking at a problem that happened months ago if you’re trying to figure out why something is slow right now. I’ll explain how to turn that into a usable time below.
  • The final line shows the full query.

 

How to make Timestamp human-readable:

This will work on linux/unix.

 

I hope this helps let me know via comments if any information is not correct or missing.