Java, MySQL & Ubuntu (& Relational Database Sizes & Performance)

This may be quite a lengthy post. I’ll come back to the top after I’ve written it and try and provide some context as to what is covered (and where…).


1. MySQL Installation
2.Using MySQL – database/table creation
3. Connection to Db Via Java
3.1 Insert command via Java
4. Finding Physical Table Size Command (MySQL)
5. Physical Size Comparison
6. Query Performance Comparison
7. Considerations
8. References


There is a database. Its structure is very small – about 4 tables, of which I’m particularly interested in 1 – resource_access. This table is fairly straight forward containing about 9 fields, 7 of them are Strings, 1 int and 1 date/time. Its quite significant in size though – containing close to 100,000,000 rows, and generally I’ve found the performance of running queries to be quite poor.

So I decided to investigate. The structure could be changed to remove quite a bit of duplication in the table, by introducing a couple of other tables, and the use of foreign keys. I wanted to see what performance gains could be made, and also if there were any reductions in size (as in physically disk space).

And so it begins.

1. MySQL Installation (on Ubuntu 9.10)

Very straight-forward! Top marks Ubuntu. In a terminal issue the following command:

sudo apt-get install mysql-server

1.1 MySQL Administrator GUI

A very handy tool, recommended by someone in ref [1]. Just go to Add/Remove Programs and search for MySQL and install ‘MySQL Administrator’.

2. Using MySQL

Start the MySQL Admin program, and then from the Tools Menu select MySQL Command Line.

Create a test database ‘currentstructure’ by issuing the following command:

create database currentstructure;

Select to use that database via:

use currentstructure;

Then create our table:

create table testtable (
field1 varchar(50),
field2 varchar(50),
access datetime,
field3 varchar(50),
num int

We’ve now go our current database structure – obviously my fields are given proper names, just wanting to hide some field names in this post.

I also spent the time to create a second database, containing 3 tables – namely resource_access, user and entity. The resource_access referenced the user information via a foreign key, likewise the entity table.

3. Connecting via Java

Rather than populating this data by hand (I was aiming to put in 6,000,000 rows of data) I decided I wanted to randomly generate it via Java. So I created a Java application in NetBeans.

Connection con = null;

try {
   con = DriverManager.getConnection("jdbc:mysql://localhost:3306/currentstructure", username, password);

      System.out.println("Successfully connected.");
      System.out.println("connection appears to be closed :o(");

} catch(Exception e) {
   System.err.println("Exception: " + e.getMessage());
} finally {
   try {
      if(con != null)
   } catch(SQLException e) {}

In order to get this code to work, I needed to download the MySQL Connector jar file and reference it in my project. I went round the houses with this, but in Ubuntu the easiest way to do this is via the Synaptic Package Manager and search for MySQL – in there will be the connector. The jar file then resides in /usr/share/java directory.

Also in order to get the connection string, I used the NetBeans Services window, created a connection to the MySQL database – this displayed the string I required.

3.1 Populating Data via Java

To insert random data I just used a random number generator and fixed this onto the end of some constant strings. The code for performing the insert is here:

for (int i = 0; i < 250000; i++) {
   String entity = "ENTITY" + rand.nextInt(2000);
   String username = "test" + rand.nextInt(1000);
   String todo = "INSERT into fileaccess " + "(field1, field2, username, field3) "    + "values ('09', 'a', username, '" + entity + ", NOW())";

   java.sql.Statement s = con.createStatement();

Note – very similar code was executed to populate the relational database structure – populating the users table with 1000 users, and the entity table with 10000 entities. The pk1 from these were then randomly selected using the number generating to populate the data in the resourceaccess table.

4. Finding out physical size of table (in MySQL)

In order to find out the physical size of a table in MySQL execute the following code:

show table status like '%';

5. Physical Size Comparison

It turned out that the current structure uses about 80Mb of data per 1,000,000 rows.
The relational database structure uses about 50Mb of data per 1,000,000 rows.

6. Query Performance Comparison

Running the same query across the two database structures, it took 11 seconds to return results for 3,000,000 rows in the current structure, and 21 seconds when 6,000,000 rows were present.

In the relational structure, for 3,000,000 rows it took 0.5 second, and for 6,000,000 just over 1 second.

7. Considerations

* Running locally, doesn’t count for network issues.
* The random nature meant different number of rows were returned from the different structures – although not massively different in this sample.

8. References

[1] Ubuntu MySQL installation:
[2] Java MySQL Connection Code:

9. Late Update

I’ve since discovered an article about DBMonster which can be used to populate databases. The article is here.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s