I’m Sick of Databases!
I’m Sick of Databases!
9/20/07
I’m certainly a fan of “open source” software. But some of the subtleties of it often escape me. One of the projects that I work on is a database for tracking motion sensor hits in your house like this...
That program is written in REALbasic and uses the built in database functions of RB which are a subset of SQL and has worked very well. As long as the you don’t let too much data pile up and as long as you don’t expect the main thread of the app to do anything else while the queries are running.
In order to improve the apps speed and responsiveness I decided to add an option to connect to MySQL for the database work. SQL is SQL (at least how I wrote it it was, it is possible to write SQL that wont port, but it takes a lot of effort...) So the only work necessary was the signon code to open the connection to the database. There is a MySQL plugin for realbasic which provides the MySQL login stuff and then a database object which is otherwise identical to the build in database. I didn’t waste too many hours before I had something working pretty good, and a few more hours for some code to move your current database of motion sensor data over to MySQL and back again if you wanted to do that.
After doing some beta testing with folks that was going well it came to my attention that there was an ongoing controversy of just how the open source license that MySQL uses would affect MY app. In my case I was not actually bundling MySQL, or including any of their code, just the plugin to offer me a connection to a separate install of it. Evidently even this means that I have to release the source code to my app as well. This is silly, so now any app that would otherwise have connected to MySQL cannot do so without purchasing a commercial license or open sourcing itself? I thought the whole point of having it as a separate entity on the machine was so that other programs could use it. In any case, MySQL is no longer an option and the hundreds of OSX installs of it I was going to help people do will now go undone.
I have decided to do something even better though. All the data I’ll be dealing with is sequential and structures in numerical order are very easy to search through in memory or on disk. This is all harking back to basic computer programming as actually taught to some people... If your list is in order you can find any value in only a very few hits. You could just start and the end and walk through until you find it, but that will get slower the more datapoints you have and the further your requested point is from the beginning of the dataset. But if you start in the middle and compare you cut the dataset in half, then jump to the middle of the remaining, and cut that in half and continue until you reach the value you’re looking for. This also adds steps with more data, but far far fewer. Back when I used to work for a “major retail pharmacy chain” I did things like this all the time as the hardware we were programming for was limited and often amortized over a dozen years leaving many of the computers out there far behind the Moores Law curve. But lately I’ve gotten lazy, just dump it into a database and write a query! But no more.
Each data point is 16 bytes. I could cut that by 8 bytes or so, but I want to keep the flexibility of having enough bits to store a floating point number for things like temperature data as well as just a motion hit. So the first 8 bytes are the date stored as the MacOS’s long date time value, basically the total seconds since some arbitrary date in the past. Using this as the index does mean that the resolution is limited to 1 entry every second for any given unit memory block, but I can live with that for this application. It just isn’t suitable for high frequency data collection. The second 8 bytes are reserved for the value, be it a simple 1 for a motion hit, or a floating point value for a temperature or other data like wind speed or rainfall or whatever. Say you recorded the temperature every minute for a month that would be 43200 data points, each one taking up 16 bytes makes it only 675k to cache the entire thing in memory. Thats not so bad, I waste more memory than that with screen buffers... Even if you do that for several sensors and many motion sensors (which will have a much lower overall data rate) you’re only looking at a few meg used up to store the data for the last 30 days. For motion sensors you’ll probably not want to save more than 30 days or so anyway, but temperature data you might want to keep forever to run historical graphs or something. RB lets me map a memory region to the same read/write commands that I can use on a binary file. so i can actually use the same routines to search a block of data be it cached in memory or written to disk. The disk ones will run slower because they have to physically seek the disk head to do the searching... but perhaps I could read an entire 30 days worth into memory at a time for searching if that turns out to be too slow. At this point I’m collecting data from my personal system to work with the search routines for. In real use so far I’m very happy with the memory footprint and speed of saving the data. I’m confident that the searching and report generation that I’ll be building next will be considerably faster than the same data stored and indexed in MySQL, and it will use less memory and disk space as well. Watch this space for updates when enough is done to test.
I’m Sick of Databases!
9/20/07
How X2Web is getting a new database component...