home forums resources search newsjoinmembers: 6448
Hiveminds Network PHP Flash Java Ruby Windows Linux
Hiveminds | Fri, 2006-04-21 06:49  tags: ,

Database Design

Okay, I have to say it. Because I have spent another week trying to make sense of the mess. Drupal's database design is stupid! it sucks! it is crap! To things clearer I am talking about design errors. Yes, they do exist. Some say that design errors are the basic cause of bugs in a system. They are right.

Rather than go through a bunch of queries or rather hundreds of queries, I will just say that the database design has no relational qualities that would optimize and speed up the system what so ever. This a glaring whole in the knowledge of the Development team so if you are a joiner and want to contribute to the project then think about helping them out with the database design first.

Please start in on the taxonomy schema. It is scary. Textbook type errors that could be used for examples of what not to do in any database course. I took some time to compare the database to that of ezPublish. I don't know why. Maybe because it was the only othe CMS I had installed. Congratulations to ezPublish! Finally something I can say they have done with excellence and may bring me over to using the CMS. The database design is one of the best I have seen. Normalized, efficent SQL that looks like it would not pull more than 50 queries at most. Very nice. I may even open up that $30 book on ezPublish that I have been thinking of using to soak up that pool of oil from my parking area with.

The the type of errors found run the gamut. Some are comical and make me giggle while others are serious in that they cascade through the system and make me want to cry. Most are the fault of the ease of use of MySQL. MySQL does not require that you be a trained database administrator or designer to get the job done. MySQL also does not complain much unlike Postgresql or MS SQL Server ( this is where I used to live ) which tell you to get lost if things aren't right. Only recently have CMS developers been forced by performance bottlenecks to look into learning more about optimizing SQL queries and designing schema that are naturally relational rather forcing relationships with complicated SQL statements on the fly. It is the on-the-fly SQL that makes the Drupal db design so weird. Rather than submitting a patch that would change the database schema most contributor's write on-the-fly mechanisms and try and gain efficiency by racing queries.

What's that? Hmm sounds like the beat of drums the native are restless again. The drums are saying "If you are so smart then ...." Well to be honest I have only found two queries that just are not needed. Somehow they got added because there was a missing foreign key and probably some hurried logic used. I am still in the midst of doing the access system over so if it does not relate to that then I have to leave it marked and go back to it later.

One thing that I am going to publish is changes in the database that stop query fluctuations. These are a pain and can be seen by using the Drupal devel.module. For some reason two identical calls for pages (including access control) will have two different number of queries called. This might be a indexing problem, but I suspect it is more of one set of queries winning the race at different times. Perhaps it is some redundacy also. I see that some calls to functions like variable_get() are being double checked with extra queries in the code.

<?php

function _forum_get_vid() {
  
$vid variable_get('forum_nav_vocabulary''');
  if (empty(
$vid)) {
    
// Check to see if a forum vocabulary exists
    
$vid db_result(db_query("SELECT vid FROM {vocabulary} WHERE module='%s'"'forum'));
    if (!
$vid) {
      
$vocabulary taxonomy_save_vocabulary(array('name' => 'Forums''multiple' => 0'required' => 1'hierarchy' => 1'relations' => 0'module' => 'taxonomy''nodes' => array('forum')));
      
$vid $vocabulary['vid'];
    }
    
variable_set('forum_nav_vocabulary'$vid);
  }
  return 
$vid;
}
?>

The above code just makes me laugh. There are four queries involved ( two hidden in variable_set()) to return a vocabulary id. Not a list or a complicated result set but a single id number. Somewhere something is very wrong. This is made especially clear by the fact that there are two hardcoded strings in the code that are the same strings in the row being called from the database. Hardcoding means there is a known value. If you already know the value then there is no reason to ask the database more than once for corresponding information.

While you are trying to take in the view above think a bit about the node type column shown. While the number of node types may be small and never grow to more than a dozen the vocabularies column could grow into the thousands. Doing a string comparison to find what node type is associated with a thousand vocabularies is not the best way. Why not a seperate list of types? The node table has the same string comparison design and while many may not use vocabularies even the simpliest of websites may have 8,000 nodes. This is where 'DISTINCT' starts being over used. Again there should be a list table for node types.

If you are wondering why all the noise about this. Well, it's because the front page of a test site and another suddenly became slow after using a mySQL script to put up a finished vocabulary for the classified ads module. The classifieds taxonomy is a standard used by newspapers and other media. When I started creating different node types for different types of ads I thought this would speed things up because each node type would have a finished query associated with it. But instead the entire thing slowed to a crawl. So it's back to the drawing board for that idea. That's all I have time for now.

Remember, beat the drum not the messenger ;)

Hiveminds's picture
This article brought to you by the Hiveminds Magazine - Staff. Contact us if you want to post an article or announcement anonymously
 
a Visitor posted on: Sun, 2007-05-13 13:59.

it isn't just Drupal, but Drupal is certainly a good example of the opensource methodology gone wrong; I think you have it exactly right that they introduced these hacks because it was politically too much trouble to change the database, and because the original data model was never designed to be modular and flexible. The same is true of the stylesheets, although much progress has been done on that front, and the theme engine.

However, the HARD question is not what is wrong with it -- you only need to use the debug-version of the db-query function to see that -- the really hard question is "How do we fix it?"

The answer to that question underlines another problem with the ad-hoc design so prevalent in software, both commercial and open source: Programmers never know when to quit.

At some point, you have to say enough is enough, close the chapter, learn from the past and move on. But we don't. We add another hack, add another feature, support another device, heaping mistake upon mistake, until the legacy weight of the thing sets up a programmer inertia, an increasing of the complexity mass into a black-hole from which no developers can escape.

What is most sad is that this was not the philosophy of the original computer scientists, it is a perversion brought about by the commercial interests who needed to be all things to all people under a single brand name in order to please shareholders. Why, for example, did Drupal need to re-invent the forum when BB was already mature and proven code? Why re-invent shopping carts, classifieds, calendars, spam filters? Did they have some genius brilliant new revolutionary design that obsolesced all prior art in those domains? No.

Instead they built half-assed almost-solutions and clumped them to the side of the project using cardboard and duct-tape.

But think about that ... if this is "open source software", if the purpose and meaning of open source is to foster a sharing and community construction then WHY NOT design a system to simply INTEGRATE solutions? Why not work TOGETHER? The Java-based JSR standard for portlets tried to do just that, yet did any of these "open source bazaar" programmers jump on board and blow our minds with it? No.

The foundational philosophy of computer science was "small tools, new combinations" exemplified in the *nix operating systems where small simple actions are done by small simple programs that can be infinitely reconnected and recombined to produce novel results. Monolithic applications belonged in the days of SAS-assembler and COBOL. The modern computing paradigm was of cooperative components precisely because we observed how the monolithic application always exceeded the authors' capacity for complexity. Always.

So back to Drupal, what can we do? I recall Dykstra's advice to programmers: "Write two programs, throw the first one away."

a Visitor posted on: Sat, 2007-05-26 13:49.

Ok first of all the Drupal DB system is complicated perhaps too much for you to understand. I have been a developer for almost 8 years now. I have been involved in projects ranging from web sites to AI and I can't say that I fully understand the Drupal DB abstraction layer and all of the things going on in the system.

I don't think that you will either. Not after a WHOLE WEEK, perhaps TWO! Give me a break! Noooo, unless you just happen to be smarter than a whole room full of Albert Einsteins, I don't think you'll get it. Perhaps you never will.

Drupal was chosen by MTV, IBM, Sony, Yahoo, and a plethora of other highly mission critical companies and organizations to help them accomplish very useful tasks. These guys may not even be able to fully understand the system. Surely not in a week or two... Duh...

You sir, have a fairly elevated idea of your cranial capacities if you think you are going to sit down and understand the Drupal DBasing system in no time flat. Would you understand the core principals of particle physics after a couple of weeks. No sir! Drupal has been developed by some of the brightest minds around and I am sure that they know what they are doing much better than you know what they are doing.

That said, I think that there is a great probability that there are some bugs. Almost no software is without some bugs or instances where a bug may become present. Semantic errors will sometimes crop up unless you are dealing with the most simplistic of programs.

At any rate, evaluate the system for a few years, then make such assessments. Not before.

Regards, Derek Webb

a Visitor posted on: Mon, 2008-08-25 17:20.

To Derek Webb: damn bro, you make DB design and understanding as complicated as a rocket science. Dude, it's not that complicated. Anyone who had done DB rational design or algorithm in college/university, they can look at a database design and point out what's wrong and whats not. Most design can go through simple algebra and normalization process. Sometimes DB designers ignores them for flexibility and for other issues. Not a smart move, but has been practiced by many big companies, and drupal is an example.

I don't understand why someone with good Db designer has to look at the same db structure for few years to make comment. you are hilarious.

Bitrix Site manager - fast to create, easy to manage CMS Comparison Matrix
Put Your Site Here Developer Links
PHP Wordpress Content Management Systems Silverlight Silverlight Web Developers
 

Newsletter

Get updates on Hiveminds services, articles and downloads by signing up for the newsletter.

Editor's choice

Some of the better articles, stories and tutorials found at Hiveminds.

Find more

Find more of Hiveminds articles, stories, tutorials and user comments by searching.




Picked links

Hand picked websites and articles from around the web that provide quality reading.