|
16 Jan 2008 06:44 am
|
First, let me thank everyone that commented and read the last blog, “How do you store a tree in a database table?” The feedback was great and helpful for everyone including myself. Probably one of the best interactions between the readers I have seen. I just hope Roustabout and Joe Celko don’t run into each other at a SQL World 2008 and if they do they by each other a beer.
I didn’t expect as much feedback or hits as the posting got, I figured every one shelved their databases for REST/JSON services by now
Approaches that came out of the last blog
1.) Nested Sets By Joe Celko
There were lots of comments about the Nested Sets approach. It seems like a valid approach. Don’t see how it works with data that may have more than one parent. The approach is also more complicated to develop from scratch but once the wrapper is completed, it should work pretty efficiently as long as the structure is not changing very often.
http://www.sitepoint.com/article/hierarchical-data-database http://www.intelligententerprise.com/001020/celko.jhtml
http://www.amazon.com/Hierarchies-Smarties- Kaufmann-Management-Systems/dp/1558609202/ref=pd_sim_b_img_1
2.) Roustabout’s Approach
This approach was given in the comments in Part one of the blog. I have tried to decipher this from the comments, but I know this doesn’t do it justice,
Product
ID | Name | Description
————————
ProductCategory
ID | Name | Description
————————
ProductProductCategoryTree
ID | Type | ParentID | ParentType
————————————
ProductProductCategoryPathName
ID | Type | ParentID | ParentType | AncestorID | AncestorType | Sort
————————————————————————
ProductProductCategoryPathNameDesc
ID | Type | ParentID | ParentType | AncestorID | AncestorType | Sort
————————————————————————
http://www.rockstarapps.com/wordpress/?p=88 – His approach is embedded in the comment section of the article.
3.) Oracle Databases developers can use “CONNECT BY”
I am not an Oracle user, but this functionality is really nice and allows developers to select items based on the relationship of an id to parent.
SELECT * FROM tree CONNECT BY PRIOR id = parent_id
http://philip.greenspun.com/sql/trees.html http://www.oracle.com/technology/oramag/oracle/05-may/o35asktom.html
4.) Postgres ltree Module
A couple people brought up the fact that Postgres database allows developers to us their ltree module. I looked at the http://www.postgresql.org website … finally found some documentation, but the information on the ltree module was located on a different site at the link below.
http://www.sai.msu.su/~megera/postgres/gist/ltree/
http://www.postgresql.org/
5.) XML Database
Duke, said the following
“This really isn’t mean to be a facetious answer, but if you have control of which database is used then you might consider using a native XML database instead of MySQL. The free version of DB2 is the best choice.”
Snippet from the IBM website:
“pureXML’s seamless integration of XML with relational data speeds application development, improves search performance with highly optimized XML indexes, and is flexible because both SQL and XQuery can be used to query XML data.”
Click link for more information:
http://www-306.ibm.com/software/data/db2/xml/
6.) My “Pathological Tree” Approach
Won’t go into it here, it is explained in the previous post and I have Sample application and a download to go along with it.
http://www.rockstarapps.com/wordpress/?p=88
The sample application has been upgraded to include the number of SQL calls per page. Located below the product area is the list of all the SQL calls made.
7.) Some people wanted to go Web 2.0 and throwout the structure in favor of the folksonomy. Just Tag It!
MRIQUE,
“who use trees when u can tag
”
Probably not the best solution for all cases
That should wrap up all the approaches that were talked about on the previous blog entry. If there are any that were missed or want to comment on the pros and cons of each, enter it in the comment box below. I need to find or develop an in entry post commenting system that allows people to add comments after a particular section.
Summary
There were a couple things I was looking at in the previous post.
- Needed to be simple to develop
- Needed to work in MySQL. Not many DB2 or Postgres databases provided by hosting providers. Enterprises could use the Oracle, DB2 or Postgres approach.
- Have a simple storage structure.
No solution is best for all cases. Look at the above approaches and pick the solution best matches your requirements.
Bob (Buffone)
You may find this interesting reading http://www.codeproject.com/KB/database/Modeling_DAGs_on_SQL_DBs.aspx
I like this trick:
When you do the initial import, make columns for the pre- and post- order traversal #s.
Then, if you have an axis, you can use the pre-/post- # to determine siblings/children/parents. 1/1 is root.
Lifted from pathfinder-xquery.org.
[...] how do you store a tree in a database table, part 2 [...]
> Don’t see how it works with data that may have more than one parent
If it’s got more than one parent it’s not a tree. (Just saying you may want s/tree/DAG/)
I think the following solution is extremely simple, and I use it for a nested hierarchical comments system.
Basically each row has a “parentid” and a “rootid”. The latter is needed because I have lots of roots (each top post). You can select all the posts for a given number of root posts (ordered by whatever you like) by doing an inner join with the subquery that selects all the root posts (rootid=NULL) with the node table on rootid=rootnodes.postid. Again that subquery can have “LIMIT” or whatever. Anyway this gives you the child nodes, but not the roots themselves (you can modify it by including “OR rootid IS NULL” in the inner join’s “ON”, but that I couldn’t get that query fast in MySQL – it created a temporary table), so you need a second query for the roots themselves.
Recovering the trees from this is trivial really. You just keep an associative map from “node id” to “list of children”. I.e. if you index into this map with 5 you will get all the children to the node 5. Filling this in is simple enough, just loop through the returned nodes, check the “parentid” and insert the node in the list occupying that number in the map.
When you need to render the list (however you want) you can always find the children of the current node by just looking up into this map. Simple.
If you set up your indices correctly in the DB all you’ll do here are simple ranged queries and WHERE’s, which is blazingly fast. And you only need two queries to all the trees you need (including things like pagination).
A different approach can be found at http://www.utdt.edu/~mig/sql-trees. I started to write about this and left in that unfinished state a few years ago. That description was sufficient to be used for a while in openACS (see http://openacs.org/forums/message-view?message_id=16799).
I would just like to throw the modified preorder tree traversal algorithm into the mix… its pretty simple once you get a hang of it:
http://www.sitepoint.com/article/hierarchical-data-database/2
Im sorry if it has already been mentioned, but i searched both pages and nothing came up.
This is one of the web’s most interesting stories on Thu 17th Jan 2008
These are the web’s most talked about URLs on Thu 17th Jan 2008. The current winner is ..
My answer to 7:
Why not use both?
I run a site that uses tagging, but the tags are arranged in a tree. So “tagging” is no silver bullet for tree issues.
I see it now…
“How to Store a Tree in a database” – the new holy war among programmers.
I come down on the Nested Set camp =)
But like I said in the orig post… RoR comes with Nested Sets for free via a plugin! =)
First of all, may I recommend the book Database Design for Mere Mortals as it is an excellent book to learn how to make proper designs for any database. Secondly, it seems that your designs are making the database too complicated. Frankly, I don’t much understand the nested sets approach. After thinking about this problem for about 5 minutes, I came up with the following database design with four tables:
1) Categories Table:
– Category ID (Primary Key)
– Category Name
– Category Description
2) Categories Sub-set table:
– Category ID (Primary Key)
– Parent Category ID (Foreign Key)*
* Must be a valid Category ID from the Categories Table.
3) Products Table:
– Product ID (Primary Key)
– Product Name
– Product Description
4) Product Categories Table:
– Product ID (Composite Primary Key)
– Category ID (Composite Primary Key)
Table 4 is a linking table where Product ID is from the Products table and Category ID is from the Categories Table.
Notable features of this solution:
1) No nulls anywhere in the database.
2) Categories and sub-categories can be nested to an infinite depth.
3) Products can exist in more than one category without breaking the database. You would need to limit this with database rules if that’s desired.
This design could be expanded upon slightly by adding another Foreign Key on the Categories Table. This key would be called “Root Category ID” and hold the Category ID of the topmost category.
How do you store a tree in a database table? Part 2 | Deliggit.com
rockstarapps.com
Continuing the discussion on how to store a tree in a database. Part 2 organizes
hmmmm
Sorry I’m late! Just seen this post. This is such a recurrent issue that I feel compelled to add my 50p.
I’m always keen to use a parentid in the database table, then build xml files from the full table, to use as a “cache”.
I like the Nested Set model, and also the “Tree Path” model, but they are caching mechanisms, and I don’t think caching mechanisms belong directly in the same db row as the data. It just feels a bit untrustworthy, like the db table suddenly needs insert/update instructions.
I prefer to cache structure in xml with an event-based xml lib (store in files, or preferably in memcached), then use DOM to traverse the structures, and SQL to mine the related data. I’ve never had the luxury of XML databases but without them, this works for me.
Ofcourse you need to write cache clearing & re-generation methods, but that’s just part of the fun!
How about representing the tree nodes with a string? Use a predefined number of “digits” in whatever number space you want, just make sure your digits are alphabetically ordered.
ID,Title
AA,Parent
AAAA,Child of Parent
AAAB,Second child of Parent
AB,Parent 2
select * from messages where ID like ‘AA%’ order by id — AA and everything below it