|
01 Jan 2008 09:12 am
|
Every once and a while I will need to need to store a hierarchical structure in a database table. Recently, I needed to do this for a Facebook.com application I am building. In this particular application, I needed to categorize pieces of content in a hierarchical form, which a users can access by drilling down one level at a time. This blog details the solution I implemented on how to store the data and access it. Aren’t there are other articles on this? Yes, but their approach is typically very trivial and has issues when building out the application.
Included at the bottom of this blog entry is a zip of a sample application that demonstrates how to build product explorer using the technique described.
The problem statement!
Below is the hierarchy of data that users can navigate through. Its a standard data structure that one would see in applications like; shopping carts, employee databases, and content management system.
- Books
- Non-fiction
- Politics
- Sports
- Non-fiction
- Electronics
- TV
- Speakers
- Computers
- Cars
- Dodge
- Magnum
- Nitro
- Chevy
- Dodge
Each category can have sub categories and/or products, take for instance the sample above; “Books” has a sub category called “Non-fiction” which has two sub categories. Not shown in the data above is that each category has a set products it is holding, for instance:
- Politics
- “Politics for Dummies”
- “Student Atlas of World Politics”
- Magnum
- 2007 STX mint condition
How do I store the information in a database table that has just rows and columns?
- I need to be able to perform the following operations; insert, update, delete, and select.
- The solution should be easy to create. Little code that is easy to understand.
- The solution needs to be easily stored in a database.
- Selection of items needs to be possible with SQL.
- Inserting and updating a record should affect no other data in the database.
- Delete should only affect the deleted item and its descendants.
A non optimal approach?
I have read a few articles on approaches to solve this problem and each one has a number of drawbacks that makes their approach hard to deal with when actually building out the application.
One approach is to use the name of the parent as the column in the database.
| parent | name |
|---|---|
| Books | |
| Electronics | |
| Cars | |
| Cars | Dodge |
| Books | Non-fiction |
| … | |
This seems simple, logical and will work but once you start to build the application you will notice a major short coming. You loose any ancestorial information other than the parent. This may not seem like a big deal – I know my parent, so I can find their parent. True. The real problem comes in the complexity of the code you are going to need; recursion or other to trace one’s roots.
A couple questions you way name want to know quickly and easily that make ancestorial information important. How many products do me or my descendants have? Where would you use this? Display the number of products contained below a category “Books (2)” based on the information above.
Another issues is what happens if the name of the category changes; “Cars” now becomes “Automobiles” because marketing thinks it sounds more sophisticated – meaning – intelligently appealing – I looked it up. This can easily be resolved by using the id of the parent, or updating all items that have a certain parent when the name changes.
| id | parent | name |
|---|---|---|
| 1 | Books | |
| 2 | Electronics | |
| 3 | Cars | |
| 4 | 3 | Dodge |
| 5 | 1 | Non-fiction |
| … | ||
Above is what the data would look like using the id of the parent in the parent column. This solves the name change issue but still has the ancestry problem of the above solution.
There are also other solutions that store the location in the tree, who is to the left and to the right of me. But this solution again is difficult to manage and code against when developing an application.
These solutions fail a few of the solutions requirements.
- Fails – The solution should be easy to create. Little code that is easy to understand. You will need to use recursion which is slow and can be complicated.
- Fails – Inserting and updating a record should affect no other data in the database. When using the left and right approach you will need to adjust the data for other rows of the database outside the affected rows.
- Fails – Delete should only affect the deleted row and its descendants. As above you will need to adjust the left and right when changing the structure of the tree
Resources on this approach:
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
http://www.sitepoint.com/article/hierarchical-data-database
So what is the best way to do this?
In a past life, I developed network management software, which had many complicated problems that need solutions. TCP, UDP, SMNP, MIB were all acronyms that you needed to know in order to get the job done.
Most of the applications displayed data queried from a network device. Accessing the information was done by requesting data based on an OID. OIDs are simply a string in dot-notation for a specific piece of information. To pull the “device name” you would ask for OID “1.3.6.1.2.1.1.5″.
So what does this have to do with this article? Everything. The OID was the way to flatten the data which is hierarchical. How would you use this in our category sample? The picture below shows the data displayed with the parent being the “OID” of the parent node.
The root items “Books”, “Cars”, and “Electronics” have a parent which is empty to denote they are a root. All other items have parent value equal to id of all parent’s separated by a “.”.
Lets see how this solution works against the list of requirements:
- I need to be able to perform the following operations; insert, update, delete, and select.
- Included in the sample is code to perform all of the functionality that is needed to; insert, update, delete and select the data.
- The solution should be easy to create. Little code that is easy to understand.
- Solution is very easy; each operation is no more than two database operations and contains no recursion or rebuilding of the database.
- The solution needs to be easily stored in a database.
- The parent information is stored in a single column in the database as alphanumeric characters.
- Selection of items needs to be possible with SQL.
- Selecting a category or product is done with a simple equality operator. Selection of descendants is done in MySQL with the “LIKE” operator.
- Inserting and updating a record should affect no other data in the database.
- In the sample, look at the “insert.php” and “update.php” to see how to perform this functions.
- Delete should only affect the deleted row and its descendants.
- In the sample, look at the “delete.php” file to see how to delete both a category and a leaf (product) from the database
A couple questions that you might have:
Is this hard to work with?
- Actually it is very easy. In the sample application, I have functionality for adding, updating, deleting and selecting categories and products as well as displaying a bread crumb of the ancestors (This could also be used to display a breadcrumb trail and make them links in any website).
- You just need this parent’s database record to create the child record.
- Can delete all descendants easily.
- Updating the record requires just the id of the element.
- Less columns using other left and right techniques
How do I create the parent column?
- The parent id is created using the following code:
$parentPrefix = $currentCategoryRecord["parent"];
if ($parentPrefix != null){
$parentPrefix .= $currentCategoryRecord["id"].”.”;
}else{
$parentPrefix = $currentCategoryRecord["id"].”.”;
}
This code creates the parent prefix for the currently displayed category and is need only to query descendants.
How do you find all the categories in a certain category?
- “SELECT * FROM product_table where parent=’”+<parent prefix>+”‘” in the data pictured above, if I want to display the categories TVs the select statment would look like “SELECT * FROM product_table where type=1 AND parent=’11.13.’
Why the “.” at the end of the parent?
- This makes it possible to distinguish between a category that starts with “1″ and a category that starts “11″. Not have the ending “.” means a startswith(’1′) or “WHERE parent LIKE ‘1%’ will find both “1″ and “11″ to be the ancestor.
Why not just combine the id and parent fields?
- By having two separate fields the data can be inserted into the database using a single statement. If they were a single field you would need to find the highest numbered product or category already entered at that level and increment it by one before inserting the new record.
- By having two fields, the “id” column can be a simple auto incrementing integer which can be joinde with other tables of data.
How do you tell what level a node is at?
- Using the PHP split function you can just capture the level:
$level = count(split($parent) ) – 1; In the sample I display a breadcrumb trail for the currently selected category. It is very simple use the split function and select all the ancestors from the database based on the id.
Why is there a type column?
- In my sample application, I am storing both categories (Tree Nodes) and products (Leafs). The type field differentiates the tree nodes from the leafs. In another application is may demote folders and files.
How do I find how many products a node or its descendants have?
- Determining how many products are under a node is performed by using a single SQL query on the database. “SELECT COUNT(id) as count from product_table where parent LIKE ‘<parent prefix>%’.
Isn’t the “LIKE” operator a performance hog?
- Performance is something that may take a slight hit, but only slightly. I did a bunch of research on this and found a good resource on using the “LIKE” operator. Click here to read a write up by John Nelson http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx
- You also reduce the number of queries need overall and you never need to rebuild the tree if the structure changes.
Conclusion:
I don’t really have any conclusions other than;
“Click Here” for the sample application download.- You can also check it out online, “Click Here”. I didn’t try to stylize it. Also, If you delete any items please add them or others back in.
Have a great New Years. Also included in the download is a simple database class to do some basic functionality, feel free to take and do want you want with it. I will be doing a write up on this is next week.
Have a great year and thanks for visiting Rockstarapps.com
Bob (Buffone)

why not:
table:folder
============
idfolder|foldername
——————-
1 |cars/ferrari
2 |cars/lamborghini
3 |lamborghini
.
.
.
table:items
===========
itemid|itemfolder
——————-
48 | 1
57 | 3
44 | 1
.
.
.
Breaking the product_table into two tables is a possible variation of the solution. In the example above, I would change the fordername to be differnt from the Structure because they are prone to change.
idfolder|ancestry|foldername
——————-
1 ||cars
2 |1|ferrari
3 |1|lamborghini
4 |1.3|2007
.
.
.
This may help keep the size of the ancestry strings down. You will typically have less folders than products.
Nice variation!
This problem was already addressed a long time ago in the form of the “Nested Set” data structure to store and fetch trees in relational tables.
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
Using parent_ids to represent a tree is usually a bad idea. You end up doing more queries than necessary, and if you have a really large tree this simply doesn’t scale.
Tree with parent_id is the equivalent of a Bubblesort while Nested Sets would be like quicksorts.
As the comment above talks about, you can have separate tables to reduce the size of the parent ids, they could be stored as binary, and the node structure of the table is typically very small compared to the leafs.
I looked at the mysql page when research the solution and it is far more complex then the implmentation of this approach. Just look at “Find the Immediate Subordinates of a Node” and compare that to
SELECT * FROM product_table where parent LIKE ‘%’
You also need to “rebuild” the tree every time the structure changes (Delete, Inserting)
This requires no changes to any node other than the node being operated on.
A good primer on the options for storing tree data is the excellent Joe Celko’s Book.
The best way to store a tree in a table is to have ID and ParentID columns. Any entry that is a root node should have its own ID value as its ParentID value. That’s it! You are done. Everything that you could ever want to know about nodes, their parent, their children, their siblings is recorded.
Oh, you want to store a path in a table? Well, you might want to use the right terms. So, you stored a path based on ID? Great, now can I have that with the nodes sorted alphabetically? No, your approach is naive.
You will have to walk the path every time you want to display your tree nodes in a certain order. Yes, you can save each of these paths, but it will take up space. Usually, storing the path alphabetically and alphabetically descending is enough.
Aldrin Leal, Joe Celko’s ideas on storing tree data are retarded. Trees should be stored as I decried in the first paragraph of this post. Joe Celko’s ideas deal with storing some bastardized path/tree with neither path nor tree in sight. Why not keep the tree simple so that you can maintain the integrity of its data? Leave the messy part for walking the path.
The web’s most interesting stories on Wed 2nd Jan 2008
These are the web’s most talked about URLs on Wed 2nd Jan 2008. The current winner is ..
[...] alstevens wrote an interesting post today onHere’s a quick excerpt [...]
OK, continuing a bit I’ll assume you will have a Product table and a ProductCategory table (Category might be a bad name as you may have other types of categories in the future). Create a table called ProductProductCategoryTree (don’t worry, you’ll alias the table in any query you write so you might just as well be descriptive).
This table should have the following columns: ID, Type, ParentID, and ParentType. ID is the ID from the Product or ProductCategory table. Type is the type of record for this entry, either ‘product’ or ‘product category’ or 1 or 0. You now have a compound key so, if you don’t want that then keep products and product categories in one table. This will let you drop the Type and ParentType columns from you ProductProductCategoryTree table as each item in the tree will be of the same general type distinguished some other way besides table membership only.
Now, create an entry for each product category and product in the tree table based on ID. As mentioned before, if something is a root node in a tree it should have its own ID and Type as the parent. That is good mathematics. Obviously, you test for rootness by getting all nodes where the ID and Type and ParentID and ParentType are equal. To get all the children of a root node (and any node for that matter) get all the children where the ParentID and ParentType equal the desired node and where the ID does not equal the ParentID or the Type does not equal the ParentType. Read that last sentence again and pay attention to the word ‘or’. It is that placement that will let you easily have product categories as parents of product categories.
Finally, set up your path tables. You’ll name these ProductProductCategoryPathName and ProductProductCategoryPathNameDesc and the like. These tables will all have the same columns: ID, Type, ParentID (fyi only), ParentType (fyi only), AncestorID, and AncestorType and Sort. If you are only jumping into the tree at the begining then you can drop the ancestor columns. If not, then you have to build the path for every possible starting point and it is for that the ancestor columns are needed. The Sort column is where you put all your ‘11.13.16.’ business (’Dog FoodÿDryÿ’, ‘Dog FoodÿWetÿ’ if this is an alpha-sort ['ÿ' is a better separator for alpha than '.' for obvious reasons]) or you can have an intermediate step where you assign a unique serial number to each sort value and save space by using an integer value for your sort column.
You’ll need a path table for each of your sort criteria. The siblings just won’t be in the right order otherwise. You’ll need to rebuild the path every time you change a parent-child relationship or some value that affects one of your sort criteria. Start by rebuilding the path every time this changes and then optimize it by only rebuilding the affected down-paths.
In the end, you will have a very clean tree, very clean but relatively large paths and a system that is read-cheap and little bit write-expensive. You can blow the procedures you use to write the paths any time. You can make them faster and called about with triggers or programs or jobs. That is where you want to keep the mess and maybe clean the mess up later. What I just described about tree and path table structure, however, is the minimum and the entirety of the whole matter. Adding anything else is clutter and inelegance. You can do it with less than the table structures I described and you don’t need more.
In closing, I’d like to take the opportunity to criticize Joe Celko and the “Nested Set” approach that AkitaOnRails linked. The left-right bounding gets you nothing if you don’t want to sort by ID. If it gets you nothing for the very common sort criteria of alphabetical then why use it? It is human-understanding expensive and no human can edit it. It is the retarded love child of tree and his sister path. It is neither tree nor path and if some human tries to edit it and screws up then you will have to go to back to restore it. It is not any more write-cheap than the path tables approach that I have recommended. Finally, the left-right bounding approach cannot even be called “nested sets”. Real nested sets are what you will read in my approach and flatten into the path tables. Real nested sets don’t need anything but one parent ID to properly define them. Joe Celcko’s approach is a stupid waste of time.
This wasn’t one of your stated requirements, but I think it’s an important requirement when you want to correctly use a relational database: the design must allow for enforcement of referential integrity.
With the traditional ID/ParentID links, tree.parentid is a foreign key to tree.id. You are then guaranteed that you never have an invalid relationship set up. I always trust the database to be a better enforcer of rules than my own code… if nothing else, you get two layers of protection instead of just one with your own code. Storing structured data in an unstructured fashion, such as your OID approach to storing hierarchical data in a DB, should generally be avoided. You have a powerful tool for storing structured data, so you should let it do what it’s good at!
Some RDBMS packages have statements to help deal with with getting the whole ancestry more easily in a single SELECT statement (it’s “CONNECT BY” in Oracle), so you don’t have to recursively call the database. See, for example, Philip Greenspun’s excellent article at http://philip.greenspun.com/sql/trees.html for solving the problem in Oracle.
I would probably do it similarly to how you’ve done it in the post, or just noting the “parent_id” for each category.
This way is pretty slick though:
http://api.rubyonrails.org/classes/ActiveRecord/Acts/NestedSet/ClassMethods.html
“Nested Set is similar to Tree, but with the added feature that you can select the children and all of their descendents with a single query. A good use case for this is a threaded post system, where you want to display every reply to a comment without multiple selects.”
The only trick is pruning the tree if an element gets deleted.
If using postgresql you can easily get nested set behaviour (free queries for all descendents) by using the INET data type.
eg:
1::
1:1::
1:2::
And on and on…
Maybe you’re looking at the problem the wrong way. The problem might really be that a relational db isn’t the best tool for this job. (If the tool in your hand doesn’t perform the job, change tools!)
If there is a substantial quantity of data you might think about using a network or hierarchical db instead of a relational db.
If there isn’t a substantial amount of data it probably won’t make any difference to the speed of the application anyway.
Have you done any benchmarks with varying sizes and distributions of data with different methods? If so, what were the results?
who use trees when u can tag
However you feel about Celko’s work, his book “Trees and Hierarchies in SQL for Smarties” is an excellent introduction to this subject and well worth a read. It’s also got a reasonable explanation of Vadim Tropashko’s first attempt at improving on common methods for doing this. Actually I’m surprised nobody’s mentioned Tropashko already. His methods get bogged down in some pretty complex math, but should be considered for anyone wanting to really get their heads around storing trees in an rdbms. To save you a bit of time – Tropashko’s first method (1) has a massive flaw (arithmetic overflow once the tree has more than a few levels) but his subsequent work, especially the one using continued fractions (2) , is a lot better…
But when all’s said and done, in my own work, like the OP, I use a semi-materialised path.
(1) http://www.dbazine.com/oracle/or-articles/tropashko4
(2 = PDF) http://arxiv.org/pdf/cs.DB/0402051v1
Matt
As people have said the optimal way to store this kind of data is using “Nested Sets”.
Interesting approach. The only problem with this is that it’s not scalable for high traffic websites, and when I say high traffic think *big* online shops or *big* forums.
MySQL has its limits and when you work with tables with over 1-2 gig of data you can easily bump into them. The LIKE operator does its work, but it will never be nearly as fast as an integer-only, data-to-data approach. As a side note, what would help speed-wise and performance-wise would be a full text search engine for MySQL, such as Sphinx. But I digress.
Roustabout, you seem very decisive and aggressive against Celko’s approach. Let me begin by saying that from your post I haven’t figured yet why you go so ballistic over it. I once experimentally implemented tree and path processing using a simple id-parent_id relationship approach and the nested sets.
Allow me to say that nested sets turned out to be way cheaper in terms of performance, memory consumption and robustness. I concur with one of the previous posters, the difference is no greater, but no less either, than between bubblesort and quicksort.
Regarding the “you can’t edit the data by hand” let me say that if you need to manually alter the data and reorganize the tree… well, you’re doing it wrong, the whole thing.
The problem with this id-parent_id naive approach (and by naive I don’t mean stupid) is that databases were not designed around the tree model, but rather around the nested sets model. Celko’s approach is a more scientific one if you want and it is worth exploring. Once you wrap your head around it you will see its potential.
Migrate to PostgreSQL, and use “ltree”. Gives you data integrity, too
second the “nested sets” approach; way back in time I literally enhanced the performance by factor 20-100 for a major mainframe solution in my company by adding left and right columns to a hierarchical solution implemented in DB2 – worked perfectly, allowing us to extract the needed rows from sub-hierarchies with 50.000+ entries in “no-time”.
What happens when you want to rearrange stuff on the tree? Like moving a whole branch to another node.
The PostgreSQL ITree is worth exploring, as is the Nested sets. The whole thing with the article was to ensure the I had a traceable ancestry for every node. Breaking the simple table into two table “tree” and “leaf” then linking them would reduce the data set size considerably.
Also a combination of the Nested Sets and the approach I outlined could also make for a solution that would increase speed and still make the ancestry achievable.
Thanks for sharing this. I’ve worked on similar issues before, and it’s always a challenge to come up with a solution without a little help!
Hey, thanks for the nod on the “Like” operator stuff, I appreciate it! I never thought people would read my stuff, but now they’re linking to it. Cool.
Number2 (John Nelson)
Wells Fargo
Clawoo wrote “The problem with this id-parent_id naive approach (and by naive I don’t mean stupid) is that databases were not designed around the tree model, but rather around the nested sets model. Celko’s approach is a more scientific one if you want and it is worth exploring. Once you wrap your head around it you will see its potential.”
Celko’s approach never records a tree so it cannot be more scientific. Calling it nested sets is a misnomer too. A set is what you pull from the table. It can be all the records or the children of one node. Just because you have left-right containment IDs doesn’t mean anything is any more “nested” than each node have one parent ID.
Using one parent ID the programmer simply walks the path every time they need it or walk it once per edit and store the materialized path in a separate table. Celko’s approach seeks to store path and tree in one table and that is retarded. It removes all clarity from the recorded relationships in the tree. I’m not aggressive or upset by Celko’s approach. It is simply stupid and naive. It is only sorted by ID and provides no help when sorting by other criteria. So, it is almost worthless.
Now, some claims have been passed out about performance. I ask you, what could be faster than reading a table holding a fully materialized path for every entry point into the tree for one specific sort criteria. The answer is that nothing could be faster. Keep your tree a tree and your path a path and you will be happy and free from perversion. If you want to be a fruitcake and watch a tree and path play naked twister then use Celko’s approach. Have fun never sorting by other criteria!
I see one major flaw in this approach. It assumes that a child will have one and only one parent (or you will have to repeat children). Take the example of the books. A book can belong to multiple categories. “Truman” should be in Biography, Politics and US History. Or using the traditional ‘explosion of parts’, a screw can belong to many different assembled products (or sub assemblies).
@Daniel – on why way into work today I realized that the single table approach in the sample has that issue
Nice catch
Yeah, that is why you want a tree and path tables to be separate. You can have multiple entry points into the path that arrive at the same children if you need a node to have more that one parent. Your uniqueness constraint should be on ID and ParentID so that a product or product category can be in the tree more than once, but only once under the same parent.
Seriously, I never meant naive in a bad way. There is just a dozen things you have to learn the hard way when dealing with tree implementations. All the people suggesting this method or that method have not had to do much with their trees, obviously.
OK, I’m going to present some quick figures on a production system using the pure tree and fully materialized path approach.
The system has 88,000 “things” in it. These things are in a table that holds their unique key. They are related in a tree table. Every single thing is in the table at least once as its own parent. This is because I use the tree as an access control list for application security (granting access to a node grants access to the children). So, each of the 88,000 things are in the tree table multiple times sometimes as a top level item and sometimes as an n-th level item.
The tree table has 185,000 unique parent child relationships. Remember, 88,000 of those are just one level trees with a node as its own parent. The deepest leaf nodes in any tree are 6 deep, but their is no limit to how deep it can go.
With the tree firmly established I materialize a path for sorting the nodes of a tree in alphabetical order. Every time I have a node with itself as a parent, I create a new path (although, all paths are stored in one table). Each unique path is identified by its entry point which is the AncestorID column.
The each path table for each sort criteria has 480,000 unique records. The largest single path in the tree (because of multiple root nodes) has about 48,000 nodes.
Now, I can query those 48,000 nodes by AncestorID from the the path table based on sort criteria. I can then join it to detail records, filter by date and other criteria group, sum and average in a few seconds for hundreds of thousands of transactions. Get it? I can sort any way I like and my fully expanded tree is always join-able. Don’t want the 48,000 records? Pick some other AncestorID and get back 12 records.
So, as you can see this scales fine for a couple hundred thousand tree nodes. Each path takes up 32 MB for data and indexes. That can probably be reduced to under 16 MB if I replace my character sort column with a serial number. It is as fast as any bastardized approach, doesn’t take up much space and if I every need the space I can get rid of the paths and rebuild them any time because I always have the tree. By the way, in my case the tree itself takes up 14 MB.
I forgot to mention that in my path tables there is one more column that is Level. I store the level of each node in the path for quick indentation during presentation.
Also, if I want to narrow any part of the tree during display I just grab a node ID and jump back into the path with that as the AncestorID.
I should mention, that mostly my user interface uses Ajax to get one level of the tree at a time. For ad-hoc reports however, nothing beats having that fully expanded path, with level for any entry point. Keeping it join-able at any time is magic.
This really isn’t mean to be a facetious answer, but if you have control of which database is used then you might consider using a native XML database instead of MySQL. The free version of DB2 is the best choice.
Duke
[...] RockStarApps » How do you store a tree in a database table? (tags: database sql programming tree mysql howto performance algorithms **) [...]
Please look at http://www.devme.it/2007/09/nested_set_java_spring_postgresql_plpgsql_yahoo_framework.html
is a blog (sorry written in italian language) that propose a solution based on nested set using different technologies.
mulp
With solution you propose, you will have a lot of fun if you move something at level 2 to different parent – you suddenly have to move all the items below it.
In fact, best way to look at your approach is to see it as a second solution you have proposed (single numeric parent id), with cache of full path (and level) added to every item. I’m personally for using real caches for caching – either on application level for simple cases, or some kind of third party distributed cache for more complicated things.
BTW, I’m bit surprised that you have found the use case where you are not selecting all the parents anyway. I suppose that in most usages you will want to have at least names of the parents to display full path at the top of the page/whatever. If you are going to query all of them each time, you could as well go with simple parent id.
So, my suggestions:
1) Use database for persistence, not helping with application logic/caching
2) Cache on application level/use distributed cache if you want to scale
3) Hide database behind some layer, it seems that you are putting sql statements directly in your pages, which can lead to very strange design ideas
[...] let me thank everyone that commented and read the last blog, “How do you store a tree in a database table?” The feedback was great and helpful for everyone including myself. Probably one of the best [...]
You might have a look at JCR: http://www-128.ibm.com/developerworks/java/library/j-jcr/
It is simply a hierarchical data storage with versioning, full-text search, schema flexibility, simple binary storage and more!
The JCR standard: http://www.jcp.org/en/jsr/detail?id=170
Jackrabbit, open source repository: http://jackrabbit.apache.org/
Access for non-Java languages: http://dev.day.com/microsling/content/blogs/main/fudbusting2.html
More info: http://wiki.apache.org/jackrabbit/JcrLinks
Great post and a very cute trick. As for alternate implementations, I would consider approaching the problem as though the “categories” were really tags. So “Sports” is tagged “Non-fiction” and “Books”. “Non-fiction” is tagged “Books”, etc. Data models for tag-based architectures are fairly easy to find on the internet these days. In any case, there would have to be a separate mapping table, which would create *A LOT* of extra data. But disk is cheap and people’s patience waiting for a query to return is short. The great benefit here is that since the data is blown out a bit searches would be much faster.
[...] RockStarApps » How do you store a tree in a database table? [...]
[...] let me thank everyone that commented and read the last blog, “How do you store a tree in a database table?” The feedback was great and helpful for everyone including myself. Probably one of the best [...]
Google pre/size/level encoding.