File System vs Database

   It has been a while since I posted a technical article so my mailbox is jammed with e-mails from people begging me to spread more wisdom. This particular post deals with one of the most pressing and hard unsolved problems in computer science – Is it better to store user uploaded files in the database or on the file system? This question has spawned thousand of IRC, forum and other flames. I too have participated in my share of flames on the topic. So it is time for you, the reader, to learn what is the right way to store user uploaded files and it is... (drum roll)...  – IN THE DATABASE!

   If you are one of these annoying types that always ask the question "Why?" instead of accepting the universal truth here are the reasons:

   1. When storing files in the database we can guarantee that the data is consistent. The foreign keys will enforce referential integrity. No more files jamming the server when the item they were attached to no longer exists. On the contrary, it is relatively hard to execute transactional operations on both the file system and the database at the same time. This is the most important reason. Even if this was the only reason and there were no other reasons in favor of the database it would still prevail.

   2. When all our user generated data is in the database it is easier to maintain. When we backup the data we will only backup the database. While it is not that hard to backup a bunch of files it is still a concern and we are going to backup the database anyway so why not backup everything at once. What is more, if at some point there is a change in the application that requires some update to the existing data like changing the file names to match some convention or adding a watermark to the images it is easier to manipulate data using SQL than writing a custom tool to crawl through files and directories.

   3. You can store file related metadata like file size, image dimensions, filenames (especially useful when you want to be able to store multiple files with the same name) and make queries over it. To be honest you can store this data in the database even if you store the actual files on the file system but it is more natural if this data is "attached" to the file and it is easier to preserve data integrity.

   There is one argument against storing the files in the database and it is called performance. More often than not people who bring forward this argument are not sure what performance they are talking about. Usually the flow of thought in these individuals goes like this – The file system is made specially for storing and retrieving files so it should perform better than any other tool that can do the job. So here are the most common ideas of performance:

   - Search time. People believe that the file system is really fast at searching in a directory tree because it looks up files all the time. However the database is probably the most advanced tool for searching data based on a key. Nothing can beat that. What is more if you are storing files with the file system you still need to query the database for the file path.

   - Retrieval time. This one is probably the only real argument. If you are storing the files in a blob field like varbinary(max) it will probably be slower to read. I have not measured the difference but it could not be that big since in both cases we are just reading bytes from a disc. Also when using blobs you need to read all the data in memory before transmitting it to the client. However this is not always bad. When dealing with small files you can take advantage of that and cache the data in memory to avoid accessing the disc. What is more this problem is resolved in most modern databases such as SQL Server 2008. It has a FILESTREAM data type which holds a pointer to a real file system file and you can access it as a stream. The performance and the way FILESTREAM files are handled is analogous to regular files. You get all the benefits of the file system and the database. I believe the equivalent data type in Oracle DB is called BFILE. If you are using an inferior database that does not support such a data type you should consider storing large files on the file system.

   According to this article about the FILESTREAM data type you should chose FILESTREAM over varbinary(max) if:

   - Objects that are being stored are, on average, larger than 1 MB
   - Fast read access is important.
   - You are developing applications that use a middle tier for application logic.
   
   The article does not explain and I am not sure why we are supposed to choose FILESTREAM when we have a middle tier in our application. I suppose it has something to do with avoiding holding large file data in memory while the middle tier is doing its work. The article also claims that "for smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance." This would mean that storing data in varbinary(max) is faster for smaller files than storing them on the file system. File system lovers have to face the fact that in web scenarios most files are smaller than 1MB. It is true that there are videos and large files but they are usually distributed through some specialized services like YouTube. If you are creating such a service you have much more serious problems at hand and I doubt that something as simple as deciding where to store the files will solve them. You need content delivery networks, big clusters of servers that store content close to the client, adaptive streaming technologies and so on. Most files in a standard web application are small and are usually pictures. Also fast read access is not really important. When the user opens a web page he does not care if the pictures will be loaded for 500 ms or for 510 ms. There are better methods to improve performance like caching on the client, accessing pictures via another domain, compression, etc.

   To sum it up large files are the only problem and file based data types in the database solves that problem. The database is the clear winner. For most application the database wins even without using file based data type because most applications deal mostly with small files. Where are YOU going to store the files from now on?
Tags:   english tech 
Posted by:   Stilgar
03:53 21.12.2009

Comments:

First Previous 1 Next Last 

Posted by   Guest (Unregistered)   on   21:21 25.12.2009

Great article. I've been looking for it in decades and now it saves me months of work! Just one small thing - in web-based applications you quite often also deal with documents besides of pictures. And maybe you want to have them versioned. Exactly there the database prevails.

Posted by   Stilgar   on   23:42 25.12.2009

Way to go, comrade! Right now we may be a minority but one day we will prevail. The masses will soon understand our great cause and rise to support us. Viva La Revolución!

Posted by   livetoday (Unregistered)   on   13:24 30.12.2009

MogileFS forever ;P

Posted by   alex (Unregistered)   on   15:01 10.01.2010

Only to add, you will get partition problem in your db if you have many big files in your database.
About performance oracle said their database works faster than fs in some cases.
Good article, transactions and queries are really important.

Posted by   Guest (Unregistered)   on   09:21 26.01.2010

I recognize the need for dealing with files in transactions as being the real meat. I do find it funny that the filestream types listed are actually using the filesystem for the storage.  It also mentions you get the benfits of both, but I really didn't see any description of the benefits of using the filesystem.  It would also seem that if it was a universal truth you would not need to discuss it in the first place, perhaps it is just a solar system level truth ;)

Posted by   Stilgar   on   13:04 26.01.2010

OK to be honest I haven't proven it beyond Earth level but it seems enough for practical purposes :)

Posted by   Guest_asking (Unregistered)   on   07:57 09.07.2010

what is differents between file-based system vs database system???

Posted by   Stilgar   on   11:33 09.07.2010

File system is the operating system file system where you save files when you use the OS. You can write your software to save its files in this way. You can also get the database to manage the files. I guess you know what a database is.

Posted by   Guest (Unregistered)   on   08:54 30.07.2010

thanku sirrrrrrrrrr

Posted by   Guest (Unregistered)   on   20:34 23.12.2010

thank u this is very good valuable information

Posted by   Rumen Yankov (Unregistered)   on   17:52 28.06.2011

Good article, Thanks! I did the same decision before about 2 years and can only add that there is a simple solution for storing large files in varbinary and read them on small chunks using SQL SUBSTRING function. I have a DatabaseStream implementation that works perfect with small and large files.

Posted by   Guest (Unregistered)   on   22:49 18.10.2012

Каза IRC... Я кажи мрежата и канала дето се задават такива въпроси?

Posted by   Stilgar   on   23:15 18.10.2012

irc.freenode.net
I have seen the discussion in the ASP.NET channel but I guess any web tech or database channel can have such a discussion.

Posted by   Guest (Unregistered)   on   18:03 06.10.2014

great article

Posted by   Guest (Unregistered)   on   16:28 03.12.2015

Nice article

Posted by   Guest (Unregistered)   on   12:58 16.11.2016

We need an update on this! BtrFS, XFS and Ex4 have done wonders since 2009.

First Previous 1 Next Last 


Post as:



Post a comment: