It has been a while since I posted a technical article so my mailbox is jammed with e-mails from people begging me to spread more wisdom. This particular post deals with one of the most pressing and hard unsolved problems in computer science – Is it better to store user uploaded files in the database or on the file system? This question has spawned thousand of IRC, forum and other flames. I too have participated in my share of flames on the topic. So it is time for you, the reader, to learn what is the right
way to store user uploaded files and it is... (drum roll)... – IN THE DATABASE!
If you are one of these annoying types that always ask the question "Why?" instead of accepting the universal truth here are the reasons:
1. When storing files in the database we can guarantee that the data is consistent. The foreign keys will enforce referential integrity. No more files jamming the server when the item they were attached to no longer exists. On the contrary, it is relatively hard to execute transactional operations on both the file system and the database at the same time. This is the most important reason. Even if this was the only reason and there were no other reasons in favor of the database it would still prevail.
2. When all our user generated data is in the database it is easier to maintain. When we backup the data we will only backup the database. While it is not that hard to backup a bunch of files it is still a concern and we are going to backup the database anyway so why not backup everything at once. What is more, if at some point there is a change in the application that requires some update to the existing data like changing the file names to match some convention or adding a watermark to the images it is easier to manipulate data using SQL than writing a custom tool to crawl through files and directories.
3. You can store file related metadata like file size, image dimensions, filenames (especially useful when you want to be able to store multiple files with the same name) and make queries over it. To be honest you can store this data in the database even if you store the actual files on the file system but it is more natural if this data is "attached" to the file and it is easier to preserve data integrity.
There is one argument against storing the files in the database and it is called performance. More often than not people who bring forward this argument are not sure what performance they are talking about. Usually the flow of thought in these individuals goes like this – The file system is made specially for storing and retrieving files so it should perform better than any other tool that can do the job. So here are the most common ideas of performance:
- Search time. People believe that the file system is really fast at searching in a directory tree because it looks up files all the time. However the database is probably the most advanced tool for searching data based on a key. Nothing can beat that. What is more if you are storing files with the file system you still need to query the database for the file path.
- Retrieval time. This one is probably the only real argument. If you are storing the files in a blob field like varbinary(max) it will probably be slower to read. I have not measured the difference but it could not be that big since in both cases we are just reading bytes from a disc. Also when using blobs you need to read all the data in memory before transmitting it to the client. However this is not always bad. When dealing with small files you can take advantage of that and cache the data in memory to avoid accessing the disc. What is more this problem is resolved in most modern databases such as SQL Server 2008. It has a FILESTREAM data type which holds a pointer to a real file system file and you can access it as a stream. The performance and the way FILESTREAM files are handled is analogous to regular files. You get all the benefits of the file system and the database. I believe the equivalent data type in Oracle DB is called BFILE. If you are using an inferior database that does not support such a data type you should consider storing large
files on the file system.
According to this article about the FILESTREAM data type
you should chose FILESTREAM over varbinary(max) if:
- Objects that are being stored are, on average, larger than 1 MB
- Fast read access is important.
- You are developing applications that use a middle tier for application logic.
The article does not explain and I am not sure why we are supposed to choose FILESTREAM when we have a middle tier in our application. I suppose it has something to do with avoiding holding large file data in memory while the middle tier is doing its work. The article also claims that "for smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance." This would mean that storing data in varbinary(max) is faster for smaller files than storing them on the file system. File system lovers have to face the fact that in web scenarios most files are smaller than 1MB. It is true that there are videos and large files but they are usually distributed through some specialized services like YouTube. If you are creating such a service you have much more serious problems at hand and I doubt that something as simple as deciding where to store the files will solve them. You need content delivery networks, big clusters of servers that store content close to the client, adaptive streaming technologies and so on. Most files in a standard web application are small and are usually pictures. Also fast read access is not really important. When the user opens a web page he does not care if the pictures will be loaded for 500 ms or for 510 ms. There are better methods to improve performance like caching on the client, accessing pictures via another domain, compression, etc.
To sum it up large files are the only problem and file based data types in the database solves that problem. The database is the clear winner. For most application the database wins even without using file based data type because most applications deal mostly with small files. Where are YOU going to store the files from now on?