UBIFS - UBI File-System
Table of contents
- Big red note
- Overview
- Current status
- Source code
- Mailing list
- User-space tools
- Scalability
- Extended attributes
- Mount options
- Documentation
- How to send an UBIFS bugreport?
Big red note
One thing people have to understand when dealing with UBIFS is that it is very different from any traditional file-system: it does not work on top of block devices (such as a hard drive, an SD card, or a USB stick). It was designed to work on top of UBI volumes, which have nothing to do with block devices. This is why UBIFS does not work on MMC cards or USB flash sticks: they look like block devices to the outside world because they implement FTL (Flash Translation Layer) support in hardware, which, simply speaking, emulates a block device on top of the built-in flash chip. This FAQ entry describes the difference between block devices and flash chips.
Overview
UBIFS is a new flash file system designed to work on top of UBI. The file system is developed by Nokia engineers with the help of the University of Szeged.
UBIFS works on top of UBI volumes; it cannot operate directly on top of MTD devices. In other words, there are 3 subsystems involved:
- MTD subsystem, which provides a uniform interface to access flash chips. MTD provides a notion of MTD devices (e.g., /dev/mtd0) which basically provides access to the raw flash;
- UBI subsystem, which is a volume management system for flash devices; UBI works on top of MTD devices and provides a notion of UBI volumes; UBI volumes are higher level entities than MTD devices and they are devoid of many unpleasant issues MTD devices have (e.g., wearing and bad blocks); see here for more information;
- UBIFS file-system, which works on top of UBI volumes.
In contrast, the JFFS2 file-system works directly on top of raw MTD devices.
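The layering can be made concrete with the commands typically used to bring the stack up. The sketch below only builds the command lines and does not run them. It assumes the `ubiattach` tool from mtd-utils is installed; the MTD device number, UBI device number, volume name `rootfs`, and mount point `/mnt/ubifs` are illustrative placeholders, not values taken from this page.

```python
def bring_up_commands(mtd_num=0, ubi_num=0, volume="rootfs",
                      mount_point="/mnt/ubifs"):
    """Return the argv lists that walk up the MTD -> UBI -> UBIFS stack.

    All default values are hypothetical placeholders.
    """
    return [
        # UBI layer: attach the raw MTD device so that its UBI volumes appear.
        ["ubiattach", "/dev/ubi_ctrl", "-m", str(mtd_num), "-d", str(ubi_num)],
        # UBIFS layer: mount a UBI volume (named by UBI device and volume
        # name), not a block device node.
        ["mount", "-t", "ubifs", f"ubi{ubi_num}:{volume}", mount_point],
    ]


for argv in bring_up_commands():
    print(" ".join(argv))
```

Note that the mount source is a UBI volume name rather than a `/dev` block device, which reflects the point made in the big red note.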
The main objectives of UBIFS are:
- better scalability compared to what JFFS2 provides, which is achieved by storing and maintaining the indexing data structures of the file-system on the flash media;
- write-back support;
- support of on-the-fly compression.
More comments on each of the above points can be found in the following sections.
Here is a short and unsorted list of some of UBIFS features:
- write-back support - this dramatically improves the throughput of the file-system compared to JFFS2, which is write-through;
- fast mount time - unlike JFFS2, UBIFS does not have to scan the whole media when mounting; it takes milliseconds for UBIFS to mount the media, and this does not depend on flash size; however, UBI initialization time depends on flash size and has to be taken into account (see here for more details);
- tolerance to unclean reboots - UBIFS is a journaling file system and it tolerates sudden crashes and unclean reboots; UBIFS just replays the journal and recovers from the unclean reboot; mount time is a little slower in this case because of the need to replay the journal, but UBIFS does not need to scan the whole media, so it still takes only fractions of a second;
- fast I/O - even with write-back disabled (e.g., if UBIFS is mounted with the -o sync mount flag), UBIFS shows good performance which is close to JFFS2 performance; bear in mind that it is extremely difficult to compete with JFFS2 in synchronous I/O, because JFFS2 does not maintain indexing data structures on flash and so has no maintenance overhead, while UBIFS does; the main UBIFS write speed booster is the write-back support; the other one is the way UBIFS commits the journal - it does not physically move the data from one place to another but instead just adds the corresponding information to the file-system index and picks different eraseblocks for the new journal (i.e., UBIFS has a sort of "wandering" journal);
- on-the-fly compression - the data is stored in compressed form on the flash media, which makes it possible to put considerably more data on the flash than if the data were not compressed; this is very similar to what JFFS2 has; UBIFS also allows compression to be switched on/off on a per-inode basis, which is very flexible; for example, one may switch compression off by default and enable it only for certain files which are expected to compress well, or one may switch compression on by default but disable it for supposedly incompressible data such as multimedia files; at the moment UBIFS supports only the Zlib and LZO compressors, but it is not difficult to add more.
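To illustrate why write-back helps throughput, here is a toy model (not UBIFS code) that counts flash writes for a stream of small writes to data pages. It assumes a simplified cache that flushes all dirty pages after a fixed number of requests; the function names, the flush policy, and the workload are all made up for illustration.

```python
def flash_writes_write_through(writes):
    """Every request goes to flash immediately: one flash write per request."""
    return len(writes)


def flash_writes_write_back(writes, flush_every):
    """Keep dirty pages in RAM and flush them every `flush_every` requests.

    Repeated writes to the same page between two flushes cost one flash write.
    """
    flash_writes = 0
    dirty = set()
    for i, page in enumerate(writes, start=1):
        dirty.add(page)
        if i % flush_every == 0:
            flash_writes += len(dirty)
            dirty.clear()
    # Final flush of whatever is still dirty, e.g. on sync or unmount.
    flash_writes += len(dirty)
    return flash_writes


# Hypothetical workload: 12 small writes that keep touching pages 0 and 1.
workload = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(flash_writes_write_through(workload))   # 12
print(flash_writes_write_back(workload, 6))   # 4
```

In this model the benefit comes entirely from coalescing repeated changes to the same page, so a workload that never touches a page twice would see no reduction.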
Current status
UBIFS has proved to be fairly stable; however, it needs more testing, review, and tuning (e.g., finding the optimal journal size). It also needs some profiling and bottleneck hunting.
Source code
The UBIFS development git tree is:
git://git.infradead.org/~dedekind/ubifs-2.6.git
The git tree is usually based on top of the latest main-line Linux kernel release candidate and it is re-based often. But there are also 2.6.21, 2.6.22, 2.6.23, 2.6.24, and 2.6.25 back-ports, although they are not always up-to-date and one may want to pick up additional patches from the main development tree to utilize the latest UBIFS version. The back-ports may be found at the following git-trees:
git://git.infradead.org/~dedekind/ubifs-v2.6.25.git
git://git.infradead.org/~dedekind/ubifs-v2.6.24.git
git://git.infradead.org/~dedekind/ubifs-v2.6.23.git
git://git.infradead.org/~dedekind/ubifs-v2.6.22.git
git://git.infradead.org/~dedekind/ubifs-v2.6.21.git
Note that since it is impossible to register a memory shrinker in kernel versions 2.6.21 and 2.6.22, the UBIFS shrinker does not work there, which means the UBIFS TNC cache never gets shrunk and the system may run out of memory if the file-system is large enough.
Mailing list
You are welcome to send feed-back, bug-reports, patches, etc to the MTD mailing list. Feel free to ask questions.
User-space tools
There is only one UBIFS user-space tool at the moment - mkfs.ubifs, which creates UBIFS images. The tool may be found at
git://git.infradead.org/~dedekind/mkfs.ubifs.git
The images produced by mkfs.ubifs may be written to UBI volumes using ubiupdatevol or may be further fed to the ubinize tool to create an UBI image which may be put to the MTD device.
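The two ways of using a mkfs.ubifs image can be sketched as command lines. The code below only builds the arguments and does not execute anything. The flash geometry (2048-byte minimum I/O unit, 128KiB eraseblocks, 126976-byte logical eraseblocks, 2047 logical eraseblocks at most), the file names, the volume node `/dev/ubi0_0`, and the ubinize configuration contents are all illustrative assumptions that have to be adapted to the actual flash and to the mtd-utils version in use.

```python
# Hypothetical ubinize configuration describing a single volume.
UBINIZE_CFG = """\
[ubifs]
mode=ubi
image=ubifs.img
vol_id=0
vol_type=dynamic
vol_name=rootfs
vol_flags=autoresize
"""


def image_commands(root_dir="rootfs/", ubifs_img="ubifs.img",
                   min_io=2048, leb_size=126976, max_lebs=2047):
    """Build the mkfs.ubifs command plus the two ways to use its output."""
    mkfs = ["mkfs.ubifs", "-r", root_dir, "-m", str(min_io),
            "-e", str(leb_size), "-c", str(max_lebs), "-o", ubifs_img]
    # Path 1: write the UBIFS image into an existing UBI volume.
    update = ["ubiupdatevol", "/dev/ubi0_0", ubifs_img]
    # Path 2: wrap the UBIFS image into a UBI image for the whole MTD device.
    ubinize = ["ubinize", "-o", "ubi.img", "-m", str(min_io),
               "-p", "128KiB", "ubinize.cfg"]
    return mkfs, update, ubinize


for argv in image_commands():
    print(" ".join(argv))
```

The first path suits a running system where UBI is already attached; the second produces a single image suitable for flashing an empty MTD device.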
Scalability
All the data structures UBIFS is using are trees, so it scales logarithmically in terms of flash size. However, UBI scales linearly (see here) which makes overall UBI/UBIFS stack scalability linear. But the UBIFS authors believe it is always possible to create logarithmically scalable UBI2 and to improve the situation. Current UBI should be OK for 2-16GiB flashes, depending on the I/O speed and requirements.
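The difference between linear and logarithmic scaling can be illustrated with a rough back-of-the-envelope model. It assumes 128KiB eraseblocks, one index leaf per 4KiB of data, and a tree fanout of 8; these numbers and the function names are assumptions chosen for illustration, and the output is not a measurement.

```python
def tree_height(leaves, fanout=8):
    """Smallest number of levels whose capacity (fanout**levels) covers `leaves`."""
    height, capacity = 0, 1
    while capacity < leaves:
        capacity *= fanout
        height += 1
    return height


def compare(flash_bytes, peb_bytes=128 * 1024, node_bytes=4096):
    """Return (eraseblocks a linear scan visits, tree levels one lookup visits)."""
    pebs_scanned = flash_bytes // peb_bytes                  # grows linearly
    lookup_levels = tree_height(flash_bytes // node_bytes)   # grows logarithmically
    return pebs_scanned, lookup_levels


MiB = 1024 * 1024
for size in (256 * MiB, 2048 * MiB, 16384 * MiB):
    print(size // MiB, "MiB:", compare(size))
```

Going from 256MiB to 16GiB multiplies the number of eraseblocks to scan by 64 in this model, while the tree gains only two levels.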
Note that although UBI scalability is linear, it still scales much better than JFFS2, which was originally designed for small ~32MiB NOR flashes. JFFS2 has scalability issues on the "file-system level", while the UBI/UBIFS stack has scalability issues only on the lower "raw flash level". The following table describes the issues in more detail.
| Scalability issue | JFFS2 | UBIFS |
| Mount time linearly depends on the flash size | True, the dependency is linear, because JFFS2 has to scan whole flash media when mounting. | UBIFS mount time does not depend on the flash size. But UBI needs to scan the flash media, which is actually quicker than JFFS2 scanning. So overall, UBI/UBIFS has this linear dependency. |
| Memory consumption linearly depends on the flash size | True, the dependency is linear. | UBIFS memory does depend on the flash size in the current implementation, because the LPT shrinker is not implemented. But it is not difficult to implement the LPT shrinker and get rid of the dependency. It is not implemented only because the memory consumption is too small to make the coding work worth it. UBI memory consumption linearly depends on flash size. Thus, overall UBI/UBIFS stack has the linear dependency. |
| Mount time linearly depends on the file-system contents | True, the more data is stored on the file-system, the longer it takes to mount it, because JFFS2 has to do more scanning work. | False, mount time does not depend on the file-system contents. At the worst case (if there was an unclean reboot), UBIFS has to scan and replay the journal which has fixed and configurable size. |
| Full file-system checking is required after each mount | True. JFFS2 has to check whole file system just after it has been mounted in case of NAND flash. The checking involves reading all nodes for each inode and checking their CRC checksums, which consumes a lot of CPU. For example, this may be seen by running the top utility just after JFFS2 has been mounted. This slows down overall system boot-up time. Fundamentally, this is needed because JFFS2 does not store space accounting information (i.e., free/dirty space) on the flash media but instead, gathers this information by scanning the flash media. | False. UBIFS does not scan/check whole file-system because it stores the space accounting information on the flash media in the so-called LPT (Logical eraseblock Properties Tree) tree. |
| Memory consumption linearly depends on file-system contents | True. JFFS2 keeps a small data structure in RAM for each node on flash, so the more data is stored on the flash media, the more memory JFFS2 consumes. | False. UBIFS memory consumption does not depend on how much data is stored on the flash media. |
| The first file access time linearly depends on its size | True. JFFS2 has to keep in RAM a so-called "fragment tree" for each inode corresponding to an opened file. The fragment tree is an in-memory RB-tree which is indexed by file offset and refers to the on-flash nodes corresponding to this offset. The fragment tree is not stored on the flash media. Instead, it is built on-the-fly when the file is opened for the first time. To build the fragment tree, JFFS2 has to read each data node corresponding to this inode from the flash. This means that the larger the file, the longer it takes to open it for the first time, and the more memory it takes while it is open. Depending on the system, JFFS2 becomes nearly unusable starting from a certain file size. | False. UBIFS stores all the indexing information on the media in the indexing B-tree. Whenever a piece of data has to be read from the file-system, the B-tree is looked up and the corresponding flash address to read is found. There is a TNC cache which caches the B-tree nodes when the B-tree is looked up, and the cache is shrinkable, which means it may be shrunk when the kernel needs more memory. |
| File-system performance depends on I/O history | True. Since JFFS2 is fully synchronous, it writes data to the flash media as soon as the data arrives. If one changes a few bytes in the middle of a file, JFFS2 writes a data node which contains those bytes to the flash. If there are many random small writes all over the place, the file-system becomes fragmented. JFFS2 merges small fragments into 4KiB chunks, which involves re-compressing and re-writing the data. But this "de-fragmentation" happens during garbage collection and at random times, because the JFFS2 wear-leveling algorithm is based on random eraseblock selection. So if there were a lot of small writes, JFFS2 becomes slower some time later: the performance just goes down out of the blue, which makes the system less predictable. | False. UBIFS always writes in 4KiB chunks. This does not hurt the performance much because of the write-back support: the data changes do not go to the flash straight away; they are instead deferred and done later, usually in the background, when (hopefully) more data has been changed in the same data page. |
Extended attributes
UBIFS supports extended attributes if the corresponding configuration option
is enabled (no additional mount options are required). It supports
user, trusted, and security name-spaces.
However, access control list (ACL) support is not implemented.
Mount options
The following are UBIFS-specific mount options.
- norm_unmount (default) - commit on unmount; the journal is committed when the file-system is unmounted so that the next mount does not have to replay the journal and becomes very fast;
- fast_unmount - do not commit on unmount; this option makes unmount faster, but the next mount slower because of the need to replay the journal.
Besides, UBIFS supports the standard sync mount option which
may be used to disable UBIFS write-back and write-buffer caching and make it
fully synchronous.
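One way to confirm which options a mounted UBIFS volume is actually using is to look at its entry in /proc/mounts. The following sketch parses text in that format; the sample contents, the volume name, and the mount point are hypothetical, and on a real system the text would come from reading /proc/mounts.

```python
def ubifs_mount_options(mounts_text, mount_point):
    """Return the set of mount options for a UBIFS mount point, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point and fields[2] == "ubifs":
            return set(fields[3].split(","))
    return None


# Hypothetical /proc/mounts contents.
sample = (
    "rootfs / rootfs rw 0 0\n"
    "ubi0:rootfs /mnt/ubifs ubifs rw,sync 0 0\n"
)
options = ubifs_mount_options(sample, "/mnt/ubifs")
print("sync" in options)
```

If `sync` is present in the returned set, the volume was mounted synchronously and write-back is disabled for it.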
Documentation
The UBIFS white-paper, which briefly describes the main UBIFS design aspects, is available here: ubifs_whitepaper.pdf. The UBIFS FAQ might also be useful. There is also a wiki page, but it contains a lot of out-of-date information.
How to send an UBIFS bugreport?
Before sending a bug report:
- make sure you have compiled kernel symbols in (CONFIG_KALLSYMS_ALL=y in .config);
- enable UBIFS debugging (CONFIG_UBIFS_FS_DEBUG=y in .config).
Please, attach all the bug-related messages including the UBIFS messages from
the kernel ring buffer, which may be collected using the dmesg
utility or using minicom with serial console capturing. And of
course, it is wise to describe how the problem can be reproduced. The bugreport
should be sent to the MTD mailing list.
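Before collecting logs, the two configuration options listed above can be checked automatically. This is a minimal sketch that scans the text of a kernel .config file; the function name and the sample fragment are made up for illustration, and on a real system the text would be read from the .config of the kernel build.

```python
REQUIRED = ("CONFIG_KALLSYMS_ALL", "CONFIG_UBIFS_FS_DEBUG")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in a .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [name for name in required if name not in enabled]


# Hypothetical .config fragment.
sample = """\
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_UBIFS_FS_DEBUG is not set
"""
print(missing_options(sample))  # ['CONFIG_UBIFS_FS_DEBUG']
```

An empty list means both options are enabled and the kernel is ready to produce useful debugging output.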