The cache memory used by mifluz has a tremendous impact on
performance. Its size is set by the wordlist_cache_size attribute
(see WordList(3) and mifluz(3)). The cache holds pages from the inverted
index in memory (uncompressed if the file is compressed) to reduce disk
access. Pages migrate from disk to memory under an LRU (least recently
used) policy.
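The LRU behaviour can be illustrated with a small sketch. This is not the mifluz implementation: the class LRUPageCache, its capacity_pages and read_page parameters, and the page numbers are all hypothetical names chosen for the example. It only shows how a page that has not been touched recently is the first to be evicted when the cache is full.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy LRU cache keyed by page number (hypothetical, not mifluz code)."""

    def __init__(self, capacity_pages, read_page):
        self.capacity_pages = capacity_pages
        self.read_page = read_page      # callable standing in for a disk read
        self.pages = OrderedDict()      # least recently used entry comes first
        self.disk_reads = 0

    def get(self, page_no):
        if page_no in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # Cache miss: fetch from "disk" and remember the page.
        self.disk_reads += 1
        data = self.read_page(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity_pages:
            # Evict the least recently used page.
            self.pages.popitem(last=False)
        return data


cache = LRUPageCache(capacity_pages=2, read_page=lambda n: f"page-{n}")
for page_no in (1, 2, 1, 3, 2):
    cache.get(page_no)
```

With room for only two pages, the access sequence above costs four simulated disk reads out of five lookups. A larger cache would turn more of them into hits.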
Each page in the cache is a node of the B-Tree used to store the inverted
index entries. The internal pages are intermediate nodes that mifluz
must traverse each time a key is searched, so it is very important to
keep them in memory. Fortunately they account for at most 1% of the total
size of the index. The cache must therefore be at least large enough to
hold the internal pages.
The other factors that must be taken into account in sizing the cache are highly dependent on the application. A typical case is the insertion of many random words into the index. In this case two factors are of special importance:
The general rule is: estimate or calculate how many unique words fill
90% of your index, multiply this number by the page size, and increase
your cache by that amount.
See the wordlist_page_size attribute in WordList(3) or mifluz(3).
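One way to apply this rule is sketched below. It assumes that "fill 90% of your index" can be approximated by counting word occurrences, and that a table of occurrence counts is available. The function names, the sample counts, and the page size of 8192 bytes are assumptions made for the example. The real page size is whatever wordlist_page_size is set to.

```python
def words_covering(counts, fraction=0.9):
    """Return how many of the most frequent words are needed to
    account for `fraction` of all word occurrences."""
    total = sum(counts.values())
    covered = 0
    ranked = sorted(counts.values(), reverse=True)
    for n, count in enumerate(ranked, start=1):
        covered += count
        if covered >= fraction * total:
            return n
    return len(counts)


def extra_cache_bytes(counts, page_size, fraction=0.9):
    """Number of words covering `fraction` of the index, times the page size."""
    return words_covering(counts, fraction) * page_size


# Hypothetical occurrence counts, for illustration only.
counts = {"the": 50, "index": 30, "cache": 15, "btree": 4, "lru": 1}
extra = extra_cache_bytes(counts, page_size=8192)
```

In this toy data set three words cover at least 90% of the occurrences, so the rule suggests adding three pages' worth of cache.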
foo 100 foo 103
rather than
foo 103 foo 100
This hint must not be considered in isolation, but together with a careful analysis of the distribution of the key components (the word and the numbers). For instance, it does not matter much if a random number follows the word, as long as the range of values of that number is small.
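Reading the example above as a recommendation to insert keys with ascending numbers for a given word, a batch of keys can be sorted before insertion. The sketch below is only an illustration under that assumption. The tuple layout (word, number) and the variable names are hypothetical, and the actual insertion call is deliberately left out.

```python
# Hypothetical batch of keys: (word, number) pairs in arrival order.
keys = [("foo", 103), ("bar", 7), ("foo", 100)]

# Tuples sort by word first, then by number, so for each word the
# numbers come out in ascending order.
ordered = sorted(keys)

for word, number in ordered:
    # An application would insert the key into the index here.
    pass
```

Sorting costs memory and time in the application, so it is only worthwhile when the range of the numbers is large enough to scatter insertions across many pages.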
The conclusion is that the cache size should be at least 1% of the total index size (uncompressed) plus a number of bytes that depends on the usage pattern.
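As a rough worked example of this conclusion, the lower bound can be computed as follows. The function name min_cache_bytes and the sample sizes are assumptions for illustration. The usage-dependent part has to come from an analysis of the application and is simply passed in as a number here.

```python
def min_cache_bytes(index_bytes_uncompressed, usage_bytes=0):
    """Lower bound on the cache size: 1% of the uncompressed index
    (room for the internal pages) plus a usage-dependent amount."""
    # Ceiling division, so a fractional byte rounds up.
    internal_pages = -(-index_bytes_uncompressed // 100)
    return internal_pages + usage_bytes


# Hypothetical figures: a 1 GB uncompressed index and 24576 extra bytes.
suggested = min_cache_bytes(1_000_000_000, usage_bytes=24576)
```

The result is a floor, not a target. Any memory beyond it reduces disk access further.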