WordList is the mifluz equivalent of a database handler. Each WordList object is bound to an inverted index file and implements the operations to create it, fill it with word occurrences, and search for an entry matching a given criterion.
WordList is an abstract class and cannot be instantiated. The List method of the class WordContext creates an instance using the appropriate derived class, either WordListOne or WordListMulti. Refer to the corresponding manual pages for more information on their specific semantics.
When doing bulk insertions, mifluz creates temporary files that contain the entries to be inserted in the index. Those files are typically named indexC00000000. The maximum size of a temporary file is wordlist_cache_size / 2. When a temporary file reaches this size, mifluz creates another temporary file named indexC00000001. The process continues until mifluz has created 50 temporary files. At this point it merges all the temporary files into one that replaces the first, indexC00000000. It then resumes creating temporary files and repeats this cycle until the bulk insertion is finished. When the bulk insertion is finished, mifluz has one big file named indexC00000000 that contains all the entries to be inserted in the index. mifluz inserts all the entries from indexC00000000 into the index and deletes the temporary file when done. The insertion is fast because all the entries in indexC00000000 are already sorted.
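The cycle described above resembles an external merge sort, and a small simulation may make it easier to follow. The sketch below is not mifluz code: the function name bulk_insert is hypothetical, in-memory lists stand in for the indexC* files, and sizes are counted in entries rather than bytes. Only the limit of wordlist_cache_size / 2 per file and the merge after 50 files are taken from the description.

```python
import heapq

MAX_RUNS = 50  # number of temporary files created before they are merged


def bulk_insert(entries, cache_size):
    """Return every entry as one sorted run, mimicking the temporary-file scheme.

    Lists stand in for the indexC* files, and sizes are counted in entries
    rather than bytes (both are simplifications made for this sketch).
    """
    run_limit = max(1, cache_size // 2)  # maximum size of one temporary file
    runs = []    # runs[0] plays the role of indexC00000000
    buffer = []  # entries not yet written to a temporary file

    def spill():
        # Write the buffered entries out as a new sorted run.
        runs.append(sorted(buffer))
        buffer.clear()
        if len(runs) >= MAX_RUNS:
            # Merge all runs into a single one that replaces the first.
            runs[:] = [list(heapq.merge(*runs))]

    for entry in entries:
        buffer.append(entry)
        if len(buffer) >= run_limit:
            spill()
    if buffer:
        spill()
    # Final merge: one sorted run, ready to be inserted into the index.
    return list(heapq.merge(*runs))
```

Because every run is sorted before it is stored, each merge is a linear pass over the runs, which is why the final insertion into the index can proceed in key order.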
The parameter wordlist_cache_max can be used to prevent the temporary files from growing indefinitely. If the total cumulative size of the indexC* files grows beyond this parameter, they are merged into the main index and deleted. For instance, setting this parameter to 500 MB guarantees that the total size of the indexC* files will not grow above 500 MB.
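The effect of such a cap can be illustrated with a second simulation. As before, this is a sketch under stated assumptions rather than the mifluz implementation: insert_with_cap, run_limit and the list-based index are hypothetical, sizes are counted in entries, and the cap is checked only after a temporary run has been written, so the total may briefly overshoot cache_max by up to one run.

```python
import heapq


def insert_with_cap(entries, run_limit, cache_max):
    """Spill entries into sorted runs and flush them once they exceed cache_max.

    Returns (index, flushes). The main index is modelled as a sorted list and
    sizes are counted in entries (both are simplifications for this sketch).
    """
    index = []   # stands in for the main inverted index
    runs = []    # stand-ins for the indexC* files
    buffer = []  # entries not yet written to a temporary run
    flushes = 0

    def flush():
        # Merge every temporary run into the main index, then delete the runs.
        nonlocal index, flushes
        index = list(heapq.merge(index, *runs))
        runs.clear()
        flushes += 1

    for entry in entries:
        buffer.append(entry)
        if len(buffer) >= run_limit:
            runs.append(sorted(buffer))
            buffer.clear()
            # Cumulative size of all temporary runs compared with the cap.
            if sum(len(run) for run in runs) > cache_max:
                flush()
    if buffer:
        runs.append(sorted(buffer))
    if runs:
        flush()
    return index, flushes
```

With 100 entries, run_limit set to 10 and cache_max set to 30, the runs are flushed each time a fourth run pushes the total past the cap, and once more at the end for the remaining runs.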