Rather than search all the files all the time:
- Read all the files once.
- Build a file which contains all the words found in all the files.
- Build pointers from all the words to all the files which contain the words.
- Compress everything for speed.
- The indexing program needs:
  - the types and locations of the documents being indexed.
  - the levels of search capability required.
  - a list of "stop words", i.e. words to ignore in the search.
- The resultant index files can be larger than the source files.
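
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not any particular indexing program: the names build_index, search, and STOP_WORDS, the sample documents, and the tokenizing rule are all assumptions made for the example, and compression is left out.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real indexer would take this as configuration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}

def build_index(documents, stop_words=STOP_WORDS):
    """Read each document once and map every word to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        # Assumed tokenizing rule: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in stop_words:
                index[word].add(name)
    return index

def search(index, word):
    """Look a word up in the index instead of rescanning the documents."""
    return sorted(index.get(word.lower(), set()))

# Stand-in documents; a real indexer would read these from the given locations.
docs = {
    "a.txt": "The index points to the files",
    "b.txt": "Search the index, not the files",
}
index = build_index(docs)
print(search(index, "index"))   # ['a.txt', 'b.txt']
print(search(index, "search"))  # ['b.txt']
print(search(index, "the"))     # [] because "the" is a stop word
```

The index stores a set of file names per word, which is one reason the index can grow larger than the source: every distinct word carries its own list of pointers.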