Lucene Index Maintenance (Insert, Update and Delete)
Lucene is a powerful search engine library with a rich indexing API that can be used to build search features for your web application.
In this post I address the maintenance aspects of a Lucene index.
Once the initial index is created, these are the operations you may need to perform:
1. Inserting new documents
2. Updating existing documents
3. Deleting existing documents
Inserting new documents:
Lucene doesn't check for duplicate documents, so you need to build your own logic to decide whether a document is new or already exists in the index.
Once a document is identified as new, inserting it is no different from creating the index. The only thing to take care of is the create flag passed to the IndexWriter constructor,
which must be false so that the existing index directory is not overwritten (recreated). Then call IndexWriter.addDocument() once for each document you want to add.
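
The "is this document new?" check described above can be sketched in plain Java. This is only an illustration under assumptions: the class IndexedIdTracker and the idea of keeping the known IDs in an in-memory set are hypothetical and not part of Lucene. In a real application the IDs would more likely come from a database or from a unique key field stored in the index.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: remembers which document IDs
// have already been indexed so the caller can choose insert vs. update.
class IndexedIdTracker {
    enum Action { INSERT, UPDATE }

    private final Set<String> indexedIds = new HashSet<String>();

    // Returns INSERT the first time an ID is seen and UPDATE afterwards.
    Action classify(String docId) {
        // Set.add returns true only if the ID was not present before.
        return indexedIds.add(docId) ? Action.INSERT : Action.UPDATE;
    }

    public static void main(String[] args) {
        IndexedIdTracker tracker = new IndexedIdTracker();
        System.out.println(tracker.classify("doc-1")); // INSERT
        System.out.println(tracker.classify("doc-2")); // INSERT
        System.out.println(tracker.classify("doc-1")); // UPDATE
    }
}
```

With the result in hand, an INSERT maps to a plain addDocument() call, while an UPDATE needs the delete-then-add sequence.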
Updating existing documents:
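As already mentioned, Lucene does not check for duplicates, so adding a changed document again would simply leave two copies in the index.
Updating an existing document is therefore done in two steps:
first delete the old document, then add the new version to the index. Recent Lucene versions also provide IndexWriter.updateDocument(Term, Document), which performs both steps in one call.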
Deleting existing documents:
Documents can be deleted with either IndexReader.deleteDocuments(Term) or IndexWriter.deleteDocuments(Term). Both delete every document that contains the given term.
Problem deleting documents:
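What I have observed is that these methods work well if the field used in the Term is UN_TOKENIZED, but the delete fails if the field is TOKENIZED. The reason is that a tokenized field is split into separate terms by the analyzer, so the complete field value never exists in the index as a single term and the Term you pass matches nothing.
A good workaround is to index the same value a second time under a different field name as an UN_TOKENIZED field, and to build the Term from that field when deleting the document.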
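
To make the tokenized versus untokenized behaviour concrete, here is a small toy model in plain Java. It is not Lucene code and does not reflect Lucene's internals: ToyTermIndex, its methods, and the field name title_exact are all made up for illustration, and the "analyzer" is assumed to simply lowercase the value and split it on whitespace.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy inverted index, NOT Lucene's implementation. It only mimics the
// difference between tokenized and untokenized fields.
class ToyTermIndex {
    // "field:term" -> IDs of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
    private final Set<Integer> liveDocs = new HashSet<Integer>();

    // tokenized = true: lowercase the value and split it on whitespace.
    // tokenized = false: keep the whole value as one single term.
    void addField(int docId, String field, String value, boolean tokenized) {
        liveDocs.add(docId);
        String[] terms = tokenized ? value.toLowerCase().split("\\s+") : new String[] { value };
        for (String term : terms) {
            postings.computeIfAbsent(field + ":" + term, k -> new HashSet<Integer>()).add(docId);
        }
    }

    // Deletes every live document whose field contains exactly this term.
    // Returns the number of documents deleted.
    int deleteByTerm(String field, String term) {
        Set<Integer> matches = postings.getOrDefault(field + ":" + term, Collections.<Integer>emptySet());
        int deleted = 0;
        for (Integer docId : matches) {
            if (liveDocs.remove(docId)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        index.addField(1, "title", "Lucene in Action", true);        // tokenized
        index.addField(1, "title_exact", "Lucene in Action", false); // untokenized copy
        System.out.println(index.deleteByTerm("title", "Lucene in Action"));       // 0
        System.out.println(index.deleteByTerm("title_exact", "Lucene in Action")); // 1
    }
}
```

The first delete finds nothing because the tokenized field only holds the terms lucene, in and action. Deleting by one of those terms would match, but it would also remove every other document containing that word, which is why a separate untokenized key field is the safer choice.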
Deleting documents by document number, with IndexReader.deleteDocument(int), is simple and works well as long as you know the document number.
After deleting, you need to reopen any searchers that were already open so that they see the change.
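
Why the reopen is needed can be pictured with another toy model in plain Java. Again this is not Lucene code: ToySnapshotIndex and ToySearcher are hypothetical names, and the sketch assumes that a searcher works on a point-in-time view of the index taken when it was opened.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model, NOT Lucene code: a searcher sees the index as it was
// at the moment the searcher was opened.
class ToySnapshotIndex {
    private final Set<Integer> liveDocs = new HashSet<Integer>();

    void add(int docId) { liveDocs.add(docId); }

    void delete(int docId) { liveDocs.remove(docId); }

    // Opening a searcher copies the current state (point-in-time view).
    ToySearcher openSearcher() {
        return new ToySearcher(new HashSet<Integer>(liveDocs));
    }

    static class ToySearcher {
        private final Set<Integer> visibleDocs;

        ToySearcher(Set<Integer> visibleDocs) { this.visibleDocs = visibleDocs; }

        boolean canSee(int docId) { return visibleDocs.contains(docId); }
    }

    public static void main(String[] args) {
        ToySnapshotIndex index = new ToySnapshotIndex();
        index.add(7);
        ToySearcher oldSearcher = index.openSearcher();
        index.delete(7);
        System.out.println(oldSearcher.canSee(7));          // true: stale view
        System.out.println(index.openSearcher().canSee(7)); // false: fresh view
    }
}
```

In this model the old searcher keeps returning the deleted document until it is replaced by a newly opened one, which is the behaviour the reopen step guards against.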