Lucene Index Maintenance (Insert, Update and Delete)



Lucene is a powerful search engine library with a rich indexing API that can be used to build a search feature for your web application.
In this post I try to address the maintenance aspects of a Lucene index.
Once the initial index is created, these are the operations you may need to perform:


1. Inserting new documents
2. Updating existing documents
3. Deleting existing documents


Inserting new documents:
Lucene doesn't check for duplicate documents, so you need to build your own logic to identify whether a document is new or already exists in the index.
Once a document is identified as new, inserting it is no different from creating the index; we just need to take care of the create flag of the IndexWriter,
which needs to be false so that the existing index directory is not overwritten (recreated). Each document is then added with an IndexWriter.addDocument() call.
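A minimal sketch of the insert step, assuming lucene-core 9.x on the classpath rather than the 2.x API this post was written against. In current versions the boolean create flag has been replaced by IndexWriterConfig.OpenMode (CREATE, APPEND, CREATE_OR_APPEND), and UN_TOKENIZED/TOKENIZED fields correspond roughly to StringField/TextField. The field names id and body, the helpers doc() and exists(), and the in-memory ByteBuffersDirectory are illustrative assumptions; exists() is just one possible form of the "own logic" for spotting duplicates.

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class AppendDocuments {

    // Hypothetical layout: an exact-match "id" (untokenized) plus a tokenized "body".
    static Document doc(String id, String body) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new TextField("body", body, Field.Store.YES));
        return d;
    }

    // Our own duplicate check against the last commit; Lucene does not do this for us.
    static boolean exists(Directory dir, String id) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).count(new TermQuery(new Term("id", id))) > 0;
        }
    }

    static int run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             StandardAnalyzer analyzer = new StandardAnalyzer()) {
            // Initial build: OpenMode.CREATE plays the role of create=true.
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(analyzer).setOpenMode(OpenMode.CREATE))) {
                writer.addDocument(doc("1", "first document"));
                writer.addDocument(doc("2", "second document"));
            } // close() commits the changes

            // Later insert: OpenMode.APPEND plays the role of create=false, so existing docs survive.
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(analyzer).setOpenMode(OpenMode.APPEND))) {
                for (Document d : List.of(doc("2", "second document again"), doc("3", "third document"))) {
                    if (!exists(dir, d.get("id"))) {
                        writer.addDocument(d); // only id "3" is new
                    }
                }
            }

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // 3 documents: ids 1, 2 and 3
    }
}
```

Using OpenMode.CREATE for the second writer instead would wipe the index and leave only the newly added document.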

Updating existing documents:
As mentioned above, Lucene does not check for duplicates, so updating an existing document is done in two steps:
first delete the old document, and then add the new version of the document to the index.
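The two-step update might look like this; again a sketch, assuming lucene-core 9.x and a hypothetical untokenized id field that uniquely identifies each document. It also shows IndexWriter.updateDocument(Term, ...), which performs the same delete-then-add in a single call; the delete and the add are atomic as seen by readers, so no reader observes the document as missing.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class UpdateDocuments {

    // Hypothetical layout: "id" is indexed as one untokenized term, "body" is tokenized.
    static Document doc(String id, String body) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new TextField("body", body, Field.Store.YES));
        return d;
    }

    // Returns {live docs, docs matching body:newest, docs matching body:old}.
    static int[] run() throws IOException {
        Term id42 = new Term("id", "42");
        try (Directory dir = new ByteBuffersDirectory();
             StandardAnalyzer analyzer = new StandardAnalyzer()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                writer.addDocument(doc("42", "old text"));
                writer.commit();

                // Two-step update: delete the old version by its exact id, then add the new one.
                writer.deleteDocuments(id42);
                writer.addDocument(doc("42", "new text"));

                // The same delete-then-add in one call, atomic as seen by readers.
                writer.updateDocument(id42, doc("42", "newest text"));
            } // close() commits

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                return new int[] {
                    reader.numDocs(),
                    searcher.count(new TermQuery(new Term("body", "newest"))),
                    searcher.count(new TermQuery(new Term("body", "old")))
                };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(run())); // [1, 1, 0]
    }
}
```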

Deleting existing documents:
Deleting can easily be done with IndexReader.deleteDocuments(Term) or IndexWriter.deleteDocuments(Term); both delete every document that contains the given term.
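A sketch of a delete by Term, assuming lucene-core 9.x. Since Lucene 4.0 IndexReader is read-only, so in current versions deletes go through IndexWriter.deleteDocuments(Term...). A Term delete removes every document containing that exact term, not just the first match; the status field here is a hypothetical example.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DeleteByTerm {

    // Hypothetical layout: an exact "id" and an exact "status" value per document.
    static Document doc(String id, String status) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new StringField("status", status, Field.Store.YES));
        return d;
    }

    static int run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             StandardAnalyzer analyzer = new StandardAnalyzer();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            writer.addDocument(doc("1", "draft"));
            writer.addDocument(doc("2", "draft"));
            writer.addDocument(doc("3", "published"));

            // Removes every document whose "status" field holds exactly the term "draft".
            writer.deleteDocuments(new Term("status", "draft"));
            writer.commit();

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // 1: only the "published" document is left
    }
}
```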

Problem deleting documents:

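What I have observed is that if the document field is UN_TOKENIZED, these methods work well, but if the field is TOKENIZED, the delete fails: no document is removed. The reason is that a Term is matched exactly against the terms stored in the index. A TOKENIZED value is split (and usually lowercased) by the analyzer into several terms, so a Term holding the whole original value matches none of them.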

One good workaround is to index the same value a second time under a different field name as an UN_TOKENIZED field, and to use a Term on that field when deleting the document.
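The problem and the workaround can be reproduced with a sketch like the following, assuming lucene-core 9.x, where TextField plays the role of a TOKENIZED field and StringField that of an UN_TOKENIZED one. The field names title and title_exact are hypothetical. The first delete uses the whole original value on the tokenized field and removes nothing; the second uses the untokenized copy and removes the document.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class TokenizedDeleteWorkaround {

    static int liveDocs(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    // Returns the live document count after each delete attempt.
    static int[] run() throws IOException {
        String title = "Lucene In Action";
        try (Directory dir = new ByteBuffersDirectory();
             StandardAnalyzer analyzer = new StandardAnalyzer();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document d = new Document();
            // Tokenized copy: the analyzer splits it into lowercased terms such as "lucene" and "action".
            d.add(new TextField("title", title, Field.Store.YES));
            // Untokenized copy under another name: indexed as the single term "Lucene In Action".
            d.add(new StringField("title_exact", title, Field.Store.NO));
            writer.addDocument(d);
            writer.commit();

            // No indexed term in "title" equals the whole value, so nothing is deleted.
            writer.deleteDocuments(new Term("title", title));
            writer.commit();
            int afterTokenizedDelete = liveDocs(dir);

            // The untokenized copy holds the exact value, so this delete matches.
            writer.deleteDocuments(new Term("title_exact", title));
            writer.commit();
            int afterExactDelete = liveDocs(dir);

            return new int[] {afterTokenizedDelete, afterExactDelete};
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(run())); // [1, 0]
    }
}
```

Deleting by a single token instead, such as new Term("title", "lucene"), would match, but it would also remove every other document whose title contains that word.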

Deleting documents by their document number with IndexReader.deleteDocument(int) is simple, and it works well if you know the document number.
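IndexReader.deleteDocument(int) belongs to the Lucene 2.x/3.x API and went away when IndexReader became read-only in 4.0. A sketch of the closest current equivalent I know of, assuming lucene-core 9.x, is the expert method IndexWriter.tryDeleteDocument(IndexReader, int). It only works with a near-real-time reader obtained from the same writer and returns -1 when it cannot apply the delete. Document numbers can change when segments are merged, so a number is only safe to use with the reader that produced it; the Term fallback below covers that case.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DeleteByDocNumber {

    static int run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             StandardAnalyzer analyzer = new StandardAnalyzer();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            for (String id : new String[] {"1", "2"}) {
                Document d = new Document();
                d.add(new StringField("id", id, Field.Store.YES)); // hypothetical id field
                writer.addDocument(d);
            }

            // Document numbers are only meaningful for the reader they came from,
            // so look the number up with a near-real-time reader from this writer.
            try (DirectoryReader nrt = DirectoryReader.open(writer)) {
                TopDocs hits = new IndexSearcher(nrt).search(new TermQuery(new Term("id", "2")), 1);
                int docNumber = hits.scoreDocs[0].doc;

                // -1 means the number could not be applied (e.g. its segment was merged away),
                // so fall back to deleting by Term.
                if (writer.tryDeleteDocument(nrt, docNumber) == -1) {
                    writer.deleteDocuments(new Term("id", "2"));
                }
            }
            writer.commit();

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // 1
    }
}
```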

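After inserting, updating or deleting documents, you need to reopen any searchers that were already open, because an open searcher keeps seeing the index as it was when the searcher was opened.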
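A sketch of reopening a searcher after a change, assuming lucene-core 9.x and a hypothetical id field. DirectoryReader.openIfChanged() returns a new reader if the index has changed since the old reader was opened, or null otherwise; the old searcher keeps seeing the previous commit until it is replaced. For searchers shared between threads, Lucene's SearcherManager class wraps this reopen-and-swap pattern.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ReopenSearcher {

    // Returns {count seen by the stale searcher, count seen after reopening}.
    static int[] run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             StandardAnalyzer analyzer = new StandardAnalyzer();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            for (String id : new String[] {"1", "2"}) {
                Document d = new Document();
                d.add(new StringField("id", id, Field.Store.YES));
                writer.addDocument(d);
            }
            writer.commit();

            DirectoryReader reader = DirectoryReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);

            // The index changes after the searcher was opened.
            writer.deleteDocuments(new Term("id", "1"));
            writer.commit();

            // The old searcher still sees the commit it was opened on.
            int stale = searcher.count(new MatchAllDocsQuery());

            // openIfChanged returns null when nothing changed, otherwise a fresh reader.
            DirectoryReader newer = DirectoryReader.openIfChanged(reader);
            if (newer != null) {
                reader.close();
                reader = newer;
                searcher = new IndexSearcher(reader);
            }
            int fresh = searcher.count(new MatchAllDocsQuery());
            reader.close();
            return new int[] {stale, fresh};
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(run())); // [2, 1]
    }
}
```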

Comments:

  1. Do you have any idea why only UN_TOKENIZED fields can be deleted?

    I have no clue why it is so.

  2. I have a problem when I delete some docs from the index and then add some docs to it. For example, if I have 10 docs in the index and I delete 9, then 1 doc remains. If I close the IndexWriter, reopen it and add 4 docs, it shows 5 docs in the index (writer.docCount()), but if I close the writer and reopen it, it says that 2 docs exist in the index. Can anybody help me find what the problem is?

  3. Can you show us the code snippet which is doing this?

Individuals who comment on FromDev on a regular basis will be rewarded in the Top Commenter section. (Comments are selectively moderated, so please do not spam.)
