Matt Davis edited this page May 20, 2016 · 24 revisions

Creating a Client

//Create the cluster configuration by adding cluster servers
LumongoPoolConfig lumongoPoolConfig = new LumongoPoolConfig();
lumongoPoolConfig.addMember("localhost");

//optional settings (default values shown)
lumongoPoolConfig.setDefaultRetries(0); //Number of retry attempts before throwing an exception
lumongoPoolConfig.setMaxConnections(8); //Maximum connections per server
lumongoPoolConfig.setMaxIdle(8); //Maximum idle connections per server
lumongoPoolConfig.setCompressedConnection(false); //Use this for WAN client connections
lumongoPoolConfig.setPoolName(null); //For logging purposes only, null gives default of lumongoPool-n
lumongoPoolConfig.setMemberUpdateEnabled(true); //Periodically update the members of the cluster
lumongoPoolConfig.setMemberUpdateInterval(10000); //Interval to update the members in ms
lumongoPoolConfig.setRoutingEnabled(true); //Route indexing requests to the correct server; requires member updating to be enabled or updateMembers() to be called periodically

//create the connection pool
LumongoWorkPool lumongoWorkPool = new LumongoWorkPool(lumongoPoolConfig);

Updating Current Members (Optional)

lumongoWorkPool.updateMembers();

Creating an Index

String defaultSearchField = "title";
int numberOfSegments = 16;

IndexConfig indexConfig = new IndexConfig(defaultSearchField);
indexConfig.addFieldConfig(FieldConfigBuilder.create("title", FieldType.STRING).indexAs(DefaultAnalyzers.STANDARD));
indexConfig.addFieldConfig(FieldConfigBuilder.create("issn", FieldType.STRING).indexAs(DefaultAnalyzers.LC_KEYWORD).facet());
indexConfig.addFieldConfig(FieldConfigBuilder.create("an", FieldType.NUMERIC_INT).index());

CreateIndex createIndex = new CreateIndex("myIndexName", numberOfSegments, indexConfig);
lumongoWorkPool.createIndex(createIndex);

The number of segments and the unique id field cannot be changed once the index is created. Set the number of segments greater than or equal to the maximum number of servers the cluster might ever contain. In the future, changing the number of segments will be possible through a separate process.

LuMongo supports indexes created from object annotations. For more info, see the section on Object Persistence.

Updating an Index

String defaultSearchField = "abstract";
IndexConfig indexConfig = new IndexConfig(defaultSearchField);

indexConfig.addFieldConfig(FieldConfigBuilder.create("title", FieldType.STRING).indexAs(DefaultAnalyzers.STANDARD));
indexConfig.addFieldConfig(FieldConfigBuilder.create("issn", FieldType.STRING).indexAs(DefaultAnalyzers.LC_KEYWORD).facet());
indexConfig.addFieldConfig(FieldConfigBuilder.create("an", FieldType.NUMERIC_INT).index());
indexConfig.addFieldConfig(FieldConfigBuilder.create("abstract", FieldType.STRING).indexAs(DefaultAnalyzers.STANDARD));

UpdateIndex updateIndex = new UpdateIndex("myIndexName", indexConfig);
lumongoWorkPool.updateIndex(updateIndex);

Changing or adding analyzers for fields that are already indexed may require re-indexing for desired results.

Creating or Updating an Index

String defaultSearchField = "abstract";
int numberOfSegments = 16;
String uniqueIdField = "uid";
String indexName = "myIndexName";

IndexConfig indexConfig = new IndexConfig(defaultSearchField);

indexConfig.addFieldConfig(FieldConfigBuilder.create("title", FieldType.STRING).indexAs(DefaultAnalyzers.STANDARD));
indexConfig.addFieldConfig(FieldConfigBuilder.create("issn", FieldType.STRING).indexAs(DefaultAnalyzers.LC_KEYWORD).facet());
indexConfig.addFieldConfig(FieldConfigBuilder.create("an", FieldType.NUMERIC_INT).index());
indexConfig.addFieldConfig(FieldConfigBuilder.create("abstract", FieldType.STRING).indexAs(DefaultAnalyzers.STANDARD));

CreateOrUpdateIndex createOrUpdateIndex = new CreateOrUpdateIndex(indexName, numberOfSegments, uniqueIdField, indexConfig);
CreateOrUpdateIndexResult result = lumongoWorkPool.createOrUpdateIndex(createOrUpdateIndex);
System.out.println(result.isNewIndex());
System.out.println(result.isUpdatedIndex());

Note that an update cannot change the number of segments or the unique id field, and that changing or adding analyzers for fields that are already indexed may require re-indexing for desired results.

Index Config Details

The individual settings on IndexConfig are explained below:

defaultSearchField - The field that is searched if no field is given to a lucene query
defaultAnalyzer - The default analyzer for all fields not specified by a field config
fieldConfig - Overrides the default analyzer for a field
segmentCommitInterval - Number of indexes or deletes to a segment before a commit is forced (default 3200)
idleTimeWithoutCommit - Time without indexing before a commit is forced, in seconds (0 disables) (default 30)
applyUncommitedDeletes - Apply all uncommitted deletes before a search (default true)
segmentQueryCacheSize - Number of queries cached at the segment level
segmentQueryCacheMaxAmount - Queries returning more than this number of documents are not cached
storeDocumentInIndex - Store mongo document in lucene index for fast fetching with queries (default true)
storeDocumentInMongo - Store mongo document in mongodb to allow things like map reduce and aggregation of the document (currently has to be done directly against mongodb not through lumongo)

The following are used to optimize federation of segments when more than one segment is used. The amount requested from each segment on a query is ((amountRequestedByQuery / numberOfSegments) + minSegmentRequest) * requestFactor.

requestFactor - Used in the calculation of the request size for a segment (default 2.0)
minSegmentRequest - Added to the calculated request size for a segment (default 2)
segmentTolerance - Difference in scores between segments tolerated before requesting full results (the query request amount) from a segment (default 0.05)
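As a sketch of that formula (the values below are illustrative, not from the source), requesting 100 results from a 16-segment index with the default minSegmentRequest and requestFactor asks each segment for 16.5 documents:

```java
//Illustrative per-segment request-size calculation; variable names mirror
//the settings above, and the rounding behavior is an assumption.
public class SegmentRequestSize {
	public static void main(String[] args) {
		int amountRequestedByQuery = 100;
		int numberOfSegments = 16;
		int minSegmentRequest = 2;  //default 2
		double requestFactor = 2.0; //default 2.0

		double perSegment = (((double) amountRequestedByQuery / numberOfSegments)
				+ minSegmentRequest) * requestFactor;

		//(100 / 16 + 2) * 2.0 = 16.5 documents requested from each segment
		System.out.println(perSegment);
	}
}
```

Oversampling each segment this way lets the federated result set stay accurate without fetching the full request amount from every segment.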

These Field Types are Available

STRING
NUMERIC_INT
NUMERIC_LONG
NUMERIC_FLOAT
NUMERIC_DOUBLE
DATE
BOOL
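The remaining types are configured the same way as the STRING and NUMERIC_INT fields shown earlier; a sketch with hypothetical field names:

```java
//Hypothetical field names; assumes an IndexConfig created as shown above
indexConfig.addFieldConfig(FieldConfigBuilder.create("published", FieldType.DATE).index());
indexConfig.addFieldConfig(FieldConfigBuilder.create("inPrint", FieldType.BOOL).index());
indexConfig.addFieldConfig(FieldConfigBuilder.create("price", FieldType.NUMERIC_DOUBLE).index());
```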

These Built In Analyzers are available (DefaultAnalyzers)

KEYWORD - Field is searched as one token
LC_KEYWORD - Field is searched as one token in lowercase (case insensitive, use for wildcard searches)
STANDARD - Standard lucene analyzer (good for general full text)
MIN_STEM - Minimal English Stemmer
KSTEMMED - K Stemmer
LSH - Locality Sensitive Hash

Custom Analyzer (Supported in java client from version 0.51)

indexConfig.addAnalyzerSetting("myAnalyzer", Tokenizer.WHITESPACE, Arrays.asList(Filter.ASCII_FOLDING, Filter.LOWERCASE), Similarity.BM25, QueryHandling.NORMAL);
indexConfig.addFieldConfig(FieldConfigBuilder.create("abstract", FieldType.STRING).indexAs("myAnalyzer"));

Delete Index

lumongoWorkPool.deleteIndex(indexName);

Storing / Indexing Documents

LuMongo supports indexing and storing from object annotations. For more info, see the section on Object Persistence.

MongoDB/BSON Document

DBObject dbObject = new BasicDBObject();
dbObject.put("title", "Magic Java Beans");
dbObject.put("issn", "4321-4321");

Store s = new Store("myid222", "myIndexName");

ResultDocBuilder resultDocumentBuilder = new ResultDocBuilder();
resultDocumentBuilder.setDocument(dbObject);

//optional meta
resultDocumentBuilder.addMetaData("test1", "val1");
resultDocumentBuilder.addMetaData("test2", "val2");

s.setResultDocument(resultDocumentBuilder);

lumongoWorkPool.store(s);

Storing Associated Documents

String uniqueId = "myid123";
String indexName = MY_INDEX_NAME;
String filename = "myfile2";
		
AssociatedBuilder associatedBuilder = new AssociatedBuilder();
associatedBuilder.setFilename(filename);
associatedBuilder.setCompressed(false);
associatedBuilder.setDocument("Some Text3");
associatedBuilder.addMetaData("mydata", "myvalue2");
associatedBuilder.addMetaData("sometypeinfo", "text file2");
		
//can be part of the same store request as the document
Store s = new Store(uniqueId, indexName);

//multiple associated documented can be added at once
s.addAssociatedDocument(associatedBuilder);

lumongoWorkPool.store(s);

Storing Large Associated Documents (Streaming)

String uniqueId = "myid333";
String filename = "myfilename";
String indexName = "myIndexName";
		
StoreLargeAssociated storeLargeAssociated = new StoreLargeAssociated(uniqueId, indexName, filename, new File("/tmp/myFile"));
		
lumongoWorkPool.storeLargeAssociated(storeLargeAssociated);

Fetching Documents

Fetch Document

FetchDocument fetchDocument = new FetchDocument("myid222", MY_INDEX_NAME);
		
FetchResult fetchResult = lumongoWorkPool.fetch(fetchDocument);

if (fetchResult.hasResultDocument()) {
	DBObject object = fetchResult.getDocument();
	
	//Get optional Meta
	Map<String, String> meta = fetchResult.getMeta();
}

Fetch All Associated

FetchAllAssociated fetchAssociated = new FetchAllAssociated("myid123", "myIndexName");

FetchResult fetchResult = lumongoWorkPool.fetch(fetchAssociated);

if (fetchResult.hasResultDocument()) {
	DBObject object = fetchResult.getDocument();
	
	//Get optional Meta
	Map<String, String> meta = fetchResult.getMeta();
}

for (AssociatedResult ad : fetchResult.getAssociatedDocuments()) {
    //use correct function for document type
    String text = ad.getDocumentAsUtf8();
}

Fetch Associated

FetchAssociated fetchAssociated = new FetchAssociated("myid123", "myIndexName",  "myfile2");

FetchResult fetchResult = lumongoWorkPool.fetch(fetchAssociated);

if (fetchResult.getAssociatedDocumentCount() != 0) {
	AssociatedResult ad = fetchResult.getAssociatedDocument(0);
	//use correct function for document type
	String text = ad.getDocumentAsUtf8();
}

Fetch Large Associated (Streaming)

FetchLargeAssociated fetchLargeAssociated = new FetchLargeAssociated("myid333", "myIndexName", "myfilename", new File("/tmp/myFetchedFile"));
lumongoWorkPool.fetchLargeAssociated(fetchLargeAssociated);

Querying

Simple Query

int numberOfResults = 10;

String normalLuceneQuery = "issn:1234-1234 AND title:special";
Query query = new Query("myIndexName", normalLuceneQuery, numberOfResults);

//optionally set realtime to false for better performance under high indexing load
//this prevents segments from being flushed before searching
//query.setRealTime(false);

QueryResult queryResult = lumongoWorkPool.query(query);

long totalHits = queryResult.getTotalHits();

System.out.println("Found <" + totalHits + "> hits");
for (ScoredResult sr : queryResult.getResults()) {
	System.out.println("Matching document <" + sr.getUniqueId() + "> with score <" + sr.getScore() + ">");
}

Search Multiple Indexes

int numberOfResults = 10;

String normalLuceneQuery = "issn:4321-1234 AND title:java";
Query query = new Query(Arrays.asList("myIndexName", "myIndexName2"), normalLuceneQuery, numberOfResults);

//optionally set realtime to false for better performance under high indexing load
//this prevents segments from being flushed before searching
//query.setRealTime(false);

QueryResult queryResult = lumongoWorkPool.query(query);

long totalHits = queryResult.getTotalHits();

System.out.println("Found <" + totalHits + "> hits");
for (ScoredResult sr : queryResult.getResults()) {
	System.out.println("Matching document <" + sr.getUniqueId() + "> with score <" + sr.getScore() + ">");
}

Paging Query Results

Query query = new Query("myIndexName", "issn:1234-1234 AND title:special", 10);
		
QueryResult firstResult = lumongoWorkPool.query(query);
		
query.setLastResult(firstResult);
		
QueryResult secondResult = lumongoWorkPool.query(query);

Sorting

Query query = new Query("myIndexName", "title:special", 10);
query.addFieldSort("issn", Direction.ASCENDING); //Field must be KEYWORD, LC_KEYWORD, or NUMERIC
QueryResult queryResult = lumongoWorkPool.query(query);

Filter Queries (fq) and Query Fields (qf)

int numberOfResults = 10;

Query query = new Query("myIndexName", "cancer cure", numberOfResults);
query.addQueryField("abstract");
query.addQueryField("title");
query.addFilterQuery("title:special");
query.addFilterQuery("issn:1234-1234");
QueryResult queryResult = lumongoWorkPool.query(query);

Requesting Facets

// Can set number of documents to return to 0 unless you want the documents
// at the same time

Query query = new Query(Arrays.asList("myIndexName", "myIndexName2"), "title:special", 0);
int maxFacets = 30;
query.addCountRequest("issn", maxFacets);

QueryResult queryResult = lumongoWorkPool.query(query);
for (FacetCount fc : queryResult.getFacetCounts("issn")) {
	System.out.println("Facet <" + fc.getFacet() + "> with count <" + fc.getCount() + ">");
}

Drilling Down Facets

Query query = new Query("myIndexName", "title:special", 0);
query.addDrillDown("issn", "1111-1111");
QueryResult queryResult = lumongoWorkPool.query(query);
for (FacetCount fc : queryResult.getFacetCounts("issn")) {
   System.out.println("Facet <" + fc.getFacet() + "> with count <" + fc.getCount() + ">");
}

Deleting

Delete From Index

//Deletes the document from the index but not any associated documents
DeleteFromIndex deleteFromIndex = new DeleteFromIndex("myid111", "myIndexName");
lumongoWorkPool.delete(deleteFromIndex);

Delete Completely

//Deletes the result document, the index documents and all associated documents associated with an id
DeleteFull deleteFull = new DeleteFull("myid123", MY_INDEX_NAME);
lumongoWorkPool.delete(deleteFull);

Delete Single Associated

//Removes a single associated document with the unique id and filename given
DeleteAssociated deleteAssociated = new DeleteAssociated("myid123", "myIndexName", "myfile2");
lumongoWorkPool.delete(deleteAssociated);

Delete All Associated

DeleteAllAssociated deleteAllAssociated = new DeleteAllAssociated("myid123", "myIndexName");
lumongoWorkPool.delete(deleteAllAssociated);

Other Operations

Get Current Document Count for Index

GetNumberOfDocsResult result = lumongoWorkPool.getNumberOfDocs("myIndexName");
System.out.println(result.getNumberOfDocs());

Get Fields for Index

GetFieldsResult result = lumongoWorkPool.getFields(new GetFields("myIndexName"));
System.out.println(result.getFieldNames());

Get Terms for Field

GetTermsResult getTermsResult = lumongoWorkPool.getAllTerms(new GetAllTerms("myIndexName", "title"));
for (Term term : getTermsResult.getTerms()) {
   System.out.println(term.getValue() + ": " + term.getDocFreq());
}

Get Cluster Members

GetMembersResult getMembersResult = lumongoWorkPool.getMembers();
for (LMMember member : getMembersResult.getMembers()) {
    System.out.println(member);
}

Async API

Every function has a corresponding async version:

Query query = new Query(MY_INDEX_NAME, "issn:1234-1234 AND title:special", 10);

ListenableFuture<QueryResult> resultFuture = lumongoWorkPool.queryAsync(query);

Futures.addCallback(resultFuture, new FutureCallback<QueryResult>() {

    @Override
    public void onSuccess(QueryResult queryResult) {
	
    }

    @Override
    public void onFailure(Throwable thrown) {
	
    }
});

Object Persistence / Mapping

Annotated Object Example

@Settings(
    indexName = "wikipedia",
    numberOfSegments = 16,
    segmentCommitInterval = 6000
    )
public class Article {

    public Article() {

    }

    @UniqueId
    private String id;

    @Indexed(analyzerName = DefaultAnalyzers.STANDARD)
    private String title;

    @Indexed
    private Integer namespace;

    @DefaultSearch
    @Indexed(analyzerName = DefaultAnalyzers.STANDARD)
    private String text;

    private Long revision;

    @Indexed
    private Integer userId;

    @Indexed(analyzerName = DefaultAnalyzers.STANDARD)
    private String user;

    @Indexed
    private Date revisionDate;

    //Getters and Setters
    //....
}

Creating Index for Annotated Class Example

Mapper<Article> mapper = new Mapper<>(Article.class);
lumongoWorkPool.createOrUpdateIndex(mapper.createOrUpdateIndex());

Storing an Object

Article article = new Article();
...
Store store = mapper.createStore(article);
lumongoWorkPool.store(store);   

Querying and Fetching

Query query = new Query("wikipedia", "title:a*", 10);
QueryResult queryResult = lumongoWorkPool.query(query);

BatchFetch batchFetch = new BatchFetch().addFetchDocumentsFromResults(queryResult);
BatchFetchResult bfr = lumongoWorkPool.batchFetch(batchFetch);

List<Article> articles = mapper.fromBatchFetchResult(bfr);

Client Side Document Caching

int maxSize = 2000;
DocumentCache documentCache = new DocumentCache(lumongoWorkPool, maxSize);

Query query = new Query("wikipedia", "title:a*", 10);

//each query result from the query has a timestamp
QueryResult queryResult = lumongoWorkPool.query(query);

//the cache compares the timestamp of the query result with the one in the cache
BatchFetchResult batchFetchResult = documentCache.fetch(queryResult);

List<Article> articles = mapper.fromBatchFetchResult(batchFetchResult);