Search for Big Data

Written by Adoddle | 15-Jul-2016 14:00:00

Adoddle has enterprise search built in, enabling organizations to turn “big data” (petabytes of information stored across documents, project forms, models and ecommerce messages) into useful results without the need to shred the data. Adoddle indexes data in real time and makes it immediately searchable.

With the arrival of technologies like Hadoop and Spark, and of massive indexed data sets, the case for Big Data search is clearer than ever. Before Big Data, traditional business intelligence efforts focussed on adding Web search-like query interfaces, and the scalability and performance of those efforts were mixed at best. Big Data search is now a necessity rather than a premium feature. Adoddle Search is a distributed indexing and search engine that combines fault tolerance and high availability in the cloud. Adoddle Search supports the following features:

  • Central configuration for the entire cluster
  • Automatic load balancing and fail-over for queries
  • ZooKeeper integration for cluster coordination and configuration

Adoddle uses the latest open source big data cloud frameworks, with Lucene as its core search engine library. Lucene is an Apache open source indexing and search library written in Java. It was originally written in 1999 by Doug Cutting, who also created Hadoop. Lucene is built around the inverted index and provides many features for advanced indexing and querying across many different data types.

The fundamental concepts in Lucene are index, document, field and term. An index contains a sequence of documents.

  • A document is a sequence of fields.
  • A term is a string.
  • A field is a named sequence of terms.

The same string in two different fields is considered a different term. A term is therefore represented as a pair of strings: the first names the field, and the second names the text within the field. The index stores statistics about terms in order to make term-based search more efficient. Lucene's index belongs to the family of indexes known as inverted indexes, because it can list, for a given term, the documents that contain it. This is the inverse of the natural relationship, in which documents list terms.
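To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It is not Lucene's API and not Adoddle's implementation; the sample documents and the helper names (tokenize, build_inverted_index, search) are hypothetical and exist only for illustration. It shows how a (field, term) pair maps to the list of documents that contain it, and why the same string in two fields counts as two different terms.

```python
from collections import defaultdict

# Hypothetical sample data: each document is a mapping of field name -> text.
documents = [
    {"title": "bridge design review", "body": "steel bridge load model"},
    {"title": "steel supplier form", "body": "purchase order for steel"},
]


def tokenize(text):
    """Split text into lowercase terms (a real analyzer does far more)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each (field, term) pair to the sorted ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, text in doc.items():
            for term in tokenize(text):
                # The key is the pair (field, term), so "steel" in the title
                # and "steel" in the body are tracked as different terms.
                postings[(field, term)].add(doc_id)
    return {key: sorted(ids) for key, ids in postings.items()}


def search(index, field, term):
    """Return the ids of documents whose given field contains the term."""
    return index.get((field, term.lower()), [])


index = build_inverted_index(documents)
```

With this sample data, looking up "steel" in the body field matches both documents, while looking up "steel" in the title field matches only the second one. The lookup itself is a single dictionary access, which is what makes term-based search fast compared with scanning every document.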

Adoddle users upload 296 megabytes of new data into Adoddle every minute of the day, and the amount and rate of data generated every day is increasing exponentially. Adoddle receives 6000 insert/update requests per minute into its Global Index. Data is the new oil. Like crude, it is valuable, but unrefined it cannot really be used. Adoddle enables everyone across the enterprise to make big data driven decisions. It's Adoddle.