Big Data “Meets-up” with Scalability: Accumulo, Hadoop and Pig


Last night WhitePages hosted the Scalability and Distributed Systems meetup at our office downtown. We were treated to two presentations offering different perspectives on Big Data: an implementation and a problem solved.

Paul Brown, Koverse CEO, opened with his presentation on Accumulo, a distributed key/value store based on BigTable. Paul discussed details of Accumulo’s operation and described some of its unique features, particularly its server-side mechanism for modifying data and its cell-based access control. He also covered some of the enhancements he thinks are key for Big Data going forward to support research applications, including data attribution, which lets researchers understand the source of the data used to answer a query.
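To make cell-based access control a little more concrete, here is a minimal Python sketch of the idea: every cell carries a visibility label, and a reader only sees cells whose label is satisfied by the reader's authorizations. This is not Accumulo's API and not code from Paul's talk. The names `is_visible` and `scan` are made up for illustration, and the label syntax (`&`, `|`, parentheses) is only modeled on Accumulo's visibility expressions. One simplification to note: as far as I know, Accumulo requires parentheses when mixing `&` and `|`, whereas this sketch simply gives `&` higher precedence.

```python
import re

# One token is either a label word or one of the operators & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into words and operators."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if the authorization set satisfies the label (hypothetical helper)."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an empty label means visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a plain word is true if the reader holds it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


def scan(cells, auths):
    """Keep only the (row, column, label, value) cells the reader may see."""
    return [cell for cell in cells if is_visible(cell[2], auths)]
```

With cells labeled `""`, `"admin"`, and `"admin&(audit|research)"`, a reader holding only `research` would get back just the unlabeled cell, while a reader holding both `admin` and `research` would see all three.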

Scott Sikora, WhitePages CTO, and Robert Noble, Director of Software Engineering, presented one of WhitePages’ interesting data processing problems: how to clean, classify, and present business data while keeping static, unique business IDs, even though the dataset is rebuilt from current data every day. The process takes multiple data sources, merges their business listings into the correct set of logical entities, then presents those entities to users on the WhitePages consumer web site and via WhitePages PRO APIs and lookup tools.
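As a rough illustration of why static IDs are tricky when the dataset is rebuilt daily, here is a small Python sketch. It is emphatically not the WhitePages pipeline: the listing fields (`name`, `phone`, `source`), the matching rule (normalized name plus phone number), and the `biz-` ID format are all assumptions invented for this example. The point is only that if the ID is derived deterministically from the normalized match key, a rebuild from scratch yields the same ID for the same logical business.

```python
import hashlib
import re
from collections import defaultdict


def normalize_name(name):
    """Lowercase and collapse punctuation/whitespace so spelling variants match."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def normalize_phone(phone):
    """Keep digits only, then the last ten (assumes US-style numbers)."""
    return re.sub(r"\D", "", phone)[-10:]


def match_key(listing):
    """Hypothetical matching rule: same normalized name and phone = same business."""
    return (normalize_name(listing["name"]), normalize_phone(listing["phone"]))


def stable_id(key):
    """Derive the ID from the key itself, so it survives a full rebuild."""
    digest = hashlib.sha1("|".join(key).encode("utf-8")).hexdigest()
    return "biz-" + digest[:12]


def merge_listings(listings):
    """Group raw listings into logical entities keyed by a stable ID."""
    groups = defaultdict(list)
    for listing in listings:
        groups[match_key(listing)].append(listing)

    entities = {}
    for key, members in groups.items():
        entities[stable_id(key)] = {
            "name": members[0]["name"],  # naive choice: first name seen
            "phone": key[1],
            "sources": sorted({m["source"] for m in members}),
        }
    return entities


if __name__ == "__main__":
    listings = [
        {"name": "Joe's Pizza", "phone": "(206) 555-0100", "source": "feed_a"},
        {"name": "JOE'S PIZZA", "phone": "206-555-0100", "source": "feed_b"},
        {"name": "Pike Street Books", "phone": "206.555.0199", "source": "feed_a"},
    ]
    for entity_id, entity in merge_listings(listings).items():
        print(entity_id, entity)
```

A real system would need fuzzier matching and a way to keep an ID when a business changes its name or number, which a pure hash of the key cannot do. That gap is part of what makes the problem interesting.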

Many thanks to the attendees, whose engagement and questions made the evening far more interesting, and to eSage for hosting after-meetup beers and snacks at Rock Bottom.

by Whitepages
