As part of the MIT CSAIL bigdata lecture series, Dr. Calvin Andrus, who works in the Office of the Chief Information Officer at the Central Intelligence Agency, gave a talk on the challenges of deploying enterprise-scale analytic engines on top of cloud-based bigdata holdings in a classified environment. He also shared his thoughts on the kind of bigdata analytic environment the government is considering.

He started the discussion with the problems his IT department faced when it decided to migrate to a cloud-based infrastructure. The goal was to tackle the velocity, volume, and variety problem by separating storage, compute (Hadoop), and user operations (analytics). Some of the major concerns were security and isolation, for example how one department can keep its data totally isolated and secure from other departments.

Another bigdata problem he emphasized is the lack of open-source or free tools that can apply sophisticated models to very large data. His office often needs to generate a huge amount of fake data to understand incidents such as a nuclear explosion. Since these data are engineered, a more sophisticated model is needed, and most of the freely available tools break when sophisticated models are applied to bigdata.

Some of the tools that are of particular interest to them are: bigdata tools for making recommendations (recommender systems), semantic reasoning systems (reasoning on unstructured text by extracting entities and the relationships that exist among them), complex event processing (processing events based on data gathered from around the world at a very high rate), visualization systems, and machine learning tools.

Some of the tools that do exist work only in isolation. The output of one tool is not necessarily in the format required as input by another, and thus using multiple tools together becomes a nightmare. He described integration as the biggest problem his office is currently dealing with.
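
To make the format mismatch concrete, here is a minimal sketch of the kind of glue code that integrating two tools tends to require. It was not part of the talk. Both tools and their formats are hypothetical: an entity extractor that writes CSV with the columns entity, relation, and target, and an event processor that reads one JSON object per line with the keys subject, relation, and object.

```python
import csv
import io
import json


def entity_csv_to_event_json(csv_text):
    """Convert the CSV output of a hypothetical extraction tool into the
    JSON-lines input of a hypothetical event-processing tool."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # The column and key names are assumptions made for illustration.
        record = {
            "subject": row["entity"],
            "relation": row["relation"],
            "object": row["target"],
        }
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


sample = "entity,relation,target\nAlice,works_at,Acme\n"
print(entity_csv_to_event_json(sample))
```

One adapter like this is manageable, but every additional pair of tools needs its own, which is roughly why integration grows into such a large problem.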

He gave Facebook as an example: Facebook Connect allows any website to connect to Facebook and lets users access Facebook services (like, recommend, share, etc.) from within the website itself. He concluded by emphasizing that this kind of connect mechanism (sharing of data across applications) is what current tools need.