MongoDB Connector for Hadoop

by Mike O’Brien, MongoDB Kernel Tools Lead and maintainer of Mongo-Hadoop, the Hadoop Adapter for MongoDB

Hadoop is a powerful, JVM-based platform for running Map/Reduce jobs on clusters of many machines, and it excels at doing analytics and processing tasks on very large data sets.

Since MongoDB excels at storing large operational data sets for applications, it makes sense to explore using these together - MongoDB for storage and querying, and Hadoop for batch processing.

The MongoDB Connector for Hadoop

We recently released version 1.1 of the MongoDB Connector for Hadoop. The connector makes it easy to use MongoDB databases, or MongoDB backup files in .bson format, as the input source or output destination for Hadoop Map/Reduce jobs. By inspecting the data and computing input splits, the connector lets Hadoop process the data in parallel, so that very large datasets can be processed quickly.

The MongoDB Connector for Hadoop also includes support for Pig and Hive, which allow sophisticated MapReduce workflows to be executed just by writing simple scripts.

Pig is a high-level scripting language for data analysis and for building map/reduce workflows.
Hive is a SQL-like language for ad-hoc queries and analysis of data sets on Hadoop-compatible file systems.

Hadoop streaming is also supported, so map/reduce functions can be written in languages other than Java. Right now the MongoDB Connector for Hadoop supports streaming in Ruby, Node.js and Python.
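To give a feel for what map/reduce functions written outside Java can look like, here is a minimal sketch in plain Python. It does not use the connector's own streaming helpers: the names mapper, reducer and run_local, the field names, and the sample documents are all hypothetical, and run_local only stands in for the grouping step that Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict


def mapper(documents):
    """Emit one (key, value) pair per input document."""
    for doc in documents:
        yield doc["user"], doc["amount"]


def reducer(key, values):
    """Collapse all values seen for one key into a single output document."""
    return {"_id": key, "total": sum(values)}


def run_local(documents):
    """Hypothetical local driver: group mapper output by key, then reduce.

    On a real cluster Hadoop does this grouping (the shuffle) itself.
    """
    grouped = defaultdict(list)
    for key, value in mapper(documents):
        grouped[key].append(value)
    return [reducer(key, values) for key, values in sorted(grouped.items())]


if __name__ == "__main__":
    sample = [
        {"user": "ann", "amount": 5},
        {"user": "bob", "amount": 3},
        {"user": "ann", "amount": 7},
    ]
    print(run_local(sample))
```

In an actual streaming job only the bodies of mapper and reducer would carry over; reading input documents and writing results would be left to Hadoop and the connector.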

How it Works

The adapter examines the MongoDB collection and calculates a set of splits from the data.
Each of the splits gets assigned to a node in the Hadoop cluster.
In parallel, Hadoop nodes pull the data for their splits from MongoDB (or BSON files) and process it locally.
Hadoop merges the results and streams the output back to MongoDB or BSON.
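The four steps above can be mimicked on a single machine with a toy simulation. This is only an illustration of the flow, not the connector's implementation: an in-memory list stands in for the MongoDB collection, a thread pool stands in for the Hadoop nodes, and compute_splits, process_split and run_job are hypothetical names. The real connector calculates its splits from the collection itself rather than by slicing a list.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_splits(documents, split_size):
    """Step 1: cut the collection into contiguous chunks (the splits)."""
    return [documents[i:i + split_size]
            for i in range(0, len(documents), split_size)]


def process_split(split):
    """Step 3: one worker counts the documents in its split by status."""
    return Counter(doc["status"] for doc in split)


def run_job(documents, split_size=2, workers=3):
    splits = compute_splits(documents, split_size)
    # Step 2: the pool hands each split to a worker (standing in for a node).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_split, splits))
    # Step 4: merge the per-split results into a single output.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    docs = [{"status": s} for s in ["open", "closed", "open", "new", "open"]]
    print(run_job(docs))
```

Because each split is processed independently, adding workers shortens the job without changing the merged result, which is the property that lets Hadoop scale to very large collections.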

I’ll be giving an hour-long webinar on What’s New with the Mongo-Hadoop integration. The webinar will cover:

Using Java MapReduce with the MongoDB Connector for Hadoop
Using Hadoop Streaming for other non-JVM languages
Writing Pig Scripts with the MongoDB Connector for Hadoop
Using MongoDB and Hadoop with Elastic MapReduce to easily kick off your Hadoop jobs
Overview of MongoUpdateWritable: Using the result output from Hadoop to modify an existing output collection

The webinar will be offered twice on August 8:

8am PDT / 11am EDT / 3pm UTC
11am PDT / 2pm EDT / 6pm UTC

Update: Watch the webinar recording

Original source: MongoDB Connector for Hadoop. Thanks to the original author for sharing.