The hadoofus
project is an HDFS (Hadoop Distributed File System) client
library. Like Snakebite, it does not
require Java. Unlike Snakebite, hadoofus is implemented in C. It supports RPC
pipelining and out-of-order execution.
It provides a C API for directly calling Namenode RPCs and performing Datanode block read and write operations.
Unlike libhdfs, Hadoofus speaks multiple versions of the HDFS protocol. At your
option, you may speak with Hadoop 0.20.203 through 1.x.y (HDFS_NN_v1
/
HDFS_DATANODE_AP_1_0
), Hadoop 2.0.x (HDFS_NN_v2
/ HDFS_DATANODE_AP_2_0
),
or Hadoop 2.2.x and higher (HDFS_NN_v2_2
/ HDFS_DATANODE_AP_2_0
).
#include <hadoofus/highlevel.h>
int64_t res;
const char *err = NULL;
struct hdfs_namenode *h;
struct hdfs_object *exception = NULL;
h = hdfs_namenode_new("host.bar.com", "8020", "hdfs", &err);
if (!h)
... ;
res = hdfs_getProtocolVersion(h, HADOOFUS_CLIENT_PROTOCOL_STR, 61L, &exception);
if (exception) {
// fprintf(, "...%s...", hdfs_exception_get_message(exception));
hdfs_object_free(exception);
... ;
}
if (res != 61)
... ;
hdfs_namenode_delete(h);
Some less common RPCs provided by the Hadoop ClientProtocol
interface in v2.x
of the protocol are not yet implemented (see Issue #29).
Note: Hadoop has been known to change semantics slightly between different versions of the software (especially before v2 was released). The v1 protocol has no spec; we do the best we can.
Some RPCs that exist in HDFSv1 do not exist in HDFSv2+ — e.g.
getProtocolVersion
does not exist in v2.
The Datanode API is somewhat fragile.
HDFS attempts to be a restricted subset of a POSIX filesystem.
Files can only have one writer at a time and do not support random writes. They can be appended but not overwritten in place.
Generally, the Namenode acts as an RPC server for querying and manipulating file system metadata. It points clients at Datanode(s) to read/write file data.
For more information, see wikipedia's article on Hadoop or the HDFS Architecture Guide.
See INSTALLING.md.
Found a bug? Please file it on github. Thanks!
- Tom Arnfeld <tarnfeld@me.com>
- Conrad Meyer <conrad.meyer@isilon.com>
- Paul Scott <paul@duedil.com>
- Alex Smith <aes7mv AT virginia.edu>
Unless otherwise noted, files in this source distribution are released under the
terms of the MIT license. Some files used for CRC32C support come from elsewhere
have non-MIT, but similarly permissive licenses (namely the BSD 2-clause license
and Mark Adler's license, which he also uses for zlib), which are clearly
specified at the top of their files (Some files which are not compiled into
installed binaries or otherwise installed by this package's Makefiles come from
the Apache Hadoop sources and have different licenses. These licenses are
clearly specified at the beginning of the files.) For the full text of the MIT
license, see the file LICENSE
included with this source distribution.