| Path: | README |
| Last Update: | Sat Nov 03 13:43:19 -0400 2007 |
RDDB is a Ruby Document-oriented Database. It is inspired by CouchDB and the notion that you insert documents into the database and then define views for querying. Views are defined as blocks which are used to select the documents and the attributes in the documents that are relevant to your needs. Views are materialized to enhance performance.
This first example creates three documents and then queries for the names in the documents.
# First create an database object
database = Rddb::Database.new
# Put some documents into it
database << {:name => 'John', :income => 35000}
database << {:name => 'Bob', :income => 40000}
database << {:name => 'Jim', :income => 37000}
# Create a view that will return the names
database.create_view('names') do |document, args|
document.name
end
# The result of querying will return an array of names
assert_equal ['John','Bob','Jim'], database.query('names')
In addition it is possible to specify a block to reduce the result set:
database.create_view('total income') do |document, args|
document.income
end.reduce_with do |results|
results.inject { |memo,value| memo + value }
end
assert_equal 112000, database.query('total income')
Here‘s another example, this time averaging the incomes:
database.create_view('average income') do |document, args|
document.income
end.reduce_with do |results|
results.inject { |memo,value| memo + value } / results.length
end
assert_equal 37333, database.query('average income')
You can also implement grouping:
database.create_view('total income by profession') do |document, args|
[document.profession, document.income]
end.reduce_with do |results|
returning Hash.new do |reduced|
results.each do |result|
reduced[result[0]] ||= 0
reduced[result[0]] += result[1]
end
end
end
results = database.query('total income by profession')
assert_equal 77000, results['Plumber']
assert_equal 35000, results['Carpenter']
Here is an example of sorting:
database.create_view('sorted by income') do |document, args|
{
:name => document.name,
:income => document.income,
:profession => document.profession
}
end.reduce_with do |results|
results.sort { |a,b| a[:income] <=> b[:income] }
end
assert_equal [
{:income=>35000, :profession=>"Carpenter", :name=>"John"},
{:income=>37000, :profession=>"Plumber", :name=>"Jim"},
{:income=>40000, :profession=>"Plumber", :name=>"Bob"}
], database.query('sorted by income')
Finding distinct values:
database.create_view('distinct professions') do |document, args|
document.profession
end.reduce_with do |results|
returning Array.new do |reduced|
results.each do |result|
reduced << result unless reduced.include?(result)
end
end
end
assert_equal ['Carpenter','Plumber'], database.query('distinct professions')
You can also indicate that a view is materialized by passing a block to materialize_if:
database.create_view('names') do |document, args|
document.name
end.materialize_if do |document|
document.attribute?(:name)
end
And this works with reduce as well:
database.create_view('names') do |document, args|
document.name
end.reduce_with do |results|
results.sort
end.materialize_if do |document|
document.attribute?(:name)
end
You can indicate that materialization is required with the following:
database.create_view('names') do |document, args|
document.name
end.reduce_with do |results|
results.sort
end.materialize_if do |document|
document.attribute?(:name)
end.require_materialization
Since materialization can be a threaded or distributed process it may be a non-blocking operation. Therefore, if you specify that materialization is required then a Rddb::ViewNotYetMaterialized error will be raised if you attempt to query the view prior to completion of materialization. Normally materialization will occur when a document is added to the database, however you can also force materialization by calling database.refresh_views at any time. Usually you‘ll want to call this method after adding a new materialized view with database.create_view.
There are three document stores provided with Rddb: RamDocumentStore, PartitionedFileDocumentStore and S3DocumentStore. By default the RamDocumentStore will be used, however either the partitioned file document store or the S3 document store should be used if you require your data to be persistent. The following example shows how to use the PartitionedFileDocumentStore
document_store = Rddb::DocumentStore::PartitionedFileDocumentStore database = Rddb::Database.new(:document_store = document_store)
The partitioned file document store accepts three options:
The S3 document store accepts the same arguments as above.
The :partition_strategy option can be used to provide a prefix for partitioning depending on what data is in the document. For example, if you have users then you might want to partition on the first three letters of their username:
ps = Proc.new { |document| document.username[0..2] }
document_store = Rddb::DocumentStore::PartitionedFileDatastore.new(
:partition_strategy => ps
)
Whatever string is returned from the :partition_strategy Proc will be used as the partition name.
The cache option can be used to implement in-memory caching of documents. You can pass any sort of Hash, including a basic Hash or a more complex cache implementation (for example an LRU cache) as long as it has the same interface as Hash.
Right now the partitioned file document store and S3 both use Marshal to dump and load data. This will most likely change in the future to something faster and more compact.
There are three materialization stores currently provided with Rddb: RamMaterializationStore, FilesystemMaterializationStore and S3MaterializationStore.
In both the FilesystemMaterializationStore and the S3MaterializationStore the view will be materialized to the storage system so that queries can be answered very quickly without recalculating the view. The following example shows the file system version:
m_store = Rddb::MaterializationStore::FilesystemMaterializationStore.new
database.create_view('names sorted by last name', :materialization_store => m_store) do |document, args|
{
:first_name => document.first_name,
:last_name => document.last_name
}
end.reduce_with do |results|
results.sort { |a,b| a[:last_name] <=> b[:last_name] }
end.materialize_if do |document|
true
end
Types matter. You should be sure that values which will be aggregated (for example, reduced with an inject) are inserted into the database using a numeric type. Calling inject on an array with 200,000 Strings is not a pleasant experience. ;-)
This project is hosted on RubyForge at rubyforge.org/projects/rddb. Instructions for accessing the Subversion repository: rubyforge.org/scm/?group_id=4688.