September 7, 2013 by Drew Kerrigan

Wait, what is Yokozuna?

I’m glad you asked. It’s distributed Solr integrated directly with Riak. I’m oversimplifying a bit, but essentially each Riak node in your cluster runs its own instance of Solr. This gives you the complex query and indexing framework that Solr provides along with the high availability and scalability of Riak, not to mention some awesome data repair facilities and active anti-entropy (read: automatic index repair). Yokozuna is currently in alpha; however, it will be included in mainline Riak around version 2.0. Ryan Zezeski is the Basho engineer who implemented Yokozuna, so thank him for the awesome work he did on it.

Following are the steps I took to make a simple Sinatra app using the aforementioned technologies.

First things first, get Riak running

Yokozuna is now in the main Riak develop branch, but when I wrote this demo application it still lived in a separate yokozuna repository, so I’ll be using the instructions from the old repo. Installation is very similar to vanilla Riak, but you’ll need to compile from source using the 0.8 Yokozuna release. More installation details can be found here, but this is the gist:

Download

wget http://data.riakcs.net:8080/yokozuna/riak-yokozuna-0.8.0-src.tar.gz

Unpack

tar zxvf riak-yokozuna-0.8.0-src.tar.gz

Compile

cd riak-yokozuna-0.8.0-src
make

Create a single Riak node

make rel

Configure

sed -e '/{yokozuna,/,/]}/{s/{enabled, false}/{enabled, true}/;}' -i.back rel/riak/etc/app.config

Start Riak

cd rel/riak/bin
./riak start

Basic Sinatra setup

Since this is a single-purpose, lightweight Sinatra app, very few dependencies are needed. You’ll notice in the GitHub repository for my demo app that I have a views directory, but I’m omitting that from this guide for simplicity.

Create Gemfile

source 'https://rubygems.org'
gem 'json', '~> 1.7.7'
gem 'sinatra'
gem 'riak-client', :git => 'git://github.com/basho/riak-ruby-client.git', :branch => 'bk-yokozuna'

Notice the bk-yokozuna branch of riak-ruby-client. The code from that branch will be available in the main client library around the time that Riak 2.0 lands. If you don’t feel comfortable using a branch that is still in active development, it wouldn’t be too hard to write a small client of your own that uses the HTTP endpoints for Riak / Yokozuna.
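If you go the hand-rolled route, most of the work is building the right URL. The sketch below is a minimal illustration, not how riak-ruby-client does it: the `/search/<index>` path, port 8098, and the `search_uri` helper name are all assumptions, so check them against the Yokozuna documentation for your release.

```ruby
require 'uri'

# Hypothetical helper: builds the URL for a Yokozuna search request.
# The /search/<index> path and port 8098 are assumptions, not from this post.
def search_uri(index, query, host: 'localhost', port: 8098, rows: 10, start: 0)
  params = { 'q' => query, 'wt' => 'json', 'rows' => rows, 'start' => start }
  URI::HTTP.build(
    host: host,
    port: port,
    path: "/search/#{index}",
    query: URI.encode_www_form(params) # takes care of URL-encoding the query
  )
end

uri = search_uri('user', 'name_t:Drew')
puts uri
```

From there, something like `Net::HTTP.get(uri)` would fetch the raw response body for you to parse.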

Install dependencies

bundle install

Create an index and seed the data

I’m using a small JSON file for my seed data with the following fields for each entry:

{
    "name_t": "Drew Kerrigan",
    "title_t": "Consulting Engineer",
    "created_dt": "2015-12-13T23:59:59Z"
}

name_t is a simple multi-word text field. The _t suffix matches the full text dynamicField pattern *_t in the default Solr schema.xml, which applies here because we did not specify a schema of our own for this application. Similarly, title_t is indexed as a full text field. created_dt is, as you might expect, a Solr date field, which makes it easy to use in range queries later on.
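To make the naming convention concrete, here is a small sketch that builds one fixture entry in Ruby. The `build_user` helper is hypothetical and not part of the demo app; the point is only that the key suffixes match the dynamic fields and that the timestamp uses the UTC `YYYY-MM-DDThh:mm:ssZ` form shown above.

```ruby
require 'json'
require 'time'

# Hypothetical helper (not in the demo repo): builds one user entry whose
# keys carry the dynamic field suffixes (_t for text, _dt for dates).
def build_user(name, title, created_at)
  {
    'name_t'     => name,
    'title_t'    => title,
    # Time#iso8601 on a UTC time yields e.g. "2015-12-13T23:59:59Z"
    'created_dt' => created_at.utc.iso8601
  }
end

user = build_user('Drew Kerrigan', 'Consulting Engineer',
                  Time.utc(2015, 12, 13, 23, 59, 59))
puts JSON.pretty_generate(user)
```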

The full JSON fixture that I used can be found here. The dates are obviously bogus; they just make the range query demonstration easier later on.

Now that we have our data, we need to create an index, associate our user bucket with that index, and load the data. Following are the relevant bits from my setup_search.rb script which does all of that:

Make a PUT request to the index endpoint

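# Needs require 'net/http'; host and port point at Riak's HTTP interface
req = Net::HTTP::Put.new("/yz/index/user", { 'Content-Type' => 'application/json' })
Net::HTTP.new(host, port).start { |http| http.request(req) }
sleep(15) # give Yokozuna time to finish creating the index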

Associate your user bucket with the user index

bucket = Riak::Bucket.new(client, "user")
bucket.props = {'yz_index' => "user"}
sleep(15)

Read the JSON fixture data and load it into Riak normally

users = JSON.parse(File.read("user_fixtures.json"))

users.each do |user|
  object = bucket.new # no key given, so Riak assigns one on store
  object.raw_data = user.to_json
  object.content_type = 'application/json'
  object.store
end

full setup_search.rb source

Some of you out there (looking at you, clr) might be dismayed at my use of sleep in that script, but there is a good reason… I promise. This script only needs to be run once, and if you attempt to use the index immediately after creating it, Yokozuna will complain in rel/riak/log/solr.log.
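If the fixed sleeps still bother you, one alternative is to poll until the index is usable. The following is only a sketch under assumptions: `wait_until` is a hypothetical helper, and the readiness check you pass to it (for example, a search against the new index that no longer raises) is something you would need to work out for your own setup, since this post does not cover what Yokozuna reports while an index is still being created.

```ruby
# Hypothetical helper: retries the block until it returns a truthy value
# or the timeout expires. Errors raised by the block count as "not ready".
def wait_until(timeout: 30, interval: 1)
  deadline = Time.now + timeout
  loop do
    ready = begin
      yield
    rescue StandardError
      false
    end
    return true if ready
    return false if Time.now >= deadline
    sleep(interval)
  end
end

# Stand-in check that succeeds on the third attempt; in the setup script
# this would be replaced by a real probe of the new index.
attempts = 0
ok = wait_until(timeout: 5, interval: 0.1) { (attempts += 1) >= 3 }
puts "ready=#{ok} after #{attempts} attempts"
```

For a script that runs exactly once, the plain sleep is arguably simpler, which is why the demo keeps it.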

It’s safe to run the script now since Riak is already running

ruby setup_search.rb

Create a Sinatra server with a simple search endpoint

My goal is to show a few commonly used query needs, namely single term search, pagination, and range queries. Using some simple parameter existence logic, we can accomplish all of these goals in a single query endpoint:

We want an endpoint that can respond to /user/query/name_t/Drew or /user/query/title_t/Engineer

get '/user/query/:term/:value' do

We also want to handle pagination with a query string like ?rows=10&start=0

  results = []
  query = "#{params[:term]}:#{params[:value]}"
  rows = params[:rows] ? params[:rows].to_i : 10
  start = params[:start] ? params[:start].to_i : 0

Lastly we want support for range queries like ?from=2015-12-13T23:59:59Z&to=2018-12-13T23:59:59Z

  if params[:from] && params[:to]
    query = "((#{query}) AND (created_dt:[#{params[:from]} TO #{params[:to]}]))"
  end

Now that the parameters are sorted out, we can perform the query using a default field of name_t. The query returns a list of matching Solr documents, but we want the end user to have the actual Riak objects, so we’ll fetch each one from Riak using the _yz_rk (Riak key) field of its document.

  begin
    resp = client.search("user", query, {
      :rows => rows, 
      :start => start, 
      :df => "name_t"
    })

    resp["docs"].each do |doc|
      object = client.bucket("user").get(doc["_yz_rk"])
      results << JSON.parse(object.raw_data)
    end
  rescue
    results = {:error => "There was a problem with the query, or there were no results"}
  end

The only thing left to do is return the JSON results to the user

  results.to_json
end

full server.rb source
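To see exactly what string ends up going to Solr, the query-building logic from the handler can be pulled into a plain function. This is a restatement of the handler code for illustration only; `build_query` is a hypothetical name and does not appear in server.rb.

```ruby
# Hypothetical refactoring of the handler's query logic into a pure function.
# Takes a params-like hash with symbol keys and returns the Solr query string.
def build_query(params)
  query = "#{params[:term]}:#{params[:value]}"
  if params[:from] && params[:to]
    # Wrap the term query and AND it with a date range on created_dt
    query = "((#{query}) AND (created_dt:[#{params[:from]} TO #{params[:to]}]))"
  end
  query
end

puts build_query(term: 'title_t', value: 'Engineer')
# prints: title_t:Engineer

puts build_query(term: '*', value: '*',
                 from: '1994-01-01T01:01:01Z', to: '2018-12-13T23:59:59Z')
# prints: ((*:*) AND (created_dt:[1994-01-01T01:01:01Z TO 2018-12-13T23:59:59Z]))
```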

Start the server

ruby server.rb

Did it work?

Test it out with a few curl calls, or with your browser

Simple term query

curl 'http://localhost:4567/user/query/name_t/Drew+Kerrigan'

Should return

[{"name_t":"Drew Kerrigan","title_t":"Consulting Engineer","created_dt":"2015-12-13T23:59:59Z"}]

Pagination query

curl 'http://localhost:4567/user/query/title_t/*Engineer*?rows=10&start=0'

Range query

curl 'http://localhost:4567/user/query/*/*?from=1994-01-01T01:01:01Z&to=2018-12-13T23:59:59Z'

Great!

That’s all there is to it. Everything you’ve seen here will come standard issue with Riak 2.0 in the coming months. Hopefully these tools have made it a little easier to think through and implement more complex data models than you might have thought were possible with a key-value store.

A note to the wise, or maybe the unwise

None of this code should be considered production ready. Aside from the fact that I’m using branches and repositories that are obviously still in alpha, and the fact that I included none of the “productionizing” that is normally required with Ruby/Sinatra (Rack, Unicorn, Nginx, HAProxy for more than one Riak node, etc.), I also didn’t do any input sanitization. Use with caution.
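As a starting point for the missing sanitization, the sketch below whitelists the field name and backslash-escapes characters that are special in the Solr query syntax. Treat it as an assumption-laden example rather than a vetted filter: `ALLOWED_FIELDS`, `safe_field`, and `escape_solr` are hypothetical names, the character list is taken from the standard Lucene query parser syntax, whitespace is left alone, and escaping `*` and `?` means the wildcard examples above would no longer work as written.

```ruby
# Fields the demo indexes; anything else is rejected (hypothetical whitelist).
ALLOWED_FIELDS = %w[name_t title_t created_dt].freeze

# Characters with special meaning in the Lucene/Solr query syntax.
SOLR_SPECIAL = /[+\-&|!(){}\[\]^"~*?:\\\/]/

# Returns the field name unchanged if it is whitelisted, raises otherwise.
def safe_field(field)
  raise ArgumentError, "unknown field: #{field}" unless ALLOWED_FIELDS.include?(field)
  field
end

# Prefixes each special character with a backslash so it is matched literally.
def escape_solr(value)
  value.to_s.gsub(SOLR_SPECIAL) { |char| "\\#{char}" }
end

puts "#{safe_field('name_t')}:#{escape_solr('foo:bar')}"
```

In the handler, this would mean building the query from `safe_field(params[:term])` and `escape_solr(params[:value])` instead of interpolating the raw parameters.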