Giggle 2 Search Caching

Finally, time for my third article about caching in my ‘Giggle 2’ project. The last part was about Fragment Caching, while this part is about my changes for Search Caching.

To recap: the search page in the application presents some problems that are not entirely addressed by the ‘standard’ caching behaviour in Rails. Users can enter whatever search text they want, so there could be an unlimited number of different pages generated. Also, there is no easy way of telling which search results may be invalidated by any given change in the data.

My idea to deal with the latter problem was to have ‘generations’ of cached results. We store a number which is the current generation, and it is used as an extra key element in the cache fragment names. When a new object is created and the search pages need to be invalidated, we just increment the generation number.

When it came to implementing this idea, I realised that I could do it on top of the basic fragment caching system; all I needed to override[1] was the fragment_cache_key method.

def fragment_cache_key_with_generation(name)
  if name.is_a?(Hash)
    # prefix the normal key with the current generation number
    "search_cache/#{current_generation}/#{fragment_cache_key_without_generation(name)}"
  else
    name
  end
end
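That snippet defines the chained method but not the plumbing that swaps it in; a minimal sketch of the wiring, assuming the method lives on a controller I’ll call SearchController (my name, not necessarily the real one), would use the old alias_method_chain helper:

class SearchController < ApplicationController
  # fragment_cache_key_with_generation is defined as above. This aliases
  # fragment_cache_key to it, keeping the original reachable as
  # fragment_cache_key_without_generation.
  alias_method_chain :fragment_cache_key, :generation
end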

So all this does is ensure the fragment key contains the current generation. But where does that come from? I just use an extra fragment with a fixed name to store it. Actually the extra fragment stores YAML so it can include the generation number and an optional ‘expires’ time.
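Loaded back into Ruby, that fragment is just a small hash, along these lines (values invented for illustration):

generation = { :current => 3, :expires => Time.now + 5.minutes }   # :expires may be absent

Reading it back, and rolling over to a new generation when the expiry time has passed, looks like this: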

def current_generation
  if gfrag = read_fragment("search_cache/generation")
    generation = YAML.load(gfrag)
    if generation[:expires] && generation[:expires] < Time.now
      # the expiry time has passed: move on to the next generation
      new_generation(generation[:current] + 1)
    else
      generation[:current]
    end
  else
    # no generation fragment yet: start from zero
    new_generation(0)
  end
end
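The new_generation method isn’t shown above; a minimal sketch, assuming it just writes a fresh fragment with no expiry and returns the new number, would be:

def new_generation(number)
  # store the new generation number with no :expires key, and return it
  write_fragment("search_cache/generation", YAML.dump(:current => number))
  number
end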

As a little optimisation, I assume that several updates are likely to occur at once, so I don’t expire the cache immediately; it is allowed to ‘live’ for a few more minutes. This ties in with the cache expiry method, which just sets the expiry time if there isn’t one already:

def expire_generation
  if gfrag = read_fragment("search_cache/generation")
    generation = YAML.load(gfrag)
    unless generation[:expires]
      # only set an expiry time if one isn't already pending
      generation[:expires] = Time.now + 5.minutes
      write_fragment("search_cache/generation", YAML.dump(generation))
    end
  end
end
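To complete the picture, expire_generation has to be called whenever the searchable data changes. A hypothetical trigger, using an invented Video model in an ordinary create action:

def create
  @video = Video.create!(params[:video])
  expire_generation   # schedule the search cache to roll over
  redirect_to :action => 'show', :id => @video
end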


That dealt with writing the cache fragments, but I haven’t done anything about deleting them yet. One thing I wanted to avoid was the occasional HTTP request taking longer because it had to wait while the cache was cleaned up. I don’t have to worry about invalid files being used, because the ‘generation’ mechanism ensures that I won’t read them. So deleting cache files is done by an entirely separate script that is run from cron.

#!/usr/bin/ruby
require 'yaml'
require 'fileutils'

# configure here:
search_cache = File.join(File.dirname(__FILE__), "../tmp/cache/search_cache")
file_count = 500

# 1. delete the directories of any old generations
begin
  generation = YAML.load(File.open(search_cache + "/generation.cache", 'rb') { |f| f.read })
rescue
  generation = {:current => 0}
end

gi = generation[:current].to_i
if gi > 0
  (gi - 1).downto(0) do |g|
    dir = File.join(search_cache, g.to_s)
    FileUtils.rm_r(dir) if File.exist?(dir)
  end
end

# 2. loop on file count - delete the least recently used half each time
Dir.chdir search_cache do
  count = 0
  while (files = Dir.glob("#{gi}/**/*.cache")).size > file_count
    # sort by access time so the oldest-accessed files are deleted first
    count += File.delete(*files.sort_by { |f| File.atime(f) }[0..(files.size / 2)])
  end
  puts "deleted #{count} search cache files" if count > 0
end

This script does not use the Rails environment, which helps it start up reasonably quickly but means it has to ‘know’ about the file layout. It has two steps. First, it reads the current generation and deletes the directories of any older generations. Next, it checks whether the number of cached files is more than a desired limit[2], and deletes some of them until it gets below the limit. It uses the atime attribute to sort files[3], so it always deletes the least recently used ones first. This should mean that popular search results will be kept for longer, while unusual ones will end up being deleted.
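For reference, the cron side is just an ordinary crontab entry along these lines (the path, filename and schedule here are invented):

# sweep the search cache every ten minutes
*/10 * * * * /path/to/app/script/clean_search_cache.rb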


[1] In the strict sense, it’s using ‘method chaining’, which is a peculiarly Rubyish idiom, rather than overriding. But it’s a pretty similar concept.

[2] I originally wanted to have a limit based on disk space taken up, but number of files is much easier to use, and will work fine.

[3] As discussed previously.
