One of the most common features to be added to Rails apps is the ability to handle file uploads. A lot of developers also store these uploads on Amazon’s S3 service because of its low maintenance, low cost and peace of mind. Plugins like Attachment Fu and Paperclip make this dead simple.
But there is a problem with this. When a user uploads a file, the attachment is sent along with the request, the server receives the request and does whatever it needs to do, and then the attachment is uploaded from the server to S3. All the while the user is still waiting on a response and your Mongrel or Passenger instance is blocked from taking any other requests. This isn’t a very good user experience, especially if you’re handling large uploads. Even worse, if S3 fails to respond or drops the connection, the user will have to re-upload their file regardless of whether it is already on your server.

Queue it up
The solution to this is pretty straightforward: move the upload to S3 into a background process. Now your request life cycle looks like this: the attachment is sent, the server receives the request and does the minimum needed to generate a response, and the user is sent a response. You cut a very significant portion out of the request life cycle and your users are happier for it. Later, the server can upload the file to S3 on its own time. When dealing with large files this can be the difference between server timeout errors and satisfied customers.
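The reordered life cycle can be sketched in plain Ruby with the standard library’s Queue and a worker thread. This is only an illustration of the idea, not Rails code: BackgroundUploader, handle_request and the sleep that stands in for the S3 transfer are all made-up names and assumptions, and a real app would use a persistent job queue rather than an in-process thread.

```ruby
# Minimal sketch: the "request" only enqueues work, a worker does the slow part.
class BackgroundUploader
  attr_reader :uploaded

  def initialize
    @queue = Queue.new          # jobs waiting to be sent to remote storage
    @uploaded = []              # filenames the worker has finished with
    @worker = Thread.new { work }
  end

  # Stand-in for the controller action: enqueue and respond immediately.
  def handle_request(filename)
    @queue << filename
    "accepted #{filename}"
  end

  # Tell the worker to finish what is queued and stop.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  # Stand-in for the background process: drains the queue one job at a time.
  def work
    loop do
      job = @queue.pop
      break if job == :stop
      sleep 0.05                # pretend this is the slow upload to S3
      @uploaded << job
    end
  end
end

uploader = BackgroundUploader.new
response = uploader.handle_request("report.pdf")
puts response                   # the user already has a response here
uploader.shutdown               # wait for the background work to finish
puts uploader.uploaded.inspect
```

The point is only the ordering: handle_request returns before the slow step runs, and the slow step happens later without the user waiting on it.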

Could get messy
That sounds great, but how is the code going to look? How is Paperclip or Attachment Fu going to handle saving to the file system and then to S3 when the time comes? Is the code responsible for migrating the file to S3 going to be in my model? That doesn’t sound too clean. What about in a separate background worker file or a shell script? That seems like an afterthought and difficult to keep track of. As a developer, this messy solution almost seems worse than the problem it’s trying to solve.
When I first tackled this problem I hated the result. Attachment Fu was set up to save files to the file system. Using a callback, it would queue up a background task to upload the file to S3. The background task would upload the file to S3 using the aws-s3 gem and mark the record as being on S3. I had to add special methods to get the URL of the file, since I couldn’t use Attachment Fu’s built-in ones because it thought the file was on the filesystem. Deleting files from S3 was an even bigger pain. This sucked!
My Solution
Recently, we were doing a rewrite of that part of the code and I decided to come up with a better way. Paperclip and Attachment Fu may not give you a clear way of dealing with a file that needs to be saved to both the filesystem and S3, but you can still take advantage of them. What if, instead of having one model that has to handle both cases, you had 2 models that each handle one case?
Basically, you need a main model that has an attached file. For this example we’ll use an Upload model. Set it up just like you would if you were doing direct uploads to S3.
class Upload < ActiveRecord::Base
  has_attached_file :file,
    :storage => :s3,
    :s3_credentials => {
      :access_key_id => App.s3[:access_key_id],
      :secret_access_key => App.s3[:secret_access_key]
    },
    :bucket => App.s3[:bucket]
end
Then, create another model that inherits from the Upload model. We’ll call it TempUpload. This will be set up to save to the filesystem.
class TempUpload < Upload
  # Save it to a temporary location
  has_attached_file :file,
    :path => ":rails_root/tmp/uploads/:id/:basename.:extension"
end
Now we have 2 models that use the same DB table. One knows how to work with the filesystem and the other knows how to work with S3. We want files to be initially saved to the filesystem, so whenever we create a new upload we’ll create it using the TempUpload class.
class UploadsController < ApplicationController
  def create
    @upload = TempUpload.new(params[:upload])
    if @upload.save
      redirect_to upload_path(@upload)
    else
      render :new, :status => :bad_request
    end
  end
end
Now all we need is a way to turn temp uploads into regular uploads that are stored on S3. Paperclip callbacks and Delayed::Job to the rescue.
class TempUpload < Upload
  # Save it to a temporary location
  has_attached_file :file,
    :path => ":rails_root/tmp/uploads/:id/:basename.:extension"

  # Save this as a regular upload on S3.
  # This is run after the file is saved
  # and all post-process stuff like
  # resizing is done.
  after_post_process :queue_move_to_s3

  def queue_move_to_s3
    # Object method added by Delayed::Job,
    # works like send()
    send_later(:move_to_s3)
  end

  def move_to_s3
    temp_path = file.path
    temp_file = file.to_file # Paperclip's way of getting a File object for the attachment

    # Save it as a regular attachment;
    # this will save to S3
    s3_upload = Upload.find(id) # Same db record but we need the S3 version
    s3_upload.file = temp_file  # reset the file - it will assume it's a new file
    s3_upload.save!             # Paperclip will upload the file on save

    # Delete the temporary file when we are done
    temp_file.close
    File.delete(temp_path)
  end
end
What we’ve done is define a callback, ‘after_post_process’, that calls ‘queue_move_to_s3’ after the attachment has been saved and possibly cropped or resized. That method calls ‘send_later(:move_to_s3)’. send_later is a method added by Delayed::Job that works just like send, except that it runs asynchronously. So later, whenever our worker picks up the job, it will call ‘move_to_s3’, which takes care of moving the file to S3 by getting the Upload version of itself and setting its attachment to the same file. When the Upload is saved it will think it has a different file and Paperclip will take over sending it to S3. It then deletes the temporary attachment and we never have to work with the TempUpload version of that record again. Clean and simple.
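To make the “like send, but later” idea concrete, here is a rough plain-Ruby sketch of a deferred method call. It is not how Delayed::Job is implemented (Delayed::Job persists its jobs to the database so a separate worker process can pick them up); DeferredCall, TinyJobQueue and FakeUpload are hypothetical names used only for illustration, and everything stays in memory.

```ruby
# A stored call: which object, which method, which arguments.
DeferredCall = Struct.new(:receiver, :method_name, :args) do
  def perform
    receiver.send(method_name, *args)
  end
end

# Hypothetical in-memory queue; a real job library would persist the jobs.
class TinyJobQueue
  def initialize
    @jobs = []
  end

  # Record the call instead of making it now.
  def enqueue(receiver, method_name, *args)
    @jobs << DeferredCall.new(receiver, method_name, args)
  end

  def size
    @jobs.size
  end

  # What a worker would do: run each stored call in order.
  def work_off
    @jobs.shift.perform until @jobs.empty?
  end
end

# Stand-in for the TempUpload record in the article.
class FakeUpload
  attr_reader :location

  def initialize(queue)
    @queue = queue
    @location = :filesystem
  end

  # Plays the role of the callback: only schedules the move.
  def queue_move_to_s3
    @queue.enqueue(self, :move_to_s3)
  end

  # Plays the role of the slow move; here it just flips a flag.
  def move_to_s3
    @location = :s3
  end
end

queue = TinyJobQueue.new
upload = FakeUpload.new(queue)
upload.queue_move_to_s3
location_before = upload.location # nothing has run yet
queue.work_off                    # the "worker" picks up the job
location_after = upload.location
puts "#{location_before} -> #{location_after}"
```

The callback returns as soon as the call is recorded, and the actual move only happens when the worker drains the queue.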
Gotchas
You may want to add an extra column to your uploads table to track whether the file has been moved to S3 or not. This is so you can handle cases where there is a long delay between the TempUpload being saved and it being moved to S3. In that case you may even want a proxy class that returns an instance of the right subclass to handle the filesystem or S3. In our case we didn’t need to provide links to uploads right away, and users would never be deleting uploads that could be on either the filesystem or S3, so this turned out to be enough.
To get this to work you’ll also need to know a bit more about Paperclip and Delayed::Job than I went over here. You could also use this general idea with other file upload plugins and background processing libraries.
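One way the tracking column and proxy could fit together is sketched below in plain Ruby. It assumes a boolean column, here called on_s3, and uses made-up names throughout (UploadRow, LocalFile, S3File, UploadProxy, the bucket name and the URL layout); it is not Paperclip code, just the dispatch idea.

```ruby
# Hypothetical row shape: an on_s3 flag stored next to the filename.
UploadRow = Struct.new(:id, :filename, :on_s3)

# Knows how to address a file that is still on the local filesystem.
class LocalFile
  def initialize(row)
    @row = row
  end

  def url
    "/tmp/uploads/#{@row.id}/#{@row.filename}"
  end
end

# Knows how to address a file that has already been moved to S3.
class S3File
  BUCKET = "example-bucket" # made-up bucket name

  def initialize(row)
    @row = row
  end

  def url
    "https://#{BUCKET}.s3.amazonaws.com/#{@row.id}/#{@row.filename}"
  end
end

# The proxy: picks the right handler from the flag on the row.
module UploadProxy
  def self.wrap(row)
    row.on_s3 ? S3File.new(row) : LocalFile.new(row)
  end
end

row = UploadRow.new(7, "report.pdf", false)
local_url = UploadProxy.wrap(row).url
row.on_s3 = true # what the background job would set once the move succeeds
s3_url = UploadProxy.wrap(row).url
puts local_url
puts s3_url
```

In the article’s setup the two handlers would be TempUpload and Upload, and the background job would flip the flag at the end of move_to_s3.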
Jul 10th, 2009
Another option is to post uploads into s3 directly, so you avoid the double upload entirely…
Aug 8th, 2009
I read about a couple different ways to achieve async uploads and this is the most clever idea.
On my app, I just added 2 attachments to a given model with one that is local while the other is S3. It seems to work great. Thanks.
Aug 10th, 2009
Brilliant little workaround! I’d highly suggest you contribute your changes back into the Paperclip plugin on GitHub. Would be awesome if we could get this functionality built in.
Aug 13th, 2009
This is a good idea! I was inspired and ended up doing something similar. Thanks!
Aug 19th, 2009
I’m really glad you guys are finding it useful.
@jc I had considered writing it as a post-process plugin for Paperclip but haven’t quite worked out how it would be implemented.
Sep 10th, 2009
This looks like a great idea!
Thanks
Ethan
Dec 17th, 2009
kain, if you change it to after_save :queue_move_to_s3 it works
the issue seems to be that after_post_process does not set self correctly and the call to move_to_s3 is not run as an instance method
trouble is now a new error happens telling me that “AWS access keys are required to operate on S3”
although if i run the method from the console its fine?!?!?
any help anyone thanks
Dec 1st, 2009
Hey there,
I’m getting the same error as kain. Has anyone else had this problem, and if so, how did you fix it?
Any info is welcome.
Great article btw
Chris
Nov 27th, 2009
hi,
great doc, but I’m having issues with this solution.
class TempTrack < Track
…
after_post_process :queue_move_to_s3
def queue_move_to_s3
  send_later(:move_to_s3)
end
for some reason delayed_job doesn’t like that:
[JOB] Unknown#move_to_s3 failed with NoMethodError: undefined method `move_to_s3’ for AR:TempTrack: – 2 failed attempts
any ideas?