I've worked on a fair number of projects where I've had to import data from a CSV file. With large datasets, especially when inserting into a database with standard ActiveRecord, this can take a while. If you're not running the import often, it's usually not worth optimising, but it does get annoying not knowing how long it's going to take when you do.
With this small module you can easily add a progress bar to the built-in CSV library. It uses the progress_bar gem to do all the hard work (not to be confused with the other Ruby/ProgressBar library).
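require 'csv'
require 'progress_bar'

class CSV
  module ProgressBar
    # The bar's maximum is the size of the underlying IO in bytes.
    def progress_bar
      ::ProgressBar.new(@io.size, :bar, :percentage, :elapsed, :eta)
    end

    def each
      # Without a block, behave like CSV#each and return an enumerator.
      return to_enum(__method__) unless block_given?

      progress_bar = self.progress_bar

      super do |row|
        yield row

        # Set the bar to the current byte position, then redraw it.
        progress_bar.count = self.pos
        progress_bar.increment!(0)
      end
    end
  end

  class WithProgressBar < CSV
    include ProgressBar
  end

  def self.with_progress_bar
    WithProgressBar
  end
end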
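The module above measures progress by byte position in the underlying IO rather than by row count, so the file never has to be scanned up front to count rows. Here is a minimal sketch of that idea using only the standard library. The helpers `percent_consumed` and `progress_samples` are hypothetical names made up for this illustration and are not part of the module or the progress_bar gem.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: percentage of the underlying IO consumed so far,
# based on byte position rather than the number of rows read.
def percent_consumed(csv, total_bytes)
  return 100.0 if total_bytes.zero?

  (csv.pos * 100.0 / total_bytes).round(1)
end

# Hypothetical helper: parse the data and record one percentage per row.
def progress_samples(data)
  io = StringIO.new(data)
  csv = CSV.new(io)
  samples = []
  csv.each { |_row| samples << percent_consumed(csv, io.size) }
  samples
end

puts progress_samples("id,name\n1,alice\n2,bob\n").inspect
```

Depending on the version of the csv library, input may be read in buffered chunks, so the reported position can run ahead of the row currently being yielded. For large files this only makes the bar slightly coarse, but on tiny inputs it may jump straight to 100%.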
It can be used in a few different ways. The least intrusive, if you're integrating it into an existing code base, is to extend a single CSV instance with the CSV::ProgressBar module.
data = File.read('data.csv')
csv = CSV.new(data)
csv.extend(CSV::ProgressBar)

csv.each do |row|
  # expensive operation
end
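Extending an instance works because `extend` places the module in front of the object's class in method lookup, so the module's `each` runs first and `super` reaches the original `CSV#each`. The sketch below shows this with a stand-in module, `RowCounter`, which is a hypothetical name used purely for illustration (as is the `count_rows` helper). It counts rows instead of drawing a bar, so it runs without the progress_bar gem.

```ruby
require 'csv'

# Hypothetical module for illustration only: counts rows as they are yielded.
module RowCounter
  attr_reader :rows_seen

  def each
    @rows_seen = 0
    # super calls the original CSV#each because the module sits in front
    # of CSV in this object's method lookup chain.
    super do |row|
      @rows_seen += 1
      yield row
    end
  end
end

# Hypothetical helper: extend one CSV instance and report how many rows it saw.
def count_rows(data)
  csv = CSV.new(data)
  csv.extend(RowCounter)
  csv.each { |_row| }
  csv.rows_seen
end

puts count_rows("a,b\n1,2\n3,4\n")
```

Only the extended object changes. Other CSV instances, and the CSV class itself, are left untouched, which is what makes this approach easy to drop into existing code.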
You can also use the subclass, CSV::WithProgressBar, which includes the ProgressBar module for you. This syntax doesn't feel as idiomatic, but it does make it possible to use the class-level convenience methods.
CSV::WithProgressBar.foreach('data.csv') do |row|
  # expensive operation
end
I also added the CSV.with_progress_bar class method. It's just a little bit of syntactic sugar that returns the subclass, but I find it reads nicer than naming the subclass directly.
CSV.with_progress_bar.foreach('data.csv') do |row|
  # expensive operation
end
I've added the code to a Gist. Use it, fork it, modify it.