Get rid of association code. Focus on idempotency. Log runs.
seamusabshere committed Feb 25, 2010
1 parent 23ee99b commit 71647ef
Showing 21 changed files with 1,159 additions and 662 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ coverage
rdoc
pkg
test/test.sqlite3
data_miner.log
5 changes: 5 additions & 0 deletions CHANGELOG
@@ -1,2 +1,7 @@
0.2.6
* Upgrade to remote_table 0.1.6 to handle UTF-8 CSVs and long urls.
0.3.0
* Removed association code... now data_miner focuses on just importing.
* New, simpler DSL
* Upgrade to remote_table 0.2.1 for row_hashes and better blank row handling
* Remove all association-related code
26 changes: 11 additions & 15 deletions README.rdoc
@@ -8,15 +8,15 @@ Put this in <tt>config/environment.rb</tt>:

config.gem 'data_miner'

You need to define <tt>mine_data</tt> blocks in your ActiveRecord models. For example, in <tt>app/models/country.rb</tt>:
You need to define <tt>data_miner</tt> blocks in your ActiveRecord models. For example, in <tt>app/models/country.rb</tt>:

class Country < ActiveRecord::Base
mine_data do |step|
data_miner do |step|
# import country names and country codes
step.import :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do |attr|
attr.key :iso_3166, :name_in_source => 'country code'
attr.store :iso_3166, :name_in_source => 'country code'
attr.store :name, :name_in_source => 'country'
attr.key :iso_3166, :field_name => 'country code'
attr.store :iso_3166, :field_name => 'country code'
attr.store :name, :field_name => 'country'
end
end
end
@@ -26,7 +26,7 @@ You need to define <tt>mine_data</tt> blocks in your ActiveRecord models. For ex
class Airport < ActiveRecord::Base
belongs_to :country

mine_data do |step|
data_miner do |step|
# import airport iata_code, name, etc.
step.import(:url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false) do |attr|
attr.key :iata_code, :field_number => 3
@@ -43,12 +43,8 @@ You need to define <tt>mine_data</tt> blocks in your ActiveRecord models. For ex
Put this in <tt>lib/tasks/data_miner_tasks.rake</tt>: (unfortunately I don't know a way to automatically include gem tasks, so you have to do this manually for now)

namespace :data_miner do
task :mine => :environment do
DataMiner.mine :class_names => ENV['CLASSES'].to_s.split(/\s*,\s*/).flatten.compact
end

task :map_to_attrs => :environment do
DataMiner.map_to_attrs ENV['METHOD'], :class_names => ENV['CLASSES'].to_s.split(/\s*,\s*/).flatten.compact
task :run => :environment do
DataMiner.run :class_names => ENV['CLASSES'].to_s.split(/\s*,\s*/).flatten.compact
end
end
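
The rake task above turns the <tt>CLASSES</tt> environment variable into a list of class names. A minimal sketch of that parsing step in isolation follows; the helper name <tt>parse_class_names</tt> is hypothetical and not part of data_miner.

```ruby
# Hypothetical helper (not part of data_miner) wrapping the expression
# used by the rake task: split a comma-separated string into names.
def parse_class_names(raw)
  # nil.to_s is "", so an unset CLASSES variable yields an empty list.
  # The regexp also swallows whitespace around each comma.
  raw.to_s.split(/\s*,\s*/).flatten.compact
end

parse_class_names('Country, Airport')  # => ["Country", "Airport"]
parse_class_names(nil)                 # => []
```

With the task defined as in this diff, a run limited to two models would presumably look like <tt>rake data_miner:run CLASSES=Country,Airport</tt>.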

@@ -60,9 +56,9 @@ You need to specify what order to mine data. For example, in <tt>config/initiali
# etc
end

Once you have (1) set up the order of data mining and (2) defined <tt>mine_data</tt> blocks in your classes, you can:
Once you have (1) set up the order of data mining and (2) defined <tt>data_miner</tt> blocks in your classes, you can:

$ rake data_miner:mine
$ rake data_miner:run

==Complete example

@@ -75,7 +71,7 @@ Once you have (1) set up the order of data mining and (2) defined <tt>mine_data<
[...edit per quick start...]
~/testapp $ touch config/initializers/data_miner_config.rake
[...edit per quick start...]
~/testapp $ rake data_miner:mine
~/testapp $ rake data_miner:run

Now you should have

9 changes: 7 additions & 2 deletions Rakefile
@@ -10,8 +10,13 @@ begin
gem.email = "seamus@abshere.net"
gem.homepage = "http://github.com/seamusabshere/data_miner"
gem.authors = ["Seamus Abshere", "Andy Rossmeissl"]
%w{ activerecord activesupport andand errata conversions }.each { |name| gem.add_dependency name }
gem.add_dependency 'remote_table', '0.1.6'
gem.add_dependency 'remote_table', '~>0.2.1'
gem.add_dependency 'activerecord', '~>2.3.4'
gem.add_dependency 'activesupport', '~>2.3.4'
gem.add_dependency 'andand', '~>1.3.1'
gem.add_dependency 'errata', '~>0.1.4'
gem.add_dependency 'conversions', '~>1.4.3'
gem.add_dependency 'blockenspiel', '~>0.3.2'
gem.require_path = "lib"
gem.files.include %w(lib/data_miner) unless gem.files.empty? # seems to fail once it's in the wild
gem.rdoc_options << '--line-numbers' << '--inline-source'
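
The new dependency lines use the pessimistic operator (<tt>~></tt>) rather than an exact version pin. A short sketch of what <tt>~> 0.2.1</tt> accepts, using RubyGems' own requirement classes; the version numbers tested are illustrative only and do not claim that those releases exist.

```ruby
require 'rubygems'

# '~> 0.2.1' means: at least 0.2.1, but below 0.3.
requirement = Gem::Requirement.new('~> 0.2.1')

requirement.satisfied_by?(Gem::Version.new('0.2.1'))  # => true
requirement.satisfied_by?(Gem::Version.new('0.2.9'))  # => true
requirement.satisfied_by?(Gem::Version.new('0.3.0'))  # => false
requirement.satisfied_by?(Gem::Version.new('0.1.6'))  # => false
```

The previous exact pin on remote_table 0.1.6 would fall outside the new range, which matches the upgrade noted in the CHANGELOG.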
77 changes: 50 additions & 27 deletions lib/data_miner.rb
@@ -1,43 +1,66 @@
require 'rubygems'
require 'activesupport'
require 'activerecord'
require 'active_support'
require 'active_record'
require 'blockenspiel'
require 'conversions'
require 'remote_table'
require 'errata'
require 'andand'
require 'log4r'

require 'data_miner/active_record_ext'
require 'data_miner/attribute'
require 'data_miner/attribute_collection'
require 'data_miner/configuration'
require 'data_miner/dictionary'
require 'data_miner/step'
require 'data_miner/step/associate'
require 'data_miner/step/await'
require 'data_miner/step/callback'
require 'data_miner/step/derive'
require 'data_miner/step/import'
require 'data_miner/william_james_cartesian_product' # TODO: move to gem
require 'data_miner/import'
require 'data_miner/process'
require 'data_miner/target'
require 'data_miner/run'

# TODO: move to gem
require 'data_miner/william_james_cartesian_product'

module DataMiner
class << self
def mine(options = {})
DataMiner::Configuration.mine options
end

def map_to_attrs(method, options = {})
puts DataMiner::Configuration.map_to_attrs(method, options)
end
class MissingHashColumn < RuntimeError; end

include Log4r

def enqueue(&block)
DataMiner::Configuration.enqueue &block
end

def classes
DataMiner::Configuration.classes
mattr_accessor :logger

def self.start_logging
if defined?(Rails)
self.logger = Rails.logger
else
self.logger = Logger.new 'data_miner'
logger.outputters = FileOutputter.new 'f1', :filename => 'data_miner.log'
end
end

def self.run(options = {})
DataMiner::Configuration.run options
end

def self.enqueue(&block)
DataMiner::Configuration.enqueue &block
end

def self.classes
DataMiner::Configuration.classes
end

def self.create_tables
DataMiner::Configuration.create_tables
end
end

ActiveRecord::Base.class_eval do
include DataMiner::ActiveRecordExt
def self.data_miner(&block)
# this is class_eval'ed here so that each ActiveRecord descendant has its own copy, or none at all
class_eval { cattr_accessor :data_miner_config }
self.data_miner_config = DataMiner::Configuration.new self

data_miner_config.before_invoke
Blockenspiel.invoke block, data_miner_config
data_miner_config.after_invoke
end
end

DataMiner.start_logging
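
The <tt>data_miner</tt> class method above gives a model its own configuration object only when that model calls it. A rough plain-Ruby sketch of the same idea follows. It is not the gem's code: <tt>SketchConfiguration</tt> and <tt>SketchDataMiner</tt> are hypothetical names, <tt>instance_eval</tt> stands in for <tt>Blockenspiel.invoke</tt>, a singleton <tt>attr_accessor</tt> stands in for <tt>cattr_accessor</tt>, and the URL is a placeholder.

```ruby
# Hypothetical stand-in for DataMiner::Configuration; it only records
# what the block declares and does no importing.
class SketchConfiguration
  attr_reader :target, :steps

  def initialize(target)
    @target = target
    @steps = []
  end

  # Loosely mirrors the DSL's import call; the real signature is assumed.
  def import(options = {})
    @steps << [:import, options]
  end
end

module SketchDataMiner
  def data_miner(&block)
    # Define the accessor on this class only, so classes that never call
    # data_miner do not get one.
    singleton_class.send(:attr_accessor, :data_miner_config)
    self.data_miner_config = SketchConfiguration.new(self)
    # instance_eval passes the config as the block argument as well,
    # so the `do |step|` form from the README works here.
    data_miner_config.instance_eval(&block)
  end
end

class Country
  extend SketchDataMiner

  data_miner do |step|
    step.import :url => 'http://example.com/countries.csv'
  end
end

class Airport
  extend SketchDataMiner
end
```

In this sketch <tt>Country.data_miner_config</tt> holds one recorded import step, while <tt>Airport</tt> has no <tt>data_miner_config</tt> accessor at all because it never called <tt>data_miner</tt>.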
25 changes: 0 additions & 25 deletions lib/data_miner/active_record_ext.rb

This file was deleted.
