Archive for July, 2010
test
Thursday, July 22nd, 2010Ruby Script for Combinations and Permutations
Thursday, July 22nd, 2010Note that there are 27 permutations of a three-item set, but the script below returns 39 because there are 39 ways of combining some or all of the items in a three-item set (i.e. in addition to the 27 permutations of 1, 2, and 3, there are also 12 permutations of 1, 2, or 3.)
The code
Array.class_eval do
# return hash with
# keys being unique elements of self;
# values being number of occurrences of those elements
# similar to SQL select x,count(1) from y group by x
# [1,3,3,9,9,9,9].count_distinct
# => {1=>1, 3=>2, 9=>4}
def count_distinct
c={}
self.each{|x|c[x]=(c[x]||0)+1}
c
end
end
def permute(i,els,perms,options)
els.each{|e|
if (i.size+1)<=options[:max_size]
nxt=i.dup<<e
perms<<nxt if nxt.size>=options[:min_size]
permute(nxt,els,perms,options)
end
}
end
def easy_permute(elements,min=1,max=elements.size)
permutations=[]
options={:min_size=>min,:max_size=>max}
permute([],elements,permutations,options)
permutations
end
def easy_combine(elements,min=1,max=elements.size,max_repetitions=nil)
puts "min is #{min}"
permutations=easy_permute(elements,min,max)
permutations.each{|x|x.sort!}.sort!
permutations.uniq!
permutations.delete_if{|x|
x.count_distinct.values.detect{|y|
y>max_repetitions}} if max_repetitions
permutations
end
Usage and testing…
test_combine=easy_combine(['a','b','c'])
test_combine_3=easy_combine(['a','b','c'],3)
test_combine_3_no_rep=easy_combine(['a','b','c'],3,3,1)
test_permute=easy_permute(['a','b','c'])
test_permute_3=easy_permute(['a','b','c'],3)
msg= "=========================\n"
msg<<"COMBINATIONS (#{test_combine.size})\n"
test_combine.each {|x|msg<<x.join(',')<<"\n"};nil
msg<<"=========================\n"
msg<<"COMBINATIONS OF 3 (#{test_combine_3.size})\n"
test_combine_3.each {|x|msg<<x.join(',')<<"\n"};nil
msg<<"=========================\n"
msg<<"COMBINATIONS OF 3 WITH NO REPETITION (#{test_combine_3_no_rep.size})\n"
test_combine_3_no_rep.each {|x|msg<<x.join(',')<<"\n"};nil
msg<<"=========================\n"
msg<<"PERMUTATIONS (#{test_permute.size})\n"
test_permute.each {|x|msg<<x.join(',')<<"\n"};nil
msg<<"=========================\n"
msg<<"PERMUTATIONS OF 3 (#{test_permute_3.size})\n"
test_permute_3.each {|x|msg<<x.join(',')<<"\n"};nil
puts msg
========================= COMBINATIONS (19) a a,a a,a,a a,a,b a,a,c a,b a,b,b a,b,c a,c a,c,c b b,b b,b,b b,b,c b,c b,c,c c c,c c,c,c ========================= COMBINATIONS OF 3 (10) a,a,a a,a,b a,a,c a,b,b a,b,c a,c,c b,b,b b,b,c b,c,c c,c,c ========================= COMBINATIONS OF 3 WITH NO REPETITION (1) a,b,c ========================= PERMUTATIONS (39) a a,a a,a,a a,a,b a,a,c a,b a,b,a a,b,b a,b,c a,c a,c,a a,c,b a,c,c b b,a b,a,a b,a,b b,a,c b,b b,b,a b,b,b b,b,c b,c b,c,a b,c,b b,c,c c c,a c,a,a c,a,b c,a,c c,b c,b,a c,b,b c,b,c c,c c,c,a c,c,b c,c,c ========================= PERMUTATIONS OF 3 (27) a,a,a a,a,b a,a,c a,b,a a,b,b a,b,c a,c,a a,c,b a,c,c b,a,a b,a,b b,a,c b,b,a b,b,b b,b,c b,c,a b,c,b b,c,c c,a,a c,a,b c,a,c c,b,a c,b,b c,b,c c,c,a c,c,b c,c,c
Ruby Script for Combinations
Friday, July 16th, 2010Say you have the set 1,2,3,4,5,6 and you want to know all combinations, not permutations, of the set. Here is a Ruby script to do so.
elements=[1,2,3,4,5,6]
t1=Time.now
combinations=[]
elements.each_with_index do |element,i|
# the by-itself combination
combinations<<[element]
# the last item's combinations are already defined
unless i==elements.size-1
elements[(i+1)..(elements.size-1)].each do |next_element|
combinations<<(combinations.last.dup<<next_element)
end
end
end
t2=Time.now
combinations.each {|c|puts c.join(',')}
puts "it took #{(t2-t1)*1000}ms to generate this set"
and the output:
1 1,2 1,2,3 1,2,3,4 1,2,3,4,5 1,2,3,4,5,6 2 2,3 2,3,4 2,3,4,5 2,3,4,5,6 3 3,4 3,4,5 3,4,5,6 4 4,5 4,5,6 5 5,6 6 it took 8ms to generate this set
It also works with words. The real-world case for this is that I’m working on a keyword co-occurrence database. I want to know, out of a given set of survey responses, the frequency of words which appear in the same sentence. For the given sentence I love Ruby, it is such a great programming language — and powerful, too., with the common words removed…
elements=["love","Ruby","great","programming","language","powerful"]
and the output:
love love,Ruby love,Ruby,great love,Ruby,great,programming love,Ruby,great,programming,language love,Ruby,great,programming,language,powerful Ruby Ruby,great Ruby,great,programming Ruby,great,programming,language Ruby,great,programming,language,powerful great great,programming great,programming,language great,programming,language,powerful programming programming,language programming,language,powerful language language,powerful powerful it took 13ms to generate this set
This script doesn’t handle large sets!
elements=(1..500).to_a it took 30819ms to generate this set elements=(1..5000).to_a #my computer crashed from lack of memory!
But if you can reduce the depth of combinations…
There isn’t much value (at least in the context of what I am doing!) in indexing every possible combination of words in a large text. Really, I only need maybe four-word combinations. This script will work with a large text because the result set is relatively small because it limits the combination depth to three elements.
max=3
elements=[1,2,3,4,5,6]
t1=Time.now
combinations=[]
elements.each_with_index do |element,i|
# the by-itself combination
combinations<<[element]
# the last item's combinations are already defined
unless i==elements.size-1
elements[(i+1)..(elements.size-1)].each do |next_element|
c=combinations.last.dup<<next_element
c.delete_at(1) until c.size<=max
combinations<<c
end
end
end
t2=Time.now
combinations.each {|c|puts c.join(',')}
puts "it took #{t2-t1}ms to generate this set"
1 1,2 1,2,3 1,3,4 1,4,5 1,5,6 2 2,3 2,3,4 2,4,5 2,5,6 3 3,4 3,4,5 3,5,6 4 4,5 4,5,6 5 5,6 6 it took 13ms to generate this set
Now I can work with large texts!
elements=(1..5000).to_a it took 28370ms to generate this set
Now I have what I need to build a database index representing combinations of words in sentences. From there, I can find all sentences with, say, powerful and language in them (because all sentences will be indexed according to the combinations of words that appear in them). However, knowing to look for powerful and language in advance is not my goal -- I want the database to tell me the most frequent co-occurrences, so that I can examine them. My goal is to have the database tell me, in essence, the most common combination of words the combination of problem and feature so that I can pinpoint what people are talking about without having to read every single sentence. (I'm pretty sure this is impossible since people might not use the same terminology even though they're all talking about the same thing).
But more on this topic later...