Wednesday, April 18, 2012

Crossword generation algorithm revision

Remove fixed set perpedicular dictionaries prior to puzzle construction.  Use a single set dictionary instead, to pick words from and assign as needed words in crossword from this single set.


2.  Implement backtracker, however, record processed data  up to all potential backtrack points, to avoid recomputations.

3. remove existing ranking system, instead implement additional
ranking system that measures word/letter density (over iteration cycle).

4.  Remove squarness test, as word/letter density metric may compensate, or implement squareness test
over larger iteration cycles...there may be overlap bettween this an density ranking?

method: We start with two cycle iteration, for example, from horizontal, fill verticals, then fill horizontal, then rank, then repeat steps on all 1rst cycle matches from the fill verticals step, reiterating next cycle, rerank, and repeat as necessary until constructing all match variations from the first cycle and all possible match variations on the second cycle.  Pick the highest density rank, measure squareness, then picking edge work a new starting edge cell line to balance for squareness, and repeat this process all over, and so forth until have removed either all or a ranged target of words from such set.

positives here:  again avoiding fixed boundary conditions for puzzle generation, but implementing weights on puzzle generation growth measures, this increases probable growth favouring higher word/letter to space density arrangements (more likely leading to more densely filled rectangles or square fill patterns), and then attempting to weight growth in favour of squareness by picking new 'cell' edges along a desired growth line.  While this does some hard number crunching for picking variation of matched words on the fill pattern.  Once a subset fill pattern has been chosen, I avoid hard back track fill re computation of such space...hmm, one way of thinking of this is building many crosswords inside a crossword.  Once building a crossword a decent subset density crossword, build another high density crossword subset, and so forth, until filling using all desired words, or a given range of words, keeping the total fill pattern arranged as square as desired, and weighted inside a target total side length range.

4/19

backtrack: 
   definition:
     iteration cycle: for the context of this program this refers to the cycling through of all work queued words on a particular fixed dimension of the crossword.  For example,  three words, ('cat', 'dog', 'iguana') are added to the work queue, these are worked on the HORIZONTAL axis,
     Finding all vertical word matches for these words in turn represents one iteration cycle.
    
   Data records:  1) All data gathered at iteration snapshot, key cycle iterations.
   Construct an adequate key for iteration cycles. Hmm...since for any seed word  on a 2 cycle iteration, there should exist for such seed a key of length, len(seed_word), we could use a key tuple value for this in our tracker.
  
   After all 2 cycle iterations of a word are complete (in terms of all permutations of matched words to such word), then a highest density ranked crossword sub group (sub crossword) is selected  positions are recorded and all data records are discarded.
  
revision: will revise squareness test at the moment, to measure and record alongside picked sub group data attributes -coordinates, words, letters, and word orientation, the length, width attributes for the respective subgroup.


method revision:  2 cycle iteration permuation groups determine a subgroup for a selected word, repeat this process of the set of all words until having sufficiently exhausted all such words and having constructed, a family of subgroups.  Then with attribute data of subgroups, proceed to build,
a larger group representing our crossword.  I'd leave the possibility of rotational matrix transformations open so as to provide additional fit possibilities as necessary.  This means we hadn't need position a seed selection word for the base of such subgroup in any particular oriented fashion firstly, to meet fit
criteria latter, since we can rotate this subgroup as necessary.

So the one thing to be added here, it appears that removing weight ranking selection at every pick and instead opting for trying out permutations for a given 2 cycle iterations leads to on first cycle for a seed word,
number of letters  = len(seed_word), and for each letter len(of word matches).

This is total of  a[0]*a[1]...a[n] represents n letters and a[n] represents the length of word matches on the nth letter, permutations, the next set grows even larger for each set of picks in a[0] thru a[n] where each word match leads to its own similarly expanded set of letter to word association groups.  Something of a computation suggestion here.  The more words left in your dictionary has the potential to create nicer more dense looking crossword structures ( I would guess here), but computation time spent could increase dramatically here even with 2 iteration cycle caps.

I created an ordered linked list on the permutation set here for keying computation data at least for structuring letter to word associated keys.  

No comments:

Post a Comment

Oblivion

 Between the fascination of an upcoming pandemic ridden college football season, Taylor Swift, and Kim Kardashian, wildfires, crazier weathe...