Tuesday, March 24, 2009

Manga Meeya


A while ago, I wrote a post about CDisplay, an awesome Comic Book Reading Tool. I had several problems with it (such as its non-customizable shortcuts, a full screen mode that didn't work with my dual screens, random crashes while opening certain archives, difficulty in dealing with archives within archives...), but I didn't think that anything better existed out there. I tried CDisplayEx, but that was way worse. I was even tempted to write my own manga viewer called "CDisplayEx+Alpha." But there's no need, because I found MangaMeeya, which is everything I ever wanted in a comic reader and more.

My favorite features about MangaMeeya are its ability to display two pages at once, its smart look ahead feature, its ability to view pages either left-to-right or right-to-left, its fit-to-page image resizing, its ability to automatically deal with compressed files, and its customizability. On top of all that, MangaMeeya is blazingly fast. In addition to comic reading, it works really well as a generic image viewer... you can even delete images using MangaMeeya and use it to organize your galleries. I really can't even imagine a better sequential image viewing tool.

I don't really know the history of MangaMeeya or where it came from. I don't even know its home page. I think that it's a Japanese commercial product, but that development has ended over some copyright dispute. At some point, some guy unofficially translated it into English. If anyone has more information about MangaMeeya, please let me know, since I would like to pay for the program.

Link to Download English MangaMeeya:  Download Here

Context Menu

Although its internal file browser is nice, you will definitely want to add MangaMeeya to your right click context menu. That'll allow you to right click on any file or folder and have it open in MangaMeeya. You can do this by going into the registry and adding new shell commands to the '*' and 'Folder' extensions. Or you can download this batch file I wrote and then run it in the directory where MangaMeeyaCE-English.exe is located. I've tested it on Windows XP SP2, Windows XP SP3, and Windows Vista. It will modify your registry, but I can't imagine how it could ruin anything. Still, I assume no responsibility if you somehow use it to wreck your system. The context item will be named CMangaMeeya, which should always put it near the top of your right click list.

    @echo off
    rem Run this from the directory that contains MangaMeeyaCE-English.exe.
    if not exist MangaMeeyaCE-English.exe (
      echo MangaMeeyaCE-English.exe Not Found In Current Directory.
      echo Please run this file in the MangaMeeya Directory!
      echo.
      pause
      exit /b 1
    )

    echo Creating Registry file...

    rem The cd substitution below doubles every backslash in the current
    rem directory, which is how .reg files expect paths to be escaped.
    echo Windows Registry Editor Version 5.00 > mangameeya.reg
    echo. >> mangameeya.reg
    echo [HKEY_CLASSES_ROOT\Folder\shell] >> mangameeya.reg
    echo [HKEY_CLASSES_ROOT\Folder\shell\CMangaMeeya] >> mangameeya.reg
    echo [HKEY_CLASSES_ROOT\Folder\shell\CMangaMeeya\command] >> mangameeya.reg
    echo @="\"%cd:\=\\%\\MangaMeeyaCE-English.exe\" \"%%1\"" >> mangameeya.reg
    echo. >> mangameeya.reg
    echo [HKEY_CLASSES_ROOT\*\shell] >> mangameeya.reg
    echo [HKEY_CLASSES_ROOT\*\shell\CMangaMeeya] >> mangameeya.reg
    echo [HKEY_CLASSES_ROOT\*\shell\CMangaMeeya\command] >> mangameeya.reg
    echo @="\"%cd:\=\\%\\MangaMeeyaCE-English.exe\" \"%%1\"" >> mangameeya.reg

    rem Import the generated file silently.
    regedit /s mangameeya.reg

    echo Successfully Added Manga Meeya to Context Menu as CMangaMeeya!

    pause

My CDisplay Clone Set Up

Since I've used CDisplay for a few years, I've gotten used to many of its key shortcuts. Since MangaMeeya is so customizable, I switched many of the key shortcuts to be more CDisplay-like. If you're in the same boat as me, feel free to use my configuration settings. Simply download my file and swap it with the file in your MangaMeeya directory.

My Configuration Settings File:  Download Here

My Frequently Used Commands:

Right Arrow: Goes to the next section or page.
Left Arrow: Goes to the previous section or page.
Ctrl+l: Goes to the next directory or compressed file. It's a lowercase 'L', not the number one.
Ctrl+Shift+l: Goes to the previous directory or compressed file. It's a lowercase 'L', not the number one.
l: Allows you to select a folder to open. It's a lowercase 'L', not the number one.
o: Allows you to select a file to open. It's a lowercase letter 'o', not the number zero.
`: Exits the program
Ctrl+Return: Switches between full screen mode and Windowed mode.

Well, if you read comics on your computer, I hope that you give MangaMeeya a try. It's definitely worth it.

Thursday, March 12, 2009

Simple Anagrammer

While watching my family feverishly play word jumble one day, I decided to write a program to list anagrams. They're still addicted to that game, but I've long since finished my program. Although it wasn't much effort, I thought I'd go over what I came up with. I wrote code to find single word anagrams, sub anagrams, and multi word anagrams. Since those may be my own nomenclature, I'll explain more clearly. What I'm calling a Single Word Anagram is a single word made up from all the letters of one or more words, such as "cat" to "act" and "sure me" to "resume." What I'm calling Sub Anagrams are shorter words made up from some of the letters of one or more words, such as "catharsis" to "cat" and "sure me" to "user." What I'm calling Multi Word Anagrams are two or more words made up from all the letters of one or more words, such as "resume" to "sure me." Now let's get into my implementation. You should know that my implementation assumes you're more concerned about processing time than memory space.

Also, I originally wrote the code in Java, but that language is not well suited for code snippets... it's so, so verbose. So all my code examples are going to be in Ruby, with an emphasis on being as concise as possible. I left out all the optimizations I did for my Java program; I'm only intending to give a general example of how the code works. So the code runs very, very slowly for long words. But you'll get the idea of what's going on.

Word List

Before I get into my algorithms, I should go over obtaining a good word list, which is probably the most important step. A word list is exactly what it sounds like, and it's what you'll use to determine if a word is real or not. There are lots of word lists out there and you should choose one depending on how thorough you want to be and what language you want to use. There's one in /usr/dict/words if you're using Unix, but it isn't very good. I like the very flexible set of word lists called SCOWL. It gives you a lot of different word lists you can pick and choose from. The word lists are found in "[scowl home]\final"; they're named by a description followed by a part number. A part number of 10 contains a small set of very common words; a part number of 95 contains a very large set of uncommon words. No two sets for a description contain the same word. For my word list, I used english-words.10, english-words.20, english-words.30, and english-words.40. You can trim down the word lists however you want.
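
As a rough sketch of how those part files could be merged into one list, something like the following should work. The helper name load_scowl and its parameters are made up for illustration, and the Latin-1 encoding is an assumption about the SCOWL files that is worth checking against its README.

```ruby
require 'set'

# Hypothetical helper: merge SCOWL part files such as
# "<dir>/english-words.10" into one sorted, lowercase word list.
# Part files that do not exist are skipped.
def load_scowl(dir, sizes = [10, 20, 30, 40], category = "english-words")
  words = Set.new
  sizes.each do |size|
    path = File.join(dir, "#{category}.#{size}")
    next unless File.exist?(path)
    # Assumption: the lists are Latin-1 encoded; transcode to UTF-8 on read.
    File.open(path, "r:ISO-8859-1:UTF-8") do |file|
      file.each_line do |line|
        word = line.strip.downcase
        words << word unless word.empty?
      end
    end
  end
  words.to_a.sort
end

# Example (not run here): write the merged list to words.txt.
#   File.write("words.txt", load_scowl("scowl/final").join("\n"))
```

Using a set means a word that somehow shows up in two files is only kept once.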

By the way, I also found a word list at WordNet that you can use as a dictionary for looking up definitions. It's free, which is nice.

Single Word Anagrams

After you have your word list, you need a good algorithm. Let's say you want to find all the anagrams of a word. The most straightforward solution is to find all the permutations of the letters in that word and then figure out which ones are real by comparing to your word list. Of course, this is horribly inefficient. I mean, for every n length word you're doing n! comparisons. It's doable on today's machines, just very, very slow.
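
For comparison, here is a minimal sketch of that brute force approach. The method name and the tiny sample list are made up for illustration; a real run would use the full word list.

```ruby
require 'set'

# Brute force: build every ordering of the letters, then keep the
# orderings that appear in the word list.
def brute_force_anagrams(word, word_list)
  known = Set.new(word_list)
  word.downcase.chars.permutation.map(&:join).uniq.select { |w| known.include?(w) }
end

sample = %w[act cat tab bat dog]   # tiny stand-in for a real word list
p brute_force_anagrams("cat", sample)
```

A 10 letter word already produces 10! = 3,628,800 orderings, which is why this gets slow so quickly.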

A better solution is to create a special dictionary (a hash table) for yourself. The key will be the letters of a word arranged alphabetically; the value will be a list of words with that key. So, for example, the key "abt" will contain the word list "bat, tab". You'll have to iterate over your whole word list to create the dictionary, which will take a few seconds, but it is a one time thing. And once it's created, listing the anagrams is as easy as arranging the letters of your target word alphabetically and looking it up in the dictionary you created. Make sure everything is lower case for your comparisons.

Here's my ruby code:
AnagrammerBySort.rb - Part 1

    # Split text into lowercase words (apostrophes are kept).
    def words text
      text.downcase.scan(/['\w]+/)
    end

    # Anagram key: the word's letters in alphabetical order, e.g. "bat" -> "abt".
    # Spaces sort first, so strip drops them from multi word input.
    def getWordKey word
      return word.downcase.split("").sort.join.strip
    end

    # Build the dictionary: key => list of words sharing that key.
    def train features
      model = Hash.new{|hash,key| hash[key] = Array.new}
      features.each {|f| model[getWordKey(f)].push(f)}
      return model
    end

    NWORDS = train(words(File.read("words.txt")))

    def singleWordAnagram word
      return NWORDS[getWordKey(word)]
    end

Sub Anagrams

To find sub anagrams, you first need to be able to determine if a word is a subset of another word. Remember that duplicates count for anagrams, in that "aabc" is not a subset of "abcd." There are many, many ways of checking subsets. In my ruby code example, I just went for the most concise way. It actually performs really poorly.

Once you have a way to determine subsets, all you need to do is iterate over all the keys in your anagram dictionary to determine if they are subsets of your target word. You could also create a new dictionary that maps all words in your word list to their sub anagrams, which would make many future sub anagram lookups instant (it won't be instant for words you haven't seen before.)

Here's my ruby code:
AnagrammerBySort.rb - Part 2

    # Whether word1 is a subset of word2; duplicates are not ignored.
    def isSubset( word1, word2 )
      word1.split("").uniq.each { |f|
        return false if(word1.count(f) > word2.count(f))
      }
      return true
    end

    # Collect every word whose key is a subset of the target word.
    def subAnagrams word
      solution = Array.new
      NWORDS.keys.each {|f| solution.push(NWORDS[f]) if(isSubset(f,word)) }
      # flatten (not flatten!) so an empty result is [] rather than nil
      return solution.flatten
    end

(Edit: I found this page which explains a much faster (although much more memory intensive) method for finding Sub Anagrams that uses an algorithm *somewhat* similar to a tree. If you want to get deeper into anagrams, you should definitely take a look at it as well as its follow up. The comments are really useful to read too.)
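
One of the many other ways to check subsets, sketched here as an alternative rather than as what my Java program did, is to tally the letters of the target once and compare each word against that tally. The method names below are made up for this example, and the word list is passed in as an argument instead of coming from NWORDS.

```ruby
# Count how often each letter occurs, e.g. "aabc" -> {"a"=>2, "b"=>1, "c"=>1}.
def letter_counts(word)
  counts = Hash.new(0)
  word.each_char { |c| counts[c] += 1 }
  counts
end

# True if target_counts holds at least as many of every letter as word needs.
def subset_of_counts?(word, target_counts)
  letter_counts(word).all? { |letter, n| n <= target_counts[letter] }
end

# Sub anagrams of target, tallying the target's letters only once.
def sub_anagrams_by_count(target, word_list)
  target_counts = letter_counts(target.downcase)
  word_list.select { |w| subset_of_counts?(w, target_counts) }
end

p sub_anagrams_by_count("sure me", %w[user resume sure me aabc rum])
```

Each word is scanned once instead of once per distinct letter, which matters when the loop runs over the whole dictionary.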

Multi Word Anagrams

Multi Word Anagrams are easy after knowing how to make single word anagrams and sub anagrams. This is how I did it:

1) Make a candidate list of all possible sub anagrams of your target word.
2) Iterate over the candidate list. In each iteration, remove the first word from the candidate list and combine it with every other word still on the list.
    A) If a combination (Word A Word B) has the same key as the target word, record it.
    B) Otherwise, if the combination is a subset of the target word, add it to the candidate list.
3) All the combinations you've recorded are your solution.

This has to be done carefully or you'll quickly run out of memory. You should add logic to make sure a word is added to the list only if you want to evaluate it later.

Here's my ruby code:
AnagrammerBySort.rb - Part 3

    # Combine candidates until a combination uses every letter of word.
    def multiWordAnagrams word
      solution = Array.new
      targetKey = getWordKey(word)
      candidates = subAnagrams(word)
      while candidates.size > 0
        # Take the first candidate off the list and pair it with the rest.
        f = candidates.shift
        candidates.each {|g|
          combined = f+" "+g
          if( targetKey == getWordKey(combined) )
            solution.push(combined)
          elsif( isSubset( getWordKey(combined), targetKey ) )
            # Still short of the target: queue the combination for later.
            candidates.push(combined)
          end
        }
      end
      return solution
    end

    # Testing #
    require 'pp'
    print "Enter word: "; word = gets.strip
    print "Single Anagrams: "; pp singleWordAnagram(word)
    print "Sub Anagrams: "; pp subAnagrams(word)
    print "Multi Word Anagrams: "; pp multiWordAnagrams(word)

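
On the memory warning above: one way to keep the candidate list from growing, offered as a sketch and not as the logic from my Java program, is to search depth first and subtract letters as you go. All the names here (tally, subtract, multi_word_search) are invented for this example.

```ruby
# Letter tallies, ignoring anything that is not a-z.
def tally(word)
  counts = Hash.new(0)
  word.downcase.scan(/[a-z]/) { |c| counts[c] += 1 }
  counts
end

# Letters left after removing word from remaining, or nil if word
# needs a letter that remaining does not have.
def subtract(remaining, word)
  left = remaining.dup
  tally(word).each do |letter, n|
    return nil if left[letter] < n
    left[letter] -= n
  end
  left
end

# Depth first search. Only words at index >= start are tried, so each
# combination is produced once instead of once per ordering.
def multi_word_search(remaining, candidates, start = 0, chosen = [], found = [])
  if remaining.values.all?(&:zero?)
    found << chosen.join(" ") if chosen.size > 1
    return found
  end
  (start...candidates.size).each do |i|
    left = subtract(remaining, candidates[i])
    multi_word_search(left, candidates, i, chosen + [candidates[i]], found) if left
  end
  found
end

p multi_word_search(tally("resume"), %w[sure me user rum see])
```

Only the words on the current path are held at any time, so memory use depends on how many words fit into the target rather than on how many partial combinations exist.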
Alternative Dictionary

I think alphabetical keys work very well for full word anagrams, but not as well for sub anagrams. The solution I preferred involved using a different key for the anagram dictionary. First assign each letter of the alphabet its own prime number. 'a' can be 2, 'b' can be 3, and so on. All characters you don't care about (such as apostrophes, periods, and spaces) should be assigned 1. You can find a list of prime numbers here. Use the smallest primes you can, otherwise the keys grow even faster and you'll have bigger problems later. The key for the new dictionary will be the product of the primes for the letters in the word. Like before, the value will be a list of words sharing the key. Since you're multiplying prime numbers, every key factors in only one way, so if Word A / Word B is exactly 1 (the keys are equal), then the words are anagrams of each other. If Word A % Word B is 0, then Word B is a sub anagram of Word A. To determine sub anagrams, traverse all the keys in your dictionary and find the anagram lists whose keys divide into the target word key evenly. Finding Multi Word Anagrams will also be basically the same as before, but switching out the way you determine whether words are anagrams or sub anagrams.
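
Here is a small worked example of the arithmetic, using the first 26 primes for 'a' through 'z'. The names PRIMES, LETTER_PRIMES, and prime_key are just for this illustration.

```ruby
# The first 26 primes, assigned to 'a'..'z' in order.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
LETTER_PRIMES = Hash[('a'..'z').zip(PRIMES)]
LETTER_PRIMES.default = 1   # anything that is not a-z contributes a factor of 1

# Product of the primes for each character in the word.
def prime_key(word)
  word.downcase.each_char.inject(1) { |product, c| product * LETTER_PRIMES[c] }
end

p prime_key("bat")                           # 3 * 2 * 71 = 426
p prime_key("tab") == prime_key("bat")       # true: same letters, same key
p prime_key("bat") % prime_key("at") == 0    # true: "at" is a sub anagram of "bat"
p prime_key("bat") % prime_key("cat") == 0   # false: "bat" has no 'c'
```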

One thing to watch out for with the prime number method is the size of the key. In Java, a 32-bit int overflows after only six or seven letters: with the mapping below, "resume" just fits, but "resumes" does not. I used a long, but even that has issues if I'm trying to find the anagrams for a sentence, much less a paragraph. You can use code to deal with really large numbers or you can just report an error if the word can't be represented. How you deal with this problem is up to you.
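
Ruby's integers grow as needed, so the overflow never shows up there, but the "report an error" option can still be sketched by checking how many bits a key needs. The helper checked_key and its bits parameter are hypothetical; the letter mapping is the first 26 primes.

```ruby
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
LETTER_PRIMES = Hash[('a'..'z').zip(PRIMES)]
LETTER_PRIMES.default = 1

def prime_key(word)
  word.downcase.each_char.inject(1) { |product, c| product * LETTER_PRIMES[c] }
end

# Returns the key, or raises if it would not fit in a signed integer
# with the given number of bits (64 mimics a Java long, 32 a Java int).
def checked_key(word, bits = 64)
  key = prime_key(word)
  unless key.bit_length < bits
    raise RangeError, "key for #{word.inspect} needs #{key.bit_length} bits"
  end
  key
end

["cat", "puzzle", "the quick brown fox jumps over the lazy dog"].each do |text|
  begin
    puts "#{text}: #{checked_key(text)}"
  rescue RangeError => e
    puts "skipped: #{e.message}"
  end
end
```

The worst case is a run of the letter 'z': nine of them still fit in 64 bits, ten do not.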

I admit, the prime number algorithm doesn't actually go noticeably faster, although there should be significantly fewer comparisons being made. Furthermore, it does limit the length of the words based on the maximum size of the key. I still like this algorithm better. Anyway, here's the code using the prime number methodology:

    #!/usr/bin/ruby

    # Each letter maps to its own prime; every other character maps to 1.
    LETTERS = { 'a'=>2,'b'=>3,'c'=>5,'d'=>7,'e'=>11,'f'=>13,'g'=>17,'h'=>19,
    'i'=>23,'j'=>29,'k'=>31,'l'=>37,'m'=>41,'n'=>43,'o'=>47,'p'=>53,
    'q'=>59,'r'=>61,'s'=>67,'t'=>71,'u'=>73,'v'=>79,'w'=>83,'x'=>89,
    'y'=>97,'z'=>101 }
    LETTERS.default=1

    # Warning: Requiring 2 Character Minimum, includes apostrophes
    def words text
      text.downcase.scan(/['\w]{2,}/)
    end

    # Key: the product of the primes for each character in the word.
    # Ruby integers grow as needed, so nothing overflows here; a port to
    # a fixed size integer type would have to check for it.
    def getWordKey word
      product = 1
      word.downcase.each_char {|c| product *= LETTERS[c] }
      return product
    end

    # Build the dictionary: key => list of words sharing that key.
    def train features
      model = Hash.new{|hash,key| hash[key] = Array.new}
      features.each {|f| model[getWordKey(f)].push(f)}
      return model
    end

    NWORDS = train(words(File.read("words.txt")))

    def singleWordAnagram word
      return NWORDS[getWordKey(word)]
    end

    # A word list is a sub anagram if its key divides the target key evenly.
    def subAnagrams word
      solution = Array.new
      targetKey = getWordKey(word)
      NWORDS.keys.each {|f| solution.push(NWORDS[f]) if(targetKey%f==0) }
      # flatten (not flatten!) so an empty result is [] rather than nil
      return solution.flatten
    end

    def multiWordAnagrams word
      solution = Array.new
      targetKey = getWordKey(word)
      candidates = subAnagrams(word)
      while candidates.size > 0
        # Take the first candidate off the list and pair it with the rest.
        f = candidates.shift
        candidates.each { |g|
          combinedKey = getWordKey(f+" "+g)
          if ( targetKey == combinedKey )
            solution.push(f+" "+g)
          elsif( targetKey % combinedKey == 0 )
            # Still short of the target: queue the combination for later.
            candidates.push(f+" "+g)
          end
        }
      end
      return solution
    end

    # Testing #
    require 'pp'
    print "Enter word: "; word = gets.strip
    print "Single Anagrams: "; pp singleWordAnagram(word)
    print "Sub Anagrams: "; pp subAnagrams(word)
    print "Multi Word Anagrams: "; pp multiWordAnagrams(word)

Those are the basic solutions to all the problems I played with. The above should be good inspiration to make your own anagrammer. There's still lots of fine tuning and optimizations you can add that I didn't get into. For example, you may want to ignore words with apostrophes or words shorter than two characters. You could work in a multi-threaded fashion as well. There are also lots of word list tweaks you can do, and a lot of fun behavioral features you can add. For example, you might want to look for sentences from your multi word anagrams by throwing them through a grammar checker. There are lots of really fun anagrams out there and I'm sure you could write a program to find many more.

Tuesday, March 10, 2009

Limited User Accounts

I'm usually working on a Solaris machine at work, but occasionally I need to use Windows. I don't do much development on it, but I use it a lot for creating presentations, reports, and design documents, as well as sending e-mails. When I use Windows at work (which isn't really that often), I'm forced to use a machine where I've only got a Limited User Account (LUA). I understand the account type is necessary the way we're doing things at work. But it was annoying because the LUA account prevents me from installing a lot of the programs I'm used to having and a lot of the OS tweaks I'm used to doing.

Apparently, there are a bunch of people who purposefully don't use an Administrator account, primarily for safety reasons. They've got a website over at http://nonadmin.editme.com where they explain how they use Windows with an LUA and are still able to do all their day to day tasks. That website was an excellent resource. While it hasn't convinced me to use an LUA at home, I no longer have any complaints about using it at work. I got all my most important programs installed. I thought I'd share the two most important workarounds for me.

1. Installing DLLs

There's a program called RegSvrEx that will allow you to register your DLLs when you have an LUA. It's a replacement for RegSvr. Instead of registering the DLL globally (which is potentially dangerous for everyone, I guess), it has an option to register the DLL just for your account (which can only be dangerous for you). So download the program from http://www.codeproject.com/w2k/regsvrex.asp and run it from the command line using "RegSvrEx /c [path to dll]".

2. Adding Right Click Options

Notepad sucks. It really, really sucks. Not only will it mess up UNIX text files, it takes forever to open large files, looks really ugly... man, Notepad isn't good for much. As a text editor, I usually use Textpad. I couldn't get Textpad to install with an LUA, so I went with Notepad++, which is also excellent. But everything still opens with Notepad. Bah. This workaround allowed me to add Notepad++ to my right click menu, which saved me from so many "Open With" dialogues. It's really simple to do. Create a "notepadplus.reg" file with the following contents for your preferred editor and then execute it. This script is easily modifiable for whichever program you want to add to your menu. Please note that you will be modifying your registry:

    Windows Registry Editor Version 5.00

    [HKEY_CLASSES_ROOT\*\shell]

    [HKEY_CLASSES_ROOT\*\shell\[texteditor name]]

    [HKEY_CLASSES_ROOT\*\shell\[texteditor name]\command]
    @="\"[path to editor, don't forget to escape '\' characters]\" \"%1\""