a computer scientist’s thoughts
  • Generate Ambient White Noise on Linux

    Here is the link for the software:

    http://pessimization.com/software/whitenoise/

    It supports frequency cutoff. I’ve been using it for a whole day, and it is wonderful.

  • Some useful tools for you to write English articles on Linux

    (This is a re-post from my previous Chinese blog)
    http://blog.youxu.info/2007/07/11/some-useful-tools-for-you-to-write-english-articles-on-linux/

    As an ESL (English as a Second Language) student, I usually have a fear of writing articles. Nevertheless, I have to write about one article per week, either to practice English or to record my ideas. For many people in China, the killer applications are Word and Kingsoft Ciba. They simply type a Chinese word or phrase into the electronic dictionary, click the “translate” button, copy and paste the English word, run a grammar check in Word, and that’s all. After all this, if Word stops reporting spelling and grammar errors, they feel a great sense of achievement. I used to be one of those people.

    Meanwhile, as a Linux die-hard, I have an emotional dislike of M$ products. It seemed to me that the only way out was to use AbiWord or OpenOffice. I’ve used both for a while, and I have to say that they are helpful but not perfect. To use them, I have to prepare a plain text file, which is inconvenient when I am working on a TeX file. On Mac OS X, I also have to install X11. Don’t get me wrong: *nix is industrial-strength and designed so that everything can be done from the shell alone. (Well, WoW is the last thing on my mind.)

    After some painful Googling, I now have at least four tools that help with ESL writing.

    1. GNU Aspell

    GNU Aspell is a Free and Open Source spell checker. It supports spell checking for source code, script comments, and TeX files, as well as HTML pages and email. Aspell offers both interactive and batch modes. It has several advanced features that are missing from both M$ Office and OpenOffice, such as a user-defined dictionary stored as a plain text file and a “sounds like” feature (e.g., know and no). GNU Aspell is a natural fit for literate programmers and Ph.D. students who want to write elegant code comments and academic articles.

    2. GNU diction

    GNU diction originates from the `diction` command on AT&T UNIX. It is a rule-based style checker. I’ve read the code thoroughly and found that almost every rule comes from the book “The Elements of Style” by William Strunk. That is to say, you now have “The Elements of Style” in your pocket. Please note that the simple grammar checker in Word has nothing to do with style checking. GNU diction is a charming complement to Word or OpenOffice if you insist on using them.

    Because it is rule-based, it sometimes flags usage that is in fact correct. As D.E. Knuth mentions in “Mathematical Writing”, the analysis done by diction is quite superficial. “However, said Don, these programs are kind of fun. And they do provide an excuse to read the document from another point of view. Even if the analysis is wrong it does prompt you to re-read your prose, and this has to be a good thing.”

    3. GNU style

    GNU style is included in the GNU diction package. It reports the readability of your article based on several well-known readability indexes. Native speakers use these to make their articles easier to read. ESL students, however, can read the same indexes as a measure of writing level: what grade or school year an average American needs to have reached to understand the article. In my opinion, we ESL students should avoid overusing simple words and simple sentences in technical writing. Still, definitely don’t use a million-dollar word where a one-dollar word will do. For ESL students, trying out new and more sophisticated words will eventually improve writing ability.

    4. LanguageTool (GPLed)

    It is an open source language checker, written in Java, for English and other languages. I began using it recently. It is better than the grammar checker built into OpenOffice. Moreover, it supports both a CLI mode and a web mode. This is the tool that was missing on the Linux platform for grammar checking.

    I remember that when I was a college student, I struggled to write English articles with M$ Word or OpenOffice. My experience with English writing and the M$ Word grammar checker taught me that we should never let the quality of our writing depend on the damn grammar checker. As a rule of thumb, the best way to improve ESL writing skills is to write and to practice.

    BTW: In preparing this article, I used vim, aspell, diction, style, languagetool and other tools on the Linux and Mac platforms.

  • An awk/shell trick to extract lines from a file

    [This post is for *nix users only]

    We all know that

    grep -f file1 file2

    will find the lines common to two files, file1 and file2, provided that both files are relatively small and that no line in file1 is a proper substring of some line in file2 (grep treats each line of file1 as a pattern and matches it anywhere in a line).
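
    The substring caveat can be seen, and avoided, with two standard grep options: -F (treat patterns as fixed strings) and -x (match whole lines only). The following is a minimal sketch, not part of the original trick; the scratch directory, the sample data and the variable names loose and exact are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmp/file1"
printf 'apple\npineapple\nbanana split\nbanana\n' > "$tmp/file2"

# Plain -f: every line of file1 is a pattern, so "pineapple" and
# "banana split" match as well because they contain a pattern.
loose=$(grep -f "$tmp/file1" "$tmp/file2")

# -F: fixed strings, -x: a pattern must match the entire line.
exact=$(grep -Fx -f "$tmp/file1" "$tmp/file2")

printf 'loose:\n%s\n\nexact:\n%s\n' "$loose" "$exact"
rm -r "$tmp"
```

    With -Fx, only the lines that appear verbatim in both files are kept.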

    Now, how about extracting from a file only a subset of lines, such that, say, the first column is in a set (stored as a file)?

    You can use

    grep -f set file

    However, this will not give you the exact result if any column in the file contains a string from the set as a substring. grep has no notion of fields. (You could use regular expressions instead, but then you have to rewrite the file “set” into patterns.)
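
    To make the problem concrete, here is a small sketch of both the false matches and the regular-expression workaround. It assumes a tab-separated file and keys that contain no regex metacharacters or slashes; the sample data and the names naive, anchored and patterns are hypothetical.

```shell
# Hypothetical data: keys in "set", tab-separated records in "file".
tmp=$(mktemp -d)
printf 'a1\nb2\n' > "$tmp/set"
printf 'a1\tfoo\na10\tbar\nc3\ta1\nb2\tbaz\n' > "$tmp/file"

# Substring matching: "a10<TAB>bar" and "c3<TAB>a1" are returned too.
naive=$(grep -f "$tmp/set" "$tmp/file")

# Rewrite each key K into the pattern "^K<TAB>", which anchors it to
# the first column. This only works for keys without regex
# metacharacters.
tab=$(printf '\t')
sed "s/.*/^&${tab}/" "$tmp/set" > "$tmp/patterns"
anchored=$(grep -f "$tmp/patterns" "$tmp/file")

printf 'naive:\n%s\n\nanchored:\n%s\n' "$naive" "$anchored"
rm -r "$tmp"
```

    The rewrite works, but it depends on the separator and on the keys being regex-safe, which is why a field-aware tool is more comfortable here.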

    I guess the best way is to use awk, as it identifies fields.

    First, you can use

    tr "\n" "\t" < set

    to make a list of strings separated by tabs.

    Then we can copy the list printed in the shell into an awk script for further use. We write something like

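    BEGIN {
        # Split the pasted list on whitespace into temp[1], temp[2], ...
        split("the tab separated list", temp);
        # Use every element as a subscript so that "in" can test membership.
        for (i in temp) set[temp[i]] = i;
    }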

    Now we have a “set” array in awk whose subscripts are the elements of the original “set” file.

    Finally, we use the following line in awk to extract the lines whose first column is in the set.

    $1 in set
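
    Put together, the steps might look like the sketch below. The first variant follows the approach above but hands the tab-separated list to awk with -v instead of pasting it by hand. The second variant is a common awk idiom that is not part of the original trick: it reads the “set” file first (NR == FNR is true only while the first file is being read) and then filters the second file. The sample data and the names list, pasted and two_pass are hypothetical, and both variants assume that keys contain no whitespace and that the set file is not empty.

```shell
# Hypothetical data: keys in "set", tab-separated records in "file".
tmp=$(mktemp -d)
printf 'a1\nb2\n' > "$tmp/set"
printf 'a1\tfoo\na10\tbar\nc3\ta1\nb2\tbaz\n' > "$tmp/file"

# Variant 1: build the tab-separated list and pass it in as a variable.
list=$(tr '\n' '\t' < "$tmp/set")
pasted=$(awk -v list="$list" '
  BEGIN {
    n = split(list, temp)              # default split: on whitespace
    for (i = 1; i <= n; i++) set[temp[i]] = i
  }
  $1 in set                            # print lines whose first field is a key
' "$tmp/file")

# Variant 2: let awk read the set file itself, then filter the data file.
two_pass=$(awk 'NR == FNR { set[$1]; next } $1 in set' "$tmp/set" "$tmp/file")

printf 'variant 1:\n%s\n\nvariant 2:\n%s\n' "$pasted" "$two_pass"
rm -r "$tmp"
```

    Both variants compare whole fields, so "a10" and a key appearing in the second column are no longer picked up.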

    -EOF-