Back in October 2013 I evaluated several commercial and free text editors. I guess my profession as a computational linguist brings some special requirements for a text editor that may differ from the needs of other users, e.g. programmers. Additionally, I spend a lot of time in text editors and prefer appealing ones (so please don't give me advice regarding vi or emacs ;-) ).
First of all, I want to apologize to Jussi Jumppanen, the author of Zeus. He sent me a mail some months ago and asked why I had compared his years-old free Zeus version with up-to-date professional commercial versions of UltraEdit, Sublime etc. As a matter of fact, that was not fair.
So today I am going to compare three of the best editors on the market right now. How did I decide which editors to test? Like this:
1. The winner of my last test was UltraEdit, and I have been using it ever since. However, I was never really content with my choice, so maybe this test will reveal some other preferences.
2. A friend uses Sublime, and I like to watch him using it, as it is as fancy as it is mighty.
3. After my mail contact with Jussi I was interested in the power of the "great" Zeus that I missed in my last test.
Okay, let's start. What do I want, what do I expect, what do I require from a text editor?
I think I can break it down to the following points:
1. Can handle really big text files
2. Has a stable, fast and transparent regular expressions component
3. Can reliably recognize and convert text encodings
I evaluate the following text editors: Sublime, UltraEdit and Zeus.
1. How long does it take to open a 50 MByte / 100 MByte / 1.8 GByte text file?
(Ok, as I used real text files and not ones I made up, the real file sizes are 53 MByte, 105 MByte and 1.84 GByte)
|File size||Sublime||UltraEdit||Zeus|
|50 MB||~ 5 s.||< 1 s.||~ 3 s.|
|100 MB||~ 10 s.||< 1 s.||~ 5 s.|
|1.8 GB||> 2 min.||< 1 s.||~ 50 s.|
As we can see, we are looking at three very different styles of text file handling. As UltraEdit does not load the whole file at once but only the part you are looking at, it does not matter how big the entire file is: opening it and starting to work with it never takes longer than a second. A huge plus here. Zeus reads the entire file, as Sublime does, but Zeus is a lot faster. With Sublime I was not sure whether it would crash at the end of the procedure, as it did not react for some time, but then it was stable. And, in contrast to Zeus, it showed a status bar while loading, so you knew it was working, which is really useful information for loading times of more than 10 seconds.
Oh yeah, there is something you need to know regarding UltraEdit's file handling: if the file is bigger than 1 MByte, you always get a dialog window before the file opens, asking you to choose between opening the file directly, which makes changes permanent, or working in a temporary file, which takes longer to process. I always use the direct file opening.
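I do not know how UltraEdit implements its partial loading internally. As a minimal sketch of the general idea, assuming a plain file on disk, the following Python function (the name `read_window` is my own) reads only a small window of bytes and never touches the rest of the file, which is why the total file size does not matter for the opening time:

```python
def read_window(path, offset, size=64 * 1024):
    """Return up to `size` bytes starting at byte `offset`.

    Only the requested window is read; the rest of the file stays on disk.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)        # jump to the part we want to look at
        return handle.read(size)   # read just this window
```

An editor built this way would call something like `read_window(path, 0)` when opening the file and fetch further windows while scrolling, which would also explain short delays when jumping around in a huge file.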
2. How much RAM is used according to the task manager?
I admit this is influenced by a lot of other factors, and I guess repeating the experiment would give varying results, but at the end of the day the tendencies are useful to know.
|File size||Sublime||UltraEdit||Zeus|
|50 MB||47.000 k||28.000 k||80.000 k *|
|100 MB||85.000 k||28.000 k||140.000 k|
|1.8 GB||994.000.000 k||28.000 k||1.934.000.000 k|
Interesting: while UltraEdit is as slim as it is fast (and I guess this is paid for with longer loading times when scrolling through files), Zeus seems to buy its speed advantage over Sublime with higher memory usage.
*] It seems that I could win the crash challenge, which is run in order to prove the stability of Zeus: unfortunately it crashed several times after opening this file and just doing nothing. In general, Zeus appeared very stable to me.
3. How do I find additional file information such as the number of lines, number of words, number of chars?
Sublime: I was not able to see any of this information in Sublime and I am not willing to read documentation for such basic features.
UltraEdit: UltraEdit shows some information about the current file in the bottom bar. This includes the file size without a unit. The 50.6 MByte file, for example, is 53158512 big, so I think this means bytes. The other information I would be interested in is also not available or well hidden.
Zeus: Same as Sublime - no information at all.
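Since none of the editors shows these numbers readily, here is a minimal sketch of how I would get them outside the editor. It assumes the encoding is known (UTF-8 by default), counts whitespace-separated tokens as words, and includes line-ending characters in the character count; the function name `text_stats` is my own. It streams the file line by line, so it should also cope with the large files from test 1.

```python
def text_stats(path, encoding="utf-8"):
    """Return (lines, words, chars) for a text file, reading it line by line."""
    lines = words = chars = 0
    # newline="" keeps the original line endings instead of translating them
    with open(path, "r", encoding=encoding, newline="") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())  # whitespace-separated tokens
            chars += len(line)          # includes the line-ending characters
    return lines, words, chars
```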
4. How do I recognize the current text encoding?
Sublime: I have no idea and I am not willing to read documentation for such basic features.
UltraEdit: Bottom bar says "utf-16"
Zeus: I have no idea and I am not willing to read documentation for such basic features.
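For files that start with a byte order mark (BOM), the encoding can at least be guessed without any editor. The following is only a sketch: the helper name `sniff_bom` is my own, and a file without a BOM simply yields `None`, which says nothing about its real encoding.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return an encoding name if `data` (bytes) starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of a file in binary mode and passing them to `sniff_bom` is enough for this check.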
5. How do I convert the current text encoding?
Sublime: File/Save with Encoding - good list but I don't get why I have to select "save" when I want to convert.
UltraEdit: Under File/Convert you may find a depressingly incomprehensible list of possible and impossible conversions. E.g. I guess "Unicode" means "UTF-16 Little Endian" as there also exists "Unicode Big Endian" but these are things I don't really understand. Encodings are such a pain in the neck, why the heck should an editor make this even more complicated?
Zeus: I have no idea and I am not willing to read documentation for such basic features. Some basic encodings are accessible via "save as..."
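A conversion is really just "read with one encoding, write with another", which may be why the editors hide it behind "save". A minimal sketch, assuming both encoding names are known to Python and the source encoding is stated correctly (the function name `convert_encoding` is my own):

```python
def convert_encoding(src, dst, src_encoding, dst_encoding, chunk_chars=1 << 20):
    """Copy the text file `src` to `dst`, re-encoding it chunk by chunk."""
    # newline="" on both sides leaves the line endings untouched
    with open(src, "r", encoding=src_encoding, newline="") as reader, \
         open(dst, "w", encoding=dst_encoding, newline="") as writer:
        while True:
            chunk = reader.read(chunk_chars)  # at most chunk_chars characters
            if not chunk:
                break
            writer.write(chunk)
```

For example, `convert_encoding("corpus.txt", "corpus-utf8.txt", "utf-16", "utf-8")` would convert a hypothetical UTF-16 file with BOM to UTF-8 without BOM.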
6. How do I recognize the line ending style?
Sublime: I have no idea and I am not willing to read documentation for such basic features.
UltraEdit: Bottom bar says DOS or UNIX or MAC. It can be converted as well via File/Convert.
Zeus: I have no idea and I am not willing to read documentation for such basic features.
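If the editor does not tell, the line ending style can be counted directly from the raw bytes. This sketch assumes a single-byte encoding or UTF-8 (it would not work on UTF-16 data) and uses the same labels as UltraEdit's bottom bar; the function name `count_line_endings` is my own.

```python
def count_line_endings(data):
    """Count DOS (CRLF), UNIX (LF) and old MAC (CR) line endings in `data` (bytes)."""
    crlf = data.count(b"\r\n")
    cr_only = data.count(b"\r") - crlf  # CRs that are not part of a CRLF
    lf_only = data.count(b"\n") - crlf  # LFs that are not part of a CRLF
    return {"DOS": crlf, "UNIX": lf_only, "MAC": cr_only}
```

A file with more than one non-zero count has mixed line endings, which is worth knowing before running a regex that relies on `\n`.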
7. Can I somehow see this invisible character? ""
The Unicode code point of this character, called "Reverse Line Feed", is U+008D, and I hate it because it sometimes shows up in my corpora and is really hard to spot. In my first test, UltraEdit earned a big plus as it was the only editor able to show a box here. Unfortunately, this was a bug and was fixed afterwards, so now there is no editor able to show this character...
Sublime: No visible character, but a countable one: a line with this character inside has one character more, and it can be spotted with the text cursor, as the cursor stops there.
UltraEdit: No visible character, and not a countable one either: the count just jumps from 1 to 3.
Zeus: No visible character, but a countable one. And Zeus is the only editor where you can select the character (either with the mouse or with the text cursor), as a space appears if you try to.
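As no editor displays the character, a small script can at least report where it hides. The sketch below flags every character of the Unicode category "Cc" (control characters, which includes U+008D) except tab and line breaks; the function name `find_control_chars` and the choice of allowed characters are my own assumptions.

```python
import unicodedata

def find_control_chars(text, allowed="\t\n\r"):
    """Return (line, column, code point) for each control character in `text`."""
    hits = []
    # Split on "\n" only, so that no other character is swallowed as a line break.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc" and ch not in allowed:
                hits.append((line_no, col, "U+%04X" % ord(ch)))
    return hits
```

Line and column numbers are 1-based, so the result can be used to jump to the position in any of the three editors.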
8. Can I search through all open files or all files in a folder?
Sublime: Yes, but not in the regular search field. You've got to open "Find/Search in files..." and there you can select files and folders. It shows the results in another window, which does not seem very practical to me.
UltraEdit: Yes, all open files; just tick "All open files" when searching.
Zeus: Just click "All open documents" in the search window.
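For comparison, this is roughly what a search through all files in a folder does. It is only a sketch with Python's own regex flavor, not the engine of any of the three editors; the function name `search_folder`, the `*.txt` default and the UTF-8 default are my assumptions.

```python
import re
from pathlib import Path

def search_folder(folder, pattern, glob="*.txt", encoding="utf-8"):
    """Return (file name, line number, line) for every line matching `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(folder).rglob(glob)):  # includes subfolders
        if not path.is_file():
            continue
        # errors="replace" keeps the search going on badly encoded files
        with open(path, "r", encoding=encoding, errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((path.name, line_no, line.rstrip("\n")))
    return hits
```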
9. What about the Regex implementation?
Sublime: The regex engine seems to be well implemented, and it is fast even for big files.
UltraEdit: This was the best part of UltraEdit, as regexes are really fast and only UltraEdit lets you choose between Perl, Unix and UltraEdit flavor. But as a matter of fact I have had a lot of problems since my last test and a lot of conversations with the UltraEdit support regarding problems and bugs in the regex implementation. They are really nice and supportive people, but my last regex bug report from November, for example, was not fixed, and I did not get an answer either.
Zeus: The documentation states that the regex flavor is "Unix/Perl". In general it seems to work well, but I would have to learn some things that differ from my usual working style. I thought it would be Perl style to use $1 to refer to a captured group, but here it is \1. Additionally, there is still no way to get from search to replace (you've got to close the search pop-up and open the replace pop-up).
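The \1 versus $1 difference is a matter of flavor. Python's `re` module, for example, also uses \1 in the replacement string and treats $1 as plain text. The sketch below only illustrates this with Python and says nothing about how Zeus works internally; the function name `swap_words` is my own.

```python
import re

def swap_words(text):
    """Swap the first two space-separated words using numbered group references."""
    # \1 and \2 in the replacement refer to the first and second captured group
    return re.sub(r"(\w+) (\w+)", r"\2 \1", text, count=1)

# In this flavor "$1" is not a group reference, it is inserted literally:
literal = re.sub(r"(\w+) (\w+)", "$2 $1", "hello world")
```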
11. Do I like the look and feel of the editor?
Sublime: It is very slim, beautiful and has style. I really like the minimap although I do not use it very often. The clean interface is sometimes too clean, as I miss a lot of information. There seems to be no way to reuse old search strings. You can use standard keyboard shortcuts such as CTRL-W to close a window.
UltraEdit: In contrast, UltraEdit is full of icons and functions I have never used, so it seems really complex. It could use a bit of cleaning up. I hate UltraEdit's search windows. After a lot of investigation and personal support (!) I found out that it is possible to pin the find and replace bar at the right side, so you can debug a regex without restarting from scratch all the time. I would love to prevent a new floating bar from opening every time a search starts, as it is always on top of something I would rather see and/or edit. Additionally, I hate the keyboard shortcuts, e.g. Shift-F5 to close a tab, while the system-wide CTRL-W switches the line wrapping between hard and soft.
Zeus: I am not a fan of the old-fashioned style of Zeus. But, well, it has a pretty serious and stable appearance as well. Still, I would prefer something modern and slim. And it seems to be impossible to dock the search windows anywhere. You can use standard keyboard shortcuts such as CTRL-W to close a window.
As a matter of fact, there does not seem to be THE editor for me. All of them have strong features I would love to use on a regular basis, and all of them have really bad shortcomings.
I love the slim and fast interface of Sublime, although I often struggle to find the information I want. Sublime needs a lot of time to open large files but is stable and fast even if they are really big.
I love the fact that UltraEdit is so unbelievably fast with big files. The possibility to change the regex flavor seems nice, but I never changed the style - I think I was just happy to know exactly which style I was working with. I do not like the interface, especially the keyboard shortcut implementation. The bugs in the regex engine are a no-go.
Zeus is the only editor which allows selecting the invisible character in test number 7. It is faster and more RAM-greedy than Sublime, which I think is a good deal, especially as it is still able to work with those big files. Unfortunately I really dislike the interface, although it is a reasonable mix between the overfilled UltraEdit and the too cleaned-up Sublime.
My personal decision? Hard to say. Really hard to say. In the end, it comes down to one question: what is worse? That Sublime is so sluggish when it comes to big files, or that UltraEdit's regexes are buggy and I dislike the interface? Zeus has some quite interesting features and is in some ways the middle way between the other candidates, but I really, really don't want to work with it.
I guess I will stay with UltraEdit for the moment but ask my Sublime-using friend if he can answer some of my questions above. I will keep you informed.
Do you remember my article on wit.ai, a clever SaaS provider for automatic speech recognition? They are the speech technology company recently bought by Facebook. This is what wit.ai states on their blog about it:
It is an incredible acceleration in the execution of our vision. Facebook has the resources and talent to help us take the next step. Facebook’s mission is to connect everyone and build amazing experiences for the over 1.3 billion people on the platform – technology that understands natural language is a big part of that, and we think we can help.
The platform will remain open and become entirely free for everyone. Developers are the life of our project and the energy, enthusiasm and passion of the community has helped turn what was once just a lofty dream, into a reality. We want to continue to build with you.
As the end of the year is coming closer, I wanted to share my new favorite palindrome with you. As you may know, a palindrome is a word or a sentence (or "string of characters") which reads the same backward and forward. Unfortunately, it is in German. But even if you are not able to understand it, at least be impressed by this very special palindrome. I provide a translation afterwards so you can see that it is not a totally foolish text. And yes, my favorite English palindrome remains "A man, a plan, a canal - Panama"...
So here it is:
Geist ziert Leben, Mut hegt Siege, Beileid trägt belegbare Reue, Neid dient nie,
nun eint Neid die Neuerer, abgelebt gärt die Liebe, Geist geht, umnebelt reizt Sieg.
or, if you prefer it backwards:
.geiS tzier tlebenmu, theg tsieG, ebeiL eid träg tbelegba, rereueN eid dieN tnie
nun, ein tneid dieN, eueR erabgeleb tgärt dielieb, egeis tgeh tuM, nebeL treiz tsieG
("Spirit graces life, courage nourishes victories, commiseration includes provable remorse, envy never serves,
now envy unites the innovators, deceased love ferments, spirit goes, befogged victory is tantalizing.")
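If you do not want to check the palindrome by hand: a minimal sketch of such a check, assuming that only letters and digits count and that case, spaces and punctuation are ignored (the function name `is_palindrome` is my own):

```python
def is_palindrome(text):
    """True if `text` reads the same in both directions, ignoring case,
    spaces and punctuation."""
    letters = [ch for ch in text.casefold() if ch.isalnum()]
    return letters == letters[::-1]
```

Fed with the two German lines above as one string, the check comes out true, with the "nun" at the start of the second line as the center.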
Some additional remarks:
1) As a German I seem to be obliged to mention that "Reliefpfeiler" is a) one of the longest German one-word palindromes and was b) "invented" by Goethe (although Wikipedia states that I) it was Schopenhauer and II) this is not true). I am not sure if this is true or interesting, but several teachers in my life seemed to care about this.
2) Weird Al Yankovic made a song out of palindromes. It goes something like this:
3) There are also palindrome novels. According to Wikipedia, there is e.g. the novel "Dr Awkward & Olson in Oslo" by Lawrence Levine from 1986, containing 31,954 bidirectional words; take a look at it here at DigitalCommons.
4) Regarding palindromic dates, according to Gnudung the next one we will encounter is 21.12.2112 at 21:12.
The German newspaper Frankfurter Allgemeine Zeitung talked with the Bavarian SEO specialist Marcus Tandler from OnPage. Topics range from the rise and fall of platforms such as Yatego and Google's influence on it, to the rise of Google, the fall of Altavista and Tandler's prognosis that Google seems to be invincible (although he once thought the same of Altavista).
FAZ.NET: Eine Plattform für alles
It was pretty clear that "krak" is the Campbell's monkey term for "leopard in sight", as scientists determined by observing the monkeys in their home forests of Ivory Coast. Research revealed that the Campbell's monkey vocabulary differentiates in this regard between hawks, leopards, and other, minor potential sources of danger. You need to know here that those monkeys are famous for their advanced forms of communication with rudimentary syntax. Ok, so the assumption that "krak" means "leopard" was stable until Campbell's monkeys were recently found on Tiwai Island in Sierra Leone that use the same vocabulary but obviously with a different meaning, as there are no leopards on this island. As the scientists failed to get a plausible explanation with their current approach, they started to think in more linguistic patterns and to apply linguistic methodology, and finally arrived at a solution that seems rather promising:
Here’s where it gets tricky: word meanings tend to be contextual. In human language, we choose the most specific term available and, when we don’t, the listener infers that there is a special reason why we opted for a relatively vague word. Simply put, “words compete with each other,” Schlenker says. “And you use the more informative one.”
See the article in Scientific American: Monkey See, Monkey Speak
Or read the paper by Philippe Schlenker et al.: Monkey semantics: two ‘dialects’ of Campbell’s monkey alarm calls
So, let's talk about Lady Gaga. She has a German poem tattooed on her Iggy Pop-side arm (which is the left one), and it goes like this (line breaks follow the tattoo calligraphy):
Prüfen Sie, ob er in der tiefsten Stelle Ihres
Herzens seine Wurzeln ausstreckt, gestehen
Sie sich ein, ob Sie sterben müßten, wenn es Ihnen
versagt würde zu schreiben. Muss ich schreiben?
My favorite blog for applied poetry, Doktor Fausti Weheklag und Höllenfahrt, posted an article on this tattoo, this poem, this writer and Lady Gaga, the German magazine Titanic, Sister Act, mistranslations and some other related topics.
Doktor Fausti Weheklag und Höllenfahrt: Wenn es Ihnen versagt würde to translate
Natural languages (and some planned languages as well) bring forth strange flowers from time to time. For example, in many languages there are sentences that are built from the same word or syllable throughout. Let's call it a "repetition play" and take a closer look:
The following is a Chinese poem that tells the story of a poet who is craving lion flesh while living in a cavern. This is an incredible example of those repetition plays and only possible because Chinese distinguishes words by tone. The following table shows the poem in Traditional Chinese, in Pinyin transliteration and as a translation; on the Wikipedia page you can also hear a native speaker reading it out.
« Shī Shì shí shī shǐ »
Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.
Shì shíshí shì shì shì shī.
Shí shí, shì shí shī shì shì.
Shì shí, shì Shī Shì shì shì.
Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shìshì.
Shì shí shì shí shī shī, shì shíshì.
Shíshì shī, Shì shǐ shì shì shíshì.
Shíshì shì, Shì shǐ shì shí shì shí shī.
Shí shí, shǐ shí shì shí shī shī, shí shí shí shī shī.
Shì shì shì shì.
« Lion-Eating Poet in the Stone Den »
In a stone den was a poet called Shi Shi, who was a lion addict, and had resolved to eat ten lions.
He often went to the market to look for lions.
At ten o'clock, ten lions had just arrived at the market.
At that time, Shi had just arrived at the market.
He saw those ten lions, and using his trusty arrows, caused the ten lions to die.
He brought the corpses of the ten lions to the stone den.
The stone den was damp. He asked his servants to wipe it.
After the stone den was wiped, he tried to eat those ten lions.
When he ate, he realized that these ten lions were in fact ten stone lion corpses.
Try to explain this matter.
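To see how much of the poem hangs on tone alone, one can strip the tone marks from the Pinyin and look at what remains. The sketch below relies on Unicode decomposition (NFD) to separate each vowel from its tone diacritic; the function names `strip_tones` and `only_shi` are my own.

```python
import re
import unicodedata

def strip_tones(pinyin):
    """Remove tone diacritics from a Pinyin string."""
    decomposed = unicodedata.normalize("NFD", pinyin)  # "ī" becomes "i" + combining macron
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def only_shi(pinyin):
    """True if the text, without tones, spaces and punctuation, is just "shi" repeated."""
    letters = "".join(ch for ch in strip_tones(pinyin).lower() if ch.isalpha())
    return re.fullmatch(r"(shi)+", letters) is not None
```

Applied to the title, `strip_tones` leaves nothing but "Shi Shi shi shi shi", so without the tones the text becomes indistinguishable repetition.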
In contrast, the Japanese example works not through identical syllables with different pitch but through different ways of reading the same Kanji 子. There is a story around this sentence involving the scholar Ono no Takamura meeting the emperor Saga Tennō. Here you can see the sentence as a seemingly meaningless repetition of the Kanji, the correct way to pronounce it next to the normal way to write it, as well as the translation.
|子子子子子子子子子子子子||neko no ko no koneko, shishi no ko no kojishi (猫の子の子猫、獅子の子の子獅子)||The young of cat, kitten, and the young of lion, cub.|
My favorite blog on nerdy things, io9, came up with this some days ago under the English-centric title The most confusing sentence in the world uses just one word. But I have to admit: it is really confusing. Here, neither graphemes nor sounds are the source of confusion, but classical homonymy, i.e. the same word bears several meanings. This special sentence has its own website hosted by its inventor, linguist William J. Rapaport from the State University of New York at Buffalo, with a complete history, many examples and discussions. Here you see the sentence, a (shortened) parse tree visualization of its parts of speech and a "translation" to understand the somewhat constructed meaning.
In most cases, German needs a small introduction in order to get a repetition play working, as in "Wenn Fliegen hinter Fliegen fliegen fliegen Fliegen Fliegen nach.", which means that flies flying behind other flies are following flies. But I have also found an example that works without other words and also makes use of homonymy. The content, however, is even weirder than in the English example...
|Weichen Weichen weichen Weichen, weichen Weichen weichen Weichen||Weichen [V] Weichen [S] weichen [Adj] Weichen [S], weichen [V] Weichen [S] weichen [Adj] Weichen [S].||If switch points avoid soft switch points, then switch points avoid soft switch points.|
|Der Mann, der die Aufsicht über den Bau der Brücke, die über den Fluss, der stets kaltes Wasser führte, führte, führte, führte ein aufregendes Leben.||[Aufsicht führen], [über einen Fluss führen], [kaltes Wasser führen], [ein aufregendes Leben führen]||The man, who leads the construction of the bridge that is going over the river that conducts cold water, has an exciting life.|
Ook! is a so-called esoteric programming language and a derivative of another one called (rightly) brainfuck. As programming languages can be understood as planned languages, and as Ook! was designed to be understood at least by orang-utans, I think it is only fair to consider it here. I present example code for the famous "Hello World" program next to the basic programming concepts and the omitted output:
|Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook. Ook! Ook. Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook. Ook. Ook. Ook! Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook. Ook! Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook. Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook. Ook! Ook. Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook. Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook! Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook. Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook. Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook. Ook. Ook? Ook. Ook? Ook. Ook. Ook! Ook.||
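To make the relation between Ook! and brainfuck concrete, here is a minimal sketch of a translator and interpreter. It assumes the usual Ook! convention that every pair of "Ook" words stands for one brainfuck command, a tape of 30,000 cells with 8-bit wrap-around, and a zero on end of input; the function names are my own, and this is not a reference implementation.

```python
import re

# Each pair of Ook punctuation marks maps to one brainfuck command.
OOK_TO_BF = {
    (".", "?"): ">", ("?", "."): "<",
    (".", "."): "+", ("!", "!"): "-",
    ("!", "."): ".", (".", "!"): ",",
    ("!", "?"): "[", ("?", "!"): "]",
}

def ook_to_brainfuck(source):
    """Translate Ook! source code into the equivalent brainfuck program."""
    marks = re.findall(r"Ook([.?!])", source)
    if len(marks) % 2:
        raise ValueError("Ook! programs need an even number of Ook words")
    return "".join(OOK_TO_BF[pair] for pair in zip(marks[0::2], marks[1::2]))

def run_brainfuck(code, stdin=""):
    """Run a brainfuck program and return its output as a string."""
    stack, jumps = [], {}
    for pos, op in enumerate(code):  # precompute matching brackets
        if op == "[":
            stack.append(pos)
        elif op == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    tape, ptr, pc, out, inp = [0] * 30000, 0, 0, [], iter(stdin)
    while pc < len(code):
        op = code[pc]
        if op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            out.append(chr(tape[ptr]))
        elif op == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)
```

A program is then executed with `run_brainfuck(ook_to_brainfuck(source))`.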
One question remains: Why are so many animals involved in this...?
Almost all examples are extracted from Wikipedia and I have placed the respective links in the text before.
New Idea Engineering is a California-based enterprise search consulting company. Their homepage hosts an interesting collection of texts on the business as well as the technology of search. This includes for example:
- Search Industry Overview: The search industry is an ecosystem with a number of different companies and related technologies that together provide complete solutions for intranet and customer-facing content search.
- Mergers and Acquisition Map: Like many dynamic industries, enterprise search vendors and companies with related technologies have grown not only by sales but also by acquisition and merger.
- Anatomy of a Search Engine