Thursday, December 31, 2009

Harps and organs

This story was told, many years ago now, while Antoine Reboulot was still alive, during a musical lecture he was illustrating in the Saint-Louis chapel of the Saint-Jean-Baptiste church. I still remember his good-natured laugh, intense and deeply amused, very likeable, the laugh of a blind man who is not used to the gaze of others.

Marcel Dupré, in his day, was a renowned organist, and popular in his cathedral. Society ladies, who knew precious little about music, jostled to sit near his gallery during the religious services he accompanied, so as to be able to speak with the man afterwards, something he endured with good grace, but without much enthusiasm.

One day, one of these ladies asks: "Maître, tell us, we would like to know, what are the organ's pedals for? Not the ones you play on, we understand of course that those are for making music, but the big ones located above them?" She was referring to the few expression pedals that open or close the boxes enclosing the pipes, and also to the crescendo pedal, which engages all the stops of the organ progressively, over some fifty steps laid out by the organ builder. Not in the mood to get into all these technical explanations, Marcel Dupré simply replied: "Those pedals, ladies, are used to tune the organ!"

Later, elsewhere, in some fashionable salon, these good ladies hasten to repeat the explanation they had received, and someone tells them: "Those pedals have nothing at all to do with tuning the organ; I am afraid someone was simply having a laugh at your expense, ladies!"

Some time passes, and these same ladies find themselves seated in the front rows of a harp recital. As one might expect, with a big interested smile, one of them asks: "Tell me, what are those pedals at the base of the harp for?" And the musician answers, in all honesty: "Those pedals are used to tune the instrument!" The ladies then rise and, after a vibrant "We have heard that one before…", hasten to leave the premises, offended, outraged…

Tuesday, December 15, 2009

Chrome and Delicious

Here I am on Google Chrome, for a few days now. I wanted to try making it my browser of choice for a while, displacing Firefox from that role.

I am getting a taste for its startup speed, to the point that I already feel less need to keep it permanently parked on a workspace. Besides, Chrome disappears when the last remaining tab is closed. That reminds me a little of Vim, which one starts and quits more readily than Emacs. In that sense, I would venture that Chrome is to Firefox what Vim is to Emacs.

Chrome offers a good profusion of extensions, but since I used no more than half a dozen in Firefox, I assumed I could try to stay parsimonious with Chrome as well. A few Firefox add-ons are less necessary in Chrome, which offers equivalent functionality out of the box. Among the others, I often use Delicious bookmarks (by Yahoo) and Dafizilla ViewSourceWith.

I have not tried all the Delicious extensions Chrome offers, but a quick look gave me the impression that they are less interesting than Yahoo's extension for Firefox. A good thing, in the end, since something about Delicious has been nagging at me for quite a while already, and this gave me a chance to think it over. The service Delicious provides is unquestionably valuable, but with more than 5000 bookmarks and around 800 tags, managing them has progressively become unwieldy, slow and even arduous. It is hard to keep a consistent naming scheme over time for a large number of tags. Since many tags fade from memory, the associated bookmarks become, for all practical purposes, unreachable. I realize that I need to keep a more global, more synthetic view of the whole, and to structure all these bookmarks well beyond what collections of tags can offer.

Looking for a solution I would be comfortable with, I decided to explore the following idea: plainly integrate my Delicious bookmarks into my Tomboy notes. I hesitate a little. On the one hand, Delicious relies on pooling bookmarks, which allows for collaborative evaluation, and I am walking away from that sharing. On the other hand, while it is easy to go from Tomboy to Chrome by clicking on a URL, I do not yet have a tool that lets me quickly feed a Tomboy note with a reference taken from a Web page displayed in Chrome, or remove such a reference. With a little luck, I will find something. For now, I accept the risk and dive in!

(P.S., 2009-12-21: I finally wrote myself a Chrome extension that copies, into the selection, the Tomboy insertion to be made for the currently displayed page.)

So here we are. Using the bulk export offered by Delicious, combined with a Python script that turns that export into a text file in a format very close to my usual Tomboy conventions, several hours of editing let me carry over all my bookmarks. Granted, it may be a little rough here and there; this first pass was quick and massive. I will surely refine things over time, and far more easily than Delicious would let me.
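
For the curious, here is a minimal sketch of what such a transformation script can look like, assuming the Delicious export is the usual Netscape-style bookmarks HTML file, where each bookmark is an <A> element carrying HREF and TAGS attributes; the file names and the output layout are only illustrative, not my actual conventions.

  from html.parser import HTMLParser

  class DeliciousExport(HTMLParser):
      # Collect (title, url, tags) triples from the exported bookmarks file.
      def __init__(self):
          super().__init__()
          self.bookmarks = []
          self.current = None

      def handle_starttag(self, tag, attrs):
          if tag == 'a':
              attrs = dict(attrs)
              self.current = (attrs.get('href', ''), attrs.get('tags', ''))

      def handle_data(self, data):
          if self.current is not None:
              url, tags = self.current
              self.bookmarks.append((data.strip(), url, tags))
              self.current = None

  parser = DeliciousExport()
  with open('delicious.html', encoding='utf-8') as export:
      parser.feed(export.read())

  # One small block per bookmark, roughly in the shape of my Tomboy notes.
  with open('bookmarks.txt', 'w', encoding='utf-8') as out:
      for title, url, tags in sorted(parser.bookmarks):
          out.write('%s\n  %s\n  tags: %s\n' % (title, url, tags))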

To avoid any confusion, I also resolved to delete the bulk of my bookmarks still sitting on Delicious. I suppose most users would simply have left them lying around, but I have this slightly anal side that always wants to clean up. The Web interface provided by Delicious offers bulk deletion, but each batch is limited to 10 bookmarks. With more than 5000 bookmarks, that makes for batches that look rather minuscule to me. I fell back on the iMacros for Chrome extension, which served me well here, in particular through its ability to loop over the execution of a macro. Updates on the Delicious side are a bit slow, and one gains speed by reducing the display to a minimum (for instance by avoiding the display of explicit tag lists). Done purely by keyboard and mouse, this cleanup job promised to be deadly tedious, and iMacros saved the day!

Wednesday, October 14, 2009

PureData musings

Have you ever played with the PureData software? I stumbled upon it over the weekend. It is a piece of software for creating, analysing or transforming music, handling both MIDI and audio. It has since been extended to video and graphics as well, but I have not looked at those aspects. It is also a graphical programming environment: you interconnect boxes to represent algorithms rather than writing source code; and that language is very unstructured, in the sense that every horror is allowed. It operates at the level of CSound, or of electronic circuitry.

You can visit the site, but do not trust the images shown there. The current graphical interface of pd is based on Tk, whose look is quite dated (look at the menu under pd webring on the left, which gives a much closer idea of the reality). If one overlooks the poor graphic quality of the interface, the tool remains worth knowing.

On my machine, it interfaces well with OSS, ALSA and JACK for audio, and automatically takes over all the MIDI inputs and outputs configured on my computer, external equipment as well as purely software instruments. So I was quickly able to lend it an ear ☺, if I dare say so.

Monday, September 14, 2009

Sound of 64-foot pipes

Quite interesting, I presume, for the sound technician in each of us! I stumbled on this recording of 64-foot open pipes from the oversized organ in Atlantic City. More precisely, we hear the two lowest octaves (white notes only), and the 15 notes are played one at a time, descending.

I insist on open pipes because it is quite usual, in organs, to close the top of a pipe as a way of forcing the belly of the sound wave to the end of the pipe, whereas in an open pipe the belly sits at the middle of the pipe. A closed pipe thus has its effective wavelength multiplied by two, compared to an open pipe of the same length, and consequently its sound frequency is divided by two. For the lowest notes, which require the biggest pipes, closing them is a way to save space, weight and money.

However, odd harmonics have more amplitude than even harmonics in a closed pipe, and this gives the sound a particular colour. The overall sound is also a bit weaker. But in the recording above, the longest pipes really are 64 feet long; there is no trickery with 32-foot pipes.
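
To put rough numbers on the two previous paragraphs, here is a small back-of-the-envelope computation; the speed of sound and the pipe lengths are idealized textbook values, and real organ pipes deviate somewhat from these formulas.

  # Idealized fundamentals: an open pipe of length L has a fundamental
  # wavelength of 2L, a stopped (closed) pipe one of 4L.
  SPEED_OF_SOUND = 343.0   # m/s, near room temperature
  FOOT = 0.3048            # metres

  def open_pipe_hz(length_m):
      return SPEED_OF_SOUND / (2 * length_m)

  def stopped_pipe_hz(length_m):
      return SPEED_OF_SOUND / (4 * length_m)

  print(open_pipe_hz(64 * FOOT))      # ~8.8 Hz, well below the ~20 Hz hearing limit
  print(stopped_pipe_hz(32 * FOOT))   # the same ~8.8 Hz: the usual 32-foot economy
  print(4 * open_pipe_hz(64 * FOOT))  # ~35 Hz, an already audible harmonic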

Can we really speak of sounds here? The frequencies are below the audible threshold; for the lowest notes we can even hear the separate wave peaks one by one. These pipes are like enormous whistles, in which sound happens in very slow motion. Yet, imagine the blow it would take to sound such gigantic whistles!

It is especially amusing, for me at least, that we can recognize the notes even though the pipes produce subsonic waves. This is because many harmonics of the fundamental fall within the audible range, and those are what we briefly hear between the separate vibrations of the fundamental.

Tuesday, August 25, 2009

In defence of the Saint-Louis organ

Jan Walgrave pointed out to me, yesterday, the existence of a Facebook group dedicated to the Défense de l'orgue de Saint-Louis.

I have just joined the group, and I watched with pleasure the few videos found there. They show off nicely the very generous acoustics of the Saint-Louis-de-France church, and the rich sonorities of an organ I truly had a lot of pleasure playing.

Somewhat in a rush, I have not yet taken the time to read everything. But I am surprised by what Jan recently wrote: "I will be waiting around the bend for certain pontiffs who would like to silence me too." Surprising! Do you really know people who want to silence you? Or is it rather a matter of fears? People do not always share the same opinions, and that is often even fruitful. But from there to wanting to muzzle someone, there is quite a gap. Jan, if you are reading me, what are you referring to here?

One thing is certain: there is a question of money behind all this. Skimming the group's archives, I saw suggestions about creating protective laws and establishing moratoriums, but for all of that to make sense, one must at the same time find the money that maintains the organs and heats the churches in winter.

I witnessed a little of what happened at Saint-Louis-de-France, and am even surprised that this parish managed to survive as long as it did, given its very meagre revenues. Nobody was able to propose a viable solution to save the church, and the organ. But mind you, I do not have the slightest flair for finance, so my imagination does not go very far in that domain.

The very low appraisal of the organ, by its very builder, took me aback. The organ is apparently worth little more than the cost of its complete repair. Or perhaps, more likely, the law of supply and demand is such that the going rate for organs, if I dare say so, is not very high these days. It is a hard blow, but the blow has to be taken.

In short, in my opinion, for Saint-Louis-de-France it is really too little, and much too late. We will wear ourselves out trying to save this particular one, and along the way make ourselves far more enemies than we need. Perhaps it would be better to identify several worthy organs that still survive, and figure out now how to protect them for later.

Organs are all unique and original, each in its own way; that is the very essence of the art and of organ building. They all, necessarily, have heritage value. Whatever our attachment to them, we will not be able to save them all. We have to come to terms with that. Using a rhetoric full of revolt or scandal will not help the cause much. The worst thing to do would be to cultivate our frustration, then criticize the government, the bishopric, the civil population, the Catholic population, and everything that moves or does not move.

We should rather examine, coldly and calmly, the actions to be taken, without fanaticism, looking at every aspect of the question, if only to prove to our interlocutors that we have our feet on the ground, and to earn some credibility in everyone's eyes. And if reasonable actions are possible, then really, really do more than talk about them, or demand that others do them.

Sunday, August 2, 2009

Tomboy, Web and maths

Seeing the tomboy-LaTeX add-in for Tomboy, I decided to give it a try. Here are my first impressions.

The more time goes by, the more I use Tomboy for taking notes and organising my work in various ways. Many pages on my Web sites are now Tomboy notes, converted to HTML using my tboy and site.mk machinery. The Tomboy page holds, near its beginning, a few links leading to other Tomboy-related comments of mine.

Popping up an existing Tomboy note, either through the menu of recently edited notes, by following a link from another note, or through the search facility, is easier for me than traditionally hunting for a file and calling an editor on it. Light or moderate editing is immediate and simple within Tomboy. So I converted many of my previous reST or HTML notes to Tomboy, knowing they are going to be easier to manage, while still converting back to HTML of a quality comparable to what it was before. Granted, reST is far more expressive, yet I found that on average I do not need all of reST's power.

I have a few notes, not so many, which are a bit more mathematically oriented, and for which I used a kind of character art to represent the formulas. So, when I recently stumbled on the tomboy-LaTeX page, I felt it might be worth a try. I am not really set up to compile C# code, and did not succeed at compiling Tomboy from sources the last time I tried, so I feared tomboy-LaTeX might be difficult to install. But to my pleasure, it went rather well: the add-in compiles simply and quickly, independently of Tomboy itself. Activating the add-in is also a simple matter.

I initially found tomboy-LaTeX surprisingly speedy. I was expecting worse, given that for each mathematical snippet in a Tomboy note, processing or external programs are needed to generate a LaTeX source and turn it successively into DVI, then PostScript, then PNG. But that speed impression vanished when I started using LaTeX within real, actual notes. The conversion itself appears fast enough, but the overall editing crawls to the point of irritation: the cursor often disappears out of sight for many seconds, and inserting or deleting a single character often takes a long time. I presume this is related to the number of formulas in a single Tomboy note, but am not sure. However, when there is little or no math involved, as in this very note I am writing, everything is as speedy as normal.

Using LaTeX, in my case, is only acceptable if it can be extended to automatic HTML conversion, so this was the natural next step. There does not seem to be a universally supported way to embed images within an HTML page; we can only include references to external images. This means we need ways of naming each mathematical snippet extracted from Tomboy notes, turning these into images, caching the work already done so as to avoid useless regeneration, recovering the cache whenever it already exists, and cleaning up images which are no longer needed. All fully automatic, of course.
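
Here is a minimal sketch of the caching scheme described above; the cache directory, the file layout and the render_formula_to_png helper are all hypothetical names for illustration, not the actual machinery behind my pages.

  import hashlib, os

  CACHE_DIR = 'math-cache'

  def formula_image(latex_source):
      # Name each image after a hash of its LaTeX source: an unchanged
      # formula keeps its name across runs and is never re-rendered.
      os.makedirs(CACHE_DIR, exist_ok=True)
      digest = hashlib.sha1(latex_source.encode('utf-8')).hexdigest()
      png_name = os.path.join(CACHE_DIR, digest + '.png')
      if not os.path.exists(png_name):
          render_formula_to_png(latex_source, png_name)
      return png_name

  def render_formula_to_png(latex_source, png_name):
      # Hypothetical helper: it would run the LaTeX, DVI, PostScript and
      # PNG chain mentioned above; elided to keep the sketch short.
      raise NotImplementedError

  def cleanup_cache(used_images):
      # Remove cached images that no remaining formula refers to.
      for base in os.listdir(CACHE_DIR):
          name = os.path.join(CACHE_DIR, base)
          if name not in used_images:
              os.remove(name)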

I am not overly satisfied with the result, even if it is acceptable to a certain extent. See the page choose(n, k) is an integer! for an example, which I criticize on at least these two points:
  • The mathematical images may be aligned top, center or bottom (the default), and it is best to center formulas within the surrounding text. However, at least within Firefox, the image is apparently centered with respect to the baseline of the text, while it would be nicer to center it with respect to the middle of the text. When the formula does not use more than the height of a single line, this is clearly too low, so I tried bottom alignment for those, which yields an image which, despite sitting a bit too high, is less shocking. It is not easy to decide, while generating the HTML, whether a formula spans a single line or not, so I fell back on the naive heuristic of checking whether the formula contains any curly brace (needed whenever the LaTeX coding is a bit more complex), since simple formulas, like lone variables, usually do not need any; see the small sketch after this list.
  • The font used for mathematical rendering is a bit small, and I am not familiar enough with LaTeX to resize it. For the HTML rendering, I merely reused the recipes found within the tomboy-LaTeX source code (all hail source code!). The small-font problem is especially noticeable with my notation. I was not patient enough to try more substantial LaTeX, with a view to really getting the rendering I hoped for, so this is a compromise. tomboy-LaTeX does not provide an easy way to augment LaTeX with one's own definitions, as each mathematical snippet starts with a virgin environment.
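
Here, roughly, is the curly-brace heuristic mentioned in the first point above; a sketch only, and the alignment values are just what I settled on, not anything dictated by tomboy-LaTeX.

  def image_alignment(latex_source):
      # Formulas needing curly braces are usually taller than one text
      # line, and look best vertically centered; simple one-line formulas
      # look less wrong with bottom alignment.
      return 'middle' if '{' in latex_source else 'bottom'

  def image_tag(png_name, latex_source):
      return '<img src="%s" alt="%s" align="%s"/>' % (
          png_name, latex_source, image_alignment(latex_source))
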
Within Tomboy itself, tomboy-LaTeX uses bottom alignment, which is not ideal, and even a bit ugly at times. Nevertheless, previewing the rendered mathematical formulas within Tomboy notes, almost in real-time, is a comfortable capability.

Friday, July 31, 2009

Gitification of Tomboy notes

I was previously using Dropbox to spread Tomboy synchronisation directories between the machines I access, yet I sometimes forget to synchronize Tomboy before leaving home or work, and feel a bit miserable afterwards, as I now depend on Tomboy for many aspects of my duties. However, since I never forget Git synchronisation, the idea came to me that I should use Git to synchronize Tomboy notes, just as I do for most of my other things.

Tomboy is a wonderful and very useful tool. Yet its internal file and directory formats are under-documented, and some guesswork is needed here and there when I want to handle my Tomboy notes through various scripts. There is a D-Bus interface that I could use, and this is how I originally started my tboy tool. But I found out that I do not master that interface all that well; so tboy was sometimes using D-Bus, and other times avoiding it. As tboy progressively grew to accommodate my various needs, I finally decided it was easier to uniformize it towards direct reading of directories and files, with the practical result that I hardly ever use D-Bus now. Another small advantage is that tboy works even when Tomboy is not running.

The guesswork may have strange consequences. I had the persistent impression that the Tomboy synchronisation directories were designed so as to preserve the history of successive synchronisation calls (each being called a revision in Tomboy terminology), yet I had some difficulty deciphering every detail of this. Sandy Armstrong, the current maintainer of Tomboy, is especially friendly and approachable, so I dared ask him for some help. He explained to me that maintaining the synchronisation history has never been a goal in Tomboy, and that if I ever find two versions of the same note, then I am uncovering a Tomboy bug. Surprised by this statement, I made a more thorough examination of my whole Tomboy synchronisation directory, and found a lot of duplication. So there is a bug in Tomboy sync in which the cleanup does not work properly; I am not fully sure, but the cleanup apparently works only in a few cases, for notes handled in adjacent revisions. All the resulting clutter holds a lot of recoverable history. So that particular bug was quite productive in my case ☺.

Each revision has an associated number, counting chronologically from zero. The synchronisation directory has one subdirectory per revision, named after the revision number. To be precise, a revision directory sits two levels below the top synchronisation directory, as directories are grouped one hundred at a time (the grouping directory merely uses the hundreds digit of the revision number). Each revision directory holds a manifest.xml file telling which notes still existed at that revision level and, for each note, at which revision it was last modified. The revision directory also holds the full note contents for notes which were introduced or modified at that revision level.
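
For concreteness, here is a small sketch of how such a layout can be read back; the element and attribute names inside manifest.xml are my own guesses from inspecting the directory, as explained above, not documented Tomboy behaviour.

  import os
  import xml.etree.ElementTree as ET

  def revisions(sync_dir):
      # Yield (revision number, revision directory), lowest revision first,
      # walking <sync>/<hundreds group>/<revision>/.
      for group in sorted((d for d in os.listdir(sync_dir) if d.isdigit()), key=int):
          group_dir = os.path.join(sync_dir, group)
          for rev in sorted((d for d in os.listdir(group_dir) if d.isdigit()), key=int):
              yield int(rev), os.path.join(group_dir, rev)

  def notes_at(revision_dir):
      # Map each note id listed in manifest.xml to the revision at which
      # that note was last modified (attribute names are guesses).
      root = ET.parse(os.path.join(revision_dir, 'manifest.xml')).getroot()
      return {note.get('id'): int(note.get('rev')) for note in root}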

The gitification works in two passes. The first pass establishes, for each note, all the revisions holding a full copy of that note's contents. It also assigns a timestamp to each revision (the manifest.xml modification time seems a good estimator). The second pass checks, for each revision and each note it references, whether we still have the note contents at the revision where it was last modified. Most of the time, because of the cleanup bug, we do. If we do not, then we pick a copy at the closest higher revision where the full text of the note exists. If there is no such higher revision, then most likely the note has been deleted for good, but it still exists in the ~/.tomboy/Backup/ directory, so we pick that backed-up note instead.

Now that we have the set of existing notes at each revision level, it becomes a trivial matter to restore each previous state, one revision at a time, and to generate Git commands for staging and committing that state. After executing the transformation script, I manually copied back the few other administrative directories from the original ~/.tomboy/, that is: addins/, addin-db-001/, sync_temp/ and Backup/. With Tomboy stopped, the final step was to replace ~/.tomboy/ by the new one.
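
The replay itself can stay as simple as this toy version of the second half of the script; restore_state stands for whatever copies the right .note files into the working directory, and is only named here for illustration.

  import subprocess, time

  def commit_revision(number, timestamp, work_dir):
      # restore_state(number, work_dir) is assumed to have run first.
      date = time.strftime('%Y-%m-%dT%H:%M:%S', time.localtime(timestamp))
      subprocess.check_call(['git', 'add', '--all'], cwd=work_dir)
      # Plain call(): Git refuses to create an empty commit, which is
      # exactly how revisions with nothing recoverable get dropped.
      subprocess.call(['git', 'commit', '-m',
                       'Tomboy sync revision %d' % number,
                       '--date', date], cwd=work_dir)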

When I gitified my Tomboy notes as described above, 375 notes existed after the 187th revision, and I was ideally expecting 187 Git commits. Wherever the Tomboy sync cleanup code had worked correctly, some historical information was lost, to the point that some commits were discarded as being empty. I ended up with 176 commits, which is a rather satisfying result. The space savings are interesting as well: the Tomboy synchronisation directory was taking 22 MB; the resulting Git pack uses 1.5 MB, while the checkout itself uses 1.8 MB.

While Tomboy's built-in synchronisation works live, proper care is needed to stop Tomboy before a Git synchronisation and to start it again afterwards. If one plays it straight and never takes chances about this, there is no reason ever to have a synchronisation problem. As I already use many scripts and tools for moving files between machines while I wander between home and work, it is easy to slip a few more commands into those scripts to make sure Tomboy does not run when it should not be running.

As the cleanup bug will likely be corrected in Tomboy some day, the whole trickery above, however fun it may be, might not work for long ☺. However, I'll continue using Git to store history, do synchronisation, and even knowingly let conflicts get in, knowing the powerful machinery I now have to resolve them.

Friday, July 10, 2009

Away from os.path.walk

Quite a number of times over all these years, I have needed to explore directory hierarchies in my programming. When I switched to Python (it was version 1.5.2 then), os.path.walk was the documented way to do it, and I surely used it a lot. Yet the functional argument was sometimes awkward to fit to the exact processing I needed for each particular directory walk. The impedance mismatch between the functional argument and os.path.walk has been painful more than once (the From Cfengine to Python case comes to memory as I write, I do not know why this one in particular).

Walking big directory trees has never been inordinately slow in Python, as it is I/O bound overall. This I/O may well be dominated by stat calls, checking whether a particular directory entry is a sub-directory or another type of file. In Unix, each directory is pointed to by its parent directory, has a . entry pointing at itself, and is pointed to by the .. entries of all its immediate sub-directories, and these all contribute to the link count of that directory entry (the root directory has no parent, however). GNU find uses this fact to optimize out stat calls: it saves the link count and monitors the number of sub-directories seen so far; as soon as enough sub-directories have been seen to explain the link count, it safely assumes that the remaining entries hold no further directories, without any need to stat them. Of course, stat may be needed for a lot of other checks within find, in which case the above optimization is useless. But in the most frequent use, by far, the optimization applies well. I discussed the thing with Guido and implemented it for os.path.walk, but in the end never officially submitted the patch, as for some reason I never figured out, the patch did not yield the spectacular improvement I expected ☺.
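
For the curious, here is roughly what that link-count trick looks like in Python; this is a sketch of the idea, not the patch I discussed with Guido, and it assumes a Unix file system that maintains the usual link counts.

  import os, stat

  def subdirectories(directory):
      # A directory's link count is 2 (itself plus its own '.') plus one
      # for the '..' entry of each immediate sub-directory; once that many
      # sub-directories have been found, the remaining entries need no
      # stat call at all.
      remaining = os.stat(directory).st_nlink - 2
      for base in os.listdir(directory):
          if remaining == 0:
              break
          name = os.path.join(directory, base)
          if stat.S_ISDIR(os.lstat(name).st_mode):
              remaining -= 1
              yield name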

The later os.walk, while undoubtedly nicer than os.path.walk, was still a bit insufficient for my needs. It offers more control over some aspects of the walking, while losing others. I rather see os.walk as part of the initial effort to push iterators into the Python library and into all Python programmers' minds. So I finally gave up completely on the provided wrappers, and instead started rewriting the walking part explicitly within each application. It takes only a few lines of code:


  import os

  stack = [top_directory]
  while stack:
      directory = stack.pop()
      for base in os.listdir(directory):
          name = os.path.join(directory, base)
          if os.path.isdir(name):
              # Push sub-directories for later traversal; skip symbolic links.
              if not os.path.islink(name):
                  stack.append(name)
          else:
              pass      # File processing part


(Note: the check for a symbolic link avoids a looping bug, when the link points to a directory which is higher in the hierarchy. Thanks to Al Danial for pointing out this potential problem.)

These loops may be torn and bent at will, depending on the particular need (pre-order, post-order, anything in between or elsewhere, you name it) much more easily than if implicit recursion were used: the recursion stack is explicit here. The advantage is especially clear when writing generators: if the file processing part contains any yield statement, implicit recursion gets cumbersome and expensive, having to chain nested yield loops just to deliver results at the outer level.
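
To illustrate that point, here is the same loop written as a generator (my own variation, not anything from the standard library): with the explicit stack, a single yield suffices, where a recursive version would have to re-yield everything coming back from each recursive call.

  import os

  def walk_files(top_directory):
      # Yield every non-directory name under top_directory, one at a time.
      stack = [top_directory]
      while stack:
          directory = stack.pop()
          for base in os.listdir(directory):
              name = os.path.join(directory, base)
              if os.path.isdir(name):
                  if not os.path.islink(name):
                      stack.append(name)
              else:
                  yield name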

Now, let's presume the file processing part delivers diagnostics of some sort, prefixed by the directory name. Humans like output to be sorted, because the eyes are usually the search tool. Besides, sorted output is more entertaining: one gets a better feeling of progression than from a jumble of mixed-up directory names that psychologically feels like it will never finish.

One solution is to accumulate all output first, sort it, and only then display it. While conceptually simple, this is not perfect. That it spoils the memory savings of using a generator is usually not a real problem, but it surely defeats the elegance of using one. We also lose the entertainment of a more progressive and parallel output.

There is only one way to keep those advantages, and that is to walk the directories directly in sorted order. This means that appending names at the end of a stack, and popping from the end of the stack, is too naive. On the other hand, since the above loop gets rewritten often, it should nevertheless stay simple. Here is one easy way to do so:


  import os

  stack = [top_directory]
  while stack:
      directory = stack.pop(0)
      for base in sorted(os.listdir(directory)):
          name = os.path.join(directory, base)
          if os.path.isdir(name):
              if not os.path.islink(name):
                  stack.append(name)
          else:
              pass      # File processing part


There are only two modifications: we pop from the start of the stack instead of from its end, and we sort the bases separately for each directory. Doing so, we append in a kind of sorted order. But this still has a few disadvantages: the whole stack gets shifted at each removal, the sorted function produces copies, and the sort is not exactly what one wants: it is somewhat restarted at each directory level, exploring breadth-first.

I attempted several remedies to these drawbacks over the years. To give only one example, I once saved directories in an intermediate list, reverse-sorting it before appending it to the stack, so I could still pop from the end of the stack, do smaller sorts, sometimes spare sorts altogether depending on the precise output I knew I was going to get, and God knows what else. But deep down, this is all annoying optimization complexity.

Here is the latest trick I found in this series, and yet another application of priority queues (the documentation of which amusingly recycles another article of mine! ☺). As I still feel it now, it is surprisingly elegant. Enough, at least, that I share it with my readers:


  import os
  from heapq import heappop, heappush

  stack = [top_directory]
  while stack:
      directory = heappop(stack)
      for base in os.listdir(directory):
          name = os.path.join(directory, base)
          if os.path.isdir(name):
              if not os.path.islink(name):
                  heappush(stack, name)
          else:
              pass      # File processing part


It seems to work! Each generated name is produced by suffixing the directory just extracted from the priority queue; being lexicographically after it, it can be inserted back into the heap within the same run (in sorting terminology). So this is a nice, simple compromise between both snippets above. Instead of shifting the whole stack at each removal, we rearrange only a logarithmic number of elements. Instead of doing a whole sort at each directory level, we spread a single heapsort over all directories. A really nice effect is that the traversal comes out as a lexicographic depth-first walk. The optimization complexity has wholly vanished. Moreover, the code is not essentially longer nor different from the simplest that comes to mind. As a result, I find it quite easy to remember.

Afterthoughts


(These come from a recent conversation with Guido on the above.)

I did not time the above loops, but I am pretty sure the improvement is more on the side of conceptual elegance. sort() is very fast, to the point that one may easily abuse it without having to pay a penalty unless the list is big, and as I perceive it, sorted() is not so different from list() efficiency-wise. Unless a sort sits in the bottleneck of a computation, there is no compelling reason not to use it liberally. I would have a hard time advocating that, for mere efficiency considerations, heapq is really the best approach.

In practice, if we keep a stack of unprocessed directories, that stack will be rather small on average, so popping the first element is likely faster than heap-popping it, even if it implies N operations rather than log(N). When N is small, log(N) is not very different from N, and mere shifting does not involve comparisons.

I said that there is a single heap sort over all directories, rather than one sort per directory, as an argument that we save processing. This is debatable. Each per-directory sort handles a small number of entries, and calling sort K times on N / K entries each time is roughly the same effort as calling sort once on N entries. On the other hand, the heap sort only sees directories, while the per-directory sorts handle both files and directories.

For this discussion, I was interested in sorting directories, not files. The precise case I had in mind when I wrote this was to produce one .gitignore file per project out of all the .cvsignore files within that project, for many projects held in a single big directory hierarchy. So there was not much to diagnose about non-directories. If files ought to be sorted as well, per-directory sorts are still required, and then the elegance benefits of heapq fade out a bit.
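
As a closing illustration, here is how that concrete case might look using the heapq walker above; a sketch only, with a deliberately simplistic .gitignore writer, not my actual script.

  import os
  from heapq import heappop, heappush

  def cvsignore_files(top_directory):
      # Yield every .cvsignore file, directories visited in lexicographic order.
      stack = [top_directory]
      while stack:
          directory = heappop(stack)
          for base in os.listdir(directory):
              name = os.path.join(directory, base)
              if os.path.isdir(name):
                  if not os.path.islink(name):
                      heappush(stack, name)
              elif base == '.cvsignore':
                  yield name

  def gitify_ignores(project_directory):
      # Merge all .cvsignore entries of one project into one .gitignore.
      patterns = []
      for name in cvsignore_files(project_directory):
          prefix = os.path.relpath(os.path.dirname(name), project_directory)
          with open(name) as ignore:
              for pattern in ignore.read().split():
                  patterns.append(pattern if prefix == '.'
                                  else os.path.join(prefix, pattern))
      with open(os.path.join(project_directory, '.gitignore'), 'w') as out:
          out.write('\n'.join(patterns) + '\n')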

Thursday, May 14, 2009

Tomboy report 582696

Trackers and robots have not been kind to me in the past: they each have their own flurry of bugs, drawbacks and limitations, and some even lose submissions at times. Some maintainers just go overboard with their tracker toys; they should play with them to their heart's content themselves, and give their users some rest. The truth is that I learned to hate robots, and have been avoiding them for years (life is too short! ☺). Yet Sandy insisted that I make an effort at using the Tomboy one, and since Sandy has been so nice to me, I will comply and try again, once more. But I will keep a copy of the submission on my side, at least for a good while, just in case!

Hi, Sandy. You see, I'm trying your robot! ☺

The Lier and Texte buttons (likely Link and Text in English) are fairly close to one another, and when doing massive editing, or being tired, or for various other reasons, it may happen that I click on the wrong one: I highlight a word and then want to change its font size or style, but accidentally ask for a link instead. Now, if this happens to be a common word, I may have just created many, many dead links (because I will end up deleting the wrongly created note).

Undoubtedly, this is my error. Yet a friendly tool should help me not hurt myself too badly. So here is my suggestion, hoping it is affordable to implement. If Tomboy could estimate, when I create a new note, how many links would be created from everywhere else in other notes to this new note, it could ask for prior user confirmation whenever that number exceeds some threshold (five seems a reasonable value). As a consequence, creating a legitimate new link would likely require no confirmation in practice, while creating a link over a common word would likely trigger one. If the link is wrong but induces fewer than five links, it would then be reasonably fast to use Find links on this note from the newly created note to spot the links that will die once that note gets deleted. (A rough sketch of the estimate I have in mind appears further below.)

The same confirmation might also be offered whenever a dead link is resurrected. It is an easy mistake, when clicking next to a dead link with the intent of erasing it and typing it again (non-dead this time), to accidentally click the link itself, and to instantly destroy some patient prior work meant precisely to un-dead-ify multiple occurrences of that dead link in many other places.

Maybe five would be too high a number for some users, and too low for others. It could be set from the Preferences window. Zero would imply unconditional confirmation; a high value would correspond to the current behavior.
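
For what it is worth, here is a rough sketch of the estimate I have in mind, expressed in Python rather than C# for brevity; the note-reading helper is purely illustrative, and Tomboy itself would of course rely on its own note index.

  import os, re
  import xml.etree.ElementTree as ET

  def note_texts(notes_dir=os.path.expanduser('~/.tomboy')):
      # Illustrative reader: yield the plain text of every saved note.
      for base in os.listdir(notes_dir):
          if base.endswith('.note'):
              root = ET.parse(os.path.join(notes_dir, base)).getroot()
              yield ''.join(root.itertext())

  def induced_links(new_title, threshold=5):
      # Count occurrences of the new title across all notes: roughly how
      # many automatic links creating that note would induce.
      pattern = re.compile(r'\b%s\b' % re.escape(new_title), re.IGNORECASE)
      count = sum(len(pattern.findall(text)) for text in note_texts())
      return count, count > threshold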

I presume that this suggestion would address the single biggest source of wasted time in my experience of Tomboy so far. Hoping that you will look at it with a favorable eye!

Sunday, May 3, 2009

Dia criticism

Dia is a sophisticated editor for diagrams, which I tried many years ago, and a second time more recently. This tool is loved and cherished by many users, and directly available within most Linux distributions. Clicking on this logo takes you to the Dia site. However, this product did not leave me happy, and I explain why below.

A little while ago (a month maybe?), I needed to produce a few diagrams, and looked around for a tool of the right availability and size. I wanted it packaged for the few systems I use at work or at home, or if not, at least very easy to install. I refused to go overboard with UML, some tools being rich and complex but not very usable outside UML; I was more attracted to eclectic tools. On the other hand, I did not want to go too simple either, expecting a minimum of flexibility and rendering quality.

Dia seemed to fit the bill pretty well. I vaguely remember trying it, years ago, and not being satisfied. Revisiting the documentation, it still looks attractive. Maybe I was too stubborn at the time to understand all the benefits? Maybe the package has evolved over the years, and I would see it differently now? So I gave Dia a good and honest second try, studying it afresh with the most open mind possible. I used it quite seriously, for many days and a few types of projects.

Reluctantly, I had to give up. It did not pass the reality test for me and did not work well. I realized my quest was not over, and that I had to look for something else. In the end, I opted for Inkscape, which is surely less featured than Dia in Dia's speciality: if one moves objects around, one is still best off revising all arrows manually. Yet Inkscape is more usable overall, so the pros and cons counterbalance in practice. Also, I have the rather strong feeling that it is a good investment in the long run, so the learning effort is more productive.

The main problem with Dia, for me, has been stability. I can of course put up with a few bugs here and there and work around them, but in Dia's case, their pace of arrival and severity were too high for me. Many years ago, I would have fearlessly contacted maintainers and user groups to report all the problems, but experience taught me that this can be pretty time-consuming, and I am not available enough to do all the research and trials which usually accompany or follow such discussions.

I saved a few points which annoyed me in Dia. As time goes by, these progressively fade from my memory. Yet I will list them here merely to remember a bit, and not be tempted to return there a third time ☺. I am surely not seeking a systematic rebuttal or discussion. So if you are a Dia proponent, please do not take it too personally: your comments are welcome, just do not push to convert me back…
  • Saved .dia files are not equivalent after transport between different Dia versions, even for fairly simple works. One cannot afford to lose work just because one travels between computers, and it is unreasonable to expect everyone to use the same Dia version at all times.
  • Exporting to .png produces a truncated image; part of the diagram is missing.
  • Exported .svg files seem to be missing elements. I suspected that the layer order is not properly represented, so that elements end up spuriously hidden, but this is a mere hypothesis.
  • Placing lines using points other than hot points is quite difficult; one has to fight with the mouse, even if the mouse usually does not win.
  • When editing a text, it may spontaneously jump back to the center of the box where it was initially entered, and repositioning it over and over becomes irritating.
  • When working on two diagrams, any click in the toolbox raises the first diagram over the second, making it tedious to work on the second.
  • Using Ctrl-E right at the start picks %nan as the zoom value, and the application then freezes.
  • When the application starts, the F9 window spuriously pops up, and I did not find how to prevent this.
P.S. - @burakbayramli says: Francois: What is your alternative to Dia? I am a user myself, and would love to hear about what's out there. (I had a hard time reaching him, should try again later.) I also had good comments from Sergio; I should summarize all of this here.

Monday, March 2, 2009

Switching to Tomboy


Power outages, in our province, can sometimes last for several long and painful days. People want to understand. Well, the ice storm of a few years ago was believable. But the number of dark spots on the surface of the sun as an explanation for outages, that goes down less well. Finally, after many years, the cat is out of the bag! Leaked from Hydro-Québec's secret documents, here is the real reason why it all sometimes lasts so long. ☺

Since 2009-03, I have been transferring my personal and miscellaneous notes, from several files in Allout format, into Tomboy notes (and this link too). I still have a lot left to transfer, but I have at least carried over the essentials as far as my day-to-day management and my work for my employer are concerned. The next step will take care of my NOTES and PLAN files for hobbies (music, drawing, tinkering, etc.). Everything else will follow. It is a non-negligible (!) change in my work habits, but I feel it is going in the right direction.

Moreover, it pushes nicely in a direction that Richard Nault had once described to me as an ideal, during a long discussion we had at SRAM, one of those evenings when he was particularly enthusiastic and inspired. He was then hoping, ardently, for a tool that would let him easily save and edit his ideas as a network of strongly interlinked notes (and not within a mandatory hierarchy of ideas).

With use, still recent, I realize the tool is packed with intelligently thought-out operational details, and it is still a pleasure and a discovery for me. I use it with GNOME on Linux, with which it is certainly well integrated. The documentation says, somewhere: Tomboy is a desktop note taking application for Linux and Unix. But I saw on the site several references to a Windows port, so I suppose it exists, though I have not tried it. Since Tomboy is written in C# (a language favoured by .NET), I imagine, without really knowing, that it must also be oriented towards Microsoft environments.

It occurred to me that another Python tool of mine, which already converts part of my Tomboy notes to the Web, could be adapted to catch all the Tomboy notes whose titles end in " blog entry", order them chronologically, and add them to my pseudo-blog. In the end, I did better than that and freed myself from the need for a special title suffix. I put a date as the first line of a Tomboy note whenever I want it to end up on the pseudo-blog. While transforming all my Tomboy notes to HTML, the converter notices the presence of those dates and saves the reference. The blog is then produced, in descending date order, during the final phases of the conversion. I converted all the existing entries of my pseudo-blog into Tomboy notes in such a way that the true sources are now found there in full.
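
A tiny sketch of the detection step just described; the date format and the way notes are handed in are only illustrative, not the actual converter.

  import datetime

  def blog_entries(notes):
      # 'notes' maps a note title to its plain-text contents.  Keep the
      # notes whose first line parses as a date, newest first.
      entries = []
      for title, text in notes.items():
          first_line = text.strip().splitlines()[0] if text.strip() else ''
          try:
              date = datetime.datetime.strptime(first_line, '%Y-%m-%d')
          except ValueError:
              continue
          entries.append((date, title))
      return [title for date, title in sorted(entries, reverse=True)]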

I also wanted to find a way to keep the idea of a photo with a short comment for each entry; I find that aspect pleasant. That is done as well, and it is not even specific to my pseudo-blog: I can now attach one or more photos to any Tomboy note, each possibly clickable towards another link. These photos are attached during the transformation to HTML.

One thing is certain: I should feed my pseudo-blog more often. I do not lack material for doing so, but the time taken by editing and transformations really has to be kept to a minimum. I feel there is an avenue in what I have just quickly described. It should be a bit easier for me from now on. We will see whether I take advantage of it!