GSoC wrapup report

Hey all. The following is an (admittedly rather thorough) “wrapup” report on my Google Summer of Code project entitled: “TranslateSvg: bringing the translation revolution to Wikimedia Commons”. TL;DR: I’m happy.

---

On 9 July 2011, South Sudan declared independence, and during that buzz, an Italian Wikimedian found his map showing the borders of the new nation had been translated into a dozen other languages, among them English, Greek, Catalan, and Macedonian. These copies were then uploaded onto Wikimedia Commons as separate files. Of course, one would expect the map to change significantly over the next decade. More often than not, these kinds of change are picked up first by editors of the larger projects, who rapidly update their own versions of the map. To do so takes, say, 20 minutes; but to replicate that same change across Catalan, Greek, Macedonian? Hours of work – and dozens of separate uploads.

My project, named “TranslateSvg”, aimed to change this workflow – for SVG format files at least – firstly by making it easier to translate those files (thus reducing the all-too-common sight of English-language diagrams in use on non-English wikis), and secondly by embedding the new translations within the same SVG file. When boundaries change, a single update will propagate to all language versions instantly. That was the intent, anyhow.

Overall, a lot has been achieved: a test wiki was set up, and, if I load the bleeding edge code onto it, the software is both feature complete and has been updated in line with user comments. The video at [1] gives a good idea of the current interface and how it works; I’ll send another message to this list when the test wiki reliably uses feature complete code.

The most pleasing (and indeed satisfying) thing, however, is that nothing I wanted to achieve was “left behind”. Admittedly, a few things aren’t quite as polished as I’d like them to be, and there are still a few weeks’ worth of code review left to do. But fundamentally, it is (or will be) what I want it to be. Mostly, I attribute this to some prototyping work I did before I pitched for GSoC, which allowed me to come up with a plan I knew to be doable (or more accurately, doable by me), and so avoid the cost of running into dead ends late in the process.

Once code review is complete, there’ll be at least one more testing phase, this time with specific questions, followed by a pitch by me to Wikimedia Commons. Only after that will I even utter the “d” word in the context of TranslateSvg.

I ended up with quite an unusual mentoring setup. The work of mentoring me was split between my official mentor for the project, Max Semenik (MaxSem), and Niklas Laxström, the original author of the Translate extension (which, early on in the project, I decided to use as a foundation for my work). Both have been very helpful, especially with code review and generally “keeping an eye on me”, with Niklas (I think it’s fair to say) taking the lead in places due to his specialised knowledge. This worked out well, but my advice to potential applicants would be to think about mentor choice carefully, considering what support they’d need *from their mentor* and what they might instead be able to source *from the community in general*, in order to avoid overloading their mentor. We have a great community, and thankfully I knew quite a few people already, so I could tap that more easily.

Of course, I am greatly indebted to both Max and Niklas, as well as the literally dozens of people who at some point contributed via IRC (there’s another protip: get on IRC early! *Such* a useful resource). Just off the top of my head, that list includes Andrew and Ryan for the Labs stuff (which turned out to be the most challenging aspect of the summer, mostly because I hadn’t considered it at all before [1]), Mark and Timo for help with the JavaScript stuff, Sam for his general omnipresence, especially when a quick review was needed, Federico, Amir and all the other potential users of the extension who tried it out, plus of course Sumana and Greg for keeping the whole thing going. There are plenty of other people I’ve forgotten, I’m sure: there are simply far too many to remember properly.

---

Once again, thanks everyone and I hope to keep you posted over the coming months about further progress.

[1] I think this is particularly worth flagging up because I can’t be the only student whose experience lay with PHP (etc.) programming rather than system administration. It would probably have been worth thinking about this earlier and coming up with a considered plan of attack.

GSoC update

No blogposts for a week: but what have I been up to?

Well, the short answer is “mostly code review”: getting my code tidied up, polished, submitted for review, reviewed and merged.

It’s not glamorous, and significant rewrites can take a great deal of time. But it’s working: I’m closing in on getting Translate’s translatesvg branch merged, and code for TranslateSvg proper is beginning to filter through the review system.

I’ve also been tweaking a few bits and pieces as a result of the testing phase, which began a week ago.

TranslateSvg v2.0 beta testing begins

After a short delay while I sorted out a Wikimedia Labs account, I am pleased to announce that version 2.0 of the TranslateSvg extension is officially available for testing.

TranslateSvg enables the easy translation of virtually any (currently 93.1%, but increasing all the time) SVG image containing text, with the result embedded into the SVG file so that graphical updates instantly propagate to all language versions.

Available for testing are three images to give you a feel for the interface. There’s likely to be one future change – the introduction of an extra dialog box – but it’s 99% feature complete. Well, until *you* tell me what’s wrong with it 🙂

So what are you waiting for? Find ten minutes and get yourself to http://translatesvg.wmflabs.org/wiki/Main_Page .

From 87% to 100%

A week ago I posted about how TranslateSvg can handle 87% of all translatable files (in fact, the figure is probably now at around 84.5% due to some methodology tweaks). I haven’t been working on that much since, but I did run an analysis of how to get from there to 100%, a move related to my original analysis of the structures I need to support. The breakdown, then, is as follows:

  • 84.5% – already supported
  • 7% – ability to look inside style tags
  • 3.5% – supporting random clutter inside text tags, not sure what this might be yet
  • 2.8% – support for existing switches with deep hierarchies
  • 2.2% – support for nested tspans

Looks like I’ll be working on some of those, then, over the next fortnight.
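To make a couple of those categories concrete, here is a minimal Python sketch of how one might flag files containing nested tspans or pre-existing switch elements. It is an illustration only, not TranslateSvg's actual analysis: the function name `unsupported_features` and the category labels are hypothetical, and it only checks whether a switch is present, not how deep its hierarchy goes.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def tag(name):
    """Return the namespace-qualified tag name ElementTree uses for SVG."""
    return f"{{{SVG_NS}}}{name}"


def unsupported_features(svg_source):
    """Return a set of (hypothetical) category labels found in the SVG."""
    root = ET.fromstring(svg_source)
    found = set()
    # A tspan that has another tspan as a direct child counts as nested.
    for tspan in root.iter(tag("tspan")):
        if any(child.tag == tag("tspan") for child in tspan):
            found.add("nested tspans")
    # Any <switch> at all is reported; depth is not examined in this sketch.
    if next(root.iter(tag("switch")), None) is not None:
        found.add("existing switch")
    return found


NESTED = (
    f'<svg xmlns="{SVG_NS}"><text>'
    "<tspan>South <tspan>Sudan</tspan></tspan>"
    "</text></svg>"
)
PLAIN = f'<svg xmlns="{SVG_NS}"><text><tspan>Juba</tspan></text></svg>'

print(unsupported_features(NESTED))
print(unsupported_features(PLAIN))
```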

UPDATE (30 July): Currently

  • 93.1% – already supported
  • 3.0% – support for existing switches with deep hierarchies
  • 2.5% – support for nested tspans
  • 0.8% – IDs used in CSS
  • 0.6% – supporting random clutter inside text tags, still not sure what this might be

UPDATE (11 August): Currently, and probably for the foreseeable future

  • 96.0% – already supported
  • 2.3% – support for nested tspans inside tspans (these don’t actually render correctly on Wikimedia wikis anyway).
  • 0.95% – support for random clutter inside text tags (mostly textPath plus some custom namespace tags)
  • 0.75% – IDs used in CSS

TranslateSvg: what’s it for?

Public domain map of South Sudan

As my Google Summer of Code project progresses, I realise that I haven’t got any blog posts actually explaining what TranslateSvg is for. Thus, I thought I should at least give one example (there are many I could have picked from) to illustrate the point, so here goes.

On 9 July 2011, South Sudan declared independence. A year on, 142 Wikipedias have created some sort of entry about it, many of them during the initial buzz. I haven’t checked, but I suspect a high proportion haven’t really been edited since.

Several months before, an Italian Wikimedian created a map showing the likely borders of the new nation and its proposed state boundaries. Sometimes with the aid of an existing tool, that map was then translated into other languages, among them English, Greek, Catalan and even Macedonian. These copies were then uploaded onto Wikimedia Commons as separate files.

So far, so good. But South Sudan is a state in its infancy. It has numerous boundary disputes ongoing, and no-one really knows if the state boundaries have been drawn in the ideal places. Thus, one would expect the map to change significantly over the next decade – if it has not changed already. More often than not, these kinds of change are picked up first by editors of the larger projects, who rapidly update their own versions of the map. To do so takes, say, 20 minutes; but to replicate that same change across Catalan, Greek, Macedonian? Hours of work – and dozens of separate uploads. So, editors being volunteers and all that, they tend to only update the language(s) they care about. Unfortunately, this means that image versions can become horribly out of sync, normally to the disadvantage of the smaller wikis.

TranslateSvg changes this workflow, firstly by making it easier to translate files (thus reducing the all-too-common sight of English-language diagrams in use on non-English wikis), and secondly by embedding the new translations within the same SVG file. Thus, when boundaries change, a single update will propagate to all language versions instantly (if you’re worried about how Inkscape handles these, don’t be: you’ll simply see one set of translations on the screen at any one time, and you can even move that label around, thus nudging labels in every language at the same time).
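For the curious, here is a rough Python sketch of what “embedding translations within the same SVG file” can look like. It assumes the translations are stored using the standard SVG `switch` element with `systemLanguage` attributes; the function `embed_translations` and its input format are hypothetical, and this is not the extension's own (PHP) code.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise without an "ns0:" prefix

TEXT = f"{{{SVG_NS}}}text"
SWITCH = f"{{{SVG_NS}}}switch"


def embed_translations(svg_source, translations):
    """Wrap each translatable <text> in a <switch> with one copy per language.

    `translations` maps an original label to {language code: translation}.
    """
    root = ET.fromstring(svg_source)
    # ElementTree has no parent pointers, so collect (parent, child) pairs
    # up front, before the tree is modified.
    pairs = [
        (parent, child)
        for parent in root.iter()
        for child in list(parent)
        if child.tag == TEXT and parent.tag != SWITCH
    ]
    for parent, original in pairs:
        per_language = translations.get((original.text or "").strip())
        if not per_language:
            continue
        switch = ET.Element(SWITCH)
        for lang, label in per_language.items():
            variant = copy.deepcopy(original)  # keeps x, y, styling, etc.
            variant.tail = None
            variant.set("systemLanguage", lang)
            variant.text = label
            switch.append(variant)
        index = list(parent).index(original)
        parent.remove(original)
        switch.tail, original.tail = original.tail, None
        switch.append(original)  # untagged original goes last, as fallback
        parent.insert(index, switch)
    return ET.tostring(root, encoding="unicode")


SOURCE = f'<svg xmlns="{SVG_NS}"><text x="10" y="20">South Sudan</text></svg>'
RESULT = embed_translations(
    SOURCE, {"South Sudan": {"fr": "Soudan du Sud", "ca": "Sudan del Sud"}}
)
print(RESULT)
```

Because every language copy sits next to the original inside a single file, a change to the map's graphics only has to be made once.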

I think that’s pretty nifty, and I hope you do too 🙂

GSoC update

Over the past few days I’ve been busy upgrading the parser built into TranslateSvg, such that ~87% of all SVG files with strings in them can now be translated — up from ~75% before the upgrade.

More importantly, the parser is now of a kind that could support up to 100%, whereas the old one was effectively tied to 75%.

GSoC: Phase 3 complete

Today, I reached a turning point in my Google Summer of Code project: the tentative completion of phase 3.

This means that, as of today, my local copy of TranslateSvg can take a freshly uploaded SVG file and shepherd the translation from beginning to end.

Well, with a 75% probability it can, anyway 🙂

Still, there’s lots left to do: documentation, testing, bugfixes, a wizard, a colorpicker and code review, to name just some of it.

But good news nonetheless.

GSoC: Midterm assessments

Yes, it’s coming up to mid-July, or, as it’s known to Google Summer of Code students, mid-term assessment time. This is something of a misnomer for me personally – I’m only about a third of the way through my own scheduled hours on the project – but it’s nevertheless a good time to take a step back and survey the scene.

The original project plan consisted largely of five parts: three main “phases” plus an introduction and a wrapup. At this point, I’m more or less where I should be: the introduction and phases 1 and 2 are completed; phase 3 and the wrapup are not yet started.

You can see what it means to say “phases 1 and 2 completed” by taking a look at this video (you may need to turn your sound up), which follows a user (me pretending to be French) translating a file into his/her own language. The interface is borrowed from the Translate extension; a wizard or guide to make it more intuitive to newbies is in the works, as is the addition of a “color” property to help with recolouring text after translation.

Reuse onwiki for this visitor is now as simple as [[File:Picturebook 1.svg|thumb|lang=fr|Caption.]].
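How a `lang=fr` request might be resolved is sketched below in Python. It assumes translations live in an SVG `switch` element and applies a simplified version of the standard `systemLanguage` rule (the first matching child wins, and a child without the attribute acts as the fallback). The helper names are hypothetical, and the real rendering is done by the wiki's SVG renderer rather than by code like this.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def matches(system_language, lang):
    """Simplified systemLanguage test for a single preferred language."""
    codes = [code.strip() for code in system_language.split(",")]
    # A listed code matches if it equals lang or extends it ("en-GB" for "en").
    return any(code == lang or code.startswith(lang + "-") for code in codes)


def pick_variant(switch, lang):
    """Return the first child of <switch> that applies to lang, else None."""
    for child in switch:
        condition = child.get("systemLanguage")
        # A child with no systemLanguage has no condition, so it acts as
        # the fallback when placed last.
        if condition is None or matches(condition, lang):
            return child
    return None


SAMPLE = f"""
<switch xmlns="{SVG_NS}">
  <text systemLanguage="fr">Soudan du Sud</text>
  <text systemLanguage="ca">Sudan del Sud</text>
  <text>South Sudan</text>
</switch>
"""

switch = ET.fromstring(SAMPLE.strip())
print(pick_variant(switch, "fr").text)  # French label
print(pick_variant(switch, "mk").text)  # no Macedonian copy, so the fallback
```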

TranslateSvg currently supports about 76% of all translatable SVG files; once the basic import structure is complete (i.e. sometime in the next week or so), I’ll then have time to start pushing that up towards 99%.

GSoC update

Now that my exams are finally over, I can turn my attention back towards my Google Summer of Code project, TranslateSvg.

The first step is to fix up phase 1, taking on board the feedback I have received (including via code review). In particular, Niklas’ insightful comments about the relationship between Translate and TranslateSvg have prompted me to siphon off the new code into a separate extension – albeit one dependent on the presence of Translate.

After that, it’ll be on to phases 2 and 3 – import and export of SVG files.

GSOC – Week 3/4

Progress was slightly slower this week, but phase 1 of the project is still well on target for a Berlin demo:

  • Get Translate to work on an already established message group, loading properties from wiki pages
  • Get Translate to work on an already established message group, saving properties back to wiki pages
  • Implement static thumbnail for Special:Translate
  • Suppress documentation for SVG images
  • Implement static thumbnail for individual translation page
  • Steal file description from file description page
  • Create message files in .i18n.php