At the time of writing, there are 538,152 SVG images on Commons, approximately 123 gigabytes’ worth. Analysing them all was evidently going to be too big a task, so instead I selected 10,000 at random (based, in fact, on the first character of their SHA1 hash – in this case, ‘m’) to test.
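For anyone wanting to reproduce that kind of selection, here is a rough sketch. It assumes the hash is the base-36 form of the SHA1 digest, zero-padded to 31 characters (which, as far as I know, is how MediaWiki stores it, and which would explain how a hash can begin with ‘m’ at all). The names `sha1_base36` and `in_sample` are made up for illustration.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def sha1_base36(data: bytes, width: int = 31) -> str:
    """Return the SHA1 of `data` in base 36, zero-padded to `width` characters.

    Assumption: this mirrors the stored hash format; a 160-bit digest
    never needs more than 31 base-36 digits.
    """
    n = int.from_bytes(hashlib.sha1(data).digest(), "big")
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(BASE36_DIGITS[remainder])
    return "".join(reversed(digits)).rjust(width, "0")


def in_sample(hash36: str, bucket: str = "m") -> bool:
    """Keep a file only if its hash starts with the chosen character."""
    return hash36.startswith(bucket)
```

Because the hash depends only on the file contents, picking one leading character gives an arbitrary but repeatable slice of the collection, from which a fixed number of files can then be taken.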
Of those 10,000, 71% do not include a single <text> tag; the flipside is that 29% do, or, to put it another way, TranslateSvg has the potential to allow for internationalisation of roughly 156,000 files Commons-wide.
11.5% of SVG files (~40% of files with any strings at all) have between 1 and 10 strings. As the number of strings increases, the frequency tends to decrease, with the notable exception of the 7.5% of all SVG files which include exactly 16 strings (that’s not an error, by the way). The 20 files with the most strings in my sample ranged from 189 strings to a massive 815 strings. In any case, just 0.6% of all SVG files – or 1 in 50 translatable files – include over 100 strings.
In total, I extracted some 57,805 strings from my sample, suggesting the existence of some 3 million <text> tags on Commons, each of which could be translated. We can, of course, look more closely at what comprises those strings. (I should note that the following ignores attributes, and – because I wasn’t expecting self-closing <text /> tags – might suffer from a slight rate of error.)
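The scaling-up behind those Commons-wide figures is simple proportion. As a quick sketch using only the numbers quoted in this post (the variable and function names are just for illustration):

```python
TOTAL_SVGS = 538_152        # SVG files on Commons at the time of writing
SAMPLE_SIZE = 10_000        # files in the sample
SHARE_WITH_TEXT = 0.29      # fraction of sampled files with at least one <text> tag
STRINGS_IN_SAMPLE = 57_805  # strings extracted from the sample


def scale_up(sample_count, sample_size=SAMPLE_SIZE, population=TOTAL_SVGS):
    """Extrapolate a count from the sample to the whole collection."""
    return sample_count * population / sample_size


translatable_files = SHARE_WITH_TEXT * TOTAL_SVGS
estimated_strings = scale_up(STRINGS_IN_SAMPLE)

print(f"{translatable_files:,.0f} files with text")  # roughly 156,000
print(f"{estimated_strings:,.0f} strings")           # roughly 3.1 million
```

This takes the sample as representative of Commons as a whole, so the results are best read as ballpark figures.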
Nevertheless, I can say that slightly over half of those strings look a bit like “<text><tspan>…</tspan></text>”, which is coincidentally Inkscape’s default (despite the fact there’s no reason I know of to use <tspan>s like that). A further quarter, give or take, use plain ol’ <text> syntax. 8.7% consist of multiple pairs of <tspan>s back to back (a relatively sane construction).
Of the wackier constructions, 2.5% choose to nest <tspan>s, whilst 1.5% of all <text> tags have no visible content whatsoever. A handful of people managed to use the <textPath> tag in their files, which I can’t see a good way of supporting in TranslateSvg.
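To make those categories concrete, here is a minimal sketch of how <text> tags could be sorted into them using Python’s standard `xml.etree.ElementTree`. The category labels and the function names `classify_text` and `count_constructions` are hypothetical, and, like the figures above, the sketch ignores attributes entirely.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "{http://www.w3.org/2000/svg}"


def local(tag: str) -> str:
    """Strip the namespace from a tag such as '{http://...}tspan'."""
    return tag.rsplit("}", 1)[-1]


def classify_text(elem) -> str:
    """Put one <text> element into a rough structural category."""
    children = list(elem)
    names = [local(c.tag) for c in children]
    # No visible characters anywhere inside the element.
    if not "".join(elem.itertext()).strip():
        return "empty"
    if any(local(d.tag) == "textPath" for d in elem.iter()):
        return "textPath"
    if not children:
        return "plain"
    # A <tspan> that directly contains another <tspan>.
    if any(local(g.tag) == "tspan"
           for c in children if local(c.tag) == "tspan"
           for g in c):
        return "nested tspans"
    if names == ["tspan"]:
        return "single tspan"
    if all(n == "tspan" for n in names):
        return "multiple tspans"
    return "other"


def count_constructions(svg_source: str) -> Counter:
    """Count the category of every <text> element in one SVG document."""
    root = ET.fromstring(svg_source)
    return Counter(classify_text(t) for t in root.iter(SVG_NS + "text"))
```

Run over every file in a sample, the summed counters would give a breakdown of the same shape as the percentages above, though the exact numbers depend on where the category boundaries are drawn.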
Okay, so the above isn’t that interesting by itself, but it’s going to inform the design choices I make with TranslateSvg, to ensure it handles the full variety of constructions properly and optimises in the right places. Hurray for research 🙂