View Issue Details

ID: 0001001
Project: luatex
Category: texlua bug
View Status: public
Last Update: 2017-12-31 17:23
Reporter: wspr
Assigned To: Hans Hagen
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: no change required
Summary: 0001001: String lengths with unicode.utf8.format is not unicode-aware
Description: LuaTeX's built-in string.format function uses the number of bytes when calculating string lengths, so it can't be used to format Unicode strings correctly when a desired "string length" is specified with something like "%4s".

unicode.utf8.format inherits this problem, surprisingly, although unicode.utf8.len DOES calculate Unicode string lengths as expected.
Steps To Reproduce:
s = "‡"
print(unicode.utf8.len(s))
-- "1", as expected

print(unicode.utf8.format("string: [%-4s]",s))
-- "string: [‡ ]": the field is only 2 chars wide, not 4, because "‡" is 3 bytes in UTF-8; same as string.format
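
The byte arithmetic behind this output can be checked in plain Lua (5.2 or later), without the unicode library. This is a minimal sketch: utf8_count is a hypothetical helper written for this illustration, not a LuaTeX function, and "‡" is spelled out as explicit UTF-8 bytes so the result does not depend on the file encoding.

```lua
-- "‡" (U+2021) as explicit UTF-8 bytes: E2 80 A1
local s = "\xE2\x80\xA1"

-- Hypothetical helper: count characters by counting every byte that is
-- not a UTF-8 continuation byte (0x80-0xBF). Assumes valid UTF-8 input.
local function utf8_count(str)
    local _, n = string.gsub(str, "[^\x80-\xBF]", "")
    return n
end

-- string.format pads to a width of 4 bytes, so a 3-byte string gets
-- exactly one space of padding.
local formatted = string.format("[%-4s]", s)

print(#s)                    -- 3 (bytes)
print(utf8_count(s))         -- 1 (character)
print(#formatted)            -- 6 (bytes: "[", 3 bytes, 1 space, "]")
print(utf8_count(formatted)) -- 4 (characters: "[", "‡", " ", "]")
```

The field between the brackets is therefore 4 bytes but only 2 characters wide, which matches the output reported above.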
Tags: No tags attached.

Activities

Hans Hagen

2017-12-31 17:23

manager   ~0001679

Patching such a core helper would break compatibility (besides, %20s can reasonably be read as meaning 20 bytes).

It's no big deal to write a helper that does the padding:

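-- Plain Lua version: counts UTF-8 characters with a pattern
-- (the \x escapes need Lua 5.2 or later).
function string.utfpadd(s,n)
    if not n or n == 0 then
        return s
    end
    local l = 0 -- number of UTF-8 characters seen so far
    local p = 1 -- byte position where the next search starts
    while true do
        -- one lead byte (ASCII or 0xC2-0xF4) plus its continuation bytes
        local _, u = string.find(s,"[\0-\x7F\xC2-\xF4][\x80-\xBF]*",p)
        if not u then
            break
        end
        l = l + 1
        p = u + 1
    end
    if n > 0 then
        return string.rep(" ",n-l) .. s -- right-align: pad on the left
    else
        return s .. string.rep(" ",-n-l) -- left-align: pad on the right
    end
end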

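-- Alternative: the same helper using LuaTeX's string.utflength
-- (this definition replaces the one above).
function string.utfpadd(s,n)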
    if not n or n == 0 then
        return s
    end
    local l = string.utflength(s) -- luatex extension to string
    if n > 0 then
        return string.rep(" ",n-l) .. s
    else
        return s .. string.rep(" ",-n-l)
    end
end

print(string.format("%30s[]","xxaxx"))
print(string.format("%30s[]","xx½xx"))
print(string.utfpadd("xxaxx", 30) .. "[]")
print(string.utfpadd("xx½xx", 30) .. "[]")

print(string.format("%-30s[]","xxaxx"))
print(string.format("%-30s[]","xx½xx"))
print(string.utfpadd("xxaxx",-30) .. "[]")
print(string.utfpadd("xx½xx",-30) .. "[]")
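
The difference between the two kinds of padding can also be checked numerically. The following is a sketch under stated assumptions: utf_len and utf_pad are hypothetical stand-alone names chosen so the block runs in plain Lua 5.2 or later without LuaTeX, utf_pad follows the same sign convention as the helper above, and a width of 8 is used instead of 30 to keep the numbers small.

```lua
-- Hypothetical helper: count characters as bytes that are not UTF-8
-- continuation bytes (0x80-0xBF). Assumes valid UTF-8 input.
local function utf_len(s)
    local _, n = string.gsub(s, "[^\x80-\xBF]", "")
    return n
end

-- Hypothetical stand-in for the padding helper: positive n right-aligns,
-- negative n left-aligns, width counted in characters.
local function utf_pad(s, n)
    if not n or n == 0 then
        return s
    end
    local l = utf_len(s)
    if n > 0 then
        return string.rep(" ", n - l) .. s
    else
        return s .. string.rep(" ", -n - l)
    end
end

local half = "xx\xC2\xBDxx" -- "xx½xx": 6 bytes, 5 characters

local by_bytes = string.format("%8s[]", half) -- pads to 8 bytes
local by_chars = utf_pad(half, 8) .. "[]"     -- pads to 8 characters

print(utf_len(by_bytes)) -- 9: only 7 characters before "[]"
print(utf_len(by_chars)) -- 10: 8 characters before "[]"
```

With the byte-based string.format the "[]" marker ends up one column to the left for every extra byte in the string, while the character-based padding keeps it aligned.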

Issue History

Date Modified     Username    Field                Change
2017-12-28 06:52  wspr        New Issue
2017-12-31 17:23  Hans Hagen  Note Added: 0001679
2017-12-31 17:23  Hans Hagen  Assigned To          => Hans Hagen
2017-12-31 17:23  Hans Hagen  Status               new => assigned
2017-12-31 17:23  Hans Hagen  Status               assigned => closed
2017-12-31 17:23  Hans Hagen  Resolution           open => no change required