Pdc and fancy quotes / smart quotes


I ran into an issue today with fancy quotes in Lua source files. It appears the Playdate compiler (pdc) is trying to be "helpful" by silently pre-processing “smart quote” characters, translating them into regular double quotes.

As a result, putting those UTF-8 characters inside a string literal in a Lua source file (e.g. "chicken “salad” sandwich") generates a compile-time error, even though it is valid Lua code. Womp womp.

The characters in question:

U+201C “ Left double curved quote (8220 in decimal)
U+201D ” Right double curved quote (8221 in decimal)

Steps to reproduce:

# echo 'print("“")' > a.lua
# pdc a.lua
error: a.lua:1: unfinished string near '")'
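The error message makes sense if you assume pdc swaps each curly quote for an ASCII `"` before compiling: `print("“")` becomes `print(""")`, i.e. an empty string followed by a string that never closes. Here's a minimal sketch that models that assumed substitution in stock Lua 5.3+ and compiles the result with `load`. `simulate_pdc` is a hypothetical helper, a model of the behavior rather than pdc's actual implementation.

```lua
-- Hypothetical model of the substitution pdc appears to perform (an assumption, not pdc's code).
local function simulate_pdc(src)
  local out = src:gsub("\u{201C}", '"'):gsub("\u{201D}", '"')
  return out  -- keep only the string, drop gsub's replacement count
end

-- Same contents as a.lua above, including the trailing newline that echo adds.
local original = 'print("\u{201C}")\n'

-- Stock Lua compiles the original without complaint...
assert(load(original))

-- ...but the "translated" source no longer parses.
local translated = simulate_pdc(original)
print(translated)                          --> print(""")
local chunk, err = load(translated, "=a.lua")
print(chunk, err)                          -- expect nil plus an "unfinished string" error
```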

If you want to use fancy quotes with your fonts, here's a workaround:

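-- local text = "chicken “salad” sandwich"  -- fails to compile under pdc
-- Build the string from code points instead of raw curly-quote characters:
local text = table.concat({
  "chicken ",
  utf8.char(8220), "salad", utf8.char(8221),  -- U+201C and U+201D
  " sandwich"
})
-- font: a playdate.graphics.font loaded earlier, e.g. with playdate.graphics.font.new()
font:drawText(text, 1, 1)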
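Since Lua 5.3, string literals also accept `\u{XXXX}` escapes, and decimal byte escapes like `\226\128\156` work in any version. Both keep the source file pure ASCII, so, assuming pdc only rewrites the raw curly-quote characters (I haven't checked its internals), they should pass through untouched and avoid the `table.concat` dance. A minimal sketch:

```lua
-- \u{201C} / \u{201D} are Lua 5.3+ escapes for the curly quotes; the source stays ASCII.
local text = "chicken \u{201C}salad\u{201D} sandwich"

-- Same string via decimal byte escapes (UTF-8 of U+201C is E2 80 9C, U+201D is E2 80 9D).
local same = "chicken \226\128\156salad\226\128\157 sandwich"

-- Both match the string built with utf8.char in the workaround above.
local built = table.concat({ "chicken ", utf8.char(8220), "salad", utf8.char(8221), " sandwich" })
assert(text == same and text == built)
```

The resulting `text` can be passed to `font:drawText` exactly as before.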



I'd love to see Panic deprecate this pdc behavior and instead, like vanilla Lua, simply report an error when it encounters a raw smart quote outside a string literal. I also understand this is very low priority, so it likely won't change.
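Until then, one way to catch these early is a small pre-build check that flags raw curly quotes in a source file. This is a hypothetical helper (`find_curly_quotes` and `check_quotes.lua` are my names, not part of the Playdate SDK) meant for desktop Lua 5.3+, since it reads files with the standard `io` library. It reports line and column (counted in code points) and, unlike the behavior proposed above, flags quotes everywhere, including inside string literals and comments.

```lua
-- Hypothetical pre-build check: locate raw U+201C / U+201D characters in Lua source text.
local LEFT, RIGHT = 0x201C, 0x201D

local function find_curly_quotes(src)
  local hits = {}
  local line, col = 1, 0
  -- utf8.codes walks code points; it raises an error if src is not valid UTF-8.
  for _, cp in utf8.codes(src) do
    if cp == 10 then             -- "\n" starts a new line
      line, col = line + 1, 0
    else
      col = col + 1
      if cp == LEFT or cp == RIGHT then
        hits[#hits + 1] = { line = line, col = col, cp = cp }
      end
    end
  end
  return hits
end

-- Usage with desktop Lua: lua check_quotes.lua main.lua
local path = arg and arg[1]
if path then
  local f = assert(io.open(path, "rb"))
  local src = f:read("a")
  f:close()
  for _, h in ipairs(find_curly_quotes(src)) do
    print(string.format("%s:%d:%d: raw U+%04X", path, h.line, h.col, h.cp))
  end
end
```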


I found an old thread which acknowledges this behavior of pdc:

While debugging the problematic strings character by character, I also discovered that Lua's string.sub(s, i, j) is not UTF-8 aware: its indices are byte positions, so it can't be used directly to iterate character by character over non-ASCII strings.
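To illustrate: slicing with `string.sub` can cut a curly quote in half, leaving an invalid fragment. `utf8.offset` (Lua 5.3+) converts a code-point index into a byte position, which is enough for a code-point-aware substring. `usub` below is a hypothetical helper, a minimal sketch assuming `1 <= i <= j <= utf8.len(str)`:

```lua
local s = "\u{201C}a\u{201D}"   -- curly-quoted "a", written with escapes

print(#s)                          --> 7  (bytes: 3 + 1 + 3)
print(utf8.len(s))                 --> 3  (code points)
print(s:sub(1, 1) == "\u{201C}")   --> false: only the first byte (0xE2) of the quote

-- Hypothetical helper: substring by code-point index instead of byte index.
local function usub(str, i, j)
  local first = utf8.offset(str, i)                    -- byte where code point i starts
  local after = utf8.offset(str, j + 1) or (#str + 1)  -- byte where code point j+1 starts
  return str:sub(first, after - 1)
end

print(usub(s, 1, 1) == "\u{201C}")   --> true
print(usub(s, 2, 3) == "a\u{201D}")  --> true
```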

Here is an example of how to properly iterate over a UTF-8 string in Lua, one code point at a time:

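-- Print each code point of str with its index, byte offset, character and U+ value
function utf8_debug(str)
    local n = 0
    print("utf8_debug:", str)
    print("pos", "byt", "chr", "codepoint")
    print("---", "---", "---", "---------")
    -- utf8.codes yields (byte position, code point) pairs; it errors on invalid UTF-8
    for p, c in utf8.codes(str) do
        n = n + 1
        print(n, p, utf8.char(c), string.format("U+%04x (%d)", c, c):upper())
    end
end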

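-- U+00A0 (non-breaking space) is invisible in source, so build it explicitly
local text = table.concat({ utf8.char(8220), "\"' ", utf8.char(160), utf8.char(8221) })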
utf8_debug(text)

Which produces the following helpful output:

utf8_debug:	“"'  ”
pos	byt	chr	codepoint
---	---	---	---------
1	1	“	U+201C (8220)
2	4	"	U+0022 (34)
3	5	'	U+0027 (39)
4	6	 	U+0020 (32)
5	7	 	U+00A0 (160)
6	9	”	U+201D (8221)
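The `byt` column jumps from 1 to 4 and from 7 to 9 because UTF-8 encodes U+201C in three bytes and U+00A0 in two. A byte-level dump makes that visible, which is handy when you suspect a tool has rewritten a string. `hex_bytes` is a hypothetical helper, sketched with only `string.byte` and `string.format`:

```lua
-- Hypothetical helper: render every byte of str as two-digit uppercase hex.
local function hex_bytes(str)
  local out = {}
  for i = 1, #str do
    out[#out + 1] = string.format("%02X", str:byte(i))
  end
  return table.concat(out, " ")
end

print(hex_bytes(utf8.char(8220)))  --> E2 80 9C
print(hex_bytes(utf8.char(160)))   --> C2 A0
print(hex_bytes("\"'"))            --> 22 27
```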