Common HTML named entity handled badly within JSX (.tsx) files #47030

@tshinnic

Bug Report

I've encountered a known HTML named entity that TSC does not recognize when it appears in a React JSX file (aka .tsx). Instead, it is emitted unchanged into the HTML page in its original entity text form.

Specifically, using &numsp; within JSX code does not work: the literal text "&numsp;" is displayed in the browser window. &numsp; does work when placed directly within index.html, and the numeric form &#x2007; works everywhere.

Looking at the intersection between Unicode "General Punctuation" (2000–206F) sections "Spaces" (2000-200A) and "Format Characters" (200B-200F) chart@Unicode...

and the list of known HTML entity names table@whatwg...

and looking at the definition of HTML entity names known to TSC TypeScript/src/compiler/transformers/jsx.ts...

TSC has 7 of these named HTML entities defined:

        ensp: 0x2002,
        emsp: 0x2003,
        thinsp: 0x2009,
        zwnj: 0x200C,
        zwj: 0x200D,
        lrm: 0x200E,
        rlm: 0x200F,

I have used 6 of these in projects.

The other known HTML entity names for 'spaces' are:

    2004   emsp13
    2005   emsp14
    2007   numsp
    2008   puncsp
    200A   hairsp
    200B   ZeroWidthSpace

I have used the last 4 in projects as well, the last only for experiments.
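To illustrate why a missing name ends up on the page verbatim, here is a rough sketch of a table-driven decoder. It is not the actual code in jsx.ts: the names knownEntities, fromCodePointSketch, and decodeEntitiesSketch are hypothetical, and the table holds only the seven names quoted above. The assumption is simply that a name not found in the table is passed through unchanged.

```typescript
// Hypothetical, simplified model of a name-to-code-point entity table.
// Only the seven names quoted above are included; this is not the real jsx.ts table.
const knownEntities: Record<string, number> = {
    ensp: 0x2002,
    emsp: 0x2003,
    thinsp: 0x2009,
    zwnj: 0x200C,
    zwj: 0x200D,
    lrm: 0x200E,
    rlm: 0x200F,
};

// Convert a code point to a string, building a surrogate pair above U+FFFF.
function fromCodePointSketch(codePoint: number): string {
    if (codePoint <= 0xFFFF) {
        return String.fromCharCode(codePoint);
    }
    const offset = codePoint - 0x10000;
    return String.fromCharCode(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
}

// Replace &#xhex;, &#decimal; and &name; references.
// A name that is not in the table is returned exactly as written.
function decodeEntitiesSketch(text: string): string {
    return text.replace(/&(?:#x([0-9a-fA-F]+)|#([0-9]+)|(\w+));/g, (match, hex, dec, name) => {
        if (hex !== undefined) {
            return fromCodePointSketch(parseInt(hex, 16));
        }
        if (dec !== undefined) {
            return fromCodePointSketch(parseInt(dec, 10));
        }
        const isKnown = Object.prototype.hasOwnProperty.call(knownEntities, name);
        return isKnown ? fromCodePointSketch(knownEntities[name]) : match;
    });
}
```

Under this model, decodeEntitiesSketch("&thinsp;") yields U+2009, while decodeEntitiesSketch("&numsp;") returns the text "&numsp;" untouched, which matches the behavior reported here.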

I believe the list of entities in jsx.ts has not changed since that file was created (rbuckton committed on Feb 16, 2016; ca. line 232).

And above I have identified named entities missing from only two very small sections within Unicode.

It is certainly true that the workaround for entity names missing from TSC is to use numeric entity references, such as &#x2007;, but that raises usability concerns: the numeric forms are cryptic and confusing to read.
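One way to soften the readability cost of that workaround, sketched here as an assumption rather than anything TSC provides, is to give the missing characters names on the JavaScript side and use them in JSX expression containers. The Spaces object and its keys are hypothetical; the code points are the ones listed above.

```typescript
// Hypothetical named constants for the space characters whose entity names
// are missing from the TSC table (code points taken from the list above).
const Spaces = {
    emsp13: "\u2004",
    emsp14: "\u2005",
    numsp: "\u2007",
    puncsp: "\u2008",
    hairsp: "\u200A",
    ZeroWidthSpace: "\u200B",
} as const;

// In a .tsx file these would appear as expressions, e.g. <p>!{Spaces.numsp}!</p>
// Plain string concatenation stands in for the JSX text here so the sketch
// runs without React.
const sample = "!" + Spaces.numsp + "!";
```

This keeps the source readable without relying on entity decoding at all, at the cost of an extra import wherever the characters are needed.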

I am wondering what policy you would use in deciding whether to include or not include additional entity names. I am hoping you can be somewhat less severe than WHATWG's "additions are bad" stance.

In any case, documentation somewhere stating that TSC does not handle 'every' HTML named entity would be useful. Examples include "CounterClockwiseContourIntegral", "leftrightsquigarrow", and "angrtvbd".

🔎 Search Terms

entity numsp thinsp HTML

🕗 Version & Regression Information

TSC 4.5.2

Inspecting the source on GitHub shows this code section in jsx.ts has not changed in (4?) years.

🙁 Actual behavior

The HTML entity &numsp;, when present in a JSX file, is echoed unchanged to the web page and displayed there as the literal text &numsp;

🙂 Expected behavior

Use of &numsp; should have exactly the same result as &#x2007;. Instead, JSX (.tsx) source containing, for example,

      !&numsp;!&#x2007;!
      !&thinsp;!&#x2009;!

appears in browser window as

!&numsp;! ! ! ! !

which is not surprising given the generated JS code has:

children:[Object(s.jsx)(p,{}),Object(s.jsx)("br",{}),"!&numsp;!\u2007! !\u2009!\u2009!"]

Labels: Working as Intended (the behavior described is the intended behavior; this is not a bug)
