Restrictions on domain names for Top Level Domains (TLDs) for bidi document
Larry Masinter
2012-02-21 06:20:00 UTC
By the way,

I've read over your notes about RTL-TLD, and I'm uncertain how this might be reflected
in the IRI document or even the BIDI document except perhaps as an informational reference
to the IDN specification.

While it's interesting, it doesn't seem to add or remove any restrictions on what a
"legal" IRI is, or how to process an IRI -- or am I missing something?

Larry

===========
*Restrictions on domain names for Top Level Domains (TLDs)*
*Definition:* Right-To-Left Top Level Domains (RTL-TLDs). These are
top-level domains in languages written with right-to-left
characters; namely, every character that makes up the TLD has the
Unicode bidi class R or AL (see UAX #9).
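The definition can be checked mechanically. The Python sketch below (the language is chosen only for illustration) uses the standard library's `unicodedata.bidirectional`, which returns a character's UAX #9 bidi class. It assumes the TLD is passed as a Unicode U-label rather than an `xn--` A-label; the function name `is_rtl_tld` is hypothetical.

```python
import unicodedata

# Bidi classes that make a TLD "right-to-left" per the definition above
# (UAX #9: R = strong right-to-left, AL = Arabic letter).
RTL_CLASSES = {"R", "AL"}


def is_rtl_tld(tld: str) -> bool:
    """Return True if every character of the (U-label) TLD has bidi class R or AL."""
    return bool(tld) and all(
        unicodedata.bidirectional(ch) in RTL_CLASSES for ch in tld
    )
```

For example, a Hebrew-script TLD such as `"\u05e7\u05d5\u05dd"` or an Arabic-script TLD such as `"\u0645\u0635\u0631"` qualifies, while `"com"` does not.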
Because an IRI must always be rendered left-to-right (see section 2), there
exist a number of cases where an RTL-TLD will render in a way that is
http://abc.def.LKJ/IHG
In the above case, the path appears right after the registered domain, in
the visual position of the TLD. This can confuse the reader as to which
label is the actual TLD. In order to restrict such confusing cases the
1. An RTL-TLD is a TLD in a language whose characters are written
right to left. An LTR-TLD is a TLD in a language whose characters
are written left to right.
2. The characters in an RTL-TLD MUST always be of the same Unicode
bidi class.
3. The characters of a registered domain MUST match the Unicode bidi
class of the TLD if the TLD is an RTL-TLD.
4. If the characters of a registered domain contain more than one bidi
class, the domain MUST be registered under an LTR-TLD.
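Rules 2 to 4 can be sketched as a registration check. This is one reading of the draft, not a normative algorithm: it assumes that "registered domain" means the label immediately to the left of the TLD, that both labels are Unicode U-labels, and that an LTR-TLD is a TLD whose characters are all of bidi class L. The names `bidi_classes` and `check_registration` are hypothetical.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}


def bidi_classes(label: str) -> set[str]:
    """Return the set of Unicode bidi classes used by the characters of a label."""
    return {unicodedata.bidirectional(ch) for ch in label}


def check_registration(registered: str, tld: str) -> list[str]:
    """Check a registered-domain label against rules 2-4 of the draft.

    Returns a list of violations; an empty list means the registration is allowed.
    """
    tld_classes = bidi_classes(tld)
    reg_classes = bidi_classes(registered)
    # Definition: an RTL-TLD uses only R or AL characters.
    tld_is_rtl = bool(tld_classes) and tld_classes <= RTL_CLASSES
    # Assumption: an LTR-TLD is one made only of class-L characters.
    tld_is_ltr = tld_classes == {"L"}

    problems = []
    if tld_is_rtl and len(tld_classes) > 1:
        problems.append("rule 2: RTL-TLD mixes bidi classes")
    if tld_is_rtl and len(tld_classes) == 1 and reg_classes != tld_classes:
        problems.append("rule 3: registered domain does not match the TLD's bidi class")
    if len(reg_classes) > 1 and not tld_is_ltr:
        problems.append("rule 4: name with several bidi classes must be under an LTR-TLD")
    return problems
```

Read strictly, rule 3 would also reject an Arabic-script name (class AL) under a Hebrew-script TLD (class R), because it requires the same bidi class, not merely the same direction.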
These MUST restrictions guarantee that the registered domain and its
corresponding TLD always appear together and in the same order in
all possible IRIs. Otherwise, there may be cases where numbers and
bidi-neutral characters are reordered by the Unicode bidi algorithm in
a way that changes their visual position relative to the TLD; the above
rules prevent such cases. If the domain registrar needs to register a
name that contains characters of mixed direction (e.g. numbers,
punctuation or LTR characters), the domain can still be registered
under a TLD that has left-to-right characters.
http://IHG.FED.CBA/jkl
B. With an LTR second-level domain there is a sub-optimal case where
the path appears next to the subdomain. But in this case it is still
http://abc.LKJ/IHG.FED
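The rendered examples in these notes can be reproduced with a deliberately simplified model of the Unicode Bidirectional Algorithm for a single left-to-right paragraph. It assumes the usual convention in IRI bidi examples that uppercase ASCII letters stand for right-to-left characters and lowercase letters for left-to-right ones; the logical-order inputs used below are a reconstruction, since the notes show only rendered forms. The sketch resolves runs of neutral characters with UAX #9 rules N1/N2 and then reverses each right-to-left run. It rejects digits and non-ASCII input, because number handling (rules W1 to W7) is not modelled. `strong_direction` and `visual_order` are hypothetical names.

```python
def strong_direction(ch: str):
    """'R' for uppercase (stand-in for RTL letters), 'L' for lowercase, None for neutrals."""
    if "A" <= ch <= "Z":
        return "R"
    if "a" <= ch <= "z":
        return "L"
    return None


def visual_order(logical: str) -> str:
    """Toy bidi reordering of an ASCII string inside a left-to-right paragraph."""
    if not logical.isascii() or any(ch.isdigit() for ch in logical):
        raise ValueError("only ASCII letters and punctuation are modelled")
    dirs = [strong_direction(ch) for ch in logical]
    resolved = list(dirs)
    n = len(logical)
    i = 0
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        # Paragraph start/end count as L (sos/eos of an LTR paragraph).
        before = dirs[i - 1] if i > 0 else "L"
        after = dirs[j] if j < n else "L"
        # N1: neutrals between same-direction strong characters take that direction;
        # N2: otherwise they take the paragraph (embedding) direction, here L.
        fill = before if before == after else "L"
        for k in range(i, j):
            resolved[k] = fill
        i = j
    # With only levels 0 (L) and 1 (R), rule L2 reduces to reversing each R run.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and resolved[j] == resolved[i]:
            j += 1
        run = logical[i:j]
        out.append(run[::-1] if resolved[i] == "R" else run)
        i = j
    return "".join(out)
```

For example, `visual_order("http://abc.def.GHI/JKL")` returns `"http://abc.def.LKJ/IHG"`: the `/` between two right-to-left runs is itself resolved right-to-left, so the TLD and the path merge into one reversed run and the path lands where the TLD is expected.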
Martin J. Dürst
2012-02-21 08:43:01 UTC
Hello Larry,
Post by Larry Masinter
By the way,
I've read over your notes about RTL-TLD, and I'm uncertain how this might be reflected
in the IRI document or even the BIDI document except perhaps as an informational reference
to the IDN specification.
I think the problem that Adil describes happens when IDNs get integrated
into IRIs. An adjacent path component can make a bidi IDN that was
reasonably understandable on its own look quite different and difficult
to parse.

I have thought about this case little by little, and my current thinking
is that it might lead to restrictions very similar to those we already
have at the level of an individual component (e.g. a DNS label), but one
level higher, e.g. for the whole domain name, the whole path, and so on.
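The "one level higher" idea can be sketched by taking a label-level check and running it over the whole domain name. The sketch assumes the label-level restriction meant here is the Bidi Rule of RFC 5893 (IDNA2008) and implements its six conditions over a string; applying it to the full dotted name, with `.` treated as just another CS character, is a hypothetical reading of the proposal, not something the thread specifies. RFC 5893 applies the rule only within domain names that contain at least one RTL label; the sketch skips that detail. `satisfies_bidi_rule` and `check_domain` are hypothetical names.

```python
import unicodedata

# Allowed bidi classes from RFC 5893, Section 2 (conditions 2 and 5).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def satisfies_bidi_rule(s: str) -> bool:
    """Check the six conditions of the RFC 5893 Bidi Rule against a string."""
    classes = [unicodedata.bidirectional(ch) for ch in s]
    # Condition 1: the first character must be L, R or AL.
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False
    # Conditions 3 and 6 look at the last character before any trailing NSMs.
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if classes[0] in {"R", "AL"}:
        return (
            all(c in RTL_ALLOWED for c in classes)         # condition 2
            and last in {"R", "AL", "EN", "AN"}            # condition 3
            and not ("EN" in classes and "AN" in classes)  # condition 4
        )
    # Conditions 5 and 6 for an LTR string.
    return all(c in LTR_ALLOWED for c in classes) and last in {"L", "EN"}


def check_domain(domain: str) -> dict:
    """Apply the rule per label (as today) and, hypothetically, to the whole name."""
    return {
        "each_label": all(satisfies_bidi_rule(label) for label in domain.split(".")),
        "whole_name": satisfies_bidi_rule(domain),
    }
```

A Hebrew label followed by `.com` passes label by label but fails as a whole name, which is the kind of mixed-direction combination a whole-name restriction would catch.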

Another aspect is that Adil looked at the domain name first and foremost
because it's the component most vulnerable to spoofing.
Post by Larry Masinter
While it's interesting, it doesn't seem to add or remove any restrictions on what a
"legal" IRI is, or how to process an IRI -- or am I missing something?
To some extent, that may be a wording issue. Even if we don't want to
make such cases invalid, a strong warning may be in order.

Regards, Martin.
Post by Larry Masinter
Larry
===========
*Restrictions on domain names for Top Level Domains (TLDs)*
*Definition:* Right-To-Left Top Level Domains (RTL-TLDs). These are
top-level domains in languages written with right-to-left
characters; namely, every character that makes up the TLD has the
Unicode bidi class R or AL (see UAX #9).
Because an IRI must always be rendered left-to-right (see section 2), there
exist a number of cases where an RTL-TLD will render in a way that is
http://abc.def.LKJ/IHG
In the above case, the path appears right after the registered domain, in
the visual position of the TLD. This can confuse the reader as to which
label is the actual TLD. In order to restrict such confusing cases the
1. An RTL-TLD is a TLD in a language whose characters are written
right to left. An LTR-TLD is a TLD in a language whose characters
are written left to right.
2. The characters in an RTL-TLD MUST always be of the same Unicode
bidi class.
3. The characters of a registered domain MUST match the Unicode bidi
class of the TLD if the TLD is an RTL-TLD.
4. If the characters of a registered domain contain more than one bidi
class, the domain MUST be registered under an LTR-TLD.
These MUST restrictions guarantee that the registered domain and its
corresponding TLD always appear together and in the same order in
all possible IRIs. Otherwise, there may be cases where numbers and
bidi-neutral characters are reordered by the Unicode bidi algorithm in
a way that changes their visual position relative to the TLD; the above
rules prevent such cases. If the domain registrar needs to register a
name that contains characters of mixed direction (e.g. numbers,
punctuation or LTR characters), the domain can still be registered
under a TLD that has left-to-right characters.
http://IHG.FED.CBA/jkl
B. With an LTR second-level domain there is a sub-optimal case where
the path appears next to the subdomain. But in this case it is still
http://abc.LKJ/IHG.FED