Martin J. Dürst
2012-05-07 23:45:25 UTC
This post refers to the document at
http://tools.ietf.org/html/draft-ietf-iri-bidi-guidelines-02 .
I have a number of comments on specific clauses in the document, but it
is more urgent to agree or disagree on the general principles on which
the document is based.
A. First of all, we should agree on whose problems the document is
supposed to solve. This is not stated in the document, but I see three
classes of "users":
- Site administrators who create IRIs
- Consumers who see IRIs in print (on paper, on bus sides, etc.) or on
screen
- Implementers who have to implement the rules.
The main requirements stated in the document are:
1. user-predictable conversion between visual and logical
   representation;
2. the ability to include a wide range of characters in various parts of
   the IRI; and
3. minor or no changes or restrictions for implementations.
The first requirement is for the benefit of consumers, the second one
for administrators, and the third one for implementers.
If I were to set the priorities, I would say that the first concern is
for consumers reading IRIs on paper or on a bus side, then for consumers
seeing IRIs on screen, with the requirement that IRIs should appear
identically on paper and everywhere on screen, whether in a browser or
in an application where they can be part of plain text.
The current document does not completely satisfy its own first
requirement, since the visual IRI "http://abc.123.FED" can be
interpreted equally reasonably as the logical IRI "http://abc.123.DEF"
or "http://abc.DEF.123".
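To make the ambiguity concrete, here is a minimal Python sketch of a toy
reordering model. It is not a full UBA implementation. It assumes an LTR
paragraph, uses the usual convention that uppercase letters stand for
RTL letters, treats all punctuation as neutral, and applies only
simplified versions of rules W7, N1/N2, I1 and L2. The names bidi_class
and visual_order are made up for this illustration.

```python
def bidi_class(ch):
    """Toy classification (assumption): uppercase = RTL letter."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "EN"
    return "ON"  # simplification: all punctuation is treated as neutral


def visual_order(logical):
    """Reorder a logical string for display in an LTR paragraph."""
    cls = [bidi_class(c) for c in logical]

    # W7 (simplified): a number whose nearest preceding strong character
    # is L (or the start of the text) becomes L.
    last_strong = "L"
    for i, c in enumerate(cls):
        if c in ("L", "R"):
            last_strong = c
        elif c == "EN" and last_strong == "L":
            cls[i] = "L"

    def direction(k):
        # Outside the text we use the paragraph direction (L).
        # Remaining numbers count as R when resolving neutrals.
        if k < 0 or k >= len(cls):
            return "L"
        return "L" if cls[k] == "L" else "R"

    # N1/N2: neutrals take the direction of their neighbours when both
    # sides agree, otherwise the embedding direction (L).
    i = 0
    while i < len(cls):
        if cls[i] != "ON":
            i += 1
            continue
        j = i
        while j < len(cls) and cls[j] == "ON":
            j += 1
        before, after = direction(i - 1), direction(j)
        cls[i:j] = [before if before == after else "L"] * (j - i)
        i = j

    # I1: resolve levels relative to the even base level 0.
    levels = [{"L": 0, "R": 1, "EN": 2}[c] for c in cls]

    # L2: from the highest level down to 1, reverse each maximal run of
    # characters whose level is at least that level.
    chars = list(logical)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = j
    return "".join(chars)


for logical in ("http://abc.123.DEF", "http://abc.DEF.123"):
    # Both lines print the same visual string: http://abc.123.FED
    print(logical, "->", visual_order(logical))
```

In this model both logical strings produce the same visual string, so a
reader who only sees the visual form cannot tell which one was meant.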
It does not satisfy the third requirement either, since it states that
IRIs must be rendered as if within an LTR embedding, which is a kind of
special treatment.
B. The document seems to hesitate between handling IRIs with the UBA
transparently for the application (i.e. the application does not have to
do anything special for displaying IRIs) and special handling.
On the one hand, it says "Bidirectional IRIs MUST be rendered by using the
Unicode Bidirectional Algorithm", so it seeks transparency. On the other
hand, it says "Bidirectional IRIs MUST be rendered in the same way as
they would be if they were in a left-to-right embedding; i.e., as if
they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and
followed by U+202C, POP DIRECTIONAL FORMATTING (PDF).", which means
special handling.
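As a rough illustration of what that special handling amounts to for an
application, the following Python sketch wraps an IRI in the two
formatting characters named in the quoted sentence. The function name
wrap_ltr_embedding is hypothetical, and the sketch only shows the
wrapping step. The actual reordering is still left to whatever UBA
implementation renders the text.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_ltr_embedding(iri: str) -> str:
    """Return the IRI preceded by LRE and followed by PDF."""
    return f"{LRE}{iri}{PDF}"


wrapped = wrap_ltr_embedding("http://abc.DEF.123")
# Show the names of the first and last code points of the result.
print(unicodedata.name(wrapped[0]), "/", unicodedata.name(wrapped[-1]))
```

An application that has to do this for every IRI it displays is no
longer treating IRIs as ordinary text, which is the point made above.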
Another paragraph states:
"To make sure that it does not affect the rendering of bidirectional
IRIs too much, some restrictions on bidirectional IRIs are necessary.
These restrictions are given in terms of delimiters (structural
characters, mostly punctuation such as "@", ".", ":", and "/") and
components (usually consisting mostly of letters and digits)."
The document does not specify what the announced restrictions are (and
the reference to RFC3987bis does not clarify anything, for me at least).
My guess is that the authors are in favor of some special handling that
would prevent interference between components (what appears between
delimiters), but this is not detailed, and of course that would harm the
transparency requirement.
In fact, what is sorely missing is a precise definition of how an IRI
with domain, path, fragment and query all potentially including RTL
characters should be displayed. The problem is that currently there is
no consensus on that matter. Since the target is not clearly painted,
the arrow does not know where to go.
C. So I see two possible avenues:
1) IRIs are handled transparently. This is ideal for implementers. Then
some more restrictions should be placed on IRI creators to make sure
that the IRI on a bus side can be interpreted unambiguously. The
restrictions may not be enforceable for path and query, but this is not
critical, since the IRI on a bus side will typically be short and not
include these parts. IRIs on screen can hopefully be clicked on, or
copied and pasted into the address line of a browser, and will not be
typed manually.
2) IRIs are handled specially. This allows displaying IRIs according to
whatever rules are agreed upon, including separating the components in
path, fragment and query parts. This puts a burden on implementers who
must identify IRIs within plain text, but many applications already do
this in order to allow clicking on IRIs. The difficult part here will be
to get a consensus on how to display mixed LTR/RTL IRIs.
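For the second avenue, the burden on implementers could look roughly
like the Python sketch below: find IRI-like spans in plain text and give
each one its own LTR embedding. The pattern is deliberately crude and is
my own assumption, not something taken from the document. It only
recognizes http and https and stops at whitespace, so trailing
punctuation would be swallowed. The name isolate_iris is hypothetical.

```python
import re

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Crude stand-in for real IRI detection (assumption for this sketch).
IRI_PATTERN = re.compile(r"https?://\S+")


def isolate_iris(text: str) -> str:
    """Wrap every IRI-like span in the text with LRE ... PDF."""
    return IRI_PATTERN.sub(lambda m: f"{LRE}{m.group(0)}{PDF}", text)


# repr() makes the otherwise invisible formatting characters visible.
print(repr(isolate_iris("see http://abc.DEF for details")))
```

Any agreed display rules for the individual components would have to be
applied inside such a span, and that is the part with no consensus yet.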
I think that the discussion above should be resolved before commenting
on finer points of the document.
Shalom (Regards), Mati