Discussion:
[iri] #121: BIDI: Some users are requiring right-to-left label ordering.
iri issue tracker
2012-03-13 16:05:02 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.

BIDI section 2 requires adding embedding marks, which force a "western"
left-to-right ordering of labels. I have requirements from customers,
including government customers, that require a right-to-left ordering of
labels in at least some cases.
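The embedding marks in question are Unicode directional formatting characters. As a minimal Python sketch, assuming the marks simply wrap the whole IRI for display (the helper `embed_iri` is hypothetical, not taken from the draft), the LTR wrapping the draft requires and the RTL alternative requested here differ only in the opening character:

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (closes either embedding)


def embed_iri(iri: str, rtl: bool = False) -> str:
    """Wrap an IRI in a directional embedding for display only.

    Hypothetical helper: LRE...PDF gives the "western" left-to-right label
    order; RLE...PDF is the right-to-left alternative. The characters of the
    IRI itself, and therefore its stored form, are left untouched.
    """
    opener = RLE if rtl else LRE
    return opener + iri + PDF


if __name__ == "__main__":
    # Confirm that these code points are embeddings rather than overrides.
    for ch in (LRE, RLE, PDF):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.bidirectional(ch)})")
```

Because the wrapping is display-only, stripping the first and last character recovers the stored IRI unchanged.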

This seems to be a user preference, with, perhaps, a strong
language bias.

Specifically, how is a user reading an Arabic domain name from the side of
the bus over a phone going to read it? And how will the person on the end
of the phone type it? My investigation shows that native speakers will
prefer reading a domain name from the right in BIDI contexts.
--
----------------------------+----------------------------------------------
Reporter: shawnste@… | Owner: draft-ietf-iri-bidi-guidelines@…
Type: defect | Status: new
Priority: major | Milestone:
Component: bidi- | Version:
guidelines | Keywords:
Severity: - |
----------------------------+----------------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-14 06:14:08 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.


Comment (by duerst@…):

Shawn, thanks for creating this issue. Can you give more details about
your customers' requirements? For example, is right-to-left ordering meant
to work per component or per run? At what point should a mixed IRI (one
with both RTL and LTR components) be displayed right-to-left (e.g. even if
only a single component, such as a single path component (directory), is
RTL)? Are there details that vary per "customer", and if so, what?

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:1>
iri issue tracker
2012-03-14 17:47:10 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.


Comment (by shawnste@…):

The primary concern would be a simple domain name, even without http:// :)
Of course an IRI needs to be consistent with that.

The customers have been focused primarily on the domain portion. By the
time we look at the query string they've "lost interest". So RTL in the
domain should probably force reading order.

Interestingly, however, the key indicator isn't the domain itself, but
rather the context/mindset of the user. If they're dealing with Arabic,
they may expect the URL to render labels from right-to-left, even if it's
entirely ASCII! Specifically, if the browser's UI language is Arabic, or
if the Address Field is in Right To Left Reading order, this expectation
increases.

The bias also seems to be cultural and/or experience related. A software
engineer in one country who majored in math may feel more comfortable
with left-to-right behavior than a person in another country who is not
computer or math focused.

I know it doesn't help this RFC, but keying off the address box
directionality might be good. In a document, keying off the primary
document language might work. That doesn't provide the consistency
necessary here.

I don't think that "any RTL means all-RTL" works very well, because a
simple Arabic query string to Bing probably doesn't mean that the address
needs to be flipped. Any RTL within the domain portion (or local part of an
email address) probably does indicate that the labels should be ordered
from Right to Left.
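One way to make this "primary portion" test concrete, as a Python sketch under stated assumptions: a character counts as strong right-to-left if its Unicode bidi class is R or AL, and the primary portion is taken to be the host of an IRI (scheme optional) or, for a bare email address, the whole address so that both the local part and the domain are examined. The function names are hypothetical.

```python
import unicodedata
from urllib.parse import urlsplit


def has_strong_rtl(text: str) -> bool:
    """True if text contains a strong RTL character (bidi class R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def primary_portion(ref: str) -> str:
    """Return the part whose script decides label order (assumed rules).

    - A bare email address (contains '@', no '/'): the whole address,
      so both the local part and the domain are examined.
    - Anything else: the authority (host) component. Scheme-less names
      such as 'www.example.com' are prefixed with '//' so that urlsplit
      parses them as a host rather than as a path.
    """
    if "@" in ref and "/" not in ref:
        return ref
    target = ref if "://" in ref else "//" + ref
    return urlsplit(target).netloc


def labels_should_be_rtl(ref: str) -> bool:
    """Hypothetical rule: RTL label order iff the primary portion has RTL."""
    return has_strong_rtl(primary_portion(ref))
```

With this rule an Arabic query string after an ASCII host (for example a search URL on www.bing.com) does not flip the labels, while an all-Arabic host does.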

I realize that following these rules may end up with behavior that is
"fuzzier" than some are comfortable with, however the goal here is human
readable (by the 90%, not engineers). Machines and engineers already know
how to "read" it, we've got byte order if nothing else; our biases should
not impact the "see a domain name on the side of the bus and type it into
my phone" case.

In summary: Follow the order of the address box if the user sets that. If
there is no other context, any RTL in the primary portion (eg domain) of
the IRI should trigger RTL ordering of the labels. EG: put the whole
thing in right to left marks instead of left to right marks.

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:2>
iri issue tracker
2012-03-28 04:20:59 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.

Changes (by adil@…):

* keywords: => bidi
* status: new => closed
* resolution: => wontfix


Comment:

Shawn, being one of the people that wants to see Arabic URLs flowing right
to left I fully understand what you are saying. I have gone around in
circles a few times with this and I concluded that this version of the
Bidi-IRI document is not where we should resolve the issue.

Firstly, internet addresses are only a subset of the uses of IRIs, and I need to
take into account the general purpose of the IRI. IRIs are rendered by a
wide variety of devices that have only a few things in common. The primary
concern is that the IRI is consistent on all these devices when it
contains bidi characters.

Secondly, a full solution to getting URLs to render readably right-to-left
requires either a modification to the Unicode bidi algorithm (which
Mark Davis proposed) or a restriction to the characters that can be used
for registering right-to-left domain names (e.g. only allow Arabic
alphabetic characters in an Arabic domain name). Both of these cases are
out of the scope of this document.

I think what is needed (independently of this document) is a specification
for URLs that are safe to be drawn right-to-left. Then, if a browser
recognizes a safe URL it can draw the URL right-to-left without concern.
This specification can be advertised to domain name registrars and web
companies. In theory we could then have the Googles and Facebooks of this
world using and advertising URLs that are right to left.
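As an illustration only, here is one possible reading of such a "safe to draw right-to-left" rule, following the example above of allowing only Arabic alphabetic characters: a host qualifies if every label is non-empty and consists solely of Arabic letters (bidi class AL), Arabic-Indic digits (AN) and combining marks (NSM). The rule and the names are hypothetical; no such specification is assumed to exist.

```python
import unicodedata

# Hypothetical rule: bidi classes permitted inside a label of a host that
# a browser could safely lay out right to left.
ALLOWED_BIDI_CLASSES = frozenset({"AL", "AN", "NSM"})


def is_rtl_safe_host(host: str) -> bool:
    """Return True if every dot-separated label of host uses only Arabic
    letters, Arabic-Indic digits or combining marks (hypothetical rule)."""
    labels = host.rstrip(".").split(".")  # tolerate a trailing root dot
    return all(
        label and all(unicodedata.bidirectional(ch) in ALLOWED_BIDI_CLASSES for ch in label)
        for label in labels
    )
```

Under this rule a mixed host such as مثال.com is rejected, which is exactly the case where run-based reordering becomes hard to predict.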

I am setting this issue as won't fix but if you disagree please comment
here and I will reopen it.
--
---------------------------+-----------------------------------------------
Reporter: shawnste@… | Owner: draft-ietf-iri-bidi-guidelines@…
Type: defect | Status: closed
Priority: major | Milestone:
Component: bidi- | Version:
guidelines | Resolution: wontfix
Severity: - |
Keywords: bidi |
---------------------------+-----------------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:3>
iri issue tracker
2012-03-28 05:15:21 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.


Comment (by shawnste@…):

Well, I disagree with pretty much every point :)
* clearly everything won't be consistent because plain text that doesn't
know how to detect an IRI isn't going to behave as expected.
* I think that the importance isn't consistency between devices, but
rather the ability for users to consistently transcribe the IRI. That
includes not only display on devices, but input through whatever keyboards
from sticky notes that were transcribed by hand from an IRI on the side of
a bus.
* Related, I don't think they can be "unnatural".
* There's a lot of pressure to ensure that RTL domains are "correctly"
rendered in RTL fashion. So I think we'd achieve better consistency
if the guidelines took that into account instead of having software
developers trying to do something "better" in an inconsistent fashion.
* Though fixing the BIDI Algorithm would help, it's not required. Indeed,
the proposed behavior uses bidi override marks to get the desired
behavior. The same thing can be done for RTL. Granted a better BIDI
algorithm for IRIs would make "plain text" better, but it’s not required.
* As noted, this isn’t necessarily easily gleaned from the script(s) being
used, as some cultural and user preferences also influence it.

I disagree that there’s anything particularly interesting about “safe”. I
think that as long as the sections are consistently from left to right or
right to left it doesn’t matter whether it’s drawn http://www.microsoft.com
or com.microsoft.www//:http. Indeed, if that were the user preference,
independent of the actual script, then they’d always be consistent for
that user. If there does prove to be a spoofing problem with
http://www.spoof.me.com/com.microsoft.www//:http type things, those are
fairly easy for malware filters to detect. Also 90% of users can’t tell
that http://www.microsoft.safe-secure.com isn’t a great place to enter a
credit card #. At the machine level, the rendering is irrelevant since
it’s always stored the same way.
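To make the two renderings concrete, the following Python sketch produces the mirrored form from the logical one by reversing the sequence of labels and delimiters while leaving each label's characters alone; delimiter runs such as "://" are reversed character by character, which is how com.microsoft.www//:http arises. It simulates, as a plain left-to-right string, what a consistent right-to-left label order would look like on screen. The helper name is hypothetical, and the stored IRI is never changed.

```python
import re

# A token is either a run of label characters or a run of delimiters.
TOKEN = re.compile(r"[^./:]+|[./:]+")
DELIMITERS = "./:"


def mirror_components(iri: str) -> str:
    """Return the component-mirrored display form of an IRI (hypothetical).

    Labels keep their internal character order; only their sequence is
    reversed. Delimiter runs are reversed character by character, so
    '://' becomes '//:'.
    """
    tokens = TOKEN.findall(iri)
    return "".join(
        tok[::-1] if tok[0] in DELIMITERS else tok
        for tok in reversed(tokens)
    )


if __name__ == "__main__":
    print(mirror_components("http://www.microsoft.com"))  # prints com.microsoft.www//:http
```

Applying the function twice returns the original string: either direction preserves the order of the parts, it only changes which end the reader starts from.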

I really need a way, even optional if need be, of rendering for RTL
before I can "sign off" on this draft :)

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:4>
Larry Masinter
2012-03-28 22:30:56 UTC
My read on the situation:

It would be helpful if we could get some agreed text describing the nature of the problem --
it sounds to me that there might be agreement on the problem (more or less) ,
just not on whether there are feasible (partial) solutions.

If we have agreement on the problem statement, then we can:

* document partial solutions (with caveats)
* say we don't believe there are any feasible solutions at this time

It would be useful also to get a survey of what current implementations actually are doing now, along with some concrete examples of the nature of the problems.
Post by iri issue tracker
I really need a way, even optional if need be, of rendering for RTL before I can "sign off" on this draft :)
There's no magic, just "rough consensus and running code":

* if all of the implementations agree, then we can document that.
* If there are multiple implementations currently, we can try to pick one.
* if we don't like any of the implementations, we can say so.
* If there are no implementations or even demos or samples of implementations, we shouldn't hold our breath hoping one will appear.

Larry
--
http://larry.masinter.net
Shawn Steele
2012-03-28 22:50:50 UTC
IE currently isn't great, getting into the mixed-up situations we all know are undesirable.

A "concrete example" seems hard, but one that I'm keen on is a partial web name on the side of a bus, in Arabic, eg: CCC.BBB.AAA. Note that I'm intentionally leaving out the http:// and any default.html or whatever. I have a difficult time imagining any Arabic speaker copying that onto a notepad other than by writing from right to left. I also expect that they would then naturally type it the same way they wrote it. I think we have to build from there, that's how 90% of the people use an IRI. Nobody's going to type the http://, particularly in Arabic, because it requires a keyboard change, and the browser will add it for them.

In those 90% useful cases there is no mixed Latin/Arabic, it's just a domain name. It's nice if we present mixed up stuff a little more orderly, but nobody cares about the part after the domain name.

I believe that we need to allow the same thing we have with LTR ordering, except for RTL. Where it gets confusing to me is when you choose LTR or RTL behavior. A few options seem possible:

* User Preference
* System/Application Preference (eg: I'm looking at an Arabic web site, so I'll show RTL labels. I'm looking at an English web site, I'll show LTR labels).
* If there are any RTL characters, do the whole thing as RTL
* Restrict the RTL/LTR test to the primary part of the IRI, eg: domain.
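The four options can be compared side by side. A minimal Python sketch, assuming strong RTL means bidi class R or AL and that the "primary part" is the host; the enum and function names are hypothetical, and how a browser would obtain the user or context direction is left open:

```python
import unicodedata
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit


class Policy(Enum):
    USER_PREFERENCE = "user"   # fixed choice by the user, e.g. address-box direction
    CONTEXT = "context"        # direction of the surrounding site or application
    ANY_RTL = "any-rtl"        # any strong RTL character anywhere flips the IRI
    RTL_IN_HOST = "host-rtl"   # only strong RTL in the host (domain) flips it


def _has_strong_rtl(text: str) -> bool:
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def labels_rtl(iri: str, policy: Policy,
               user_rtl: Optional[bool] = None,
               context_rtl: bool = False) -> bool:
    """Decide whether the IRI's labels should be ordered right to left."""
    if policy is Policy.USER_PREFERENCE:
        if user_rtl is None:
            raise ValueError("USER_PREFERENCE needs an explicit user_rtl")
        return user_rtl
    if policy is Policy.CONTEXT:
        return context_rtl
    if policy is Policy.ANY_RTL:
        return _has_strong_rtl(iri)
    # RTL_IN_HOST: prefix '//' so that scheme-less names parse as a host.
    host = urlsplit(iri if "://" in iri else "//" + iri).netloc
    return _has_strong_rtl(host)
```

For an Arabic query string on an ASCII host, as in the Bing case discussed earlier in the ticket, ANY_RTL flips the labels while RTL_IN_HOST does not; under USER_PREFERENCE even an all-ASCII name can be drawn right to left.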

Caveats are that many of those probably allow homographs in some cases (Maybe not User Preference, since they'd know it'd always be one direction or the other.) I'm not worried about those cases as SmartScreen will easily filter those out if necessary. It'd be harder if we didn't force RTL/LTR on the whole thing (eg: had current BIDI algorithm behavior).

-Shawn
iri issue tracker
2012-03-28 05:40:41 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.

Changes (by duerst@…):

* status: closed => reopened
* resolution: wontfix =>


Comment:

Reopening it for Shawn. We definitely need wider consensus on how to
proceed with this.
--
---------------------------+-----------------------------------------------
Reporter: shawnste@… | Owner: draft-ietf-iri-bidi-guidelines@…
Type: defect | Status: reopened
Priority: major | Milestone:
Component: bidi- | Version:
guidelines | Resolution:
Severity: - |
Keywords: bidi |
---------------------------+-----------------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:5>
iri issue tracker
2012-03-29 07:52:25 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.


Comment (by duerst@…):

(from Larry)

My read on the situation:

It would be helpful if we could get some agreed text describing the nature
of the problem --
it sounds to me that there might be agreement on the problem (more or
less) ,
just not on whether there are feasible (partial) solutions.

If we have agreement on the problem statement, then we can:

* document partial solutions (with caveats)
* say we don't believe there are any feasible solutions at this time

It would be useful also to get a survey of what current implementations
actually are doing now, along with some concrete examples of the nature of
the problems.
Post by iri issue tracker
I really need a way, even optional if need be, of rendering for RTL
before I can "sign off" on this draft :)
There's no magic, just "rough consensus and running code":

* if all of the implementations agree, then we can document that.
* If there are multiple implementations currently, we can try to pick
one.
* if we don't like any of the implementations, we can say so.
* If there are no implementations or even demos or samples of
implementations, we shouldn't hold our breath hoping one will appear.

Larry

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:6>
iri issue tracker
2012-03-29 07:53:25 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.


Comment (by duerst@…):

(from Shawn)

IE currently isn't great, getting into the mixed-up situations we all
know are undesirable.

A "concrete example" seems hard, but one that I'm keen on is a partial web
name on the side of a bus, in Arabic, eg: CCC.BBB.AAA. Note that I'm
intentionally leaving out the http:// and any default.html or whatever. I
have a difficult time imagining any Arabic speaker copying that onto a
notepad other than by writing from right to left. I also expect that they
would then naturally type it the same way they wrote it. I think we have
to build from there, that's how 90% of the people use an IRI. Nobody's
going to type the http://, particularly in Arabic, because it requires a
keyboard change, and the browser will add it for them.

In those 90% useful cases there is no mixed Latin/Arabic, it's just a
domain name. It's nice if we present mixed up stuff a little more
orderly, but nobody cares about the part after the domain name.

I believe that we need to allow the same thing we have with LTR ordering,
except for RTL. Where it gets confusing to me is when you choose LTR or
RTL behavior. A few options seem possible:

* User Preference
* System/Application Preference (eg: I'm looking at an Arabic web site, so
I'll show RTL labels. I'm looking at an English web site, I'll show LTR
labels).
* If there are any RTL characters, do the whole thing as RTL
* Restrict the RTL/LTR test to the primary part of the IRI, eg: domain.

Caveats are that many of those probably allow homographs in some cases
(Maybe not User Preference, since they'd know it'd always be one direction
or the other.) I'm not worried about those cases as SmartScreen will
easily filter those out if necessary. It'd be harder if we didn't force
RTL/LTR on the whole thing (eg: had current BIDI algorithm behavior).

-Shawn

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:7>
iri issue tracker
2012-03-29 08:05:27 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.


Comment (by duerst@…):

Hello Shawn,

Two points of clarification:

- At http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:4, you
write "Indeed, the proposed behavior uses bidi override marks to get the
desired behavior.", but it's not override marks, it's embedding marks.
Otherwise, not a single RTL domain label or path component would be
readable. (maybe that's what you meant, but in that case, please be
careful with terminology)

- At http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:7, you
wrote about partial web names in all-Arabic on the side of a bus, e.g.
CCC.BBB.AAA. In this specific case, the current spec (RFC 3987 and
draft-ietf-iri-bidi-guidelines-02.txt) will do the right thing (because the
Unicode Bidi algorithm reorders by runs, not by components). In that case,
no embedding may be necessary. This is explicitly mentioned:

{{{
Also, a
bidirectional relative IRI reference that only contains strong right-
to-left characters and weak characters (such as symbols) and that
starts and ends with a strong right-to-left character and appears in
a text with right-to-left base directionality (such as used for
Arabic or Hebrew) and is preceded and followed by whitespace and
strong characters does not need an embedding.

}}}
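The difference between embedding and override marks can be shown on a single all-Arabic label. This is a toy Python model, not an implementation of the Unicode Bidirectional Algorithm: it only handles a label consisting entirely of strong RTL characters placed inside a left-to-right wrapper, and returns the characters in on-screen left-to-right order. The function name is hypothetical.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (closes either)


def screen_order(label: str, opener: str) -> str:
    """Left-to-right screen order of an all-RTL label wrapped in opener...PDF.

    Hypothetical toy model: only labels made entirely of strong RTL
    characters (bidi class R or AL) are supported.
    """
    if not label or any(unicodedata.bidirectional(ch) not in ("R", "AL") for ch in label):
        raise ValueError("toy model only handles all-RTL labels")
    if opener == LRE:
        # Embedding: the letters keep their RTL direction, so the label is
        # laid out right to left, i.e. reversed relative to memory order.
        return label[::-1]
    if opener == LRO:
        # Override: every character is forced to LTR and shown in memory
        # order, so a right-to-left reader sees the word backwards.
        return label
    raise ValueError("opener must be LRE or LRO")
```

For an Arabic label, `screen_order(label, LRE)` is the reverse of the stored string, which is what reads correctly from right to left; `screen_order(label, LRO)` equals the stored string, so read from the right it comes out backwards. This is why override marks would leave no RTL label readable.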

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:8>
Shawn Steele
2012-03-29 15:10:58 UTC
Post by iri issue tracker
(maybe that's what you meant, but in that case, please be
careful with terminology)
Yes, sorry.
Post by iri issue tracker
- At http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:7, you
wrote about partial web names in all-Arabic on the side of a bus, e.g.
CCC.BBB.AAA. In this specific case, the current spec (RFC 3987 and
draft-ietf-iri-bidi-guidelines-02.txt) will do the right thing (because the
Unicode Bidi algorithm reorders by runs, not by components). In that case, (...)
I actually got confused a bit and reread the specification. Now I like the behavior even less :)

Our investigation is that the parts of an IRI are treated like a list. If I have a list like (Afra, Joe, Mary, Maysun, Mohamed, Phil), I'm not going to change the order of the list because of my language; I expect it to stay (AFRA, joe, mary, MAYSUN, MOHAMED, phil), not (AFRA, joe, mary, MOHAMED, MAYSUN, phil). (Though I confess to mixing metaphors because I used alphabetization to sort my list, and clearly in different scripts that'd be different. I imagine I'm getting the idea across though; maybe it was an org chart that just so happens to have people arranged alphabetically by transliterated Latin name :)).

Similarly for http://www.microsoft.com/en-us/default.aspx, it's ordered something like a://b.c.d/e/f.g -- A list can keep its order rendered as either a://b.c.d/e/f.g or g.f/e/d.c.b//:a. Which is appropriate depends on the situation, but if we start rearranging the order of the labels it gets really confusing. At that point 99% of the populace would lose all hope of realizing there's an order to an IRI. (Right now few people could correctly parse one anyway, but it'd get way worse.)

IMO, which way the parts are ordered is less important than the fact they're consistently ordered.
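The contrast between run-based reordering and consistent ordering can be reproduced with a deliberately simplified, word-level model of the Unicode Bidirectional Algorithm (UAX #9). Following the convention used in this thread, uppercase ASCII words stand for RTL (e.g. Arabic) labels and lowercase words for LTR ones. Separators are neutral and take the direction of the strong words on both sides when those agree, otherwise the paragraph (base) direction; then, as in UAX #9 rule L2, runs are reversed from the highest embedding level down to 1. Numbers, weak types and explicit marks are ignored, and labels are printed in reading form rather than with their characters reversed, so this is a sketch of the idea rather than of UAX #9 itself. All names are hypothetical.

```python
import re
from typing import List, Optional

# Words (letters/digits) alternate with runs of separator characters.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")


def direction(token: str) -> Optional[str]:
    """'R' for an all-uppercase word, 'L' for other words, None for neutrals.
    Digit-only tokens are treated as neutral, a simplification."""
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return None
    return "R" if all(c.isupper() for c in letters) else "L"


def visual_order(text: str, base: str = "L") -> List[str]:
    """Tokens of text in on-screen left-to-right order (toy model)."""
    tokens = TOKEN.findall(text)
    dirs = [direction(t) for t in tokens]

    # Neutral tokens take the surrounding strong direction when both sides agree.
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((x for x in reversed(dirs[:i]) if x), base)
            after = next((x for x in dirs[i + 1:] if x), base)
            d = before if before == after else base
        resolved.append(d)

    # Embedding levels: LTR paragraph -> L=0, R=1; RTL paragraph -> R=1, L=2.
    if base == "L":
        levels = [1 if d == "R" else 0 for d in resolved]
    else:
        levels = [1 if d == "R" else 2 for d in resolved]

    # From the highest level down to 1, reverse every run at that level or higher.
    order = list(range(len(tokens)))
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]
                i = j
            else:
                i += 1
    return [tokens[k] for k in order]


def is_consistent(text: str, base: str = "L") -> bool:
    """True if the parts appear either in stored order or exactly reversed."""
    logical = TOKEN.findall(text)
    visual = visual_order(text, base)
    return visual == logical or visual == logical[::-1]
```

In this model the all-RTL bus example comes out as a clean reversal, matching the point that the current spec handles it, while the list and a mixed IRI such as http://www.example.com/ALPHA/BETA/gamma.html come out in neither stored nor reversed order, which is the inconsistency described above.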

-Shawn
Slim Amamou
2012-03-29 15:31:30 UTC
I support this view. I'd add that it's acceptable for me if the LTR
order for the components is enforced on IRIs. The other solution is to
state in the RFC that every IRI spec MUST define an overall ordering
for the components either LTR or RTL.

--
Slim Amamou | سليم عمامو
http://alixsys.com
Shawn Steele
2012-03-29 16:09:59 UTC
I disagree (obviously) about enforcing the LTR ordering though.... I think that's up to the situation/user :)

-Shawn
Slim Amamou
2012-03-29 16:18:19 UTC
It can't be the user's choice. It's either LTR or RTL by the specs. Because
if a user from Bahrain on a trip to the UK had to write down a URL
written on a bus in London, he would retranscribe it inverted.

--
Slim Amamou | سليم عمامو
http://alixsys.com
Shawn Steele
2012-03-29 16:43:55 UTC
Yes, it gets complicated. However, while in London, everything else on the bus is in LTR context, while on their computer at home everything is in RTL context. The thinking is more like:

A) An Arabic user will (eventually) use a lot of Arabic domain names, so they'll be used to the ARABIC.WWW://http form.
B) So then http://www.english.com will seem funny to them, if their browser's still aligning stuff to the right, etc.

I think that there are enough other contextual differences when switching languages/travelling that realizing that a URL on a double decker bus in London needs to be handled in the right way isn't that hard. Certainly there's a bigger immediate danger in driving on the wrong side of the road :)

For "us", we have the order the data is stored in. For readers I don't think there's a "perfect" solution that is unambiguous in all cases, so I would prefer to err on the side of having things readable in the user's normal way of thinking.

-Shawn
Martin J. Dürst
2012-04-02 08:23:57 UTC
Hello Slim,
Post by Slim Amamou
I support this view. I'd add that it's acceptable for me if the LTR
order for the components is enforced on IRIs. The other solution is to
state in the RFC that every IRI spec MUST define an overall ordering
for the components either LTR or RTL.
What do you mean by "every IRI spec"? There is only one IRI spec.
Currently, it's RFC 3987, but we are working on an update.

Regards, Martin.
Post by Slim Amamou
On Thu, Mar 29, 2012 at 4:10 PM, Shawn Steele
Post by Shawn Steele
(...)
Our investigation is that the parts of an IRI are treated like a list. If I have a list like (Afra, Joe, Mary, Maysun, Mohamed, Phil), I'm not going to change the order of the list because of my language; I expect it to stay (AFRA, joe, mary, MAYSUN, MOHAMED, phil), not (AFRA, joe, mary, MOHAMED, MAYSUN, phil). (Though I confess to mixing metaphors, because I used alphabetization to sort my list, and clearly in different scripts that'd be different. I imagine I'm getting the idea across though; maybe it was an org chart that just so happens to have people arranged alphabetically by transliterated Latin name :)).
Similarly, http://www.microsoft.com/en-us/default.aspx is ordered something like a://b.c.d/e/f.g. A list can keep its order, rendered as either a://b.c.d/e/f.g or g.f/e/d.c.b//:a. Which is appropriate depends on the situation, but if we start rearranging the order of the labels it gets really confusing. At that point 99% of the populace would lose all hope of realizing there's an order to an IRI. (Right now few people could correctly parse one anyway, but it'd get way worse.)
IMO, which way the parts are ordered is less important than the fact they're consistently ordered.
-Shawn
Slim Amamou
2012-04-02 10:00:51 UTC
Hello Martin,
I meant every scheme spec, sorry.
--
Slim Amamou | سليم عمامو
http://alixsys.com
iri issue tracker
2012-03-29 16:56:32 UTC
#121: BIDI: Some users are requiring right-to-left label ordering.


Comment (by adil@…):

I think I should clarify what I meant:

Right now if you have a 'simple' URL that is all in Arabic it will be
rendered right-to-left even given the restrictions of this document. So,
using the normal bidi notation (capitals for rtl characters):
Logical order:
`http://ABC.DEF.GHI/JKL`
Appears as:
`http://LKJ/IHG.FED.CBA`
Or without the http.. :
`LKJ/IHG.FED.CBA`

This is why I believe the current situation satisfies the 'side of a bus
URL' criteria for a small subset of right-to-left URLs.

The point is to strictly define what that subset is and create tools and
documents to verify it so that web sites and browsers can display them.
Also within this subset of URLs it is possible to have browsers draw these
in the URL bar right-to-left and right aligned. But I do not know if this
document is the place for such a definition.
--
---------------------------+-----------------------------------------------
Reporter: shawnste@… | Owner: draft-ietf-iri-bidi-guidelines@…
Type: defect | Status: reopened
Priority: major | Milestone:
Component: bidi- | Version:
guidelines | Resolution:
Severity: - |
Keywords: bidi |
---------------------------+-----------------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:9>
iri <http://tools.ietf.org/wg/iri/>
Shawn Steele
2012-03-29 17:09:19 UTC
Also within this subset of URLs it is possible to have browsers draw these in the URL bar
right-to-left and right aligned. But I do not know if this document is the place for such a definition.
The document explicitly prohibits alternate renderings, like LKJ/IHG.FED.CBA//:http

I find the current behavior very bad since it treats the IRI like unstructured text. However there is a structure; there's an order to the labels. If we'd never heard of the BIDI algorithm, our first attempt, from a clean slate, to solve this problem would not allow the ordering of the labels to be exchanged. The only reason we're considering that is because we've seen what the Bidi Algorithm does to other text in completely different contexts.

My requirements are:
1) The logical order of the parts MUST be preserved.
2) There MUST be a way for mostly Arabic, etc. IRIs to be rendered right to left.
* So the corollary of 1 & 2 is that the protocol has to go on the right
3) I'd really like a MAY that allows some flexibility for 2; when it's LTR and when it's RTL. I don't think we're going to get it perfect in our first pass.

At a minimum, I'd suggest that any RTL characters in the domain or email local parts should force 2).
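Requirements 1) and 2) together describe mirroring the sequence of components instead of letting the bidi algorithm reorder character runs. The sketch below is illustrative only and rests on assumptions not in the thread: only `:`, `/`, `.`, `?` and `#` count as separators, it uses the same capitals-for-RTL notation as the examples above, and RTL labels are character-reversed purely to show how they would look on screen. `mirror_components` is a hypothetical helper, not something the draft specifies.

```python
import re

# Assumed separator set; a capturing group makes re.split keep the separators.
SEPARATORS = re.compile(r"([:/.?#]+)")


def mirror_components(iri: str) -> str:
    """Render an IRI right to left while keeping the logical order of its parts."""
    tokens = [t for t in SEPARATORS.split(iri) if t]
    mirrored = []
    for tok in reversed(tokens):
        if SEPARATORS.fullmatch(tok):
            mirrored.append(tok[::-1])  # "://" is drawn as "//:"
        elif tok.isupper():
            mirrored.append(tok[::-1])  # RTL label, shown in visual order
        else:
            mirrored.append(tok)  # LTR label stays as typed
    return "".join(mirrored)


print(mirror_components("a://b.c.d/e/f.g"))  # g.f/e/d.c.b//:a
print(mirror_components("http://ABC.DEF.GHI/JKL"))  # LKJ/IHG.FED.CBA//:http
```

It reproduces both component-ordered renderings mentioned in the thread. What it sidesteps is the hard part: getting generic text renderers, which only see characters, to lay out an IRI by component.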
Martin J. Dürst
2012-04-02 02:32:18 UTC
Sorry for the delay in writing this answer.
Post by Shawn Steele
Also within this subset of URLs it is possible to have browsers draw these in the URL bar
right-to-left and right aligned. But I do not know if this document is the place for such a definition.
The document explicitly prohibits alternate renderings, like LKJ/IHG.FED.CBA//:http
Yes, it currently does. I personally don't necessarily think we need to
keep it that strict. But we need to be very sure of what the trade-offs
are, and there are definitely very strong trade-offs.

One thing that may be possible to remove is the condition that the
embedding be LTR, thus also allowing RTL embedding. But I understand
that wouldn't yet make you happy.
Post by Shawn Steele
I find the current behavior very bad since it treats the IRI like unstructured text.
Indeed IRIs are treated like unstructured text, but that may not
necessarily be bad.
Post by Shawn Steele
However there is a structure; there's an order to the labels.
Yes. Some people are very aware of that structure, others aren't.
Post by Shawn Steele
If we'd never heard of the BIDI algorithm, our first attempt, from a clean slate, to solve this problem would not allow the ordering of the labels to be exchanged.
I think that was indeed the case, until we realized that in order to do
that, one of two things is needed:
1) You have to insert Bidi marks into the IRI, which means it's no
longer the same IRI, or
2) You end up with different displays between places that "know" there's
an IRI (e.g. browser address bar) and places that don't
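The first option can be made concrete. Assuming the marks are the Unicode explicit embedding controls LRE (U+202A) or RLE (U+202B) closed by PDF (U+202C), wrapping an IRI for display produces a different character string, so anything that copies the displayed text has to strip the controls before treating it as an identifier again. The helper names are hypothetical, and stripping every such control is only reasonable here because RFC 3987 does not allow bidi formatting characters inside an IRI.

```python
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def embed_for_display(iri: str, rtl: bool = False) -> str:
    """Wrap an IRI in an explicit LTR (default) or RTL embedding, for display only."""
    return (RLE if rtl else LRE) + iri + PDF


def strip_embedding(text: str) -> str:
    """Remove the embedding controls so the original IRI is recovered."""
    return text.replace(LRE, "").replace(RLE, "").replace(PDF, "")


iri = "http://example.com/path"
shown = embed_for_display(iri)
print(shown == iri)  # False: the marks change the string
print(strip_embedding(shown) == iri)  # True
```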
Post by Shawn Steele
The only reason we're considering that is because we've seen what the Bidi Algorithm does to other text in completely different contexts.
Actually, the current solution was proposed by Mati Alluche, and he
argued that it would be possible for people to understand the ordering
because of the heuristics they use when reading mixed text:

Read some text in the main direction; if you meet text in the other
direction, jump to the end of that run of text and read "backwards",
then continue with the text in the main direction. That's a different
heuristic to the one you have used as an equivalent, namely the list
(which the Unicode Bidi Algorithm actually also would "mess up" so that
sequential RTL items would be ordered RTL overall; not sure what people
usually do in these cases, whether they fix it up or not).

Mati said that this would not necessarily help URI/IRI experts, but
might actually be quite easy for non-experts, potentially the easiest
solution (easier than the strict component logical order) for them. I'm
not in a location where I have enough non-IRI-expert average bidi users
around me to test this.
Post by Shawn Steele
1) The logical order of the parts MUST be preserved.
That sounds like a very logical requirement :-). As always in the IETF,
any arguments/data to support that would be very much appreciated (your
list equivalent is certainly counting towards that).
Post by Shawn Steele
2) There MUST be a way for mostly Arabic, etc. IRIs to be rendered right to left.
* So the corollary of 1 & 2 is that the protocol has to go on the right
By protocol, do you mean the scheme name (such as ftp:, mailto:, http:,
https:,...)?
Post by Shawn Steele
3) I'd really like a MAY that allows some flexibility for 2; when it's LTR and when it's RTL.
You mean some flexibility depending on context? We could also make that
"MUST respect context". But then there's the problem that the context of
a side of a bus is rather vague :-).
Post by Shawn Steele
I don't think we're going to get it perfect in our first pass.
We are already at the second pass. The first pass was RFC 3987.
Post by Shawn Steele
At a minimum, I'd suggest that any RTL characters in the domain or email local parts should force 2).
In my personal view, I think that might be overkill. I'm not sure I'd
want everything turned around just because of a few RTL characters. But
if that's what everybody agrees on, I won't stand in the way.


The really tough problem for anything that reorders by component (what
you call 'logical order of parts') is that it may be easy to write a
standard that says so, but it's difficult to implement. Any thoughts
about that?


Regards, Martin.
Shawn Steele
2012-04-02 17:55:22 UTC
Post by Martin J. Dürst
Post by Martin J. Dürst
2) You end up with different displays between places that "know" there's
an IRI (e.g. browser address bar) and places that don't
That's unavoidable. People will follow this RFC or they won't. The Unicode Bidi Algorithm doesn't include this guidance, so plain text will also fail, though some apps may try to be "smarter". For years people will have different browser versions with different behaviors, etc. The UBA is also inconsistently applied, and at inconsistent revisions, so I think it's a bit presumptuous of us to think that anything we specify here could cause consistent rendering by our guidance :)

IMO there's a more general "list" problem with the UBA, and having the UBA address it might be interesting.
Post by Martin J. Dürst
Actually, the current solution was proposed by Mati Alluche, and he
argued that it would be possible for people to understand the ordering
That doesn't match our investigation. That presumes that people read it as trained by the UBA, however when encountering list-like structures, people don't typically apply the UBA. Unfortunately, regardless of the approach, some training of the user community is likely required.
Post by Martin J. Dürst
Post by Martin J. Dürst
1) The logical order of the parts MUST be preserved.
That sounds like a very logical requirement :-). As always in the IETF,
any arguments/data to support that would be very much appreciated (your
list equivalent is certainly counting towards that).
I don't have a formal user study or white paper. This comes from discussions with native bidi speakers (technical, non-technical, and in-between), and also from feedback from the community. This is how we realized that IRIs are best treated like the "list" analogy.

Fortunately 90% of the most common cases are probably a loose domain, like the side of a bus, and those are probably all same-script IRIs.
Post by Martin J. Dürst
Post by Martin J. Dürst
2) There MUST be a way for mostly Arabic, etc. IRIs to be rendered right to left.
* So the corollary of 1 & 2 is that the protocol has to go on the right
By protocol, do you mean the scheme name (such as ftp:, mailto:, http:,
https:,...)?
Post by Martin J. Dürst
3) I'd really like a MAY that allows some flexibility for 2; when it's LTR and when it's RTL.
You mean some flexibility depending on context? We could also make that
"MUST respect context". But then there's the problem that the context of
a side of a bus is rather vague :-).
Not if it's a bus in Cairo, or a bus in Washington DC. Though either is probably going to be a single script.
Post by Martin J. Dürst
Post by Martin J. Dürst
At a minimum, I'd suggest that any RTL characters in the domain or email local parts should force 2).
In my personal view, I think that might be overkill. I'm not sure I'd
want everything turned around just because of a few RTL characters. But
if that's what everybody agrees on, I won't stay in the way.
IMO this is mostly a user preference. "I" would probably prefer the LTR ordering, even for an entirely Arabic IRI, because then I'd be able to understand the parts. E.g., if the ordering were consistent, I could chomp off a subdomain to get to a parent domain, or remove the path part to get to the home page. If the ordering changes in the middle, I'd be unsuccessful.
Post by Martin J. Dürst
The really tough problem for anything that reorders by component (what
you call 'logical order of parts') is that it may be easy to write a
standard that says so, but it's difficult to implement. Any thoughts
about that?
Yes :) I'd be much happier coming up with a behavior that's understandable by 90% of the humans and have problems implementing it, than causing ambiguity for 50% of the population just because it was easy to implement.

We also came up with a couple practical observations:
Many paths are "long". They are also likely mostly ASCII for the foreseeable future. If I render a path with http:// on the left, and an Arabic domain name, then a path on the right, an RTL user with an RTL address bar will have a hard time discovering the domain, which is the most important part of the IRI, because it won't be near the right side of the textbox.

Worse, if the path/query gets long enough, then you have 2 really bad options: either allow the host name to be cropped from the left of the address bar, or clip the path on the RIGHT side, like an LTR textbox, impacting the usability of the RTL app.
Adil Allawi
2012-04-02 22:10:03 UTC
With regard to Shawn's comments: would it be acceptable to say that a
Bidi IRI is only allowed to be ordered RTL if it is drawn with the
protocol (e.g. http://) and in a right-dominant context (i.e. it is not
embedded in a line of Latin text)?

In this way we can allow the RTL alignment with the caveat that the
user needs to be educated on the directional issues, but we would avoid
confusion about the order in which the elements appear, as the
"http://" will act as a visible direction guide.

Adil
Shawn Steele
2012-04-02 22:19:20 UTC
I don't see how that helps very much. A bare domain name by itself that was entirely Arabic would reasonably be ordered from right to left in an Arabic document, even if it didn't have an http. Clearly it'd be a helpful indicator that this was an IRI, though.

I think that following the document's context is reasonable if you're missing other indicators, but I don't think it's possible to completely avoid confusion, if for no other reason than cut & paste from a compliant app to an older app will likely cause differences in display for the same binary representation.

-Shawn

Adil Allawi
2012-04-02 22:31:28 UTC
OK. How about saying that a Bidi IRI is only allowed to be ordered RTL
if it is either
- drawn with the protocol (e.g. http://) and in a right-dominant
context (i.e. it is not embedded in a line of Latin text)
- or that the IRI only contains either neutrals or strong
right-to-left characters.

This way we can be sure that the IRI would not reorder unexpectedly.
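The two conditions above can be written as a predicate. This is a minimal sketch: whether the scheme is drawn and whether the surrounding context is right-dominant are passed in as hypothetical flags (`scheme_shown`, `rtl_context`) rather than detected, and "only neutrals or strong right-to-left characters" is approximated with Python's `unicodedata.bidirectional` as "no character of bidi class L", which is an assumption that also lets weak types such as digits through.

```python
import unicodedata


def may_render_rtl(iri: str, scheme_shown: bool, rtl_context: bool) -> bool:
    """Hypothetical check for the proposed conditions for RTL ordering."""
    # Condition 1: the scheme (e.g. "http://") is drawn in a right-dominant context.
    if scheme_shown and rtl_context:
        return True
    # Condition 2 (approximated): no strong left-to-right character anywhere.
    return all(unicodedata.bidirectional(ch) != "L" for ch in iri)


print(may_render_rtl("مثال.موقع", scheme_shown=False, rtl_context=False))  # True
print(may_render_rtl("http://مثال.موقع", scheme_shown=True, rtl_context=True))  # True
print(may_render_rtl("http://مثال.موقع/index.html", scheme_shown=True, rtl_context=False))  # False
```

As written, the second condition can only hold for an IRI shown without an ASCII scheme, because letters such as `http` have bidi class L.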

The cut and paste is an interesting issue. If we force a single
direction then it would be OK, but that would not solve your problem.

Adil
Shawn Steele
2012-04-02 23:59:07 UTC
I'm not sure what "reorder unexpectedly" means.

Presumably an Arabic speaker that went to an internet site: LABEL2.LABEL1 would definitely NOT expect LABEL1.LABEL2/index.html just because we now have added "index.html" to it. (And LABEL2.LABEL1/index.html is far worse from our investigations).

Your suggestion might make sense for a user who normally only sees LTR text (like me), but for a user who normally sees RTL text, you could argue the opposite: unless there's a strong left-dominant context (e.g., it IS embedded in a line of Latin text), it should be ordered RTL.

-Shawn

-----Original Message-----
From: Adil Allawi [mailto:***@diwan.com]
Sent: Monday, April 2, 2012 3:31 PM
To: Shawn Steele
Cc: public-***@w3.org; "Martin J. Dürst"
Subject: Re: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

OK. How about saying that a Bidi IRI is only allowed to be ordered RTL if it is either
- drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of Latin text)
- or that the IRI only contains either neutrals or strong right-to-left characters.

This way we can be sure that the IRI would not reorder unexpectedly.
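A minimal Python sketch of the two conditions above, for illustration only. The function name may_order_rtl, the context_is_rtl flag, and the reading of "neutrals or strong right-to-left characters" as "no character whose Bidi_Class is L" are assumptions, not part of any draft; the scheme test follows the RFC 3986 scheme syntax.

```python
import re
import unicodedata

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

# Assumption: "only neutrals or strong RTL" is read as "no strong LTR (L)".
STRONG_LTR = {"L"}


def may_order_rtl(iri: str, context_is_rtl: bool) -> bool:
    """Hypothetical check for the two proposed conditions."""
    has_scheme = bool(SCHEME_RE.match(iri))
    # Condition 1: shown with its scheme, in a right-dominant context.
    if has_scheme and context_is_rtl:
        return True
    # Condition 2: no strong left-to-right character anywhere in the IRI.
    return all(unicodedata.bidirectional(ch) not in STRONG_LTR for ch in iri)


arabic_host = "\u0645\u062b\u0627\u0644.\u0645\u0635\u0631"  # placeholder Arabic labels
print(may_order_rtl(arabic_host, context_is_rtl=False))              # True
print(may_order_rtl("http://" + arabic_host, context_is_rtl=False))  # False
print(may_order_rtl("http://" + arabic_host, context_is_rtl=True))   # True
```

Note that the ASCII scheme alone is enough to fail the second condition, which is why the first condition has to carry the common "http://" case.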

The cut and paste is an interesting issue. If we force a single direction, then it would be OK, but that would not solve your problem.

Adil
Post by Shawn Steele
I don't see that helps very much. A bare domain name by itself that was entirely Arabic would reasonably be ordered from right to left in an Arabic document, even if it didn't have an http. Clearly it'd be a helpful indicator that this was an IRI though.
I think that following the document's context is reasonable if you're missing other indicators, but I don't think it's possible to completely avoid confusion, if for no other reason than that cut & paste from a compliant app to an older app will likely cause differences in display for the same binary representation.
-Shawn
-----Original Message-----
Sent: Monday, April 02, 2012 15:10
To: Shawn Steele
Subject: Re: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.
With regard to Shawn's comments: would it be acceptable to say that a Bidi IRI is only allowed to be ordered RTL if it is drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of Latin text)?
In this way we can allow the RTL alignment with the caveat that the user needs to be educated on the directional issues; but we would not have the confusion of the order that the elements are appearing as the "http://" will act as a visible direction guide.
Adil
Post by Shawn Steele
Post by Martin J. Dürst
Post by Martin J. Dürst
2) You end up with different displays between places that "know"
there's an IRI (e.g. browser address bar) and places that don't
That's unavoidable. People will follow this RFC or they won't. The
Unicode Bidi Algorithm doesn't include this guidance, so plain text
will also fail, though some apps may try to be "smarter". For years
people will have different browser versions with different behaviors,
etc. The UBA is also inconsistently applied, and at inconsistent
revisions, so I think it's a bit presumptuous of us to think that
anything we specify here could cause consistent rendering by our
guidance :)
IMO: there's a more general "list" problem with the UBA, and having the UBA address that might be interesting.
Post by Martin J. Dürst
Actually, the current solution was proposed by Mati Alluche, and he
argued that it would be possible for people to understand the
That doesn't match our investigation. That presumes that people read it as trained by the UBA, however when encountering list-like structures, people don't typically apply the UBA. Unfortunately, regardless of the approach, some training of the user community is likely required.
Post by Martin J. Dürst
Post by Martin J. Dürst
1) The logical order of the parts MUST be preserved.
That sounds like a very logical requirement :-). As always in the
IETF, any arguments/data to support that would be very much
appreciated (your list equivalent is certainly counting towards that).
I don't have a formal white paper or user study. This comes from discussions with native bidi speakers: technical, non-technical, and in-between. Also from feedback from the community. This is how we realized that IRIs are best treated by analogy with a list.
Fortunately 90% of the most common cases are probably a loose domain, like the side of a bus, and those are probably all same-script IRIs.
Post by Martin J. Dürst
Post by Martin J. Dürst
2) There MUST be a way for mostly Arabic, etc. IRIs to be rendered right to left.
* So the corollary of 1 & 2 is that the protocol has to go on the right
By protocol, do you mean the scheme name (such as ftp:, mailto:,
http:, https:,...)?
Post by Martin J. Dürst
3) I'd really like a MAY that allows some flexibility for 2; when it's LTR and when it's RTL.
You mean some flexibility depending on context? We could also make
that "MUST respect context". But then there's the problem that the
context of a side of a bus is rather vague :-).
Not if it's a bus in Cairo, or a bus in Washington DC. Though either is probably going to be a single script.
Post by Martin J. Dürst
Post by Martin J. Dürst
At a minimum, I'd suggest that any RTL characters in the domain or email local parts should force 2).
In my personal view, I think that might be overkill. I'm not sure
I'd want everything turned around just because of a few RTL characters.
But if that's what everybody agrees on, I won't stand in the way.
IMO this is mostly a user preference. "I" would probably prefer the LTR ordering, even for an entirely Arabic IRI, because then I'd be able to understand the parts. E.g., if the ordering were consistent, I could chomp off a subdomain to get to a parent domain, or remove the path part to get to the home page. If the ordering changed partway through, I'd be unsuccessful.
Post by Martin J. Dürst
The really tough problem for anything that reorders by component
(what you call 'logical order of parts') is that it may be easy to
write a standard that says so, but it's difficult to implement. Any
thoughts about that?
Yes :) I'd be much happier coming up with a behavior that's understandable by 90% of the humans and have problems implementing it, than causing ambiguity for 50% of the population just because it was easy to implement.
Many paths are "long". They are also likely mostly ASCII for the foreseeable future. If I render a path with http:// on the left, and an Arabic domain name, then a path on the right, an RTL user with an RTL address bar will have a hard time discovering the domain, which is the most important part of the IRI, because it won't be near the right side of the textbox.
Worse, if the path/query gets long enough, then you have two really bad options: either allow the host name to be cropped off the left of the address bar, or clip the path on the RIGHT side, like an LTR textbox, which hurts the usability of the RTL app.
Larry Masinter
2012-04-03 09:23:22 UTC
I'm really having trouble understanding this discussion.

" Bidi IRI is only allowed to be ordered RTL if it is drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of latin text)."

I don't know what "allowed" means here.

I have an IRI which, in logical order, starts with an (ASCII) scheme, includes an RTL domain name, and a path, with RTL, LTR, or mixed components.

Who would be "allowed" to do what? In what circumstances? What would be the consequence of them not doing this?

I don't understand if you're talking about restrictions on allowed characters in IRIs, guidelines for software displaying IRIs, guidelines for encoding IRIs in "plain" LTR or RTL text, or something else...

Some examples would help enormously.

My fear is that we'll once again get to a set of requirements that you're happy with but which can't be implemented, which won't help us.


-----Original Message-----
From: Shawn Steele [mailto:***@microsoft.com]
Sent: Tuesday, April 03, 2012 1:59 AM
To: Adil Allawi
Cc: public-***@w3.org; "Martin J. Dürst"
Subject: RE: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

Shawn Steele
2012-04-03 16:29:08 UTC
AFAIK, the scheme is always ASCII (though some cultures want native script schemes so they don't have to do keyboard switching, but that seems like a different problem).

For http, domain names are Unicode, though a restricted subset, and the path is currently often ASCII or %-encoded, but could presumably use a wider repertoire.
For mail, the domain names are also the IDN subset of Unicode; however, the local part may now legally contain anything above 0x7F, though the characters below 0x80 are restricted.
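To make the "same binary representation" point concrete, here is a rough sketch of how such an IRI maps to an ASCII-only URI, assuming Python's built-in "idna" codec (IDNA 2003) for the host and UTF-8 percent-encoding elsewhere. The helper iri_to_uri is hypothetical, drops userinfo, and is a simplification rather than the complete RFC 3987 mapping.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI (simplified).

    Userinfo is dropped and no normalization is performed.
    """
    parts = urlsplit(iri)
    netloc = parts.netloc
    if parts.hostname:
        # Python's IDNA 2003 codec turns each non-ASCII label into xn--...
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += f":{parts.port}"
    # Other non-ASCII characters become UTF-8 %XX escapes; existing
    # escapes ('%') and common delimiters are left untouched.
    path = quote(parts.path, safe="/%:@")
    query = quote(parts.query, safe="/?=&%:@")
    fragment = quote(parts.fragment, safe="/?%:@")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    # Placeholder IRI: Arabic host labels plus an Arabic path segment.
    print(iri_to_uri("http://\u0645\u062b\u0627\u0644.\u0645\u0635\u0631/\u0645\u0635\u0631"))
```

Whatever order a display chooses, the underlying string is the same, and the ASCII form it maps to is the same too.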

I think the legal characters are defined by the schemes? Though EAI mail is clearly overly permissive, and I wouldn't mind disallowing BIDI marks in IRIs if it helped display.
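If bidi marks were disallowed, a validator would only need to scan for the implicit marks and explicit directional formatting characters defined in UAX #9. A minimal sketch follows. The function name find_bidi_controls is hypothetical, and this is not a rule taken from any draft.

```python
import unicodedata

# UAX #9 implicit marks (ALM, LRM, RLM) and explicit formatting characters.
BIDI_CONTROLS = frozenset(
    "\u061C"                          # ARABIC LETTER MARK
    "\u200E\u200F"                    # LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
    "\u202A\u202B\u202C\u202D\u202E"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def find_bidi_controls(iri: str) -> list[tuple[int, str]]:
    """Return (index, character name) for every bidi control in the IRI."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(iri) if ch in BIDI_CONTROLS]


print(find_bidi_controls("http://\u200fexample.com"))  # [(7, 'RIGHT-TO-LEFT MARK')]
print(find_bidi_controls("http://example.com"))        # []
```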

The schemes have delimiters (@, ., /, etc.), none of which are strong bidi characters (@ is type ON; ., / and : are type CS). As far as implementation goes, treating them all like L or R (depending on the mode) might "solve" the problem, as that would force the sections into separate units, which would then keep the same order as the underlying binary representation (label 1 first, label 2 next, etc.). That doesn't seem terribly difficult from an implementation perspective.
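The idea of treating the delimiters as strong characters of the display direction can be modelled at the level of component order. The sketch below works under these assumptions: the delimiter set is hypothetical; each run between delimiters is treated as an opaque unit, whose internal characters ordinary bidi layout would still handle; and uppercase Latin stands in for Arabic labels, as elsewhere in this thread.

```python
import re

# Hypothetical delimiter set. Each delimiter is treated as a strong character
# of the display direction, so it always separates two components.
TOKEN_RE = re.compile(r"[:/?#@.&=]|[^:/?#@.&=]+")


def component_display_order(iri: str, rtl: bool) -> str:
    """Return the IRI's components in left-to-right screen order.

    Only component order is modelled: each run between delimiters is an
    opaque unit whose own characters ordinary bidi layout would handle.
    """
    tokens = TOKEN_RE.findall(iri)
    if rtl:
        # RTL mode: the first logical component ends up at the right edge.
        tokens.reverse()
    return "".join(tokens)


print(component_display_order("http://WWW.ARABIC.TLD", rtl=False))  # http://WWW.ARABIC.TLD
print(component_display_order("http://WWW.ARABIC.TLD", rtl=True))   # TLD.ARABIC.WWW//:http
```

One consequence of putting '.' in the delimiter set is that a Latin file name is split as well: in RTL mode, LABEL1.LABEL2/index.html displays as html.index/LABEL2.LABEL1. An implementation might restrict the delimiter set per component (for example, '.' only inside the host). This sketch leaves that choice open.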

One big thing that I'm willing to give up is the expectation that you and I (or an Arabic speaker) will "see" the same IRI identically at all times, e.g. http://WWW.ARABIC.TLD vs. TLD.ARABIC.WWW//:http. However, if any of us read it over the phone, we'd all read the same thing (assuming we knew how to read Arabic at all).

As the discussion continues, I'm also getting more entrenched in my position that this is very likely a user preference. Particularly WRT the usability of the address bar. I'll see if I can find time to make some screen shots that demonstrate the problem.

-Shawn

-----Original Message-----
From: Larry Masinter [mailto:***@adobe.com]
Sent: Tuesday, April 03, 2012 2:23
To: Shawn Steele; Adil Allawi
Cc: public-***@w3.org; "Martin J. Dürst"
Subject: RE: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.
