Discussion:
[iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for
iri issue tracker
2012-03-11 12:01:57 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

What term should we use for the kind of text that the Unicode Bidi
Algorithm was designed for? RFC 3987 and 3987bis use "running text";
bidi-guidelines (-01) changed to "plain text".

We have a definition for running text at
http://tools.ietf.org/html/draft-ietf-iri-3987bis-10#section-1.3:

running text: Human text (paragraphs, sentences, phrases) with
syntax according to orthographic conventions of a natural
language, as opposed to syntax defined for ease of processing by
machines (e.g., markup, programming languages).

In RFC 3987, there are two uses:

The Unicode Bidirectional Algorithm is designed mainly for running text.

[UNIXML] is written in the context of running text rather than in that of
identifiers.

The first use moved to bidi-guidelines, but the second use is still in
3987bis. In both cases, the term "plain text" isn't appropriate, because
the main use of "plain text" is to distinguish from "fancy text", i.e.
text with styling,... But in both usages above, the distinction between
"plain text" and "fancy text" is irrelevant. See also
http://en.wikipedia.org/wiki/Plain_text.
--
----------------------+--------------------------------------
Reporter: duerst@… | Owner: draft-ietf-iri-3987bis@…
Type: defect | Status: new
Priority: major | Milestone:
Component: 3987bis | Version:
Severity: - | Keywords:
----------------------+--------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118>
iri <http://tools.ietf.org/wg/iri/>
Matitiahu Allouche
2012-03-11 13:07:44 UTC
Since the question is related to Unicode (the kind of text that the
Unicode Bidi Algorithm was designed for), maybe we should check the
Unicode definition for "plain text". In the Unicode glossary (
http://unicode.org/glossary/#P), we find:
Plain Text. Computer-encoded text that consists only of a sequence of code
points from a given standard, with no other formatting or structural
information. Plain text interchange is commonly used between computer
systems that do not share higher-level protocols. (See also rich text.)


Personally, I find this definition appropriate for "the kind of text that
the Unicode Bidi Algorithm was designed for", and I prefer "plain text"
over "running text". It is also my experience that "plain text" is much
more in use in Unicode circles than "running text".

Shalom (Regards), Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Mobile: +972 52 2554160




Martin J. Dürst
2012-03-12 03:05:56 UTC
Hello Mati,

Many thanks for your comments.
Post by Matitiahu Allouche
Since the question is related to Unicode (the kind of text that the
Unicode Bidi Algorithm was designed for), maybe we should check the
Unicode definition for "plain text". In the Unicode glossary
(http://unicode.org/glossary/#P), we find:
Plain Text. Computer-encoded text that consists only of a sequence of code
points from a given standard, with no other formatting or structural
information. Plain text interchange is commonly used between computer
systems that do not share higher-level protocols. (See also rich text.)
Personally, I find this definition appropriate for "the kind of text that
the Unicode Bidi Algorithm was designed for", and I prefer "plain text"
over "running text". It is also my experience that "plain text" is much
more in use in Unicode circles than "running text".

I agree that if we look at the distinction between plain text and rich
text, then it is appropriate to say that the Bidi Algorithm has been
designed for plain text rather than for rich text. But in the two places
in the spec where we have been using "running text" for the past seven
or more years, it's NOT this distinction between plain text and rich
text that we are after.

To be more specific, it's irrelevant whether an IRI shows up in a plain
text file (.txt) or a rich text file (e.g. MS Word, HTML with
stylesheets,...). We have exactly the same problems with bidi IRIs in
plain text as we have in rich text. This is because although the Bidi
Algorithm was designed for plain text, essentially the same algorithm is
used for rich text. For MS Word, there are usually a few tweaks where it
does not behave exactly the same as the Unicode Bidi Algorithm (the last
one of them is the special behavior regarding parentheses that was
presented and discussed at last year's IUC), but the basics are the
same. Rendered HTML also uses the Unicode Bidi Algorithm for its basic
features.

What the spec is referring to is the fact that the Bidi Algorithm was
designed for sequences of characters, words, and punctuation such as
they turn up in letters, newspaper articles, explanatory text in books,
and so on, as opposed to sequences of characters as they turn up in
artificial stuff such as IRIs, markup source, programming languages, and
so on.
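
As a rough illustration of this contrast, the sketch below lists the Unicode bidi class of each character in an IRI. It uses Python's standard `unicodedata.bidirectional`; the helper names `bidi_classes` and `neutral_or_weak` and the example IRI are made up for illustration and are not taken from any of the specs discussed here. The point it tries to show is that IRI delimiters such as "/", ":" and "?" carry weak or neutral bidi classes, so their display position is resolved from the surrounding strong characters rather than from the IRI's syntactic structure.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, Unicode bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def neutral_or_weak(text):
    """Return the characters whose bidi class is not strong (L, R, or AL)."""
    strong = {"L", "R", "AL"}
    return [ch for ch, cls in bidi_classes(text) if cls not in strong]


# Hypothetical IRI with Hebrew path segments (illustrative only).
iri = "http://example.org/\u05d0\u05d1/\u05d2\u05d3?x=1"

# Print one line per character: code point and bidi class.
for ch, cls in bidi_classes(iri):
    print(f"U+{ord(ch):04X} {cls}")
```

In ordinary prose, such punctuation usually sits between words of the same direction, which is the case the algorithm handles well. In an IRI, the same characters act as structural delimiters between components that may differ in direction.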

I'm not sure whether "running text" is the best term for this, but I am
very sure "plain text" is wrong for where we want to use it, because
IRIs, markup source, programs, and so on are in many if not most cases
plain text. Running text at least seems to come close, see e.g. the
definition at http://en.wiktionary.org/wiki/running_text.

Regards, Martin.
Phillips, Addison
2012-03-12 03:52:17 UTC
Post by Martin J. Dürst
I'm not sure whether "running text" is the best term for this, but I am very sure
"plain text" is wrong for where we want to use it, because IRIs, markup source,
programs, and so on are in many if not most cases plain text. Running text at
least seems to come close, see e.g. the definition at
http://en.wiktionary.org/wiki/running_text.
I'm pretty sure that 'running text' is too limiting as well. Is there a need for a specialized term here at all? How about 'text' as the term? Even such "off-line" formats as napkins and bus sides qualify then. As in: "Where an IRI appears in text...."

I notice that the term "running text" in section 1.3 appears exactly once in the document and there only provides a sort of informative explanation of UNIXML.
iri issue tracker
2012-03-13 13:09:08 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for
Post by Matitiahu Allouche
Since the question is related to Unicode (the kind of text that the
Unicode Bidi Algorithm was designed for), maybe we should check the
Unicode definition for "plain text". In the Unicode glossary
Post by Matitiahu Allouche
Plain Text. Computer-encoded text that consists only of a sequence of
code points from a given standard, with no other formatting or structural
information. Plain text interchange is commonly used between computer
systems that do not share higher-level protocols. (See also
[http://unicode.org/glossary/#rich_text rich text].)
Post by Matitiahu Allouche
Personally, I find this definition appropriate for "the kind of text that
the Unicode Bidi Algorithm was designed for", and I prefer "plain text"
over "running text". It is also my experience that "plain text" is much
more in use in Unicode circles than "running text".
--
----------------------+---------------------------------------
Reporter: duerst@… | Owner: draft-ietf-iri-3987bis@…
Type: defect | Status: new
Priority: major | Milestone:
Component: 3987bis | Version:
Severity: - | Resolution:
Keywords: |
----------------------+---------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:1>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-13 13:10:32 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for
Post by Martin J. Dürst
I agree that if we look at the distinction between plain text and rich
text, then it is appropriate to say that the Bidi Algorithm has been
designed for plain text rather than for rich text. But in the two places
in the spec where we have been using "running text" for the past seven or
more years, it's NOT this distinction between plain text and rich text
that we are after.
Post by Martin J. Dürst
To be more specific, it's irrelevant whether an IRI shows up in a plain
text file (.txt) or a rich text file (e.g. MS Word, HTML with
stylesheets,...). We have exactly the same problems with bidi IRIs in
plain text as we have in rich text. This is because although the Bidi
Algorithm was designed for plain text, essentially the same algorithm is
used for rich text. For MS Word, there are usually a few tweaks where it
does not behave exactly the same as the Unicode Bidi Algorithm (the last
one of them is the special behavior regarding parentheses that was
presented and discussed at last year's IUC), but the basics are the same.
Rendered HTML also uses the Unicode Bidi Algorithm for its basic features.
Post by Martin J. Dürst
What the spec is referring to is the fact that the Bidi Algorithm was
designed for sequences of characters, words, and punctuation such as they
turn up in letters, newspaper articles, explanatory text in books, and so
on, as opposed to sequences of characters as they turn up in artificial
stuff such as IRIs, markup source, programming languages, and so on.
Post by Martin J. Dürst
I'm not sure whether "running text" is the best term for this, but I am
very sure "plain text" is wrong for where we want to use it, because IRIs,
markup source, programs, and so on are in many if not most cases plain
text. Running text at least seems to come close, see e.g. the definition
at http://en.wiktionary.org/wiki/running_text.

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:2>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-13 13:11:31 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for
Post by Phillips, Addison
I'm pretty sure that 'running text' is too limiting as well. Is there a
need for a specialized term here at all? How about 'text' as the term?
Even such "off-line" formats as napkins and bus sides qualify then. As in:
"Where an IRI appears in text...."
Post by Phillips, Addison
I notice that the term "running text" in section 1.3 appears exactly once
in the document and there only provides a sort of informative explanation
of UNIXML.

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:3>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-13 13:14:10 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by adil@…):

* owner: draft-ietf-iri-3987bis@… => adil@…
* status: new => assigned
--
----------------------+-----------------------
Reporter: duerst@… | Owner: adil@…
Type: defect | Status: assigned
Priority: major | Milestone:
Component: 3987bis | Version:
Severity: - | Resolution:
Keywords: |
----------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:4>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-13 13:14:57 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by adil@…):

* keywords: => bidi
--
----------------------+-----------------------
Reporter: duerst@… | Owner: adil@…
Type: defect | Status: assigned
Priority: major | Milestone:
Component: 3987bis | Version:
Severity: - | Resolution:
Keywords: bidi |
----------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:5>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-13 13:38:33 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by adil@…):

* component: 3987bis => bidi-guidelines
--
-----------------------------+-----------------------
Reporter: duerst@… | Owner: adil@…
Type: defect | Status: assigned
Priority: major | Milestone:
Component: bidi-guidelines | Version:
Severity: - | Resolution:
Keywords: bidi |
-----------------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:6>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-13 14:17:30 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for


Comment (by adil@…):

I think the best description is:
''The Unicode Bidirectional Algorithm is designed for general purpose
text''

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:7>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-03-14 06:06:35 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for


Comment (by duerst@…):

The proposal by Adil ("The Unicode Bidirectional Algorithm is designed for
general purpose text") looks very good to me.

I had entered the component as "3987bis" originally, because there is a
definition and one use of "running text" in 3987bis, too.

In line with Adil's proposal, I propose to change "[UNIXML] is written in
the context of running text rather than in that of identifiers." to
"[UNIXML] is written in the context of general purpose text rather than
in that of identifiers."

There are two things we can do with the definition we currently have for
running text: Change it to a definition of general purpose text, or remove
it. The changed definition would read:

general purpose text: Human text (paragraphs, sentences,
phrases) with syntax according to orthographic conventions of a
natural language, as opposed to syntax defined for ease of
processing by machines (e.g., markup, programming languages).

Because we use the term only once in each of two documents, and because we
use it only in contrast, I propose to remove the definition.

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:8>
iri <http://tools.ietf.org/wg/iri/>
iri issue tracker
2012-10-16 06:59:14 UTC
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by duerst@…):

* status: assigned => closed
* resolution: => fixed


Comment:

Using "general purpose text" as proposed by Adil. This was already
implemented in bidi. Also changed in 3987bis, and removed the definition,
as proposed before.
--
-----------------------------+---------------------
Reporter: duerst@… | Owner: adil@…
Type: defect | Status: closed
Priority: major | Milestone:
Component: bidi-guidelines | Version:
Severity: - | Resolution: fixed
Keywords: bidi |
-----------------------------+---------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:9>
iri <http://tools.ietf.org/wg/iri/>