Jump to content

Bug in ESC procedure

- - - - -

This topic has been archived. This means that you cannot reply to this topic.
9 replies to this topic

#1
Actor

Actor

    Newbie

  • Members
  • PipPip
  • 24 posts
This routine is supposed to implement escape sequences. The ESCAPE char is the BACKSLASH (like in C. Sorry!). When the ESCAPE is encountered during a scan it's deleted and the following char is replaced with something else, e.g., '\t' becomes a TAB. But it doesn't work. The ESCAPE char gets deleted all right, but the following char remains as-is. Why?

I've since rewritten the routine using a different algorithm which works fine, but I'd still like to know what's wrong with this. I've been working on it for a couple of days now with no luck.


     function esc (str : string) : string ;

     {

          ESCAPEs for NUL, TAB, CR, LF, BLANK , etc              

     }

     var

          i : byte ;


     begin

          for i := 1 to length(str) - 1 do begin

               if str[i] = ESCAPE then begin

                    delete (str, i, 1) ;

                    case locase(str[i]) of

                         ZERO : str[i] := NUL ;

                         LETA : str[i] := AMP ;

                         LETB : str[i] := BLANK ;

                         LETC : str[i] := CARET ;

                         LETF : str[i] := FEED ;

                         LETG : str[i] := GREATER ;

                         LETL : str[i] := LESS ;

                         LETP : str[i] := BAR ;   { p for pipe }

                         LETR : str[i] := RETURN ;

                         LETT : str[i] := TAB

                    end { case }

               end

          end ;


          esc := str

     end ;



#2
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
My best guess is that the variables in your case statement are the problem, but without seeing how they're set up, it's hard to say.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
Firebird_38

Firebird_38

    Programmer

  • Members
  • PipPipPipPip
  • 126 posts
Well, first off, you cannot count (in your for loop) from 1 to length(str)-1.
The reason is that your f**ked. That's why.
Sorry, I mean, you are deleting part of your str. But the Length(str)-1 gets evaluated only once. That means it keeps on counting past the end of Length(str)-1 to its original length. You can fix that part by using a repeat-until or while-do loop (with a manual inc).
What's locase? Does it return a char? I forget. You can't case with strings. Besides that, I see no problem. It should be fine.

Ps, since "delete" is expensive (the string is copied to a newly allocated piece of memory), it's better to have 2 PChar pointers. Both start at the beginning and the first "reads" and the second "writes". You can do that because the string can only shrink in length.

function Unescape(s:string);string; //should be called [B]Un[/B]escape... really
var r,w:PChar;
begin
 if s='' then
  begin
   result:=''; //Must accoutn for special case, because otherwise the PChars will be nil.
   exit;
  end;
 SetLength(s,Length(s)); //this is guaranteed to make the string unique. Strings are reference counted and this could referenced twice. But hereafter it's unique.
 r:=PChar(s);
 w:=r;
 repeat
  w^:=r^;
  if r^='\' then
   begin
    inc(r^);
    case locase(r^) of
     '0' : w^:=#0;
     'A' : W^:='&';
     ...etc...
    end;
   end;
  inc(r);
  inc(w);
 until r^=#0;
 //chop the string to its rightful length
 SetLength(s,w-PChar(s));
 result:=s; //this does NOT copy the string in memory, it only copies a reference to it.
end;

This sucker is a whole bunch faster. Try it on a huge string and compare with your version. Also, depending on what (working) version you have now, you may want to replace it (also depends on how much the function is used, of course).

Good luck.

Posted Image


#4
Actor

Actor

    Newbie

  • Members
  • PipPip
  • 24 posts

Firebird_38 said:

the Length(str)-1 gets evaluated only once.
True, but it doesn't matter because...

Quote

That means it keeps on counting past the end of Length(str)-1 to its original length.
...everything past length(str) is a null character chr(0), meaning there are no ESCAPES out there. This is an undocumented feature which I only discovered by experimentation.

Quote

You can fix that part by using a repeat-until or while-do loop (with a manual inc).
Already tried that. The bug remains.

Quote

What's locase? Does it return a char? I forget.
Just like upcase only it converts to lower case from upper case instead of the other way around. It returns a char.

Quote

You can't case with strings.
If you examine the code you'll see that I'm not casing with strings but with chars. If I were casing with strings, it would not even compile.

#5
Actor

Actor

    Newbie

  • Members
  • PipPip
  • 24 posts

Firebird_38 said:

Ps, since "delete" is expensive (the string is copied to a newly allocated piece of memory), it's better to have 2 PChar pointers.
My code has to be compatible with 5.5. PChar was only implemented with 7.0. Anyway, I'd rather avoid the complexity that goes along with pointers.

#6
Firebird_38

Firebird_38

    Programmer

  • Members
  • PipPipPipPip
  • 126 posts
What are you asking for?
Seems you know it all.
Seriously, seems you know what you're talking about.

I figure it has something to do with deleting or the for loop variable. Could be an internal bug.
BYW, you don't need PChar. You can easily use indexes as follows:
function Unescape(s:string);string; //should be called [B]Un[/B]escape... really
var r,w:Integer;
begin
 if s='' then
  begin
   result:=''; //Must account for special case, because otherwise the PChars will be nil. [B]Doesn't count for ShortStrings (the only option in Turbo 5.5)
[/B]   exit;
  end;
 SetLength(s,Length(s)); //this is guaranteed to make the string unique. Strings are reference counted and this could referenced twice. But hereafter it's unique.
r:=1
w:=r;
 repeat
  s[w]:=s[r];
  if s[r]='\' then
   begin
    inc(r);
    if r>Length(s) then break; //the last char could be a slash!
    case locase(s[r]) of
     '0' : s[w]:=#0;
     'A' : s[w]:='&';
     ...etc...
    end;
   end;
  inc(r);
  inc(w);
 until r>Length(s);
 //chop the string to its rightful length
 SetLength(s,w);
 result:=s; //this does NOT copy the string in memory, it only copies a reference to it. It DOES copy it under Turbo 5.5 (also, Turbo doesn't have "result").
//Use "var s:string" and a procedure instead in Turbo. Superfast.
end;

Posted Image


#7
Actor

Actor

    Newbie

  • Members
  • PipPip
  • 24 posts

Firebird_38 said:

What are you asking for?
Well, as I said in my original post I've found a different algorithm that does the job, so I've moved on.

However, it bugs me (no pun intended) that I can't figure out why my original code doesn't work. It seems like it should.

I do appreciate your input. Your comment that length(str) is computed only once was a revelation. Somewhere around here I have a textbook that says that

     for i := 1 to length(s) do begin

          {

           body

          }

     end ;

is equivalent to

     i := 1 ;

     while i <= length(s) do begin

          {

          body

          }

          i := i + 1

     end ;

If this is so then length(s) is tested each time through the loop. After I first read your post I wrote several short programs to test whether this is true. It seems you are correct. For years I've been programming as though the opposite was true, with no ill effect up to now. It's always good to learn something new. Thank you.

Here's the alternate algorithm which works.

[FONT="Courier New"]

const

     NUL = chr(0) ;

     LASCII = chr(255) ;

     ZERO  = '0' ; SIX = '6' ; SEVEN = '7' ;

     LETB = 'b' ; LETF = 'f' ; LETG = 'g' ; LETL = 'l' ; LETP = 'p' ; LETR = 'r' ; LETT = 't' ;

     CARET = '^' ; AMP = '&', BLANK = chr(32) ; FEED = chr(10) ; GREATER = '>' ; LESS = '<' ;

     BAR = '|' ' ; RETURN = chr(13) ; TAB = chr(9) ;


     function esc (str : string) : string ;

     {

          ESCAPEs for NUL, TAB, CR, LF, BLANK, etc               

     }

     var

          i : byte ;

          ch : char ;

          map : array[NUL .. LASCII] of char ;


     begin

          {

               make map

          }

          for ch := NUL to LASCII do

               map[ch] := ch ;    { every char maps to itself }

          {

          re-map the exceptions

          }

          map[ZERO] := NUL ;

          map[SIX] := CARET ;

          map[SEVEN] := AMP ;

          map[LETB] := BLANK ;

          map[LETF] := FEED ;

          map[LETG] := GREATER ;

          map[LETL] := LESS ;

          map[LETP] := BAR ;       { 'p' for pipeline }

          map[LETR] := RETURN ;

          map[LETT] := TAB ;

          {

               translate

          }

          i := 1 ;

          while i < length(str) do begin

               if str[i] = ESCAPE then begin

                    delete(str, i, 1) ;

                    str[i] := map[str[i]]

               end ;

               i := i + 1

          end ;


          esc := str

     end ;

[/FONT]


#8
Firebird_38

Firebird_38

    Programmer

  • Members
  • PipPipPipPip
  • 126 posts
Mapping is fast.
Use "#10" instead of "chr(10)" if you want (pinting out your options :)) also #13, #255 etc (or even #$FF. These are char constants, as opposed to function results (chr is a function that works at compile time))

Posted Image


#9
Actor

Actor

    Newbie

  • Members
  • PipPip
  • 24 posts
To me chr(10) is more readable than #10. Readability is the whole point of defining these SYMBOLIC CONSTANTS. If chr() works at compile time instead of run time, what's the difference? Even if it works at run-time, it's only set-up, which goes by at a blink at start up on even the slowest computers.

#10
Firebird_38

Firebird_38

    Programmer

  • Members
  • PipPipPipPip
  • 126 posts
I was just sayin'...

Posted Image