Decoding unicode decimal codes

Hello,

We have built a solution for a customer in CRMScript that reads an RSS feed from a WordPress site. The article titles in this RSS feed contain codes such as '&#821;', '&#8216;', '&#8217;', etc.

I have tried the various .%decode() methods that are available on a string but none can decode these values. Is there support for this in CRMScript?

 

RE: Decoding unicode decimal codes

Hi David, 

Could you elaborate a bit more on how you get the RSS feed?

I guess you're using the HTTP class to get it? Could you provide a small code example?

If I try to print the codes, the characters come out correctly, so there might be an encoding issue.

Example:

String text = "&#821;, &#8216;, &#8217;";

print(text);

Gives me the values:

̵, ‘, ’

 

Have you tried specifying UTF-8 encoding with the HTTP class?

By: Simen Mostuen Iversen 18. Jan 2021

RE: Decoding unicode decimal codes

Hello Simen,

We retrieve the RSS feed using the following code:

    HTTP http;
    NSStream data = http.openAsStream(feedUrl);

    if (http.hasError())
    {
        log("Error getting feed:");
        log(http.getErrorMessage());

        throw "Error getting feed: " + http.getErrorMessage();
    }
    else
    {
        try
        {
            XMLNode xml = parseXML(String(data.GetStream()));

            // parse xml
        }
        catch
        {
            printLine("Exception caught: " + error);
            printLine("...at " + errorLocation);
        }
    }
By: David Hollegien 19. Jan 2021

RE: Decoding unicode decimal codes

Is there any functionality to decode these codes in CRMScript?

Since my last post I have added the following:

http.addHeader("Content-Type", "application/rss+xml; charset=UTF-8");

// and then decode the utf8 characters
String text = String(data.GetStream());
XMLNode xml = parseXML(text.utf8Decode());

This fixes some of the other encoding issues, but the output still shows values like '&#8211;'.

Note: this also happens when you request the RSS feed manually in the browser, so I don't think we are retrieving it incorrectly.

By: David Hollegien 2. Feb 2021

RE: Decoding unicode decimal codes

&#8211; is an HTML entity encoding of a Unicode code point. Rendering it is up to the browser.

If you want to normalize it, you need to replace the entity with the corresponding Unicode character.
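
To make the replacement step concrete, here is a minimal sketch of the idea. It is written in Python rather than CRMScript, purely to illustrate the logic, and the function name decode_numeric_refs is made up for this example. It assumes the feed only uses numeric references (decimal like &#8211; or hexadecimal like &#x2013;); named entities such as &amp; are left alone.

```python
import re

# Matches decimal references such as &#8211; and hexadecimal ones such as &#x2013;
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they name.

    Hypothetical helper for illustration; not part of any CRMScript API.
    """
    def to_char(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the reference as it was
            return match.group(0)

    return NUMERIC_REF.sub(to_char, text)


if __name__ == "__main__":
    title = "News &#8211; &#8216;quoted&#8217;"
    print(decode_numeric_refs(title))
```

In Python itself the standard library's html.unescape() already does this (and handles named entities too). The hand-written version is shown only because the same find-the-number, convert-to-character loop is what would have to be ported to another language, or approximated with a fixed lookup table of the handful of codes WordPress typically emits.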

By: Christian Mogensen 2. Feb 2021