How to change encoding to UTF-16?
coosucks

Whenever I create an HTML repl (or CSS, etc.), the file contains <meta charset="utf-8">. But I do not want UTF-8 in any of my files. I believe it to be a result of the horrifying legacy of Unix text processing, and I despise Unix for its unusable text processing. I would like all of my text, including HTML sources, to be in UTF-16. Where is the option to remove UTF-8 and use UTF-16 instead? I would also like CRLF newlines.

Comments
AlexHsi

You can use Notepadqq in Ubuntu.

coosucks

@AlexHsi How is that even relevant? Do you think there is Ubuntu in Replit?

InvisibleOne

You can't.

coosucks

@InvisibleOne Yes, I can. I have done UTF-16 websites before on another service. And UTF-16 is a fundamental text encoding.

ReallyBasic

See:
https://stackoverflow.com/questions/50385123/can-we-use-utf-16-in-meta-tag-in-html
https://stackoverflow.com/questions/496321/utf-8-utf-16-and-utf-32
Summary:

The HTML5 specification forbids the use of the meta element to declare UTF-16, because the values must be ASCII-compatible. Instead you should ensure that you always have a byte-order mark at the very start of a UTF-16 encoded file. In effect, this is the in-document declaration. ~ w3.org
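
As a rough illustration of the quoted advice, here is a minimal Python sketch that writes text as UTF-16 with a leading byte-order mark and CRLF newlines. The helper name `to_utf16_crlf` and the sample markup are made up for this example, and it says nothing about what the Replit editor itself supports.

```python
import codecs

def to_utf16_crlf(text: str, little_endian: bool = True) -> bytes:
    """Encode text as UTF-16 with a BOM and CRLF newlines (hypothetical helper)."""
    # Collapse every newline style to LF first, then expand to CRLF,
    # so existing CRLF pairs are not doubled.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    if little_endian:
        return codecs.BOM_UTF16_LE + normalized.encode("utf-16-le")
    return codecs.BOM_UTF16_BE + normalized.encode("utf-16-be")

# Sample markup, deliberately without a <meta charset> declaration.
html = "<!DOCTYPE html>\n<title>Hi</title>\n"
data = to_utf16_crlf(html)
```

Decoding `data` with Python's generic `"utf-16"` codec reads the BOM to pick the byte order, which mirrors how the BOM acts as the in-document declaration.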

coosucks

@ReallyBasic I don't care. I would like UTF-16-CRLF editing. I consider Unicode mutually exclusive with ASCII compatibility. The byte order mark is always optional in Unicode text-based files, though recommended.

ReallyBasic

What it is saying is that putting a byte-order mark at the start of the UTF-16 encoded file does the same job as the <meta charset="utf-8"> in the head: it declares the encoding.
@coosucks

coosucks

@ReallyBasic I want it to be UTF-16 instead. And placing a 16-bit integer of 0xFEFF at the start of the file is not UTF-8, it's UTF-16.
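
To make the byte-level point concrete, here is a small Python sketch of how the code point U+FEFF is serialized in each encoding and how a reader might sniff it. The function `sniff_bom` is a hypothetical helper for illustration, not part of any editor or browser, and it ignores UTF-32.

```python
import codecs
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Guess an encoding from a leading byte-order mark (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None  # no BOM: the encoding must be declared some other way

# The same code point, U+FEFF, yields different bytes per encoding.
bom_le = "\ufeff".encode("utf-16-le")
bom_be = "\ufeff".encode("utf-16-be")
bom_u8 = "\ufeff".encode("utf-8")
```

So the 16-bit value 0xFEFF identifies UTF-16 (the stored byte order shows the endianness), while a UTF-8 BOM is a distinct three-byte sequence.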

ReallyBasic

That's what I mean. It will be UTF-16. @coosucks

coosucks

@ReallyBasic And I would like a setting to get to UTF-16-CRLF in the editor.

ReallyBasic

I don't think you can. Sorry :/ @coosucks

coosucks

@ReallyBasic Then I am requesting it as a feature. There should be a UTF16-CRLF mode (UTF-16 in native endianness with CRLF newlines, optionally with a BOM) and an ASCII-CRLF mode (single-byte characters restricted to bytes 00 to 7F, which makes it usable for cross-platform C and C++ code). Perhaps there could also be an option to select the default encoding for each file type. I personally would use UTF16-CRLF for txt, html, and htm, and ASCII-CRLF for c, cpp, and h.
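
The ASCII-CRLF mode described above could be checked with something like the following Python sketch. The name `is_ascii_crlf` is invented for this example, and the rule that every CR and LF must appear only as a CRLF pair is one reading of the request, not an existing editor feature.

```python
def is_ascii_crlf(data: bytes) -> bool:
    """Return True if data is 7-bit ASCII with CRLF-only newlines (hypothetical check)."""
    # Reject any byte outside the range 00 to 7F.
    if any(byte > 0x7F for byte in data):
        return False
    # Remove well-formed CRLF pairs; any CR or LF left over is a stray newline.
    remainder = data.replace(b"\r\n", b"")
    return b"\r" not in remainder and b"\n" not in remainder

ok = is_ascii_crlf(b"int main(void) {\r\n    return 0;\r\n}\r\n")
bare_lf = is_ascii_crlf(b"int x;\n")
non_ascii = is_ascii_crlf("caf\u00e9\r\n".encode("utf-8"))
```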

coosucks

@ReallyBasic Look at https://replit.com/@coosucks/sorting?v=1 . All I did was make it Unicode plaintext, and yet the code preview is extremely glitchy (making it impossible to edit in the built-in editor), though the website works correctly.
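
One possible explanation for the glitchy preview, offered purely as an assumption since nothing in this thread confirms how the Replit editor reads files, is that the editor decodes the bytes as UTF-8. This Python sketch shows what UTF-16 text looks like under that mistake.

```python
# "abc" saved as UTF-16 little-endian with a BOM: FF FE 61 00 62 00 63 00
utf16_bytes = "\ufeffabc".encode("utf-16-le")

# Assumed failure mode: the bytes are decoded as UTF-8 instead.
# The two BOM bytes are invalid UTF-8 and become U+FFFD replacement
# characters, and every ASCII letter is followed by a NUL character.
shown = utf16_bytes.decode("utf-8", errors="replace")

print(repr(shown))
```

An editor that rendered those NUL and replacement characters would show the kind of garbled preview described, while a browser that honors the BOM would still display the page correctly.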