C and C++ compilers like GCC first take your code and produce assembly, typically a pure ASCII output (so just basic English characters). This assembly code is a low-level representation of the program, using mnemonic instructions specific to the target processor architecture. The compiler then passes this assembly code to an assembler, which translates it into machine code—binary instructions that the processor can execute directly.
When compiling code, characters like ‘é’ in strings, such as unsigned char a[] = "é";, may be represented in UTF-8. The UTF-8 encoding of ‘é’ is two bytes, written \303\251 in octal notation. However, when these two bytes appear in an assembly string, they take 8 characters ("\303\251") because the assembly is ASCII and each non-ASCII byte must be escaped. Thus, a single character in source code can expand significantly in the assembly that the compiler generates.
As a related issue, new versions of C and C++ (C23 and C++26) have an ‘#embed’ directive that allows you to directly embed an arbitrary file in your code (e.g., an image). Such data might be encoded inefficiently as assembly.
What could you do?
Base64 is an encoding method that converts binary data into a string of printable ASCII characters, using a set of 64 characters (uppercase and lowercase letters, digits, and symbols like + and /). It is commonly used to represent binary data, such as images or files, in text-based formats like JSON, XML, or emails (MIME).
When starting from binary data, base64 expands it, turning every 3 input bytes into 4 ASCII characters. Interestingly, in some cases, base64 can be used for compression purposes. Older versions of GCC would compile
unsigned char a[] = "éééééééé";
to
.string "\303\251\303\251\303\251\303\251\303\251\303\251\303\251\303\251"
GCC 15 now supports base64 encoding of data during compilation, using a new “.base64” pseudo-op in the assembly it emits. Our array now gets compiled to the much shorter string
.base64 "w6nDqcOpw6nDqcOpw6nDqQA="