How to read an ELF file: the ELF and PE EXE formats

This overview covers only the 32-bit version of the format; the 64-bit version is of no use to us here.

Any ELF file (including object modules in this format) consists of the following parts:

  • ELF file header;
  • program section table (may be absent in object modules);
  • ELF file sections;
  • section table (may be absent in executable modules).

For performance reasons, the ELF format does not use bit fields, and all structures are usually aligned to 4 bytes.

Now consider the types used in ELF file headers:
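The ELF specification defines a handful of fixed-width base types for 32-bit files. Here is a minimal sketch of them in C, assuming a compiler that provides `<stdint.h>`. The typedef names follow the specification; the mapping to `stdint` types and the helper `elf32_types_ok` are illustrative assumptions, not part of the original article.

```c
#include <stdint.h>

/* Base types of the 32-bit ELF format (sizes per the ELF specification). */
typedef uint32_t Elf32_Addr;   /* unsigned program address, 4 bytes */
typedef uint16_t Elf32_Half;   /* unsigned medium integer, 2 bytes  */
typedef uint32_t Elf32_Off;    /* unsigned file offset, 4 bytes     */
typedef int32_t  Elf32_Sword;  /* signed large integer, 4 bytes     */
typedef uint32_t Elf32_Word;   /* unsigned large integer, 4 bytes   */

/* Hypothetical helper: returns 1 if the typedefs have the expected sizes. */
int elf32_types_ok(void)
{
    return sizeof(Elf32_Addr) == 4 && sizeof(Elf32_Half) == 2 &&
           sizeof(Elf32_Off) == 4 && sizeof(Elf32_Sword) == 4 &&
           sizeof(Elf32_Word) == 4;
}
```

With these sizes fixed, the header structures have the same layout regardless of the host's native `int` or `long` width.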

Now consider the file header:

#define EI_NIDENT 16

struct elf32_hdr {
    unsigned char e_ident[EI_NIDENT];
    Elf32_Half    e_type;
    Elf32_Half    e_machine;
    Elf32_Word    e_version;
    Elf32_Addr    e_entry;      /* Entry point */
    Elf32_Off     e_phoff;
    Elf32_Off     e_shoff;
    Elf32_Word    e_flags;
    Elf32_Half    e_ehsize;
    Elf32_Half    e_phentsize;
    Elf32_Half    e_phnum;
    Elf32_Half    e_shentsize;
    Elf32_Half    e_shnum;
    Elf32_Half    e_shstrndx;
};

The e_ident array contains information about the system and consists of several subfields.

struct {
    unsigned char ei_magic[4];
    unsigned char ei_class;
    unsigned char ei_data;
    unsigned char ei_version;
    unsigned char ei_pad[9];
};

  • ei_magic - a constant value for all ELF files, equal to { 0x7f, 'E', 'L', 'F' };
  • ei_class - the ELF file class (1 - 32-bit, 2 - 64-bit, which we do not consider);
  • ei_data - the byte order of the file. It depends on the platform and can be little-endian (LSB, value 1) or big-endian (MSB, value 2). For Intel processors the only valid value is 1;
  • ei_version - a fairly useless field; if it is not equal to 1 (EV_CURRENT), the file is considered invalid.

Operating systems store their identification information in the ei_pad field. It may be empty, and it is not important for us either.
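The checks on e_ident described above can be sketched as a small function. This is only an illustration: the name `elf32_ident_ok` is hypothetical, and the index constants (EI_CLASS = 4, EI_DATA = 5, EI_VERSION = 6) are the values defined by the ELF specification.

```c
#include <stddef.h>

/* Index and value constants from the ELF specification. */
enum { EI_CLASS = 4, EI_DATA = 5, EI_VERSION = 6, EI_NIDENT = 16 };
enum { ELFCLASS32 = 1, ELFDATA2LSB = 1, EV_CURRENT = 1 };

/* Hypothetical helper: returns 1 if ident describes a 32-bit
 * little-endian ELF file of the current version, otherwise 0. */
int elf32_ident_ok(const unsigned char *ident, size_t len)
{
    if (ident == NULL || len < (size_t)EI_NIDENT)
        return 0;
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L'  || ident[3] != 'F')
        return 0;
    return ident[EI_CLASS] == ELFCLASS32 &&
           ident[EI_DATA] == ELFDATA2LSB &&
           ident[EI_VERSION] == EV_CURRENT;
}
```

Passing the buffer length explicitly keeps the check safe on truncated files, where fewer than 16 bytes may be available.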

The e_type header field may take several values; for executable files it must be ET_EXEC, equal to 2.

e_machine - defines the processor on which this executable can run (for us the valid value is EM_386, equal to 3).

The e_version field corresponds to the ei_version field of the header.

The e_entry field defines the start address of the program, which is placed in EIP before the program starts.

The e_phoff field defines the offset from the start of the file to the program section table, which is used to load programs into memory.

I will not list the purpose of every field, since not all of them are needed for loading. I will describe only two more.

The e_phentsize field defines the size of an entry in the program section table.

The e_phnum field defines the number of entries in the program section table.
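Taken together, e_phoff, e_phentsize and e_phnum are enough to locate any entry of the program section table. A minimal sketch, assuming the three values have already been read from the header; the function name `phdr_offset` and the `file_size` bounds check are my own additions:

```c
#include <stdint.h>

/* Hypothetical helper: computes the file offset of program header
 * number `index`. Returns 0 on success and stores the offset in *out;
 * returns -1 if the index is out of range or the entry would not fit
 * in a file of `file_size` bytes. */
int phdr_offset(uint32_t e_phoff, uint16_t e_phentsize,
                uint16_t e_phnum, uint32_t file_size,
                uint16_t index, uint32_t *out)
{
    uint64_t off;

    if (index >= e_phnum)
        return -1;
    /* 64-bit arithmetic so the sum cannot wrap around. */
    off = (uint64_t)e_phoff + (uint64_t)index * e_phentsize;
    if (off + e_phentsize > file_size)
        return -1;
    *out = (uint32_t)off;
    return 0;
}
```

Stepping by e_phentsize rather than by the size of a C structure matters because the file, not the loader, decides how large each entry is.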

The section table (not the program section table) is used for linking programs. We will not consider it. Nor will we consider dynamically linked modules: that topic is rather complicated and not suitable for a first acquaintance. :)

Now about the program sections. An entry in the program section table has the following format:

struct elf32_phdr {
    Elf32_Word p_type;
    Elf32_Off  p_offset;
    Elf32_Addr p_vaddr;
    Elf32_Addr p_paddr;
    Elf32_Word p_filesz;
    Elf32_Word p_memsz;
    Elf32_Word p_flags;
    Elf32_Word p_align;
};

More about the fields:

  • p_type - the type of the program section. It can take several values, but we are interested in only one, PT_LOAD (1). A section of this type is meant to be loaded into memory;
  • p_offset - the offset in the file at which this section begins;
  • p_vaddr - the virtual address at which this section must be loaded into memory;
  • p_paddr - the physical address at which this section must be loaded. This field need not be used and makes sense only on some platforms;
  • p_filesz - the size of the section in the file;
  • p_memsz - the size of the section in memory. This value may be greater than the previous one;
  • p_flags - the type of access to the section in memory. Some sections may be executed, some may be written. In existing systems all of them are readable.

Loading the ELF format.

We have more or less figured out the header. Now here is an algorithm for loading an ELF binary file. The algorithm is schematic; do not treat it as a working program.

int LoadELF(unsigned char *bin)
{
    struct elf32_hdr *EH = (struct elf32_hdr *)bin;
    struct elf32_phdr *EPH;

    if (EH->e_ident[0] != 0x7f ||         /* check the magic */
        EH->e_ident[1] != 'E' ||
        EH->e_ident[2] != 'L' ||
        EH->e_ident[3] != 'F' ||
        EH->e_ident[4] != ELFCLASS32 ||   /* check the class */
        EH->e_ident[5] != ELFDATA2LSB ||  /* byte order */
        EH->e_ident[6] != EV_CURRENT ||   /* version */
        EH->e_type != ET_EXEC ||          /* type */
        EH->e_machine != EM_386 ||        /* platform */
        EH->e_version != EV_CURRENT)      /* and the version again, just in case */
        return ELF_WRONG;

    EPH = (struct elf32_phdr *)(bin + EH->e_phoff);

    while (EH->e_phnum--) {
        if (EPH->p_type == PT_LOAD)
            memcpy((void *)EPH->p_vaddr, bin + EPH->p_offset, EPH->p_filesz);

        EPH = (struct elf32_phdr *)((unsigned char *)EPH + EH->e_phentsize);
    }

    return ELF_OK;
}

Strictly speaking, one should also analyze the EPH->p_flags field and set access rights on the corresponding pages, and plain copying will not do here; but that no longer concerns the format, it concerns memory allocation. So we will not discuss it now.
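As a small illustration of what analyzing p_flags involves, the sketch below decodes the three permission bits. The bit values (PF_X = 1, PF_W = 2, PF_R = 4) come from the ELF specification; the function name `pflags_to_str` is hypothetical, and a real loader would translate these bits into page protections rather than a string.

```c
#include <stdint.h>

/* Segment permission bits as defined by the ELF specification. */
#define PF_X 0x1u  /* execute */
#define PF_W 0x2u  /* write   */
#define PF_R 0x4u  /* read    */

/* Hypothetical helper: writes an "rwx"-style string (three characters
 * plus the terminating NUL) describing p_flags into out[4]. */
void pflags_to_str(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & PF_R) ? 'r' : '-';
    out[1] = (p_flags & PF_W) ? 'w' : '-';
    out[2] = (p_flags & PF_X) ? 'x' : '-';
    out[3] = '\0';
}
```

A typical code section has p_flags equal to 5 (read and execute), and a typical data section has 6 (read and write).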

PE format.

In many respects it is similar to the ELF format, which is not surprising: it too must contain sections available for loading.

Like everything at Microsoft :) the PE format is based on the EXE format. The file structure is as follows:

  • 00h - EXE header (I will not cover it; it is as old as DOS :);
  • 20h - OEM header (nothing significant in it);
  • 3Ch - offset of the real PE header in the file (dword);
  • stub relocation table;
  • stub;
  • PE header;
  • object table;
  • file objects.
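Finding the PE header from the dword at offset 3Ch can be sketched as follows. This is an illustration under stated assumptions: the helper names `rd32le` and `pe_header_offset` are hypothetical, and the bounds checks are my own addition. The signature value 'P', 'E', 0, 0 is the one checked in the PE loading algorithm later in this article.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 32-bit value byte by byte, so the code does
 * not depend on host alignment or byte order. */
uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the file offset of the PE header,
 * or -1 if the image is too small or the signature is not "PE\0\0". */
long pe_header_offset(const unsigned char *bin, size_t size)
{
    uint32_t off;

    if (bin == NULL || size < 0x40)
        return -1;
    off = rd32le(bin + 0x3c);              /* dword at offset 3Ch */
    if (off > size || size - off < 4)
        return -1;
    if (rd32le(bin + off) != 0x00004550u)  /* 'P', 'E', 0, 0 */
        return -1;
    return (long)off;
}
```

Reading the dword byte by byte avoids the unaligned pointer cast that a direct `*(unsigned long *)` access would need.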

The stub is a program that runs in real mode and performs some preliminary actions. It may be absent, but it can sometimes be needed.

We are interested in something else, the PE header.

Its structure is as follows:

struct pe_hdr {
    unsigned long  pe_sign;
    unsigned short pe_cputype;
    unsigned short pe_objnum;
    unsigned long  pe_time;
    unsigned long  pe_cofftbl_off;
    unsigned long  pe_cofftbl_size;
    unsigned short pe_nthdr_size;
    unsigned short pe_flags;
    unsigned short pe_magic;
    unsigned short pe_link_ver;
    unsigned long  pe_code_size;
    unsigned long  pe_idata_size;
    unsigned long  pe_udata_size;
    unsigned long  pe_entry;
    unsigned long  pe_code_base;
    unsigned long  pe_data_base;
    unsigned long  pe_image_base;
    unsigned long  pe_obj_align;
    unsigned long  pe_file_align;
    /* ... there is a lot more, but it does not matter here */
};

There is a lot in there. Suffice it to say that the size of this header is 248 bytes.

And the main thing is that most of these fields are not used. (Who builds things like that?) Of course they do have a well-known purpose, but my test program, for example, contains zeros in the pe_code_base, pe_code_size and similar fields, and still works fine. This suggests that the file is loaded on the basis of the object table. That is what we will talk about next.

The object table follows immediately after the PE header. Entries in this table have the following format:

struct pe_ohdr {
    unsigned char o_name[8];
    unsigned long o_vsize;
    unsigned long o_vaddr;
    unsigned long o_psize;
    unsigned long o_poff;
    unsigned char o_reserved[12];
    unsigned long o_flags;
};

  • o_name - the name of the section; it is completely irrelevant for loading;
  • o_vsize - the size of the section in memory;
  • o_vaddr - the address in memory relative to ImageBase;
  • o_psize - the size of the section in the file;
  • o_poff - the offset of the section in the file;
  • o_flags - the section flags.

The flags deserve a closer look.

  • 00000004h - used for code with 16-bit offsets
  • 00000020h - code section
  • 00000040h - initialized data section
  • 00000080h - uninitialized data section
  • 00000200h - comments or any other type of information
  • 00000400h - overlay section
  • 00000800h - will not be part of the program image
  • 00001000h - common data
  • 00500000h - default alignment, unless another is specified
  • 02000000h - can be discarded from memory
  • 04000000h - not cached
  • 08000000h - not pageable
  • 10000000h - shared
  • 20000000h - executable
  • 40000000h - readable
  • 80000000h - writable

Again, I will not deal with shared and overlay sections; we are interested in code, data and access rights.
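The flags that matter for loading can be tested with a few masks. The sketch below uses the values from the list above; the macro and function names (`PE_SCN_*`, `pe_section_has_image`, `pe_access_str`) are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Flag values taken from the list above; the macro names are made up. */
#define PE_SCN_CODE   0x00000020u  /* code section             */
#define PE_SCN_IDATA  0x00000040u  /* initialized data section */
#define PE_SCN_EXEC   0x20000000u  /* executable               */
#define PE_SCN_READ   0x40000000u  /* readable                 */
#define PE_SCN_WRITE  0x80000000u  /* writable                 */

/* Returns 1 if the section holds code or initialized data,
 * i.e. has contents that must be copied from the file. */
int pe_section_has_image(uint32_t o_flags)
{
    return (o_flags & (PE_SCN_CODE | PE_SCN_IDATA)) != 0;
}

/* Writes an "rwx"-style access string (plus NUL) into out[4]. */
void pe_access_str(uint32_t o_flags, char out[4])
{
    out[0] = (o_flags & PE_SCN_READ)  ? 'r' : '-';
    out[1] = (o_flags & PE_SCN_WRITE) ? 'w' : '-';
    out[2] = (o_flags & PE_SCN_EXEC)  ? 'x' : '-';
    out[3] = '\0';
}
```

For example, the flag word 60000020h (code, executable, readable) yields "r-x", while C0000080h (uninitialized data, readable, writable) has nothing to copy from the file.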

In general, this information is already enough to load a binary file.

Loading the PE format.

int LoadPE(unsigned char *bin)
{
    /* Not the most readable expression: it takes the dword at offset 0x3c
       and computes the address of the PE header in the file image. */
    struct pe_hdr *PH = (struct pe_hdr *)(bin + *((unsigned long *)&bin[0x3c]));
    struct pe_ohdr *POH;

    if (PH == NULL ||                  /* check the pointer */
        PH->pe_sign != 0x4550 ||       /* PE signature ('P', 'E', 0, 0) */
        PH->pe_cputype != 0x14c ||     /* i386 */
        (PH->pe_flags & 2) == 0)       /* the file cannot be executed */
        return PE_WRONG;

    POH = (struct pe_ohdr *)((unsigned char *)PH + 0xf8);

    while (PH->pe_objnum--) {
        if ((POH->o_flags & 0x60) != 0)   /* either code or initialized data */
            memcpy((void *)(PH->pe_image_base + POH->o_vaddr),
                   bin + POH->o_poff, POH->o_psize);

        POH = (struct pe_ohdr *)((unsigned char *)POH + sizeof(struct pe_ohdr));
    }

    return PE_OK;
}

Again, this is not a finished program, only the loading algorithm.

And again, many points are not covered, since they go beyond the scope of the topic.

But now it is worth saying a little about the features of existing systems.

System features.

Despite the flexibility of the protection mechanisms available in the processors (descriptor-level protection, segment-level protection, page-level protection), existing systems (both Windows and Unix) make full use of page protection only, which protects code from being written but cannot protect data from being executed. (Perhaps this is why systems have so many vulnerabilities?)

All segments start at linear address zero and extend to the end of linear memory. Processes are separated only at the level of page tables.

For this reason all modules are linked not from the start of the address space but at a fairly large offset within the segment. Windows uses the base address 0x400000 in the segment, Unix (Linux or FreeBSD) uses 0x8048000.

Some features are also related to the paged organization of memory.

ELF files are linked so that the boundaries and sizes of sections fall on 4-kilobyte blocks of the file.

In the PE format, although the format itself allows sections to be aligned to 512 bytes, 4 KB alignment is used; a smaller alignment is not considered valid in Windows.
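Rounding a size or address up to such a boundary is a one-line computation. A minimal sketch, assuming the alignment is a nonzero power of two (true for both 512 and 4096); the name `align_up` is my own:

```c
#include <stdint.h>

/* Rounds value up to the next multiple of align.
 * Assumes align is a nonzero power of two (e.g. 512 or 4096). */
uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1) & ~(align - 1);
}
```

For example, a section of 0x1234 bytes occupies 0x2000 bytes once aligned to 4 KB, while a value already on the boundary is left unchanged.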

ELF format

The ELF format has several file types, which until now we have called by different names, for example executable file or object file. The ELF standard, however, distinguishes the following types:

1. Relocatable file, which stores instructions and data that can be linked with other object files. The result of such linking may be an executable file or a shared object file.

2. Shared object file, which also contains instructions and data but can be used in two ways. In the first case, it can be linked with other relocatable files and shared object files, producing a new object file. In the second case, when the program is started, the operating system can dynamically link it with the program's executable file, producing the executable image of the program. In the latter case we are talking about shared libraries.

3. Executable file, which stores a complete description that allows the system to create a process image. It contains instructions, data, a description of the required shared object files, and the necessary symbolic and debugging information.

Fig. 2.4 shows the structure of an executable file from which the operating system can create a program image and run the program.

Fig. 2.4. Structure of an executable file in ELF format

The header has a fixed location in the file. The other components are placed according to the information stored in the header. Thus the header contains a general description of the file structure, the location of the individual components and their sizes.

Since the ELF file header defines its structure, let us consider it in more detail (Table 2.3).

Table 2.3. ELF file header fields

Field         Description
e_ident       Array of bytes, each of which defines some general characteristic of the file: file format (ELF), version number, system architecture (32-bit or 64-bit), etc.
e_type        File type, since the ELF format supports several types
e_machine     Architecture of the hardware platform for which the file was created. Table 2.4 lists the possible values of this field
e_version     ELF format version number. Usually defined as EV_CURRENT (current), which means the latest version
e_entry       Virtual address to which the system transfers control after loading the program (entry point)
e_phoff       Location (offset from the start of the file) of the program header table
e_shoff       Location of the section header table
e_ehsize      Size of the header
e_phentsize   Size of each program header
e_phnum       Number of program headers
e_shentsize   Size of each section header
e_shnum       Number of section headers
e_shstrndx    Location of the section containing the string table

Table 2.4. Values of the e_machine field of the ELF file header

Value            Hardware platform
EM_M32           AT&T WE 32100
EM_SPARC         Sun SPARC
EM_386           Intel 80386
EM_68K           Motorola 68000
EM_88K           Motorola 88000
EM_486           Intel 80486
EM_860           Intel i860
EM_MIPS          MIPS RS3000 Big-Endian
EM_MIPS_RS3_LE   MIPS RS3000 Little-Endian
EM_RS6000        RS6000
EM_PA_RISC       PA-RISC
EM_nCUBE         nCUBE
EM_VPP500        Fujitsu VPP500
EM_SPARC32PLUS   Sun SPARC 32+

The information in the program header table tells the kernel how to create a process image from segments. Most segments are copied (mapped) into memory and, when the program executes, represent the corresponding segments of the process, for example the code or data segments.

Each program segment header describes one segment and contains the following information:

  • the type of the segment and what the operating system does with it;
  • the location of the segment in the file;
  • the starting address of the segment in the virtual memory of the process;
  • the size of the segment in the file;
  • the size of the segment in memory;
  • the segment access flags (write, read, execute).

Some segments have type LOAD, which instructs the kernel, when the program is started, to create data structures corresponding to these segments. These structures, called regions, define contiguous areas of the process's virtual memory and their associated attributes. The segment whose location in the ELF file is given in the corresponding program header is mapped into the created region, whose starting virtual address is also given in the program header. Segments of this type include, for example, the segments containing the program's instructions (code) and its data. If the segment is smaller than the region, the unused space may be filled with zeros. This mechanism is used, in particular, to create the uninitialized data of a process (BSS). We will say more about regions in Chapter 3.
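The zero-filling of the unused part of a region can be sketched in user space as a copy followed by a clear. This is only an illustration of the idea, not how a kernel maps segments: the name `load_segment` is hypothetical, and the destination is assumed to be an ordinary buffer large enough for the in-memory size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies filesz bytes of segment data to dst and
 * zero-fills the remaining memsz - filesz bytes. Returns 0 on success,
 * -1 if the sizes are inconsistent. dst must have room for memsz bytes. */
int load_segment(unsigned char *dst, const unsigned char *src,
                 size_t filesz, size_t memsz)
{
    if (dst == NULL || src == NULL || filesz > memsz)
        return -1;
    memcpy(dst, src, filesz);                 /* part stored in the file */
    memset(dst + filesz, 0, memsz - filesz);  /* BSS-like zeroed tail    */
    return 0;
}
```

A segment whose size in the file exceeds its size in memory is rejected, since that combination makes no sense for a loadable segment.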

A segment of type INTERP stores the program interpreter. This segment type is used for programs that need dynamic linking. The essence of dynamic linking is that the individual components of the executable (shared object files) are linked not at compile time but when the program is started. The name of the file that serves as the dynamic link editor is stored in this segment. When the program is started, the kernel creates the process image using the specified link editor. Thus it is not the original program that is loaded into memory first, but the dynamic link editor. At the next stage the dynamic link editor, together with the UNIX kernel, creates the complete image of the executable. The dynamic editor loads the required shared object files, whose names are stored in separate segments of the original executable, and performs the required placement and linking. Finally, control is passed to the original program.

Finally, the file ends with the table of section headers. Sections define the parts of the file used for linking with other modules at compile time or during dynamic linking. Accordingly, the headers contain all the information needed to describe these sections. As a rule, sections contain more detailed information about segments. For example, a code segment may consist of several sections, such as a hash table for storing the indexes of symbols used in the program, a section with the program's initialization code, a linkage table used by the dynamic editor, and a section containing the program instructions themselves.

We will return to the ELF format in Chapter 3 when discussing the organization of process virtual memory; for now, let us move on to the next common format, COFF.

A version of this answer with a good TOC and more content: http://www.cirosantilli.com/elf-hello-world (this answer hits the 30k character limit).

Standards

ELF is specified by the LSB:

  • core generic: http://refspecs.linuxfoundation.org/lsb_4.1.0/lsb-core-generic/lsb-core-generic/elf-generic.html
  • core AMD64: http://refspecs.linuxfoundation.org/lsb_4.1.0/lsb-core-amd64/lsb-core-amd64/book1.html

The LSB basically links to other standards with minor extensions, in particular:

  • generic (both by SCO):

    • System V ABI 4.1 (1997) http://www.sco.com/developers/devspecs/gabi41.pdf, no 64-bit, although a magic number is reserved for it. Same for core files.
    • System V ABI Update DRAFT 17 (2003) http://www.sco.com/developers/gabi/2003-12-17/contents.html adds 64-bit. It only updates chapters 4 and 5 of the previous document: the others remain valid and are still referenced.
  • architecture specific:

    • IA-32: http://refspecs.linuxfoundation.org/lsb_4.1.0/lsb-core-ia32/lsb-core-ia32/elf-ia32.html, points mainly to http://www.sco.com/developers/devspecs/abi386-4.pdf.
    • AMD64: http://refspecs.linuxfoundation.org/lsb_4.1.0/lsb-core-amd64/lsb-core-amd64/elf-amd64.html, points mainly to http://www.x86-64.org/documentation/abi.pdf.

A handy summary can be found at:

Its structure can be examined in a human-readable way with utilities such as readelf and objdump.

Generate the example

Let's break down a minimal runnable Linux x86-64 example:

section .data
    hello_world db "Hello world!", 10
    hello_world_len equ $ - hello_world
section .text
    global _start
    _start:
        mov rax, 1
        mov rdi, 1
        mov rsi, hello_world
        mov rdx, hello_world_len
        syscall
        mov rax, 60
        mov rdi, 0
        syscall

Compiled with:

nasm -w+all -f elf64 -o "hello_world.o" "hello_world.asm"
ld -o "hello_world.out" "hello_world.o"

Versions used:

  • NASM 2.10.09
  • Binutils version 2.24 (contains ld)
  • Ubuntu 14.04

We don't use a C program, as that would complicate the analysis; that will be level 2 :-)

Hexadecimal representations of the binaries

hd hello_world.o
hd hello_world.out

Global file structure

An ELF file contains the following parts:

  • ELF header. Points to the position of the section header table and the program header table.

  • Section header table (optional in an executable file). It has e_shnum section headers, each of which points to the position of a section.

  • N sections, with N <= e_shnum (optional in an executable file).

  • Program header table (only in executable files). It has e_phnum program headers, each of which points to the position of a segment.

  • N segments, with N <= e_phnum (optional in an executable file).

The order of these parts is not fixed: the only fixed thing is the ELF header, which must be first in the file, as the generic documents state.

ELF header

The easiest way to view the header is:

readelf -h hello_world.o
readelf -h hello_world.out

Bytes in the object file:

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  00 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000030  00 00 00 00 40 00 00 00  00 00 40 00 07 00 03 00  |....@.....@.....|

Bytes in the executable:

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 3e 00 01 00 00 00  b0 00 40 00 00 00 00 00  |..>.......@.....|
00000020  40 00 00 00 00 00 00 00  10 01 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  02 00 40 00 06 00 03 00  |....@.8...@.....|

Structure represented:

typedef struct {
    unsigned char e_ident[EI_NIDENT];
    Elf64_Half    e_type;
    Elf64_Half    e_machine;
    Elf64_Word    e_version;
    Elf64_Addr    e_entry;
    Elf64_Off     e_phoff;
    Elf64_Off     e_shoff;
    Elf64_Word    e_flags;
    Elf64_Half    e_ehsize;
    Elf64_Half    e_phentsize;
    Elf64_Half    e_phnum;
    Elf64_Half    e_shentsize;
    Elf64_Half    e_shnum;
    Elf64_Half    e_shstrndx;
} Elf64_Ehdr;

Manual breakdown:

  • 0 0: EI_MAG = 7f 45 4c 46 = 0x7f 'E', 'L', 'F': ELF magic number

  • 0 4: EI_CLASS = 02 = ELFCLASS64: 64-bit ELF

  • 0 5: EI_DATA = 01 = ELFDATA2LSB: little-endian data

  • 0 6: EI_VERSION = 01: format version

  • 0 7: EI_OSABI (only in the 2003 update) = 00 = ELFOSABI_NONE: no extensions.

  • 0 8: EI_PAD = 8x 00: reserved bytes. Must be set to 0.

  • 1 0: e_type = 01 00 = 1 (little-endian) = ET_REL: relocatable format

    In the executable it is 02 00, for ET_EXEC.

  • 1 2: e_machine = 3e 00 = 62 = EM_X86_64: AMD64 architecture

  • 1 4: e_version = 01 00 00 00: must be 1

  • 1 8: e_entry = 8x 00: entry point address, or 0 if not applicable, as for the object file, which has no entry point.

    In the executable it is b0 00 40 00 00 00 00 00. TODO: what else can we set this to? The kernel seems to put the IP directly at this value; it is not hard-coded.

  • 2 0: e_phoff = 8x 00: offset of the program header table, 0 if not present.

    40 00 00 00 in the executable, i.e. it starts immediately after the ELF header.

  • 2 8: e_shoff = 40 7x 00 = 0x40: file offset of the section header table, 0 if not present.

  • 3 0: e_flags = 00 00 00 00. TODO. Arch specific.

  • 3 4: e_ehsize = 40 00: size of this ELF header. Why does this field exist? How can it vary?

  • 3 6: e_phentsize = 00 00: size of each program header, 0 if not present.

    38 00 in the executable: each entry is 56 bytes long.

  • 3 8: e_phnum = 00 00: number of program header entries, 0 if not present.

    02 00 in the executable: there are 2 entries.

  • 3 A: e_shentsize and e_shnum = 40 00 07 00: section header size and number of entries
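The manual breakdown above can be reproduced mechanically. The sketch below reads a few Elf64_Ehdr fields from a 64-byte buffer using byte-wise little-endian readers; the struct `ehdr_summary` and the function names are hypothetical, and the field offsets are those of the standard Elf64_Ehdr layout.

```c
#include <stdint.h>

/* Little-endian readers; byte-wise so they work on any host. */
uint16_t rd16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint64_t rd64le(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical summary of a few Elf64_Ehdr fields. */
struct ehdr_summary {
    uint16_t e_type, e_machine, e_shentsize, e_shnum, e_shstrndx;
    uint64_t e_entry, e_phoff, e_shoff;
};

/* Fills *out from the first 64 bytes of a 64-bit little-endian ELF
 * file. The offsets follow the Elf64_Ehdr layout. */
void parse_ehdr64(const unsigned char *h, struct ehdr_summary *out)
{
    out->e_type      = rd16le(h + 0x10);
    out->e_machine   = rd16le(h + 0x12);
    out->e_entry     = rd64le(h + 0x18);
    out->e_phoff     = rd64le(h + 0x20);
    out->e_shoff     = rd64le(h + 0x28);
    out->e_shentsize = rd16le(h + 0x3a);
    out->e_shnum     = rd16le(h + 0x3c);
    out->e_shstrndx  = rd16le(h + 0x3e);
}
```

Fed the object file's first 64 bytes, this yields e_type 1 (ET_REL), e_machine 62 (EM_X86_64), e_shoff 0x40 and e_shnum 7, matching the values decoded by hand.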

Section header table

An array of Elf64_Shdr structures.

Each entry contains metadata about a given section.

e_shoff of the ELF header gives the starting position, 0x40 here.

e_shentsize and e_shnum from the ELF header say that we have 7 entries, each 0x40 bytes long.

So the table takes bytes from 0x40 to 0x40 + 7 * 0x40 - 1 = 0x1FF.
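The arithmetic for the last byte of the table can be written as a tiny helper. This is a sketch; the name `table_last_byte` is hypothetical, and it assumes a non-empty table.

```c
#include <stdint.h>

/* Offset of the last byte of a table of `num` entries of `entsize`
 * bytes each that starts at file offset `off`.
 * Assumes num > 0 and entsize > 0. */
uint64_t table_last_byte(uint64_t off, uint16_t entsize, uint16_t num)
{
    return off + (uint64_t)entsize * num - 1;
}
```

With off = 0x40, entsize = 0x40 and num = 7 the result is 0x1FF, so the first section's data can start at 0x200.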

Some section names are reserved for certain section types: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections, e.g. .text requires the type SHT_PROGBITS and the flags SHF_ALLOC + SHF_EXECINSTR.

readelf -S hello_world.o:

There are 7 section headers, starting at offset 0x40:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .data             PROGBITS         0000000000000000  00000200
       000000000000000d  0000000000000000  WA       0     0     4
  [ 2] .text             PROGBITS         0000000000000000  00000210
       0000000000000027  0000000000000000  AX       0     0     16
  [ 3] .shstrtab         STRTAB           0000000000000000  00000240
       0000000000000032  0000000000000000           0     0     1
  [ 4] .symtab           SYMTAB           0000000000000000  00000280
       00000000000000a8  0000000000000018           5     6     4
  [ 5] .strtab           STRTAB           0000000000000000  00000330
       0000000000000034  0000000000000000           0     0     1
  [ 6] .rela.text        RELA             0000000000000000  00000370
       0000000000000018  0000000000000018           4     2     4
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

The struct represented by each entry:

typedef struct {
    Elf64_Word  sh_name;
    Elf64_Word  sh_type;
    Elf64_Xword sh_flags;
    Elf64_Addr  sh_addr;
    Elf64_Off   sh_offset;
    Elf64_Xword sh_size;
    Elf64_Word  sh_link;
    Elf64_Word  sh_info;
    Elf64_Xword sh_addralign;
    Elf64_Xword sh_entsize;
} Elf64_Shdr;

Sections

Index 0 section

Contained in bytes 0x40 to 0x7F.

The first section is always magic: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html says:

If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), e_shnum has the value SHN_UNDEF (0) and the actual number of section header table entries is contained in the sh_size field of the section header at index 0 (otherwise, the sh_size member of the initial entry contains 0).

There are also other magic sections, detailed in Figure 4-7: Special Section Indexes.

At index 0, SHT_NULL is mandatory. Are there any other uses for it? See: What is the use of the SHT_NULL section in ELF?

.data section

DATA is a section 1:

00000080 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 | ................ | 00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................ | 000000A0 0D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................ | 000000B0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................ |

  • 80 0: sh_name = 01 00 00 00: index 1 in the .shstrtab string table.

    Here 1 says that the name of this section starts at the first character of that section and ends at the first NUL character, making up the string .data.

    .data is one of the section names that has a predefined meaning: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html

    These sections hold initialized data that contribute to the program's memory image.

  • 80 4: sh_type = 01 00 00 00: SHT_PROGBITS: the section content is not specified by ELF, only by how the program interprets it. Normal, since this is .data.

    80 8: sh_flags = 03 followed by 7x 00: SHF_WRITE and SHF_ALLOC: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#sh_flags, as required for a .data section

    90 0: sh_addr = 8x 00: the virtual address at which the section will be placed during execution, 0 if it is not placed

    90 8: sh_offset = 00 02 00 00 00 00 00 00 = 0x200: the number of bytes from the start of the file to the first byte of this section

    a0 0: sh_size = 0d 00 00 00 00 00 00 00

    If we take 0xd bytes starting at sh_offset 0x200, we see:

    00000200 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a 00 |Hello world!..|

    Aha! So our "Hello world!" string is in the .data section, just as we told NASM to place it.

    Once we graduate from hd, we can look at it with:

    readelf -x .data hello_world.o

    which displays:

    Hex dump of section '.data':
      0x00000000 48656c6c 6f20776f 726c6421 0a Hello world!.

    NASM sets decent properties for this section because it treats .data magically: http://www.nasm.us/doc/nasmdoc7.html#section-7.9.2

    Also note that this was a bad choice of section: a good C compiler would put the string in .rodata instead, because it is read-only, and that allows further OS optimizations.

    a0 8: sh_link and sh_info = 8x 00: do not apply to this section type. http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections

    b0 0: sh_addralign = 04 = TODO: why is this alignment needed? Is it only for sh_addr, or also for symbols inside sh_addr?

    b0 8: sh_entsize = 00 = the section does not contain a table. If != 0, it means that the section contains a table of fixed-size entries. In this file, we see from the readelf output that this is the case for the .symtab and .rela.text sections.
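The manual decoding above can be reproduced in a few lines of C. This is a minimal sketch that assumes a little-endian 64-bit section header; struct shdr64, read_le and parse_shdr64 are hypothetical names chosen for this example, not types or functions from libelf.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical container mirroring the Elf64_Shdr fields walked through above. */
struct shdr64 {
    uint32_t sh_name, sh_type;
    uint64_t sh_flags, sh_addr, sh_offset, sh_size;
    uint32_t sh_link, sh_info;
    uint64_t sh_addralign, sh_entsize;
};

/* Read an n-byte little-endian unsigned integer (n <= 8). */
uint64_t read_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

/* Decode one 64-byte section header entry starting at p. */
struct shdr64 parse_shdr64(const unsigned char *p)
{
    struct shdr64 h;
    h.sh_name      = (uint32_t)read_le(p + 0, 4);
    h.sh_type      = (uint32_t)read_le(p + 4, 4);
    h.sh_flags     = read_le(p + 8, 8);
    h.sh_addr      = read_le(p + 16, 8);
    h.sh_offset    = read_le(p + 24, 8);
    h.sh_size      = read_le(p + 32, 8);
    h.sh_link      = (uint32_t)read_le(p + 40, 4);
    h.sh_info      = (uint32_t)read_le(p + 44, 4);
    h.sh_addralign = read_le(p + 48, 8);
    h.sh_entsize   = read_le(p + 56, 8);
    return h;
}
```

Feeding it the 64 bytes at file offset 0x80 should give back the values read by hand: sh_name 1, sh_type 1, sh_flags 3, sh_offset 0x200, sh_size 0xd and sh_addralign 4.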

.text section

Now that we have done one section manually, let's graduate and use readelf -S for the other sections.

    [Nr] Name   Type      Address           Offset    Size              EntSize           Flags Link Info Align
    [ 2] .text  PROGBITS  0000000000000000  00000210  0000000000000027  0000000000000000  AX    0    0    16

.text is executable but not writable: if we try to write to it, Linux segfaults. Let's see if we really have code there:

objdump -d hello_world.o

    hello_world.o: file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <_start>:
       0: b8 01 00 00 00        mov    $0x1,%eax
       5: bf 01 00 00 00        mov    $0x1,%edi
       a: 48 be 00 00 00 00 00  movabs $0x0,%rsi
      11: 00 00 00
      14: ba 0d 00 00 00        mov    $0xd,%edx
      19: 0f 05                 syscall
      1b: b8 3c 00 00 00        mov    $0x3c,%eax
      20: bf 00 00 00 00        mov    $0x0,%edi
      25: 0f 05                 syscall

If we grep for b8 01 00 00 in the hd output, we see that it only occurs at 00000210, which is what the section header says. And the size is 0x27, which also matches. So we must be talking about the right section.

This looks like the right code: a write followed by an exit.

The most interesting part is line a, which does:

movabs $0x0,%rsi

to pass the address of the string to the system call. Currently, the 0x0 is just a placeholder. After linking, it will be modified to contain:

4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi

This modification is possible because of the data in the .rela.text section.

SHT_STRTAB

Sections with sh_type == SHT_STRTAB are called string tables.

Such sections are used by other sections when string names are needed. The using section says:

  • which string table it is using
  • at which index of the target string table the string starts

For example, we could have a string table containing: TODO: does it have to start with \0?

    Data:  \0 a b c \0 d e f \0
    Index: 0  1 2 3 4  5 6 7 8

And if another section wants to use the string d e f, it has to point to index 5 of this section (the letter d).
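A lookup in such a table is just pointer arithmetic plus a bounds check. The helper below is a sketch written for this text; strtab_get is a made-up name, not a libelf function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the string that starts at `index` in a string table of `size`
 * bytes, or NULL if the index is out of range or the string is not
 * NUL-terminated inside the table.
 */
const char *strtab_get(const char *table, size_t size, size_t index)
{
    if (index >= size)
        return NULL;
    if (memchr(table + index, '\0', size - index) == NULL)
        return NULL;
    return table + index;
}
```

With the example table above, index 5 yields the string def and index 0 yields the empty string.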

Notable string tables:

  • .shstrtab
  • .strtab

.shstrtab

Section type: sh_type == SHT_STRTAB.

Common name: section header string table.

The section name .shstrtab is reserved. The standard says:

This section holds section names.

This section is pointed to by the e_shstrndx field of the ELF header itself.

The string indexes of this section are pointed to by the sh_name field of the section headers, which denote strings.

This section does not have SHF_ALLOC set, so it will not appear in the executing program.

readelf -x .shstrtab hello_world.o

    Hex dump of section '.shstrtab':
      0x00000000 002e6461 7461002e 74657874 002e7368 ..data..text..sh
      0x00000010 73747274 6162002e 73796d74 6162002e strtab..symtab..
      0x00000020 73747274 6162002e 72656c61 2e746578 strtab..rela.tex
      0x00000030 7400                                t.

The data in this section has a predefined format: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html

If we look at the sh_name of the other sections, we see that each contains a number, e.g. the .text section has 7.

Each string then ends at the first NUL character found, e.g. character 12 is the \0 right after .text.

.symtab

Section type: sh_type == SHT_SYMTAB.

Common name: symbol table.

First, we note that:

  • sh_link = 5
  • sh_info = 6

For a SHT_SYMTAB section, these numbers mean that:

  • the strings that give the symbol names are in section 5, .strtab
  • sh_info is one greater than the symbol table index of the last local (STB_LOCAL) symbol: symbols 0 to 5 below are local and symbol 6 is the first global one. It is a coincidence that section 6 of this file is .rela.text.

A good high-level tool to disassemble this section is:

nm hello_world.o

which gives:

    0000000000000000 T _start
    0000000000000000 d hello_world
    000000000000000d a hello_world_len

This is, however, a high-level view that omits some types of symbols and denotes the symbol types by letters. A more detailed disassembly can be obtained with:

readelf -s hello_world.o

which gives:

    Symbol table '.symtab' contains 7 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello_world.asm
         2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
         3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
         4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
         5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
         6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start

The binary format of the table is documented at http://www.sco.com/developers/gabi/2003-12-17/ch4.symtab.html

readelf -x .symtab hello_world.o

which gives:

    Hex dump of section '.symtab':
      0x00000000 00000000 00000000 00000000 00000000 ................
      0x00000010 00000000 00000000 01000000 0400f1ff ................
      0x00000020 00000000 00000000 00000000 00000000 ................
      0x00000030 00000000 03000100 00000000 00000000 ................
      0x00000040 00000000 00000000 00000000 03000200 ................
      0x00000050 00000000 00000000 00000000 00000000 ................
      0x00000060 11000000 00000100 00000000 00000000 ................
      0x00000070 00000000 00000000 1d000000 0000f1ff ................
      0x00000080 0d000000 00000000 00000000 00000000 ................
      0x00000090 2d000000 10000200 00000000 00000000 -...............
      0x000000a0 00000000 00000000                   ........

The entries are of type:

    typedef struct {
        Elf64_Word st_name;
        unsigned char st_info;
        unsigned char st_other;
        Elf64_Half st_shndx;
        Elf64_Addr st_value;
        Elf64_Xword st_size;
    } Elf64_Sym;

As in the section table, the first entry is magical and is set to fixed meaningless values.

Entry 1 has ELF64_ST_TYPE == STT_FILE. ELF64_ST_TYPE is stored inside st_info.

Byte analysis:

    10 8: st_name = 01000000 = character 1 in .strtab, which up to the next \0 makes hello_world.asm

    This piece of information may be used by the linker to decide which segments the sections go into.

    10 12: st_info = 04

    Bits 0-3 = ELF64_ST_TYPE = type = 4 = STT_FILE: the main purpose of this entry is to use st_name to indicate the name of the file from which this object file was generated.

    Bits 4-7 = ELF64_ST_BIND = binding = 0 = STB_LOCAL. Required value for STT_FILE.

    10 14: st_shndx = symbol table section header index = f1 ff (0xfff1 in little-endian) = SHN_ABS. Required for STT_FILE. The byte before it, at 10 13, is st_other = 00.

    20 0: st_value = 8x 00: required value for STT_FILE

    20 8: st_size = 8x 00: no allocated size
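Splitting st_info into its two halves is a one-liner each way. The sketch below uses the same shift and mask as the gABI macros for symbol binding and type; the function names st_bind, st_type and st_type_name are invented for this example.

```c
#include <stdint.h>

/* Upper four bits: binding (0 = STB_LOCAL, 1 = STB_GLOBAL, ...). */
unsigned st_bind(uint8_t st_info) { return st_info >> 4; }

/* Lower four bits: type (0 = STT_NOTYPE, 3 = STT_SECTION, 4 = STT_FILE, ...). */
unsigned st_type(uint8_t st_info) { return st_info & 0xf; }

/* Names for the symbol types that occur in this walkthrough. */
const char *st_type_name(unsigned type)
{
    switch (type) {
    case 0:  return "NOTYPE";
    case 3:  return "SECTION";
    case 4:  return "FILE";
    default: return "other";
    }
}
```

For entry 1, st_info = 0x04 decodes to binding 0 (local) and type 4 (FILE); for _start, st_info = 0x10 decodes to binding 1 (global) and type 0 (NOTYPE).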

Now, from the readelf output, we can interpret the rest quickly.

STT_SECTION

There are two such entries, one pointing to .data and the other to .text (section indexes 1 and 2).

    Num:    Value          Size Type    Bind   Vis      Ndx Name
      2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
      3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2

TODO: what is their purpose?

STT_NOTYPE

Then come the most important symbols:

    Num:    Value          Size Type    Bind   Vis      Ndx Name
      4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
      5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
      6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start

hello_world is in the .data section (index 1). Its value is 0: it points to the first byte of that section.

_start is marked with GLOBAL binding because we wrote:

global _start

in NASM. This is necessary since it must be seen as the entry point. Unlike in C, NASM labels are local by default.

hello_world_len points to the special st_shndx == SHN_ABS == 0xfff1 (stored as the bytes f1 ff).

0xfff1 is chosen so as not to conflict with other sections.

st_value == 0xd == 13, which is the value we stored there in the assembly: the length of the string Hello world!.

This means that relocation will not affect this value: it is a constant.

This is a small optimization that our assembler does for us and that has ELF support.

If we had used the address of hello_world_len anywhere, the assembler would not have been able to mark it as SHN_ABS, and the linker would have had extra relocation work to do on it later.

SHT_SYMTAB in the executable

By default, NASM places a .symtab in the executable as well.

It is only used for debugging. Without the symbols, we are completely blind and must reverse engineer everything.

You can strip it with objcopy, and the executable will still run. Such executables are called stripped executables.

.strtab

Holds strings for the symbol table.

This section has sh_type == SHT_STRTAB.

It is pointed to by sh_link == 5 of the .symtab section.

readelf -x .strtab hello_world.o

    Hex dump of section '.strtab':
      0x00000000 0068656c 6c6f5f77 6f726c64 2e61736d .hello_world.asm
      0x00000010 0068656c 6c6f5f77 6f726c64 0068656c .hello_world.hel
      0x00000020 6c6f5f77 6f726c64 5f6c656e 005f7374 lo_world_len._st
      0x00000030 61727400                            art.

This implies that it is an ELF-level limitation that global variable names cannot contain NUL characters.

.rela.text

Section type: sh_type == SHT_RELA.

Common name: relocation section.

.rela.text holds relocation data which says how addresses must be modified when the final executable is linked. It points to the bytes of the text area that must be modified at link time so that they point to the correct memory locations.

Basically, it translates the object text containing the placeholder 0x0 address:

    a: 48 be 00 00 00 00 00 movabs $0x0,%rsi
    11: 00 00 00

to the actual executable code containing the final 0x6000d8:

    4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
    4000c1: 00 00 00

This section has index 6, which happens to equal the sh_info of .symtab. Per the gABI, the sh_info of a symbol table is one greater than the index of its last local symbol, so it is the relocation section's own sh_link and sh_info that tie it to its symbol table and to the section it modifies.

readelf -r hello_world.o gives:

    Relocation section '.rela.text' at offset 0x3b0 contains 1 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    00000000000c  000200000001 R_X86_64_64       0000000000000000 .data + 0

The section does not exist in the executable.

The actual bytes are:

    00000370 0c 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
    00000380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

The struct represented is:

    typedef struct {
        Elf64_Addr r_offset;
        Elf64_Xword r_info;
        Elf64_Sxword r_addend;
    } Elf64_Rela;

    370 0: r_offset = 0xc: offset into .text of the address that this relocation will modify

    370 8: r_info = 0x200000001. Contains 2 fields:

    • ELF64_R_TYPE = 0x1: the meaning depends on the exact architecture.
    • ELF64_R_SYM = 0x2: index of the symbol table entry that the address refers to; entry 2 is the STT_SECTION symbol for .data.

    The AMD64 ABI says that type 1 is called R_X86_64_64 and that it represents the operation S + A where:

    • S: the value of the symbol; in the object file it is 0, which matches the 00 00 00 00 00 00 00 00 of movabs $0x0,%rsi
    • A: the addend, present in the r_addend field

    This address is added to the section on which the relocation operates.

    This relocation operation acts on a total of 8 bytes.

    380 0: r_addend = 0

So in our example we conclude that the new address will be: S + A = .data + 0, and thus the first thing in the data section.
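What the linker does with this single entry can be sketched as follows. This is an illustration only, not the code of any real linker: r_sym, r_type and apply_rela64 are names made up here, and only relocation type 1 (R_X86_64_64) is handled.

```c
#include <stdint.h>
#include <stddef.h>

#define R_X86_64_64 1  /* relocation type 1 in the AMD64 ABI */

/* Split r_info the way the 64-bit ELF macros do: symbol index high, type low. */
uint32_t r_sym(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }
uint32_t r_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffffu); }

/*
 * Apply one R_X86_64_64 relocation to `text`: store S + A as an 8-byte
 * little-endian value at r_offset. sym_value is S, the final address of
 * the referenced symbol. Returns 0 on success, -1 for an unsupported
 * type or an offset that does not fit.
 */
int apply_rela64(unsigned char *text, size_t text_size,
                 uint64_t r_offset, uint64_t r_info,
                 int64_t r_addend, uint64_t sym_value)
{
    if (r_type(r_info) != R_X86_64_64)
        return -1;
    if (r_offset > text_size || text_size - r_offset < 8)
        return -1;
    uint64_t v = sym_value + (uint64_t)r_addend;
    for (int i = 0; i < 8; i++)
        text[r_offset + i] = (unsigned char)(v >> (8 * i));
    return 0;
}
```

Calling it with r_offset 0xc, r_info 0x200000001, addend 0 and a final .data address of 0x6000d8 writes d8 00 60 00 00 00 00 00 over the placeholder, which is the operand of the linked movabs.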

Program header table

Only appears in the executable.

Contains information on how the executable should be placed into the virtual memory of the process.

The executable is generated from object files by the linker. The main jobs that the linker does are:

  • determine which sections of the object files will go into which segments of the executable.

    In Binutils, this comes down to parsing a linker script and dealing with a bunch of defaults.

    You can get the linker script used with ld --verbose, and set a custom one with ld -T.

  • do relocation on text sections. This depends on how the multiple sections are put into memory.

readelf -l hello_world.out gives:

    Elf file type is EXEC (Executable file)
    Entry point 0x4000b0
    There are 2 program headers, starting at offset 64

    Program Headers:
      Type  Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags Align
      LOAD  0x0000000000000000 0x0000000000400000 0x0000000000400000 0x00000000000000d7 0x00000000000000d7 R E   200000
      LOAD  0x00000000000000d8 0x00000000006000d8 0x00000000006000d8 0x000000000000000d 0x000000000000000d RW    200000

     Section to Segment mapping:
      Segment Sections...
       00     .text
       01     .data

In the ELF header, e_phoff, e_phnum and e_phentsize told us that there are 2 program headers, which start at 0x40 and are 0x38 bytes long each, so they are:

    00000040 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000050 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00 |..@.......@.....|
    00000060 d7 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 00 |................|
    00000070 00 00 20 00 00 00 00 00                         |.. .....|

    00000070                         01 00 00 00 06 00 00 00 |        ........|
    00000080 d8 00 00 00 00 00 00 00 d8 00 60 00 00 00 00 00 |..........`.....|
    00000090 d8 00 60 00 00 00 00 00 0d 00 00 00 00 00 00 00 |..`.............|
    000000a0 0d 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 |.......... .....|

    typedef struct {
        Elf64_Word p_type;
        Elf64_Word p_flags;
        Elf64_Off p_offset;
        Elf64_Addr p_vaddr;
        Elf64_Addr p_paddr;
        Elf64_Xword p_filesz;
        Elf64_Xword p_memsz;
        Elf64_Xword p_align;
    } Elf64_Phdr;

Breakdown of the first one:

  • 40 0: p_type = 01 00 00 00 = PT_LOAD: a loadable segment, i.e. it will actually be loaded into memory. Other segment types are not necessarily loaded.
  • 40 4: p_flags = 05 00 00 00 = execute and read permissions, no write
  • 40 8: p_offset = 8x 00: the offset from the beginning of the file at which the first byte of the segment resides. Here it is 0, so the first segment also covers the ELF header and the program headers. You can play with it a bit with: gcc -Wl,-Ttext-segment=0x400030 hello_world.c
  • 50 0: p_vaddr = 00 00 40 00 00 00 00 00: the initial virtual memory address to load this segment into
  • 50 8: p_paddr = 00 00 40 00 00 00 00 00: the initial physical address to load into memory. Only matters for systems in which the program can set its physical address. Otherwise, as in System V like systems, it can be anything. It seems that NASM simply copies p_vaddr
  • 60 0: p_filesz = d7 00 00 00 00 00 00 00: the number of bytes the segment occupies in the file
  • 60 8: p_memsz = d7 00 00 00 00 00 00 00: the number of bytes the segment occupies in memory; it can be larger than p_filesz, here the two are equal
  • 70 0: p_align = 00 00 20 00 00 00 00 00 = 0x200000: 0 or 1 mean that no alignment is required; otherwise p_vaddr must equal p_offset modulo p_align

The second is analogous.
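Two of the fields above are easy to check mechanically: the permission bits and the alignment rule. The sketch below defines the three PF_ bits locally and uses two hypothetical helpers, phdr_flags_str and phdr_alignment_ok, written only for this text.

```c
#include <stdint.h>

/* Permission bits of p_flags, as defined by the gABI. */
#define PF_X 0x1
#define PF_W 0x2
#define PF_R 0x4

/* Write the permissions as three characters, e.g. "R E", into out[4]. */
void phdr_flags_str(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & PF_R) ? 'R' : ' ';
    out[1] = (p_flags & PF_W) ? 'W' : ' ';
    out[2] = (p_flags & PF_X) ? 'E' : ' ';
    out[3] = '\0';
}

/*
 * For a loadable segment, p_vaddr and p_offset must be congruent
 * modulo p_align; the values 0 and 1 mean no alignment constraint.
 */
int phdr_alignment_ok(uint64_t p_offset, uint64_t p_vaddr, uint64_t p_align)
{
    if (p_align <= 1)
        return 1;
    return (p_offset % p_align) == (p_vaddr % p_align);
}
```

Both segments of hello_world.out pass the check: 0x0 and 0x400000 are both 0 modulo 0x200000, and 0xd8 and 0x6000d8 are both 0xd8 modulo 0x200000.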

Then the Section to Segment mapping:

section of the readelf output tells us that:

  • 0 is the .text segment. Aha, so this is why it is executable and not writable.
  • 1 is the .data segment.
The standard development tools compile your program into an ELF (Executable and Linkable Format) file, with the option of including debug information. The format specification is publicly available. In addition, each architecture has its own specifics, for example those of ARM. Let us briefly look at this format.
An executable file in ELF format consists of the following parts:
1. Header (ELF Header)
Contains general information about the file and its main characteristics.
2. Program header table (Program Header Table)
This table maps file sections to memory segments and tells the loader which memory area each section should be written to.
3. Sections
Sections contain all the information in the file (program, data, debug information, etc.)
Each section has a type, a name and other parameters. The ".text" section usually holds the code, ".symtab" holds the program's symbol table (names of files, procedures and variables), ".strtab" holds the string table, sections with the ".debug_" prefix hold debug information, and so on. In addition, the file must contain an empty section with index 0.
4. Section header table (Section Header Table)
This is a table containing an array of section headers.
The format is discussed in more detail in the Creating ELF section.

DWARF overview

DWARF is a standardized debug information format. The standard can be downloaded from the official website. There is also a wonderful short overview of the format: Introduction to the DWARF Debugging Format (Michael J. Eager).
Why is debug information needed? It allows you to:
  • set breakpoints not at a physical address, but at a line number in a source file or at a function name
  • display and change the values of global and local variables, as well as function parameters
  • display the call stack (backtrace)
  • step through the program not one assembler instruction at a time, but by source code lines
This information is stored as a tree structure. Each tree node has a parent, can have children and is called a DIE (Debugging Information Entry). Each node has a tag (type) and a list of attributes (properties) describing the node. Attributes may contain anything, such as data or references to other nodes. In addition, some information is stored outside the tree.
Nodes are divided into two main kinds: nodes describing data and nodes describing code.
Nodes describing data:
  1. Data types:
    • Basic data types (nodes with DW_TAG_base_type), such as the int type in C.
    • Composite data types (pointers, etc.)
    • Arrays
    • Structures, classes, unions, interfaces
  2. Data objects:
    • constants
    • function parameters
    • variables
    • etc.
Each data object has a DW_AT_location attribute, which specifies how to compute the address at which the data resides. For example, a variable may have a fixed address, live in a register or on the stack, or be a member of a class or object. Computing this address can be rather complicated, so the standard provides so-called location expressions, which may contain a sequence of operators for a special internal stack machine.
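To make the stack machine idea concrete, here is a deliberately tiny evaluator. It is a sketch under strong assumptions: it handles only four operators (DW_OP_fbreg, DW_OP_plus_uconst, DW_OP_plus and DW_OP_lit0 to DW_OP_lit31, with their opcode values from the DWARF standard), it takes the frame base as a plain number, and the names eval_location, uleb128 and sleb128 are invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Opcode values from the DWARF standard. */
#define DW_OP_plus        0x22
#define DW_OP_plus_uconst 0x23
#define DW_OP_lit0        0x30
#define DW_OP_fbreg       0x91

/* Decode an unsigned LEB128 number at *p, advancing *p. */
static uint64_t uleb128(const uint8_t **p, const uint8_t *end)
{
    uint64_t v = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t b = *(*p)++;
        if (shift < 64)
            v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
        if (!(b & 0x80))
            break;
    }
    return v;
}

/* Decode a signed LEB128 number at *p, advancing *p. */
static int64_t sleb128(const uint8_t **p, const uint8_t *end)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b = 0;
    while (*p < end) {
        b = *(*p)++;
        if (shift < 64)
            v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
        if (!(b & 0x80))
            break;
    }
    if (shift < 64 && (b & 0x40))
        v |= ~(uint64_t)0 << shift;   /* sign-extend */
    return (int64_t)v;
}

/* Evaluate a location expression; returns 0 and stores the address in *out. */
int eval_location(const uint8_t *expr, size_t len, uint64_t frame_base, uint64_t *out)
{
    uint64_t stack[16];
    size_t sp = 0;
    const uint8_t *p = expr, *end = expr + len;
    while (p < end) {
        uint8_t op = *p++;
        if (op == DW_OP_fbreg) {                       /* push frame base + offset */
            if (sp == 16) return -1;
            stack[sp++] = frame_base + (uint64_t)sleb128(&p, end);
        } else if (op >= DW_OP_lit0 && op <= DW_OP_lit0 + 31) {
            if (sp == 16) return -1;
            stack[sp++] = (uint64_t)(op - DW_OP_lit0); /* push small literal */
        } else if (op == DW_OP_plus_uconst) {          /* top += constant */
            if (sp < 1) return -1;
            stack[sp - 1] += uleb128(&p, end);
        } else if (op == DW_OP_plus) {                 /* pop two, push sum */
            if (sp < 2) return -1;
            stack[sp - 2] += stack[sp - 1];
            sp--;
        } else {
            return -1;                                 /* operator not handled here */
        }
    }
    if (sp == 0) return -1;
    *out = stack[sp - 1];
    return 0;
}
```

For example, the two-byte expression 91 7c (DW_OP_fbreg with offset -4) evaluated with a frame base of 0x1000 gives the address 0xffc, a typical location for a local variable on the stack.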
Nodes describing code:
  1. Procedures (functions) - nodes with the DW_TAG_subprogram tag. Child nodes may contain descriptions of variables: the function's parameters and its local variables.
  2. Compilation unit. Contains information about the program and is the parent of all other nodes.
The information described above resides in the ".debug_info" and ".debug_abbrev" sections.
Other information:
  • Line number information (section ".debug_line")
  • Macro information (section ".debug_macinfo")
  • Call frame information (section ".debug_frame")

Creating ELF

To create files in ELF format, we will use the libelf library from the elfutils package. There is a good article online on using libelf, libelf by Example (unfortunately, it describes file creation only very briefly), as well as documentation.
Creating a file consists of several steps:
  1. Initializing libelf
  2. Creating the file header (ELF Header)
  3. Creating the program header table (Program Header Table)
  4. Creating sections
  5. Writing the file
Let us look at the steps in more detail.
Initializing libelf
First you need to call the elf_version(EV_CURRENT) function and check the result. If it equals EV_NONE, an error has occurred and no further actions are possible. Then you need to create the file on disk, obtain its descriptor and pass it to the elf_begin function:
Elf *elf_begin(int fd, Elf_Cmd cmd, Elf *elf)
  • fd - descriptor of the newly opened file
  • cmd - mode (ELF_C_READ to read information, ELF_C_WRITE to write, or ELF_C_RDWR to read and write); it must match the mode in which the file was opened (ELF_C_WRITE in our case)
  • elf - only needed for working with archive files (.a); in our case pass 0
The function returns a pointer to the created descriptor, which will be used in all libelf functions, or 0 in case of an error.
Creating the header
A new file header is created by the elf32_newehdr function:
Elf32_Ehdr *elf32_newehdr(Elf *elf);
  • elf - descriptor returned by the elf_begin function
Returns 0 on error, or a pointer to the structure that is the ELF file header:
#define EI_NIDENT 16
typedef struct {
    unsigned char e_ident[EI_NIDENT];
    Elf32_Half e_type;
    Elf32_Half e_machine;
    Elf32_Word e_version;
    Elf32_Addr e_entry;
    Elf32_Off e_phoff;
    Elf32_Off e_shoff;
    Elf32_Word e_flags;
    Elf32_Half e_ehsize;
    Elf32_Half e_phentsize;
    Elf32_Half e_phnum;
    Elf32_Half e_shentsize;
    Elf32_Half e_shnum;
    Elf32_Half e_shstrndx;
} Elf32_Ehdr;

Some fields are filled in with standard values, and some we need to fill in ourselves:

  • e_ident - identification byte array, with the following indices:
    • EI_MAG0, EI_MAG1, EI_MAG2, EI_MAG3 - these 4 bytes must contain the characters 0x7f, "ELF", which the elf32_newehdr function has already done for us
    • EI_DATA - indicates the type of data encoding in the file: ELFDATA2LSB or ELFDATA2MSB. Set ELFDATA2LSB like this: e_ident[EI_DATA] = ELFDATA2LSB
    • EI_VERSION - the file header version, already set for us
    • EI_PAD - do not touch
  • e_type - file type, can be ET_NONE - no type, ET_REL - relocatable file, ET_EXEC - executable file, ET_DYN - shared object file, etc. We need to set the file type to ET_EXEC
  • e_machine - architecture required for this file, for example EM_386 for the Intel architecture; for ARM we need to write EM_ARM (40) here - see ELF for the ARM Architecture
  • e_version - file version, must be set to EV_CURRENT
  • e_entry - entry point address, not required for us
  • e_phoff - offset of the program header table in the file, e_shoff - offset of the section header table; do not fill in
  • e_flags - processor-specific flags; for our architecture (Cortex-M3) set 0x05000000 (ABI version 5)
  • e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum - do not touch
  • e_shstrndx - contains the index of the section that holds the string table with section names. Since we do not have any sections yet, we will set this number later.
Creating the program header table
As already mentioned, the program header table (Program Header Table) maps file sections to memory segments and tells the loader where to write each section. The table is created with the elf32_newphdr function:
Elf32_Phdr *elf32_newphdr(Elf *elf, size_t count);
  • elf - our descriptor
  • count - the number of table entries to create. Since we will have only one section (with the program code), count will be 1.
Returns 0 on error, or a pointer to the program header table.
Each entry in the table is described by the following structure:
typedef struct {
    Elf32_Word p_type;
    Elf32_Off p_offset;
    Elf32_Addr p_vaddr;
    Elf32_Addr p_paddr;
    Elf32_Word p_filesz;
    Elf32_Word p_memsz;
    Elf32_Word p_flags;
    Elf32_Word p_align;
} Elf32_Phdr;
  • p_type - type of the segment (section); here we must specify PT_LOAD - a loadable segment
  • p_offset - offset in the file at which the data of the section to be loaded into memory begins. For us this is the .text section, which will be located right after the file header and the program header table, so we can compute the offset as the sum of the lengths of these headers. The length of any type can be obtained with the elf32_fsize function:
    size_t elf32_fsize(Elf_Type type, size_t count, unsigned int version); type - one of the ELF_T_xxx constants, we need the sizes of ELF_T_EHDR and ELF_T_PHDR; count - the number of items of the given type; version - must be set to EV_CURRENT
  • p_vaddr, p_paddr - virtual and physical address at which the contents of the section will be loaded. Since we have no virtual addresses, we set it equal to the physical one, in the simplest case 0, because that is where our program will be loaded.
  • p_filesz, p_memsz - section size in the file and in memory. For us they are the same, but since there is no section with the program code yet, we will set them later.
  • p_flags - permissions for the loaded memory segment. Can be PF_R - read, PF_W - write, PF_X - execute, or a combination of them. Set p_flags to PF_R + PF_X
  • p_align - alignment of the segment, for us 4
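The field values described above can be put together as follows. To keep the sketch independent of libelf, it uses a local mirror of the structure (struct phdr32) and hard-codes the 32-bit header sizes of 52 and 32 bytes that elf32_fsize would report for ELF_T_EHDR and ELF_T_PHDR; make_text_phdr is a hypothetical helper, and in a real program the pointer returned by elf32_newphdr would be filled in instead.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the 32-bit program header entry shown above. */
struct phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

enum { PT_LOAD = 1 };
enum { PF_X = 1, PF_W = 2, PF_R = 4 };
enum { EHDR32_SIZE = 52, PHDR32_SIZE = 32 };  /* 32-bit ELF header and phdr entry sizes */

/*
 * Describe a single loadable code segment that starts right after the
 * ELF header and a program header table with phdr_count entries.
 */
struct phdr32 make_text_phdr(uint32_t phdr_count, uint32_t load_addr, uint32_t code_size)
{
    struct phdr32 ph;
    memset(&ph, 0, sizeof ph);
    ph.p_type   = PT_LOAD;
    ph.p_offset = EHDR32_SIZE + phdr_count * PHDR32_SIZE;
    ph.p_vaddr  = load_addr;        /* no virtual memory: same as physical */
    ph.p_paddr  = load_addr;
    ph.p_filesz = code_size;
    ph.p_memsz  = code_size;
    ph.p_flags  = PF_R | PF_X;
    ph.p_align  = 4;
    return ph;
}
```

With one program header, the code therefore starts at file offset 52 + 32 = 84.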
Creating sections
After creating the headers, you can start creating sections. An empty section is created with the elf_newscn function:
Elf_Scn *elf_newscn(Elf *elf);
  • elf - descriptor returned earlier by the elf_begin function
The function returns a pointer to the section, or 0 in case of an error.
After creating the section, you need to fill in the section header and create a section data descriptor.
A pointer to the section header can be obtained with the elf32_getshdr function:
Elf32_Shdr *elf32_getshdr(Elf_Scn *scn);
  • scn - pointer to the section that we received from the elf_newscn function.
The section header looks like this:
typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;
  • sh_name - section name: an offset into the section header string table (the .shstrtab section) - see "String tables" below
  • sh_type - type of the section content; for the program code section set SHT_PROGBITS, for sections with a string table - SHT_STRTAB, for the symbol table - SHT_SYMTAB
  • sh_flags - section flags, which can be combined; we only need three:
    • SHF_ALLOC - means that the section will be loaded into memory
    • SHF_EXECINSTR - the section contains executable code
    • SHF_STRINGS - the section contains a string table
    Accordingly, for the .text section with the program you need to set the SHF_ALLOC + SHF_EXECINSTR flags
  • sh_addr - address at which the section will be loaded into memory
  • sh_offset - offset of the section in the file - do not touch, the library will set it for us
  • sh_size - section size - do not touch
  • sh_link - contains the index of an associated section; needed to link the section with its corresponding string table (see below)
  • sh_info - additional information depending on the section type, set to 0
  • sh_addralign - address alignment, do not touch
  • sh_entsize - if the section consists of several elements of the same length, indicates the length of such an element; do not touch
After filling in the header, you need to create a section data descriptor with the elf_newdata function:
Elf_Data *elf_newdata(Elf_Scn *scn);
  • scn - the pointer to the new section that we have just obtained.
The function returns 0 on error, or a pointer to the Elf_Data structure, which must be filled in:
typedef struct {
    void *d_buf;
    Elf_Type d_type;
    size_t d_size;
    off_t d_off;
    size_t d_align;
    unsigned d_version;
} Elf_Data;
  • d_buf - pointer to the data to be written to the section
  • d_type - data type, ELF_T_BYTE suits us everywhere
  • d_size - data size
  • d_off - offset in the section, set to 0
  • d_align - alignment, can be set to 1 - no alignment
  • d_version - version, must be set to EV_CURRENT
Special sections
For our purposes, we will need to create the minimum required set of sections:
  • .text - section with the program code
  • .symtab - the file's symbol table
  • .strtab - string table containing the names of the symbols from the .symtab section, since the latter stores not the names themselves but their indices
  • .shstrtab - string table containing the section names
All sections are created as described in the previous section, but each special section has its own particularities.
The .text section
This section contains executable code, so sh_type must be set to SHT_PROGBITS, sh_flags to SHF_EXECINSTR + SHF_ALLOC, and sh_addr to the address at which this code will be loaded
The .symtab section
The section contains a description of all symbols (functions) of the program and of the files in which they were defined. It consists of entries 16 bytes long:
typedef struct {
    Elf32_Word st_name;
    Elf32_Addr st_value;
    Elf32_Word st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half st_shndx;
} Elf32_Sym;
  • st_name - symbol name (index into the .strtab string table)
  • st_value - value (entry address for a function or 0 for a file). Since the Cortex-M3 has the Thumb-2 instruction set, this address must be odd (real address + 1)
  • st_size - length of the function code (0 for a file)
  • st_info - symbol type and its scope. There is a macro to compute the value of this field:
    #define ELF32_ST_INFO(b, t) (((b)<<4)+((t)&0xf))
    where b is the scope and t is the symbol type
    The scope can be STB_LOCAL (the symbol is not visible from other object files) or STB_GLOBAL (visible). For simplicity, we use STB_GLOBAL.
    Symbol type - STT_FUNC for a function, STT_FILE for a file
  • st_other - set to 0
  • st_shndx - index of the section for which the symbol is defined (the index of the .text section), or SHN_ABS for a file.
    The index of a section can be obtained from its scn descriptor with elf_ndxscn:
    size_t elf_ndxscn(Elf_Scn *scn);
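The rules for a function symbol can be summarised in a short sketch. It again uses a local mirror of the structure (struct sym32) instead of the libelf type, ST_INFO is the same expression as the ELF32_ST_INFO macro above, and make_func_sym is a hypothetical helper; text_index stands for the value that elf_ndxscn would return for the .text section.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the 16-byte 32-bit symbol table entry shown above. */
struct sym32 {
    uint32_t st_name, st_value, st_size;
    uint8_t  st_info, st_other;
    uint16_t st_shndx;
};

#define STB_GLOBAL 1
#define STT_FUNC   2
#define ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))  /* same as ELF32_ST_INFO */

/*
 * Build the symbol entry for a Thumb function.
 *   name_index - offset of the name in .strtab
 *   addr       - real (even) address of the first instruction
 *   size       - length of the function code in bytes
 *   text_index - section index of .text
 */
struct sym32 make_func_sym(uint32_t name_index, uint32_t addr,
                           uint32_t size, uint16_t text_index)
{
    struct sym32 s;
    memset(&s, 0, sizeof s);
    s.st_name  = name_index;
    s.st_value = addr | 1u;   /* Thumb: odd address, i.e. real address + 1 */
    s.st_size  = size;
    s.st_info  = (uint8_t)ST_INFO(STB_GLOBAL, STT_FUNC);
    s.st_other = 0;
    s.st_shndx = text_index;
    return s;
}
```

A global function at address 0x100 thus gets st_value 0x101 and st_info 0x12.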

This section is created in the usual way, except that sh_type must be set to SHT_SYMTAB, and the index of the .strtab section is written to the sh_link field, so that these sections become linked.
The .strtab section
This section holds the names of all symbols from the .symtab section. It is created like a regular section, but sh_type must be set to SHT_STRTAB and sh_flags to SHF_STRINGS, so that this section becomes a string table.
The data for the section can be collected into an array while passing over the source text; a pointer to this array is then written to the section data descriptor (d_buf).
The .shstrtab section
This section is a string table that contains the names of all sections of the file, including its own name. It is created in the same way as the .strtab section. After it is created, its index must be written to the e_shstrndx field of the file header.
String tables
String tables contain consecutive strings, each terminated by a zero byte; the first byte of the table must also be 0. The index of a string in the table is simply its offset in bytes from the beginning of the table, so the first string "name" has index 1 and the next string "var" has index 6.
    Index: 0  1 2 3 4 5  6 7 8 9
    Data:  \0 n a m e \0 v a r \0
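Collecting names into such a table can be sketched like this. The fixed 256-byte buffer and the names struct strtab, strtab_init and strtab_add are assumptions made for this example; the resulting data and len would then be assigned to d_buf and d_size of the section data descriptor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity string table builder. */
struct strtab {
    char   data[256];
    size_t len;
};

/* Start the table with the mandatory leading zero byte. */
void strtab_init(struct strtab *t)
{
    t->data[0] = '\0';
    t->len = 1;
}

/*
 * Append `name` including its terminating zero byte and return its
 * index in the table, or 0 if it does not fit.
 */
size_t strtab_add(struct strtab *t, const char *name)
{
    size_t n = strlen(name) + 1;
    if (n > sizeof t->data - t->len)
        return 0;
    size_t index = t->len;
    memcpy(t->data + index, name, n);
    t->len += n;
    return index;
}
```

Adding "name" and then "var" returns the indices 1 and 6 and leaves a table of 10 bytes, exactly as in the layout above.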
Writing the file
So, the headers and sections are already formed; now they need to be written to the file, and the work with libelf must be finished. The writing is done by the elf_update function:
off_t elf_update(Elf *elf, Elf_Cmd cmd);
  • elf - descriptor
  • cmd - command, must be ELF_C_WRITE to write.
The function returns -1 in case of an error. The error text can be obtained by calling the elf_errmsg(-1) function, which returns a pointer to the error string.
We finish working with the library by calling the elf_end function, to which we pass our descriptor. All that remains is to close the previously opened file.
However, the file we have created does not contain debug information, which we will add in the next section.

Creating dwarf.

We will create debug information using the libdwarf library, which comes with a PDF file of documentation (libdwarf2p.1.pdf - A Producer Library Interface to DWARF).
Creating debug information consists of the following steps:
  1. Creating nodes (DIE - Debugging Information Entry)
  2. Creating node attributes
  3. Creating data types
  4. Creating procedures (functions)
Let us consider the steps in more detail.
Initializing the libdwarf producer
We will create debug information during compilation, at the same time as we create symbols in the .symtab section, so the library must be initialized after libelf has been initialized and the ELF header and program header have been created, but before the sections are created.
For initialization we will use the dwarf_producer_init_c function. The library has several other initialization functions (dwarf_producer_init, dwarf_producer_init_b), which differ in some nuances described in the documentation. In principle, any of them can be used.

Dwarf_P_Debug dwarf_producer_init_c(Dwarf_Unsigned flags, Dwarf_Callback_Func_c func, Dwarf_Handler errhand, Dwarf_Ptr errarg, void *user_data, Dwarf_Error *error)

  • flags - a bitwise OR of several constants that define some parameters, such as the bitness of the information, the byte order (little-endian, big-endian) and the relocation format; of these we definitely need DW_DLC_WRITE and DW_DLC_SYMBOLIC_RELOCATIONS
  • func - callback function that will be called when creating ELF sections with debug information. See below in the section "Creating sections with debugging information"
  • errhand - pointer to a function that will be called when errors occur. You can pass 0
  • errarg - data that will be passed to the errhand function; can be 0
  • user_data - data that will be passed to the func function; can be 0
  • error - the returned error code
The function returns a Dwarf_P_Debug descriptor, which is used in all subsequent functions, or -1 in case of an error, in which case error holds the error code (the text of the error message can be obtained by passing this code to the dwarf_errmsg function).
Creating nodes (DIE - Debugging Information Entry)
As described above, debug information forms a tree structure. To create a node of this tree, you need to:
  • create it with the dwarf_new_die function
  • add attributes to it (each type of attribute is added by its own function, described below)
The node is created using the dwarf_new_die function:
Dwarf_P_Die dwarf_new_die(Dwarf_P_Debug dbg, Dwarf_Tag new_tag, Dwarf_P_Die parent, Dwarf_P_Die child, Dwarf_P_Die left_sibling, Dwarf_P_Die right_sibling, Dwarf_Error *error)
  • new_tag - tag (type) of the node, a DW_TAG_xxx constant, which can be found in the dwarf.h file
  • parent, child, left_sibling, right_sibling - respectively the parent, child, left and right neighbors of the node. It is not necessary to specify all of these parameters; it is enough to specify one and pass 0 for the rest. If all of them are 0, the node will be either the root or isolated
  • error - will contain an error code if an error occurs
The function returns DW_DLV_BADADDR on error or a Dwarf_P_Die node descriptor on success.
Creating an attribute of the node
To create node attributes there is a whole family of functions, dwarf_add_AT_xxx. Sometimes it is hard to determine which function creates the attribute you need, so I even had to dig into the library's source code several times. Some of the functions are described here, others below in the relevant sections. All of them accept the ownerdie parameter - the descriptor of the node to which the attribute will be added - and return the error code in the error parameter.
The dwarf_add_AT_name function adds the "name" attribute (DW_AT_name) to the node. Most nodes must have a name (for example, procedures, variables, constants); some may have no name (for example, a compilation unit).
Dwarf_P_Attribute dwarf_add_AT_name(Dwarf_P_Die ownerdie, char *name, Dwarf_Error *error)
  • name - the actual value of the attribute (the node name)

The dwarf_add_AT_signed_const and dwarf_add_AT_unsigned_const functions add the specified attribute and its signed (unsigned) value to the node. Signed and unsigned attributes are used to set the values of constants, sizes, line numbers, etc. Function format:
Dwarf_P_Attribute dwarf_add_AT_(un)signed_const(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Signed value, Dwarf_Error *error)
  • dbg - the Dwarf_P_Debug descriptor obtained when initializing the library
  • attr - the attribute whose value is set, a DW_AT_xxx constant that can be found in the dwarf.h file
  • value - the attribute value
The functions return DW_DLV_BADADDR in case of an error or an attribute descriptor on success.
Creating a compilation unit (Compilation Unit)
Any tree must have a root; in our case it is the compilation unit, which contains information about the program (for example, the name of the main file, the programming language, the name of the compiler, the case sensitivity of identifiers (variables, functions), the main function of the program, the starting address, and so on). In principle, no attribute is mandatory. As an example, let us create the information about the main file and the compiler.
Main file information
To store information about the main file, use the "name" attribute (DW_AT_name) via the dwarf_add_AT_name function, as shown in the section "Creating an attribute of the node".
Compiler information
Use the dwarf_add_AT_producer function:
Dwarf_P_Attribute dwarf_add_AT_producer(Dwarf_P_Die ownerdie, char *producer_string, Dwarf_Error *error)
  • producer_string - string with the information text
Returns DW_DLV_BADADDR in case of an error or an attribute descriptor on success.
Creating Common Information Entry
Usually, when a function (subroutine) is called, its parameters and return address are placed on the stack (although each compiler can do this in its own way); all of this together is called a call frame. The debugger needs information about the frame format to correctly determine the return address from a function and to build a backtrace - the chain of function calls that led to the current function, together with the parameters of those functions. The processor registers that are saved on the stack are usually described as well. The code that reserves space on the stack and saves processor registers is called the function prologue; the code that restores the registers and the stack is called the epilogue.
This information depends heavily on the compiler. For example, the prologue and epilogue do not have to be at the very beginning and end of the function; sometimes a frame is used and sometimes not; processor registers can be saved in other registers, and so on.
So, the debugger needs to know how the processor registers change their values and where they are saved when a procedure is entered. This information is called Call Frame Information - information about the format of the frame. For each address in the program (containing code), it specifies the address of the frame in memory (Canonical Frame Address - CFA) and information about the processor registers; for example, it can state that:
  • the register is not saved in the procedure
  • the register does not change its value in the procedure
  • the register is saved on the stack at CFA + N
  • the register is saved in another register
  • the register is saved in memory at some address that is computed in a rather non-trivial way
  • etc.
Since the information has to be specified for each address in the code, it is very voluminous and is stored in compressed form in the .debug_frame section. Since it changes little from address to address, only its changes are encoded, in the form of DW_CFA_xxx instructions. Each instruction describes one change, for example:
  • DW_CFA_set_loc - sets the current address in the program
  • DW_CFA_advance_loc - advances the current address by some number of bytes
  • DW_CFA_def_cfa - defines the address of the stack frame (CFA) as a register plus a numeric offset
  • DW_CFA_def_cfa_register - changes the processor register from which the address of the stack frame is taken
  • DW_CFA_def_cfa_expression - specifies how to calculate the address of the stack frame
  • DW_CFA_same_value - indicates that the register does not change
  • DW_CFA_register - indicates that the register is saved in another register
  • etc.
The elements of the .debug_frame section are records of two types: Common Information Entry (CIE) and Frame Description Entry (FDE). A CIE contains information that is common to many FDE records; roughly speaking, it describes a particular kind of procedure. An FDE describes each specific procedure. When entering a procedure, the debugger first executes the instructions from the CIE and then those from the FDE.
My compiler generates procedures in which the CFA is in the SP register (R13). Let us create one CIE for all procedures. The dwarf_add_frame_cie function does this:
Dwarf_Unsigned dwarf_add_frame_cie (Dwarf_P_Debug dbg, char * augmenter, Dwarf_Small code_align, Dwarf_Small data_align, Dwarf_Small ret_addr_reg, Dwarf_Ptr init_bytes, Dwarf_Unsigned init_bytes_len, Dwarf_Error * error);
  • augmenter - a UTF-8 string whose presence indicates that the CIE or FDE carries additional platform-specific information. We pass an empty string
  • code_align - code alignment in bytes (2 in our case)
  • data_align - alignment of data in the frame (we pass -4, which means that every parameter occupies 4 bytes on the stack and that the stack grows downward in memory)
  • ret_addr_reg - register containing the return address of the procedure (14 in our case)
  • init_bytes - an array containing DW_CFA_xxx instructions. Unfortunately, there is no convenient way to generate this array. You can create it by hand or look it up in an ELF file generated by a compiler, which is what I did. In my case it contains 3 bytes: 0x0c, 0x0d, 0, which decode as DW_CFA_def_cfa: r13 ofs 0 (the CFA is in register R13, the offset is 0)
  • init_bytes_len - length of the init_bytes array
The function returns DW_DLV_NOCOUNT on error, or a CIE descriptor that should be used when creating an FDE for each procedure; we will look at that further on, in the section on creating the FDE of a procedure.
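The three init_bytes quoted above can be decoded by hand. The following sketch, in plain C without libdwarf, shows how; the helper names read_uleb128 and decode_def_cfa are made up for this illustration. It relies on two facts from the DWARF standard: the opcode DW_CFA_def_cfa has the value 0x0c, and its two operands (register and offset) are stored as unsigned LEB128 numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_def_cfa 0x0c  /* opcode value from the DWARF standard */

/* Decode one unsigned LEB128 number starting at p[*pos]; advances *pos. */
static uint64_t read_uleb128(const uint8_t *p, size_t len, size_t *pos)
{
    uint64_t result = 0;
    unsigned shift = 0;

    for (;;) {
        assert(*pos < len && shift < 64);
        uint8_t byte = p[(*pos)++];
        result |= (uint64_t)(byte & 0x7f) << shift; /* low 7 bits are payload */
        if ((byte & 0x80) == 0)                     /* high bit clear: last byte */
            break;
        shift += 7;
    }
    return result;
}

/* Decode a single DW_CFA_def_cfa instruction.
 * Returns 1 and fills reg/offset on success, 0 if the opcode differs. */
static int decode_def_cfa(const uint8_t *p, size_t len,
                          uint64_t *reg, uint64_t *offset)
{
    size_t pos = 0;

    if (len == 0 || p[pos++] != DW_CFA_def_cfa)
        return 0;
    *reg = read_uleb128(p, len, &pos);
    *offset = read_uleb128(p, len, &pos);
    return 1;
}
```

Applied to the bytes 0x0c, 0x0d, 0, the decoder reports register 13 and offset 0, which matches the reading "CFA is in R13, offset 0".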
Creating data types
Before creating procedures and variables, you must first create the nodes that correspond to data types. There are many data types, but all of them are based on base types (elementary types such as int, double, etc.); the remaining types are built from the base ones.
A base type is a node with the DW_TAG_base_type tag. It must have the following attributes:
  • "name" (DW_AT_name)
  • "encoding" (DW_AT_encoding) - specifies what kind of data this base type describes (for example, DW_ATE_boolean - boolean, DW_ATE_float - floating point, DW_ATE_signed - signed integer, DW_ATE_unsigned - unsigned integer, etc.)
  • "size" (DW_AT_byte_size - size in bytes, or DW_AT_bit_size - size in bits)
The node may also contain other, optional attributes.
For example, to create a 32-bit signed integer base type "int", we need to create a node with the DW_TAG_base_type tag and set its attributes DW_AT_name to "int", DW_AT_encoding to DW_ATE_signed, and DW_AT_byte_size to 4.
After creating the base types, you can create types derived from them. Such nodes must contain the DW_AT_type attribute - a reference to their base type. For example, a pointer to int - a node with the DW_TAG_pointer_type tag - must contain a reference to the previously created type "int" in its DW_AT_type attribute.
An attribute that references another node is created by the dwarf_add_AT_reference function:
Dwarf_P_Attribute dwarf_add_AT_reference(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Die otherdie, Dwarf_Error *error)
  • attr - the attribute, in this case DW_AT_type
  • otherdie - descriptor of the type being referenced
Creating procedures
To create procedures, I need to explain one more kind of debugging information - line number information. It serves to map each machine instruction to a specific line of source code, and also makes line-by-line debugging of the program possible. This information is stored in the .debug_line section. If we had enough space, it would be stored as a matrix, one row for each instruction, with columns such as:
  • name of the file with the source code
  • line number in this file
  • column number in the file
  • whether the instruction is the beginning of a statement or of a block of statements
  • etc.
Such a matrix would be very large, so it has to be compressed. First, duplicate rows are removed; second, not the rows themselves are stored but only the changes between them. These changes look like commands for a finite state machine, and the information itself is treated as a program that is "executed" by this machine. The commands of this program look like this: DW_LNS_advance_pc - advance the program counter to some address, DW_LNS_set_file - set the file in which the procedure is defined, DW_LNS_const_add_pc - advance the program counter by several bytes, and so on.
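The compression idea described above can be illustrated with a deliberately simplified sketch in plain C. This is not the real .debug_line byte encoding: the LineRow and LineDelta types and the compress_rows function are hypothetical, and only two columns (address and line) are modeled. The sketch drops rows that repeat the previous source line and stores only the differences from the previous kept row, starting from the state machine's initial state of address 0 and line 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the (uncompressed) line matrix: hypothetical two-column model. */
typedef struct {
    uint32_t address;  /* address of the machine instruction */
    uint32_t line;     /* source line it belongs to */
} LineRow;

/* One "command" of the compressed form: how far to advance each register. */
typedef struct {
    uint32_t addr_advance;
    int32_t  line_advance;
} LineDelta;

/* Compress rows (sorted by address) into deltas; returns the number of
 * deltas written. out must have room for n entries. */
static size_t compress_rows(const LineRow *rows, size_t n, LineDelta *out)
{
    size_t count = 0;
    uint32_t prev_addr = 0;  /* initial state: address 0 */
    uint32_t prev_line = 1;  /* initial state: line 1 */

    assert(rows != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* A row for the same source line as the previous kept row adds
         * nothing new, so it is removed as a duplicate. */
        if (i > 0 && rows[i].line == prev_line)
            continue;
        out[count].addr_advance = rows[i].address - prev_addr;
        out[count].line_advance = (int32_t)rows[i].line - (int32_t)prev_line;
        count++;
        prev_addr = rows[i].address;
        prev_line = rows[i].line;
    }
    return count;
}
```

Four rows at addresses 0x100, 0x102, 0x104, 0x108 with lines 10, 10, 11, 13 shrink to three deltas: (0x100, +9), (4, +1), (4, +2). A debugger replays such deltas to rebuild the matrix.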
Creating this information at such a low level is difficult, so the libdwarf library has several functions that make the task easier.
Storing the file name for each instruction is wasteful, so instead of the name, its index in a special table is stored. To create a file index, use the dwarf_add_file_decl function:
Dwarf_Unsigned dwarf_add_file_decl(Dwarf_P_Debug dbg, char *name, Dwarf_Unsigned dir_idx, Dwarf_Unsigned time_mod, Dwarf_Unsigned length, Dwarf_Error *error)
  • name - file name
  • dir_idx - index of the folder in which the file is located. The index can be obtained using the dwarf_add_directory_decl function. If full paths are used, you can pass 0 as the folder index and not use dwarf_add_directory_decl
  • time_mod - file modification time; may be left unspecified (0)
  • length - file size; also optional (0)
The function returns the file index or DW_DLV_NOCOUNT on error.
To create line number information there are three functions, dwarf_add_line_entry_b, dwarf_lne_set_address and dwarf_lne_end_sequence, which we will look at below.
Creating debug information for a procedure takes place in several stages:
  • creating the procedure symbol in the .symtab section
  • creating the procedure node with its attributes
  • creating the FDE of the procedure
  • creating the procedure parameters
  • creating the line number information
Creating a procedure symbol
The procedure symbol is created as described above in the "Section.symtab" section. In that table, procedure symbols are interleaved with the symbols of the files that contain the source code of these procedures. First create the file symbol, then the procedure symbols. The file then becomes current, and if the next procedure is in the current file, the file symbol does not need to be created again.
Creating a procedure node with attributes
First, create a node using the dwarf_new_die function (see "Creating nodes"), specifying DW_TAG_subprogram as the tag and, as the parent, the compilation unit (if this is a global procedure) or the corresponding DIE (if it is local). Next, create the attributes:
  • the name of the procedure (function dwarf_add_AT_name, see "Creating an attribute of the node")
  • the line number in the file where the procedure code begins (attribute DW_AT_decl_line), function dwarf_add_AT_unsigned_const (see "Creating an attribute of the node")
  • the start address of the procedure (attribute DW_AT_low_pc), function dwarf_add_AT_targ_address_b, see below
  • the end address of the procedure (attribute DW_AT_high_pc), function dwarf_add_AT_targ_address_b, see below
  • the type of the result returned by the procedure (attribute DW_AT_type - a reference to a previously created type, see "Creating data types"). If the procedure does not return anything, this attribute does not need to be created
The DW_AT_low_pc and DW_AT_high_pc attributes must be created with the function designed specifically for this purpose, dwarf_add_AT_targ_address_b:
Dwarf_P_Attribute dwarf_add_AT_targ_address_b(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Unsigned pc_value, Dwarf_Unsigned sym_index, Dwarf_Error *error)
  • attr - the attribute (DW_AT_low_pc or DW_AT_high_pc)
  • pc_value - the address value
  • sym_index - index of the procedure symbol in the .symtab table. Optional, you can pass 0
The function returns DW_DLV_BADADDR in case of error.
Creating a procedure FDE
As mentioned above in the "Creating Common Information Entry" section, a frame descriptor must be created for each procedure, which takes place in several stages:
  • creating a new FDE (see Creating Common Information Entry)
  • attaching the created FDE to the general list
  • adding instructions to the created FDE
A new FDE is created with the dwarf_new_fde function:
Dwarf_P_Fde dwarf_new_fde(Dwarf_P_Debug dbg, Dwarf_Error *error)
The function returns the descriptor of the new FDE or DW_DLV_BADADDR in case of error.
The new FDE is attached to the list using dwarf_add_frame_fde:
Dwarf_Unsigned dwarf_add_frame_fde(Dwarf_P_Debug dbg, Dwarf_P_Fde fde, Dwarf_P_Die die, Dwarf_Unsigned cie, Dwarf_Addr virt_addr, Dwarf_Unsigned code_len, Dwarf_Unsigned sym_idx, Dwarf_Error *error)
  • fde - the descriptor just obtained
  • die - the DIE of the procedure (see "Creating a procedure node with attributes")
  • cie - the CIE descriptor (see Creating Common Information Entry)
  • virt_addr - the start address of our procedure
  • code_len - the length of the procedure in bytes
The function returns DW_DLV_NOCOUNT in case of error.
After all this, you can add DW_CFA_xxx instructions to our FDE. This is done with the dwarf_add_fde_inst and dwarf_fde_cfa_offset functions. The first adds the specified instruction to the list:
Dwarf_P_Fde dwarf_add_fde_inst(Dwarf_P_Fde fde, Dwarf_Small op, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error)
  • op - instruction code (DW_CFA_xxx)
  • val1, val2 - parameters of the instruction (different for each instruction, see the Standard, section 6.4.2 Call Frame Instructions)
The dwarf_fde_cfa_offset function adds the DW_CFA_offset instruction:
Dwarf_P_Fde dwarf_fde_cfa_offset(Dwarf_P_Fde fde, Dwarf_Unsigned reg, Dwarf_Signed offset, Dwarf_Error *error)
  • fde - descriptor of the created FDE
  • reg - the register that is written to the frame
  • offset - its offset in the frame (not in bytes but in frame elements, see Creating Common Information Entry, data_align)
For example, the compiler creates a procedure whose prologue saves the LR register (R14) in the stack frame. First, add the DW_CFA_advance_loc instruction with its first parameter equal to 1, which means advancing the PC register by 2 bytes (see Creating Common Information Entry, code_align); then add DW_CFA_def_cfa_offset with parameter 4 (setting the data offset in the frame to 4 bytes); and finally call dwarf_fde_cfa_offset with reg = 14 and offset = 1, which means that register R14 is written to the frame at an offset of -4 bytes from the CFA.
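To make the example above more tangible, here is a sketch in plain C (no libdwarf) of the bytes that these three instructions are expected to occupy in .debug_frame. The opcode values are taken from the DWARF standard; the helper names put_uleb128 and encode_prologue_cfa are invented for this illustration, and the sketch assumes the same code_align = 2 and data_align = -4 as in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard. The first two keep their operand
 * in the low 6 bits of the opcode byte itself. */
#define DW_CFA_advance_loc    0x40
#define DW_CFA_offset         0x80
#define DW_CFA_def_cfa_offset 0x0e

/* Append an unsigned LEB128 number; returns the new write position. */
static size_t put_uleb128(uint8_t *out, size_t cap, size_t pos, uint64_t v)
{
    do {
        uint8_t byte = (uint8_t)(v & 0x7f);
        v >>= 7;
        if (v != 0)
            byte |= 0x80;          /* more bytes follow */
        assert(pos < cap);
        out[pos++] = byte;
    } while (v != 0);
    return pos;
}

/* Encode: advance_loc 1; def_cfa_offset 4; offset r14, 1.
 * Returns the number of bytes written. */
static size_t encode_prologue_cfa(uint8_t *out, size_t cap)
{
    size_t pos = 0;

    assert(cap >= 5);
    out[pos++] = DW_CFA_advance_loc | 1;   /* 1 * code_align(2) = 2 bytes */
    out[pos++] = DW_CFA_def_cfa_offset;    /* CFA offset becomes 4 bytes */
    pos = put_uleb128(out, cap, pos, 4);
    out[pos++] = DW_CFA_offset | 14;       /* register R14 (LR) */
    pos = put_uleb128(out, cap, pos, 1);   /* 1 * data_align(-4) = CFA-4 */
    return pos;
}
```

The result is the five bytes 0x41, 0x0e, 0x04, 0x8e, 0x01. Note that DW_CFA_offset stores the factored offset 1, and the debugger multiplies it by data_align to get -4 bytes.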
Creating procedure parameters
Procedure parameters are created in the same way as ordinary variables; see "Creating variables and constants".
Creating line number information
This information is created as follows:
  • at the beginning of the procedure, start a block of instructions with the dwarf_lne_set_address function
  • for each line of code (or machine instruction), create information about the source code (dwarf_add_line_entry_b)
  • at the end of the procedure, complete the block of instructions with the dwarf_lne_end_sequence function
The dwarf_lne_set_address function sets the address at which the block of instructions begins:
Dwarf_Unsigned dwarf_lne_set_address(Dwarf_P_Debug dbg, Dwarf_Addr offs, Dwarf_Unsigned symidx, Dwarf_Error *error)
  • offs - address of the procedure (address of the first machine instruction)
  • symidx - symbol index (optional, you can pass 0)

The dwarf_add_line_entry_b function adds information about a line of source code to the .debug_line section. I call this function for each machine instruction:
Dwarf_Unsigned dwarf_add_line_entry_b (Dwarf_P_Debug dbg, Dwarf_Unsigned file_index, Dwarf_Addr code_offset, Dwarf_Unsigned lineno, Dwarf_Signed column_number, Dwarf_Bool is_source_stmt_begin, Dwarf_Bool is_basic_block_begin, Dwarf_Bool is_epilogue_begin, Dwarf_Bool is_prologue_end, Dwarf_Unsigned isa, Dwarf_Unsigned discriminator, Dwarf_Error * error)
  • file_index - index of the source code file obtained earlier from the dwarf_add_file_decl function (see "Creating procedures")
  • code_offset - address of the current machine instruction
  • lineno - line number in the source code file
  • column_number - column number in the source code file
  • is_source_stmt_begin - 1 if the current instruction is the first in the code for line lineno (I always use 1)
  • is_basic_block_begin - 1 if the current instruction is the first in a block of statements (I always use 0)
  • is_epilogue_begin - 1 if the current instruction is the first in the epilogue of the procedure (optional, I always use 0)
  • is_prologue_end - 1 if the current instruction is the last in the prologue of the procedure (required!)
  • isa - instruction set architecture. Be sure to specify DW_ISA_ARM_thumb for ARM Cortex M3!
  • discriminator - one position (file, line, column) of the source code can correspond to different machine instructions. In this case, different discriminators must be set for the sets of such instructions. If there are no such cases, it must be 0
The function returns 0 (success) or DW_DLV_NOCOUNT (error).
Finally, the dwarf_lne_end_sequence function completes the procedure:
Dwarf_Unsigned dwarf_lne_end_sequence(Dwarf_P_Debug dbg, Dwarf_Addr address, Dwarf_Error *error)
  • address - address of the current machine instruction
Returns 0 (success) or DW_DLV_NOCOUNT (error).
This completes the creation of the procedure.
Creating variables and constants
In general, variables are quite simple. They have a name, a memory area (or processor register) where their data is located, and the type of that data. If the variable is global, its parent should be the compilation unit; if it is local, the corresponding node (this applies especially to procedure parameters, whose parent must be the procedure itself). You can also specify in which file, line and column the variable is declared.
In the simplest case, the value of a variable is at a fixed address, but many variables are created dynamically on the stack or in a register when a procedure is entered, and sometimes computing the address of the value can be far from trivial. The standard provides a mechanism for describing where the value of a variable is located: location expressions. A location expression is a set of instructions (DW_OP_xxx constants) for a Forth-like stack machine; in fact it is a separate language with branching, procedures and arithmetic operations. We will not cover this language in full; we are really interested in only a few instructions:
  • DW_OP_addr - specifies the address of the variable
  • DW_OP_fbreg - specifies the offset of the variable from the base register (usually the stack pointer)
  • DW_OP_reg0 ... DW_OP_reg31 - specifies that the variable is stored in the corresponding register
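The three operations listed above are very short byte sequences. The following sketch in plain C (no libdwarf) shows how each one is expected to look in encoded form; the opcode values come from the DWARF standard, while the function names are invented for this illustration. The 4-byte little-endian address in encode_addr32 is an assumption that matches a 32-bit little-endian target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_OP_addr  0x03  /* operand: target address */
#define DW_OP_reg0  0x50  /* DW_OP_reg0..DW_OP_reg31 are 0x50..0x6f */
#define DW_OP_fbreg 0x91  /* operand: signed LEB128 offset */

/* Append a signed LEB128 number; returns the new write position. */
static size_t put_sleb128(uint8_t *out, size_t cap, size_t pos, int64_t v)
{
    int more = 1;

    while (more) {
        uint8_t byte = (uint8_t)(v & 0x7f);
        int negative = v < 0;

        v >>= 7;
        if (negative)
            v |= -((int64_t)1 << 57);  /* force sign extension portably */
        /* Stop when the rest is pure sign and bit 6 already shows the sign. */
        if ((v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40)))
            more = 0;
        else
            byte |= 0x80;
        assert(pos < cap);
        out[pos++] = byte;
    }
    return pos;
}

/* Variable at a signed offset from the frame base. */
static size_t encode_fbreg(uint8_t *out, size_t cap, int64_t offset)
{
    assert(cap >= 1);
    out[0] = DW_OP_fbreg;
    return put_sleb128(out, cap, 1, offset);
}

/* Variable living in register regno (0..31). */
static size_t encode_reg(uint8_t *out, size_t cap, unsigned regno)
{
    assert(cap >= 1 && regno <= 31);
    out[0] = (uint8_t)(DW_OP_reg0 + regno);
    return 1;
}

/* Variable at a fixed address (assumed 32-bit, little-endian). */
static size_t encode_addr32(uint8_t *out, size_t cap, uint32_t addr)
{
    assert(cap >= 5);
    out[0] = DW_OP_addr;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(addr >> (8 * i));
    return 5;
}
```

For instance, a local variable 8 bytes below the frame base becomes the two bytes 0x91, 0x78, and a variable held in R4 becomes the single byte 0x54.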
To create a location expression, you must first create an empty expression (dwarf_new_expr), add instructions to it (dwarf_add_expr_addr, dwarf_add_expr_gen, etc.) and attach it to the node as the value of the DW_AT_location attribute (dwarf_add_AT_location_expr).
The function that creates an empty location expression returns its descriptor, or 0 in case of error:
Dwarf_P_Expr dwarf_new_expr(Dwarf_P_Debug dbg, Dwarf_Error *error)
To add instructions to the expression, use the dwarf_add_expr_gen function:
Dwarf_Unsigned dwarf_add_expr_gen(Dwarf_P_Expr expr, Dwarf_Small opcode, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error)
  • opcode - operation code, a DW_OP_xxx constant
  • val1, val2 - parameters of the instruction (see the Standard)

To set the address of a variable explicitly, the dwarf_add_expr_addr function should be used instead of the previous one:
Dwarf_Unsigned dwarf_add_expr_addr(Dwarf_P_Expr expr, Dwarf_Unsigned address, Dwarf_Signed sym_index, Dwarf_Error *error)
  • expr - descriptor of the location expression to which the instruction is added
  • address - address of the variable
  • sym_index - index of the symbol in the .symtab table. Optional, you can pass 0
The function also returns DW_DLV_NOCOUNT in case of error.
Finally, the created location expression can be added to the node with the dwarf_add_AT_location_expr function:
Dwarf_P_Attribute dwarf_add_AT_location_expr(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Expr loc_expr, Dwarf_Error *error)
  • ownerdie - the node to which the expression is added
  • attr - the attribute (in our case DW_AT_location)
  • loc_expr - descriptor of the previously created location expression
The function returns an attribute descriptor or DW_DLV_BADADDR in case of error.
Variables, procedure parameters and constants are ordinary nodes with the tags DW_TAG_variable, DW_TAG_formal_parameter and DW_TAG_const_type, respectively. They need the following attributes:
  • the name of the variable / constant (function dwarf_add_AT_name, see "Creating an attribute of the node")
  • the line number in the file where the variable is declared (attribute DW_AT_decl_line), function dwarf_add_AT_unsigned_const (see "Creating an attribute of the node")
  • the file name index (attribute DW_AT_decl_file), function dwarf_add_AT_unsigned_const (see "Creating an attribute of the node")
  • the data type of the variable / constant (attribute DW_AT_type - a reference to a previously created type, see "Creating data types")
  • a location expression (see above) - needed for a variable or a procedure parameter
  • or a value - for a constant (attribute DW_AT_const_value, see "Creating an attribute of the node")
Creating sections with debugging information
After all nodes of the debug information tree have been created, you can proceed to forming the ELF sections that hold it. This happens in two stages:
  • first, call the dwarf_transform_to_disk_form function, which calls the callback function we wrote to create the required ELF sections, once for each section
  • then, for each section, the dwarf_get_section_bytes function returns the data that must be written to the corresponding section
The function
Dwarf_Signed dwarf_transform_to_disk_form(Dwarf_P_Debug dbg, Dwarf_Error *error)
translates the debug information we have created into binary format, but writes nothing to disk. It returns the number of ELF sections created, or DW_DLV_NOCOUNT in case of error. In the process, for each section it calls the callback function that we passed to dwarf_producer_init_c when initializing the library. We have to write this function ourselves. Its specification is:
typedef int (*Dwarf_Callback_Func_c)(char *name, int size, Dwarf_Unsigned type, Dwarf_Unsigned flags, Dwarf_Unsigned link, Dwarf_Unsigned info, Dwarf_Unsigned *sect_name_index, void *user_data, int *error)
  • name - name of the ELF section to be created
  • size - section size
  • type - section type
  • flags - section flags
  • link - section link field
  • info - section information field
  • sect_name_index - used to return the index of the section with relocations (optional)
  • user_data - passed to us exactly as we set it in the library initialization function
  • error - an error code can be returned here
In this function we must:
  • create a new section (elf_newscn function, see Creating Sections)
  • create a section header (elf32_getshdr function, ibid.)
  • fill it in completely (see ibid.). This is simple, since the section header fields correspond to the parameters of our function. Set the missing fields sh_addr, sh_offset and sh_entsize to 0, and sh_addralign to 1
  • return the index of the created section (elf_ndxscn function, see "Section.symtab") or -1 in case of error (setting the error code in error)
  • skip the ".rel" sections (in our case) by returning 0 from the function
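The header-filling step from the list above can be sketched without libelf. In the following plain C example, Shdr32 is a stand-in that mirrors the field names of the 32-bit ELF section header, and fill_debug_shdr and is_rel_section are hypothetical helpers; name_index is assumed to be the offset of the section name in .shstrtab, obtained separately. A real callback would obtain the header with elf32_getshdr instead of using a local struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 32-bit ELF section header (same field names). */
typedef struct {
    uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset;
    uint32_t sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
} Shdr32;

/* Relocation sections are recognized by their ".rel" name prefix. */
static int is_rel_section(const char *name)
{
    return strncmp(name, ".rel", 4) == 0;
}

/* Fill the header from the callback parameters.
 * Returns 1 if the header was filled, 0 if the section is skipped. */
static int fill_debug_shdr(Shdr32 *sh, const char *name, uint32_t name_index,
                           int size, uint32_t type, uint32_t flags,
                           uint32_t link, uint32_t info)
{
    assert(sh != NULL && name != NULL && size >= 0);
    if (is_rel_section(name))
        return 0;                 /* skipped: the callback would return 0 */

    sh->sh_name = name_index;     /* assumed: offset of name in .shstrtab */
    sh->sh_type = type;
    sh->sh_flags = flags;
    sh->sh_size = (uint32_t)size;
    sh->sh_link = link;
    sh->sh_info = info;
    sh->sh_addr = 0;              /* fields without a matching parameter */
    sh->sh_offset = 0;
    sh->sh_entsize = 0;
    sh->sh_addralign = 1;
    return 1;
}
```

Keeping the ".rel" check at the top means a skipped section leaves the header untouched, so the caller can decide what to return to libdwarf.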
When it completes, the dwarf_transform_to_disk_form function returns the number of sections created. We then need to loop over the sections, starting from 0, and perform the following steps for each of them:
  • obtain the data to be written to the section with the dwarf_get_section_bytes function:
    Dwarf_Ptr dwarf_get_section_bytes(Dwarf_P_Debug dbg, Dwarf_Signed dwarf_section, Dwarf_Signed *elf_section_index, Dwarf_Unsigned *length, Dwarf_Error *error)
    • dwarf_section - section number. Must be in the range 0..n, where n is the number returned to us by the dwarf_transform_to_disk_form function
    • elf_section_index - returns the index of the section to which the data must be written
    • length - the length of this data
    • error - not used
    The function returns a pointer to the obtained data, or 0 (when there are no more sections left to create)
  • create a data descriptor for the current section (elf_newdata function, see Creating Sections) and fill it in (see ibid.) by setting:
    • d_buf - pointer to the data obtained from the previous function
    • d_size - the size of this data (ibid.)
Finishing work with the library
After the sections have been generated, you can finish working with libdwarf by calling the dwarf_producer_finish function:
Dwarf_Unsigned dwarf_producer_finish(Dwarf_P_Debug dbg, Dwarf_Error *error)
The function returns DW_DLV_NOCOUNT in case of error.
Note that nothing is written to disk at this stage. Writing must be done with the functions described in the section on writing the ELF file.

Conclusion

That's all.
I repeat, the topic of creating debugging information is very extensive, and I did not touch on many of its aspects; I only lifted the veil. Those who wish can go as deep as they like.
If you have any questions, I will try to answer them.

If an antivirus program is installed on your computer, it can scan all files on the computer as well as each file individually. You can scan any file by right-clicking it and selecting the corresponding option to check the file for viruses.

For example, to check the file my-file.elf, right-click it and select "Scan with AVG" in the menu. AVG Antivirus will then open and check the file for viruses.


Sometimes an error occurs as a result of an incorrect software installation, which may be related to a problem that arose during the installation process. This can prevent your operating system from associating your ELF file with the proper application software, affecting the so-called "file extension associations".

Sometimes simply reinstalling Dolphin (Emulator) can solve the problem by correctly associating ELF with Dolphin (Emulator). In other cases, problems with file associations may result from poor programming on the part of the software developer, and you may need to contact the developer for further assistance.


Tip: Try updating Dolphin (Emulator) to the latest version to make sure that the latest fixes and updates are installed.


It may seem too obvious, but often the ELF file itself is the cause of the problem. If you received the file as an email attachment or downloaded it from a website, and the download was interrupted (for example, by a power outage or for another reason), the file may be damaged. If possible, get a new copy of the ELF file and try to open it again.


Caution: A damaged file may be a side effect of previous or existing malware on your PC, so it is very important to keep an up-to-date antivirus running on your computer at all times.


If your ELF file is associated with hardware on your computer, you may need to update the device drivers for that hardware in order to open the file.

This problem is usually related to multimedia file types, which depend on hardware inside the computer, such as a sound card or video card, being opened successfully. For example, if you are trying to open an audio file and cannot, you may need to update the sound card drivers.


Tip: If you get an error message related to a .SYS file when you try to open the ELF file, the problem is probably caused by damaged or outdated device drivers that need to be updated. This process can be made easier by using driver update software, such as DriverDoc.


If these steps did not solve the problem and you still have trouble opening ELF files, the cause may be a lack of available system resources. Some versions of ELF files may require a considerable amount of resources (for example, memory / RAM, computing power) to open properly on your computer. This problem occurs quite often if you use fairly old computer hardware together with a much newer operating system.

This problem may occur when the computer struggles to cope with the task, because the operating system (and other services running in the background) may consume too many resources to open the ELF file. Try closing all applications on your PC before opening a Nintendo Wii Game File. By freeing all available resources on your computer, you provide the best conditions for attempting to open the ELF file.


If you have performed all the steps described above and your ELF file still does not open, a hardware upgrade may be necessary. In most cases, even with old hardware, the computing power is still more than sufficient for most user applications (unless you do a lot of processor-intensive work, such as 3D rendering, financial / scientific modeling or intensive multimedia work). It is therefore likely that your computer lacks the required amount of memory (more often called "RAM") to perform the task of opening the file.