Interview With Siem Korteweg on System Configuration Collector

In this interview we learn how the System Configuration Collector (SCC) project began, how the software works, why Siem chose to make it open source, and information on future developments.

Introduction:

Have you ever noticed changes on your departmental server, but couldn't quite pinpoint what exactly happened? How many times have staff forgotten to make an entry in the log-book, or the entries made were not detailed enough? Administrators are faced with these problems on a day-by-day basis. The System Configuration Collector (SCC) project attempts to automate this process. Rather than depending on staff to keep accurate records, SCC enables a system to record all changes taking place. Additionally, the software has the functionality to send all configuration data to a central server so that it can be analyzed when needed.

System Configuration Collector Project Website:

LinuxSecurity.com: Please tell us about the SCC project and how it began. When did it start, and who are some of the key contributors?

Siem Korteweg: In 2001 a younger colleague asked whether it was possible to automatically track the changes that were made to the configuration of a system. I told him that was impossible due to variable nature of the output of the commands we have to use to show the configuration of a system. Being a much younger colleague he accepted this answer. But I did not like to say it was "impossible" and it kept nagging me.

I thought that when I could split the variable and fixed parts of the output of system commands, I would be able to track changes. I started a small, hobby project by collecting configuration data and preceding each line with "fix:" or "var:". After some time I was able to detect some changes made to configuration. But when a kernel parameter was changed, all I saw was a change from 128 to 256. I had to search in the snapshot to find out what part of the configuration had changed. Therefore I extended the fix-var classification with a hierarchy of keywords indicating the nature of the data.

The development continued and the customer where I was developing the software, was wondering how to maintain this software without hiring me indefinitely. By that time I realized that this software also could/should be used by others. I talked to the manager of the customer and to the manager of the company I am working for and suggested to make SCC a GPL project. They both agreed and from then on, SCC was an Open Source project. To extend the collection of configuration data I looked at the code of cfg2html and check.sh (HP specific) and the FAQ's of several newsgroups. At the customer site where I started developing SCC, we deployed the software on some 300 systems. This gave us a great opportunity to tune the "fixed" and "variable" parts of the configuration to avoid unnecessary changes.

The first versions of the software collected configuration data and converted the data and logbook to HTML on a per system basis. At the customer site, Bram Lous started to collect all snapshots and logbooks on a server and built the first version of the CGI-interface. Later on, Paul te Vaanholt contributed much for the HP OpenView modules. His main contribution is the analysis and conversion to SCC-format of the Operations Center database. A colleague Oscar Meijer wrote the Windows version of the SCC-client, based on WMI and WSH. The configuration of the data we are collecting on Windows systems still needs to be tuned. The software itself is stable, but it detects too many changes. The whole process of tuning what data is "fixed" and what data is "variable" takes quiet some time.

LinuxSecurity.com: What is the most important benefit an administrator can get out of SCC? How can this improve the overall security of a network or host?

Siem Korteweg: Each administrator should document his/her systems. We all know that, but we all lack time to do this properly. SCC automates the documentation process. For HP-UX systems SCC collects more than 95% of the configuration of the system is covered by SCC. For other system the percentage is somewhat lower at the moment.

The logbooks and snapshots can assist administrators in finding the cause of an incident. Configuration changes can have unwanted side-effects (on other systems). By examining the logbooks for the changes during the last days/weeks an administrator might find the cause of an incident easier/faster. Another way of using the SCC-data to find the cause of an incident is to compare (parts of) the configuration of a system with a comparable system that does function correctly.

Comparing the configuration of systems can also be used to assure that the systems in a cluster are consistent and identical. Do they run the same (versions of) software? Do they have the same kernel-configuration? It is also possible to check your security policies. Just check the snapshots on the server for the aspects of the policies. By default the server checks and signals accounts without a password.

Another use of the SCC-data on the server is to quickly identify systems. After an advisory from Sun, I was able to identify within one minute the 100 systems that needed to be addressed out of a total of 600 systems. Because the selection was automated and because the collection of SCC-data was accurate and outdate, I did not miss a system. This obviously contributes to the safety of the network.

LinuxSecurity.com: How difficult is it to get started? How long would it take for an administrator to get the system fully setup? Can you describe at a high level the steps necessary to setup SCC?

Siem Korteweg: The easiest way to start and get the feeling of the software is to install only the client part and keep the data and logbook on the client. Just create a simple cron-job after the installation of the client and you are finished. This way you are able to pilot the software before you deploy it more widely.

The setup of the server takes some more steps. First you have to decide how to transport the SCC-data from the clients to the server. Supported mechanisms are email (optionally encrypted, using OpenSSL), scp, rcp and cp. Then setup the webserver to display the data. To achieve this, you have to indicate the path under the document-root and indicate the CGI-script of SCC. Then schedule a cron-job to transfer the SCC-data that is sent by the clients from the transfer-area to the website Finally all cronjobs of the clients have to be extended with the proper options to transfer the SCC-data to the scc-server.

For several systems I recorded the entire process of configuring the server in logbooks. These logbooks are present at the website. For our HP-UX 11.i system:

LinuxSecurity.com: What improvement would you like to make in the future? What direction is this project heading?

Siem Korteweg: When running SCC on a system that uses clustering software, like MC ServiceGuard from HP, switching a "package" from one system to another, results in changes of the SCC-data for both systems involved in switching. We want to make the software cluster-aware by extracting the configuration data for each package and sending it separately to the scc-server.

Another future extension is the collection of the configuration of network devices like routers and switches.

LinuxSecurity.com: What advantage does SCC have over using a typical pen & paper log book for recording system changes?

Siem Korteweg: It is automated, so it does not "forget" to record a change (supposing the changed attribute is part of the SCC-snapshot). It is not lazy (once you run it through cron). - The pen & paper logbook is a physical item that can only be at one place. Each admin of a group of systems can be at a different place, without access to the paper logbook. Suppose 7x24 systems, where the admins "follow the sun". - By consolidating all snapshots on a system with scc-srv, you obtain much data that can be searched automatically. This enables you to quickly identify the systems that need an update or to compare two systems when one of them does not function correctly. This is impossible with pen & paper.

LinuxSecurity.com: What operating systems does SCC run on? What type of license is it under?

Siem Korteweg: HP-UX, Solaris, AIX, Linux (RedHat, Suse, Gentoo). As the code of SCC only uses "standard" Unix tools, I think it runs on almost all Unix/Linux systems. The coverage of the configuration data depends on the OS. For example the coverage of HP-UX configuration is more than 90%. For other systems this will be less. The license is GPL.

LinuxSecurity.com: If an administrator needs assistance setting up or configuring SCC is support available? If so, how can support be obtained?

Siem Korteweg: Besides the documentation on our website, SCC comes with documentation and manual pages. We offer an implementation service, where a consultant visits a customer and installs the server and at most 5 clients and introduces the software to the admins of the customer. This is only feasible in the Netherlands. Otherwise, support via email is possible. When the requested support is more than a few simple questions, we have to agree upon payment.

LinuxSecurity.com: How does SCC differ from other similar configuration collectors? What are some of the strengths and weaknesses of SCC?

Siem Korteweg: SCC collects configuration data without formatting it immediately to HTML. Instead it prefixes each line of configuration data with fix/var and a hierarchical classification. This makes it easy to process the snapshots. The processing consists of comparing consecutive snapshots to generate the logbook, formatting the snapshot to HTML and comparing the snapshots of two systems to determine the differences.

The philosophy of SCC is to collect data, not to judge its value or correctness. Stupid configuration errors in Apache/Samba are not detected in scc, this should be done at the server where all snapshots are collected. Some might question the value of all the data in the snapshots. It is true that a considerable part of the snapshots will never change during the lifetime of a system. Nevertheless this data is collected, just in case someone needs it sometimes.

One commercial configuration collector works by allowing remote root-access to all clients from their server. This is not very security minded. I had security in mind when coding scc and scc-srv.

A weakness of SCC is that I coded the classifications of all collected configuration data. This classification has to be used when an admin wants to view specific information. I decided to store cron configuration data under classification "software:cron:" and swap info under classification "system:swap:". Each user of SCC has to follow my intuition.

Another weak point is that the clients are autonomous. The scc-srv can be DOSed by mailing much snapshots from seemingly different systems. Therefore, I suggest to install scc-srv only in a "trusted" network. Finally, scc has to do "reverse engineering" to collect for example the Apache configuration. Apache can be installed and configured in dozens of different locations. We have to determine the correct paths and files from the running processes.

LinuxSecurity.com: How can the project benefit from the open source community?

Siem Korteweg: The project can benefit from the open source community when admins use it and contribute their extensions. These extensions can be specific applications/hardware/OS they use or new features. At the moment some people already contribute knowledge of specific software. Feedback concerning the strong and weak aspects admins experience while they are using SCC, is also valuable.

Area's for future extensions are SAN/NAS and network devices. I am looking for people and organisations that are willing to contribute in any way in these areas.

LinuxSecurity.com: I wish to thank Siem, and other contributors to the System Configuration Collector project. We at LinuxSecurity.com would like to wish you the best of luck!