In this post we’re going to look at a fairly simple collection of components for building a log collection system in Microsoft Azure. The motivation is to avoid building anything from scratch, instead using existing software and Azure Platform as a Service (PaaS) offerings as far as possible to store and index the logs. We’re also going to try to minimise our hardware footprint to a single Raspberry Pi.

Why?

I’ve worked for a cyber security monitoring provider, and a key underpinning tenet of that work is the centralisation and management of logs. I’m interested in capturing the logs generated by the various things around my home (NAS media centre, router, print server, etc), just because. Maybe at some point in the future we’ll do some fun stuff with the data, but for now I just want to start by capturing it.

What do you create?

The idea is this:

  • Use rsyslog (default syslog daemon for Debian Linux distributions) to create a syslog server. Configure other devices in the network to send their logs to this server. A Raspberry Pi is good for this.
  • Write a NodeJS script to act as a plugin for rsyslog to send data to Azure Event Hub (we’ll try a Python version too and find it horribly slow)
  • Azure Event Hub acts as a queuing system, soaking up all the logs
  • More JavaScript runs as an Azure Function for serverless event processing, persisting the messages in Azure Blob Storage. We might come back and look at something more interesting, like Azure Cosmos DB, in a subsequent post.

At the end of this you can have something simple like a Raspberry Pi as a local log server, relaying logs from your local network to the cloud. Given all the data is in Azure, there are no local log centralisation or database servers clogging up your cupboards and annoying your significant other.

Word of warning: at a small scale it isn’t very cost effective. It looks like Event Hub and Cosmos DB charge part of their fee by the hour and part of their fee by usage (events processed, searches run).

Azure preamble

Before starting the Azure section of this how-to, create a Resource Group to house your Azure resources. This is just for the convenience of finding everything again later.

I won’t specifically mention the Location or region for each of the elements you deploy. I will use UK South throughout, but I will leave it as an exercise for the reader to select their own region and apply it consistently to all the resources they create.

Azure Event Hub Setup

Setting up the Event Hub itself is pretty easy. In the Azure Portal search for Event Hubs and click Create. You will be greeted by a Create Namespace page a bit like this:

Create Namespace

To keep costs low, set the Pricing tier to Basic and leave Enable auto-inflate? unchecked. Set a name, assign it to your new Resource Group and click Create.

This just creates a namespace within which your actual Event Hubs live. If you are familiar with Apache Kafka, think of the namespace as a Kafka cluster and an individual Event Hub as a Kafka topic.

Note that, having selected the Basic tier, there is only a single Consumer Group, so our Event Hub will behave as a queue rather than a topic.

Once the Event Hub is deployed, open the Overview page and at the top click on the + Event Hub button to create a new Event Hub.

Create Namespace

In the Basic tier there isn’t much you can do except change the partition count. A home network doesn’t generate much in terms of log volumes, so leaving the default of 2 partitions is fine.

Give the Event Hub a name and click Create.

The last thing we need to do is create a couple of Shared Access Policies: one for rsyslog to use to publish messages and one for a Function App to receive them.

Click on Shared access policies and by default you will find a single RootManageSharedAccessKey. We are not going to use this, as it has total administrative access to the Event Hub and we don’t need that just to publish a few messages. It is much better, from a security standpoint, to create separate access keys for each application that publishes data to your Event Hubs.

Create SAS policy

Click Add and give it a name. I’ve just gone for Publisher in the example above. The only permission required is Send.

Once the new SAS policy is created, click on it again and the Access Keys are displayed.

SAS policy

The image above shows the NodeJS script we’ll use later alongside the shared access policy, illustrating how to edit the script to populate the key name and value. We will get onto this later, but I’ve left the image here for reference.

Before we leave Event Hubs, create another SAS policy for the Azure Function Apps. Call it FunctionApp and give it the permission Listen.
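
As an aside, each SAS policy also exposes a connection string alongside the individual keys. This is just the namespace endpoint, key name and key (plus the Event Hub name, for policies created at the hub level) bundled into a single string, and it is effectively what the Function App trigger setup later on stores as an application setting. The placeholders below are mine, but the general shape is:

Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=FunctionApp;SharedAccessKey=<key>;EntityPath=<event hub name>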

Azure Function Setup

We are now going to use an Azure Function App to do some very basic syslog parsing before passing the messages off to Blob Storage for persistence. This is mostly a useful placeholder for part 2, where we’ll look at other storage mechanisms and processing engines.

Function App has some handy templates that we’re just going to go ahead and use.

Search for Function App in the Azure Portal and click Create.

Function App

In the above image I create a new Resource Group. This is unnecessary; you can add the Function App to the same Resource Group used earlier.

Give the App a name and click Create.

In the following page click the big blue + (circled in red in the below image), followed by Custom function (also circled).

Create Function

You are then presented with an array of templates to start from. For this example I change the Language filter to JavaScript and select the EventHubTrigger - JavaScript example.

Function

Click new and a dialog box pops up to assist with selecting the Event Hub namespace, hub name and shared access policy. Ensure the values are correct and that the FunctionApp policy is selected.

Lastly, ensure the Event Hub name field is set to the name of the Event Hub itself (not the namespace; this is the same value as <event hub name> in the NodeJS script covered later) and click Create. The consumer group can be left as $Default.

Basic Function App

A basic function is then created for you; clicking on it, you will see a screen much like the above. Nothing very interesting happens yet: every log message received is simply printed out to the log-streaming console.

By expanding the Logs and Test panes of the screen, you can send test messages and see them being printed at the console.

Output to Blob

We are going to make two very small, quick edits and a few config changes so that log messages are output to Blob Storage.

module.exports = function (context, eventHubMessages) {
    context.log(`JavaScript eventhub trigger function called for message array ${eventHubMessages}`);

    eventHubMessages.forEach(message => {
        context.log(`Processed message ${message}`);
    });

    context.done();
};

becomes

module.exports = function (context, eventHubMessage) {
    context.log(`JavaScript eventhub trigger function called for message ${eventHubMessage}`);

    context.log(`Processed message ${eventHubMessage}`);
    context.bindings.outputBlob = eventHubMessage;
    context.done();
};

Note that eventHubMessages became eventHubMessage and that we’ve removed the for-loop. We have also added a line assigning to context.bindings.outputBlob. More on this in a moment.

In the left-hand panel, underneath the Function is a menu item called Integrate. Click that. By default it’ll open on the Triggers section. Notice a field called event hub cardinality? Change that from many to one.

What we’ve done here is change the behaviour of the Function so that it is called individually for each message, rather than being called with a batch of messages for processing. Sometimes some of the outbound integrations (Cosmos DB as an example) don’t give you a mechanism to output a batch of messages, so you have to fall back to processing individual messages one at a time.

To finish off, near the top-right of the Integrate screen, click New Output and select Azure Blob Storage. (All of the trigger and output settings end up in the function’s function.json; a sketch follows the checklist below.)

Ensure that:

  • Blob parameter name is outputBlob - this is to match context.bindings.outputBlob from earlier and is how the Function outputs data to this specific integration.
  • Click new next to Storage account connection and select a Storage account within which to store the log records.
  • Path is set to something sensible like homelogs/{rand-guid} - this is the container, within the Blob Storage Account, in which the log records will be stored as individual objects.
    • You will need to go to the Storage Account, go into the Blob section and create a Container named homelogs for this to work.
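
For reference, everything configured through the Integrate screen ends up in a function.json file that sits alongside the Function’s index.js. Below is a sketch of roughly what ours ends up containing; property names vary a little between Functions runtime versions (for example, the hub name appears as path in version 1 and eventHubName in later versions), and the two connection setting names are placeholders of mine:

{
  "bindings": [
    {
      "type": "eventHubTrigger",
      "direction": "in",
      "name": "eventHubMessage",
      "path": "<event hub name>",
      "connection": "<event hub connection app setting>",
      "consumerGroup": "$Default",
      "cardinality": "one"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputBlob",
      "path": "homelogs/{rand-guid}",
      "connection": "<storage connection app setting>"
    }
  ]
}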

To validate this works, you should now be able to go back to the function and:

  • in the Test panel, send a message to the Function
  • in the Logs panel, observe the message being printed back out to the console and the function completing successfully
  • in the Blob part of the Storage Account that we are outputting to, see that a new blob object has been created. If you were to download it and view its contents, it would be the test message you had sent.

Whilst developing, it’s often the case that you don’t want to be outputting lots of test messages to your Blob storage. In that case, just comment out the context.bindings.outputBlob = eventHubMessage; line in your Function App and the only output will be to the Logs console.
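
Earlier I mentioned doing some very basic syslog parsing before persisting the messages, so here is one way you might extend the Function to do that. This is a sketch of mine rather than part of the template: depending on the rsyslog template in use, messages may or may not arrive with an RFC 3164 <PRI> prefix, so the regular expression, field names and fallback below are all assumptions for illustration.

module.exports = function (context, eventHubMessage) {
    // Naive RFC 3164-style parse: messages often arrive as "<PRI>rest of line".
    var match = /^<(\d{1,3})>([\s\S]*)$/.exec(String(eventHubMessage));

    // Fall back to storing the raw line if it doesn't match the expected shape.
    var record = match
        ? { priority: Number(match[1]), message: match[2] }
        : { priority: null, message: String(eventHubMessage) };

    context.log(`Processed message ${record.message}`);

    // Store the parsed record as JSON rather than the raw line.
    context.bindings.outputBlob = JSON.stringify(record);
    context.done();
};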

Now, onward to the Raspberry Pi and sending logs from our home network.

NodeJS script and omprog

omprog (documentation) is how rsyslog enables the development of custom log output processors. In this instance we want to push logs to Azure Event Hub, which describes itself as a “highly scalable data streaming platform”.

In our little proof of concept we’re not going to be stressing Event Hub that hard; it’s just nice to know that, if we wanted to, we could push a lot of logs through it. Enough, perhaps, to cope with a handful of devices pushing a few hundred or even a few thousand log records per second.

I did try Python first, as it’s probably my most familiar language, however it was too slow and the interpreter caused some strange behaviour under test. There is a Python ServiceBusService library available, but using a basic test script to generate a lot of logs and the Linux time command, the Python version was returning very early while in reality it was buffering and actually flushing the messages much later. As such, the total time needed to see all the events coming through the Function App was actually longer than for the NodeJS equivalent.

NodeJS seemingly took longer for the test logging script to return, but it was actually faster overall. By watching the event flow in the Logs panel, it was obvious that it took less time for all of the events to appear in the Function App logging. NodeJS doesn’t have an equivalent of the ServiceBusService library, but the amqp10 library works just fine.

I imagine the observed differences have more to do with the differences between NodeJS and Python than with the protocol.

Rsyslog / Raspberry Pi Setup

There isn’t really much to this. Rsyslog comes as the standard syslog daemon for the default Raspbian operating system.

Assuming that you have managed to set up a Raspberry Pi with Raspbian, all that remains is to permit receiving events on TCP/514 and set up the NodeJS script as per the section below.

In most standard rsyslog.conf files, the settings are already present, but commented out. They may be either:

module(load="imtcp")
input(type="imtcp" port="514")

or

$ModLoad imtcp
$InputTCPServerRun 514

Uncomment whichever of those is present, and then it’s onwards to configuring the omprog script.

Setting up the NodeJS script

Most of the following is already explained in the README.MD in my azure-loghandling project on GitHub.

Download a copy of the az_eventhub.js script and look for the following section near the top. These lines need editing with the details of your Event Hub and the Publisher SAS policy created earlier; a simplified sketch of the overall script follows the list below.

var serviceBusHost = '<servicebus name here>' + '.servicebus.windows.net';
var sasName = "<key name>";
var sasKey = "<key>";
var eventHubName = "<event hub name>";

Replace:

  • <servicebus name here> with the Event Hub namespace name (I have been using homelog in my example so far)
  • <key name> with the name of the SAS policy (Publisher was used in my example)
  • <key> with the primary or secondary key value
  • <event hub name> with the name of the Event Hub created earlier.
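
As promised, the overall shape of the script is roughly as follows. This is a simplified sketch of mine rather than the actual az_eventhub.js (the version in the GitHub repo is the authoritative one): it connects to the Event Hub over AMQP using the amqp10 library mentioned earlier, then forwards every line rsyslog writes to its stdin.

// Simplified sketch only - see az_eventhub.js in the repo for the real thing.
var AMQPClient = require('amqp10').Client;
var Policy = require('amqp10').Policy;
var readline = require('readline');

var serviceBusHost = '<servicebus name here>' + '.servicebus.windows.net';
var sasName = "<key name>";
var sasKey = "<key>";
var eventHubName = "<event hub name>";

// The SAS credentials are URI-encoded into the AMQP connection string.
var uri = 'amqps://' + encodeURIComponent(sasName) + ':' +
          encodeURIComponent(sasKey) + '@' + serviceBusHost;

var client = new AMQPClient(Policy.EventHub);

client.connect(uri)
  .then(function () { return client.createSender(eventHubName); })
  .then(function (sender) {
    // rsyslog (via omprog) writes one log record per line to our stdin.
    var rl = readline.createInterface({ input: process.stdin });
    rl.on('line', function (line) { sender.send(line); });
    rl.on('close', function () {
      // Give in-flight sends a moment to complete before disconnecting.
      setTimeout(function () {
        client.disconnect().then(function () { process.exit(0); });
      }, 1000);
    });
  })
  .catch(function (err) {
    console.error('Event Hub error:', err);
    process.exit(1);
  });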

Ensure that the az_eventhub.js script is only readable and executable by root (the user that rsyslog typically runs as).

Install the rsyslog configuration file 41-az_event_hub.conf into /etc/rsyslog.d, edit it to point to the correct path of the az_eventhub.js script and restart rsyslog.
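
The configuration file in the repository is the authoritative version, but to give a sense of what it amounts to, a minimal omprog action has roughly this shape (the node binary and script paths here are assumptions; adjust them to match your install):

module(load="omprog")
action(type="omprog"
       binary="/usr/bin/node /home/pi/azure-loghandling/az_eventhub.js"
       template="RSYSLOG_TraditionalFileFormat")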

Testing

The rsyslog configuration applied above inherits the default, which is to send messages at INFO or greater priority. We can stimulate such messages with the following command on our Raspberry Pi:

logger -p user.info this is a test

In the Logs panel of the Function App, if all is well we should start seeing messages arrive. If you run commands such as sudo on the Raspberry Pi (which generate audit log messages) you will likely see messages arriving at the Function App.
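
So far only the Pi itself is generating log messages. To get the other devices mentioned at the start relaying through the Pi, point their syslog output at it. Many routers and NAS boxes have a “remote syslog server” setting in their admin interface; for other Linux machines running rsyslog, a one-line forwarding rule in /etc/rsyslog.d/ is enough. For example (the IP address is a placeholder for your Pi’s):

*.* @@192.168.1.10:514

The @@ prefix forwards over TCP; a single @ would use UDP.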

If you are seeing problems with messages not arriving, try the following.

  • rsyslog keeps an instance of the script running and sends messages to its stdin. You should see an instance of the script running if you use ps -ef | grep node. If not, try running the script manually to see if there’s an issue with the script or its dependencies.
  • Running the script manually is a useful way of seeing if it works. Try the following:
    • cat /var/log/messages | node az_eventhub.js
    • This sends the contents of /var/log/messages into stdin of the NodeJS script. The first few messages may be missed as the script connects to the Event Hub, but afterwards the messages should begin to flow.

To-do in part 2

In part 2 we’ll look at doing something more interesting with logs than just putting them in Blob Storage. We may also look into Azure Queue Service as a less expensive alternative to Event Hub, and at mechanisms to search and visualise the log data you are collecting.

Clean up

Event Hubs are not inexpensive. It is worthwhile removing the Hub whilst it is not in use and just rewiring it back up to the Function App when you want to use it.

Secondly, Function Apps can also be stopped so that they do not accidentally rack up cost.

Things used

Rsyslog http://www.rsyslog.com/
NodeJS https://nodejs.org/
Azure Event Hub https://azure.microsoft.com/en-us/services/event-hubs/
Azure Function Apps https://azure.microsoft.com/en-us/services/functions/
GitHub https://github.com/jdcockrill/azure-loghandling