How to: Parse Android Logs for Analytics and Machine Learning Applications
Introduction: What are Logs?
Building Android-based apps, or any software for that matter, eventually means working out why a bug is occurring. Bugs are a natural part of software development. Logs are a key tool for understanding the state of your software at the time an issue happens. Think of logs as a ledger of what happens while the code is running. Engineers can print almost anything to the logs that might help them understand problems that pop up in the future.
Because logs are often structured, contain a ton of useful data, are easy to acquire, and are key to software development, they are ripe for sophisticated analysis and maybe even machine learning. There are lots of tools for log analytics, such as Scalyr, Logz.io, Sematext, GrayLog, Nagios, and many others (https://opensource.com/article/19/4/log-analysis-tools). In many cases, an open-source, pre-built tool will work in a pinch and be pretty reliable when a mission-critical bug plagues the backlog. However, it can be useful to have a way of creating your own customized solution.
Android LogCat Logs:
The structure of the Android logs is as follows:
The main files that can be analyzed are the radio, main, event, and system logs. Each log file contains different characteristics about the system at any given time.
Each message in the log consists of the following elements:
A tag indicating the part of the system or application that the message came from
A timestamp (when the message was logged)
The message log level (or priority of the event represented by the message)
The log message itself (a detailed description of the error, exception, or information)
There are a few different log types:
Application log -
Utilize the android.util.Log class methods to write messages of different priority to the log file
Java classes declare their tag statically as a string and can be many layers deep
System log -
Utilize the android.util.Slog class
Many frameworks use the system logs to separate certain messages from a potentially messy application log
Event log -
Event log messages are created using the android.util.EventLog class
Log entries consist of binary tag codes followed by binary parameters
The message tag codes are stored on the system at: /system/etc/event-log-tags
Radio log
Used for radio and phone (modem) related information
Log entries consist of a binary tag code and a message carrying network information
Android Log Structure:
tv_sec tv_nsec priority pid tid tag messageLen Message
tag: log tag
tv_sec & tv_nsec: the timestamp of the log messages
In the logs we are going to parse the date and timestamp (down to the milliseconds)
pid: process Id
tid: thread id
Priority value is one of the following character values:
V: Verbose (lowest priority)
D: Debug
I: Info
W: Warning
E: Error
F: Fatal
S: Silent (highest priority, on which nothing is ever printed)
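To make the fields above concrete, here is a minimal sketch that pulls them out of a single text line. It assumes the logs were saved in logcat's "threadtime" text layout (date, time, pid, tid, priority, then "tag: message"); the names LINE_RE, PRIORITY_RANK, parse_line, and at_least are made up for illustration, and the sample line is invented.

```python
import re

# Assumed layout (logcat "threadtime" text output):
#   MM-DD HH:MM:SS.mmm  PID  TID PRIORITY TAG: message
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+"
    r"(?P<tid>\d+)\s+"
    r"(?P<priority>[VDIWEFS])\s+"
    r"(?P<tag>.*?)\s*:\s?(?P<message>.*)$"
)

# Priorities ordered from lowest to highest, as in the list above.
PRIORITY_RANK = {p: i for i, p in enumerate("VDIWEFS")}


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    record["tid"] = int(record["tid"])
    return record


def at_least(record, minimum):
    """True if the record's priority is at or above `minimum` (for example "W")."""
    return PRIORITY_RANK[record["priority"]] >= PRIORITY_RANK[minimum]


# Invented sample line, for illustration only.
sample = "03-17 16:13:38.811  1702  8671 E ExampleTag: something went wrong"
record = parse_line(sample)
```

Lines that do not follow the assumed layout, such as the "--------- beginning of main" separators, simply come back as None, so they are easy to skip.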
Code for Parsing:
The parsing of the files is fairly straightforward, especially because the text files are delimited by simple whitespace.
After importing the key libraries, check the working directory and assign it to a variable. This allows the script to be placed in the same directory as the log files:
The cwd should be the folder where the log files are located. We’ll define a function, to be used later, that programmatically pads the lists to the same length. Then we get to work decompressing the log files so everything ends up as a text file:
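A minimal sketch of this setup step, assuming only standard-library modules are needed at this point (the exact set of imports is an assumption):

```python
import glob   # find log files by pattern (used in later steps)
import gzip   # read compressed logs (used in later steps)
import os

# Use the directory the script is launched from. This assumes the script
# is run from the folder that holds the log files.
cwd = os.getcwd()
print("Working directory:", cwd)
```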
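One way these two pieces might look. This is a sketch that assumes the compressed logs are gzip files ending in .gz; the names pad_dict_list and decompress_logs are illustrative.

```python
import glob
import gzip
import os
import shutil


def pad_dict_list(dict_of_lists, fill=""):
    """Pad every list in the dict, in place, to the length of the longest one."""
    longest = max((len(v) for v in dict_of_lists.values()), default=0)
    for values in dict_of_lists.values():
        values.extend([fill] * (longest - len(values)))
    return dict_of_lists


def decompress_logs(directory):
    """Write a plain-text copy of every *.gz file found in `directory`."""
    written = []
    for gz_path in sorted(glob.glob(os.path.join(directory, "*.gz"))):
        txt_path = gz_path[: -len(".gz")]  # e.g. main.log.1.gz -> main.log.1
        with gzip.open(gz_path, "rb") as src, open(txt_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(txt_path)
    return written
```

Calling decompress_logs(os.getcwd()) would then leave an uncompressed copy next to each archive, while the original .gz files stay untouched.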
The next lines do the following:
Collect all the main.log files into a list
Loop through the list
Read and parse each file
Append each parsed line to the appropriate empty list
Strip out some of the files from the list of files we are going to loop over and read
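The steps above can be sketched as follows. The sketch assumes the files are named main.log, main.log.1, and so on, that the files to strip out are the still-compressed .gz copies, and that each line follows the threadtime layout (date, time, pid, tid, priority, then the tag and message). All function and key names are illustrative.

```python
import glob
import os

PRIORITIES = {"V", "D", "I", "W", "E", "F", "S"}
FIELDS = ("date", "time", "pid", "tid", "priority")


def list_main_logs(directory):
    """Return main.log* paths, leaving out the still-compressed .gz copies."""
    paths = sorted(glob.glob(os.path.join(directory, "main.log*")))
    return [p for p in paths if not p.endswith(".gz")]


def read_main_logs(paths):
    """Split each line on whitespace and append the pieces to per-field lists."""
    columns = {name: [] for name in FIELDS}
    columns["rest"] = []  # leftover tokens: the tag and message, still split
    for path in paths:
        # errors="replace" keeps an odd byte in a log from stopping the loop.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                tokens = line.split()
                # Skip blank lines, separator lines, and anything else
                # that does not look like a log record.
                if len(tokens) < 6 or tokens[4] not in PRIORITIES:
                    continue
                for name, value in zip(FIELDS, tokens[:5]):
                    columns[name].append(value)
                columns["rest"].append(tokens[5:])
    return columns
```

Everything after the priority is kept as a list of tokens here, because the tag and message still need to be glued back together in the next step.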
After we have written our parsed files to the lists, we need to combine the messages and tags back together, since we split by whitespace. This next little piece of code recombines tags and text into a human-readable string:
The next lines of code assess the length of each list. For a dictionary of lists to be transformed into a pandas dataframe, each of the lists must be the same length.
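A minimal sketch of the recombining step, assuming the tag is everything before the first ": " in the rejoined text. The function name recombine is illustrative.

```python
def recombine(tokens):
    """Join whitespace-split tokens and separate the tag from the message."""
    text = " ".join(tokens)
    tag, sep, message = text.partition(": ")
    if not sep:
        # No ": " found: either "Tag:" with an empty message, or untagged text.
        if text.endswith(":"):
            return text[:-1].strip(), ""
        return "", text
    return tag.strip(), message


tag, message = recombine(["ExampleTag:", "hello", "world"])
```

Joining with a single space means runs of spaces in the original message are collapsed, which is usually fine for analysis but worth knowing if exact text matters.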
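A quick way to run that check, sketched with invented sample lists (the helper names list_lengths and same_length are illustrative):

```python
def list_lengths(columns):
    """Map each column name to the number of items in its list."""
    return {name: len(values) for name, values in columns.items()}


def same_length(columns):
    """True when every list has the same number of items."""
    return len(set(list_lengths(columns).values())) <= 1


# Invented sample data: "message" is one item short on purpose.
columns = {
    "date": ["03-17", "03-17"],
    "tag": ["ExampleTag", "OtherTag"],
    "message": ["hello world"],
}
print(list_lengths(columns))
print("All lists equal:", same_length(columns))
```

If the check reports a mismatch, the shorter lists need padding before the dataframe is built.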
The following code finalizes the processing of the main log:
Combine the lists into a dictionary
Call the function that pads the lists and evens them out
Create the dataframe for the main log
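Those three steps might look like the sketch below. It assumes pandas is installed, uses invented sample values, and repeats an illustrative pad_dict_list helper so the example runs on its own.

```python
import pandas as pd


def pad_dict_list(dict_of_lists, fill=""):
    """Pad every list in the dict, in place, to the length of the longest one."""
    longest = max((len(v) for v in dict_of_lists.values()), default=0)
    for values in dict_of_lists.values():
        values.extend([fill] * (longest - len(values)))
    return dict_of_lists


# Invented sample lists; "message" is deliberately one item short.
main_dict = {
    "date": ["03-17", "03-17", "03-17"],
    "time": ["16:13:38.811", "16:13:38.902", "16:13:39.004"],
    "tag": ["ExampleTag", "OtherTag", "ExampleTag"],
    "message": ["hello world", "second line"],
}

# Even out the lists, then build the dataframe for the main log.
main_df = pd.DataFrame(pad_dict_list(main_dict))
print(main_df)
```

Padding with an empty string keeps the dataframe constructor from raising on uneven lists, at the cost of some blank cells that may need cleaning later.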
For the remainder of this post, we will process the rest of the log files, combine them, and clean them for a bit of analysis:
This code should help you get started! In a follow-up piece, we’ll go over some basic analytics, cleaning, and applications.
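A rough sketch of the combining step, assuming each log has already been turned into a dataframe with at least "date" and "time" columns. The function name combine_logs and the added "log" column are illustrative, and the sample rows are invented.

```python
import pandas as pd


def combine_logs(frames):
    """Stack per-log dataframes, recording which log each row came from."""
    labelled = [df.assign(log=name) for name, df in frames.items()]
    combined = pd.concat(labelled, ignore_index=True)
    combined = combined.drop_duplicates()  # drop exact duplicate rows
    # Zero-padded date and time strings sort correctly as plain text
    # as long as all the logs come from the same year.
    combined = combined.sort_values(["date", "time"], kind="stable")
    return combined.reset_index(drop=True)


# Invented sample frames standing in for the parsed main and system logs.
main_df = pd.DataFrame({
    "date": ["03-17", "03-17"],
    "time": ["16:13:38.811", "16:13:40.000"],
    "tag": ["ExampleTag", "OtherTag"],
    "message": ["hello", "world"],
})
system_df = pd.DataFrame({
    "date": ["03-17"],
    "time": ["16:13:39.500"],
    "tag": ["SystemTag"],
    "message": ["from the system log"],
})

all_logs = combine_logs({"main": main_df, "system": system_df})
print(all_logs)
```

Keeping the source log in its own column makes it easy to filter back down to a single log later without re-reading the files.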