How to Quick Load the Apache Log File into PostgreSQL:

10:46 am

from this link:

Quick Load the Apache Log File into PostgreSQL:

Create a temporary session table called tmp_apache, consisting of one column named record with the text datatype:

CREATE TEMPORARY TABLE tmp_apache(record TEXT);

Note: Remember, this table will disappear as soon as you log out of your session because it’s a temporary table.

Copy into the table tmp_apache the data from the log file directly:

\copy tmp_apache from ‘my_apachelog.txt’

You can review the first record of the log with the SQL command:

SELECT record FROM tmp_apache LIMIT 1;

Parsing the Log File

Create a table called http_log. Give it three columns, ip_addr, the_date, and record:

CREATE TABLE http_log(log_date DATE,ip_addr CIDR,record TEXT);

Suppose you want to explain the above statement to another person. You can restate it as:

CREATE a TABLE called http_log with the columns log_date of data type DATE, ip_addr of data type CIDR and record of data type TEXT.

The next step is to extract the IP address and timestamp for each record:

INSERT INTO http_log(log_date,ip_addr,record)
     SELECT CAST(substr(record,strpos(record,’[')+1,20) AS date),
            CAST(substr(record,0,strpos(record,’ ‘)) AS cidr),
            record
FROM tmp_apache;

There are two parts to this statement. The first part is the inner SELECT query, and the second portion is the outer INSERT statement.

The SELECT statement returns three columns of information that originates from the single column in tmp_apache. The query identifies both the IP address and the date stamp, which it then parses. Because the functions used can parse only text strings, the query must cast the results appropriately to insert them into a column of another data type.

There are two PostgreSQL functions used in this SELECT query: substr() and strpos().

The function strpos() returns the integer value (the location) of a particular character in a text string. Here, it identifies the date stamp by searching the record for the first instance of a square bracket ([). Refer to the HTTP log; this occurs at the beginning of the date.

The second function, substr(), returns a substring from a longer text string. It takes three parameters: the text string to search, the substring start point, and the substring end point. The date stamp is always 20 characters long in the example log file.

The same parsing process repeats for the IP address. The IP address is the first piece of data at the beginning of the record. The functions therefore identify the location of the leftmost white space in the record.

The INSERT INTO statement takes the results from the SELECT query and adds them directly into the three columns of the http_log table.

The following psql commands output the results into a text file named results.txt, then launch vim on it by using the bang (!) command:

\o results.txt
SELECT * FROM http_log;
\o
\!vim results.txt

Bookmark and Share
Databases
Related posts:
blog comments powered by Disqus