Someone came to me today and asked if I could track down a 403 error that was handed to a user the morning of 4/13/2010. The web server log file is 465 megs. Gulp. sed and grep to the rescue!
Just pick a line number as a wild guess at where in the log file your starting point might be, and print that one line with sed -n 2000p, where 2000 is your guess. Look at the timestamp on the line that comes back, adjust the guess, and repeat. My original guess was way off. The time period in question ended up spanning roughly lines 500,000 to 680,000 in the server log. Great, just read 180,000 lines, right? WRONG! Use grep. Here’s the final string of commands I used.
# just to locate the correct line...
sed -n 2000p site-access_log
sed -n 12000p site-access_log
sed -n 100000p site-access_log
sed -n 300000p site-access_log
sed -n 500000p site-access_log
sed -n 680000p site-access_log
# now to search that time period for a 403 error
sed -n 500000,685000p site-access_log | grep ' 403'
Not perfect, but good enough to get the job done. sed whittled the 465 meg file down to 180,000 lines, and grep whittled that down to 50 or so lines, some of which were relevant, others not.
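For what it's worth, here is a rough sketch of how the same hunt could be tightened up. It is not what I ran that day, and it leans on assumptions: that the log is in the usual Apache common/combined layout (timestamps like [13/Apr/2010:..., status code in the ninth whitespace-separated field), and that the sample lines below, which are made up, look anything like the real log. The idea is to let grep -n find the first and last line for the date instead of guessing, have sed quit once it passes the range, and match 403 only in the status field. A bare ' 403' also matches things like a response size of 403 or 4030, which is likely where some of the irrelevant hits came from.

```shell
# Build a tiny made-up log so this sketch runs on its own (hypothetical data).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [12/Apr/2010:23:59:59 -0500] "GET /early HTTP/1.1" 403 298
10.0.0.2 - - [13/Apr/2010:08:00:01 -0500] "GET /index.html HTTP/1.1" 200 4030
10.0.0.3 - - [13/Apr/2010:08:00:02 -0500] "GET /private/ HTTP/1.1" 403 298
10.0.0.4 - - [13/Apr/2010:08:00:03 -0500] "GET /a.png HTTP/1.1" 200 403
10.0.0.5 - - [13/Apr/2010:08:00:04 -0500] "GET /admin/ HTTP/1.1" 403 298
10.0.0.6 - - [14/Apr/2010:00:00:01 -0500] "GET /late HTTP/1.1" 403 298
EOF

# Line numbers of the first and last entries carrying the date.
# grep -n prefixes each match with "N:", so cut keeps just the number.
first=$(grep -n '13/Apr/2010' "$log" | head -n 1 | cut -d: -f1)
last=$(grep -n '13/Apr/2010' "$log" | tail -n 1 | cut -d: -f1)

# Print only that range, and quit at the last line so sed stops reading
# the rest of the file. Then keep lines whose 9th field (assumed to be
# the status code) is exactly 403.
matches=$(sed -n -e "${first},${last}p" -e "${last}q" "$log" | awk '$9 == 403')
printf '%s\n' "$matches"

rm -f "$log"
```

On the sample data this keeps the /private/ and /admin/ requests and drops the lines where 403 only shows up in the byte count. If the log uses a custom format, the field number in the awk test would need to change.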