Command Line Tricks: How to count lines with a date before a certain date?

Posted on Fri 27 February 2015 in blog • 1 min read

Assume I have a file of downloaded articles with a structure similar to the following:

head -2 file-with-dates.csv
title,url,tag,date
"title of the article",http://www.url-of-the-article.com,"tag1,etc",2015-02-27

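How could I count the articles posted before a certain date without loading the data into Python, R, or any other software? I think a solution with awk would definitely work, but I have always found that language hard to learn. There is an easy alternative that combines sorting with grep instead of comparing the dates directly. Dates written as YYYY-MM-DD sort alphabetically in the same order as chronologically, so I can append the date in question to the dates in the file, sort them, and let grep -n report the line number where it lands; that number minus one is the count of earlier articles. By also printing the total number of lines in the data, I can get the share of articles posted before that date.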

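# Sort only the trailing dates; the reported line number minus 1 is the count of earlier articles
(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}$' file-with-dates.csv; echo certainDate) | sort | grep -n -m1 '^certainDate$'
# Total number of lines (includes the header row)
wc -l file-with-dates.csv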
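
A rough end-to-end sketch of the sentinel trick follows, assuming the date is the last column in YYYY-MM-DD form and that grep supports -o and -m (GNU and BSD grep both do). The sample rows, the example.com URLs, and the cutoff date are made up for illustration and are not from my real data.

```shell
# Hypothetical sample data in the same layout as file-with-dates.csv
sample=$(mktemp)
cat > "$sample" <<'EOF'
title,url,tag,date
"first article",http://www.example.com/1,"tag1,etc",2015-01-10
"second article",http://www.example.com/2,"tag2",2015-02-27
"third article",http://www.example.com/3,"tag1,tag3",2015-02-01
"fourth article",http://www.example.com/4,"tag4",2015-03-05
EOF

cutoff=2015-02-15   # hypothetical cutoff date

# Keep only the trailing date of each row (the header has none, so it drops out),
# add the cutoff as a sentinel line, sort, and find the first line where it lands.
position=$( (grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}$' "$sample"; echo "$cutoff") \
  | sort | grep -n -m1 "^$cutoff\$" | cut -d: -f1)

before=$((position - 1))                                   # rows dated before the cutoff
total=$(grep -cE '[0-9]{4}-[0-9]{2}-[0-9]{2}$' "$sample")  # data rows, header excluded
share=$((100 * before / total))                            # integer percentage

echo "$before of $total articles ($share%) are dated before $cutoff"
rm -f "$sample"
```

Extracting the date first matters because a plain sort would order the rows by title, and splitting on commas is unreliable when quoted fields such as the tags contain commas themselves.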