Command Line Tricks: How to count lines with a date before a certain date?
Posted on Fri 27 February 2015 in blog
Assume I have a file of downloaded articles with a structure like the following:
head -2 file-with-dates.csv
title,url,tag,date
"title of the article",http://www.url-of-the-article.com,"tag1,etc",2015-02-27
How could I calculate the number of articles posted before a certain date
without having to load the data into Python, R, or any other software? I think
some solution with awk would definitely work, but I have always found that
language hard to learn. There is an easy solution that combines sorting with
grep instead of trying to compare the dates directly: add the date of interest
as an extra line, sort, and look up the line number on which it ends up. By
also printing the total number of lines in the data, I can get the share of
articles posted before that date.
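The dates are in the last column, so they have to be pulled out before sorting;
otherwise sort would order the lines by title. Here certainDate stands for an
actual date such as 2015-02-27, and the first line number that grep prints,
minus one, is the number of articles posted before that date:
(grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}$' file-with-dates.csv; echo certainDate) | sort | grep -n 'certainDate'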
wc -l file-with-dates.csv
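To make the two steps concrete, here is a minimal sketch that puts them together and prints the share directly. Everything in it is an assumption for illustration: the sample rows are made up, the variable names (csv, cutoff, dates, pos, before, total, share) are my own, the date is assumed to be the last field in YYYY-MM-DD form with Unix line endings, and grep is assumed to support -o and -m (GNU and BSD grep do).

```shell
# Hypothetical sample data, made up for illustration (same columns as above).
csv=$(mktemp)
cat > "$csv" <<'EOF'
title,url,tag,date
"first article",http://example.com/1,"tag1,etc",2015-01-10
"second article",http://example.com/2,"tag2",2015-02-01
"third article",http://example.com/3,"tag1",2015-02-27
"fourth article",http://example.com/4,"tag3,etc",2015-03-05
EOF

cutoff=2015-02-27   # assumed cutoff date, in the same format as the data

# Keep only the date at the end of each line; the header has no match and drops out.
dates=$(grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}$' "$csv")

# Append the cutoff, sort, and find the first line on which it appears.
# That line number minus one is the count of dates strictly before the cutoff.
pos=$( (printf '%s\n' "$dates"; echo "$cutoff") | sort | grep -n -m 1 -x "$cutoff" | cut -d: -f1)
before=$((pos - 1))

# Total number of articles (non-empty date lines), then the share in percent.
total=$(printf '%s\n' "$dates" | grep -c .)
share=$((100 * before / total))   # integer percentage, rounded down

echo "$before of $total articles ($share%) are before $cutoff"
rm -f "$csv"
```

With the made-up rows above this prints "2 of 4 articles (50%) are before 2015-02-27". Articles dated exactly on the cutoff are not counted, because the appended line sorts together with them and only the first match is used.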