I've finished reading the Apache Hadoop tutorial (from Yahoo). I didn't do any of the exercises, but at least I now have more than a passing familiarity with what Hadoop is all about and how it is well-positioned to cope with Big Data.
Now I'm moving on to reading up on Apache Mahout. Mahout's goal is to build scalable machine learning libraries for recommendation mining, clustering, classification, frequent itemset mining, and similar purposes. There is a book on Mahout (Mahout in Action), but for now I'll focus on "mining" the Mahout wiki, which seems to have plenty of useful information, likely enough for my immediate needs.
Mahout is implemented on top of Hadoop, with many of its algorithms written as MapReduce jobs.
In the back of my mind I'm thinking about entity extraction, also known as named-entity extraction or named-entity recognition (NER). In theory, Mahout should make NER considerably easier.