Banning: The Second Challenge in Web Data Acquisition Project

The second most popular question people ask when they learn that I work in the web data collection / acquisition space is “how do you go about banning?” Could you guess what the first question is? And no, it’s not related to the first challenge we looked at last week: Setup. One hint: if this…

I’m retired (for now) – Part 1: Mindset shifts

Hello friend, I’m reporting back 7 weeks into this mini retirement and I just want to say that I have finally begun to feel more rested. I resigned in mid-May, interestingly one year after I published the “I can retire but I don’t” post. I survived the excruciating 2.5 months of extended notice period till the…

First Challenge in Web Data Acquisition Projects: Setup

Let’s dive into the first common challenge in data acquisition projects: Setup. You can find the previous post in this series here. How do you get value from web data? What does the end-to-end process look like? What questions should you be asking? To reiterate the context, the use case that I’m covering…

6 Challenges in Web Data Acquisition Projects

Let’s talk about the six common challenges I see people face in data acquisition projects: Setup, Banning, Scaling, Maintaining, Cost, and Quality (of service and/or of data). Before diving in, a note on definition and scope: a “data acquisition project” here is one where the data you need exists on a website or other HTTP-accessible resource. This type of…
