Inferring Human Behaviour from IP Activity Data

As part of our research, we have combined large amounts of historical Internet activity data with other datasets that contain information about human and economic behaviour.

These individual cases form a proof of concept and demonstrate that the Internet activity data collected by KASPR Datahaus can be used to infer human behaviour both at the very granular micro level (sleep and commuting patterns) and at the more macro level (economic and sectoral growth).

 

1. Monitoring How Humanity Goes Online

Previous related studies have used a variety of internet-penetration proxies at either snapshot or annual resolution, each with one or more compromises, such as data-quality problems (in the case of ITU surveys) or difficulties in identifying actual internet use (in the case of block-based or router-based assignment). In contrast, we observe actual end-user IP connections within well-defined urban boundaries (cities) at 15-minute intervals, located using a previously unused, highly accurate geolocation database. This allows us to provide the first accurate estimate of the evolution of the internet’s expansion at monthly intervals.

Significantly, given the temporal granularity and global scope of our series, we are able to confirm that the diffusion of the internet does indeed follow an S-shaped (logistic) process (Fig. 1). We find that the internet’s general diffusion process has an asymptotic limit of 0.32 IPs per person, equating to an internet ‘saturation’ level of approximately one IP address per three-person household, on average. Further, we estimate that the average time to saturation within a country (measured from 1% to 99% of saturation) is just 16.1 years, far shorter than the estimated 100- and 60-year saturation times for the comparable general-purpose technologies (GPTs) of steam power and electrification, respectively.
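The two headline numbers above can be related to each other with a short calculation. The sketch below assumes the standard three-parameter logistic curve K / (1 + exp(-r (t - t0))); the source states only that diffusion is logistic, so this functional form and the names K, r and t0 are assumptions made for illustration. Under that form, the time taken to rise from 1% to 99% of the ceiling depends only on the growth rate r, so a 16.1-year saturation time implies a specific r.

```python
import math

def logit(p):
    """Log-odds of a share p in (0, 1)."""
    return math.log(p / (1.0 - p))

def logistic(t, K, r, t0):
    """Logistic diffusion level at time t (K: ceiling, r: rate, t0: midpoint)."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def saturation_time(r, lo=0.01, hi=0.99):
    """Time needed to rise from lo*K to hi*K; independent of K and t0."""
    return (logit(hi) - logit(lo)) / r

def rate_from_saturation_time(T, lo=0.01, hi=0.99):
    """Growth rate r implied by a lo-to-hi saturation time of T."""
    return (logit(hi) - logit(lo)) / T

K = 0.32   # IPs per person at saturation (figure quoted in the text)
T = 16.1   # years from 1% to 99% of saturation (figure quoted in the text)

r = rate_from_saturation_time(T)
print(f"implied growth rate r = {r:.3f} per year")
print(f"persons per IP at saturation = {1 / K:.3f}")
```

A ceiling of 0.32 IPs per person corresponds to roughly 3.1 persons per IP address, which is where the "one IP address per three-person household" reading comes from.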

Our method also lets us describe individual countries’ experiences of internet penetration. Our estimates reveal that whilst several nations already experience saturated internet penetration, others will not reach this point for decades. This method can help decision makers in the public and private ICT sectors to systematically monitor the diffusion of ICT at very fine spatial and temporal scales, identify barriers to diffusion, and design more precisely targeted strategies to enhance the spread of ICTs.

Figure 1. The diffusion of the internet across 100 nations.

 

2. Granular Internet Activity Data Correlates with Daily Sleep-Wake Cycles

 

In our first application, we applied a novel machine-learning (ML) procedure to up-scale, to the global level, the highly detailed time-use data contained in the American Time Use Survey (ATUS) for the 81 US cities that overlap with our internet activity dataset.

Under n-fold cross-validation, our method obtains an average error of 11 minutes. In effect, our approach converts granular internet activity data into a passive chronobiology monitoring platform at global scale. We find statistically significant differences in the sleep predictions across regions, hinting at an underlying cultural explanation.
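For readers unfamiliar with the evaluation scheme, the following is a minimal sketch of n-fold cross-validation that reports an average absolute error in minutes. It is not the ML procedure used in the study, which is not described here: the model is a plain least-squares line, the single feature (a hypothetical "evening drop-off time" of internet activity) is an assumption, and the data are synthetic values generated only so that the example runs.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def k_fold_mae(xs, ys, k=5, seed=0):
    """Mean absolute error averaged over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all points
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        errs = [abs(slope * xs[i] + intercept - ys[i]) for i in held_out]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / len(fold_errors)

# Synthetic stand-in data (NOT from the study): one value per hypothetical city,
# all times in minutes after midnight.
rng = random.Random(42)
dropoff = [rng.uniform(1320, 1440) for _ in range(81)]    # hypothetical feature
bedtime = [x + 30 + rng.gauss(0, 10) for x in dropoff]    # hypothetical target

print(f"cross-validated error: {k_fold_mae(dropoff, bedtime):.1f} min")
```

Each city is held out exactly once, so the reported figure reflects predictions for cities the model did not see during fitting.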

This part of our research has also been featured in MIT Technology Review and Wired.

3. Intradiurnal Internet Activity Data Correlates with City-Level Commuting Patterns

In another validation exercise, we combined intradiurnal online/offline IP data with Oyster card data from London’s public transport network (see Figure 2). We show that the peaks and troughs in daily IP activity (coloured lines) correlate with daily commuting patterns (grey lines).

Figure 2. IP Activity Correlates with Commuting Behaviour.
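One simple way to quantify how well two daily profiles line up is to compute their Pearson correlation at several time offsets and keep the offset with the strongest association. The sketch below does this for 24 hourly values, wrapping around midnight. It is an illustration only: the source does not say how the comparison was made, the function names are hypothetical, and both profiles are synthetic rather than Oyster card or IP data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def best_lag(activity, commuting, max_lag):
    """Return (lag, r) with the largest |r| when `commuting` is delayed
    by `lag` steps (circular shift, suitable for a 24-hour profile)."""
    n = len(activity)
    best = None
    for lag in range(-max_lag, max_lag + 1):
        shifted = [commuting[(i - lag) % n] for i in range(n)]
        r = pearson(activity, shifted)
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Synthetic hourly profiles (NOT real data): a morning and an evening peak,
# and an activity series that follows the same shape two hours later.
hours = range(24)
commuting = [math.exp(-((h - 8) ** 2) / 4.0)
             + 0.8 * math.exp(-((h - 18) ** 2) / 4.0) for h in hours]
activity = [commuting[(h - 2) % 24] for h in hours]

lag, r = best_lag(activity, commuting, max_lag=4)
print(f"strongest association at lag {lag} h (r = {r:.2f})")
```

Taking the absolute value of r means the search also picks up mirrored patterns, where activity troughs coincide with commuting peaks.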

 

 

4. Inferring Economic Activity from IP Activity Data

We also show that IP activity data can be used to predict local economic activity as well as differences in sectoral productivity. This application reveals that, in a more aggregated form, highly granular IP activity data can be used to predict the outcomes of very complex human behaviour and interactions.

We use yearly gross domestic product (GDP) per capita and sectoral productivity (gross value added (GVA) per worker) for 411 large regions in OECD countries to compare economic activity with internet penetration.

We find a positive correlation between GDP per capita and IP addresses per capita (Fig. 3(a)).

Figure 3. Internet Activity Correlates with GDP and Sectoral Productivity.

However, as Figure 3(b) makes clear, increased internet activity is not associated with uniformly positive impacts on all economic sectors within a region. Broadly speaking, we find that service sectors amenable to digital competition through outsourcing (e.g. publishing, news, film production, administrative support, education) have suffered as local IP concentration has increased, whilst location-constrained sectors (e.g. wholesale, retail, real estate, repairs, hairdressing, mining, transportation, accommodation) have prospered from higher internet concentrations, presumably due to lower consumer search costs and/or logistics and process-efficiency gains.
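The sign of such regional associations can be checked with a rank correlation, which is robust to the skewed distributions typical of per-capita figures. The sketch below computes Spearman's rho between IPs per capita and an economic outcome. The choice of Spearman's rho is an assumption (the source does not name the statistic used), and every number in the example is made up for illustration rather than taken from the study.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up values for five hypothetical regions (NOT data from the study).
ip_pc = [0.05, 0.10, 0.15, 0.22, 0.30]           # IPs per capita
gdp_pc = [12000, 18000, 17000, 35000, 41000]     # GDP per capita
gva_sector_a = [52, 50, 47, 45, 40]              # GVA per worker, falls with IP pc
gva_sector_b = [30, 33, 35, 41, 44]              # GVA per worker, rises with IP pc

for name, series in [("GDP pc", gdp_pc),
                     ("sector A", gva_sector_a),
                     ("sector B", gva_sector_b)]:
    print(f"{name}: rho = {spearman(ip_pc, series):+.2f}")
```

Running the same statistic sector by sector is one way to separate sectors whose productivity moves with local IP concentration from those where it moves against it.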

 

For more detailed information please refer to our paper:
[NOTE: The test cases described here are based on historical IP activity data that was provided to the KASPR team by a third party for academic use only. This historical IP activity data is NOT part of the products and services offered by KASPR Datahaus PTY LTD.]