nodejs.org Download Metrics

This directory contains anonymized log records for binary and source downloads of Node.js from nodejs.org.

What data is available?

There is roughly one log file per day, starting on the 14th of May 2014 to the current day. There is a gap in the data from the 1st to the 21st of September 2015 due to a server configuration error.

The data is gleaned from the access logs by matching for known binary and source files in /dist and the newer /download/release/ which is where the /dist/ directory also points. No other access log information is included in the data available here.

IP addresses and exact times are not reported, only days and geolocation data for the original IP addresses.

What format is the data in?

Raw log files are available in the ./logs/ sub-directory where each file's name takes the form: nodejs.org-access.log.YYYYMMDD.TTTTTTTTTT.csv, where the last entry in the file is used to create the string YYYYMMDD from the year, month and day of the month respectively and TTTTTTTTTT as the unix epoch timestamp. There may zero, one or two log files for a given day. However, when stitched together they should form a continuous record of the downloads from nodejs.org.

There is always a nodejs.org-access.log.csv file which represents the current day's data and is not final, i.e. it will change from update to update, either appending new data or starting again for a new day. The other log files can be considered final until we decide to adjust the format at some point in the future.

The raw log files are comma-separated value format with the following columns: day, country, region, path, version, os, arch, bytes.

Country and region are calculated by using MaxMind's GeoLite2 City database and some entries may contain blank values where the look-up fails. X-Forwarded-For headers are used to determine the most likely origin IP address by parsing out the leftmost non-private address.

The path field contains the actual path that was requested by the client, with the version, os and arch columns calculated from this value.

The version field is occasionaly empty due to the availability of node-latest.tar.gz which is a symlink to the latest source tarball and the various latest directory symlinks when used to download node.exe (versions could be roughly calculated for all of these when matched with release dates if desired).

The os field contains operating system identifiers as well as src for source tarballs and headers for header tarballs.

The arch field is blank when os is src or headers.

Pre-processed summary data

A set of pre-processed summary data is also made available in the ./summaries/ sub-directory. Each type of summary consists of:

Total

Contains two data columns: downloads and TiB, where TiB is 240 bytes.

Source data: ./summaries/total/

Aggregate data: ./summaries/total.csv

Plot:

Architectures

Contains a data column per distributed architecture, including unknown where the architecture cannot be determined (source or header tarballs). The columns are ordered by totals where the architecture that has the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and may expand when additional architectures are distributed from nodejs.org. The .pkg OS X installers are counted as x64 even though, prior to Node.js v4, they were "universal binaries" containing both x64 and x86 versions, usable on both architectures.

Source data: ./summaries/arch/

Aggregate data: ./summaries/arch.csv

Plot:

Countries

Contains a data column per country from the geolocation data, including unknown where a country could not be determined. The column names take the form of ISO 3166 country codes. The columns are ordered by totals where the country with the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and may expand if additional countries not already listed are discovered via geolocation.

Source data: ./summaries/country/

Aggregate data: ./summaries/country.csv

Plot:

Operating Systems

Contains a data column per distributed operating system, including unknown where the operating system cannot be determined (due to node-latest.tar.gz), src for source tarballs and headers for header tarballs. The columns are ordered by totals where the operating system that has the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and may expand when additional operating systems are distributed from nodejs.org.

Source data: ./summaries/os/

Aggregate data: ./summaries/os.csv

Plot:

Versions

Contains a data column per significant version number of Node.js. For <= 0.12, the semver-minor version number is listed, for >= 4.x the semver-major version number is listed. The unknown column contains counts of downloads where the version number could not be determined (see above note about node-latest.tar.gz and the latest directory symlinks coupled with node.exe). The columns are ordered by totals where the version that has the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and will expand when additional significant Node.js versions are made available for download.

Source data: ./summaries/version/

Aggregate data: ./summaries/version.csv

Plot:

Additional stuff

The source of this file along with the various scripts used to generate the data files and graphs can be found in the nodejs/build GitHub repository in the setup/www/tools/metrics directory. Questions, suggestions and pull requests are welcome in that repository.