Getting started with Google Analytics
#
Link an account
First, link the Google account you want to draw data from. Click your username in the top right corner, then click Accounts in the dropdown menu.
From there, click the Link a Google account button and follow the instructions.
#
The reports view
Click Analytics in the navigation bar, then click Reports in the dropdown menu.
You are now in the reports view, which lists all the reports created by users on your account.
#
Create the report
Click the Create Report button. A form appears where you can customize the report.
#
Analytics Target
First, select the Google account you wish to use. Different Analytics Profiles may require switching between linked accounts, but if you have only a single linked Google account, no action is required.
Then select one or more Analytics Profiles from the list. The Filter box beneath the Analytics Profiles list is useful if you have a large number of profiles, but note that altering the filter clears your current selection.
#
Dimensions and metrics
First, select a dimension set. There are currently only two options: the Default set and the Custom set.
- The Default set contains some common dimensions and is mainly intended as an aid to get started.
- The Custom set allows you to choose up to nine dimensions that you are interested in.
Then select the metric you wish to report on. Note that some of the metrics, such as Goal XX Value, require additional manual input to be used.
Dimensions & Metrics
For detailed information about specific dimensions and metrics, see the Dimensions & Metrics Explorer.
#
Date range
Next, select a date range for your report.
Data size
Short date ranges may not provide enough data to train the model, so there is a minimum useful date range for your use case. This minimum depends on the number and type of dimensions chosen, as well as on the size of the underlying data set.
A small data set will most likely produce a badly fitted model, which in turn yields an unreliable feature importance indicator (Feature Weight). The statistical feature importance measure (Coefficient of Variation) remains reliable in this situation, however.
#
Report name
Finally, enter a name for your report, or keep the default name, then click Create.
#
Reading the report
Once the report is done, you are redirected to the report view.
#
Model evaluation metrics
These metrics allow you to quickly see how well the model fits the data.
The coefficient of determination R² and Adjusted R² measure the explanatory power of the model. High values indicate that the model explains the variance in the data. Lower values indicate that, to varying degrees, it does not. Note that a low value does not necessarily mean that the model is wrong, but rather that factors or relationships outside the model's view affect the metric. A different dimension selection may improve this, since the chosen dimensions may not have enough explanatory power.
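To make the two measures concrete, here is a minimal sketch using the standard textbook definitions of R² and Adjusted R². It is not taken from the report's implementation, which may compute them differently. The function names and the sample numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: variance the model failed to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variance present in the data.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_rows, n_features):
    """Standard adjustment that penalizes R2 for each extra feature."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_features - 1)


# Hypothetical metric values and model predictions.
actual = [10.0, 12.0, 15.0, 19.0, 24.0]
predicted = [11.0, 12.0, 14.0, 20.0, 23.0]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_rows=len(actual), n_features=2)
print(f"R2 = {r2:.3f}, Adjusted R2 = {adj_r2:.3f}")
```

With few rows and several features, the adjusted value falls noticeably below the plain one, which is one reason a larger data set tends to give a more trustworthy picture.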
The mean absolute error is another measure of how well the model fits, expressed in the unit of the metric.
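As an illustration of what "in the unit of the metric" means, the following sketch computes the mean absolute error by its usual definition. The metric (sessions) and all numbers are made up for the example.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical example: the chosen metric is a session count.
actual_sessions = [120, 95, 140, 110]
predicted_sessions = [110, 100, 150, 105]

mae = mean_absolute_error(actual_sessions, predicted_sessions)
print(f"Predictions are off by {mae} sessions on average")
```

Because the result shares the metric's unit, it can be compared directly against typical metric values to judge whether the error is large or small.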
Number of data rows is a useful measure of the data set's size. A small number of rows, very roughly in the hundreds, may indicate that you should use a longer date range or explore other dimensions to bring the number of rows up.
#
Feature importance
The feature importance graph displays two measures, Feature Weight and Coefficient of Variation, that score each feature based on its importance for the metric.
Feature Weight is a measure of how useful each feature is to the model.
Coefficient of Variation is a statistical measure of the variation within the features. A low value indicates that the feature does not contain much usable information.
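The sketch below shows the common definition of the coefficient of variation, the standard deviation divided by the mean. Whether the report uses exactly this formula (for instance, population rather than sample standard deviation) is an assumption, and the feature values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation relative to the mean (assumed definition)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / abs(mean)


# Two hypothetical features with the same mean but different spread.
nearly_constant = [100, 101, 99, 100]
widely_varying = [20, 180, 60, 140]

cv_constant = coefficient_of_variation(nearly_constant)
cv_varying = coefficient_of_variation(widely_varying)
print(f"nearly constant: {cv_constant:.4f}, widely varying: {cv_varying:.4f}")
```

The nearly constant feature scores close to zero, matching the statement that a low value signals little usable information.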
#
Statistical distributions
These graphs display the statistical distribution of each feature value as box plots. The line in the main box marks the median, and the box itself is bounded by the first and third quartiles, meaning that it contains 50% of the values. The whiskers, or outlier dots, show the maximum and minimum values.
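For readers who want to see which numbers a box plot is built from, here is a small sketch that derives them with Python's standard library. The quartile method (`inclusive`) is an assumption, since several conventions exist and the report may use a different one. The sample values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the five values a basic box plot is drawn from."""
    # quantiles() with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # lower whisker end
        "q1": q1,             # bottom edge of the box
        "median": median,     # line inside the box
        "q3": q3,             # top edge of the box
        "max": max(values),   # upper whisker end
    }


# Hypothetical feature values.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The distance between `q1` and `q3` is the height of the box, so a tall box means the middle half of the values is widely spread.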