Answers to some common questions and frustrations for new users of Power BI.
I'd like to help
save you some time banging your head and searching the internet by pointing out
some of the issues I've run into when working with clients over the last year.
Data loading and preparation:
There are many
articles and blogs about the technical aspects of loading data into Power BI so
I won't cover that here in great detail. However, I would like to call out some
basic data transformations and cleansing activities that can make a big
difference to the experience of report users.
Once you have
loaded a table check the columns:
1.
Rename your data columns to use business
terminology in the field names.
This isn't a
database, this is a reporting data model and your business users need to be
able to understand the names of the columns.
2.
Add units to your column names.
Again, the
objective here is to make the reports readable and easy to interpret. You don't
want your uses to wonder if you were summing minutes or seconds in the
reporting graphic.
3.
Remove unnecessary columns:
Will the column be
used for reporting? Really think about this, will the users of your reports
NEED to see this column, slice or filter by it? If not, it doesn't belong in
your reporting data model.
4.
Remove bad data, or unnecessary data records:
Will the RECORDS in
your table all be reported on? Again, every record might not have valid data.
Check for this and remove records that are not going to provide good insight in
your reporting.
5.
Remove unused records, even from date tables!
Are the records
part of the data set you need? Sometimes we load records that go all the way
back years before our reporting range. Unless those records contribute to the
value of information shown in the report, remove them. This is ESPECIALLY true
of the Date or Calendar tables - think about those extra records. This is much
more data than you need to be filtering through for your reports and interactive
visualizations.
6.
Use data types wisely:
Change data types
to be accurate for your business definition of the column. Sometimes colums will come in as whole numbers to Power BI because the sample used to define the
column "sees" only numbers in the first 1,000 rows for the preview.
If this is a text based column, your data load will error when you load the
rest of the data.
7.
Use Trim if you think there might be leading or
training spaces in your data.
Once you have
loaded the data there is still preparation to complete. A common question often
comes up right about now though, so let's talk about the
Confusion between Calculations and Measures:
These are similar:
that the engine will calculate them for you, and both can be aggregated in your
reports and visualizations. The key difference is that calculations are
values that are stored in each row of your table, and measures are not
calculated until used in the report context.
1.
Creating Calculations and Measures
Format your facts
as part of your data model setup:
Often you will be
told to use custom calculations instead of the default summarization on your
columns.
1.
You will then know, and be able to define in a
data dictionary, exactly how the calculation was built.
2.
However, if you do use default summarization,
there will be a benefit when viewing reports: you will be able to use the
"View records" command to see what records were included in the
aggregate. This does not work for measures.
2.
Sometimes you need to create a new column as part
of each row:
For example, you
might want to calculate Amount as a Quantity * Price - that is a calculated
column
1.
The "context" of the calculation is the
row - always
2.
Created on data load and stored just like other
columns loaded by Power Query
3.
Sometimes you want to show calculation results
that change as we interact with our reports:
You might need to
know the maximum Amount which is max(Quantity * Price), but when reviewed
across all rows for orders placed within the current Year it will give a
different answer than when viewed for all orders within a given City.
1.
In this case the "context" is all of
the rows that are being evaluated, and which rows will be determined by
"where" the measure is being used
i.
Used as a value in a chart by dates, it might be
the max Amount per Year, then you might drill down to max by Quarter and max by
Month
ii.
Used as a value in a Geographic map, it might be
the max Amount in the Region, then drill down to a given State or Province,
City, etc…
Some other important and sometimes confusing features:
1.
Go through and format your facts as part of your
data model setup
i.
Set default summarization.
ii.
Set the format to use for showing the fields
It is easier to do
all of these steps for all of your columns at one time so that your reports
look great from the start and you will know that the standard has been
implemented. It is still likely that you will need to return to update or fix
some which you can do a t any time.
2.
Create Groups:
You can categorize
your records automatically by creating groups that become a new column in your
table. "Other" as a category allows for new data values added in your
incoming data. Be sure to check overtime that you don't need to add a new
category though.
3.
Create Hierarchies:
i.
Creating a hierarchy is something people do accidentally
all of the time, then they worry about damaging their data model by deleting
them. Don't worry if you get an accidental hierarchy with an unintended
drag/drop of a column in the Fields list, just go ahead and delete that
hierarchy - your fields will stay as they were.
ii.
To create an intentional hierarchy drag a lower
level field over to the highest level in the hierarchy and drop it there. There
will be a yellow dotted line shown where it will drop. Each other column added
to the "pile" will be added at the bottom.
iii.
If necessary rearrange them by dragging them up
or down, or delete one using the "x" to the far right.
iv.
Adding a Hierarchy to the axis of a visualization
will enable drop down.
v.
In the axis of a graph - you can remove levels of
hierarchy that are not desired for that graph without any effect on the
hierarchy.
4.
Even if you set the format of your column, you
may need to set the visual scale on each individual visualization settings:
None, Thousands. Pay attention to this, it makes a big difference to
readability
5.
You can now "See Data" AND "See
Records"
i.
See Data will show the values that are in the
visualization - as a table.
ii.
See Records will show the table records that
comprise the aggregations that are shown in the graph. Every record in it's
entirety.
iii.
As mentioned above, "See Records" only
works with data elements (including calculated columns?) not with measures that
can be summarized
iv.
Trick - if you want a count as your visual value,
but your data doesn't have a uniquely identifying column to "count",
create a column that can be summarized for the counting (like
"Trips") and default it to the value of 1. You can then sum or count
the values in this column to know how many unique "trips".
6.
If you create a visualization (a bar chart) and
the values are the same in every data point you have probably forgotten to
create your relationships. You can draw them on the Relationships tab, or us
the Manage Relationships dialog box from the Home ribbon bar.
7.
How to review your relationships:
If you have many
tables and many relationships and want to review them all quickly, the dialog
box from Manage Relationships may be the better choice, as it is quickly
scanned for errors in column names, etc.…
8.
Single and "Both" direction filtering
make a big difference in some reports:
i.
Single direction filtering enables the value from
one slicer or visual to affect the values on the related table. However,
filtering the second table by another value will not affect the values shown
from the slicer.
ii.
The best way to visualize this is to see 2
slicers from two dimensional tables and when the dimensions are both single
direction, the two slicers will not affect each other.
iii.
If the 2 dimensions are "Both"
direction filtering, then selecting a value from one will affect the values
shown from the other.
9.
Drill down in charts
i.
There are 3 drill down buttons on a chart that
contains hierarchies:
ii.
The single down arrow button is a toggle to turn
drill down on, when on selecting a value in the chart will "drill
down" to the next level.
iii.
The double arrow will drill down to the next
level, aggregating at that level ONLY:
i.
Ex: if you go from showing 3 years of data down
to showing months, the values for each month will be the aggregate of all of
the month's values for every year, all together
iv.
The forked arrow aggregates at the combination
of the level above and the new level:
i.
Ex: if you go from showing 3 years down to
months, you will see the values for each month of every year.
10.
A Date table enables time calculations if certain
criteria are met:
i.
A date column must have a date data type
ii.
The date column must show a continuous value for
every date in the time period (ex: 1/1/2017, 1/2/2017, …)
iii.
For any time intelligence calculations to work:
i.
Joins with the date table must be made on the
date column - NOT on a surrogate key.
ii.
The values in the time calculation must
also be continuous. If you are aggregating ParallelPeriod values, you must be
doing it across a dataset that has a value for every date in the time period.
11.
Why isn't my calculation working?
i.
Check your calc - of course, but if you don't
find anything
ii.
Save some time by reviewing your:
i.
Page and Report filters
ii.
Visualization filters
iii.
Whether you have Relationship filtering in Single
or Both directions
These are some of
the "gotchas" that can waste your time when you create your reports
and can cause your users confusion when they use your reports.
Are there others
that you've spent time resolving that you'd like to share?
4 comments:
It is really a great and useful piece of info. I’m glad that you shared this helpful info with us. Thanks for sharing such nice article, keep on updating.
Python Training
Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
I feel Power BI is the most useful and resource full tool that can easily take up more and more complex database operations and solutions and possibly recommend a simpler version of this.
Powerbi Read Soap
First time reading this blog, thanks for sharing
Post a Comment