Project Case Study: Attending Physician Salary Analysis
1. Project Goal & The Challenge
This project aims to provide an interactive, state-level analysis of average physician salaries, allowing users to compare overall income figures against specific specialty averages and sample sizes. The salary data was gathered by Student Loan Planner through a survey of clients and readers.
The primary challenge was dealing with the raw source data, which was delivered in a wide format. This format is unsuitable for Tableau’s analytical engine because it treats categories as separate, static measures, preventing flexible filtering and dynamic charting. The original Excel file had separate columns for every specialty and every measure:
- Family Medicine Count
- Family Medicine (Avg Salary)
- Pediatrics Count
- Pediatrics (Avg Salary)
- …and so on for over 20 specialties.
2. Data Transformation: Wide-to-Long Formatting
To convert this static, wide data into a dynamic, long format (where all specialties are in one column and all salaries are in another), I executed a two-phase data shaping process directly within Tableau.
Phase A: Pivoting the Data in Tableau
I performed two separate Pivot operations on the data source page: one for all the specialty salary columns, and one for all the specialty count columns.
The Pivot function converts column headers into row values, which is the definition of the long format. This step consolidated over 40 individual columns into just two:
- Specialty Names (the categories)
- Specialty Values (the numbers)
The Pivot Result: After the second pivot, the Specialty Names column contained mixed values (e.g., “Cardiology” and “Cardiology count”), and the Specialty Values column contained a mixture of salary figures and count figures.
Phase B: Separating Mixed Measures with Calculated Fields
The data was now in the long format, but the core measure column was unusable because it mixed non-comparable values (salary vs. count). To resolve this, I created three new Calculated Fields.
| Calculated Field | Formula Logic Used | Analytical Purpose |
| Specialty Salary (Measure) | IF NOT CONTAINS ([Specialty Names], ‘count’) THEN [Specialty Values] END | This function checks the Specialty Name. If the name does not contain the word “count,” the value is treated as the salary, and all other rows return NULL. This isolates all salary data into one clean measure. |
| Specialty Count (Measure) | IF CONTAINS ([Specialty Names], ‘count’) THEN [Specialty Values] END | This function isolates the count data. If the name does contain the word “count,” the value is treated as the physician count, and all other rows return NULL. |
| Specialty Filter Name (Dimension) | TRIM (REPLACE([Specialty Names], ‘ count’, ”)) | This cleans the category names (e.g., “Cardiology count” becomes “Cardiology”) to create a single, unified, user-friendly Dimension for filtering. |
3. Conclusion
By completing these steps, the raw data was transformed into a robust structure containing clean Dimensions and Measures:
- Specialty Filter Name (Dimension): The universal category used on the filter shelf.
- Specialty Salary (Measure): The clean, aggregatable value used on the Y-axis of the specialty charts.
- Specialty Count (Measure): The clean value used for tooltips and contextual sample size.
