Alteryx for Beginners

Summary: In this article “Alteryx for Beginners”, we are going to learn how to use Alteryx and some of the key data skills (like blend, filter, analyze etc.)

Following are the topics that, I am going to mention in this article you can also try out the Alteryx A to Z BootCamp Course with Real Data.:

  1. Alteryx User Interface
  2. Build a simple workflow
  3. Preparation of Data
  4. Filtering Data
  5. The blending of different Data Sources
  6. Analyzing Data

Get to Know The Interface

      1. Alteryx User Interface:
        This section explains the Alteryx Designer interface and also explains some of the basic key terms.
        Key terms that are used in Alteryx are as follows:

        1. Canvas: It is the blank work area in Alteryx designer, where user build the workflow
        2. Tools Pallet: This is the section in Alteryx where all tools (like Input tool, Output tool, Select tool etc.) are placed. These tools are used to create the workflows.
        3. Configuration Window: This is the section where the configuration and setting of the above-mentioned tools are done. For example, if we use the “Input” tool, we need to set up the path of the input file, in that case, the setting up of the path is done in the “configuration window’
        4. Result Window: When the user runs the workflow, the result of that workflow whether that has been run successfully or not. That result is being shown in the “result window” 

Build a Workflow

      1. Build a Simple Workflow:
        Now, the main objective in Alteryx is to create a workflow to do the specific task. Below I am going to explain how to create a workflow in Alteryx:

        1. “Input” Data Tool: The first step in all the workflow is to pull the data into Alteryx. Using “Input” data tool we can pull the data in Alteryx
        2. In the “configuration window”, connect to the input data source which you want to process. In Alteryx, there are various options to connect to input data source like Excel, SQL, and Hadoop etc.
        3. Now, drag the “Output Data” tool from the above tool palette onto “New Workflow1” workflow canvas and make the connection between “Input Data” and “Output Data’ tools to complete the workflow
        4. After making the connection with the “Output Data” tool, provide the location and name to save the output file

Data Preparation

      1. Preparation of Data:
        In this section, I am going to explain the use of some of the data preparation tools that are available in Alteryx.
        Please note: Below I am using the sample data named as “AddressData.yxdb “. This sample data is downloaded by default while installing the Alteryx. You can find the above-mentioned sample data at this location: “C:\Program Files\Alteryx\Samples\en\SampleData”

        1. Dag “Input” data tool to canvas
        2. In the “configuration window”, connect “Input” tool with the input data source.
        3. Use “Browse” tool: As the name suggests, “Browse” tool is used to look at the all of the data directly.
          Now, drag the “Browse’ tool into the canvas and drop if after the “Input” data tool.
        4. Click “Run” button: To load the all of the input data into the browse tool, the user needs to execute the workflow. Click the “Run” button to execute the workflow.
          Now, to do some data manipulation like changing the name of the input fields or changing the data type of any fields, follow the below steps:
        5. Drag the “Select” tool into the canvas and place it between the “Input” and “Browse” tool
        6. The “configuration window” now shows the options of the “Select” tool. Using the “Select” tool, the user can do the following:
          • Remove the fields that are not needed in the process: By deselecting the highlighted check-box, the user can remove the required fields
          • Change the data type of any of the input fields
          • Rename the input fields
          • Reposition the fields
        7. Do the required manipulation in the data using the “Select” tool (as mentioned above) and then execute the workflow by clicking the “Run” button

Filtering Data

In this section, I am going to build a workflow to demonstrate how to filter the input data using Alteryx.
Please note: Below I am using the sample data named as “AddressData.yxdb “. This sample data is downloaded by default while installing the Alteryx. You can find the above-mentioned sample data at this location: “C:\Program Files\Alteryx\Samples\en\SampleData”
Objective: There is a field called “Score” in the above data. We would like to filter out the data where the score is greater than or equals to 50

      1. Dag “Input” data tool to canvas
      2. In the “configuration window”, connect “Input” tool with the input data source
      3. Drag the “Filter’ tool on canvas and put it on the right side of “Input” tool
      4. Now, in the “configuration window” of “Filter” tool, do the following
        • select the “Score” field in “Basic Filter” a
        • set the inequality to “>=”
        • enter the value of 50 (the value for which user wants to filter the data)
      5. Filter tool has two output labelled as “T” and “F”. It divides the records into “True” and “False” data streams based on the condition we configured.
      6. Drag the “Browse” tool and attached it with both “T” and “F”
      7. Execute the workflow and in the upper “Browse” tool we will have all records having Score greater than and equals to 50. In the lower “Browse” tool we will have records of less than 50.

Blending Data

      1. The blending of Data Sources:
        Objective: In this exercise, we blend the transactional data from XML with customer details from CSV excel file. The objective is to find the total sales for each customer segment. We have a common field called “CustomerID” in both the datasets.

        1. Input the “Transaction.xml” data from this location: “C:\Program Files\Alteryx\Samples\en\SampleData
        2. As the “Sales” field in the XML data is in string data type, using “Select” tool, we need to change the data type to “Double”.
          Drag “Select” tool and place it after the “Input” tool.
        3. Input the second data called as “Customer.csv” data from this location: “C:\Program Files\Alteryx\Samples\en\SampleData”
        4. Once we have both the input data, the next step is to join both data based on the common field “CustomerID” so that we can have the “Customer Segment” from “Customer.csv” and “Sales” values from “Transaction.xml”
          Use the “Join” tool to join both data sources:
        5.  “Join” tool returns 3 different outputs which are Left(L), Right(R) and Join(J). In this case:
          Right(R): has the output of those customers who do not have any transaction
          Left(L): has the output of those transactions that do not match any customers data
          Join(J): shows the output that matched and common
        6. The last step is to aggregate the sales for each customer segments. For that use “Summarize” tool and join it with output “J” of “Join” tool
        7. In the “configuration window” of “Summarize” tool, select the “Customer Segment” field and then add that field as “Group By”
        8. Similarly, add “Sales” field as “Sum”
        9. Execute the code and see the output by adding “Browse” tool

Analyzing Data

    1. Analyzing Data:
      Objective: In this topic, we take the customer transaction data and group customers by their shopping habits. We use the fields named as “Visits” and “Spends” to rank the customers.

      1. Input the “Cus_wTransactions.xls” data from this location: “C:\Program Files\Alteryx\Samples\en\SampleData”
      2. Drag the “Tile” tool and place it after the “Input” tool. In the “configuration window” of “Tile” tool, “Visits” is already been set.3
      3. The function of “Tile” tool is to create groups within the specified field. For example, in this case, we have defined the “Visits” fields in the “Tile” tool. Now, if we execute the workflow, we will see that based on the “Visits” value, the data will get divided into different groups.
      4. Now, add another “Tile” tool and select “Spend” field in the “configuration window” to group the customers based on their spend amount
      5. Now the next step is to rank the customers as “High” or “Low” based on their cumulative spends and visits score.
        “High”: If the customers tile_num (from “Tile” tool) for both “Spend” and “Visits” together is more than 3
        “Low”: If tile_num for both “Spend” and “Visits” together is less than 3
      6. Drag the “Formula” tool and place it on the right side after the second “Tile” tool. Also, create the same calculation in the “Formula” tool as it is shown in the below image.
      7. Execute the workflow and check the results

Author1

Subscribe
Notify of
guest

7 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Bala
Bala
4 years ago

Wow it’s easy to Understand as a beginner great thank you so much.:)

Gaelim Holland
Admin
4 years ago
Reply to  Bala

Very happy to have helped!

kumar
kumar
4 years ago

Blending data source topic .
what is the need for changing data type to “Double” .

John
John
4 years ago
Reply to  kumar

Double changes the format of the output to a numeric field. The sum function that is used in the blended data example won’t work for a v_string, or text field.

Aji Chacko
Aji Chacko
4 years ago

Good Stuffs for beginner, thanks

Aji Chacko
Aji Chacko
4 years ago

Great Content for beginner, thanks

Nirmal Kumar
Nirmal Kumar
4 years ago

Pretty Clear . Right start for beginner level