My super-old tutorial on generating random exam numbers

Law school is all about anonymity when it comes to exams. This means that each student generally needs a random exam number that is used in lieu of the student’s name on exams. A few years ago, I did a YouTube tutorial on one way to generate random exam numbers. I thought I would throw up this link to it just to make it easier for me to find in the future.

As usual, this isn’t my idea. It’s based on the excellent tutorial here — How To Sort A List Randomly In Excel.

And, in case you’re wondering, those are not real student names. They come from the lovely Fake Name Generator.

Advertisements

Using array formulas in Excel to help you with SQL table definitions

This is one of those posts that is very slightly too long for Twitter and not really long enough to justify a blog post, but here goes anyway.

We get data from our Student Information System in Excel. Imagine something like the following:

last_name, first_name, email, career
Chapman, Ben, bjchapm@emory.edu, LAW

Now imagine that you want to create a MySQL database table and you want to import this data into the table. I use Pentaho Data Integration for this sort of thing. PDI is a great tool and I’ll hopefully go into more detail about it later. In PDI, you create transformations that transform data from one form to another. Here’s an example of a transform that will move data from Excel to a MySQL table:

Screen Shot 2013-02-20 at 9.48.25 AM

So, we’re moving from Excel’s untyped data to outputting data for a MySQL data table. PDI will do this for you; however, it will create tinytext columns that are 255 characters wide for each text field that it sees in the Excel file. Let’s assume that you want to match the table definitions more closely to the actual field widths of the data that you are importing.

Sorry, this is sort of a shaggy dog story – all that build up for this:

array_formula

Here’s what I did: first, I inserted a new row. In cell A1, I entered =max(len(A3:A2000))I should have actually checked how many rows there are. Obviously, if you’re calculating the maximum of the length of something, you need to include all rows.

Now for the magic. This is Excel 101, but I never use them: array formulas. Highlight the formula, and press Ctl-Shift-Enter. This will insert the braces that you can see in the formula bar above. This will apply the formula calculation to each cell, giving you a very compact way to calculate the maximum length of each cell. Now that you have that maximum, you can create your database field widths with some assurance that you will correctly define the field width. In my case, I often define any field that appears to be less than 10 characters wide as a VARCHAR(10). For wider fields, such as “Descr” above, I would probably define that as a VARCHAR(40).

And that’s that. Just remember Ctl-Shift-Enter for array formulas. To dig deeper into the power of array formulas to summarize data in Excel, take a look at this article from Microsoft. By the way, there are lots of other ways to calculate the maximum length of data in a particular field. There are some very useful tools out there that will do basic data analysis for Excel or CSV files. This was just a convenient way to calculate this information. Other thoughts? Other tools or tips? Let me know in the comments.

Excel: Use Pivot tables to summarize daily data by months, quarters, and years

Excel mac 2008 icon
Image via Wikipedia

Recently, I needed to analyze a log file that had lines of data, where each line had a date stamp and some other information. Each line of the log file represented a single email message. The goal was to determine how many messages were sent out each month and each year. I used Excel’s pivot table feature to solve this problem.

Here’s a short screen capture that demonstrates the steps required.

In the video, I’m trying summarize email messages and so I create something I call the “Base Subject”. It’s used as a counter so that I can get the count of messages that get sent during a particular period. I should probably clarify what I’m doing so that you can apply the example to other kinds of files.

Imagine that you have a file that looks something like the following. Assume that the vertical bar separates each column:

Date|Item|Amount
08/01/2009|Widget1|9.23
08/13/2009|Widget1|9.23
08/27/2009|Widget3|19.27
09/01/2009|Widget1|9.23
09/20/2009|Widget5|17.22

Suppose that you would like to know the number of widgets purchased each month. In order to do that you need to count each widget that you buy. The easiest way to do that is to create a column that is identical for every widget. This may seem counter-intuitive but it’s what you need to do.

Add an extra column called “Counter”:

Date|Counter|Item|Amount
08/01/2009|1|Widget1|9.23
08/13/2009|1|Widget1|9.23
08/27/2009|1|Widget3|19.27
09/01/2009|1|Widget1|9.23
09/20/2009|1|Widget5|17.22

Now you’re ready to summarize the data.

When you create the pivot table, do the following in the wizard:

  1. Drag “Date” into the Rows area
  2. Drag “Counter” into the “Data” area and verify that “Sum of Counter” is displayed.
  3. Hit Finish to display the pivot table by days. To consolidate the information, right click on one of the days and add in years and months, by choosing years and months under the “Group” command, as described in the video.
  4. Hit Finish and now you can see how many widgets were purchased in August and how many were purchased in September.
  5. To determine how much you spent on widgets during a given month, try dragging the “Amount” column into the “Column” area in the pivot table wizard. This will give you totals for each period.

This only just scratches the surface of the interesting things you can do with pivot tables. Is this helpful? Do you have pivot table tips to share? Other questions? Let us know in the comments.

Reblog this post [with Zemanta]