#lak13 Analytics MOOC

Well, the course has been running for 3 weeks, and I’ve generally kept up, reading quite a few blog articles and then catching up with the actual course content on a Friday when time allows. Admittedly this is my first MOOC and I can see how a lot of people feel overwhelmed and lost (I’m currently feeling this!).

But anyway, I digress. I’m not 100% sure where this blog post will go, but initially I’m thinking this will form part of a series, otherwise it’ll turn into a full-blown essay – and neither the reader nor I want that.

Enterprise Solution to Aid Analytics

A lot of the talk that I have seen in the discussion forums has been about how to get hold of data sources. I’m fortunate in that, working within the IT department of my University and contributing to the integration of various systems, I tend to have access to large datasets wherever I look. And where I don’t, if we can think of a benefit of having some data, I generally know who to ask.

But I’d like to take a step back from this point and look at a system architecture design that would really help institutions perform data analytics. I’ve been fortunate enough to be a fundamental part of the development of a Data Exchange System at a previous institution, which has stood me in good stead for designing a system architecture that avoids duplicates and removes the need to ‘cleanse’ data across multiple systems.

A University Data Exchange System

Whilst the real world is never quite this simple, for the purposes of this article (which doesn’t explore every eventuality) a University could be seen as having 2 primary source systems:

  • HR System (for staff details)
  • Student Record System (for the Student and Curriculum data)

This data would then flow to a number of satellite systems, for example:

  • VLE/MLE (Blackboard, Moodle, etc…)
  • Library System
  • Swipe Card / Attendance Monitoring system.

There are of course plenty of others, but these are possibly the most relevant to all institutions.

image

Now these systems evidently need to be linked up. Firstly, to create the initial data in the various satellite systems, and then to periodically update it. One (poor) approach could be to create direct links from the source system to each satellite system like below:

image

I say poor for a number of reasons. A primary reason is that the next time the Student Record System is replaced, every satellite system needs to have its link rebuilt from scratch. There is also the issue of duplicate accounts: if no preventive action is in place in the source systems, duplicates quickly end up in all satellite systems, leading to man-hours lost to cleaning up duplicate data, not to mention the impact on student experience.

My preferred solution is to have a central piece of the jigsaw which processes the changes. This central piece can perform a number of functions but primarily:

  • It can attempt to detect duplicates and prevent them from entering the satellite systems – flagging these issues at source.
  • It can, for example, issue each student a unique identifier such as a GUID, which can be used to send to all satellite systems (and this is where LAK comes in – making it easier to query an entity across multiple systems)
  • It can also hold the logic required to determine whether a change needs to be sent to a satellite system. For example, a staff member changing their address probably doesn’t need to be sent to the VLE. It would however need to go to the Library System.
  • It can also merge staff and student accounts into one where a member of staff is also a student. That would save them from having two logins, two ID cards, etc.
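
The functions listed above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not a description of any real Data Exchange System: the class name `DataExchangeHub`, the name-plus-date-of-birth matching rule and the routing table are all hypothetical, and a real hub would need far more careful matching.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical hub sitting between the source systems and the satellites. */
final class DataExchangeHub {
    // Assumed routing rules: which satellites need to hear about which field.
    private static final Map<String, List<String>> ROUTES = Map.of(
            "address", List.of("Library"),
            "name", List.of("VLE", "Library", "SwipeCard"));

    private final Map<String, UUID> idByMatchKey = new HashMap<>();
    private final Map<UUID, Set<String>> sourcesById = new HashMap<>();
    private final List<String> flaggedDuplicates = new ArrayList<>();
    private final List<String> outbound = new ArrayList<>();

    /** Registers a person from a source system and returns their single GUID. */
    UUID register(String sourceSystem, String sourceId, String name, String dateOfBirth) {
        // Assumed matching rule: same normalised name and date of birth.
        String key = name.trim().toLowerCase() + "|" + dateOfBirth;
        UUID id = idByMatchKey.computeIfAbsent(key, k -> UUID.randomUUID());
        Set<String> sources = sourcesById.computeIfAbsent(id, k -> new HashSet<>());
        if (!sources.add(sourceSystem)) {
            // Seen twice in the same source system: flag it at source.
            flaggedDuplicates.add(sourceSystem + ":" + sourceId);
        }
        // Seen in a different source (e.g. HR and Student Records): same GUID, one account.
        return id;
    }

    /** Queues a change only for the satellites that need it. */
    void publishChange(UUID personId, String field) {
        for (String target : ROUTES.getOrDefault(field, List.of())) {
            outbound.add(target + ":" + personId + ":" + field);
        }
    }

    List<String> flaggedDuplicates() { return flaggedDuplicates; }

    List<String> pendingMessages() { return outbound; }
}
```

With these assumed rules, a change to `address` is queued for the Library System only, and a person registered from both HR and the Student Record System ends up with one identifier.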

It also addresses the concern highlighted above, as only one link needs to be re-worked should a system be replaced. There are also further benefits: changes can be queued up if a system is down for essential maintenance or broken(!), and the business logic for the processing rules can potentially be stored centrally.

image

So, I’ll leave it there for now. But with the introduction of a unique identifier, centrally stored, we can now start to perform analytics from every system within this architecture. There may of course be links between student performance (or drop-out) and the amount of ‘churn’ through the system, and provided we introduced some kind of logging to this system we’d be able to perform some checks on this and identify patterns.
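
As a rough illustration of what the shared identifier makes possible, the sketch below counts activity per person per system once every system logs events against the same GUID. The `ActivityEvent` and `CrossSystemActivity` names and the shape of the log are assumptions made for this example only.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

/** Hypothetical log entry: one action by one person in one system. */
final class ActivityEvent {
    private final UUID personId;
    private final String system;

    ActivityEvent(UUID personId, String system) {
        this.personId = personId;
        this.system = system;
    }

    UUID personId() { return personId; }

    String system() { return system; }
}

final class CrossSystemActivity {
    /** Groups events by person, then by system, counting the events in each group. */
    static Map<UUID, Map<String, Long>> countBySystem(List<ActivityEvent> events) {
        return events.stream().collect(
                Collectors.groupingBy(ActivityEvent::personId,
                        Collectors.groupingBy(ActivityEvent::system, Collectors.counting())));
    }
}
```

Because the key is the same in every system, no name matching or cleansing is needed at query time; the grouping is a plain lookup on the identifier.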

The next post in the series may come soon (I’m waiting for a long running process to complete) and will look at a potential solution to integrate the systems to make retrieval and analysis of data easier (and lend a hand to other mechanisms to deliver student expectations).


A New Year on the Horizon

And I guess a look forward to some new challenges. One thing that has become apparent to me is a lack of focus on developing my IT skills, and hence a lack of blog postings here. This has primarily been due to my Masters in Project Management taking a lot of my spare time, and then this year I also trained for (and completed) an Ironman competition.

Crossing my fingers, my Masters now has 2 modules and a dissertation left to run, so I’m hoping to turn my attention back to learning some IT skills to complement any developments next year. A practice I’ve followed in previous years is to have a quick look at job boards and just search on a key term. This year I’ve chosen to look at Java. I really want to keep up with .NET but in my current environment it’s easier to get the latest tools for Java development. I’ll perhaps review the situation later in 2013 and see what provision would be available for keeping up to date in the Microsoft world.

Anyway, picking out the key words from the job descriptions, a number of recurring themes seem to appear, namely:

  • Spring
  • Struts
  • Web Services
  • Automated Development and Test Driven Development
  • Hibernate / JPA
  • GlassFish / Tomcat
  • MVC
  • JMS
  • JMX
  • Oracle / MySQL
  • Multi-Threading Expertise

Some of these I know reasonably well, others I’ve only heard of in passing. I’m going to attempt to learn all of the above from scratch (to remove any bad habits), on a month-by-month basis. This won’t give me as good a grounding as possible, but hopefully by developing the knowledge first, the applications will come. The things I know better than others should give me a month of slight slack, whereas the other months might be a bit more intense.

Each month of learning will result in some kind of blog post – either an overarching summary of the technology, or a series of posts exploring the journey I go along.

Is there a key technology (part of Java) that I’m really missing? The only thing on my mind is Android development, which should probably get thrown into the mix at some point.

Monitoring (and Java and Oracle Batching (for large number of inserts))

Background

I’ve been involved with a monitoring project for a while – sadly I’m a bit of a geek who enjoys looking at numbers etc. In the very early days we were looking at providing real-time statistics just from VLE access, and then things evolved to bringing all the data together from various systems (e.g. the assignments system) to provide better metrics, and essentially a student dashboard as an early warning system. I blogged about the idea back in January of 2011 (in a work blog) and have since been involved with various prototypes to get the project off the ground. I think the original idea stemmed from this video showing some initial findings about VLE access and student performance (which originally came via a tweet by George Kroner). I’m pleased to say that the Student Dashboard is now being piloted with a small set of users, and more information about the project can be found via the University of Hertfordshire LTI Blog.

Eh, I thought this was about Java and Oracle and Inserting Lots of Records…

Well it is, I got a bit sidetracked there. Basically, the background was the scenario of having lots of records to be processed from various sources to generate the structures required for reporting. It’s grown over time, and today I needed to come back and revisit it. The good news is I found out about a Batch Insert for Oracle, which means the time taken to insert this number of records has greatly decreased.

Sample Code

[Note – some lines have been removed for protection. It should ultimately work, but no promises that there are no syntax errors.]

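import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import oracle.jdbc.OraclePreparedStatement; // from the Oracle JDBC driver

// ... inside a method that declares "throws SQLException":
Connection conn = null;
PreparedStatement ps = null;

try {
   conn = getDBConnection();
   conn.setAutoCommit(false); // commit once at the end rather than per row

   ps = conn.prepareStatement("INSERT INTO table_name (Field1, Field2, Field3, Field4, Field5) VALUES (?, ?, ?, ?, ?)");
   // Oracle-specific update batching: executeUpdate() queues each row and
   // the driver sends them to the database in groups of 1000.
   ((OraclePreparedStatement) ps).setExecuteBatch(1000);
   for (EntryType entry : myEntryList) {
      ps.setString(1, entry.getField1());
      ps.setString(2, entry.getField2());
      ps.setString(3, entry.getField3());
      ps.setString(4, entry.getField4());
      ps.setInt(5, 1);
      ps.executeUpdate(); // queued, not necessarily sent yet
   }
   ((OraclePreparedStatement) ps).sendBatch(); // send any rows still queued
   conn.commit();
} catch (Exception e) {
   e.printStackTrace(System.err);
   if (conn != null) {
      conn.rollback(); // don't leave a half-finished insert behind
   }
} finally {
   if (ps != null) {
      ps.close();
   }
   if (conn != null) {
      conn.close();
   }
}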
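
For comparison, the same pattern can be written with the standard JDBC batching calls (`addBatch` and `executeBatch`) instead of the Oracle-specific ones. This is a sketch rather than the code used in the project: the class name `BatchInsertSketch`, the `String[]` row shape and the returned batch count are assumptions made for illustration, and the table and field names are the placeholders from the sample above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    static final int BATCH_SIZE = 1000;

    /**
     * Inserts every row (four string fields each) in batches of BATCH_SIZE
     * and returns how many batches were sent. Commits once at the end.
     */
    static int insertAll(Connection conn, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO table_name (Field1, Field2, Field3, Field4, Field5)"
                + " VALUES (?, ?, ?, ?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int batchesSent = 0;
            int pending = 0;
            for (String[] row : rows) {
                for (int i = 0; i < 4; i++) {
                    ps.setString(i + 1, row[i]);
                }
                ps.setInt(5, 1);
                ps.addBatch(); // queue the row locally
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // send a full batch in one go
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the final, partly filled batch
                batchesSent++;
            }
            conn.commit();
            return batchesSent;
        } catch (SQLException e) {
            conn.rollback(); // undo anything sent before the failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The caller keeps ownership of the connection here, so it is not closed inside the method; whether that suits a given application is a design choice.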

C# and JSON – Hello,World Example

I haven’t really played around with JSON data streams, but the need arose when consuming some data from another system recently. In order to do a quick and dirty solution I simply read it in as a String and then split the string where I needed to, and found the data I required. I’ve now revisited this example to do it properly!

My raw data string looks very similar to this:

{"patron":
{"FIRSTNAME" : "Gregor",
"MIDDLENAME": "",
"LASTNAME" : "Bowie",
"ERROR" : ""
},

"HasPaidRecently" :false,
"patronComments" :"You have no books on loan",
"potentialChargeAfterRenewal":12.6,
"patronAccountStatus" :"A",
"totalFineFee" :0,
"calculatedOverdueFine" :0,
"totalReplacementCharges" :0,
"totalBritishLibraryCharges" :0,
"returnedItemFines" :12.6,
"oldOutstandingFines" :0,
"numberOfItemsBorrowed" :0,
"caseId" :151,
"thereWillBeRowsToDisplay" :true,
"widgetLogId" :48721,
"sessionId" :"B80B9913C513EC8C4B81F68B3FD0A8AD"
}

In order to process this data using the .NET DataContractJsonSerializer, a couple of classes representing the object structure need to be created (due to the nested patron object above).

The parent class takes this form:


public class Patron
{
   public NameDetails patron { get; set; }
   public bool HasPaidRecently { get; set; }                // JSON boolean
   public string patronComments { get; set; }
   public double potentialChargeAfterRenewal { get; set; }  // JSON number
   public string patronAccountStatus { get; set; }
   public double totalFineFee { get; set; }
   public double calculatedOverdueFine { get; set; }
   public double totalReplacementCharges { get; set; }
   public double totalBritishLibraryCharges { get; set; }
   public double returnedItemFines { get; set; }
   public double oldOutstandingFines { get; set; }
   public int numberOfItemsBorrowed { get; set; }
   public int caseId { get; set; }
   public bool thereWillBeRowsToDisplay { get; set; }
   public int widgetLogId { get; set; }
   public string sessionId { get; set; }
}

With the inner NameDetails class looking as:


public class NameDetails
{
   public string FIRSTNAME { get; set; }
   public string MIDDLENAME { get; set; }
   public string LASTNAME { get; set; }
   public string ERROR { get; set; }
}

Note, I’ve made my property names match exactly the format they are given in the JSON. I would imagine, although I haven’t tried it, that adding a DataMember attribute would override the naming if required.

Now for the source code that takes the raw string and converts it into the object.


// Needs: using System; using System.IO; using System.Text;
//        using System.Runtime.Serialization.Json;
DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(Patron));
Patron pat;
// The using block closes the stream even if ReadObject throws
using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(rawData)))
{
   pat = (Patron)serializer.ReadObject(stream);
}

Console.WriteLine("Person: " + pat.patron.FIRSTNAME + " " + pat.patron.LASTNAME);
Console.WriteLine("Has Paid Recently? " + pat.HasPaidRecently);
Console.WriteLine("Fines: " + pat.potentialChargeAfterRenewal);

Note – this isn’t production code, merely a spike to prove things.

Scientia Timetabling – Creation of SPDA Files

Scientia provide the timetabling solution for our University and there appear to be exciting times ahead with lots of discussion about possible improvements to the timetables delivered by the University.

Of course, delivering a better solution first means better integration with our existing systems, and the preferred method of integration by Scientia is via the use of their SPDA tool.

The SPDA tool is configured to point at various databases (note, requires 32-bit Oracle Client, not 64-bit!), and then a link definition file is put in place to determine the field mapping between Scientia and the source data table. Scientia do recommend putting in place a staging table between the source data and the SPDA tool, e.g.

tblModules (Source System)   ->    tblModules_TT (Staging) (with Status column)   -> Scientia

During recent testing I discovered that using Oracle date/time fields causes the SPDA tool to fall over (reported as a Data Type unsupported error, followed by an Automation Runtime error). Strangely, despite declaring the fields we’re interested in, the SPDA code appears to revert to running a SELECT * query rather than selecting the specified columns?! So stick with numbers and varchar2 in the staging tables!

Below are two example SPDA files that are currently working for us, one for modules and one for staff. Their contents aren’t rocket science; the only thing worth noting is that the Scientia data columns are on the left, and our staging table columns are on the right.

Modules Example (not using Status flag)

;;LDF to set up the link between S+ and Staff Feed
;Version 2

BEGIN-CLASS Module, T_ST_Module_TT, HostKey
HostKey, MODULE
Name, MODULE
Description, DESCRIPTION
Department, DEPARTMENT
END-CLASS

Staff Example (using Status flag)

;;LDF to set up the link between S+ and Staff Feed
;Version 2

IN-TRANSFER TT_Status

BEGIN-CLASS StaffMember, STAFF_DELTA, HostKey
HostKey, STAFF_NO
Name, FULLNAME
Description, USERNAME
Email, MAIL
END-CLASS

Windows Shutdown

Seriously Windows, this has been annoying me for quite some time now. I’m occasionally forced to use Microsoft Windows for an OS (I much prefer life in Ubuntu), and more often than not, when I go to shut down the machine the little yellow icon appears in the menu informing the user that updates need to be installed. (Is there another option to click to avoid installing the updates?)

Unfortunately for me this happened tonight. I chose Shut Down because I wanted to switch my laptop off and head home for the night. I’m now sat here 20 minutes later because I can’t turn my laptop off, or store it securely overnight, waiting for Windows to update itself – currently still on Update 1 of 33.

Seriously – Shut Down should mean Shut Down. Not please wait (for a long time) whilst I do something.

International Blended Learning Conference 2012

image

For the second year in a row I attended the University of Hertfordshire’s International Blended Learning Conference. Last year’s conference saw me missing one or two sessions as I negotiated the purchase of our house – hopefully this year would be just as interesting, if slightly less eventful!

The conference really is truly excellent, offering so many benefits that are often lost as I focus on day-to-day tasks and forget about the bigger picture. So at a very simple level just being able to see some of the excellent work taking place at our institution and further afield really is a source of inspiration. The role our team plays in building the foundations for some of these activities to take place really helps keep focus on continuing to deliver functionality for others to utilise.

I attended a number of the sessions throughout the 2 days, and all had their merits. Three though particularly stood out for me for different reasons:

  • Jessie Paterson from Edinburgh University demonstrated the effective use of blogs to promote critical reading of recommended texts. An excellent presentation, with the reported benefits being better writing standards across the course as well as the module, along with better engagement and discussions around the texts on the module. (With my student hat on for a moment, this is something I intend to do off my own bat for next year’s recommended readings!)
  • James McDowell from the University of Huddersfield demonstrated the use of video feedback to his students and how quickly the feedback could be turned around. I particularly liked the play on words of feed-forward rather than feedback. As a team we’re currently exploring ways to integrate video/audio feedback with the assignments system, so the findings from his research were invaluable.
  • Guy Saward from the University of Hertfordshire has built upon the publicly available RSS Feeds from each module within our VLE to publish notifications out to Facebook and Twitter using some RSS Aggregators. With it working well, 2 questions now need to be answered: (a) do students want it, and (b) how can it be automated across an institution? Interesting times ahead I think.

The ability to network with colleagues is also worth its weight in gold at these conferences, and this conference appeared to have the right balance between ‘social’ and ‘presentation’ time. Listening to the experiences of our colleagues with the VLE, I was able to remove a number of minor hurdles with simple conversations. Also at the end of one particular presentation, one colleague turned round and said they’d love to be able to do that. After a 10 minute initial conversation and then some follow-up emails it looks like they might be in a position to do it next semester, and I look forward to supporting them in this activity.

Another outstanding outcome of the conference was the ‘Thinking Space’ initiative, which resulted in this magnificent image being created by Joel Cooper from the ideas put forth by the conference attendees:

If there is a downside to such a thought-provoking and inspiring conference it would have to be the size of the To Do List sitting in front of me now. But then again, is that such a bad thing?