<div class="notebook">

<div class="nb-cell markdown" name="md1">
# Magus

Magus is a tool to visualize user sessions on a website based on HTTP log entries. It was designed in the context of the [national digital newspaper collection of the Netherlands](http://www.delpher.nl/nl/kranten), maintained by the National Library of the Netherlands. This collection provides a _faceted_ search interface based on the underlying metadata of the archive, such as the date of publication, the _type_ of article (advertisement, family announcement, news article, etc.) and the region where the newspaper was available.

The HTTP log records were analyzed in SWISH, acting as a _DataLab_. The goal was to get detailed insight into the behavior of users and into whether that behavior can be related to the metadata. For example, many people use the collection to study the history of their family, while the site is also used by historians who are interested in specific events such as World War II.

The first task was to clean the data and see whether we could represent the _sessions_ of individual users in a coherent way. The original data contains many interactions that do not reflect user behavior, such as loading style sheets and images, page reloads due to accidental double clicks, and page loads from web crawlers. On the other hand, some of the actual behavior is not reflected in top-level page loads but in AJAX requests. The job was to remove uninteresting HTTP requests and to _normalize_ the remaining ones, as well as their _referrer_, to establish a coherent _clickstream_ that captures the user behavior. To do this, we built a graph visualization, as it immediately shows whether a sequence of records from the same IP address within a limited time range actually connects, and whether we can classify each node as a user interaction.
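
The connectivity check described above can be illustrated with a small Prolog sketch. It is a minimal illustration under stated assumptions, not the code in magus_normalize.pl: the `request/3` representation, the predicates `normalize_url/2`, `ignored_param/1`, `ignored_pair/1` and `session_connected/1`, and the choice of ignored query parameters are all hypothetical. It canonicalizes URLs and referrers with SWI-Prolog's library(uri), treats each referrer link as an undirected edge, and uses library(ugraphs) to test whether all pages of a session fall into a single connected component.

```prolog
% Minimal sketch, NOT the actual MAGUS code. The request/3 terms,
% ignored_param/1 facts and all predicate names are hypothetical.
:- use_module(library(uri)).
:- use_module(library(apply)).
:- use_module(library(lists)).
:- use_module(library(ugraphs)).

% Query parameters assumed not to change the page (hypothetical).
ignored_param(utm_source).
ignored_param('_').            % e.g., a cache-busting AJAX parameter

ignored_pair(Name=_) :-
    ignored_param(Name).

%!  normalize_url(+URL, -Normalized) is det.
%   Drops ignored query parameters and sorts the remaining ones,
%   assuming their order never affects the page. The fragment is
%   always dropped.
normalize_url(URL, Normalized) :-
    uri_components(URL, uri_components(Scheme, Auth, Path, Query, _)),
    (   var(Query)
    ->  true                   % no query string: leave NewQuery unbound
    ;   uri_query_components(Query, Pairs0),
        exclude(ignored_pair, Pairs0, Pairs1),
        msort(Pairs1, Pairs),
        (   Pairs == []
        ->  true               % all parameters ignored: omit the query
        ;   uri_query_components(NewQuery, Pairs)
        )
    ),
    uri_components(Normalized,
                   uri_components(Scheme, Auth, Path, NewQuery, _)).

%!  session_connected(+Requests) is semidet.
%   Requests is a list of request(Time, URL, Referrer) terms from one
%   IP address in a limited time range; Referrer is '-' when absent.
%   Succeeds if all normalized pages form one connected component
%   when referrer links are treated as undirected edges.
session_connected(Requests) :-
    findall(Page,
            ( member(request(_, URL, _), Requests),
              normalize_url(URL, Page)
            ),
            Pages),
    findall(Edge,
            ( member(request(_, URL, Ref), Requests),
              Ref \== '-',
              normalize_url(URL, To),
              normalize_url(Ref, From),
              ( Edge = From-To ; Edge = To-From )   % both directions
            ),
            Edges),
    vertices_edges_to_ugraph(Pages, Edges, Graph),
    vertices(Graph, Vertices),
    Vertices = [Start|_],
    reachable(Start, Graph, Reached),  % Reached includes Start
    length(Vertices, N),
    length(Reached, N).
```

For example, a session whose referrer chain runs from the home page to a search page to an article page is connected, whereas adding a request whose referrer is an unrelated external page produces a second component, the kind of disconnected graph that signals a cleaning problem or a second user.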

If the graph does not connect, this can mean that our data cleaning and normalization is incorrect, or that we are actually dealing with, for example, two users behind the same IP address. If we cannot classify certain pages, they may be unrelated to user interactions (and must be removed) or our classification is incomplete. To simplify this analysis, we use shapes, colors and border width to distinguish the locations, and we make the nodes _clickable_, redirecting to the original page. As our aim was to relate behavior to metadata, we also added color schemes that reflect the metadata, so that we can quickly see how user behavior relates to the time, the type of newspaper item or the distribution zone of the newspaper document.

Unfortunately, we cannot share the analysis for privacy reasons. This notebook illustrates the visualization for a single session from our log files.

### Paper

MAGUS is described and evaluated in a paper presented at [TPDL](http://eric.univ-lyon2.fr/adbis-tpdl-eda-2020/tpdl/):

[Understanding User Behavior in Digital Libraries Using the MAGUS Session Visualization Tool](https://doi.org/10.1007/978-3-030-54956-5_13)

See the [presentation at TPDL](https://youtu.be/Jp54b5ufe14).

## Files

The system consists of the following files:

- magus.pl provides the graph visualization
- magus_normalize.pl does URL normalization to make the referrer graph connect
- magus_locations.pl classifies the various URLs of the server as, e.g., _search result page_, _download_, etc.
- magus_human_readable.pl turns seconds into human-readable notation
- magus_data_access.pl makes our small data sample accessible in the same way as the original data and replaces many of the internal data processing predicates with a simple abstraction
- magus_example_data.pl contains 76 data records extracted (after cleaning) from the original data
</div>

<div class="nb-cell program" data-background="true" name="p1">
:- include(magus).
:- include(magus_example_data).
:- include(magus_data_access).
:- include(magus_normalize).
:- include(magus_locations).
</div>

<div class="nb-cell markdown" name="md2">
## Showing a session visualization

After loading the program above, we can show a session in one of three color modes. Please click the _run_ button at the bottom right of the query below.
</div>

<div class="nb-cell query" data-run="onload" name="q1">
parameters([ColorMode:oneof([period,item,spatial])+default(period)]),
session_graph(_, Graph, ColorMode).
</div>

<div class="nb-cell markdown" name="md3">
## Reusing this code

Unfortunately, creating a visualization like this is highly domain specific. A few points that need attention:

- Various URLs may refer to the same actual page due to (1) a non-canonical representation of the path and/or query parameters, (2) added query parameters that do not affect the page, or (3) a refresh of the page content using an AJAX request.
- The actual URL of the node may need to be changed to get the target page displayed correctly.
- Locations (URLs) of the server need to be classified to define the node shape.
- Modifications such as the use of facets need to be defined and mapped onto (here) the border width.
- The mapping of metadata to colors needs to be defined.

The provided code rewrites a list of records that belong to a session into a Prolog term suitable for the [graphviz renderer](example/render_graphviz.swinb). SWISH acts for programs much like a wiki does for text: a user may create a new program or notebook, copy/paste an initial version and edit it to suit the needs of the current project. The programs and notebooks can be used simultaneously by anyone with access to the SWISH server.
</div>

</div>