I recently wrote a blog post for our own website about some interesting findings and solutions we had around logging in InRiver. I thought I'd also post it here for the community.
The Problem
While inRiver PIM developers have reaped countless benefits from the move from on-premise inRiver to the cloud-hosted, multi-tenant IPMC solution, one of the challenges that has come up is monitoring the solution and figuring out whether your extensions are performing as expected. More importantly, if they are not performing as expected, how do you diagnose the performance issue and measure your solutions to the problem? This challenge is exacerbated by the fact that an extension running locally, connected to the IPMC environment through the remoting connection, shows significantly different performance characteristics than the same extension running inside the IPMC environment. The only way to get a true read on an extension is to measure it while it's running in the IPMC environment.
Like most partners that do a lot of work with InRiver, Aperture Labs has built a library of extensions that can be extended and deployed for our customers to perform common tasks. One of these very common tasks is importing data into the PIM. One specific client had a very wide and relatively deep dataset (several thousand fields and around 70,000 'rows'), along with fairly complex automation rules for calculating field values as the data was imported, requiring lookups of additional data in the PIM, traversal of links, and so on.
As we went through the development process with this client, we saw this import slow from 2-3 seconds per 'row' to 5-6, then 7-8 seconds and longer for each row processed. So we turned our attention to the performance of the import. We went through our usual process of reviewing all of the import code, looking for potential optimizations. Taking care of that low-hanging fruit yielded some improvements but not everything we had hoped for, so we started adding logging to the code. We instrumented and logged everything from calculating and setting individual field values, to creating links, saving entities, and processing entire rows. Again, this process found us some additional areas for optimization, but not enough; in fact, after an optimization that saved, let's say, 1,000ms per row, the processing of each row was only about 500ms faster.
The Hypothesis
This led us to the following hypothesis: the act of logging itself was a significant contributor to the performance issues we were experiencing. While we had always assumed that logging was all but 'free' from a performance perspective, we were seeing evidence that excessive logging (and we had introduced excessive logging) was adding a significant amount of time to our imports.
The Test
We developed a dummy test extension (the code can be found here) that simulated a spreadsheet-style import of 1,000 rows with between 50 and 100 fields in each row. Each field would require between 0 and 3 'operations' to be performed on it to simulate calculating the field value. We added some randomized sleep time to the calculation operations, and to simulate the saving of entities, to ensure that we had something resembling a real-world workload.
Our test process would time and log every interaction with an appropriate log level, just as we would in a production environment (a rough sketch of this test loop follows the list below):
- The beginning and end of the process are logged as Warning (we would normally log these as Information, but for the test I wanted to elevate them above any loops) - 2 log entries per import
- The processing of a row is logged as Information to show progress for the import - 1 log entry per row
- The processing of each field is logged as Debug to show progress through the fields - 1 log entry per field, 50-100 log entries per row. This is typical when debugging issues with data coming in incorrectly or not at all.
- The processing of each calculation operation for each field is logged as Verbose - 0-3 entries per field, 0-300 entries per row. This is typical when deep-diving into data and logic issues.
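To make the shape of the test concrete, here is a rough sketch of what that loop might look like as an inRiver scheduled extension. This is not the linked test code, and the SDK types and signatures (IScheduledExtension, inRiverContext, Context.Log, the LogLevel enum) are written from memory of the Remoting SDK rather than copied from it, so treat them as assumptions to verify against your own SDK version:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using inRiver.Remoting.Extension;
using inRiver.Remoting.Extension.Interface;
using inRiver.Remoting.Log;

public class LoggingLoadTestExtension : IScheduledExtension
{
    public inRiverContext Context { get; set; }

    public Dictionary<string, string> DefaultSettings => new Dictionary<string, string>();

    public string Test() => "OK";

    public void Execute(bool force)
    {
        var random = new Random();

        Context.Log(LogLevel.Warning, "Simulated import started");        // 2 Warning entries per import

        for (var row = 1; row <= 1000; row++)
        {
            var fieldCount = random.Next(50, 101);                        // 50-100 fields per row

            for (var field = 1; field <= fieldCount; field++)
            {
                var operations = random.Next(0, 4);                       // 0-3 'calculation' operations

                for (var op = 1; op <= operations; op++)
                {
                    Thread.Sleep(random.Next(1, 5));                      // simulate calculation work
                    Context.Log(LogLevel.Verbose, $"Row {row}, field {field}, operation {op} complete");
                }

                Context.Log(LogLevel.Debug, $"Row {row}, field {field} processed");
            }

            Thread.Sleep(random.Next(10, 50));                            // simulate saving the entity
            Context.Log(LogLevel.Information, $"Row {row} of 1000 processed");
        }

        Context.Log(LogLevel.Warning, "Simulated import finished");
    }
}
```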
The Result
Running this test four times, excluding a different set of log levels each time, we came up with the following results:
- Logging all log levels, the simulated import took 1076 seconds to run.
- Excluding the "Verbose" log level, the simulated import took 813 seconds to run.
- Excluding the "Verbose" and "Debug" log levels, the simulated import took 197 seconds to run.
- Excluding the "Verbose", "Debug", and "Information" log levels, the simulated import took 192 seconds to run.
As we can see, the results strongly indicate that the act of logging has an impact on the performance of the PIM. The challenge is that extensive logging is sometimes required to troubleshoot bugs, issues, undesired behavior, and even performance in the PIM.
The Solution
To solve the issue of balancing visibility into what the PIM is doing during processing against the overall performance of the system, we have come up with a simple extension method on the InRiver Context called SelectiveLog. This extension method can be called exactly like the Context.Log method. Using a server setting and/or an extension setting called "LOGGING_LEVEL", we can control which log levels actually make it into the log: the extension method reads this setting from the context and essentially swallows any log entry below the minimum log level that has been configured.
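The real implementation is in the linked code; purely for illustration, a minimal sketch of the idea might look like the following. It assumes Context.Settings exposes the extension's settings as a string dictionary, that the LogLevel enum values are ordered Verbose < Debug < Information < Warning < Error, and it only reads the extension setting (the real version also consults the server setting), so verify those details against your own SDK version:

```csharp
using System;
using inRiver.Remoting.Extension;
using inRiver.Remoting.Log;

public static class ContextLogExtensions
{
    public static void SelectiveLog(this inRiverContext context, LogLevel level, string message)
    {
        // Default to logging everything if no LOGGING_LEVEL setting is present.
        var minimumLevel = LogLevel.Verbose;

        if (context.Settings != null
            && context.Settings.TryGetValue("LOGGING_LEVEL", out var configured)
            && Enum.TryParse(configured, true, out LogLevel parsed))
        {
            minimumLevel = parsed;
        }

        // Swallow anything below the configured threshold; otherwise log as normal.
        if (level >= minimumLevel)
        {
            context.Log(level, message);
        }
    }
}
```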
The Outcome
By implementing this extension method, converting all of our logs to use SelectiveLog, and ensuring we are using appropriate log levels everywhere we log, we were able to get our import performance back down to 1-2 seconds per row, while still including all the expensive calculations and lookups. We are still able to lower the log threshold of either the server or an individual extension, without deploying code, whenever we want or need to debug an issue. Additionally, developers do not need to remember to wrap their log entries in conditional logic to take advantage of this; they simply use Context.SelectiveLog instead of Context.Log.
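In practice the change in extension code is just the method name; the level you pass stays the same. A hypothetical before/after, where fieldTypeId and value are placeholder variables:

```csharp
// Before: always sent to the log, regardless of configuration.
Context.Log(LogLevel.Debug, $"Field {fieldTypeId} set to '{value}'");

// After: only sent to the log when LOGGING_LEVEL allows Debug entries.
Context.SelectiveLog(LogLevel.Debug, $"Field {fieldTypeId} set to '{value}'");
```

Because the threshold is read from configuration at runtime, lowering LOGGING_LEVEL on a misbehaving extension brings back the full log detail without a redeploy.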
The Code
The code used to perform this test, including the extension method that we use to selectively submit entries to the log, can be found here.
Comments
Great post Alex. This is back to the days of on-prem, where this setting was available to suppress the log levels (but at a general system-wide level, rather than for each extension).
I remember being a little apprehensive when I realised that iPMC logged everything, with no on/off switch. It's good to see some metrics and a strategy for making improvements.
-
Great read and findings Alex Moss (ICP), thanks for sharing! 👍
As always, the less you (need to) do, the faster it goes. 😄
-
This aligns with what I now do in every extension. The code always has the logging in place, but a threshold setting controls whether it is finally committed to the Context log or not. Previously, I controlled logging by uncommenting lines in the code and deploying - very cumbersome.
It would be nice for this to be added as a standard feature within the Context entity.
-
We made the same observation a while ago... since then, we have been using a minimum_log_level setting to control what level of logging is passed on to remoting.
Also, there is a similar setting and behavior OOTB in the Integration Framework with the IIF_LOGLEVEL and LOG_LEVEL settings: https://community.inriver.com/hc/en-us/articles/360019726800
-
Yes we actually implemented the feature a long time ago as a way to keep the logs clean when the additional information was not needed.
We didn't recognize the performance impact of the logging until recently.