Vrcop +3 Stability

potts.mike · Nov 27, 2012

Currently I have a zwave network set up using the vrcop and cqc. The vrcop is locking up several times a week on me and I have to power cycle it to get it to start working again. Are other people seeing this as well? Are you using pc software or an automation panel for control?

I am about to start installing an elk panel and am trying to decide whether to purchase the zwave lighting expansion module. The downside to this is that I lose the ability to track the user codes used with my zwave locks.

Dean Roddey · Nov 27, 2012

Just to add some info here... the VRCOP starts complaining that it is unable to transmit messages, and just stays that way until power cycled. This isn't a CQC issue since the driver is still there and happily trying to make things happen. Unplugging and plugging back in the VRCOP unsticks it and the driver continues onwards. It does seem to still receive incoming async notifications when it's in this state.

It may be that we are using it a little too aggressively, but of course it provides natural throttling by acknowledging the receipt of messages to the VRCOP, and we don't do another poll on a given unit (if it's set up for polling) until we've either seen the final response to the previous query or gone three times the polling interval without seeing it, which is plenty.
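
The polling rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual CQC driver code; the `PollState` structure and `may_poll` function are hypothetical names, and the rule that an answered unit is polled again once one interval has elapsed is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollState:
    """Per-unit polling bookkeeping (hypothetical structure)."""
    interval: float                    # configured polling interval, in seconds
    last_sent: Optional[float] = None  # when the most recent query was sent
    awaiting_reply: bool = False       # True until the final response arrives


def may_poll(state: PollState, now: float) -> bool:
    """Return True if a new poll may be sent to this unit at time `now`."""
    if state.last_sent is None:
        return True  # never polled yet
    elapsed = now - state.last_sent
    if not state.awaiting_reply:
        # Previous query was answered: poll again after one interval (assumed).
        return elapsed >= state.interval
    # Previous query is still unanswered: give up on it after 3x the interval.
    return elapsed >= 3 * state.interval
```

With a 10 second interval, an unanswered query blocks further polls of that unit until 30 seconds have passed.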

So it doesn't seem likely we are banging on it too hard, though we'll experiment with that in the 4.3 beta phase we are in now, to see if providing a configurable extra throttling factor helps.

Of course, as always with these pro-sumer technologies, some folks have little or no problems, others have more, so it's always seemingly a bit situationally specific.

sbwright · Nov 28, 2012

potts.mike said:
Currently I have a zwave network set up using the vrcop and cqc. The vrcop is locking up several times a week on me and I have to power cycle it to get it to start working again. Are other people seeing this as well? Are you using pc software or an automation panel for control?

I am about to start installing an elk panel and am trying to decide whether to purchase the zwave lighting expansion module. The downside to this is that I lose the ability to track the user codes used with my zwave locks.

I have a VRCOP+3 and initially had problems, which I believe were due to power outages/brownouts. I put it on a UPS several months ago and have had no issues since. It is connected to our M1.

http://cocoontech.com/forums/topic/21900-elk-m1xslzw-and-leviton-vrcop-1lw3-interface-not-working-after-power-failure/

potts.mike · Nov 28, 2012

Thanks for the info, no power issues here.

Anyone else care to comment?

drvnbysound · Nov 28, 2012

I have one, but connected to M1 - no issues to report... mine is also on a UPS.

Dean Roddey · Nov 28, 2012

Does the M1 poll units that don't support async notifications?

drvnbysound · Nov 28, 2012

Only if you write a rule to ask it to do so (as far as I'm aware of)... Mine is currently only set to do so at 0500 (daily).

I wasn't really concerned about the dimmers/switches doing so often... I had a concern with asking it to do so often with regard to battery life of my Kwikset locks - they do report status, but they aren't excluded from polling (obviously), so this would incur additional battery usage....

I'm considering upgrading my existing Zwave dimmers to Leviton RF+ models which report the status automatically, so I don't have to worry about this.

etc6849 · Nov 28, 2012

Have you tried the newest VRC0Pv3 firmware that was recently posted?

I initially had issues with the VRC0Pv3 and had to work with Leviton to get the firmware fixed, but it works most of the time (see below for the special case when it hiccups). For a while I used both a VRC0P and a VRC0Pv3. After I saw the new firmware I tried it and it works ok. Still, every now and then there's a second or two of delay, but it's not very often.

The Premise VRC0P module will automatically retry a job (default is three times). Then if so many consecutive jobs fail, it automatically resets the serial port. To do this required setting up a job queue, but it works well. We even added job based priority so some jobs jump to the head of the queue and others like polling are last in the queue. The entire module is written in vbscript so most power users can modify it with a little programming experience.
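
To make the idea concrete, here is a minimal sketch of a priority job queue with automatic retries and a port reset after repeated failures. It is written in Python for illustration and is not the Premise module's vbscript; `Job`, `JobQueue`, `send`, and `reset_port` are hypothetical names. The three tries per job follow the default mentioned above, while the reset threshold of three consecutive failed jobs is an assumption, since the actual number isn't stated.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    node: int
    command: str
    priority: int = 5  # lower number is served sooner (assumed convention)


class JobQueue:
    def __init__(self, send: Callable[[Job], bool],
                 reset_port: Callable[[], None],
                 max_tries: int = 3, reset_after: int = 3) -> None:
        self._send = send              # returns True when the job is confirmed
        self._reset_port = reset_port  # reopens the serial port
        self._max_tries = max_tries
        self._reset_after = reset_after
        self._heap: List[Tuple[int, int, Job]] = []
        self._seq = itertools.count()  # keeps FIFO order within one priority
        self._failed_in_a_row = 0

    def submit(self, job: Job) -> None:
        heapq.heappush(self._heap, (job.priority, next(self._seq), job))

    def run_next(self) -> bool:
        """Run the highest-priority job; return True if it succeeded."""
        _, _, job = heapq.heappop(self._heap)
        for _ in range(self._max_tries):
            if self._send(job):
                self._failed_in_a_row = 0
                return True
        # Every try failed: count the job as failed, reset the port if needed.
        self._failed_in_a_row += 1
        if self._failed_in_a_row >= self._reset_after:
            self._reset_port()
            self._failed_in_a_row = 0
        return False
```

A user command submitted with `priority=1` is served before a poll submitted earlier with `priority=9`, which is how polling ends up last in the queue.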

Premise is completely free and includes a port spy. The free VRC0P module is very stable and is a collection of effort by several Premise users. It even includes event recording if a node fails to respond three times or if the VRC0P's serial port is reset. It could also email you to change a bulb on the dimmers that have no neutral connection if you wrote a simple script to monitor the event log.

This is important because, for most dimmers, the bulb's filament completes the control circuit. Without a good light bulb, the node goes away and then neighboring nodes will have dropped packets and issues. This is despite the "self healing" nature of a z-wave mesh network. I guess if you call "self healing" waiting 6-10 seconds for a packet to succeed, it works as advertised. When all nodes have power though, everything works great. I'd definitely say give Premise a try and see how the VRC0P acts, but I guess I'm biased.

etc6849 · Nov 28, 2012

PS: it is my belief that if you overload the VRC0P, it will freeze up. There is a reset buffer command you can try to attempt to get it to respond. I know in the early days of the Premise module, I had similar issues. However, this issue went away as several tricks were used to get the VRC0P to work correctly.

I'm not sure how CQC is programmed, but I can say that after a lot of work on the Premise VRC0P module you should not see any freeze ups.

Here are some secrets to a reliable VRC0P driver:
The Premise module does non-intuitive stuff like leave off the ",UP" so that there is less traffic on the z-wave network (for example, if you press brightness up 10 times in a row, using ",UP" will cause some lag). This really helped A BUNCH with responsiveness.

Look for X000 unless the last command sent was AB or the VRC0P is in discovery mode. Using E000 should work, but it doesn't work as well and spying on the RS232 port will show this.
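
A rough sketch of that check, in Python for illustration only. The reply shape assumed here (an optional leading "<", the letter E or X, then three digits) and the optional ">" in front of a command are assumptions about how the lines look on the serial port; `classify`, `expects_x000`, and `transmit_succeeded` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# Assumed reply shape: optional "<", then E or X, then a three-digit code.
_REPLY = re.compile(r"^<?([EX])(\d{3})$")


def classify(line: str) -> Optional[Tuple[str, int]]:
    """Split a reply such as 'X000' into ('X', 0); None if it isn't a reply."""
    match = _REPLY.match(line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def expects_x000(last_command: str, in_discovery: bool) -> bool:
    """X000 is the thing to wait for unless AB was just sent or discovery is running."""
    was_ab = last_command.strip().lstrip(">").upper() == "AB"
    return not (was_ab or in_discovery)


def transmit_succeeded(line: str) -> bool:
    """Only X000 counts as success; E000 or a nonzero X code does not."""
    return classify(line) == ("X", 0)
```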

Sometimes it is useful to force a job delay during intensive things like discovery (only for the VRC0Pv3 and not for the VRC0P). The new firmware seems to get rid of this need.

In my opinion, a priority based job queue that will retry jobs and reset the VRC0P port is a must.

There are probably other tricks I and others used (I can't remember), but the code is well commented for anyone to use for free.

ddennerline · Nov 28, 2012

It appears that Leviton has silently updated the VRC0P+3 firmware. In addition, finding the firmware is not easy (you will have to use the search dialog). Also, the last time I updated the firmware, it was a bit of a challenge.

The last time I updated the VRC0P+3, the file was ST V0_30_U02.hex, dated 6/2/2011 9:30.

The latest file is ST V3_00_U02.hex, dated 10/11/2012 3:20.

Lastly, I couldn't find any changelog, so you can only guess what was fixed/modified. I guess someone should ask Leviton Technical Support.

ddennerline · Nov 28, 2012

Sending the “UP” (update) command will definitely increase the load on the controller, especially if the group has a good number of switches. However, the “UP” command must ultimately be sent at some point in order for the scene/group multi-button controllers to stay synchronized.

I believe the “UP” command should only have to be sent when a scene/group is involved (is this correct?), since the “UP” command forces all the anonymous nodes in the group to send back their current light status to their respective associated controllers.

The M1XSLZW still does not send a proper UP command when the Elk initiates the group/scene command.

Dean Roddey · Nov 28, 2012

To me, the problem with queued output is that it then becomes useless for anything that requires a set of steps that need to be successful before you continue; effectively, commands become asynchronous. Yeah, the user can manually wait for the expected result of a command before continuing, but it's a lot nicer (and safer) if commands either work or don't proceed.

And I worry about retrying commands automatically, because it could cause something that you really don't want to happen, depending on what the command might be, because it may have actually happened. Just because you don't get positive confirmation doesn't mean it didn't actually happen. So you could end up bumping the thermo setpoint way more than you thought, for instance. We really try to maintain a pretty strict view of things, that we have to be sure that things are happening as they should or we don't want to continue.

We don't send the UP command either. Polling is only done if the module is set up to require it. If it sends async notifications, the driver will poll it just to see if it's alive, but only if it hasn't heard anything from it one way or another in the last couple minutes, which isn't much load, even if you have quite a few modules.
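
As an illustration of that "only poll if it's been quiet" rule, here is a small Python sketch. It is not the driver's code; `LivenessTracker` is a hypothetical name, and the 120 second limit merely stands in for "the last couple minutes".

```python
from typing import Dict, List


class LivenessTracker:
    """Tracks when each module was last heard from (hypothetical helper)."""

    def __init__(self, quiet_limit: float = 120.0) -> None:
        self.quiet_limit = quiet_limit  # seconds of silence before a poll
        self._last_heard: Dict[int, float] = {}

    def heard_from(self, node: int, now: float) -> None:
        """Record any traffic from the node, async notification or poll reply."""
        self._last_heard[node] = now

    def due_for_poll(self, now: float) -> List[int]:
        """Return the nodes that have been silent for at least quiet_limit."""
        return sorted(node for node, seen in self._last_heard.items()
                      if now - seen >= self.quiet_limit)
```

A module that keeps sending notifications never shows up in `due_for_poll`, so the extra load stays small.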

I can certainly try to add an emergency reset if it seems to have stopped responding. Though, to be fair, that's not really the same as no freezeups, it's just trying to work around them.

etc6849 · Nov 29, 2012

Dean Roddey said:
To me, the problem with queued output is that it then becomes useless for anything that requires a set of steps that need to be successful before you continue; effectively, commands become asynchronous. Yeah, the user can manually wait for the expected result of a command before continuing, but it's a lot nicer (and safer) if commands either work or don't proceed.

This is true. If you had a set of dependent steps and the first step fails, you want to stop the other steps. However, I can't think of a circumstance like this and haven't run into it. I also don't see any failures unless a node loses power.

Normal usage failure rates seem to be 1-5 failures out of 10000 commands, based on a burn-in test feature I added to the Premise module to test the VRC0Pv3 and the VRC0P. Since the VRC0P simplifies the z-wave command protocol, the likelihood of there being such a command depends on Leviton. Our free open-source module does have the node number embedded into each job queue object. If I remember correctly, the module will automatically delete all future queued jobs for a particular node if a failure occurs. This prevents what you are talking about and is possible because each command in the job queue is actually an object with attributes such as node, command, priority, etc.
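
A minimal sketch of that purge step, in Python rather than the module's vbscript. `QueuedJob` and `purge_node` are hypothetical names; the only thing taken from the description above is that each queued job carries node, command, and priority attributes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedJob:
    node: int
    command: str
    priority: int


def purge_node(queue: List[QueuedJob], failed_node: int) -> List[QueuedJob]:
    """Drop every queued job aimed at a node that just failed.

    Jobs for other nodes keep their original order.
    """
    return [job for job in queue if job.node != failed_node]
```

If step one of a sequence for node 5 fails, calling `purge_node(queue, 5)` removes the dependent steps before they are ever sent.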

Further, without a queue, what happens if someone sends discrete brightness up/down commands 50 times in a row as fast as they can press the up button, then presses a down button? My guess is that a driver without a job queue would either have to introduce a noticeable delay between each packet transmission or simply send the command without waiting for X000. If a delay is not used, you could easily overload the VRC0Pv3's buffer by sending repeated commands. That said, a set delay is unnecessary and should always be avoided where possible. All you need is to wait for X000 and grab the next job from the job queue. No delays are required!
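
Here is a sketch of that pacing idea in Python, assuming replies arrive one per line and may carry a leading "<" (an assumption about the wire format). `drain`, `write`, and `read_line` are hypothetical names, and a real driver would also need a read timeout, which is left out here.

```python
from collections import deque
from typing import Callable, Deque, Iterable


def drain(commands: Iterable[str],
          write: Callable[[str], None],
          read_line: Callable[[], str]) -> int:
    """Send commands one at a time, each only after the previous X000.

    Returns the number of commands sent. No fixed sleep is used: the
    reply stream itself sets the pace.
    """
    pending: Deque[str] = deque(commands)
    sent = 0
    while pending:
        command = pending.popleft()
        write(command)
        sent += 1
        while True:
            reply = read_line().strip().lstrip("<")
            if reply == "X000":
                break  # transmission confirmed, move on to the next job
            if reply.startswith("X"):
                # X00n with n > 0 is treated as a failed transmission.
                raise RuntimeError(f"{command!r} failed with {reply}")
            # Anything else (for example E000) is skipped while waiting.
    return sent
```

Fifty button presses simply become fifty queued commands that go out as fast as the X000 replies come back.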

Please post a serial port spy log as I'm curious. It would be really helpful if the log showed the system time for each port spy entry too. I know Premise easily handles this type of situation thanks to the VRC0P's job queue that user 123 designed.

In my view, any positives from using a job queue far outweigh the one negative situation you are describing (that may never occur).

Dean Roddey said:
Just because you don't get positive confirmation doesn't mean it didn't actually happen.

To my knowledge the receipt of X000 always guarantees the command was successfully received by the node. It is true that the actuator connected to the node could be bad, giving a false sense that an actuation occurred (e.g. a light turned off). This is the only good reason to append the request with ",UP," aside from the bigger negative of major delays in packets being processed. However, a lock is designed more robustly than a light and does have a way to sense its current state and if associated to the VRC0Pv3, will respond with its current state following actuation. Given this, I'm not too worried. If something is mission critical, it can always be polled more frequently to verify its state (assuming it does not use the association class).

Dean Roddey said:
So you could end up bumping the thermo set point way more than you thought, for instance.

Depends on how you implement things. The UI could wait until the temperature is fully entered, then send one packet. If you don't do things this way, you would unnecessarily congest the network with multiple temperature change requests (e.g. 60, then 61, then 62, etc...) and the current state of the thermostat may not match what's stored in the HA program. However, due to the way the job queue works in Premise, the thermostat's set point will eventually match that of the HA program's once all jobs in the queue are complete.

Dean Roddey said:
Polling is only done if the module is set up to require it. If it sends async notifications, the driver will poll it just to see if it's alive, but only if it hasn't heard anything from it one way or another in the last couple minutes, which isn't much load, even if you have quite a few modules.

If a large number of polls are sent before a user request, the user request will be delayed. This is where a smart job queue is useful.

Dean Roddey said:
I can certainly try to add an emergency reset if it seems to have stopped responding. Though, to be fair, that's not really the same as no freezeups, it's just trying to work around them.

This is true. All I will say is that with Premise and the free module several folks worked on, I never have to unplug the VRC0P. I have also run successful tests with several thousand requests sent in succession.

PS: I'm not a professional programmer, but have taught myself vbscript (for the sole purpose of building modules for Premise) and some Java. I know from real world usage the recommendations I've made work very well, but I don't mean any disrespect toward others.

Dean Roddey · Nov 29, 2012

etc6849 said:
To my knowledge the receipt of X000 always guarantees the command was successfully received by the node. It is true that the actuator connected to the node could be bad, giving a false sense that an actuation occurred (e.g. a light turned off). This is the only good reason to append the request with ",UP," aside from the bigger negative of major delays in packets being processed.

Oh, no, I meant in the failure case, i.e. retries. If you automatically retry commands that you don't have positive confirmation of receipt, then you could be doing the same command multiple times. So it's the opposite scenario, i.e. just because you didn't receive the confirmation of final receipt doesn't mean it didn't happen. Only the positive confirmation scenario is provable.
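
A toy Python illustration of that retry hazard. Nothing here talks to real hardware: `Thermostat` is a hypothetical stand-in for a device that acts on every command it receives, and `deliver_with_lost_acks` simulates a sender whose confirmations never arrive, so it retries.

```python
from typing import Callable


class Thermostat:
    """Toy device that executes every command it receives (hypothetical)."""

    def __init__(self, setpoint: int) -> None:
        self.setpoint = setpoint


def deliver_with_lost_acks(action: Callable[[], None], attempts: int = 3) -> None:
    """Simulate retries: the device acts each time, but no confirmation returns."""
    for _ in range(attempts):
        action()


def bump(device: Thermostat, delta: int) -> None:
    device.setpoint += delta  # relative command: repeats add up


def set_to(device: Thermostat, value: int) -> None:
    device.setpoint = value   # absolute command: repeats are harmless


relative = Thermostat(70)
deliver_with_lost_acks(lambda: bump(relative, 1))

absolute = Thermostat(70)
deliver_with_lost_acks(lambda: set_to(absolute, 71))

print(relative.setpoint, absolute.setpoint)  # prints: 73 71
```

The relative bump lands three degrees higher than intended, while the absolute set ends up where it should, so automatic retries are only clearly safe for commands that can be repeated without changing the outcome.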

etc6849 · Nov 29, 2012

Oh! Since X000 must be received or a job is retried, and X000 means the job was in fact successfully transmitted to the node, I don't see how this would ever be an issue; that is unless the VRC0P is unreliable and sometimes E000 will be received with no X000 (even though the command was successful). This is possible, but only if the microcontroller in the VRC0P fails to get or process a message from the zensys chip. Since I'm pretty sure the Zensys chip has on board RS232 and that's how it talks to its microcontroller, I would guess that this failure rate is so low it doesn't matter and will likely never be seen. Probably similar to failure rates of any RS232 asynchronous duplex communication. There's also the chance that a wireless acknowledgement transmission just never made it back to the VRC0P's zensys chip. However, I bet this chance is very low too as mesh networks are robust. I see your point about this affecting thermostat set points, but how are you really seeing E000 with no X000? I've never seen it with the Premise driver (unless a node loses power without proper exclusion).

What matters more to me are the things I've previously mentioned (as you can see these issues occur in port spy) and how a job queue can prevent such things and make the VRC0P work very well. I've run a burn-in test toggling 50 lights on and off and have never seen such a failure mechanism. The port spy output looks perfect when I run these burn-in tests!

I would recommend trying the burn-in test using Premise and then using your driver to see if you can pinpoint any differences in operation such as responsiveness or error rates (e.g. no X000 or X00n where n > 0). You should always see X000 unless AB was previously sent or the VRC0P is in discovery.

Dean Roddey said:
Oh, no, I meant in the failure case, i.e. retries. If you automatically retry commands that you don't have positive confirmation of receipt, then you could be doing the same command multiple times. So it's the opposite scenario, i.e. just because you didn't receive the confirmation of final receipt doesn't mean it didn't happen. Only the positive confirmation scenario is provable.
