Jupyter notebook is getting disconnected every few minutes

Hi Team,

Jupyter notebook is getting disconnected every few minutes. I am really disappointed, because after doing the feature transformation, it keeps disconnecting every few minutes while I am training.

Kindly help me. I tried reaching out through the online chat help but could not get anybody there.

Regards,
Jinesh

Hi, Jinesh.

Whenever the Lab/Jupyter does not work or the answers are not submitting, kindly restart your server or reload the page to get connected again.

Please refer to this discussion on how to restart your CloudxLab server: I'm having problem with Assessment Engine. How should I fix?
If you are still getting the error, kindly send me the screenshots. I will be happy to look into it.

All the best!

@satyajit_das, it is connecting, but after I run some of the top cells, it gets disconnected while it is processing the later cells. If I restart the server, I have to start from the beginning. I then start getting an error that it is connecting, and it never connects to the kernel. Kindly find the screen-print.

Hi,

Thanks for sharing the error message.
I can see that you have a lot of tabs open in the background. The error shows that it is a connection error, which may be due to a slow network or to this particular website receiving too few data packets.
I recommend closing all the background processes and tabs to get good data communication.

Also, the notebook is saved up to the point you reached; after reconnecting or restarting your server you will be able to resume it.

All the best!

@satyajit_das I am connecting from the office, so it should not be a slow network issue, as I can open other websites without any problems. I will surely try closing the tabs as you mentioned, run the notebook again, and update you.

@satyajit_das As you asked, I closed all the tabs and ran only the Jupyter notebook, and it still disconnected after running the 14th cell. Kindly find the screen-print; I request you to help me.

Just FYI, when I try to resume from the 14th cell and restart the kernel, it shows the below error.

Hi, Jinesh.


I can see that you have used more than 60% of the allocated disk space, due to which you were not able to connect. You can check this using the “df /dev/sda1” or “df -h” command.
You can delete some of the heavy files or datasets so that Jupyter works seamlessly.
Also, at the right-hand side there is a control panel option; kindly restart your server, log out and log in again, and restart your kernel.
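For anyone who prefers to check disk usage from inside a notebook cell rather than a terminal, a minimal sketch using only the Python standard library might look like the following. The function name `disk_usage_percent` is made up for illustration, and the percentage can differ slightly from what `df` prints, since `df` typically accounts for reserved blocks.

```python
import shutil


def disk_usage_percent(path="."):
    """Return the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "." is an assumption: it checks the filesystem of the current working directory.
    print(f"Disk used: {disk_usage_percent('.'):.1f}%")
```

Pointing `path` at the home directory or the dataset folder would check the filesystem that actually holds the notebook files.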

If it still does not work, kindly let me know.

All the best!

These are the files I am referring to in the current notebook for training my ML model, so deleting them is not really possible.

I did try the suggestion of “restart your server and logout and login again and restart your kernel.” but I still have the same issue.

By the way, I did not get your point about 60% of the space being occupied. This space is occupied on the hard disk; how does this impact the Jupyter notebook processing?

Dear Team,

Any update?

Hi Jinesh,

As we can see in your notebook screenshot, whenever you are doing hyperparameter tuning, it is getting disconnected.

Hyperparameter tuning is a very intensive process.
Let me give a simple breakdown:
If training takes 1 minute and inference takes 0.25 minutes, cross-validation with 10 folds will take 10*1.25 = 12.5 minutes.
Now, if there are three hyperparameters with three different values each, the cross-validation will be called 3*3*3, i.e. 27 times. Therefore the complete hyperparameter search/tuning will take 27*12.5 minutes, i.e. 337.5 minutes, which is 5.625 hours.
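The arithmetic above can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the function name `tuning_minutes` is hypothetical, and it assumes every fold costs the same and that candidates run one after another with no parallelism.

```python
def tuning_minutes(train_min, infer_min, folds, values_per_param):
    """Estimate grid-search time from per-fold cost and grid size."""
    candidates = 1
    for n in values_per_param:
        candidates *= n  # full grid: product of the number of values per hyperparameter
    per_candidate = folds * (train_min + infer_min)  # one cross-validation run
    return candidates, per_candidate, candidates * per_candidate


candidates, per_candidate, total = tuning_minutes(1.0, 0.25, 10, [3, 3, 3])
print(candidates, per_candidate, total, total / 60)  # 27 12.5 337.5 5.625
```

Changing `folds` or the list of value counts shows quickly how the total grows with each extra hyperparameter.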

So, you can see that the hyperparameter search takes a huge toll on time. Also, please note that during hyperparameter tuning, it will be using 100% of the CPU.

Currently, the fair usage policy of CloudxLab allows 1 hour of Jupyter active time and 2GB of RAM. Therefore, the process will get killed. I would suggest using fewer hyperparameters in hyperparameter tuning.
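One way to act on this suggestion is to work out how many hyperparameter combinations fit in the time limit and try only a random subset of the grid. The sketch below uses only the standard library. The grid contents, the function name `candidates_within_budget`, and the 1.25 minutes per fold are illustrative assumptions taken from the example figures above, not measurements, and it does not account for the 2GB RAM limit.

```python
import itertools
import random


def candidates_within_budget(grid, folds, minutes_per_fold, budget_minutes, seed=0):
    """Pick a random subset of grid combinations that fits the time budget."""
    names = list(grid)
    combos = [dict(zip(names, values))
              for values in itertools.product(*(grid[n] for n in names))]
    cost_per_candidate = folds * minutes_per_fold  # one cross-validation run
    max_candidates = int(budget_minutes // cost_per_candidate)
    rng = random.Random(seed)  # fixed seed so the same subset is picked after a restart
    return rng.sample(combos, min(max_candidates, len(combos)))


# Hypothetical grid: three hyperparameters with three values each (27 combinations).
grid = {
    "max_depth": [3, 5, 7],
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.3],
}
print(len(candidates_within_budget(grid, 10, 1.25, 60)))  # 4
print(len(candidates_within_budget(grid, 3, 1.25, 60)))   # 16
```

With these figures, 10-fold cross-validation leaves room for only 4 combinations in an hour, while 3 folds allows 16. The selected combinations could then be evaluated one by one, or a randomized search from an ML library could be used with the same number of iterations.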

I hope I was able to explain.

@sgiri thanks a lot for the detailed explanation. I understood your point that using more hyperparameters leads to a larger calculation that takes more time.

But in my case the process hardly runs for a few minutes, probably 5 to 10, and then gets disconnected.

FYI, I constantly checked the memory in parallel while running this notebook, and memory consumption never went above 65%.

Also, I observed that when the notebook gets disconnected, the server is stopped and the start server option appears, so I need to start the server manually.

Hi Jinesh,

Thank you for clarifying. Let me try to reproduce this by logging into your account and running the same notebook.

Thanks a lot for looking into the issue. Please create a copy of the notebook and look into that. FYI, for the time being I have changed the CV from 10 to 3 in the notebook.

Today I want to highlight that it now gets disconnected frequently after running only two or three cells, and it is super slow. I tried restarting the server a couple of times and even tried with fewer data points, but it is too slow today. Please see the below screen-print; here it disconnected while running only step 5.

It has now been more than 7 days, but I have not heard anything from anybody. Could you please at least update me: are you working on the issue, or do I need to live with it?

I have tried figuring it out, but it hasn't worked out yet. To me, breaking it down into more steps seems to be the only solution as of now.
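If the work is broken into steps, one possible way to avoid starting from the beginning after every disconnect is to save each step's result to disk and reload it on the next run. The helper below is a minimal sketch using only the standard library. `cached_step` is a made-up name, and it assumes the step's result can be pickled. The cache files also take up disk space, so old ones are worth deleting.

```python
import os
import pickle


def cached_step(path, compute):
    """Load a step's result from `path` if it exists, otherwise compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    # Rename only after the write finishes, so a crash never leaves a half-written cache file.
    os.replace(tmp_path, path)
    return result
```

A cell could then call something like `features = cached_step("features.pkl", lambda: transform(raw_data))`, where `transform` and `raw_data` stand in for whatever the notebook actually uses. After a kernel restart, the same cell returns the saved result instead of recomputing it.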

Hey guys,
I am having this issue as well when I run my notebooks. I am doing the deep learning course, so it is quite obvious that the programs are going to be resource intensive sometimes. The projects at the end of the course are especially resource intensive. I suppose you guys should increase the resource allocation for individuals. Having said that, I do understand that there are limited resources, but then you should reduce the constraints (and hence the resource demand) of the projects. For example, the Inception V3 large image classifier project is very resource demanding, and the kernel disconnects after just 1 epoch (and even that is very slow).

Kindly do something, or else many people will not be able to complete their courses.

Regards,

Athrva