Liferay 7.3.4-1 goes down in Google Cloud

Keywords: Liferay - Google Cloud Platform - Technical issue - Other
bnsupport ID: d8ae4d0a-6044-ce58-845c-7e1dfecebc85
Description:
See this related topic: "The Liferay instance is blocked" (La instancia de Liferay está bloqueada)

Good morning,
I’m having the same problems as before the last configuration. I’ve created a new VM instance (Liferay 7.3.4-1) to check everything once more, and I have the same problems. After some time, Google Cloud blocks all connections and my new instance becomes unreachable.
This is what I can see in the Cloud Console:
….
Oct 18 09:18:24 aladin-734-1-vm GCEGuestAgent[510]: 2020-10-18T09:18:16.0392Z GCEGuestAgent Error main.go:181: Error watching metadata: Get http://metadata.google.internal/computeMetadata/v1//?recursive=true&alt=json&wait_for_change=true&timeout_sec=60&last_etag=c5f3d10ab93d43d9: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Oct 18 09:21:05 aladin-734-1-vm dhclient[328]: DHCPREQUEST for 10.164.0.6 on ens4 to 169.254.169.254 port 67
Oct 18 09:21:12 aladin-734-1-vm dhclient[328]: DHCPACK of 10.164.0.6 from 169.254.169.254
Oct 18 09:27:25 aladin-734-1-vm OSConfigAgent[409]: 2020-10-18T09:26:41.8913Z OSConfigAgent Error main.go:181: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=c5f3d10ab93d43d9&timeout_sec=60: dial tcp 169.254.169.254:80: i/o timeout
[ 3797.349187] google_guest_agent[510]: 2020/10/18 09:29:46 logging client: context deadline exceeded
Oct 18 09:28:27 aladin-734-1-vm systemd[1]: Stopping System Logging Service…
Oct 18 09:28:35 aladin-734-1-vm dhclient[328]: bound to 10.164.0.6 – renewal in 1012 seconds.
Oct 18 09:28:37 aladin-734-1-vm systemd[1]: rsyslog.service: Succeeded.
Oct 18 09:28:37 aladin-734-1-vm systemd[1]: Stopped System Logging Service.
Oct 18 09:28:37 aladin-734-1-vm systemd[1]: Starting System Logging Service…
Oct 18 09:29:47 aladin-734-1-vm google_guest_agent[510]: 2020/10/18 09:29:46 logging client: context deadline exceeded
Oct 18 09:29:51 aladin-734-1-vm systemd[1]: Started System Logging Service.
Oct 18 09:37:08 aladin-734-1-vm google_osconfig_agent[409]: 2020/10/18 09:37:07 logging client: rpc error: code = Unauthenticated desc = transport: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Flogging.write: dial tcp 169.254.169.254:80: i/o timeout
[ 4267.554239] elasticsearch[l invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[ 4267.566653] elasticsearch[l cpuset=/ mems_allowed=0
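Among all of these messages, the "invoked oom-killer" line is the one that points at memory exhaustion. A minimal sketch for pulling only the out-of-memory events out of a long kernel log; `oom_events` is a hypothetical helper name, and the patterns are simply the phrases that appear in the logs in this thread.

```shell
# Keep only the kernel lines that record an out-of-memory kill.
# oom_events is a hypothetical helper; it reads log text on stdin.
oom_events() {
    grep -E 'invoked oom-killer|Out of memory|Killed process|oom_reaper'
}
```

On a systemd-based VM, the input could come from `journalctl -k --no-pager | oom_events` or `sudo dmesg | oom_events`.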

It’s a new, clean VM instance and it goes down after a few minutes. I can’t rely on it if it keeps having problems. Could you please guide me on how to fix this issue?
I’m attaching all the information from today’s logs.
Thank you very much.

I can’t attach all the log files. Here is some of the evidence:

More info:

Hi, @johndove.

Do you think this could be a memory issue? You showed a process being killed because the machine ran out of memory, and judging by your Bitnami Support bundle, you are using all 8 GB of your RAM. You can check this by running free -m.
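As a complement to free -m, a minimal sketch that reads the same figures straight from /proc/meminfo (Linux only; its values are in kB, i.e. KiB). `mem_report` is a hypothetical helper name, and the optional file argument exists only so it can be tried against a saved copy.

```shell
# Print total and available memory in MiB, similar to the "total" and
# "available" columns of free -m. Reads /proc/meminfo unless a file is given.
mem_report() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "total=%d MiB available=%d MiB\n", total / 1024, avail / 1024 }' \
        "${1:-/proc/meminfo}"
}
```

Running `mem_report` a few times while Liferay starts up shows how quickly the available figure shrinks.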

Regards,
Alejandro

Hi @amoreno,

Thanks for your reply.

I think so. It looks like a memory issue, but I’m not sure how to configure it.

My local environment has 8 GB of RAM and it isn’t dedicated to Liferay, and I didn’t have this kind of problem in the past. Google Cloud is killing a Java process when it detects a problem, and I don’t know whether that is due to the Google Cloud configuration or related to Bitnami.

Google Cloud log:

Oct 19 10:20:14 aladin-734-1-vm systemd[1]: Starting Login Service...
[  OK  ] Started Login Service.
Oct 19 10:20:14 aladin-734-1-vm systemd[1]: Started Login Service.
         Stopping Regular background program processing daemon...
Oct 19 10:20:14 aladin-734-1-vm systemd[1]: Stopping Regular background program processing daemon...
Oct 19 10:20:14 aladin-734-1-vm systemd[1]: cron.service: Main process exited, code=killed, status=15/TERM
Oct 19 10:20:14 aladin-734-1-vm systemd[1]: cron.service: Succeeded.
[  OK  ] Stopped Regular background program processing daemon.
Oct 19 10:20:14 aladin-734-1-vm systemd[1]: Stopped Regular background program processing daemon.
[  OK  ] Started Regular background program processing daemon.
Oct 19 10:20:14 aladin-734-1-vm systemd[1]: Started Regular background program processing daemon.

I’ve been checking how the JVM is configured on the server, and there are differences between my environments and Bitnami’s. I’m not an expert in JVM tuning and I don’t want to modify it. Also, the JVM arguments are set up by Bitnami: /opt/bitnami/apache-tomcat/bin/setenv.sh uses /opt/bitnami/java/bitnami/setenv.sh.

Here is the information in my log file when I start the server:

2020-10-19 10:22:10.848 INFO [Catalina-utility-2][PortalContextLoaderListener:139] JVM arguments: --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED -Djava.util.logging.config.file=/opt/bitnami/apache-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -XX:MaxMetaspaceSize=512M -Xms2048M -Xmx2048M -Djava.awt.headless=true -XX:+UseG1GC -Dfile.encoding=UTF8 -Duser.timezone=GMT -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -Dcatalina.base=/opt/bitnami/apache-tomcat -Dcatalina.home=/opt/bitnami/apache-tomcat -Djava.io.tmpdir=/opt/bitnami/apache-tomcat/temp

For example, this is what I have in my local env:
CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=UTF-8 -Djava.locale.providers=JRE,COMPAT,CLDR -Djava.net.preferIPv4Stack=true -Duser.timezone=GMT -Xms2560m -Xmx2560m -XX:MaxNewSize=1536m -XX:MaxMetaspaceSize=768m -XX:MetaspaceSize=768m -XX:NewSize=1536m -XX:SurvivorRatio=7"

And this is what I can see inside the /opt/bitnami/java/bitnami/setenv.sh file. I don’t want to modify it manually, so that I keep Bitnami’s setup in case I need to scale the server:

#!/bin/sh
# 
# Bitnami Java Configuration
# Copyright 2020 Bitnami.com All Rights Reserved
# 
# Note: This file will be modified on server size changes
#
export JAVA_OPTS="-XX:MaxMetaspaceSize=512M -Xms2048M -Xmx4096M $JAVA_OPTS"
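The startup log reports -Xmx2048M while this file exports -Xmx4096M, so it is worth confirming which heap limit the running JVM actually received. A minimal sketch, assuming the usual HotSpot behavior that when -Xmx appears more than once on the command line, the last occurrence wins; `effective_xmx` is a hypothetical helper name.

```shell
# Print the -Xmx value that takes effect in a JVM argument string.
# Splits the string on spaces and keeps the last -Xmx occurrence.
effective_xmx() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^-Xmx//p' | tail -n 1
}
```

For a running process, the argument string can be taken from /proc, e.g. `effective_xmx "$(tr '\0' ' ' < /proc/<pid>/cmdline)"` with the Tomcat PID filled in. If the JDK's jcmd tool is available, `jcmd <pid> VM.flags` reports the resulting MaxHeapSize directly.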

What’s your opinion?
Thank you very much for your time.

Dear Bitnami Team:

All of my VMs crash after a few minutes. Java processes are being killed by the OOM killer. There is no swap partition. Should I configure one manually? Are the JVM properties in the setenv.sh file configured correctly?
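Whether any swap is configured can be read from the SwapTotal field of /proc/meminfo. A minimal sketch: `swap_total_kb` and `add_swapfile` are hypothetical helpers, and the 2G size and /swapfile path are placeholders. Swap only buys headroom; it does not fix JVM limits that are too large for the VM. On some filesystems fallocate-created files cannot be used as swap, in which case dd is the usual fallback.

```shell
# Print the configured swap size in kB (0 means no swap at all).
# Reads /proc/meminfo unless an alternate file is given.
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: create, enable and persist a swap file.
# Must run as root; path and size are placeholders.
add_swapfile() {
    file=${1:-/swapfile}
    size=${2:-2G}
    fallocate -l "$size" "$file" &&
    chmod 600 "$file" &&
    mkswap "$file" &&
    swapon "$file" &&
    echo "$file none swap sw 0 0" >> /etc/fstab
}
```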

Could you please check and guide me on how to fix these issues? Is it an issue with the Bitnami stack? Is it a Google Cloud problem? Should I fix it, or is it in your hands?

I don’t want to modify the default configuration of my VMs, so that upgrades and scalability keep working.

I think a new, clean, empty VM with 8 GB of RAM should be enough to keep it running without issues.
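Whether 8 GB is enough can be sanity-checked with a rough memory budget. A minimal sketch; `headroom_mib` is a hypothetical helper that just subtracts configured limits from the total. A JVM uses more than heap plus Metaspace (thread stacks, code cache, GC structures, direct buffers), so the result is an optimistic upper bound on what is left for everything else.

```shell
# Subtract configured memory limits from a total; all values in MiB.
# Usage: headroom_mib TOTAL HEAP_MAX METASPACE_MAX [OTHER...]
headroom_mib() {
    left=$1
    shift
    for used in "$@"; do
        left=$((left - used))
    done
    printf '%s\n' "$left"
}
```

With the 8191 MiB the kernel reports (2097051 pages of 4 KiB) and the -Xmx4096M and MaxMetaspaceSize=512M values from Bitnami's setenv.sh, `headroom_mib 8191 4096 512` prints 3583. Other processes on the VM, such as the elasticsearch process that appears in the OOM log, would be passed as extra arguments; their sizes are not given in this thread.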

Here is a bit more information.
Please let me know as soon as possible. It’s important and urgent.
Thank you very much.

Google Cloud Console

  • No swap available
  • OOM: kill Java process
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.125485] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.139125] Node 0 DMA32: 1112*4kB (UME) 1198*8kB (UME) 459*16kB (UME) 190*32kB (UME) 98*64kB (UME) 35*128kB (UME) 9*256kB (UE) 6*512kB (UE) 2*1024kB (E) 0*2048kB 0*4096kB = 45632kB
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.156047] Node 0 Normal: 16*4kB (UM) 238*8kB (UME) 142*16kB (UME) 111*32kB (UME) 44*64kB (UME) 26*128kB (UM) 16*256kB (UM) 18*512kB (UME) 15*1024kB (ME) 0*2048kB 0*4096kB = 42608kB
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.172580] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.181510] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.190800] 7459 total pagecache pages
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.194802] 0 pages in swap cache
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.198337] Swap cache stats: add 0, delete 0, find 0/0
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.203778] Free swap  = 0kB
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.206896] Total swap = 0kB


Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.210012] 2097051 pages RAM
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.213198] 0 pages HighMem/MovableOnly
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.217269] 53827 pages reserved

Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.220731] Tasks state (memory values in pages):
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.225997] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[...]
Oct 20 09:27:03 aladin-734-1-vm kernel: [83208.545462] [   1946]   998  1946  1634044   897729 10592256        0             0 java
[...]

[83208.951669] Out of memory: Kill process 1946 (java) score 440 or sacrifice child
[83208.959404] Killed process 2087 (java) total-vm:3722060kB, anon-rss:1262912kB, file-rss:0kB, shmem-rss:0kB
[83209.001822] oom_reaper: reaped process 2087 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
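The "Tasks state" table reports rss in pages, not bytes, which makes the numbers above hard to read at a glance. The buddy-allocator lines earlier in the same dump show 4kB order-0 pages, so a minimal conversion sketch assumes 4 KiB per page; `pages_to_mib` is a hypothetical helper name.

```shell
# Convert a page count from the kernel OOM report to MiB (rounded down).
# The page size in KiB defaults to 4, matching the 4kB pages in this log.
pages_to_mib() {
    pages=$1
    page_kib=${2:-4}
    printf '%s\n' $((pages * page_kib / 1024))
}
```

With this, `pages_to_mib 897729` gives 3506 MiB resident for java PID 1946, `pages_to_mib 2097051` gives 8191 MiB of total RAM, and `pages_to_mib 53827` gives 210 MiB reserved by the kernel.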

Here is my instance after a few minutes:

user@aladin-734-1-vm:~$ free -ht
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       7.5Gi       142Mi        25Mi       183Mi        88Mi
Swap:            0B          0B          0B
Total:        7.8Gi       7.5Gi       142Mi

Hey, @johndove, we are investigating this. We will get back to you when we have some news.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.