Host issue when importing data from a remote computer using Flume

Hi,

I am trying to transfer files from a remote computer to HDFS. I have two agents running: one on the remote computer and the other on the Hadoop cluster. Agent1 uses a spooldir source and an Avro sink; Agent2 uses an Avro source and an HDFS sink, so the flow is spooldir source -> file channel -> Avro sink => (network) => Avro source -> file channel -> HDFS sink. I am using the CloudX private IP as the hostname and 55555 as the port.

I am getting the error below:

[lifecycleSupervisor-1-1] (org.apache.flume.sink.AbstractRpcSink.start:292) - Unable to create Rpc client using hostname: XXX.XX.XX.XXXX, port: 55555
org.apache.flume.FlumeException: NettyAvroRpcClient { host: XXX.XX.XX.XXXX, port: 55555 }: RPC connection error
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:181)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:120)
at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:638)
at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:90)
at org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:127)
at org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:210)
at org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:290)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:45)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error connecting to /XXX.XX.XX.XXXX:55555
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
at org.apache.avro.ipc.NettyTransceiver.&lt;init&gt;(NettyTransceiver.java:203)
at org.apache.avro.ipc.NettyTransceiver.&lt;init&gt;(NettyTransceiver.java:152)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:169)
… 16 more

Please find the agent configurations below, and let me know which host should be used where. My reading of the "Error connecting" cause is that the sink could not reach anything listening at that address and port.
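From my reading of the Flume user guide, the Avro source takes a bind address (the interface it listens on, on the cluster machine; 0.0.0.0 means all interfaces), while the Avro sink's hostname must be an address of that cluster machine that is reachable from the remote computer. If the remote computer is outside the CloudX network, I suspect the private IP will not be reachable from there. If I have understood that correctly, the relevant lines would look like this (the address is a placeholder, not my real value):

# Agent2 (receiver, on the Hadoop cluster): listen on all interfaces
Agent2.sources.avro-source.bind = 0.0.0.0
Agent2.sources.avro-source.port = 55555

# Agent1 (sender, on the remote computer): use an address of the cluster
# node that is reachable FROM the remote computer (a public IP or DNS name
# if the two machines are on different networks)
Agent1.sinks.avro-sink.hostname = <cluster-node-address>
Agent1.sinks.avro-sink.port = 55555

Is that the right way to pick the host on each side?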

Agent1, running on the remote computer

Agent1 - Spooling Directory Source and File Channel, Avro Sink

# Name the components on this agent

Agent1.sources = spooldir-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink

# Describe/configure the source

Agent1.sources.spooldir-source.type = spooldir
Agent1.sources.spooldir-source.spoolDir = /FlumeTest/spooldir

# Describe the sink

Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname = XXX.XX.XX.XXXX
Agent1.sinks.avro-sink.port = 55555

# Use a channel which buffers events in files on disk
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.checkpointDir = /FlumeTest/CheckPoint
Agent1.channels.file-channel.dataDirs = /FlumeTest/Data

# Bind the source and sink to the channel

Agent1.sources.spooldir-source.channels = file-channel
Agent1.sinks.avro-sink.channel = file-channel
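One thing I want to confirm about the spooldir source: as I understand the docs, files must be complete (no longer being written to) before they are dropped into the spool directory, and Flume renames them with a .COMPLETED suffix once ingested. I am relying on the defaults here, which I believe are:

# Spooldir defaults I believe apply (not set explicitly above)
Agent1.sources.spooldir-source.fileSuffix = .COMPLETED
Agent1.sources.spooldir-source.deletePolicy = never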

Agent2, running on the Hadoop cluster

Agent2 - Avro Source and File Channel, HDFS Sink

# Name the components on this agent

Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink

# Describe/configure the source

Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.bind = XXX.XX.XX.XXXX
Agent2.sources.avro-source.port = 55555

# Describe the sink

Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = hdfs://XXXXX/user/XXXXX/flume_import/
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent2.sinks.hdfs-sink.hdfs.rollSize = 0
Agent2.sinks.hdfs-sink.hdfs.rollCount = 10000
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream

# Use a channel which buffers events in files on disk
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = /home/XXXXX/flume/checkpoint/
Agent2.channels.file-channel.dataDirs = /home/XXXXXXX/flume/data/

# Bind the source and sink to the channel

Agent2.sources.avro-source.channels = file-channel
Agent2.sinks.hdfs-sink.channel = file-channel
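For context on the roll settings above: my understanding is that rollInterval = 0 and rollSize = 0 disable time- and size-based rolling, so each HDFS file should close only after 10000 events. If time-based rolling were wanted instead, I believe it would look like this (the 600 is a made-up value):

# Hypothetical variation: roll every 10 minutes instead of by event count
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 600
Agent2.sinks.hdfs-sink.hdfs.rollCount = 0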