Hi,
I am trying to transfer files from a remote computer to HDFS. I have two Flume agents running: one on the remote computer and the other on the Hadoop cluster. I am using spooldir as the source type and avro as the sink type, with the CloudX private IP as the hostname and 55555 as the port.
I am getting the error below:
[lifecycleSupervisor-1-1] (org.apache.flume.sink.AbstractRpcSink.start:292) - Unable to create Rpc client using hostname: XXX.XX.XX.XXXX, port: 55555
org.apache.flume.FlumeException: NettyAvroRpcClient { host: XXX.XX.XX.XXXX, port: 55555 }: RPC connection error
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:181)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:120)
at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:638)
at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:90)
at org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:127)
at org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:210)
at org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:290)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:45)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error connecting to /XXX.XX.XX.XXXX:55555
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:203)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:152)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:169)
… 16 more
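To rule out a plain network problem before digging into the configs, I put together a minimal probe sketch using the Flume client SDK (flume-ng-sdk on the classpath), to be run from the remote computer. The host and port below are placeholders and should match Agent1's avro sink settings; if this fails too, the issue is reachability rather than the agent configs:

import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class AvroSourceProbe {
    public static void main(String[] args) {
        // Placeholder host/port: use the same values as Agent1's avro sink.
        String host = "XXX.XX.XX.XXXX";
        int port = 55555;
        RpcClient client = null;
        try {
            // getDefaultInstance opens the connection up front, so a refused
            // connection surfaces here as a FlumeException.
            client = RpcClientFactory.getDefaultInstance(host, port);
            Event event = EventBuilder.withBody("probe", StandardCharsets.UTF_8);
            client.append(event); // throws EventDeliveryException if delivery fails
            System.out.println("Avro source reachable at " + host + ":" + port);
        } catch (Exception e) {
            System.err.println("Probe failed: " + e);
        } finally {
            if (client != null) {
                client.close();
            }
        }
    }
}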
Please find the agent configurations below, and let me know which hostname should be used where.
Agent1, running on the remote computer:
# Agent1 - Spooling Directory Source, File Channel, Avro Sink
# Name the components on this agent
Agent1.sources = spooldir-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink
# Describe/configure the source
Agent1.sources.spooldir-source.type = spooldir
Agent1.sources.spooldir-source.spoolDir = /FlumeTest/spooldir
# Describe the sink
Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname = XXX.XX.XX.XXXX
Agent1.sinks.avro-sink.port = 55555
# Use a channel which buffers events in file
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.checkpointDir = /FlumeTest/CheckPoint
Agent1.channels.file-channel.dataDirs = /FlumeTest/Data
# Bind the source and sink to the channel
Agent1.sources.spooldir-source.channels = file-channel
Agent1.sinks.avro-sink.channel = file-channel
Agent2, running on the Hadoop cluster:
# Agent2 - Avro Source, File Channel, HDFS Sink
# Name the components on this agent
Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink
# Describe/configure the source
Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.bind = XXX.XX.XX.XXXX
Agent2.sources.avro-source.port = 55555
# Describe the sink
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = hdfs://XXXXX/user/XXXXX/flume_import/
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent2.sinks.hdfs-sink.hdfs.rollSize = 0
Agent2.sinks.hdfs-sink.hdfs.rollCount = 10000
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
# Use a channel which buffers events in file
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = /home/XXXXX/flume/checkpoint/
Agent2.channels.file-channel.dataDirs = /home/XXXXXXX/flume/data/
# Bind the source and sink to the channel
Agent2.sources.avro-source.channels = file-channel
Agent2.sinks.hdfs-sink.channel = file-channel
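For reference, my current understanding, which may well be wrong and is part of why I am asking, is that the avro source's bind address is where Agent2 listens locally (0.0.0.0 would listen on all interfaces), while the avro sink's hostname on Agent1 must be an address of the Hadoop machine that is reachable from the remote computer, with both sides agreeing on the port. A sketch of what I mean, with placeholder addresses:

# Agent2 listens locally; 0.0.0.0 is a placeholder meaning all interfaces
Agent2.sources.avro-source.bind = 0.0.0.0
Agent2.sources.avro-source.port = 55555

# Agent1 targets an address of the Hadoop host reachable from the remote computer
Agent1.sinks.avro-sink.hostname = XXX.XX.XX.XXXX
Agent1.sinks.avro-sink.port = 55555

Is that the right way to pair them?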