Finding the right version of the right JAR in a maven repository
I'm converting a build that has 71 .jar files in its global lib/ directory to use Maven. Of course, these have been pulled from the web by lots of developers over the past ten years of this project's history, and weren't always added to VCS with all the necessary version info, etc.
Is there an easy, automated way to go from that set of .jar files to the corresponding <dependency/> elements for use in my pom.xml files? I'm hoping for a web page where I can submit the checksum of a jar file and get back an XML snippet. The Google hits for 'maven repository search' are basically just finding name-based searches. And http://repo1.maven.org/ has no search whatsoever, as far as I can see.
Update: GrepCode looks like it can find projects given an MD5 checksum. But it doesn't provide the particular details (groupId, artifactId) that Maven needs.
Here's the script I came up with based on the accepted answer:
#!/bin/bash
for f in *.jar; do
    # MD5 digest of the jar, used as the Jarvana search key
    s=$(md5sum "$f" | cut -d ' ' -f 1)
    # Path of the "inspect pom" page for the matching artifact
    p=$(wget -q -O - "http://www.jarvana.com/jarvana/search?search_type=content&content=${s}&filterContent=digest" | grep inspect-pom | cut -d \" -f 4)
    pj="http://www.jarvana.com${p}"
    rm -f tmp
    wget -q -O tmp "$pj"
    # Take the first groupId, artifactId and version found on that page
    g=$(grep groupId tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1)
    a=$(grep artifactId tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1)
    v=$(grep version tmp | head -n 1 | cut -d \> -f 3 | cut -d \< -f 1)
    rm -f tmp
    echo "<dependency> <!-- $f $s $pj -->"
    echo " <groupId>$g</groupId>"
    echo " <artifactId>$a</artifactId>"
    echo " <version>$v</version>"
    echo "</dependency>"
    echo
done
I was in the same situation as the OP, but as mentioned in later answers, Jarvana is no longer up.
I used the search-by-checksum functionality of Maven Central Search and its search API to achieve the same results.
First create a file with the SHA-1 sums:
sha1sum *.jar > jar-sha1sums.txt
Then use the following Python 2 script to look up each checksum and write the matching <dependency> elements to pom.xml:
# Python 2 script (uses urllib2)
import json
import urllib2

f = open('./jar-sha1sums.txt', 'r')
pom = open('./pom.xml', 'w')
for line in f.readlines():
    if not line.strip():
        continue
    # sha1sum prints "<hash>  <file>", so split on any run of whitespace
    sha, jar = line.split(None, 1)
    jar = jar.strip()
    print("Looking up " + jar)
    searchurl = 'http://search.maven.org/solrsearch/select?q=1:%22' + sha + '%22&rows=20&wt=json'
    page = urllib2.urlopen(searchurl)
    data = json.loads("".join(page.readlines()))
    if data["response"] and data["response"]["numFound"] == 1:
        # Exactly one artifact has this checksum: use its coordinates
        print("Found info for " + jar)
        jarinfo = data["response"]["docs"][0]
        pom.write('<dependency>\n')
        pom.write('\t<groupId>' + jarinfo["g"] + '</groupId>\n')
        pom.write('\t<artifactId>' + jarinfo["a"] + '</artifactId>\n')
        pom.write('\t<version>' + jarinfo["v"] + '</version>\n')
        pom.write('</dependency>\n')
    else:
        # No match (or several): write a placeholder to fill in by hand
        print("No info found for " + jar)
        pom.write('<!-- TODO Find information on this jar file -->\n')
        pom.write('<dependency>\n')
        pom.write('\t<groupId></groupId>\n')
        pom.write('\t<artifactId>' + jar.replace(".jar", "") + '</artifactId>\n')
        pom.write('\t<version></version>\n')
        pom.write('</dependency>\n')
pom.close()
f.close()
YMMV
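For anyone adapting the script above, here is a minimal sketch of the "response to XML" step pulled out into a function, so it can be tried without hitting the network. The function name dependency_snippet and the sample dictionary are made up for illustration; the only fields assumed are the ones the script above already reads (response, numFound, docs, g, a, v). Values are passed through escape from the standard library so that characters such as & do not break the XML.

```python
from typing import Optional
from xml.sax.saxutils import escape


def dependency_snippet(search_result: dict) -> Optional[str]:
    """Return a <dependency> element if the result holds exactly one match."""
    response = search_result.get("response") or {}
    docs = response.get("docs") or []
    if response.get("numFound") != 1 or not docs:
        return None  # zero or several matches: leave this jar for manual review
    doc = docs[0]
    return (
        "<dependency>\n"
        f"\t<groupId>{escape(doc['g'])}</groupId>\n"
        f"\t<artifactId>{escape(doc['a'])}</artifactId>\n"
        f"\t<version>{escape(doc['v'])}</version>\n"
        "</dependency>\n"
    )


# Hypothetical sample, shaped like the fields the script above reads.
sample = {"response": {"numFound": 1,
                       "docs": [{"g": "junit", "a": "junit", "v": "4.12"}]}}
print(dependency_snippet(sample))
```

Returning None for anything other than a single match keeps the "write a TODO placeholder" decision in the caller.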
Jarvana can search on a digest (select digest next to the Content input field).
For example, a search on d1dcb0fbee884bb855bb327b8190af36 will return commons-collections-3.1.jar.md5. Then just click on the
One can imagine automating this.
Jarvana no longer exists; however, you can use this Groovy script, which iterates through a directory and looks up the SHA1 hash of each jar in Nexus: https://github.com/myspotontheweb/ant2ivy/blob/master/ant2ivy.groovy
It will create a pom.xml for maven users and an ivy.xml for Ivy users.
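If you only need the "iterate through a directory and compute the SHA1 of each jar" part, it can be sketched in Python with nothing but the standard library, which also avoids depending on a sha1sum binary. This is an illustration and not a port of the Groovy script; the function names sha1_of_file and sha1_of_jars are invented here.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large jars are never loaded whole into memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha1_of_jars(lib_dir: str) -> dict:
    """Map each *.jar file name in lib_dir to its SHA-1 hex digest."""
    return {p.name: sha1_of_file(p) for p in sorted(Path(lib_dir).glob("*.jar"))}


if __name__ == "__main__":
    # Prints "<hash> <file>" for every jar in the current directory.
    for name, sha in sha1_of_jars(".").items():
        print(sha, name)
```

The resulting digests can then be fed to whichever checksum search you use.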
Borrowed the code and idea from @Karl Tryggvason but couldn't get the Python script working. Being a Windows monkey, I did something similar in PowerShell (v3 required). It's not so sophisticated (it doesn't generate a pom for you, it just dumps the results), but I thought it might save someone a few minutes.
$log = 'c:\temp\jarfind.log'
Get-Date | Tee-Object -FilePath $log
$jars = gci d:\source\myProject\lib -Filter *.jar
foreach ($jar in $jars)
{
    $sha = Get-FileHash -Algorithm SHA1 -Path $jar.FullName | select -ExpandProperty hash
    $name = $jar.Name
    $json = Invoke-RestMethod "http://search.maven.org/solrsearch/select?q=1:%22$($sha)%22&rows=20&wt=json"
    "Found $($json.response.numfound) jars with sha1 matching that of $($name)..." | Tee-Object -FilePath $log -Append
    $jarinfo = $json.response.docs
    $jarinfo | Tee-Object -FilePath $log -Append
}
Hi, you can use mvnrepository to search for artifacts. Alternatively, you can use Eclipse: its Add Dependency dialog has a search that uses the index of Maven Central.
If you also want to fall back on the artifactId and version read from the jar file name when the checksum search finds nothing, you can use the following code. It's an improved version of Karl's.
import os
import sys
from subprocess import check_output

import requests


class bcolors:
    # ANSI escape codes used for the error messages in main()
    FAIL = '\033[91m'
    ENDC = '\033[0m'


def searchByShaChecksum(sha):
    # Look the jar up by its SHA-1 checksum
    searchurl = 'http://search.maven.org/solrsearch/select?q=1:%22' + sha + '%22&rows=20&wt=json'
    resp = requests.get(searchurl)
    data = resp.json()
    return data


def searchAsArtifact(artifact, version):
    # Look the jar up by a guessed artifactId and version
    searchurl = 'http://search.maven.org/solrsearch/select?q=a:"' + artifact + '" AND v:"' + version.strip() + '"&rows=20&wt=json'
    resp = requests.get(searchurl)
    # print(searchurl)
    data = resp.json()
    return data


def processAsArtifact(file: str):
    # Try every split of "some-name-1.2.jar" into artifactId and version
    data = {'response': {'start': 0, 'docs': [], 'numFound': 0}}
    jar = file.replace(".jar", "")
    splits = jar.split("-")
    if len(splits) < 2:
        return data
    for i in range(1, len(splits)):
        artifact = "-".join(splits[0:i])
        version = "-".join(splits[i:])
        data = searchAsArtifact(artifact, version)
        if data["response"] and data["response"]["numFound"] == 1:
            return data
    return data


def writeToPom(pom: object, grp: str = None, art: str = None, ver: str = None):
    if grp is None or ver is None:
        # Unknown coordinates: flag the entry for manual completion
        pom.write('<!-- TODO Find information on this jar file -->\n')
    pom.write('<dependency>\n')
    grp = grp if grp is not None else ""
    art = art if art is not None else ""
    ver = ver if ver is not None else ""
    pom.write('\t<groupId>' + grp + '</groupId>\n')
    pom.write('\t<artifactId>' + art + '</artifactId>\n')
    pom.write('\t<version>' + ver + '</version>\n')
    pom.write('</dependency>\n')


def main(argv):
    if len(argv) == 0:
        print(bcolors.FAIL + 'Syntax : findPomJars.py <lib_dir_path>' + bcolors.ENDC)
        return
    lib_home = str(argv[0])
    if os.path.exists(lib_home):
        os.chdir(lib_home)
        pom = open('./auto_gen_pom_list.xml', 'w')
        successList = []
        failedList = []
        jarCount = 0
        # We are already inside lib_home, so list the current directory
        for lib in sorted(os.listdir(".")):
            if lib.endswith(".jar"):
                jarCount += 1
                sys.stdout.write("\rProcessed Jar Count: %d" % jarCount)
                sys.stdout.flush()
                # sha1sum prints "<hash>  <file>"; the first field is the hash
                sha = check_output(["sha1sum", lib]).decode().split()[0]
                jar = lib
                data = searchByShaChecksum(sha)
                if data["response"] and data["response"]["numFound"] == 0:
                    # No checksum match: fall back to the file name
                    data = processAsArtifact(jar)
                if data["response"] and data["response"]["numFound"] == 1:
                    successList.append("Found info for " + jar)
                    jarinfo = data["response"]["docs"][0]
                    writeToPom(pom, jarinfo["g"], jarinfo["a"], jarinfo["v"])
                else:
                    failedList.append("No info found for " + jar)
                    writeToPom(pom, art=jar.replace(".jar", ""))
        pom.close()
        print("\n")
        print("Success : %d" % len(successList))
        print("Failed : %d" % len(failedList))
        for entry in successList:
            print(entry)
        for entry in failedList:
            print(entry)
    else:
        print(bcolors.FAIL + lib_home + " directory doesn't exist" + bcolors.ENDC)


if __name__ == "__main__":
    main(sys.argv[1:])
Code can also be found on GitHub
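To make the file-name fallback in processAsArtifact easier to follow, here is a small standalone sketch of just the splitting step, with no network access. It lists every way a name such as commons-collections-3.1.jar can be cut at a hyphen into an (artifactId, version) guess; the script above then tries each guess against the search API until one returns a single hit. The function name candidate_splits is invented for this illustration.

```python
from typing import List, Tuple


def candidate_splits(jar_name: str) -> List[Tuple[str, str]]:
    """List every (artifactId, version) guess obtained by cutting at a hyphen."""
    stem = jar_name[:-4] if jar_name.endswith(".jar") else jar_name
    parts = stem.split("-")
    # A name without any hyphen yields no candidates at all.
    return [("-".join(parts[:i]), "-".join(parts[i:]))
            for i in range(1, len(parts))]


print(candidate_splits("commons-collections-3.1.jar"))
# [('commons', 'collections-3.1'), ('commons-collections', '3.1')]
```

Note that the guesses are tried from the shortest artifactId to the longest, so a jar whose file name was changed by hand may match the wrong artifact or none.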
# Path to the jar file
jar_name=junit-4.12.jar
sha1sum "$jar_name" > jar-sha1sums.txt
shaVal=$(cut -d " " -f1 jar-sha1sums.txt)
response=$(curl -s "http://search.maven.org/solrsearch/select?q=1:%22${shaVal}%22&rows=20&wt=json")
formatted_response=$(echo "$response" | grep -Po '"response":.*')
# Pull each quoted value out of the JSON; xargs strips the surrounding quotes
versionId=$(echo "$formatted_response" | grep -Po '"v":"[^"]*"' | cut -d ":" -f2 | xargs)
artifactId=$(echo "$formatted_response" | grep -Po '"a":"[^"]*"' | cut -d ":" -f2 | xargs)
groupId=$(echo "$formatted_response" | grep -Po '"g":"[^"]*"' | cut -d ":" -f2 | xargs)
# To find the latest available version (the URL must be quoted because of the & characters)
lat_ver_response=$(curl -s "https://search.maven.org/solrsearch/select?q=g:${groupId}+AND+a:${artifactId}&core=gav&rows=20&wt=json")
format_lat_ver_response=$(echo "$lat_ver_response" | grep -Po '"response":.*')
latestVersionId=$(echo "$format_lat_ver_response" | grep -Po '"latestVersion":"[^"]*"' | cut -d ":" -f2 | xargs)
gist created from ant2maven script @ https://github.com/sachinsshetty/ant2Maven.git
https://gist.github.com/sachinsshetty/bab6ca24671cafe2cb63daaab47103f3
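Since the search URLs contain quotes, colons and ampersands, it is easy to get the escaping wrong when building them by hand. A hedged sketch of doing it with urlencode from the Python standard library follows. The endpoint and the parameters (q, rows, wt, core=gav) are the ones used in the scripts in this thread; the function names are made up for the example, and nothing here contacts the server.

```python
from urllib.parse import urlencode

# Endpoint taken from the scripts in this thread.
SEARCH_URL = "https://search.maven.org/solrsearch/select"


def checksum_query_url(sha1: str, rows: int = 20) -> str:
    """URL for a search by SHA-1 checksum (the 1:"<sha1>" query)."""
    query = {"q": f'1:"{sha1}"', "rows": rows, "wt": "json"}
    return f"{SEARCH_URL}?{urlencode(query)}"


def coordinates_query_url(group_id: str, artifact_id: str, rows: int = 20) -> str:
    """URL for a search by groupId and artifactId, using core=gav as above."""
    query = {
        "q": f'g:"{group_id}" AND a:"{artifact_id}"',
        "core": "gav",
        "rows": rows,
        "wt": "json",
    }
    return f"{SEARCH_URL}?{urlencode(query)}"


print(checksum_query_url("abc"))
print(coordinates_query_url("junit", "junit"))
```

urlencode percent-encodes the quotes and colons and turns spaces into +, so the whole URL can be passed to curl, urlopen or requests without further quoting.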
This is the same script from @karl-tryggvason's answer, but using Python 3:
import json
from urllib.request import urlopen

f = open('./jar-sha1sums.txt', 'r')
pom = open('./pom.xml', 'w')
for line in f.readlines():
    if not line.strip():
        continue
    # sha1sum prints "<hash>  <file>", so split on any run of whitespace
    sha, jar = line.split(None, 1)
    jar = jar.strip()
    print("Looking up " + jar)
    searchurl = 'http://search.maven.org/solrsearch/select?q=1:%22' + sha + '%22&rows=20&wt=json'
    page = urlopen(searchurl)
    data = json.loads(b"".join(page.readlines()))
    if data["response"] and data["response"]["numFound"] == 1:
        # Exactly one artifact has this checksum: use its coordinates
        print("Found info for " + jar)
        jarinfo = data["response"]["docs"][0]
        pom.write('<dependency>\n')
        pom.write('\t<groupId>' + jarinfo["g"] + '</groupId>\n')
        pom.write('\t<artifactId>' + jarinfo["a"] + '</artifactId>\n')
        pom.write('\t<version>' + jarinfo["v"] + '</version>\n')
        pom.write('</dependency>\n')
    else:
        # No match (or several): write a placeholder to fill in by hand
        print("No info found for " + jar)
        pom.write('<!-- TODO Find information on this jar file -->\n')
        pom.write('<dependency>\n')
        pom.write('\t<groupId></groupId>\n')
        pom.write('\t<artifactId>' + jar.replace(".jar", "") + '</artifactId>\n')
        pom.write('\t<version></version>\n')
        pom.write('</dependency>\n')
pom.close()
f.close()